Introduction to Bioinformatics - Tutorial no. 5 MEME – Discovering motifs in sequences MAST – Searching for motifs in databanks TRANSFAC – The Transcription Factor DB

Post on 21-Dec-2015

Introduction to Bioinformatics - Tutorial no. 5

MEME – Discovering motifs in sequences

MAST – Searching for motifs in databanks

TRANSFAC – The Transcription Factor DB

WebLogo - Input

http://weblogo.berkeley.edu

Aligned sequences (e.g. output of ClustalW)

RUN!

WebLogo - Output

Genes: [sequence logo figure]

Proteins: [sequence logo figure]
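The height of each stack in a sequence logo reflects how conserved that alignment column is. A minimal sketch of that calculation follows, assuming a DNA alphabet and ignoring the small-sample correction that WebLogo applies; the function name and the toy alignment are illustrative and not part of the tutorial.

```python
import math
from collections import Counter


def column_information(aligned, alphabet="ACGT"):
    """Return the information content (in bits) of each alignment column.

    Information = log2(alphabet size) - Shannon entropy of the column.
    Gap characters and symbols outside the alphabet are ignored.
    """
    max_bits = math.log2(len(alphabet))
    bits = []
    for column in zip(*aligned):
        counts = Counter(c for c in column if c in alphabet)
        total = sum(counts.values())
        if total == 0:
            bits.append(0.0)
            continue
        entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
        bits.append(max_bits - entropy)
    return bits


# Toy alignment (hypothetical data, e.g. what a ClustalW run might produce).
toy_alignment = ["TATAAT", "TATGAT", "TACAAT", "TATAAT"]
bits = column_information(toy_alignment)
for position, value in enumerate(bits, start=1):
    print(position, round(value, 2))
```

A fully conserved DNA column reaches the maximum of 2 bits; for proteins the same function can be called with a 20-letter alphabet, which raises the maximum to about 4.32 bits.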

MEME

http://meme.sdsc.edu/

Motif discovery from unaligned sequences
  Genomic or protein sequences

Identifies profile motifs
  Multiple motifs for any input

Flexible model of motif presence
  Motif can be absent in some sequences
  Can appear several times in one sequence

MEME Input

Email address

Multiple input sequences

How many times in each sequence?

How many motifs?

How many sites?

Range of motif lengths

MEME Output (1)

Motif length

Number of times

E-value (like BLAST)

“Position-Specific Probability Matrix” = Motif Profile

Divergence of motif position from background

Most popular symbols
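The three quantities labelled on this output (the position-specific probability matrix, the divergence of each position from background, and the most popular symbols) can be illustrated with a short sketch. It assumes a uniform background and a toy set of motif sites; the function names are hypothetical, and MEME's own estimates use an EM procedure with priors rather than these raw frequencies.

```python
import math


def probability_matrix(sites, alphabet="ACGT", pseudocount=0.0):
    """Build a position-specific probability matrix from equal-length sites."""
    width = len(sites[0])
    matrix = []
    for i in range(width):
        column = [site[i] for site in sites]
        total = len(column) + pseudocount * len(alphabet)
        matrix.append({a: (column.count(a) + pseudocount) / total for a in alphabet})
    return matrix


def relative_entropy(row, background):
    """Divergence (in bits) of one motif position from the background."""
    return sum(p * math.log2(p / background[a]) for a, p in row.items() if p > 0)


# Hypothetical motif sites and an assumed uniform background.
sites = ["TATAAT", "TATGAT", "TACAAT", "TATAAT"]
background = {a: 0.25 for a in "ACGT"}

pspm = probability_matrix(sites)
divergence = [relative_entropy(row, background) for row in pspm]
# Most popular symbol at each position.
consensus = "".join(max(row, key=row.get) for row in pspm)

print(consensus)
print([round(d, 2) for d in divergence])
```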

MEME Output (2)

Sequence names

Reverse complement (genomic input only)

Position in sequence

Strength of match

Motif within sequence

MEME Output (3)

Overall strength of motif matches

Original sequence lengths

Motif instance

MAST

Searches for motifs (one or more) in sequence databases:
  Like BLAST, but with motifs as input
  Similar to iterations of PSI-BLAST

Profile defines strength of match
  Multiple motif matches per sequence
  Combined E value for all motifs

MEME uses MAST to summarize results:
  Each MEME result is accompanied by the MAST result for searching the discovered motifs on the given sequences.
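The idea that the profile defines the strength of a match can be sketched as a log-odds scan over both strands of a sequence. This is only an illustration under stated assumptions: the toy matrix, the uniform background, and the function names are hypothetical, and MAST itself goes further by converting scores to p-values and combining them into an E value per sequence.

```python
import math

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def log_odds(prob_matrix, background):
    """Convert a probability matrix into a log-odds scoring matrix (bits)."""
    return [{a: math.log2(p / background[a]) for a, p in row.items()}
            for row in prob_matrix]


def best_hit(seq, pssm):
    """Return (score, start, strand) of the best-scoring window.

    For the '-' strand, start is a 0-based offset in the
    reverse-complemented sequence. Sequences must contain only A, C, G, T.
    """
    width = len(pssm)
    best = None
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for start in range(len(s) - width + 1):
            score = sum(pssm[i][s[start + i]] for i in range(width))
            if best is None or score > best[0]:
                best = (score, start, strand)
    return best


# Toy motif "ACG" with no zero probabilities (hypothetical values).
toy_matrix = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
]
background = {a: 0.25 for a in "ACGT"}
pssm = log_odds(toy_matrix, background)

forward_hit = best_hit("TTACGAA", pssm)
reverse_hit = best_hit(reverse_complement("TTACGAA"), pssm)
print(forward_hit)
print(reverse_hit)
```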

MAST Input

Email address

Database (like BLAST)

Motif file (e.g. MEME output)

Consider matched sequence length

E value threshold

MAST Output (1)

Matched accession

Match E value

Length of sequence

Link to GenBank

MAST Output (2)

Motif diagram

MAST Output (3)

Position of each instance

P value of instance

Matched parts of sequence

Motif ‘consensus’

Motif and orientation

TRANSFAC

Database of eukaryotic DNA transcription regulation:

Individual regulatory sites (SITES table)
  Genes to which they belong
  Proteins which bind them

Proteins which bind sites (FACTORS table)
  Cellular source of protein
  Nucleotide motif profile for binding
  Some grouping and classification

Classification of factors (CLASS table)

Position-specific matrices for select factors (MATRIX table)

Cell localization (CELL table)

Searching TRANSFAC

www.gene-regulation.com

Search a single table
  By identifier, factor name, gene name
  By species, author

Browse your way from table to table

Search within a sequence
  MatInspector, TFScan (EMBOSS package)

TRANSFAC Factor

DT  Date; author
FA  Factor name
GE  Encoding gene
SF  Structural features
CP  Cell specificity (positive)
CN  Cell specificity (negative)
EX  Expression pattern
FF  Functional features
IN  Interacting factors
MX  Matrix
BS  Binding SITE
DR  External databases

References:
RN  Reference no.
RX  MEDLINE ID
RA  Reference authors
RT  Reference title
RL  Reference data
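Because every line of a record starts with a two-letter field code, such an entry is easy to read programmatically. The sketch below assumes the usual flat-file conventions ("XX" as a spacer line, "//" as the record terminator); the sample record, its accession numbers, and the function name are entirely hypothetical and do not come from the database.

```python
# Field codes and their meanings, as listed on the slide (subset).
FIELD_NAMES = {
    "FA": "Factor name",
    "GE": "Encoding gene",
    "BS": "Binding SITE",
    "DR": "External databases",
}

# Hypothetical record, for illustration only.
SAMPLE_RECORD = """\
AC  T99999
XX
FA  ExampleFactor
XX
GE  G999999; exampleGene.
XX
BS  R99999.
BS  R99998.
//
"""


def parse_record(text):
    """Group the lines of one flat-file record by their two-letter code."""
    fields = {}
    for line in text.splitlines():
        # Skip blank lines, spacer lines and the record terminator.
        if not line.strip() or line.startswith(("XX", "//")):
            continue
        code, value = line[:2], line[2:].strip()
        fields.setdefault(code, []).append(value)
    return fields


record = parse_record(SAMPLE_RECORD)
for code, values in record.items():
    print(code, FIELD_NAMES.get(code, "(not in subset)"), values)
```

Repeated codes, such as several BS lines, are collected into one list so that a factor with many binding sites keeps all of them.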

TRANSFAC Matrix

Accession

Position Specific Matrix

Statistical basis

Consensus (IUPAC subset symbols)
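A consensus line can be derived from the count matrix column by column. The sketch below uses one commonly cited rule, which is an assumption here since the slide does not state the rule: a single base if it accounts for more than 50% of the column and more than twice the runner-up, a two-base IUPAC code if the top two bases together exceed 75%, and N otherwise. The counts are toy values.

```python
# IUPAC codes for two-base ambiguities.
IUPAC_PAIRS = {
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("AC"): "M", frozenset("GT"): "K",
    frozenset("AT"): "W", frozenset("CG"): "S",
}


def consensus_symbol(counts):
    """Return one consensus symbol for a column of base counts."""
    total = sum(counts.values())
    ranked = sorted(counts, key=counts.get, reverse=True)
    first, second = ranked[0], ranked[1]
    # Single base: majority and clearly ahead of the runner-up.
    if counts[first] > 0.5 * total and counts[first] > 2 * counts[second]:
        return first
    # Two-base code: the top two bases dominate the column.
    if counts[first] + counts[second] > 0.75 * total:
        return IUPAC_PAIRS[frozenset((first, second))]
    return "N"


# Toy count matrix (hypothetical), one dict per motif position.
toy_counts = [
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 5, "C": 0, "G": 5, "T": 0},
    {"A": 3, "C": 3, "G": 2, "T": 2},
]
consensus = "".join(consensus_symbol(column) for column in toy_counts)
print(consensus)
```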

TRANSFAC Site (1)

Accession number

DNA or RNA

Gene

Gene region

Sequence of regulatory element

Position range of factor binding site

TRANSFAC Site (2)

Binding factor accession

Factor name

Binding ‘quality’:
1 functionally confirmed
2 binding of pure protein
3 immunologically characterized extract
4 via known binding sequence
5 extract protein binding to bona fide element
6 unassigned

Organism

Cellular source

Methods of identifying site

External links

TRANSFAC Factor (1)

AC: Accession number

FA: Factor name

SY: Other names

OS: Organism

OC: Taxonomy

HO: Homologs

CL: Classification

SZ: Size

SQ: Amino acid sequence

TRANSFAC Factor (2)

Protein sequence reference

Features and positions

Structural features

Cell specificity

Question

A biologist at your university has found 15 target genes that she thinks are co-regulated. She gives you 15 upstream regions, each 50 base pairs long, in FASTA format (file DNASample50.txt) and asks you to identify the motif and, if possible, the potential regulating protein. She tells you the sequences are from Homo sapiens, and her intuition is that the motif is of length 8. She wants you to suggest only the single best candidate motif.

Question

After you have run all the programs, your biologist friend confesses that she is not sure whether her intuition about the motif length was correct. Re-run the tool without specifying the motif length. Do you get the same results?

Determine a potential DNA-binding protein using TRANSFAC.
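The tutorial uses the MEME web form, but the same two runs can be expressed for a locally installed command-line `meme`. The sketch below only builds the command lines and does not execute them. Several choices are assumptions rather than part of the exercise: the `zoops` model (zero or one site per sequence), searching both strands, the 6 to 12 width range for the second run, and the output directory names.

```python
import shlex


def meme_command(fasta, out_dir, width=None, min_width=6, max_width=12):
    """Build a MEME command line asking for the single best DNA motif."""
    cmd = [
        "meme", fasta,
        "-dna",             # nucleotide input
        "-oc", out_dir,     # output directory (overwritten if it exists)
        "-mod", "zoops",    # assumed site distribution model
        "-nmotifs", "1",    # only the best candidate motif
        "-revcomp",         # also search the reverse strand
    ]
    if width is not None:
        cmd += ["-w", str(width)]  # fixed motif width
    else:
        cmd += ["-minw", str(min_width), "-maxw", str(max_width)]
    return cmd


# First question: motif width fixed at 8.
fixed = meme_command("DNASample50.txt", "meme_w8", width=8)
# Second question: let MEME choose the width within an assumed range.
free = meme_command("DNASample50.txt", "meme_free")

print(shlex.join(fixed))
print(shlex.join(free))
```

Comparing the motif reported by the two runs answers the second question; the resulting matrix or consensus can then be matched against TRANSFAC to propose a binding factor.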