Motif discovery
Tutorial 5
Agenda
Motif discovery
• MEME: creates a motif PSSM de novo (unknown motif)
• MAST: searches for a PSSM in a DB
• TOMTOM: searches for a PSSM in motif DBs
Cool story of the day: How NOT to be a bioinformatician
Motif – definition
Motif: a widespread pattern with a biological significance.
Sequence motif
PTB (RNA binding protein)
UCUU
CAP (DNA binding protein)
TGTGAXXXXXXTCACAXT
Sequence motif – definition
Motif: a nucleotide or amino-acid sequence pattern that is widespread and has a biological significance.
Aligned occurrences of the motif:
..YDEEGGDAEE..
..YDEEGGDAEE..
..YGEEGADYED..
..YDEEGADYEE..
..YNDEGDDYEE..
..YHDEGAADEE..
PSSM - position-specific scoring matrix
     1    2    3    4    5    6    7    8    9    10
A    0    0    0    0    0    3/6  1/6  2/6  0    0
D    0    3/6  2/6  0    0    1/6  5/6  1/6  0    1/6
E    0    0    4/6  1    0    0    0    0    1    5/6
G    0    1/6  0    0    1    2/6  0    0    0    0
H    0    1/6  0    0    0    0    0    0    0    0
N    0    1/6  0    0    0    0    0    0    0    0
Y    1    0    0    0    0    0    0    3/6  0    0
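The table can be reproduced by counting residues column by column in the six aligned occurrences. The following is a minimal sketch; the function name `frequency_matrix` and the variable names are illustrative and not part of any tool mentioned in the slides. Note that it yields per-position frequencies (as on the slide), not log-odds scores, and that `Fraction` prints values in reduced form (3/6 appears as 1/2).

```python
from collections import Counter
from fractions import Fraction

# The six aligned motif occurrences shown on the slide.
sites = [
    "YDEEGGDAEE",
    "YDEEGGDAEE",
    "YGEEGADYED",
    "YDEEGADYEE",
    "YNDEGDDYEE",
    "YHDEGAADEE",
]


def frequency_matrix(aligned_sites):
    """Return {residue: [frequency at position 1, 2, ...]} for equal-length sites."""
    length = len(aligned_sites[0])
    if any(len(site) != length for site in aligned_sites):
        raise ValueError("all sites must have the same length")
    n = len(aligned_sites)
    residues = sorted(set("".join(aligned_sites)))
    matrix = {residue: [] for residue in residues}
    # zip(*sites) walks the alignment one column (motif position) at a time.
    for column in zip(*aligned_sites):
        counts = Counter(column)
        for residue in residues:
            matrix[residue].append(Fraction(counts[residue], n))
    return matrix


matrix = frequency_matrix(sites)
for residue, row in matrix.items():
    print(residue, " ".join(str(value) for value in row))
```

Every column of the resulting matrix sums to 1, which is a quick sanity check for a hand-written table.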
Can we find motifs using multiple sequence alignment (MSA)?
YES! NO
Local multiple sequence alignment is a hard problem to solve
Motif search: from de-novo motifs to motif annotation
gapped motifs
Large DNA data
http://meme.sdsc.edu/
MEME
MEME – Multiple EM* for Motif Elicitation
• Motif discovery from unaligned sequences - genomic or protein sequences
• Flexible model of motif presence (Motif can be absent in some sequences or appear several times in one sequence)
*Expectation-maximization
MEME - Input
Input file (FASTA file)
How many times in each sequence?
How many motifs?
How many sites?
Range of motif lengths
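The same choices can be made when MEME is run from the command line instead of the web form. The sketch below only builds the argument list and does not run anything. It assumes the flags of the command-line MEME Suite (`-mod`, `-nmotifs`, `-minsites`, `-maxsites`, `-minw`, `-maxw`, `-oc`); check `meme -h` for the installed version. The helper name `build_meme_command`, the file name and the default values are hypothetical.

```python
import shlex


def build_meme_command(fasta_path, out_dir, alphabet="dna", mod="zoops",
                       nmotifs=3, minw=6, maxw=12, minsites=None, maxsites=None):
    """Build (but do not run) a MEME command line from the web-form choices."""
    if alphabet not in {"dna", "rna", "protein"}:
        raise ValueError("unknown alphabet: " + alphabet)
    # "How many times in each sequence?": oops = exactly one,
    # zoops = zero or one, anr = any number of repetitions.
    if mod not in {"oops", "zoops", "anr"}:
        raise ValueError("unknown site distribution: " + mod)
    cmd = [
        "meme", fasta_path,        # input file (FASTA)
        "-" + alphabet,
        "-oc", out_dir,            # output directory (assumed to be overwritten)
        "-mod", mod,
        "-nmotifs", str(nmotifs),  # "How many motifs?"
        "-minw", str(minw),        # range of motif lengths
        "-maxw", str(maxw),
    ]
    # "How many sites?": optional bounds on the number of occurrences.
    if minsites is not None:
        cmd += ["-minsites", str(minsites)]
    if maxsites is not None:
        cmd += ["-maxsites", str(maxsites)]
    return cmd


cmd = build_meme_command("sequences.fasta", "meme_out", minsites=5)
print(shlex.join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` on a machine where MEME is installed.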
MEME - Output
Motif e-value
MEME – Sequence logo
Motif length
Number of appearances
Motif e-value
A graphical representation of the sequence motif
MEME – Sequence logo
High information content = High confidence
The relative sizes of the letters indicate their frequency in the sequences. The total height of the letters depicts the information content of the position, in bits.
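As a rough illustration of how the heights in a logo arise, the information content of a position can be computed as the maximum possible entropy minus the observed entropy. This is a minimal sketch assuming a 20-letter protein alphabet and no small-sample correction (logo tools may apply one); the function names are illustrative.

```python
import math


def information_content(column, alphabet_size=20):
    """Information content of one motif position, in bits.

    `column` maps residue -> frequency; frequencies should sum to 1.
    """
    entropy = -sum(p * math.log2(p) for p in column.values() if p > 0)
    return math.log2(alphabet_size) - entropy


def letter_heights(column, alphabet_size=20):
    """Height of each letter: its frequency times the column's information content."""
    ic = information_content(column, alphabet_size)
    return {residue: p * ic for residue, p in column.items() if p > 0}


# Positions 1 and 2 of the example motif (Y always; D/G/H/N mixed).
position_1 = {"Y": 1.0}
position_2 = {"D": 3 / 6, "G": 1 / 6, "H": 1 / 6, "N": 1 / 6}

print(round(information_content(position_1), 2))
print(round(information_content(position_2), 2))
print(letter_heights(position_2))
```

A fully conserved position reaches the maximum of log2(20) bits, while a mixed position is shorter and its letters share the height in proportion to their frequencies.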
Multilevel Consensus
MEME – Sequence logo
Patterns can be presented as regular expressions
[AG]-x-V-x(2)-{YW}
[] - Either residue
x - Any residue
x(2) - Any residue in the next 2 positions
{} - Any residue except these
Examples: AYVACM, GGVGAA
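The pattern notation above maps directly onto ordinary regular expressions. The following sketch translates the subset of the notation listed on the slide into Python `re` syntax and checks the two examples; the converter `pattern_to_regex` is a hypothetical helper written for this illustration and does not cover the full pattern syntax.

```python
import re

# One pattern element: [..], {..}, a single residue or x, optionally followed by (n) or (n,m).
ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")


def pattern_to_regex(pattern):
    """Translate a pattern such as [AG]-x-V-x(2)-{YW} into a Python regex string."""
    parts = []
    for element in pattern.split("-"):
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError("unsupported pattern element: " + element)
        core, low, high = match.groups()
        if core == "x":
            token = "."                        # any residue
        elif core.startswith("{"):
            token = "[^" + core[1:-1] + "]"    # any residue except these
        else:
            token = core                       # [..] or a single residue
        if low is not None:
            token += "{" + low + ("," + high if high else "") + "}"
        parts.append(token)
    return "".join(parts)


regex = pattern_to_regex("[AG]-x-V-x(2)-{YW}")
print(regex)
for candidate in ["AYVACM", "GGVGAA", "AYVACW"]:
    print(candidate, bool(re.fullmatch(regex, candidate)))
```

The third candidate is included as a negative control: it ends in W, which the last element of the pattern excludes.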
Sequence names
Position in sequence
Strength of match
Motif within sequence
MEME – motif alignment
Overall strength of motif matches
Motif location in the input sequence
MEME – motif locations
Sequence names
What can we do with motifs?
• MAST - Search for them in non-annotated sequence databases (protein and DNA).
• TOMTOM - Find the protein that binds the DNA motif.
MAST
• Searches for motifs (one or more) in sequence databases:
  – Like BLAST but motifs for input
  – Similar to iterations of PSI-BLAST
• Profile defines strength of match
  – Multiple motif matches per sequence
• MEME uses MAST to summarize results:
  – Each MEME result is accompanied by the MAST result for searching the discovered motifs on the given sequences.
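The idea of scoring a sequence against a profile can be sketched as a sliding-window scan. This is only an illustration of the principle, not MAST's actual algorithm: it assumes a uniform background over the 20 amino acids and a pseudocount of 1, and it reports raw log-odds scores rather than the p-values and E-values that MAST computes. The query sequence is made up for the example.

```python
import math

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"

# The six aligned motif occurrences from the PSSM example.
sites = ["YDEEGGDAEE", "YDEEGGDAEE", "YGEEGADYED",
         "YDEEGADYEE", "YNDEGDDYEE", "YHDEGAADEE"]


def log_odds_pssm(aligned_sites, pseudocount=1.0):
    """Return one {residue: log2(frequency / background)} dict per motif position."""
    background = 1.0 / len(ALPHABET)  # assumption: uniform background
    n = len(aligned_sites)
    pssm = []
    for column in zip(*aligned_sites):
        scores = {}
        for residue in ALPHABET:
            # The pseudocount keeps unseen residues from scoring minus infinity.
            freq = (column.count(residue) + pseudocount * background) / (n + pseudocount)
            scores[residue] = math.log2(freq / background)
        pssm.append(scores)
    return pssm


def scan(sequence, pssm):
    """Score every window of the sequence; return (score, start, window), best first."""
    width = len(pssm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pssm[i][residue] for i, residue in enumerate(window))
        hits.append((score, start, window))
    return sorted(hits, reverse=True)


pssm = log_odds_pssm(sites)
best_score, best_start, best_window = scan("MKTAYDEEGADYEELLK", pssm)[0]
print(best_start, best_window, round(best_score, 2))
```

Keeping all windows sorted by score, rather than only the best one, mirrors the point that a sequence may contain multiple motif matches.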
http://meme.sdsc.edu/meme4_4_0/cgi-bin/mast.cgi
MAST - Input
Input file (motifs)
Database
If you wish to use motifs discovered by MEME
MAST - Output
Input motifs
Presence of the motifs in a given database
MAST – Output (another example, global view)
TOMTOM
• Searches one or more query DNA motifs against one or more databases of target motifs, and reports for each query a list of target motifs, ranked by p-value.
• The output contains results for each query, in the order that the queries appear in the input file.
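Comparing a query motif with a database of motifs can be pictured as sliding one matrix along the other and scoring the overlapping columns. The sketch below ranks targets by the mean Euclidean distance between columns at the best ungapped offset. It is a simplified stand-in: TOMTOM uses its own column comparison functions and converts scores to p-values, which is not attempted here. All motif names and matrices are toy data.

```python
import math


def one_hot(sequence):
    """Toy motif: one column per letter, with that letter at frequency 1."""
    return [{base: 1.0} for base in sequence]


def column_distance(col_a, col_b):
    """Euclidean distance between two frequency columns."""
    keys = set(col_a) | set(col_b)
    return math.sqrt(sum((col_a.get(k, 0.0) - col_b.get(k, 0.0)) ** 2 for k in keys))


def best_offset_distance(query, target):
    """Smallest mean column distance over all offsets where the shorter motif fits inside the longer."""
    short, long_ = (query, target) if len(query) <= len(target) else (target, query)
    best = math.inf
    for offset in range(len(long_) - len(short) + 1):
        total = sum(column_distance(short[i], long_[offset + i]) for i in range(len(short)))
        best = min(best, total / len(short))
    return best


def rank_targets(query, targets):
    """Return (distance, name) pairs, most similar target first."""
    return sorted((best_offset_distance(query, motif), name) for name, motif in targets.items())


query = one_hot("TGTGA")
targets = {
    "toy_target_1": one_hot("CCTGTGACC"),  # contains the query
    "toy_target_2": one_hot("AAAAAAA"),
}
ranking = rank_targets(query, targets)
for distance, name in ranking:
    print(name, round(distance, 3))
```

Real motif columns hold mixed frequencies rather than single letters, but the same distance and offset logic applies.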
http://meme.sdsc.edu/meme/doc/tomtom.html
TOMTOM - Input
Input motif
Background frequencies
Database
TOMTOM - Output
Input motif
Matching motifs
TOMTOM – Output
Wrong input (RNA sequence of RNA binding protein NOVA1)
“OK” results
MAST vs. TOMTOM
             MAST                 TOMTOM
Comparison   Profile against DB   Profile against profile
DB           General DBs          Known motif DBs
Cool Story of the day
How NOT to be a bioinformatician