Position-dependent motif characterization using Non-negative Matrix Factorization (NMF)
DESCRIPTION
Position-dependent motif characterization using Non-negative Matrix Factorization (NMF). Joel H Graber, Lucie N. Hutchins, Erik McCarthy, Sean Murphy, Priyam Singh. In collaboration with: Thomas Blumenthal, University of Colorado; David Kulp, University of Massachusetts.
TRANSCRIPT
![Page 1: Position-dependent motif characterization using Non-negative matrix Factorization (NMF)](https://reader033.vdocuments.site/reader033/viewer/2022051116/568157ed550346895dc56374/html5/thumbnails/1.jpg)
Position-dependent motif characterization using Non-negative Matrix Factorization (NMF)
Joel H Graber
Lucie N. Hutchins, Erik McCarthy, Sean Murphy, Priyam Singh
The Jackson Laboratory
In collaboration with: Thomas Blumenthal, University of Colorado; David Kulp, University of Massachusetts
Funding sources
Current: NIH GM 072706, NIH HD037102
Previous: NIH RR 16463 (INBRE-Maine), NSF 2010 Project DBI 0331497
![Page 2: Position-dependent motif characterization using Non-negative matrix Factorization (NMF)](https://reader033.vdocuments.site/reader033/viewer/2022051116/568157ed550346895dc56374/html5/thumbnails/2.jpg)
Motifs are often constrained in positioning
AUGCACAUAGAGGCAAUUGUGUAUCAAUAUUAAAAAUAAAGUAAAACUUA
AAGCAUGUGUAGACCGUGUGAUGAAUCCUUGUAUAAGCAACUGCCAAUGAAAUCGGGCUCGCUGUGGUCA
UCCGUGAGUGCUUAUCAUUCUGGUAAUACCGUGGUCUAUUUAUACAAAUAUUAAAAGUGCUGUUUAUAGA
GCCUGUGUCAUGUGGCAACUUCCUGUGUCAUGACCUCAGGAAAUAAAUUUCCUUGACUUUAUAAAAGCCA
AAACGUUUGCCCUCUUCCUUGGAAUUUGAAAUUACUCCAAUUUAAAAUAAAUUACUGGACUGUGGAAAUA
ACAUGUAGAAUUGCAGUUUUACACUGUAACAGUUGCUUCUGCCUACCUUAUAAAUAAAGAAUCACUAAGA
AAAAGAGUUCUCAGGUCUCCCUGAGCUCAGACUGAGGGGAAACGGAGGCAAAUAAAGCUGAGUUUUGAGA
ACUCGGUGGCCUGUGUUCCUAGCCUGUACUCACCCCUUCCCUUAAUAAUAAUAAAACAACAACUUUGUGA
AUUUGAGUUUUCCUUAGAGCUCAACAGAUCAUAUUCAGUGUCUUGAAUAAAUUGCUCUAUUUUGAUAUUA
GAGAACAUAGUGACUGUGUUUGGUACGAUUAUUUUUUUUAACUAAAAUGAGAUAAAAUUCUAUAUUCUUA
UGUGUGUGUGGUUUUUGAUGGGUGAAACUGUCUCAAUUUGAAUAAAUAUUUUUAUUGCAAUUCUGAACCA
AUUUUAAAAGAAAAGAUACAAAUGUCCUUCCAAAUAGAGCCUUUUUAUUAAUAAAGGGCCUUGUACUUCA
CUUGGAACAAAGGACGUUUCAUUUCAUUGUGUUAAAUGUAUACUUGUAAAUAAAAUAGCUGCAAACCUUA
AGCCUUUGAGCUACUUGGUGUAUCUCACUCGGUAUUACGUGCUCUGCAAUAGAAGUUGGUGUGAACAUUC
CCAGGUGACAUGCAGUGUUACCACCACCCCUCCAUCAGUAAGCCACUAAUAAAGUGCAUCUAUGCAGCCA
CAGGUCUGUCUGCCUCUUUUGGCUGGGCACCUUAAAAGAGAAGUCAAUAAACUGGGCUACACAGUACUUA
AAACGCUGAACUGGCUAAGAUGUGUAUUUAUGAAUAUUAAUGAAUAAAAACUGCUUGGAUGGUUUACCUA
ACUACUGCAUGAGGUUUUUUUCCUUUCUUUUCUCUCCACUCAAUAAAUACUUUAAAGCACAUUUGGAAUA
AAGGAAGAGACUUUUAAGUGGUGCUUAAUGAUAAGGUUUUGACUUGUUAAAUUAAACCAUUUGGAAUAUA
UUGUGUGUUUGUAGUAGUCAGUGCCUUUGUUUGUAAACCAAAAAGUAAUAAAUGAAUCCCUAUAUUUCUA
UUAUAGCAUCUAUUGUAUUUAAUAUAGUAUUUUAUUUAAGAAAAUAAACUUUGCAGUUUUUGCAUUGUGA
AUUCUCUCUCUUCCCGCCCACUGCCAUGAAAAAUGUUGUUUAUGGAAUAAAAAAAAUGUAACUGCCUUUA
AAUUUCCUGGUGGCUGUGUU
[Figure: sequences aligned at the functional site are summarized as a PWC matrix, with M sequence words as rows and N position counts as columns.]
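The construction of the PWC matrix on this slide can be sketched roughly as follows. This is a minimal illustration and not the authors' code: it assumes the sequences are already aligned at the functional site and have equal length, and the function name `pwc_matrix`, the word length `k`, and the toy sequences are hypothetical choices made for the example.

```python
from itertools import product

import numpy as np


def pwc_matrix(seqs, k=2, alphabet="ACGU"):
    """Count every k-letter word at every start position.

    Rows are the M = len(alphabet)**k possible words, columns are the
    N = L - k + 1 start positions of sequences of common length L.
    """
    words = ["".join(p) for p in product(alphabet, repeat=k)]
    row = {w: i for i, w in enumerate(words)}
    n_pos = len(seqs[0]) - k + 1
    counts = np.zeros((len(words), n_pos))
    for s in seqs:
        for j in range(n_pos):
            i = row.get(s[j:j + k])  # skip words with unexpected letters
            if i is not None:
                counts[i, j] += 1
    return words, counts


# Toy input (hypothetical): three short sequences assumed to be aligned.
seqs = ["AAUAAAGU", "CAAUAAAC", "AAUAAAUU"]
words, V = pwc_matrix(seqs, k=2)
print(V.shape)
```

Each column of `V` then describes which words occur at one position relative to the functional site, which is the input form the factorization on the next slide expects.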
![Page 3: Position-dependent motif characterization using Non-negative matrix Factorization (NMF)](https://reader033.vdocuments.site/reader033/viewer/2022051116/568157ed550346895dc56374/html5/thumbnails/3.jpg)
NMF decomposes the PWC matrix into characteristic patterns (motifs)
$$V = W \cdot H$$

V: counts (M x N); W: bases (M x r); H: weights (r x N)
$W_{ik}$ = weight of the ith word in the kth motif (content)
$H_{kj}$ = abundance of the kth motif at the jth position (positioning)
r = number of basis functions (patterns)
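One common way to compute such a factorization is the multiplicative-update rule of Lee and Seung for the squared-error objective. The slides do not say which NMF algorithm was used, so the sketch below is an assumption for illustration; the function name `nmf`, the iteration count, and the random test matrix are all hypothetical.

```python
import numpy as np


def nmf(V, r, n_iter=300, eps=1e-9, seed=0):
    """Factor a non-negative (M x N) matrix V into W (M x r) and H (r x N).

    Uses multiplicative updates, which keep W and H non-negative and do
    not increase the squared reconstruction error.
    """
    rng = np.random.default_rng(seed)
    M, N = V.shape
    W = rng.random((M, r)) + eps
    H = rng.random((r, N)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)  # update positioning weights
        W *= (V @ H.T) / (W @ H @ H.T + eps)  # update word content
    return W, H


# Toy check on a random non-negative matrix of rank 2 (hypothetical data).
rng = np.random.default_rng(1)
V = rng.random((6, 2)) @ rng.random((2, 8))
W, H = nmf(V, r=2)
print(np.linalg.norm(V - W @ H) / np.linalg.norm(V))
```

In this layout, column k of `W` lists the words that make up motif k, and row k of `H` shows where along the sequence that motif is found.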
![Page 4: Position-dependent motif characterization using Non-negative matrix Factorization (NMF)](https://reader033.vdocuments.site/reader033/viewer/2022051116/568157ed550346895dc56374/html5/thumbnails/4.jpg)
Synthetic data verifies NMF performance
![Page 5: Position-dependent motif characterization using Non-negative matrix Factorization (NMF)](https://reader033.vdocuments.site/reader033/viewer/2022051116/568157ed550346895dc56374/html5/thumbnails/5.jpg)
RSS provides a robust estimate for the optimal number of vectors (r)
[Figure: residue (RSS) plotted against basis vector count (r), for r = 3 to 13. One panel shows the residue for test matrices (test matrix 1, test matrix 2); the other shows the residue for test sequences (artificial sequences) alongside the residue for human polyA sites (human polyA sequences).]
$$\mathrm{RSS} = \sum_{i,j} \left( V_{ij} - (WH)_{ij} \right)^2$$
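The RSS scan over r can be sketched as below. This is only an illustration under stated assumptions: the slides do not give the factorization algorithm or the rule used to read the optimal r off the curve, so the helper names `rss`, `nmf`, and `rss_scan`, the multiplicative-update NMF, and the random test matrix are hypothetical.

```python
import numpy as np


def rss(V, W, H):
    """Residual sum of squares between V and its reconstruction W @ H."""
    return float(np.sum((V - W @ H) ** 2))


def nmf(V, r, n_iter=300, eps=1e-9, seed=0):
    """Assumed NMF routine (multiplicative updates for squared error)."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], r)) + eps
    H = rng.random((r, V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def rss_scan(V, ranks):
    """Factor V once per candidate r and record the resulting RSS."""
    return {r: rss(V, *nmf(V, r)) for r in ranks}


# Toy matrix built from 3 non-negative patterns (hypothetical data).
rng = np.random.default_rng(2)
V = rng.random((12, 3)) @ rng.random((3, 20))
for r, value in rss_scan(V, range(1, 6)).items():
    print(r, value)
```

Plotting the recorded values against r gives a curve of the kind shown on this slide; the point where adding vectors stops reducing the residue appreciably is a natural candidate for r.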
![Page 6: Position-dependent motif characterization using Non-negative matrix Factorization (NMF)](https://reader033.vdocuments.site/reader033/viewer/2022051116/568157ed550346895dc56374/html5/thumbnails/6.jpg)
NMF can characterize complex control sequences
[Figure panels: Mouse 3'-processing sequences; Human transcription start sites]