Transfer String Kernel for Cross-Context Sequence Specific DNA-Protein Binding Prediction
by Ritambhara Singh
IIIT-Delhi June 10, 2016
Biology in a Slide
DNA → RNA → Protein → Cell → Organism
DNA and Diseases
DNA → RNA → Protein → Cell → Organism
• Down Syndrome
• Parkinson’s Disease
• Autism
• Muscular Atrophy
• Sickle Cell Disease
• …
Transcription Factors
DNA → RNA → Protein → Cell → Organism
[Figure labels: Gene; Transcription Factor; Transcription Factor Binding Site; example DNA sequence ATCGCGTAGCTAGGGATGACAGACACACATAATTCTAGATA]
ChIP-seq Maps TF binding
[Figure labels: Transcription Factor; Gene; Transcription Factor Binding Site; DNA; ChIP-seq; ChIP-seq Map for TF; Peak]
Genome ATATCGTATCTTTTAAACCGGGTTGGCCACTAGA ATATCGTATCTAAACCGCCTCGG
TF Binding Differs Across Contexts
ATATCGTATCTTTTAAACCGGGTATGTAATGCAT ATATCGTATCTAAACCGCCCGTGT
ATATCGTATCTTTTAAACCGGGTTGGCCAGTATA ATATCGTATCTAAACCGCCCTGCA
Current Challenge: ENCODE Data Gap
[Figure: ENCODE ChIP-seq data matrix with entries marked "?"; context labels: Blood Cell, Stem Cell, Leukemia, Lung Cancer, Cervical Cancer, Nerve Cell, Immunity related]
Source: http://genome.ucsc.edu/ENCODE/dataMatrix/encodeChipMatrixHuman.html
Case for Computational Tools
Existing Computational Tools
Generative Approaches: MEME, CISFINDER
Discriminative Approaches: String Kernel + SVM
Generative: PWM-Based Approach
Genome ATATCGTATAACAATAACCGGGAACTAATAGC ATATCGTATCTAACAAATCCTACT
ChIP-seq Map for TF Peak
Sequence Logo
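
| Base | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A | 14 | 0 | 0 | 14 | 28 | 40 | 9 | 45 | 42 | 13 | 15 | 9 |
| T | 12 | 3 | 4 | 12 | 11 | 10 | 9 | 6 | 5 | 38 | 12 | 3 |
| C | 3 | 0 | 1 | 8 | 2 | 2 | 36 | 2 | 2 | 0 | 1 | 0 |
| G | 0 | 1 | 0 | 16 | 10 | 1 | 2 | 3 | 2 | 0 | 7 | 11 |

Position Weight Matrix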
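
As a rough illustration of how a count matrix like the one on this slide can be used to score a candidate site, the sketch below converts counts to log-odds scores. The pseudocount of 1, the uniform background of 0.25, and all function names are assumptions made for this example; they do not come from the slides or from MEME/CisFinder.

```python
import math

# Counts copied from the slide's matrix (rows A, T, C, G; 12 positions).
COUNTS = {
    "A": [14, 0, 0, 14, 28, 40, 9, 45, 42, 13, 15, 9],
    "T": [12, 3, 4, 12, 11, 10, 9, 6, 5, 38, 12, 3],
    "C": [3, 0, 1, 8, 2, 2, 36, 2, 2, 0, 1, 0],
    "G": [0, 1, 0, 16, 10, 1, 2, 3, 2, 0, 7, 11],
}


def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Turn raw counts into log2-odds scores against a uniform background.

    The pseudocount and background values are illustrative assumptions.
    """
    width = len(next(iter(counts.values())))
    scores = {base: [] for base in counts}
    for pos in range(width):
        total = sum(counts[base][pos] + pseudocount for base in counts)
        for base in counts:
            freq = (counts[base][pos] + pseudocount) / total
            scores[base].append(math.log2(freq / background))
    return scores


def score_site(scores, site):
    """Sum the per-position scores of a site as long as the matrix is wide."""
    return sum(scores[base][pos] for pos, base in enumerate(site))


def consensus(counts):
    """Pick the most frequent base at every position."""
    width = len(next(iter(counts.values())))
    return "".join(
        max(counts, key=lambda base: counts[base][pos]) for pos in range(width)
    )


pwm = log_odds_matrix(COUNTS)
best = consensus(COUNTS)
print(best, round(score_site(pwm, best), 2))
```

A site that matches the most frequent base at each position scores highest; sliding `score_site` along a genome window by window is one common way such a matrix is applied.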
Generative Approach: Output
Genome ATATCGTATCTTTTAAACCGGGTTGGCCAATAGC ATATCGTATCTAAACCGCCCTACT
? ?
Source: http://www.cbil.upenn.edu/EpoDB/release/version_2.2/meme/meme-output.html#sample
Generative Approach: Limitations
– Output: long list of potential TFs
– Works well only for well-preserved motifs or large training datasets
– PWMs are not available for all ~2000 TFs
– Lower prediction performance than discriminative approaches
Discriminative Approach: Output
Genome ATATCGTATCTTTTAAACCGGGTTGGCCAATAGC ATATCGTATCTAAACCGCCCTACT
? ?
Peak: ATATCGTATCTTTTAAACCGGGTTGGCCAATAGC → +1
Genome: ATATCGTATCTAAACCGCCCTACT → -1
Discriminative: String Kernel Approach
Support Vector Machine
Discriminative Approach: Limitation
Assumption: training and test data follow the same distribution regardless of context.
Aim
• Improve prediction of Transcription Factor Binding sites across contexts using knowledge transfer.
Proposed Solution: Cross-Context Knowledge Transfer
Transfer String Kernel: Overview
[Figure: pipeline with stages Feature Conversion (applied to both the Source Context and the Target Context, giving Xs and Xt), Knowledge Transfer (KMM), and Classification; other labels: Training; example sequences ATCGATGTATAC and ATACATGCTTAC]
Outline
• Method
– String Kernel
– Support Vector Machine
– Transfer Learning (KMM)
– Importance re-weighting
– Transfer String Kernel
• Evaluation
– Experimental Setup
– Cross-context TFBS prediction
– Cross-context Protein Binding prediction
String Kernel: Spectrum Kernel
Feature map indexed by all k-length subsequences (“k-mers”) from alphabet Σ of amino acids, |Σ|=20
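
The feature map described above can be sketched in a few lines: count every contiguous k-mer in each sequence and take the inner product of the two count vectors. The function names are hypothetical, and the example uses short DNA strings rather than amino-acid sequences; the code itself works for any alphabet.

```python
from collections import Counter


def spectrum_features(seq, k):
    """Count every contiguous k-mer occurring in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(x, y, k):
    """Inner product of the k-mer count vectors of x and y."""
    fx, fy = spectrum_features(x, k), spectrum_features(y, k)
    # Only k-mers present in both sequences contribute to the sum.
    return sum(count * fy[kmer] for kmer, count in fx.items())


print(spectrum_kernel("ATCG", "TCGA", 2))
```

Storing counts sparsely in a `Counter` avoids materializing the full |Σ|^k-dimensional vector, which matters once k grows.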
String Kernel: Mismatch Kernel
For a k-mer s, the mismatch neighborhood $N_{(k,m)}(s)$ is the set of all k-mers t within m mismatches of s.
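
A brute-force sketch of this definition follows, assuming a DNA alphabet and hypothetical function names. Each k-mer in a sequence contributes a count of one to every member of its mismatch neighborhood, and the kernel is the inner product of the resulting count vectors. This enumeration is meant only to make the definition concrete; it is not an efficient implementation.

```python
from collections import Counter
from itertools import combinations, product

ALPHABET = "ACGT"  # assumed DNA alphabet for this example


def mismatch_neighborhood(kmer, m, alphabet=ALPHABET):
    """All k-mers that differ from kmer in at most m positions (m <= k)."""
    neighbors = set()
    for positions in combinations(range(len(kmer)), m):
        # Substituting any letter (including the original one) at the chosen
        # positions covers every k-mer with 0..m mismatches.
        for letters in product(alphabet, repeat=m):
            candidate = list(kmer)
            for pos, letter in zip(positions, letters):
                candidate[pos] = letter
            neighbors.add("".join(candidate))
    return neighbors


def mismatch_features(seq, k, m, alphabet=ALPHABET):
    """Each k-mer in seq adds one count to every k-mer in its neighborhood."""
    feats = Counter()
    for i in range(len(seq) - k + 1):
        feats.update(mismatch_neighborhood(seq[i:i + k], m, alphabet))
    return feats


def mismatch_kernel(x, y, k, m, alphabet=ALPHABET):
    """Inner product of the (k, m)-mismatch feature vectors of x and y."""
    fx = mismatch_features(x, k, m, alphabet)
    fy = mismatch_features(y, k, m, alphabet)
    return sum(count * fy[t] for t, count in fx.items())


print(len(mismatch_neighborhood("ACG", 1)))
print(mismatch_kernel("ACG", "ACT", 3, 1))
```

With m = 0 the neighborhood contains only the k-mer itself, so the function reduces to the spectrum kernel.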
Support Vector Machine
Positive instances (y = +1): w · x + b ≥ +1
Negative instances (y = -1): w · x + b ≤ -1
Decision boundary: w · x + b = 0
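
The three conditions on this slide can be checked directly for a given weight vector. The sketch below assumes a hand-picked `w` and `b` purely for illustration; it shows how a point is classified by the sign of w · x + b and how the margin constraint y (w · x + b) ≥ 1 is tested. It does not train an SVM.

```python
def decision_value(w, x, b):
    """Compute w . x + b for plain Python lists."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict(w, x, b):
    """Label by the side of the hyperplane w . x + b = 0."""
    return 1 if decision_value(w, x, b) >= 0 else -1


def satisfies_margin(w, b, x, y):
    """True when the labelled point lies on or outside its margin."""
    return y * decision_value(w, x, b) >= 1


# Hypothetical separator: the vertical line x1 = 2.
w, b = [1.0, 0.0], -2.0
print(predict(w, [4.0, 0.0], b), satisfies_margin(w, b, [4.0, 0.0], +1))
print(predict(w, [2.5, 0.0], b), satisfies_margin(w, b, [2.5, 0.0], +1))
```

The second point is classified as positive but falls inside the margin, which is the case a soft-margin SVM penalizes with a slack variable.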
Transfer Learning (KMM)
[Figure: true densities ptr(x) and pte(x), and the ratio r(x)]
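
Kernel Mean Matching estimates importance weights that play the role of the ratio r(x), usually defined as pte(x) / ptr(x), without estimating either density. The sketch below is a simplified version under several assumptions: it uses an RBF kernel on numeric vectors (TSK would use a string kernel), it solves the quadratic objective ½ βᵀKβ − κᵀβ by projected gradient descent, and it keeps only the box constraint 0 ≤ β ≤ B, omitting the constraint on the sum of the weights found in the standard formulation. All names and parameter values are hypothetical.

```python
import math


def rbf(x, y, gamma=1.0):
    """Gaussian kernel between two numeric vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def kmm_weights(source, target, kernel=rbf, upper=10.0, steps=500):
    """Approximate KMM weights for the source samples.

    Minimizes 0.5 * beta^T K beta - kappa^T beta subject to
    0 <= beta_i <= upper, using projected gradient descent.
    """
    n_s, n_t = len(source), len(target)
    K = [[kernel(a, b) for b in source] for a in source]
    kappa = [n_s / n_t * sum(kernel(a, t) for t in target) for a in source]
    # The largest absolute row sum bounds the top eigenvalue of K,
    # so its inverse is a safe step size.
    step = 1.0 / max(sum(abs(v) for v in row) for row in K)
    beta = [1.0] * n_s
    for _ in range(steps):
        grad = [
            sum(K[i][j] * beta[j] for j in range(n_s)) - kappa[i]
            for i in range(n_s)
        ]
        # Gradient step followed by projection onto the box [0, upper].
        beta = [min(upper, max(0.0, b - step * g)) for b, g in zip(beta, grad)]
    return beta


source = [[0.0], [0.2], [5.0]]   # the last point is far from the target data
target = [[0.1], [0.0], [0.3]]
weights = kmm_weights(source, target)
print([round(w, 3) for w in weights])
```

Source points that resemble the target sample keep a sizeable weight, while the outlying point is driven toward zero.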
Importance Re-weighting
[Figure: Original Weights vs. KMM Weights]
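
One common way to write importance re-weighting for an SVM is to scale each training example's slack penalty by its weight. The formulation below is a sketch of that standard form, not necessarily the exact objective used in the talk; the symbols β_i (per-example weight), C (regularization constant), ξ_i (slack), and n_s (number of source examples) are assumptions introduced here.

```latex
\min_{w,\,b,\,\xi}\;\; \frac{1}{2}\lVert w\rVert^{2} \;+\; C\sum_{i=1}^{n_s}\beta_i\,\xi_i
\qquad\text{s.t.}\qquad y_i\,(w\cdot x_i + b)\;\ge\;1-\xi_i,\qquad \xi_i\;\ge\;0 .
```

Setting every β_i = 1 recovers the ordinary soft-margin SVM, which corresponds to the original, unweighted case.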
Transfer String Kernel (TSK)
[Figure: TSK pipeline]
Feature Conversion: Mismatch String Kernel (applied to both the Source Context and the Target Context)
Knowledge Transfer: Kernel Mean Matching (KMM)
Classification: Importance Re-weighting SVM
Other labels: Training; example sequences ATCGATCGATCGATCG and CCCGATCGCTCGCTCC
Experimental Setup
• 14 Transcription Factors (ENCODE ChIP-seq)
• Top 1000 positive sequences (500 training and 500 testing)
• 1000 random negative sequences
• Hyper-parameter tuning for k = (8, 10, 12) and m = (1, 2, 3)
• Dictionary size = 4 {A, T, C, G}
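
The tuning grid listed above is small enough to enumerate explicitly. The sketch below lists the nine (k, m) combinations and the size of the k-mer feature space for each k with a 4-letter dictionary; the variable and function names are hypothetical, and how each setting was scored is not specified on the slide.

```python
from itertools import product

K_VALUES = (8, 10, 12)   # k-mer lengths from the slide
M_VALUES = (1, 2, 3)     # allowed mismatches from the slide
ALPHABET = "ATCG"        # dictionary size 4

# Every (k, m) pair that would be tried during tuning.
grid = list(product(K_VALUES, M_VALUES))


def feature_space_size(k, alphabet=ALPHABET):
    """Number of distinct k-mers over the alphabet."""
    return len(alphabet) ** k


for k in K_VALUES:
    print(k, feature_space_size(k))
print(len(grid), "settings:", grid)
```

Even at k = 8 the feature space has tens of thousands of dimensions, which is why string kernels are computed without building the full feature vectors.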
Results
Results – Cross Context
[Figure: AUC score (y-axis, 0.8 to 0.9) for Transcription Factors Sin3a, Max, Mxi1, Chd2, and Ctcf (x-axis), comparing TSK and SK]
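
For readers unfamiliar with the metric on the y-axis, AUC can be computed as the fraction of (positive, negative) pairs that the classifier ranks correctly, with ties counted as half. The sketch below uses that pairwise definition on made-up scores; it does not reproduce any number from the figure.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    labels are +1 for bound sequences and -1 for unbound ones.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == -1]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Made-up decision values for four test sequences.
print(auc([0.9, 0.2, 0.3, 0.1], [1, 1, -1, -1]))
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect separation of positives from negatives.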
Results – Cross context
Summary
• TSK overall improves the cross-context TFBS predictions;
• String kernel based approaches perform better than the state-of-the-art Position Weight/Frequency Matrix based TFBS tools;
• TSK approach is generalizable for performance improvement of any cross-context sequence prediction task.
Presented in BIOKDD ’15
Acknowledgements
Dr. Mazhar Adli, Adli Lab: Department of Biochemistry and Molecular Genetics @ UVa
Nipun Batra, IIIT-Delhi
Machine Learning Lab @ UVa
Dr. Yanjun Qi (Advisor)
Jack Lanchantin
Beilun Wang
Weilin Xu
Ji Gao
Future Directions
• Deep Learning:
– Gene expression prediction using histone modification data (ECCB 2016)
– Improving TFBS prediction using DNA sequences (ICLR Workshop 2016, ICML Workshop 2016)
• String Kernels: Improving efficiency!! (on-going work)
Thank You