review by hamid bolouri, //research.fhcrc.org/content/dam/stripe/... · eve mardis, nature methods,...
Review by Hamid Bolouri, http://labs.fhcrc.org/bolouri
Nature Biotechnology 23, 1249 - 1256 (2005)
Luscombe et al, Genome Biology 2000, 1(1):reviews001.1–001.37
Figure labels: RNA polymerase; initiation complex; RNA; regulatory complex
ChIP-seq Overview
Eve Mardis, Nature Methods, 2007, 4(8):613
1. Cross-link proteins to DNA
2. Fragment chromatin to 100-150bp
3. Immunoprecipitate antibody-bound DNA fragments
4. Reverse cross-links and sequence fragment ends
5. Map sequence reads to genome
6. Identify genomic regions with enriched number of mapped reads
Barbara Wold’s lab, Science, 2007, 316(5830):1497-502
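Steps 5 and 6 of the overview can be sketched in a few lines. This is a minimal illustration only, assuming reads are already mapped to single coordinates on one chromosome; the function name `enriched_bins`, the fixed 200bp bins and the 5-fold cut-off are hypothetical choices, not part of the published protocol.

```python
from collections import Counter

def enriched_bins(read_positions, genome_length, bin_size=200, min_fold=5.0):
    """Return (bin_start, count) for bins holding >= min_fold x the mean count per bin."""
    n_bins = -(-genome_length // bin_size)  # ceiling division
    counts = Counter(pos // bin_size for pos in read_positions)
    mean_count = len(read_positions) / n_bins  # expected reads per bin if spread evenly
    return [
        (b * bin_size, counts[b])
        for b in sorted(counts)
        if counts[b] >= min_fold * mean_count
    ]

# Toy data (made up): six reads pile up near the start, two are scattered.
reads = [10, 20, 30, 40, 50, 60, 1000, 2500]
print(enriched_bins(reads, genome_length=4000))
```

A real analysis would compare against a control sample rather than a flat genome-wide mean, which is what the peak-calling slides later in the deck address.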
Ly9 expression is repressed in the bone marrow
Predicted Hes1 binding site
Figure labels: bone marrow, macrophages, microglia, osteoblasts, follicular B cells
Expression data: http://biogps.gnf.org/
Data: Suzanne Furuyama & Irwin Bernstein, FHCRC
Kharchenko, Tolstorukov and Park
Torres, Metta, Ottenwälder & Schlötterer, Genome Research, 2007, 18:000
Ordahl, Johnson & Caplan, NAR, 1976, 3(11):2985-2999
Figure: expected fragment frequency and sequenced fragment frequency vs. DNA fragment length (base pairs), axis 0 to 2000
Figure: normalized fragment count vs. fragment length (x100bp), with 100bp, 200bp, 300bp and 400bp marked; label: N1 Hes1
DNA fragments (excluding 120bp adapters) are asymmetrically distributed between ~50bp and 280bp
(data: Suzanne Furuyama, Bernstein lab, FHCRC)
Zhang et al, PLoS Comp. Biol. Aug. 2008
Part (1): Calling ChIP-seq peaks with PICS (Probabilistic Inference for ChIP-seq)
1. Segment the genome into Bound Regions
a. Sliding window of length ~ ½ to 1 x average fragment length
b. Move window in steps of ~ expected motif length (say 10bp)
c. Mark as Bound if number of reads > Tmin (~ 1f, 1r)
2. For each putative peak in a Region
3. Overall reads distribution in region
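The segmentation in step 1 can be sketched as follows. This is an illustration under stated assumptions: "Tmin (~ 1f, 1r)" is read here as "at least one forward and one reverse read in the window", overlapping Bound windows are merged into one Region, and the names `bound_regions` and `count_in` are hypothetical.

```python
from bisect import bisect_left

def count_in(sorted_pos, lo, hi):
    """Number of positions p with lo <= p < hi in a sorted list."""
    return bisect_left(sorted_pos, hi) - bisect_left(sorted_pos, lo)

def bound_regions(fwd, rev, genome_length, window=100, step=10, min_f=1, min_r=1):
    """Slide a window along the genome and merge Bound windows into Regions."""
    fwd, rev = sorted(fwd), sorted(rev)
    regions = []
    for start in range(0, max(genome_length - window, 0) + 1, step):
        end = start + window
        is_bound = (count_in(fwd, start, end) >= min_f
                    and count_in(rev, start, end) >= min_r)
        if not is_bound:
            continue
        if regions and start <= regions[-1][1]:
            regions[-1][1] = end          # overlaps the previous Bound window: extend it
        else:
            regions.append([start, end])  # start a new Region
    return [tuple(r) for r in regions]

# Toy reads (made up): one cluster with both strands, plus isolated reads.
print(bound_regions(fwd=[100, 105, 500], rev=[150, 160, 900], genome_length=1000))
```

Each Region returned this way would then be handed to the per-peak model fitting of steps 2 and 3.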
Parameters:
w_k: relative proportion of reads / peak
δ_k: average fragment length
μ_k: location of peak
σ_f: SD of forward reads
σ_r: SD of reverse reads
ρ: parameter controlling spread of δ_k
▲ Bayesian conjugate prior for
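One way to read the parameter list is as a per-peak mixture model of read start positions. The sketch below assumes (this is not stated on the slide) that forward reads centre on the peak location minus half the fragment length, reverse reads on the peak location plus half the fragment length, and that each follows a t distribution with 4 degrees of freedom. The dictionary keys and function names are hypothetical.

```python
import math

def t_pdf(x, loc, scale, df=4.0):
    """Density of a location-scale Student t distribution."""
    z = (x - loc) / scale
    norm = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2) * scale)
    return norm * (1 + z * z / df) ** (-(df + 1) / 2)

def read_density(x, strand, peaks, df=4.0):
    """Mixture density of a read start at x; peaks is a list of parameter dicts."""
    total = 0.0
    for p in peaks:
        if strand == "+":
            # forward reads: shifted upstream by half the fragment length
            total += p["w"] * t_pdf(x, p["mu"] - p["delta"] / 2, p["sigma_f"], df)
        else:
            # reverse reads: shifted downstream by half the fragment length
            total += p["w"] * t_pdf(x, p["mu"] + p["delta"] / 2, p["sigma_r"], df)
    return total

# One peak at 1000 with a 150bp average fragment length (made-up values).
peak = {"w": 1.0, "mu": 1000.0, "delta": 150.0, "sigma_f": 30.0, "sigma_r": 30.0}
print(read_density(925, "+", [peak]), read_density(1075, "-", [peak]))
```

With several peaks in a Region, the weights `w` would sum to 1 and each read is explained by whichever peak fits it best.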
♦ It can be seen that if the degrees of freedom ν_i are fixed in advance for
each component, then the M-step exists in closed form.
▲ deterministic data fitting with ECM
♦ fast analytic maximization
For each Region:
- Fit multiple models with 1 to 15 peaks
- Use max(BIC) to select ‘best’ model
BIC equation labels: log likelihood, number of peaks, number of observed reads
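The model-selection step can be sketched as below. The BIC form used here, 2 x log likelihood minus (number of free parameters) x log(number of reads), and the count of 5 parameters per peak minus 1 are assumptions made for illustration; the log likelihood values are made up.

```python
import math

def bic(log_likelihood, n_peaks, n_reads, params_per_peak=5):
    """BIC in the 'larger is better' form, so the best model maximizes it."""
    n_params = params_per_peak * n_peaks - 1  # assumed: mixture weights sum to 1
    return 2.0 * log_likelihood - n_params * math.log(n_reads)

def select_model(fits, n_reads):
    """fits maps number of peaks -> maximized log likelihood; return the best peak count."""
    return max(fits, key=lambda k: bic(fits[k], k, n_reads))

# Hypothetical fits for one Region with 200 reads.
fits = {1: -1200.0, 2: -1100.0, 3: -1095.0}
print(select_model(fits, n_reads=200))
```

Going from 2 to 3 peaks improves the likelihood only slightly here, so the penalty term makes the 2-peak model win.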
Post-processing step
- Merge peaks that overlap
- Remove peaks that fail:
(assuming ~100bp window size)
Figure panels: unfiltered, filtered
Mappability = 1/n
n = no. of k-mer occurrences in genome
Figure labels: before, after, mappability=0
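A toy version of the 1/n mappability score: for every position, count how often the k-mer starting there occurs in the genome and take the reciprocal. This sketch considers one strand of a short made-up sequence only; the function name `mappability` is hypothetical.

```python
from collections import Counter

def mappability(genome, k):
    """Per-position mappability 1/n, with n = occurrences of the k-mer starting there."""
    kmers = [genome[i:i + k] for i in range(len(genome) - k + 1)]
    counts = Counter(kmers)
    return [1.0 / counts[kmer] for kmer in kmers]

# "ACGT" occurs twice, so positions 0 and 4 get mappability 0.5.
print(mappability("ACGTACGTTT", k=4))
```

Unique k-mers score 1, and the score drops towards 0 as a k-mer becomes more repetitive.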
PICS score: defined from min(noReadsForward, noReadsReverse), with +1 offsets, and the totals N_control and N_IP, where
N_IP = total number of reads in IP
N_control = total number of reads in control
For FDR, re-run PICS with IP and control swapped, then

$$\mathrm{FDR} = \frac{\text{number of peaks with score} \ge T \text{ in control vs IP}}{\text{number of peaks with score} \ge T \text{ in IP vs control}}$$
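The swap-based FDR estimate can be sketched as follows, assuming peak scores from both runs are already available as plain lists. The comparison "score >= T", the cap at 1.0 and the NaN returned when no IP peak passes the threshold are illustrative choices; the scores are made up.

```python
def empirical_fdr(ip_vs_control_scores, control_vs_ip_scores, threshold):
    """Ratio of peaks passing the threshold in the swapped run to those in the real run."""
    n_real = sum(s >= threshold for s in ip_vs_control_scores)
    n_swapped = sum(s >= threshold for s in control_vs_ip_scores)
    if n_real == 0:
        return float("nan")  # no peaks called at this threshold, FDR undefined
    return min(1.0, n_swapped / n_real)

# Hypothetical scores from the two runs.
real = [10, 8, 6, 4, 2]
swapped = [5, 3, 1]
for t in (2, 4, 6):
    print(t, empirical_fdr(real, swapped, t))
```

Scanning T over the observed scores gives an FDR curve from which a score cut-off can be picked.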
GABP FOXA1
Zhang et al, Biometrics 2010, Jun 1st [Epub ahead of print]
1. Select a set of dyads as starting points
a. Enumerate all (3, 4, 5, 6) k-mers
b. Rank k-mers by over-representation in data
c. Form ~100 dyad seeds as (k-mer1, spacer(length=l ), k-mer2)
(k-mers are selected with prob ∝ rank order)
2. Compute dyad PWMs from occurrences of matches in data
3. Use EM to maximize (~25% of) matches to the candidate PWM
4. Align each dyad’s predicted binding sites and calculate the llr
$$\text{LogLikelihoodRatio} = M \sum_{\text{all positions } p} \; \sum_{\text{all bases } b} f_{b,p} \log \frac{f_{b,p}}{f_{\text{background},b}}$$
M = total number of sites
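The log likelihood ratio of step 4 can be computed directly from a set of aligned sites. The sketch assumes the usual form, M times the sum over positions and bases of f x log(f / background frequency), with zero-frequency bases contributing nothing; the function name and the toy sites are hypothetical.

```python
import math

def log_likelihood_ratio(sites, background):
    """llr of equal-length aligned sites against per-base background frequencies."""
    m = len(sites)          # M = total number of sites
    width = len(sites[0])
    total = 0.0
    for p in range(width):
        column = [site[p] for site in sites]
        for base in "ACGT":
            f = column.count(base) / m   # observed frequency of base at position p
            if f > 0:
                total += f * math.log(f / background[base])
    return m * total

uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
print(log_likelihood_ratio(["ACGT", "ACGT", "ACGA", "ACGT"], uniform))
```

Perfectly conserved columns contribute log(4) each under a uniform background, while a column matching the background contributes 0.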
5. Estimate p-value from above llr given dyad’s length and number of sites
(GADEM uses MEME method)
6. Matches with p-value < threshold are declared binding sites
7. Calculate E-value = p-value corrected for the size of the search space
8. Calculate fitness as:
9. Accept top 10% fittest motifs, mutate or cross-over the rest
a. mutate: swap in 1 element of (k-mer1, spacer(length=l ), k-mer2) from a random dyad
b. cross-over: swap elements between selected dyads
10. Extend motifs 10bp on each side, then prune from ends till
Information Content = < threshold (+di-,tri-mers)
11. Repeat from Step 2 (typically 5-10 times)
$$\text{Fitness} = \log \frac{1}{E\text{-value}}$$
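Steps 8 and 9 can be sketched as one generation of a genetic algorithm over dyads. This is an illustration, not GADEM's actual code: fitness is assumed to be log(1 / E-value), the 50/50 choice between mutation and cross-over is arbitrary, and all function names and E-values are hypothetical.

```python
import math
import random

def fitness(e_value):
    """Assumed fitness: larger for smaller (better) E-values."""
    return math.log(1.0 / e_value)

def mutate(dyad, population, rng):
    """Swap in one element of (k-mer1, spacer, k-mer2) from a random dyad."""
    donor = rng.choice(population)
    i = rng.randrange(3)
    child = list(dyad)
    child[i] = donor[i]
    return tuple(child)

def cross_over(a, b, rng):
    """Exchange the tails of two dyads at a random cut point."""
    cut = rng.randrange(1, 3)
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def next_generation(population, e_values, rng, keep_fraction=0.1):
    order = sorted(range(len(population)),
                   key=lambda i: fitness(e_values[i]), reverse=True)
    ranked = [population[i] for i in order]
    n_keep = max(1, int(len(population) * keep_fraction))
    new_population = ranked[:n_keep]          # top 10% accepted unchanged
    rest = ranked[n_keep:]
    i = 0
    while i < len(rest):
        if i + 1 < len(rest) and rng.random() < 0.5:
            new_population.extend(cross_over(rest[i], rest[i + 1], rng))
            i += 2
        else:
            new_population.append(mutate(rest[i], population, rng))
            i += 1
    return new_population

rng = random.Random(0)
kmers = ["ACG", "TTGA", "GGC", "CAT", "TGAC"]
population = [(rng.choice(kmers), rng.randrange(0, 11), rng.choice(kmers)) for _ in range(10)]
e_values = [10.0 ** -i for i in range(10)]   # made-up E-values
print(next_generation(population, e_values, rng)[0])
```

The population size stays constant from one generation to the next, and the fittest dyad always survives unchanged.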
See also PLoS Comp Biol, March 2007 | Volume 3 | Issue 3 | e61
benoslab.pitt.edu/stamp/
B/W Venn diagrams: Matches to FOXA1 motifs from different algorithms.