![Page 1: Evolutionary and genomic approaches to find gene regulatory sequences Penn State University, Center for Comparative Genomics and Bioinformatics: Webb Miller,](https://reader030.vdocuments.site/reader030/viewer/2022032600/56649dab5503460f94a9a52b/html5/thumbnails/1.jpg)
Evolutionary and genomic approaches to find gene regulatory sequences
Penn State University, Center for Comparative Genomics and Bioinformatics: Webb Miller, Francesca Chiaromonte, Anton Nekrutenko, Kateryna Makova, Stephan Schuster, Ross Hardison
University of California at Santa Cruz: David Haussler, Jim Kent
Children’s Hospital of Philadelphia: Mitch Weiss
NimbleGen: Roland Green
University of Nebraska, Lincoln, February 14, 2007
![Page 2: Evolutionary and genomic approaches to find gene regulatory sequences Penn State University, Center for Comparative Genomics and Bioinformatics: Webb Miller,](https://reader030.vdocuments.site/reader030/viewer/2022032600/56649dab5503460f94a9a52b/html5/thumbnails/2.jpg)
Major goals of comparative genomics
• Identify all DNA sequences in a genome that are functional
– Selection to preserve function
– Adaptive selection
• Determine the biological role of each functional sequence
• Elucidate the evolutionary history of each type of sequence
• Provide bioinformatic tools so that anyone can easily incorporate insights from comparative genomics into their research
![Page 3: Evolutionary and genomic approaches to find gene regulatory sequences Penn State University, Center for Comparative Genomics and Bioinformatics: Webb Miller,](https://reader030.vdocuments.site/reader030/viewer/2022032600/56649dab5503460f94a9a52b/html5/thumbnails/3.jpg)
Known types of gene regulatory regions
G.A. Maston, S.K. Evans, M.R. Green (2006) Ann. Rev. Genomics & Human Genetics 7:29-59.
![Page 4: Evolutionary and genomic approaches to find gene regulatory sequences Penn State University, Center for Comparative Genomics and Bioinformatics: Webb Miller,](https://reader030.vdocuments.site/reader030/viewer/2022032600/56649dab5503460f94a9a52b/html5/thumbnails/4.jpg)
Regulatory regions tend to be clusters of transcription factor binding sites
Sequence-specific
SV40 promoters and enhancer
![Page 5: Evolutionary and genomic approaches to find gene regulatory sequences Penn State University, Center for Comparative Genomics and Bioinformatics: Webb Miller,](https://reader030.vdocuments.site/reader030/viewer/2022032600/56649dab5503460f94a9a52b/html5/thumbnails/5.jpg)
Properties of known regulatory regions
• Binding sites for transcription factors, many with sequence specificity
• Clusters of binding sites
• Conventional promoters encompass major start sites for transcription
• Conserved over evolutionary time???
![Page 6: Evolutionary and genomic approaches to find gene regulatory sequences Penn State University, Center for Comparative Genomics and Bioinformatics: Webb Miller,](https://reader030.vdocuments.site/reader030/viewer/2022032600/56649dab5503460f94a9a52b/html5/thumbnails/6.jpg)
Structures involved in transcription are probably more complex
Peter R. Cook, Oxford University, http://users.path.ox.ac.uk/~pcook/images/Images.html
Middle image: Green: active transcription (Br-UTP label); Red: all nucleic acids; HeLa cell
Sides: EM spreads of transcripts
![Page 7: Evolutionary and genomic approaches to find gene regulatory sequences Penn State University, Center for Comparative Genomics and Bioinformatics: Webb Miller,](https://reader030.vdocuments.site/reader030/viewer/2022032600/56649dab5503460f94a9a52b/html5/thumbnails/7.jpg)
Domain opening is associated with movement to non-heterochromatic regions
Schubeler, Francastel, Cimbora, Reik, Martin, Groudine (2000) Genes & Dev. 14: 940-950
![Page 8: Evolutionary and genomic approaches to find gene regulatory sequences Penn State University, Center for Comparative Genomics and Bioinformatics: Webb Miller,](https://reader030.vdocuments.site/reader030/viewer/2022032600/56649dab5503460f94a9a52b/html5/thumbnails/8.jpg)
Other possible activities for sequences involved in gene regulation
• Opening or closing a chromosomal domain
• Moving a gene to or away from a transcription factory
• Controlling how long a gene is in a transcription factory
– Long association
  • High-level expression
  • Really long gene
– Short association
  • Lower-level expression
  • Rapid regulation
• Are these conserved over evolutionary time?
![Page 9: Evolutionary and genomic approaches to find gene regulatory sequences Penn State University, Center for Comparative Genomics and Bioinformatics: Webb Miller,](https://reader030.vdocuments.site/reader030/viewer/2022032600/56649dab5503460f94a9a52b/html5/thumbnails/9.jpg)
3 modes of evolution
Sequence matches at longer phylogenetic distances could reflect purifying selection.
Sequence differences at closer phylogenetic distances could reflect adaptive evolution.
![Page 10: Evolutionary and genomic approaches to find gene regulatory sequences Penn State University, Center for Comparative Genomics and Bioinformatics: Webb Miller,](https://reader030.vdocuments.site/reader030/viewer/2022032600/56649dab5503460f94a9a52b/html5/thumbnails/10.jpg)
Conservation vs. Constraint
• Conserved sequences are those that align between two species thought to be descended from a common ancestor
• Constrained sequences show evidence in their alignments of negative (purifying) selection
– E.g., they change at a rate significantly slower than “neutral” DNA
![Page 11: Evolutionary and genomic approaches to find gene regulatory sequences Penn State University, Center for Comparative Genomics and Bioinformatics: Webb Miller,](https://reader030.vdocuments.site/reader030/viewer/2022032600/56649dab5503460f94a9a52b/html5/thumbnails/11.jpg)
Ideal cases for interpretation
[Figure: two plots of similarity against position along the chromosome, each with the neutral DNA level marked; one plot carries the label P (not neutral)]
• Negative (purifying) selection, human vs. mouse: DNA segments with a function common to divergent species.
• Positive (adaptive) selection, human vs. rhesus: DNA segments in which change is beneficial to at least one of the two species.
![Page 12: Evolutionary and genomic approaches to find gene regulatory sequences Penn State University, Center for Comparative Genomics and Bioinformatics: Webb Miller,](https://reader030.vdocuments.site/reader030/viewer/2022032600/56649dab5503460f94a9a52b/html5/thumbnails/12.jpg)
Messages about evolutionary approaches to predicting regulatory regions
• Regulatory regions are conserved, but not all to the same phylogenetic distance.
• Incorporation of pattern and composition information along with conservation can lead to effective discrimination of functional classes (regulatory potential).
• Regulatory potential in combination with conservation of a GATA-1 binding motif is an effective predictor of enhancer activity.
• In vivo occupancy by GATA-1 suggests other activities in addition to enhancers.
• Comparison of polymorphism and divergence from closely related species can reveal regulatory regions that are under recent selection.
![Page 13: Evolutionary and genomic approaches to find gene regulatory sequences Penn State University, Center for Comparative Genomics and Bioinformatics: Webb Miller,](https://reader030.vdocuments.site/reader030/viewer/2022032600/56649dab5503460f94a9a52b/html5/thumbnails/13.jpg)
Finding all gene regulatory regions is a challenge for comparative genomics
• Known regulatory regions for the HBB complex
• 23 total
• 19 conserved (align) between human and mouse
• Many others show no significant difference in a measure of constraint (phastCons) from the bulk or neutral DNA
![Page 14: Evolutionary and genomic approaches to find gene regulatory sequences Penn State University, Center for Comparative Genomics and Bioinformatics: Webb Miller,](https://reader030.vdocuments.site/reader030/viewer/2022032600/56649dab5503460f94a9a52b/html5/thumbnails/14.jpg)
Two extremes of constraint in TRRs
![Page 15: Evolutionary and genomic approaches to find gene regulatory sequences Penn State University, Center for Comparative Genomics and Bioinformatics: Webb Miller,](https://reader030.vdocuments.site/reader030/viewer/2022032600/56649dab5503460f94a9a52b/html5/thumbnails/15.jpg)
ENCODE projects
• ENCODE (ENCyclopedia Of DNA Elements): consortium aiming to find function for all human DNA sequences
– Phase I focused on 1% of human DNA
– 30 Mb, 44 regions
• About 10 regions had known genes of interest (CFTR, HOX)
• Others were chosen to get a sampling of regions varying in gene density and alignability with mouse
• Major areas
– Genes and transcripts
– Transcriptional regulation
– Chromatin structure
– Multiple sequence alignment
– Variation in human populations
![Page 16: Evolutionary and genomic approaches to find gene regulatory sequences Penn State University, Center for Comparative Genomics and Bioinformatics: Webb Miller,](https://reader030.vdocuments.site/reader030/viewer/2022032600/56649dab5503460f94a9a52b/html5/thumbnails/16.jpg)
Biochemical assays for protein-binding sites in DNA
Purified protein & naked DNA
Chromatin immunoprecipitation: DNA sites occupied by a protein inside cells.
![Page 17: Evolutionary and genomic approaches to find gene regulatory sequences Penn State University, Center for Comparative Genomics and Bioinformatics: Webb Miller,](https://reader030.vdocuments.site/reader030/viewer/2022032600/56649dab5503460f94a9a52b/html5/thumbnails/17.jpg)
ChIP-on-chip to examine many sites
![Page 18: Evolutionary and genomic approaches to find gene regulatory sequences Penn State University, Center for Comparative Genomics and Bioinformatics: Webb Miller,](https://reader030.vdocuments.site/reader030/viewer/2022032600/56649dab5503460f94a9a52b/html5/thumbnails/18.jpg)
Putative transcriptional regulatory regions = pTRRs
• Antibodies vs 10 sequence-specific factors:
– Sp1, Sp3, E2F1, E2F4, cMyc, STAT1, cJun, CEBPe, PU1, RA Receptor A
– High resolution ChIP-chip platforms: Affymetrix and NimbleGen
– Data from several different labs in ENCODE consortium
• High likelihood hits for ChIP-chip
– 5% false discovery rate
• Supported by chromatin modification data
– Modified histones in chromatin: H4Ac, H3Ac, H3K4me, H3K4me2, H3K4me3, etc.
– DNase hypersensitive sites (DHSs) or nucleosome depleted sites
• Result: set of 1369 pTRRs
![Page 19: Evolutionary and genomic approaches to find gene regulatory sequences Penn State University, Center for Comparative Genomics and Bioinformatics: Webb Miller,](https://reader030.vdocuments.site/reader030/viewer/2022032600/56649dab5503460f94a9a52b/html5/thumbnails/19.jpg)
A small fraction of cis-regulatory modules are conserved from human to chicken
[Figure: time scale in millions of years; labeled values: 310, 450, 91, 173]
• About 4% of pTRRs, 4% of DNase HSs, 4-7% of promoters active in multiple cell lines
• Tend to regulate genes whose products control transcription and development
David King
![Page 20: Evolutionary and genomic approaches to find gene regulatory sequences Penn State University, Center for Comparative Genomics and Bioinformatics: Webb Miller,](https://reader030.vdocuments.site/reader030/viewer/2022032600/56649dab5503460f94a9a52b/html5/thumbnails/20.jpg)
Most pTRRs are conserved in eutherian mammals
[Figure: time scale in millions of years; labeled values: 310, 450, 91, 173]
Within aligned noncoding DNA of eutherians, need to distinguish constrained DNA (purifying selection) from neutral DNA.
Percentage of class that align no further than:

| Clade | pTRRs | DNase HSs | Promoters |
|---|---|---|---|
| Primates | 3% | 11% | 1-13% |
| Eutherians | 71% | 70% | 63% |
| Marsupials | 21% | 14% | 16-28% |
| Tetrapods | 4% | 4% | 4-7% |
| Vertebrates | 1% | 1% | 2-4% |
![Page 21: Evolutionary and genomic approaches to find gene regulatory sequences Penn State University, Center for Comparative Genomics and Bioinformatics: Webb Miller,](https://reader030.vdocuments.site/reader030/viewer/2022032600/56649dab5503460f94a9a52b/html5/thumbnails/21.jpg)
Measures of conservation and constraint capture only a subset of pTRRs
[Figure: results for three measures: fraction overlapping an MCS; phastCons (background rate corrected); composite alignability (background rate corrected)]
Annotations on the slide: stringent constraint; allows a range of constraint; aligns, but no inference about purifying selection
![Page 22: Evolutionary and genomic approaches to find gene regulatory sequences Penn State University, Center for Comparative Genomics and Bioinformatics: Webb Miller,](https://reader030.vdocuments.site/reader030/viewer/2022032600/56649dab5503460f94a9a52b/html5/thumbnails/22.jpg)
Different measures perform better on specific functional regions
[Figure: ROC curves; axes: sensitivity vs. 1 - specificity]
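The axes on this slide (sensitivity against 1 - specificity) are those of a ROC curve. As a minimal sketch of how such a curve is traced for any per-region score, the function below sweeps a threshold over the scores. The function name `roc_points` and the toy scores and labels are illustrative assumptions, not data from the talk.

```python
def roc_points(scores, labels):
    """Return (1 - specificity, sensitivity) pairs, one per score threshold.

    scores: one numeric score per region (e.g. a conservation measure).
    labels: 1 for regions in the functional class, 0 for background.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing is called
    for threshold in sorted(set(scores), reverse=True):
        # Call every region whose score is at or above the threshold.
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / neg, tp / pos))
    return points


# Hypothetical example: four regions, two of them known functional.
curve = roc_points([0.9, 0.8, 0.4, 0.2], [1, 0, 1, 0])
```

Comparing such curves for several measures on the same set of regions is one way to see which measure works better for a given functional class.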
![Page 23: Evolutionary and genomic approaches to find gene regulatory sequences Penn State University, Center for Comparative Genomics and Bioinformatics: Webb Miller,](https://reader030.vdocuments.site/reader030/viewer/2022032600/56649dab5503460f94a9a52b/html5/thumbnails/23.jpg)
Examples of clade-specific pTRRs
![Page 24: Evolutionary and genomic approaches to find gene regulatory sequences Penn State University, Center for Comparative Genomics and Bioinformatics: Webb Miller,](https://reader030.vdocuments.site/reader030/viewer/2022032600/56649dab5503460f94a9a52b/html5/thumbnails/24.jpg)
Messages about evolutionary approaches to predicting regulatory regions
• Regulatory regions are conserved, but not all to the same phylogenetic distance.
• Incorporation of pattern and composition information along with conservation can lead to effective discrimination of functional classes (regulatory potential).
• Regulatory potential in combination with conservation of a GATA-1 binding motif is an effective predictor of enhancer activity.
• In vivo occupancy by GATA-1 suggests other activities in addition to enhancers.
• Comparison of polymorphism and divergence from closely related species can reveal regulatory regions that are under recent selection.
![Page 25: Evolutionary and genomic approaches to find gene regulatory sequences Penn State University, Center for Comparative Genomics and Bioinformatics: Webb Miller,](https://reader030.vdocuments.site/reader030/viewer/2022032600/56649dab5503460f94a9a52b/html5/thumbnails/25.jpg)
Regulatory potential (RP) to distinguish functional classes
![Page 26: Evolutionary and genomic approaches to find gene regulatory sequences Penn State University, Center for Comparative Genomics and Bioinformatics: Webb Miller,](https://reader030.vdocuments.site/reader030/viewer/2022032600/56649dab5503460f94a9a52b/html5/thumbnails/26.jpg)
Good performance of ESPERR for gene regulatory regions (RP)
James Taylor, Francesca Chiaromonte
![Page 27: Evolutionary and genomic approaches to find gene regulatory sequences Penn State University, Center for Comparative Genomics and Bioinformatics: Webb Miller,](https://reader030.vdocuments.site/reader030/viewer/2022032600/56649dab5503460f94a9a52b/html5/thumbnails/27.jpg)
Messages about evolutionary approaches to predicting regulatory regions
• Regulatory regions are conserved, but not all to the same phylogenetic distance.
• Incorporation of pattern and composition information along with conservation can lead to effective discrimination of functional classes (regulatory potential).
• Regulatory potential in combination with conservation of a GATA-1 binding motif is an effective predictor of enhancer activity.
• In vivo occupancy by GATA-1 suggests other activities in addition to enhancers.
• Comparison of polymorphism and divergence from closely related species can reveal regulatory regions that are under recent selection.
![Page 28: Evolutionary and genomic approaches to find gene regulatory sequences Penn State University, Center for Comparative Genomics and Bioinformatics: Webb Miller,](https://reader030.vdocuments.site/reader030/viewer/2022032600/56649dab5503460f94a9a52b/html5/thumbnails/28.jpg)
Conservation of predicted binding sites for transcription factors
Binding site for GATA-1
![Page 29: Evolutionary and genomic approaches to find gene regulatory sequences Penn State University, Center for Comparative Genomics and Bioinformatics: Webb Miller,](https://reader030.vdocuments.site/reader030/viewer/2022032600/56649dab5503460f94a9a52b/html5/thumbnails/29.jpg)
Genes Co-expressed in Late Erythroid Maturation
G1E-ER cells: proerythroblast line lacking the transcription factor GATA-1. Can be rescued by expressing an estrogen-responsive form of GATA-1.
Rylski et al., Mol Cell Biol. 2003
![Page 30: Evolutionary and genomic approaches to find gene regulatory sequences Penn State University, Center for Comparative Genomics and Bioinformatics: Webb Miller,](https://reader030.vdocuments.site/reader030/viewer/2022032600/56649dab5503460f94a9a52b/html5/thumbnails/30.jpg)
Predicted cis-Regulatory Modules (preCRMs) Around Erythroid Genes
B: Yong Cheng, Ross, Yuepin Zhou, David King
F: Ying Zhang, Joel Martin, Christine Dorman, Hao Wang
![Page 31: Evolutionary and genomic approaches to find gene regulatory sequences Penn State University, Center for Comparative Genomics and Bioinformatics: Webb Miller,](https://reader030.vdocuments.site/reader030/viewer/2022032600/56649dab5503460f94a9a52b/html5/thumbnails/31.jpg)
preCRMs with conserved consensus GATA-1 BS tend to be active on transfected plasmids
![Page 32: Evolutionary and genomic approaches to find gene regulatory sequences Penn State University, Center for Comparative Genomics and Bioinformatics: Webb Miller,](https://reader030.vdocuments.site/reader030/viewer/2022032600/56649dab5503460f94a9a52b/html5/thumbnails/32.jpg)
preCRMs with conserved consensus GATA-1 BS tend to be active after integration into a chromosome
![Page 33: Evolutionary and genomic approaches to find gene regulatory sequences Penn State University, Center for Comparative Genomics and Bioinformatics: Webb Miller,](https://reader030.vdocuments.site/reader030/viewer/2022032600/56649dab5503460f94a9a52b/html5/thumbnails/33.jpg)
Examples of validated preCRMs
![Page 34: Evolutionary and genomic approaches to find gene regulatory sequences Penn State University, Center for Comparative Genomics and Bioinformatics: Webb Miller,](https://reader030.vdocuments.site/reader030/viewer/2022032600/56649dab5503460f94a9a52b/html5/thumbnails/34.jpg)
Correlation of Enhancer Activity with RP Score
![Page 35: Evolutionary and genomic approaches to find gene regulatory sequences Penn State University, Center for Comparative Genomics and Bioinformatics: Webb Miller,](https://reader030.vdocuments.site/reader030/viewer/2022032600/56649dab5503460f94a9a52b/html5/thumbnails/35.jpg)
Validation status for 99 tested fragments
![Page 36: Evolutionary and genomic approaches to find gene regulatory sequences Penn State University, Center for Comparative Genomics and Bioinformatics: Webb Miller,](https://reader030.vdocuments.site/reader030/viewer/2022032600/56649dab5503460f94a9a52b/html5/thumbnails/36.jpg)
preCRMs with High RP and Conserved Consensus GATA-1 Tend To Be Validated
![Page 37: Evolutionary and genomic approaches to find gene regulatory sequences Penn State University, Center for Comparative Genomics and Bioinformatics: Webb Miller,](https://reader030.vdocuments.site/reader030/viewer/2022032600/56649dab5503460f94a9a52b/html5/thumbnails/37.jpg)
Compare the outputs
Consensus for EKLF binding site: CCNCMCCCW
[Figure: motif-search outputs for all validated preCRMs and for all nonvalidated preCRMs, run with the same parameters; the label CCNCMCCCW appears twice in the figure]
CACC box helps distinguish validated from nonvalidated preCRMs
Ying Zhang
![Page 38: Evolutionary and genomic approaches to find gene regulatory sequences Penn State University, Center for Comparative Genomics and Bioinformatics: Webb Miller,](https://reader030.vdocuments.site/reader030/viewer/2022032600/56649dab5503460f94a9a52b/html5/thumbnails/38.jpg)
Messages about evolutionary approaches to predicting regulatory regions
• Regulatory regions are conserved, but not all to the same phylogenetic distance.
• Incorporation of pattern and composition information along with conservation can lead to effective discrimination of functional classes (regulatory potential).
• Regulatory potential in combination with conservation of a GATA-1 binding motif is an effective predictor of enhancer activity.
• In vivo occupancy by GATA-1 suggests other activities in addition to enhancers.
• Comparison of polymorphism and divergence from closely related species can reveal regulatory regions that are under recent selection.
![Page 39: Evolutionary and genomic approaches to find gene regulatory sequences Penn State University, Center for Comparative Genomics and Bioinformatics: Webb Miller,](https://reader030.vdocuments.site/reader030/viewer/2022032600/56649dab5503460f94a9a52b/html5/thumbnails/39.jpg)
preCRMs with conserved consensus GATA-1 binding sites are usually occupied by that protein: ChIP assay
![Page 40: Evolutionary and genomic approaches to find gene regulatory sequences Penn State University, Center for Comparative Genomics and Bioinformatics: Webb Miller,](https://reader030.vdocuments.site/reader030/viewer/2022032600/56649dab5503460f94a9a52b/html5/thumbnails/40.jpg)
Design of ChIP-chip for occupancy by GATA-1
1. Non-overlapping tiling array with 50bp probe and 100bp resolution (NimbleGen)
2. Cover range Mouse chr7:57225996-123812258 (~70Mbp)
3. Antibody against the ER portion of GATA-1-ER protein in rescued G1E-ER4 cells
[Figure: probe layout, labeled 50, 50 and 100]
Yong Cheng, with Mitch Weiss & Lou Dore (CHoP), Roland Green (NimbleGen)
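To make the array design concrete, here is a minimal sketch that enumerates probe coordinates over a region. It assumes that "100bp resolution" means probe start positions are spaced 100 bp apart, so each 50 bp probe is followed by a 50 bp gap; the slide does not spell this out, and the function name `tile_probes` is hypothetical.

```python
def tile_probes(start, end, probe_len=50, step=100):
    """Yield inclusive (probe_start, probe_end) coordinates tiling start..end.

    Assumption: probes are probe_len bp long and begin every step bp,
    so consecutive probes never overlap when step >= probe_len.
    """
    pos = start
    while pos + probe_len - 1 <= end:
        yield (pos, pos + probe_len - 1)
        pos += step


# Region covered on the slide: mouse chr7:57225996-123812258.
n_probes = sum(1 for _ in tile_probes(57225996, 123812258))
print(n_probes, "probes under the assumed spacing")
```

The real array design may differ, for example by skipping repeats or adjusting probe positions for sequence uniqueness.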
![Page 41: Evolutionary and genomic approaches to find gene regulatory sequences Penn State University, Center for Comparative Genomics and Bioinformatics: Webb Miller,](https://reader030.vdocuments.site/reader030/viewer/2022032600/56649dab5503460f94a9a52b/html5/thumbnails/41.jpg)
Signals in known occupied sites in Hbb LCR
1) Cluster of high signals
2) “hill” shape of the signals
HS1 HS2 HS3
![Page 42: Evolutionary and genomic approaches to find gene regulatory sequences Penn State University, Center for Comparative Genomics and Bioinformatics: Webb Miller,](https://reader030.vdocuments.site/reader030/viewer/2022032600/56649dab5503460f94a9a52b/html5/thumbnails/42.jpg)
Peak Finding Programs
• TAMALPAIS
– Mark Bieda from Peggy Farnham’s lab
– Focuses more on the cluster of the signals
– 4 thresholds based on number of consecutive probes with signals in the 98th or 95th percentiles
• MPEAK
– Bing Ren’s lab
– Focuses more on the “hill” shape of the signal
– 4 thresholds, for a series of probes with at least one that is 3, 2.5, 2 or 1 standard deviations above the mean
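The two ideas described here (runs of consecutive high-percentile probes, and a stretch of probes containing at least one strong outlier) can be sketched as follows. This is not the TAMALPAIS or MPEAK code: the function names, the nearest-rank percentile, and the choice to delimit a "hill" as a run of probes at or above the mean are all assumptions made for illustration.

```python
import math
from statistics import mean, pstdev


def percentile(values, pct):
    """Nearest-rank percentile (an assumed definition; others exist)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(len(ordered) * pct / 100))
    return ordered[rank - 1]


def runs_at_or_above(signals, cutoff, min_len):
    """Return (first, last) probe indices of runs of >= min_len probes >= cutoff."""
    runs, start = [], None
    # A -inf sentinel closes any run that reaches the last probe.
    for i, s in enumerate(list(signals) + [float("-inf")]):
        if s >= cutoff and start is None:
            start = i
        elif s < cutoff and start is not None:
            if i - start >= min_len:
                runs.append((start, i - 1))
            start = None
    return runs


def cluster_peaks(signals, pct=98, min_len=3):
    """Cluster-style call: consecutive probes in the top percentile."""
    return runs_at_or_above(signals, percentile(signals, pct), min_len)


def hill_peaks(signals, n_sd=3.0):
    """Hill-style call: a run of above-mean probes with one >= n_sd SDs above the mean."""
    mu, sd = mean(signals), pstdev(signals)
    return [(a, b) for a, b in runs_at_or_above(signals, mu, 1)
            if max(signals[a:b + 1]) >= mu + n_sd * sd]
```

Running each caller at several thresholds (for example `pct=98` and `pct=95`, or `n_sd` of 3, 2.5, 2 and 1) mirrors the tiered thresholds listed on the slide.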
![Page 43: Evolutionary and genomic approaches to find gene regulatory sequences Penn State University, Center for Comparative Genomics and Bioinformatics: Webb Miller,](https://reader030.vdocuments.site/reader030/viewer/2022032600/56649dab5503460f94a9a52b/html5/thumbnails/43.jpg)
ChIP-chip hits for GATA-1 occupancy
Mpeak: 275 hits in both
TAMALPAIS: 276 hits in both
[Venn diagram of the two hit sets; region labels as extracted: 216 6059]
321 total ChIP-chip hits
Technical replicates of ChIP-chip with antibody against GATA1-ER
![Page 44: Evolutionary and genomic approaches to find gene regulatory sequences Penn State University, Center for Comparative Genomics and Bioinformatics: Webb Miller,](https://reader030.vdocuments.site/reader030/viewer/2022032600/56649dab5503460f94a9a52b/html5/thumbnails/44.jpg)
ChIP-chip hits validate at a high rate
Validation determined by quantitative PCR. 19 of the 321 hits were tested. 13 (~70%) were validated.
9 regions were “hits” in only one of the two technical replicates. None were validated.
Validation rate is similar at different thresholds
ChIP DNA
![Page 45: Evolutionary and genomic approaches to find gene regulatory sequences Penn State University, Center for Comparative Genomics and Bioinformatics: Webb Miller,](https://reader030.vdocuments.site/reader030/viewer/2022032600/56649dab5503460f94a9a52b/html5/thumbnails/45.jpg)
Association of WGATAR and conservation with ChIP-chip Hits
1. 249 out of the 321 (78%) have WGATAR motifs, binding site for GATA-1
2. Of the GATA-1 binding motifs in those 249 hits, 112 (45%) are conserved between mouse and at least one non-rodent species.
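WGATAR is written in IUPAC nucleotide codes (W = A or T, R = A or G). A minimal sketch of scanning a sequence for such a motif on both strands is shown below; the function names and the test sequence are illustrative, and the conservation step (checking the aligned bases in other species) is not covered.

```python
import re

# IUPAC nucleotide codes expressed as regular-expression character classes.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "W": "[AT]", "R": "[AG]", "Y": "[CT]", "M": "[AC]",
         "K": "[GT]", "S": "[CG]", "N": "[ACGT]"}
COMPLEMENT = str.maketrans("ACGTWRYMKSN", "TGCAWYRKMSN")


def revcomp(motif):
    """Reverse complement of an IUPAC motif, e.g. WGATAR -> YTATCW."""
    return motif.translate(COMPLEMENT)[::-1]


def find_motif(seq, motif="WGATAR"):
    """Return sorted (start, strand, matched_text) hits, 0-based, both strands."""
    seq = seq.upper()
    hits = []
    for strand, m in (("+", motif), ("-", revcomp(motif))):
        # The lookahead lets overlapping occurrences all be reported.
        pattern = "(?=(" + "".join(IUPAC[c] for c in m) + "))"
        hits += [(x.start(), strand, x.group(1)) for x in re.finditer(pattern, seq)]
    return sorted(hits)


print(find_motif("CCAGATAAGG"))
```

A hit region would count as having the motif if `find_motif` returns at least one match within it.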
![Page 46: Evolutionary and genomic approaches to find gene regulatory sequences Penn State University, Center for Comparative Genomics and Bioinformatics: Webb Miller,](https://reader030.vdocuments.site/reader030/viewer/2022032600/56649dab5503460f94a9a52b/html5/thumbnails/46.jpg)
Expected and unexpected ChIP-chip hits
![Page 47: Evolutionary and genomic approaches to find gene regulatory sequences Penn State University, Center for Comparative Genomics and Bioinformatics: Webb Miller,](https://reader030.vdocuments.site/reader030/viewer/2022032600/56649dab5503460f94a9a52b/html5/thumbnails/47.jpg)
Distribution of ChIP-chip hits on 70Mb of mouse chr7
Yong Cheng, Yuepin Zhou and Christine Dorman
![Page 48: Evolutionary and genomic approaches to find gene regulatory sequences Penn State University, Center for Comparative Genomics and Bioinformatics: Webb Miller,](https://reader030.vdocuments.site/reader030/viewer/2022032600/56649dab5503460f94a9a52b/html5/thumbnails/48.jpg)
Almost half the GATA-1 ChIP-chip hits increase expression of a transgene, K562 cells
[Figure: bar chart of fold change over parent (axis 0 to 4) for individual tested fragments (GHP and GHN series, plus YC3), grouped into GATA-1 occupied sites by ChIP-chip and no GATA-1; group annotations: 15, 6, 6]
24 validated out of 56 fragments with ChIP-chip hits tested (43%)
![Page 49: Evolutionary and genomic approaches to find gene regulatory sequences Penn State University, Center for Comparative Genomics and Bioinformatics: Webb Miller,](https://reader030.vdocuments.site/reader030/viewer/2022032600/56649dab5503460f94a9a52b/html5/thumbnails/49.jpg)
Conserved and nonconserved ChIP-chip hits can be active as enhancers
[Figure panels: conserved, active; conserved, not active; not conserved, active]
![Page 50: Evolutionary and genomic approaches to find gene regulatory sequences Penn State University, Center for Comparative Genomics and Bioinformatics: Webb Miller,](https://reader030.vdocuments.site/reader030/viewer/2022032600/56649dab5503460f94a9a52b/html5/thumbnails/50.jpg)
Messages about evolutionary approaches to predicting regulatory regions
• Regulatory regions are conserved, but not all to the same phylogenetic distance.
• Incorporation of pattern and composition information along with conservation can lead to effective discrimination of functional classes (regulatory potential).
• Regulatory potential in combination with conservation of a GATA-1 binding motif is an effective predictor of enhancer activity.
• In vivo occupancy by GATA-1 suggests other activities in addition to enhancers.
• Comparison of polymorphism and divergence from closely related species can reveal regulatory regions that are under recent selection.
![Page 51: Evolutionary and genomic approaches to find gene regulatory sequences Penn State University, Center for Comparative Genomics and Bioinformatics: Webb Miller,](https://reader030.vdocuments.site/reader030/viewer/2022032600/56649dab5503460f94a9a52b/html5/thumbnails/51.jpg)
Polymorphism as a transient phase of evolution
Slide from Dr. Hiroshi Akashi
![Page 52: Evolutionary and genomic approaches to find gene regulatory sequences Penn State University, Center for Comparative Genomics and Bioinformatics: Webb Miller,](https://reader030.vdocuments.site/reader030/viewer/2022032600/56649dab5503460f94a9a52b/html5/thumbnails/52.jpg)
Test of neutrality using polymorphism and divergence data
![Page 53: Evolutionary and genomic approaches to find gene regulatory sequences Penn State University, Center for Comparative Genomics and Bioinformatics: Webb Miller,](https://reader030.vdocuments.site/reader030/viewer/2022032600/56649dab5503460f94a9a52b/html5/thumbnails/53.jpg)
Test for recent selection in human noncoding DNA
• McDonald-Kreitman test
• Use ancestral repeats as neutral model (MKAR test)
• Count polymorphisms in human using dbSNP126
• Count divergence of human from
– Chimpanzee (great ape, diverged from human lineage 6 Myr ago)
– Rhesus macaque (Old World monkey, diverged from human lineage 23 Myr ago)
• Tiled windows, most analysis on 10kb windows
• Compute p-value for neutrality by chi-square test
• The ratio of the two polymorphism-to-divergence ratios indicates the direction of inferred selection
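The test outlined above compares polymorphism and divergence counts in a window with the same counts in ancestral repeats. A minimal sketch, assuming a plain 2x2 Pearson chi-square test with one degree of freedom and no continuity correction (the slide does not give these details), with a hypothetical function name and made-up counts:

```python
import math


def mkar_test(poly_win, div_win, poly_ar, div_ar):
    """Chi-square test on a 2x2 table of polymorphism vs. divergence counts.

    Rows: test window, ancestral repeats (neutral model).
    Columns: polymorphic sites, diverged sites.
    Returns (chi2, p_value, ratio_of_ratios).
    """
    table = [[poly_win, div_win], [poly_ar, div_ar]]
    total = poly_win + div_win + poly_ar + div_ar
    row = [sum(r) for r in table]
    col = [poly_win + poly_ar, div_win + div_ar]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Survival function of chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    ratio = (poly_win / div_win) / (poly_ar / div_ar)
    return chi2, p_value, ratio


# Hypothetical counts for one 10 kb window and its neutral reference.
chi2, p, ratio = mkar_test(poly_win=5, div_win=40, poly_ar=30, div_ar=30)
```

In the usual reading of McDonald-Kreitman style tests, a ratio well below 1 (relatively more divergence than polymorphism) is taken as consistent with positive selection and a ratio above 1 with selection against new variants; how the talk's analysis set its cutoffs is not stated here.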
Heather Lawson, Anthropology, PSU
![Page 54: Evolutionary and genomic approaches to find gene regulatory sequences Penn State University, Center for Comparative Genomics and Bioinformatics: Webb Miller,](https://reader030.vdocuments.site/reader030/viewer/2022032600/56649dab5503460f94a9a52b/html5/thumbnails/54.jpg)
pTRR apparently under positive selection
![Page 55: Evolutionary and genomic approaches to find gene regulatory sequences Penn State University, Center for Comparative Genomics and Bioinformatics: Webb Miller,](https://reader030.vdocuments.site/reader030/viewer/2022032600/56649dab5503460f94a9a52b/html5/thumbnails/55.jpg)
A promoter distal to the beta-like globin genes has a signal for recent purifying selection
![Page 56: Evolutionary and genomic approaches to find gene regulatory sequences Penn State University, Center for Comparative Genomics and Bioinformatics: Webb Miller,](https://reader030.vdocuments.site/reader030/viewer/2022032600/56649dab5503460f94a9a52b/html5/thumbnails/56.jpg)
Selection on a primate-specific promoter
![Page 57: Evolutionary and genomic approaches to find gene regulatory sequences Penn State University, Center for Comparative Genomics and Bioinformatics: Webb Miller,](https://reader030.vdocuments.site/reader030/viewer/2022032600/56649dab5503460f94a9a52b/html5/thumbnails/57.jpg)
The distal promoter is close to the locus control region for beta-globin genes
![Page 58: Evolutionary and genomic approaches to find gene regulatory sequences Penn State University, Center for Comparative Genomics and Bioinformatics: Webb Miller,](https://reader030.vdocuments.site/reader030/viewer/2022032600/56649dab5503460f94a9a52b/html5/thumbnails/58.jpg)
Messages about evolutionary approaches to predicting regulatory regions
• Regulatory regions are conserved, but not all to the same phylogenetic distance.
• Incorporation of pattern and composition information along with conservation can lead to effective discrimination of functional classes (regulatory potential).
• Regulatory potential in combination with conservation of a GATA-1 binding motif is an effective predictor of enhancer activity.
• In vivo occupancy by GATA-1 suggests other activities in addition to enhancers.
• Comparison of polymorphism and divergence from closely related species can reveal regulatory regions that are under recent selection.
![Page 59: Evolutionary and genomic approaches to find gene regulatory sequences Penn State University, Center for Comparative Genomics and Bioinformatics: Webb Miller,](https://reader030.vdocuments.site/reader030/viewer/2022032600/56649dab5503460f94a9a52b/html5/thumbnails/59.jpg)
Many thanks …
B: Yong Cheng, Ross, Yuepin Zhou, David King
F: Ying Zhang, Joel Martin, Christine Dorman, Hao Wang
PSU Database crew: Belinda Giardine, Cathy Riemer, Yi Zhang, Anton Nekrutenko
Alignments, chains, nets, browsers, ideas, … Webb Miller, Jim Kent, David Haussler
RP scores and other bioinformatic input: Francesca Chiaromonte, James Taylor, Shan Yang, Diana Kolbe, Laura Elnitski
Funding from NIDDK, NHGRI, Huck Institutes of Life Sciences at PSU
![Page 60: Evolutionary and genomic approaches to find gene regulatory sequences Penn State University, Center for Comparative Genomics and Bioinformatics: Webb Miller,](https://reader030.vdocuments.site/reader030/viewer/2022032600/56649dab5503460f94a9a52b/html5/thumbnails/60.jpg)
Computing Regulatory Potential (RP)
Alignment:
seq1: G T A C C T A C T A C G C A
seq2: G T G T C G - - A G C C C A
seq3: A T G T C A - - A A T G T A
Collapsed alphabet: 1 2 1 3 4 5 7 7 6 8 3 6 3 9
• A 3-way alignment has 124 types of columns. Collapse these to a smaller alphabet with characters s (for example, 1-9).
• Train two order-t Markov models for the probability that t alignment columns are followed by a particular column in training sets:
– positive (alignments in known regulatory regions)
– negative (alignments in ancestral repeats, a model for neutral DNA)
– E.g., frequency that 3 4 is followed by 5: 0.001 in regulatory regions, 0.0001 in ancestral repeats
• RP of any 3-way alignment is the sum of the log likelihood ratios of finding the strings of alignment characters in known regulatory regions vs. ancestral repeats.

$$\mathrm{RP} = \sum_{a \,\in\, \text{segment}} \log\left(\frac{p_{\mathrm{REG}}(s_a \mid s_{a-1} \ldots s_{a-t})}{p_{\mathrm{AR}}(s_a \mid s_{a-1} \ldots s_{a-t})}\right)$$
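The RP definition above can be sketched in a few lines: train one order-t Markov model on symbol strings from each training set, then sum the log likelihood ratios along a segment. The pseudocount smoothing, the natural logarithm, the function names, and the tiny training sets are assumptions for illustration and are not taken from the RP or ESPERR software.

```python
import math
from collections import defaultdict


def train_markov(sequences, order, alphabet, pseudocount=1.0):
    """Return prob(symbol, context) estimated from collapsed-alphabet sequences."""
    counts = defaultdict(lambda: defaultdict(float))
    for seq in sequences:
        for a in range(order, len(seq)):
            counts[tuple(seq[a - order:a])][seq[a]] += 1

    def prob(symbol, context):
        ctx = counts.get(tuple(context), {})
        # Additive smoothing (an assumption) keeps unseen transitions nonzero.
        total = sum(ctx.values()) + pseudocount * len(alphabet)
        return (ctx.get(symbol, 0.0) + pseudocount) / total

    return prob


def rp_score(segment, p_reg, p_ar, order):
    """Sum of log likelihood ratios over positions that have a full context."""
    return sum(
        math.log(p_reg(segment[a], segment[a - order:a])
                 / p_ar(segment[a], segment[a - order:a]))
        for a in range(order, len(segment))
    )


alphabet = range(1, 10)                    # collapsed symbols 1-9, as on the slide
positive = [[3, 4, 5]] * 5                 # hypothetical "regulatory" training strings
negative = [[3, 4, 6]] * 5                 # hypothetical "ancestral repeat" strings
p_reg = train_markov(positive, 2, alphabet)
p_ar = train_markov(negative, 2, alphabet)
score = rp_score([3, 4, 5], p_reg, p_ar, order=2)
```

With the slide's own example frequencies, the transition "3 4 followed by 5" would contribute log(0.001 / 0.0001) = log(10) to the sum.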
![Page 61: Evolutionary and genomic approaches to find gene regulatory sequences Penn State University, Center for Comparative Genomics and Bioinformatics: Webb Miller,](https://reader030.vdocuments.site/reader030/viewer/2022032600/56649dab5503460f94a9a52b/html5/thumbnails/61.jpg)
Stage 1: Reduced representations
[Figure; visible labels: G, T, gap]
ESPERR: Evolutionary Sequence and Pattern Extraction using Reduced Representations
![Page 62: Evolutionary and genomic approaches to find gene regulatory sequences Penn State University, Center for Comparative Genomics and Bioinformatics: Webb Miller,](https://reader030.vdocuments.site/reader030/viewer/2022032600/56649dab5503460f94a9a52b/html5/thumbnails/62.jpg)
Stage 2: Improve encoding
![Page 63: Evolutionary and genomic approaches to find gene regulatory sequences Penn State University, Center for Comparative Genomics and Bioinformatics: Webb Miller,](https://reader030.vdocuments.site/reader030/viewer/2022032600/56649dab5503460f94a9a52b/html5/thumbnails/63.jpg)
Train models for classification
Note that many different columns are reduced to single “encoding” (a number in the figure). E.g. Four different columns are each called “3”.
The string 6 6 2 may occur frequently in the positive training set and rarely in the negative training set, and thus contribute to discrimination. If the positive training set is known regulatory regions, this would contribute to a positive RP.
![Page 64: Evolutionary and genomic approaches to find gene regulatory sequences Penn State University, Center for Comparative Genomics and Bioinformatics: Webb Miller,](https://reader030.vdocuments.site/reader030/viewer/2022032600/56649dab5503460f94a9a52b/html5/thumbnails/64.jpg)
Categories of Tested DNA Segments
![Page 65: Evolutionary and genomic approaches to find gene regulatory sequences Penn State University, Center for Comparative Genomics and Bioinformatics: Webb Miller,](https://reader030.vdocuments.site/reader030/viewer/2022032600/56649dab5503460f94a9a52b/html5/thumbnails/65.jpg)
Example that suggests turnover
GATA-1 BSs
![Page 66: Evolutionary and genomic approaches to find gene regulatory sequences Penn State University, Center for Comparative Genomics and Bioinformatics: Webb Miller,](https://reader030.vdocuments.site/reader030/viewer/2022032600/56649dab5503460f94a9a52b/html5/thumbnails/66.jpg)
Additional methods find CACC box as distinctive for validation
Input: all validated preCRMs vs. all nonvalidated preCRMs
Background: Mouse chr 19 (42.8% C+G), NCBI Build 30

CLOVER (Zlab), with EKLF PWM (Dr. Perkins):

| Input | Motif | P (mm_chr19.m) |
|---|---|---|
| Validated preCRMs | EKLF | 0.0008 |
| Nonvalidated preCRMs | none | none |

ELPH (UMaryland):

| Motif length | Validated | Nonvalidated |
|---|---|---|
| 6-mer | TTATYT | GGCAGR |
| 7-mer | CCWCAGM | RGRCAGR |
| 8-mer | CASCCWGC | CAGGGAWR |
| 9-mer | CCWGGCWGM | CWGRGAWRA |

Hexamer counting:

| Hexamer | Count, validated | Count, nonvalidated | Expected, validated | Expected, nonvalidated |
|---|---|---|---|---|
| NCACCC | 60 | 32 | 16.31 | 5.81 |
| CACCCW | 56 | 27 | 11.74 | 4.36 |
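Hexamer counting of the kind summarized on this slide can be sketched as an observed count plus an expected count under a simple background. The slide does not say how its expected values were derived; the null model below (independent bases at a fixed C+G fraction, one strand only) and the function names are assumptions, so it is not expected to reproduce the slide's numbers.

```python
# IUPAC codes as sets of allowed bases.
IUPAC_SETS = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT",
              "W": "AT", "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "CG"}


def count_kmer(seqs, motif):
    """Count occurrences of an IUPAC motif on the given strand of each sequence."""
    k = len(motif)
    n = 0
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            if all(seq[i + j] in IUPAC_SETS[c] for j, c in enumerate(motif)):
                n += 1
    return n


def expected_kmer(seqs, motif, gc=0.428):
    """Expected count if bases were independent with the given C+G fraction."""
    base_p = {"A": (1 - gc) / 2, "T": (1 - gc) / 2, "C": gc / 2, "G": gc / 2}
    p = 1.0
    for c in motif:
        p *= sum(base_p[b] for b in IUPAC_SETS[c])
    positions = sum(max(0, len(s) - len(motif) + 1) for s in seqs)
    return positions * p


toy = ["ACACCCACACCCT"]  # made-up sequence, not data from the talk
observed = count_kmer(toy, "NCACCC")
expected = expected_kmer(toy, "NCACCC")
```

An observed count far above the expected one in validated but not in nonvalidated fragments is the kind of contrast the slide reports for the CACC box.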
![Page 67: Evolutionary and genomic approaches to find gene regulatory sequences Penn State University, Center for Comparative Genomics and Bioinformatics: Webb Miller,](https://reader030.vdocuments.site/reader030/viewer/2022032600/56649dab5503460f94a9a52b/html5/thumbnails/67.jpg)
Using Galaxy to find predicted CRMs