Page 1: Promoter Analysis using Bioinformatics, Putting the Predictions to the Test Amy Creekmore Ansci 490M November 19, 2002

Promoter Analysis using Bioinformatics, Putting the Predictions to the Test

Amy Creekmore

Ansci 490M

November 19, 2002

Page 2:

Problems in predicting promoters/transcription factor binding sites

• Transcription factors often recognize relatively short and degenerate sequences.

• These sequences are commonly found throughout the genome of the species.

• Induction often depends on the spacing/frequency of transcription factor binding sites within a sequence.

• Binding sites are not always in the upstream region.

Markstein et al., 2002, figure 1

Markstein et al., 2002, figure 2
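To make the first two points concrete, here is a minimal sketch that estimates how often a short consensus would match a genome purely by chance. It assumes independent, equally frequent bases and counts both strands; the two consensus strings and the genome size are hypothetical illustration values, not figures from the cited papers.

```python
# Number of bases each IUPAC nucleotide code can stand for.
IUPAC_DEGENERACY = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}

def match_probability(consensus):
    """Chance that one random position matches (assumes uniform, independent bases)."""
    p = 1.0
    for code in consensus.upper():
        p *= IUPAC_DEGENERACY[code] / 4.0
    return p

def expected_chance_matches(consensus, genome_length, both_strands=True):
    """Expected number of chance matches in a random sequence of the given length.

    Counting both strands simply doubles the estimate, which slightly
    overcounts palindromic sites; this is an approximation.
    """
    positions = max(genome_length - len(consensus) + 1, 0)
    strands = 2 if both_strands else 1
    return strands * positions * match_probability(consensus)

if __name__ == "__main__":
    genome = 120_000_000  # hypothetical genome size in bp
    for site in ("GGGAAAACCC", "GGGWWWWCCM"):  # hypothetical exact vs. degenerate site
        print(site, round(expected_chance_matches(site, genome)))
```

Each degenerate position multiplies the chance-match rate, which is why a single predicted site carries little information on its own and why the approaches that follow look for clusters of sites.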

Page 3:

Different Approaches to Promoter Analysis

• Saeed Tavazoie, et al. “Systematic determination of genetic network architecture.” Nature Genetics 22: 281-285.

– Discovery of transcriptional regulation sub-networks, or genes that are under the control of similar promoters.

– De novo discovery of cis-regulatory elements in yeast using expression clustering of microarray data and AlignACE.

Page 4:

Different Approaches to Promoter Analysis

• Michele Markstein, et al. “Genome-wide analysis of clustered Dorsal binding sites identifies putative target genes in the Drosophila embryo.” Proceedings of the National Academy of Sciences USA 99(2): 763-768.

– Identified genes (known and unknown) that are regulated by the characterized transcription factor Dorsal.

– Used FLYENHANCER to screen for clusters of known Dorsal response elements.

Page 5:

Different Approaches to Promoter Analysis

• Benjamin P. Berman, et al. “Exploiting transcription factor binding site clustering to identify cis-regulatory modules involved in pattern formation in the Drosophila genome.” Proceedings of the National Academy of Sciences USA 99(2): 757-762.

– Evaluated the extent to which the clustering of transcription factor binding sites can be used as the computational basis to identify cis-regulatory modules.

– Used the program PATSER to search the genome for consensus binding sites of five different developmental transcription factors, then used CIS-ANALYST to visualize and compute results.

Page 6:

First Approach

• Systematic determination of genetic network architecture

Saeed Tavazoie, Jason D. Hughes, Michael J. Campbell, Raymond J. Cho, and George M. Church. Nature Genetics 22: 281-285.

Page 7:

Method

• Used microarray data from Cho et al. (1998), which consisted of expression data for 6000 genes at 15 time points during two S. cerevisiae mitotic cell cycles.

• Analyzed the 3000 “most variable ORFs” and normalized the data by subtracting each gene's mean expression level across all time points.

• Clustered genes by expression pattern using the k-means algorithm with Euclidean distance as the metric.
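The normalization and clustering steps above can be sketched in a few lines. This is a plain k-means written for illustration, not the authors' code; the gene names, expression values, random initialization and iteration cap are all assumptions.

```python
import math
import random

def mean_center(profile):
    """Subtract a gene's mean expression across all time points."""
    mean = sum(profile) / len(profile)
    return [value - mean for value in profile]

def euclidean(a, b):
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(profiles):
    """Per-time-point mean of a group of profiles."""
    return [sum(column) / len(profiles) for column in zip(*profiles)]

def kmeans(profiles, k, iterations=100, seed=0):
    """Plain k-means; returns one cluster index per profile."""
    rng = random.Random(seed)
    centers = rng.sample(profiles, k)  # start from k randomly chosen profiles
    labels = [-1] * len(profiles)
    for _ in range(iterations):
        new_labels = [
            min(range(k), key=lambda c: euclidean(p, centers[c])) for p in profiles
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = centroid(members)
    return labels

if __name__ == "__main__":
    # Hypothetical toy data: four genes, four time points each.
    raw = {
        "geneA": [1.0, 5.0, 1.0, 5.0],
        "geneB": [11.0, 15.5, 10.5, 15.0],
        "geneC": [5.0, 1.0, 5.0, 1.0],
        "geneD": [25.0, 20.5, 25.5, 21.0],
    }
    centered = [mean_center(p) for p in raw.values()]
    for gene, label in zip(raw, kmeans(centered, k=2)):
        print(gene, label)
```

Mean-centering is what lets genes with the same temporal shape but different absolute levels (geneA and geneB in the toy data) end up close together under Euclidean distance.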

Page 8:

• Partitioned the 3000 ORFs into 30 clusters and assigned the genes to functional categories.

• Determined the statistical significance of each cluster's enrichment for a particular functional category.
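One common way to score this kind of enrichment is a hypergeometric tail probability: the chance of seeing at least the observed number of category members in a cluster if the cluster were a random draw. The slide does not say which test was used, so treat the choice of test and all numbers below as assumptions for illustration.

```python
from math import comb

def enrichment_p_value(total_genes, category_total, cluster_size, overlap):
    """P(X >= overlap) for a cluster drawn at random without replacement.

    total_genes    -- number of genes considered overall
    category_total -- how many of those carry the functional annotation
    cluster_size   -- number of genes in the cluster
    overlap        -- annotated genes observed in the cluster
    """
    max_overlap = min(category_total, cluster_size)
    denominator = comb(total_genes, cluster_size)
    tail = sum(
        comb(category_total, i) * comb(total_genes - category_total, cluster_size - i)
        for i in range(overlap, max_overlap + 1)
    )
    return tail / denominator

if __name__ == "__main__":
    # Hypothetical counts: 200 of 6000 genes annotated, 15 of them in a 100-gene cluster.
    print(enrichment_p_value(6000, 200, 100, 15))
```

With 30 clusters and many categories, a raw p-value like this would normally be followed by a multiple-testing correction before a cluster is called enriched.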

Page 9:

Used AlignACE to align 600 bp upstream regions in order to determine common nucleotide motifs.
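Before a motif finder can run, the upstream regions have to be cut out of the genome sequence. The sketch below shows only that preparation step; it does not reproduce AlignACE itself, which is a Gibbs-sampling motif finder. The function name, the 0-based half-open coordinate convention and the toy sequence are assumptions.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def upstream_region(chromosome, orf_start, orf_end, strand, length=600):
    """Return up to `length` bases immediately 5' of an ORF.

    Coordinates are 0-based and half-open: the ORF occupies
    chromosome[orf_start:orf_end] regardless of strand. The region is
    truncated if it would run off the end of the chromosome.
    """
    if strand == "+":
        begin = max(orf_start - length, 0)
        return chromosome[begin:orf_start]
    if strand == "-":
        end = min(orf_end + length, len(chromosome))
        # Report minus-strand promoters in the gene's own 5'-to-3' orientation.
        return reverse_complement(chromosome[orf_end:end])
    raise ValueError("strand must be '+' or '-'")

if __name__ == "__main__":
    toy = "ACGTTTGGAC"  # hypothetical 10 bp "chromosome"
    print(upstream_region(toy, 4, 7, "+", length=3))
    print(upstream_region(toy, 2, 5, "-", length=3))
```

Reverse-complementing the minus-strand regions keeps every promoter in the same orientation relative to its gene, so a shared motif is not hidden by strand.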

Page 13:

Results

• Found 18 motifs in 12 different clusters.

– Seven are characterized transcription factor binding sites that are known to regulate many of the genes in their respective clusters.

• In clusters with known regulons, the known cis-regulatory elements emerged as the highest-scoring motif in every case.

– Examples include the MCB box and the SCB cell-cycle box.

• Motifs that have not been previously described show strong correlation with clusters that are enriched for genes with specific functions.

– For example, cluster 3 motifs M3a and M3b are associated with RNA- and translation-related genes within and outside of cluster 3.

• “Half of the 30 clusters were significantly enriched for functional categories or had significant motifs.”

Page 14:

Second Approach

• Genome-wide analysis of clustered Dorsal binding sites identifies putative target genes in the Drosophila embryo

• Michele Markstein, Peter Markstein, Vicky Markstein, and Michael S. Levine. Proceedings of the National Academy of Sciences USA 99(2): 763-768.

Page 15:

Dorsal Transcription Factor

• Drosophila transcription factor involved in dorsal-ventral patterning during development.

• Dorsal can inhibit or induce transcription depending on the promoter. Induction is also concentration dependent.

Figure labels: Zen, Sog

Page 16:

• Used a degenerate Dorsal consensus sequence to scan the entire Drosophila genome with FLYENHANCER.
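The general idea of such a scan can be sketched as follows: translate a degenerate consensus into a regular expression, find matches on both strands, then keep stretches where several matches fall close together. This is not FLYENHANCER's implementation, and the consensus string, window size and site threshold are hypothetical placeholders rather than the values used by Markstein et al.

```python
import re

IUPAC_CLASSES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def consensus_to_regex(consensus):
    """Compile an IUPAC consensus; the lookahead lets overlapping sites match."""
    pattern = "".join(IUPAC_CLASSES[code] for code in consensus.upper())
    return re.compile("(?=(" + pattern + "))")

def find_sites(sequence, consensus):
    """Sorted start positions (forward coordinates) of matches on either strand."""
    sequence = sequence.upper()
    regex = consensus_to_regex(consensus)
    width = len(consensus)
    forward = {m.start() for m in regex.finditer(sequence)}
    revcomp = sequence.translate(COMPLEMENT)[::-1]
    reverse = {len(sequence) - m.start() - width for m in regex.finditer(revcomp)}
    return sorted(forward | reverse)

def find_clusters(positions, window, min_sites):
    """(first, last) site starts for groups of >= min_sites sites within `window` bp."""
    clusters = []
    last_right = -1
    right = 0
    for left in range(len(positions)):
        right = max(right, left)
        while right + 1 < len(positions) and positions[right + 1] - positions[left] < window:
            right += 1
        # Skip groups that are just the tail of one already reported.
        if right - left + 1 >= min_sites and right > last_right:
            clusters.append((positions[left], positions[right]))
            last_right = right
    return clusters

if __name__ == "__main__":
    consensus = "GGGWWWWCCM"  # hypothetical degenerate site
    toy = "GGGAAAACCC" + "T" * 20 + "GGGTTTTCCA" + "T" * 20 + "GGGATATCCC"
    sites = find_sites(toy, consensus)
    print(sites, find_clusters(sites, window=400, min_sites=3))
```

Requiring several sites per window is what separates candidate enhancers from the isolated chance matches that a degenerate consensus produces across a whole genome.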

Page 17:

Ady and Phm expression patterns

Page 18:

Results

• Computational searches successfully identified genes that are activated at high (Phm), intermediate (Ady), and low (Sog) levels of Dorsal.

• At least 33% (5 of 15) are known, or indicated, to be regulated by Dorsal.

Page 19:

Third Approach

• Exploiting transcription factor binding site clustering to identify cis-regulatory modules involved in pattern formation in the Drosophila genome.

• Benjamin P. Berman, Yutaka Nibu, Barret D. Pfeiffer, Pavel Tomancak, Susan E. Celniker, Michael Levine, Gerald M. Rubin, and Michael B. Eisen. Proceedings of the National Academy of Sciences USA 99(2): 757-762.

Page 20:

Methods

• Assigned consensus sequences to each of the five transcription factors using MEME and previously described binding sites.

• Used the program PATSER to search the genome for matching sequences and visualized the matches with the program CIS-ANALYST (developed by the authors).

• Using CIS-ANALYST, analyzed the distribution of the sites and defined windows that contained clusters of transcription factor binding sites.

Page 21:

Test of the Methods

Page 22:

Results

• Examined novel clusters with 15 or more binding sites per 700 bp.

• Identified 28 clusters that met this criterion; these clusters contain binding sites for at least two of the factors.

– 23 fall in upstream regions

– 3 fall in intron regions
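The density criterion above (at least 15 sites per 700 bp, from at least two factors) can be expressed as a sliding window over predicted site positions. This is a minimal sketch and not CIS-ANALYST; the factor names, the input format and the choice to anchor one window at each site are assumptions.

```python
from collections import Counter

def dense_windows(sites, window=700, min_sites=15, min_factors=2):
    """Return (start, last_site, site_count, factors) for qualifying windows.

    `sites` is an iterable of (position, factor_name) pairs. One window is
    anchored at each site and covers [position, position + window), so
    overlapping windows can be reported for the same dense region.
    """
    ordered = sorted(sites)
    results = []
    counts = Counter()  # sites per factor inside the current window
    right = 0
    for left, (start, factor) in enumerate(ordered):
        while right < len(ordered) and ordered[right][0] < start + window:
            counts[ordered[right][1]] += 1
            right += 1
        total = right - left
        present = sorted(name for name, n in counts.items() if n > 0)
        if total >= min_sites and len(present) >= min_factors:
            results.append((start, ordered[right - 1][0], total, present))
        counts[factor] -= 1  # slide the window past the current site
    return results

if __name__ == "__main__":
    # Hypothetical predictions: 10 sites for one factor and 5 for another within 100 bp.
    toy = [(i * 10, "factorA") for i in range(10)]
    toy += [(i * 10 + 5, "factorB") for i in range(5)]
    print(dense_windows(toy))
```

Window size and site threshold are tunable parameters; tightening them trades fewer false positives for more missed modules.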

Page 23:

Results

• Examined the 49 genes that could be regulated by these sites using in situ hybridization and DNA microarray analysis.

• Ten of the 28 sites were upstream of, or in the first intron of, genes expressed in an anterior-posterior pattern.

– ~35% correct predictions

Page 24:

The Giant (Gnt) gene

Page 25:

Conclusions (from papers)

• Clustering can be used to successfully determine cis-regulatory elements and can be applied to other systems.

• Clustering is more efficient when done using prior knowledge of transcription factor binding site(s).

• Computational identification of cis-regulatory DNA regions improves when using two or more different classes of recognition sequences (motifs).

• “The grammar of the cis-regulatory code is clearly more complex than simply the density of transcription factor binding sites.” (Berman et al., 2002)

Page 26:

Conclusions (overall)

• Promoter prediction is a powerful tool that can be used for low-cost screens for transcription regulatory sites.

• Success is going to depend on a number of factors:

– the specific transcription factor (specificity of binding)

– previous characterization

– parameters used (window size)

– annotation of the genome being used