Large-scale mining of gene expression patterns
Paul Pavlidis, [email protected] (bioinformatics.ubc.ca)
VanBUG September 2007
Students: Leon French, Meeta Mistry, Vaneet Lotay
Postdoc: Jesse Gillis
Undergraduates: Raymond Lim, Suzanne Lane
Programmers: Kelsey Hamer, Luke McCarthy
[Diagram labels: Synapse; Genome; Signal transduction; Synaptic modulation; Injury; Stress; Disease; Aging; Development]
Topics
• Connectivity database and analysis
• Gene expression data re-use system
• Scaling up gene coexpression analysis
• Applications and ongoing work
Another ‘ome
Leon French, Suzanne Lane
Growth of GEO
[Figure: submissions (0 to 120,000) plotted against date (Dec-99 to Feb-08)]
[Figure labels: Age; Genes; Samples]
With JJ Mann, V Arango, E Sibille et al.
[Figure labels: Age; Genes; Samples]
Data from http://national_databank.mclean.harvard.edu/
GEO
Goals for a system
• Researchers should be able to put their new expression data in a wider context of previous studies without extraordinary effort.
• Move the analysis of multiple microarray data sets from a niche activity to the mainstream.
• Integrate other data types and domain-specific information.
[Diagram labels: Coexpression; Differential expression; Public data sources]
Challenges to comparing data sets
• Need to match genes/transcripts across platforms
• Data from third parties not always easy to handle
• Varying scales, normalization, etc.
• Varying data quality
• Varying levels of “raw data” available
• Selecting appropriate data to compare
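Two of the challenges listed above (matching genes across platforms, and varying scales) can be illustrated with a small sketch. It assumes each platform comes with a probe-to-gene annotation table, averages probes that map to the same gene, and z-scores each profile. The probe IDs, mapping tables, and the choice of averaging are hypothetical illustrations, not the procedure used in the talk.

```python
from collections import defaultdict
from statistics import mean, pstdev


def collapse_probes_to_genes(probe_values, probe_to_gene):
    """Average the probe-level profiles that map to the same gene.

    probe_values: dict of probe ID -> list of expression values (one per sample)
    probe_to_gene: dict of probe ID -> gene symbol (hypothetical annotation table)
    Probes without a gene mapping are dropped.
    """
    grouped = defaultdict(list)
    for probe, values in probe_values.items():
        gene = probe_to_gene.get(probe)
        if gene is not None:
            grouped[gene].append(values)
    # zip(*profiles) walks the samples in parallel across a gene's probes
    return {gene: [mean(column) for column in zip(*profiles)]
            for gene, profiles in grouped.items()}


def standardize(profile):
    """Z-score a profile so data sets on different scales become comparable."""
    mu, sd = mean(profile), pstdev(profile)
    if sd == 0:
        return [0.0 for _ in profile]
    return [(x - mu) / sd for x in profile]


# Toy data: two platforms with different probe IDs and measurement scales
platform_a = {"a_101": [1.0, 2.0, 3.0], "a_102": [3.0, 4.0, 5.0], "a_999": [9.0, 9.0, 9.0]}
map_a = {"a_101": "GRIN1", "a_102": "GRIN1"}  # a_999 has no annotation
platform_b = {"b_7": [100.0, 200.0, 300.0]}
map_b = {"b_7": "GRIN1"}

genes_a = collapse_probes_to_genes(platform_a, map_a)
genes_b = collapse_probes_to_genes(platform_b, map_b)
shared = sorted(set(genes_a) & set(genes_b))  # genes measurable on both platforms
z_a = standardize(genes_a["GRIN1"])
z_b = standardize(genes_b["GRIN1"])
```

After standardization the two toy profiles coincide even though the raw scales differ by two orders of magnitude. Real data would also need the quality and raw-data issues from the list handled, which this sketch ignores.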
With Cincinnati Children’s Hospital (D. Glass, M. Barnes et al.)
Probe specificity (or lack thereof)
[Figure: two histograms. (1) Frequency vs. fraction of probes with alignments (0.0 to 1.0). (2) Frequency vs. fraction of non-specific probes (0.0 to 1.0).]
Which data sets are reasonable to compare?
All mouse data sets (too general, but lots of power)
Mouse brain data sets
Mouse neocortex data sets
Mouse neocortex data sets examining stress
Mouse neocortex data sets examining hypoxic stress
Mouse neocortex data sets examining hypoxic stress after 3 hours of hypoxia (very specific, low power)
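The trade-off above can be made concrete with a toy sketch: each extra selection criterion makes the comparison more specific but leaves fewer data sets to draw power from. The catalog, its field names, and the annotation values below are invented for illustration and do not describe any real database.

```python
# Hypothetical annotated catalog; every entry is invented for illustration
catalog = [
    {"id": "ds1", "species": "mouse", "tissue": "neocortex", "factor": "hypoxic stress"},
    {"id": "ds2", "species": "mouse", "tissue": "neocortex", "factor": "restraint stress"},
    {"id": "ds3", "species": "mouse", "tissue": "hippocampus", "factor": "aging"},
    {"id": "ds4", "species": "mouse", "tissue": "liver", "factor": "diet"},
    {"id": "ds5", "species": "human", "tissue": "neocortex", "factor": "aging"},
]

BRAIN_TISSUES = {"neocortex", "hippocampus"}

# Criteria ordered from general to specific, as on the slide
criteria = [
    ("mouse", lambda d: d["species"] == "mouse"),
    ("brain", lambda d: d["tissue"] in BRAIN_TISSUES),
    ("neocortex", lambda d: d["tissue"] == "neocortex"),
    ("stress", lambda d: "stress" in d["factor"]),
    ("hypoxic stress", lambda d: d["factor"] == "hypoxic stress"),
]


def select(datasets, predicates):
    """Keep the data sets that satisfy every predicate."""
    return [d for d in datasets if all(p(d) for p in predicates)]


# Apply the criteria cumulatively and record how many data sets survive
counts = []
for depth in range(1, len(criteria) + 1):
    predicates = [p for _, p in criteria[:depth]]
    counts.append(len(select(catalog, predicates)))
```

In this toy catalog the number of usable data sets can only shrink or stay the same as criteria are added, which is the loss of power the slide refers to.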
Expression experiments: 519
  Mus musculus: 254
  Homo sapiens: 203
  Rattus norvegicus: 62
Array designs: 178
Assays (i.e., chips): 20837
Coexpression links (probe-level): >100 million
Scaling up analysis of gene coexpression
• Genes that are coexpressed tend to have related function
• Needed at the same place at the same time
• “Guilt by association”
• Reasonable to compare across studies
[Figure: expression across samples for two ribosomal protein genes. Eisen et al., 1998 PNAS.]
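Coexpression of two genes is commonly scored with the Pearson correlation of their expression profiles across samples. The sketch below assumes that measure; the slides do not state which one was used, and the two profiles are made-up numbers rather than data from Eisen et al.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat profile carries no coexpression signal
    return sxy / sqrt(sxx * syy)


# Hypothetical profiles for two genes measured in the same five samples
gene_a = [2.1, 3.0, 1.2, 4.5, 3.8]
gene_b = [2.0, 3.2, 1.0, 4.4, 4.0]

r_ab = pearson(gene_a, gene_b)  # close to 1 for profiles that rise and fall together
```

Because the correlation is invariant to shifting and rescaling either profile, the same score applies whether or not the two genes are expressed at similar absolute levels.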
Biological noise
• Induced gene expression effects are often small.
• Gene expression varies between “replicates” in biologically meaningful ways.
• This variation allows us to repurpose data.
[Figure axis: Sample type]
Functional coexpression should be (somewhat) generalized
• If two genes are coexpressed under one condition, they will probably be coexpressed under at least some other conditions (or data sets).
• Coexpression seen “only once” needs special care in interpretation.
• We shouldn’t expect coexpression to be perfectly reproducible (for biological and technical reasons).
[Figure axis labels: Correlation; Correlation]
Genome Research, June 2004
A simple approach: count recurring patterns
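Counting recurring patterns can be sketched as a vote count: call a gene pair a "link" in a data set when its correlation passes a cutoff, then count the data sets in which each link appears. The fixed cutoff of 0.8, the restriction to positive correlations, and the toy profiles are assumptions made for this sketch; the published analysis may have selected links differently.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / sqrt(sxx * syy)


def links_in_dataset(profiles, threshold):
    """Return the gene pairs whose correlation meets the threshold."""
    links = set()
    for g1, g2 in combinations(sorted(profiles), 2):
        if pearson(profiles[g1], profiles[g2]) >= threshold:
            links.add((g1, g2))
    return links


def count_votes(datasets, threshold=0.8):
    """Count, for each link, the number of data sets it is seen in."""
    votes = Counter()
    for profiles in datasets:
        votes.update(links_in_dataset(profiles, threshold))
    return votes


# Three toy data sets (gene -> expression values); all numbers are invented
datasets = [
    {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8.5], "C": [4, 1, 3, 2]},
    {"A": [5, 3, 4, 1], "B": [10, 6, 8, 2], "C": [1, 2, 3, 4]},
    {"A": [1, 2, 3, 4], "B": [4, 3, 2, 1], "C": [2, 4, 6, 8]},
]

votes = count_votes(datasets)
recurring = {link for link, n in votes.items() if n >= 2}
```

Here the A-B link recurs in two of the three toy data sets, while A-C is seen only once and would need the "special care in interpretation" mentioned earlier.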
Pipeline for one dataset
Proof of concept analysis
• 60 human data sets, 15700 RefSeq genes.
• 70% cancer data
• 11 million “links”
• About 9.7 million different links
Many links are replicated across studies
[Figure: number of links (log scale, 1 to 10^7) vs. minimum number of data sets a link is seen in (log scale, 1 to 100). Series: Observed; Shuffled database (mean).]
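The comparison in this figure can be sketched as follows: count how many links are seen in at least k data sets, then repeat the count after shuffling each data set independently and average over shuffles. The shuffle used here (randomly relabeling genes within each data set, which preserves the number of links per data set) is one plausible null and is an assumption; the slides do not say how the shuffled database was built. The toy link sets are invented.

```python
import random
from collections import Counter


def support_curve(link_sets, max_k):
    """For k = 1..max_k, count the links seen in at least k data sets."""
    votes = Counter()
    for links in link_sets:
        votes.update(links)
    return [sum(1 for v in votes.values() if v >= k) for k in range(1, max_k + 1)]


def shuffle_gene_labels(links, genes, rng):
    """Relabel genes at random within one data set, keeping its link count."""
    permuted = genes[:]
    rng.shuffle(permuted)
    relabel = dict(zip(genes, permuted))
    return {tuple(sorted((relabel[a], relabel[b]))) for a, b in links}


def shuffled_mean_curve(link_sets, genes, max_k, n_shuffles=200, seed=0):
    """Average the support curve over independently shuffled databases."""
    rng = random.Random(seed)
    totals = [0] * max_k
    for _ in range(n_shuffles):
        shuffled = [shuffle_gene_labels(links, genes, rng) for links in link_sets]
        for i, count in enumerate(support_curve(shuffled, max_k)):
            totals[i] += count
    return [t / n_shuffles for t in totals]


# Toy database: 5 data sets sharing 3 "real" links plus random noise links
genes = [f"g{i}" for i in range(20)]
core = {("g0", "g1"), ("g2", "g3"), ("g4", "g5")}
noise_rng = random.Random(1)
link_sets = []
for _ in range(5):
    noise = {tuple(sorted(noise_rng.sample(genes, 2))) for _ in range(5)}
    link_sets.append(core | noise)

observed = support_curve(link_sets, max_k=5)
shuffled = shuffled_mean_curve(link_sets, genes, max_k=5)
```

In the toy database the three shared links survive at k = 5 in the observed curve, while the shuffled mean at k = 5 falls well below that, which is the kind of gap between the two series that the figure displays.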
Evaluation on biological grounds
Cluster involving NMDAR1 (GRIN1)
[Figure labels: ATP6V0A1; PLD3; GRIN1. Source: Allen Brain Institute]
Application: analysis of imprinted genes
Laurent Journot, INSERM – Universités Montpellier
LYAR interacting proteins
[Figure: axis label Correlation p-value; series label LYAR-interactors]
Ewing et al., 2007, Molecular Systems Biology
Vote counting limitations
• Weak evidence distributed across data sets will not be picked up.
• This example meets strict “vote counting” criteria in only 2/23 data sets
[Figure axis: Correlation]
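One standard way to pick up weak but consistent evidence is to pool the per-data-set correlations instead of thresholding each one. The sketch below uses a fixed-effect combination of Fisher z-transformed correlations weighted by n - 3. This is a textbook technique swapped in for illustration, not necessarily the method behind the talk, and the 23 (correlation, sample size) pairs are invented to mirror the 2/23 example.

```python
from math import atanh, sqrt, tanh


def combine_correlations(results):
    """Pool correlations from several data sets with Fisher's z-transform.

    results: list of (r, n) pairs, one per data set (n = number of samples).
    Returns (pooled_r, z_score); data sets with n <= 3 carry no weight.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for r, n in results:
        if n <= 3:
            continue
        r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
        weight = n - 3  # inverse variance of the Fisher z value
        weighted_sum += weight * atanh(r)
        total_weight += weight
    if total_weight == 0:
        raise ValueError("no data set has more than 3 samples")
    mean_z = weighted_sum / total_weight
    z_score = mean_z * sqrt(total_weight)  # standard normal under no correlation
    return tanh(mean_z), z_score


def vote_count(results, cutoff):
    """Number of data sets in which the correlation passes the cutoff."""
    return sum(1 for r, _ in results if r >= cutoff)


# Hypothetical gene pair: strong in 2 data sets, weakly positive in 21 others
results = [(0.85, 20)] * 2 + [(0.30, 20)] * 21

n_votes = vote_count(results, cutoff=0.8)
pooled_r, z_score = combine_correlations(results)
```

With these invented numbers the pair collects only 2 votes out of 23, yet the pooled correlation is moderate and its z-score is large, because the many weak positive correlations add up.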
[Figure: global effect size (global correlation, -1.0 to 1.0) vs. support (# of datasets, 2 to 14)]
[Figure: gene pairs (rows) by datasets (columns)]
Related work: Zhou XJ et al., Nat. Biotech 2005
Summary
• Reuse of public data: ‘adding value’
• Meta-analysis of coexpression
• Some applications
  • Functional prediction
  • Candidate identification
  • Platform evaluation
Ongoing and future work
• Applications and analyses
  • Protein interactions and hubs
  • Prediction of gene function at the synapse
  • Differential expression analysis
    • Regionalization
    • Mouse models of brain injury
    • Mouse models of psychosis
• Expanding our public database and software
  http://www.bioinformatics.ubc.ca/Gemma
  Web-based tools for biologists; web services coming soon
• Integration with other information sources
Thanks
Gemma: Xiang Wan, Kelsey Hamer, Luke McCarthy, Kiran Keshav, Suzanne Lane, Meeta Mistry, Jesse Gillis
Joseph Santos, Gozde Cozen, David Quigley, Anshu Sinha, Spiro Pantazatos, Wei-Keat Lim
Tmm: Homin Lee, Amy Hsu, Jon Sajdak, Jie Qin, Tzu-Lin Hsaio
And to:
NCBI GEO team
Groups who made data available
Collaborators who provided data prior to publication
Conrad Gilliam
Abraham Palmer
Andreas Kottmann
Etienne Sibille
Collaborators: Barclay Morrison, Joseph Gogos, Michael Hayden, Blair Leavitt, Tony Blau, Panos Papapanou
Answers to FAQs
• No, they don’t have to be time course experiments.
• Yes, we’re using cDNA as well as Affymetrix etc.
• Yes, we see reproducible negative correlations.
• Yes, we’re interested in finding differences as well as similarities between data sets.
• No, we aren’t necessarily inferring regulatory relationships.
• Yes, we know that RNA is just one way of measuring cell state.
• No, we don’t have {worm,fly,yeast…} data, but we’d like to.