How will we efficiently understand the interactions of ~20,000 genes, with ~200 million potential pairwise interactions?


Upload: wanda

Post on 19-Mar-2016


DESCRIPTION

How will we efficiently understand the interactions of ~20,000 genes, with ~200 million potential pairwise interactions? Minimally, we need to use the information that exists. June 1979: 2 relevant papers. S. Brenner (Genetics 1974), The genetics of Caenorhabditis elegans.

TRANSCRIPT

  • How will we efficiently understand the interactions of ~20,000 genes, with ~200 million potential pairwise interactions?

    Minimally, we need to use the information that exists.
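The ~200 million figure follows from counting unordered pairs of distinct genes. A quick check, assuming a round 20,000 genes (the slide's approximation, not an exact gene count) and no self-pairs:

```python
import math

GENE_COUNT = 20_000  # round figure from the slide; an approximation

# Unordered pairs of distinct genes: C(n, 2) = n * (n - 1) / 2
pair_count = math.comb(GENE_COUNT, 2)
print(f"{pair_count:,}")  # 199,990,000
```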

  • June 1979: 2 relevant papers

    S. Brenner (Genetics 1974), The genetics of Caenorhabditis elegans

    J. Sulston & R. Horvitz (Developmental Biology 1977), Post-embryonic cell lineages of the nematode, Caenorhabditis elegans

    Jan 2008: >200,000 relevant papers

  • Predicting Gene Interactions from information available in public databases

    Prioritizing high-resolution genetic interaction tests by knowledge mining

    Full-text information retrieval

    Hans-Michael Muller, Arun Rangarajan, Tracy Teal, Kimberly Van Auken, Juancarlos Chan, Weiwei Zhong

  • Scientists spend more time skimming for information than reading papers.

    Much of the information is detail hidden in the full text; it appears neither in the abstract nor in MeSH terms. We designed Textpresso to do automated skimming for researchers and database curators.

    The output can be used for more sophisticated Natural Language Processing.

    Textpresso Literature Search Engine: www.textpresso.org

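  • Can we do better than PubMed and Google Scholar?

    [Comparison table flattened in extraction. Rows: PubMed, Google Scholar, Textpresso. Columns: Full Text, Sentence, Ontology. Marks as extracted: (-) + + + - - -. Ontologies listed: MeSH, Taxonomy, Gene Ontology, Customized, Neuroscience Information Framework.]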

  • Categories are bags of words

    Example terms: precursor, upstream, cascade, descendants

    Gene names: FOXO, HOXA1, pax2, PKD1

    Drosophila anatomy: denticle, wing, MP2 neuron

    Reporter Genes: GFP, EGFP, YFP, lacZ, CFP, Green Fluorescent Protein, reporter gene, dsRed, mCherry

  • Individual sentences in full text are marked up with Categories

    ARTICLE TEXT: egl-38 regulates lin-3 transcription in vulF in L3 larvae

    TEXTPRESSO CATEGORIES: gene, regulation, gene, process, anatomy, life stage

    Automatically mark up the whole corpus of papers with terms of categories, and index for rapid searching.
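The markup step can be pictured as a dictionary lookup against each category's bag of words. This is only a minimal sketch, not Textpresso's actual implementation: the `CATEGORIES` lexicons and the `mark_up` helper are hypothetical, and the term lists hold just enough entries to tag the example sentence from the slide.

```python
import re

# Hypothetical, tiny category lexicons (bags of words) for illustration only
CATEGORIES = {
    "gene": {"egl-38", "lin-3"},
    "regulation": {"regulates"},
    "process": {"transcription"},
    "anatomy": {"vulF"},
    "life stage": {"L3 larvae"},
}

def mark_up(sentence, categories=CATEGORIES):
    """Return sorted (start, term, category) tuples for lexicon terms in the sentence."""
    hits = []
    for category, terms in categories.items():
        for term in terms:
            # Require term boundaries so "lin-3" does not match inside "lin-31"
            pattern = r"(?<![\w-])" + re.escape(term) + r"(?![\w-])"
            for match in re.finditer(pattern, sentence):
                hits.append((match.start(), term, category))
    return sorted(hits)

sentence = "egl-38 regulates lin-3 transcription in vulF in L3 larvae"
for start, term, category in mark_up(sentence):
    print(f"{term!r:18} -> {category}")
```

Storing the character offset with each hit is one way to support an index for rapid searching later; a production system would also need to handle synonyms and overlapping terms, which this sketch ignores.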

  • What Arabidopsis genes are expressed in the meristem based on reporter genes?

    14,930 A.t. papers: www.textpresso.org/arabidopsis

  • Is a nicotinic receptor associated with Drugs of Abuse other than nicotine?

    15,786 papers: www.textpresso.org/neuroscience

  • The problem with clever fly names

    Gene name / abbreviation: forager / for; ascute / as; wee / we; Washed eye / We

    Train system to recognize gene names by context; use italics from PDF (figures shown: ~70%, ~85%)

    Michael Müller, Arun Rangarajan

  • What reporter genes have been used with Drosophila genes to study human disease?

    20,099 full-text fly papers: www.textpresso.org/fly

  • Database curation: e.g. Gene-Gene Interactions

    Find all sentences that contain 2 gene names and 1 association or regulation word: 26,000 sentences out of 4,400 articles

    Simple interface to check off sentences: 100 sentences per hour

    Output into database
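The sentence filter described here can be sketched as a set-membership test over tokens. The lexicons below are hypothetical placeholders, and reading "2 gene names" as "at least two distinct gene names" is an assumption, not something the slide specifies.

```python
import re

# Hypothetical lexicons for illustration only
GENE_NAMES = {"egl-38", "lin-3", "let-60", "tax-6"}
RELATION_WORDS = {"regulates", "interacts", "suppresses", "binds", "activates"}

def is_candidate(sentence):
    """True if the sentence names >= 2 distinct genes and >= 1 relation word."""
    tokens = re.findall(r"[\w-]+", sentence)
    genes = {token for token in tokens if token in GENE_NAMES}  # case-sensitive
    relations = [token for token in tokens if token.lower() in RELATION_WORDS]
    return len(genes) >= 2 and len(relations) >= 1

sentences = [
    "egl-38 regulates lin-3 transcription in vulF in L3 larvae.",
    "lin-3 is expressed in the anchor cell.",
    "tax-6 and let-60 were both examined.",
]
candidates = [s for s in sentences if is_candidate(s)]
print(candidates)
```

Gene names are matched case-sensitively on purpose, since the fly examples above (we versus We) show that case can distinguish genes. Sentences that pass the filter would still go to a curator to be checked off by hand.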


  • Training Set

    Training set: 4775 Positive Interactions: genetic, literature curation (1909); yeast two-hybrid screen (2933)

    3296 Negative Genetic Interactions: cis doubles in genetic mapping

    Benchmark: 5515 Positives: KEGG database; 5000 Negatives: randomly selected

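  • Algorithm

    worm gene pair; fly orthologs; yeast orthologs

    worm score + fly score + yeast score → total score

    Predictor sets shown for the three organisms (order as extracted): interaction, GO, expression, phenotype, microarray; GO, expression, phenotype, microarray, interaction; interaction, GO, localization, phenotype, microarray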

  • Scoring and score integration

    n: number of predictors

    L_i: likelihood ratio of each predictor

    Sum the logs of the Ls: log L_1 + log L_2 + ... + log L_n
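The integration step, summing the logs of the per-predictor likelihood ratios, can be sketched as follows. The ratio values are made up for illustration, and the function name is hypothetical; the slide does not say how this sum is mapped to the 0-to-1 scores quoted later, so that step is left out.

```python
import math

def combined_log_score(likelihood_ratios):
    """Sum the natural logs of the per-predictor likelihood ratios L_1..L_n."""
    ratios = list(likelihood_ratios)
    if any(ratio <= 0 for ratio in ratios):
        raise ValueError("likelihood ratios must be positive")
    return sum(math.log(ratio) for ratio in ratios)

# Hypothetical likelihood ratios for one gene pair (not real data)
ratios = {"interaction": 8.0, "GO": 2.0, "phenotype": 0.5}
score = combined_log_score(ratios.values())
print(round(score, 3))
```

Summing logs is equivalent to taking the log of the product of the ratios, which treats the predictors as independent evidence. A ratio below 1 (here, phenotype) lowers the total, and a ratio above 1 raises it.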

  • [Figure: pathway genes lin-3, let-23, sem-5, sos-1, let-60, lin-45, mek-2, mpk-1, lip-1, ksr-1, gap-1. Legend: v1.6; v1.4 & v1.6]

  • Testing let-60 ras Interactors

    87 genes have score >0.9; 17 confirmed from literature

    Inactivating genes on a gain-of-function (gf) let-60 mutant by RNAi

    Assay vulva precursor cell (VPC) induction

    N2: not Multivulva; let-60(gf): strong Multivulva; let-60(gf);tax-6(RNAi): weak Multivulva

  • let-60(gf) VPC Induction Under Various RNAi

    12 hits (p 0.9 [text lost in extraction]); Score < 0.6

  • let-60 ras interactors (suppressors)

    tax-6: calcineurin

    csn-5: COP-9 signalosome

    qua-1: hedgehog-related protein

    C01G8.9: SWI/SNF-related (eyelid)

    C05D10.3: ABC transporter (white)

    pfa-3: profilin

    nhr-4: transcription factor

  • C. elegans Interactions

    Input: 4,726 known interactions among 2,713 genes

    Predict an additional 18,863, for a total of 23,589 interactions among 4,408 genes

  • for Drosophila

  • D. melanogaster interactions

    Input: 4,180 known interactions among 1,262 genes

    Predict 13,126, for a total of 17,306 interactions among 6,044 genes

  • Automated, Quantitative Phenotyping

    Chris Cronin: movement analysis (BMC Genetics 2005)

    generative graphics; locomotion; plate demographics (Weiwei Zhong); morphology; sexual behavior

    E. Fontaine, A. Whittaker, Joel Burdick
