Post on 21-Mar-2017

Translational Bioinformatics 2010: The Year in Review

Russ B. Altman, MD, PhD, Stanford University

Goals

• Provide an overview of the major scientific events, trends and publications in translational bioinformatics

• Create a “snapshot” of what seems to be important in March, 2010 for the amusement of future generations.

• Marvel at the progress made and the opportunities ahead.

Process

1. Think about what has had early impact

2. Think about sources to trust

3. Solicit advice from colleagues

4. Surf online resources

5. Select papers to highlight in ~2 slides and some to highlight in < 1 slide.

Caveats

• Considered 2009 to present

• Focused on human biology and clinical implications: molecules, clinical data, informatics.

• Considered both data sources and informatics methods (and combination)

• Tried to avoid simply following crowd mentality.

Final list

• ~70 semi-finalist papers

• 24 presented here (briefly!)

• This talk and semi-finalist bibliography will be made available on the conference website.

Thanks!

• George Hripcsak

• Brian Athey

• Peter Tarczy-Hornoch

• Alain Laederach

• Soumya Raychaudhuri

• Yves Lussier

• Dan Masys

• Emidio Capriotti

• Andrea Califano

• Liping Wei

• Atul Butte

• Nick Tatonetti

• Joel Dudley

• Gill Omenn

Public Health Translational Informatics

“Geographic dependence, surveillance, and origins of the 2009 Influenza A (H1N1) Virus” (Trifonov et al, NEJM)

• Goal: understand the origin and recent history of new strains from viral DNA sequences.

• Method: Sequence analysis and comparison of eight key influenza genes in current and historical samples.

• Result: Evolutionary map of recombination events leading to current H1N1 variant.

• Conclusion: Aggressive sampling of multiple species may allow us to anticipate novel flu in the future.

Whole or Mostly Whole Genome Sequencing

“Exome sequencing identifies the cause of a mendelian disorder” (Ng et al, Nat. Gen.)

• Goal: find the cause of Miller syndrome.

• Miller syndrome = facial and limb anomalies.

• Method: exon-only sequencing of 4 affected individuals in three kindreds.

• Result: DHODH gene (enzyme for pyrimidine synthesis) mutations in these and 3 other families.

Miller Syndrome

Mutations in DHODH

“Analysis of genetic inheritance in a family quartet by whole-genome sequencing” (Roach et al, Science Express)

• Goal: understand relationship between rare disease and corresponding genetic changes.

• Miller syndrome & ciliary dyskinesia = both recessive.

• Method: whole genome sequencing of parents and 2 affected sibs.

• Result: 4 genes identified with SNPs explaining pattern of inheritance (CES1, DHODH, DNAH5, KIAA0556)

Recombination landscape defined

“Whole-genome sequencing in a patient with Charcot-Marie-Tooth Neuropathy” (Lupski et al, NEJM)

• Goal: understand relationship between rare disease and corresponding genetic changes.

• CMT neuropathy = recessive, demyelinating disease.

• Method: whole genome sequencing of (big!) family (parents, 4 affected sibs, 4 unaffected sibs). Negative for previous CMT common screens.

• Result: causative alleles in gene SH3TC2, het

Y169H & R954X alleles in affected

Genetic associations and mechanisms (!)

“Autoimmune disease classification by inverse association with SNP alleles” (Sirota et al, PLoS Genetics)

• Goal: Compare genetic variation profiles across six autoimmune diseases.

• MS, AS, ATD, RA, CD, T1D + 5 non-autoimmune

• Method: Cluster diseases based on allele occurrences from GWAS studies.

• Result: RA/AS cluster separates from MS/ATD cluster with a somewhat “opposite” allele profile. May yield information about disease-specific differences.

“Identifying relationships among genomic disease regions: predicting genes at pathogenic SNP associations and rare deletions” (Raychaudhuri et al, PLoS Genetics)

• Goal: Map associations to potential mechanisms using literature mining.

• Method: Test associated disease regions against the medical literature, looking for connections (= pathways)

• Result: Able to filter candidate mutations in Crohn’s disease and schizophrenia, and map them to subset of mutations for which there is a biological pathway related to the disease.

Results for Crohn’s & Schizophrenia

“Rare variants create synthetic genome-wide associations” (Dickson et al, PLoS Biology)

• Goal: understand the impact of rare variants on common SNP association studies.

• Method: Simulation of effect of LD between rare SNPs and common ones

• Result: Correlations are not only possible but inevitable, so GWAS may work for the wrong reason. Follow-up sequencing is key.

• Many positive GWAS studies, especially those with differing results in geographically dispersed populations, may be affected by this phenomenon.

9 rare causative variants create signal in GWAS
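The simulation idea in the Dickson et al. slide can be illustrated with a toy model. This is only a minimal sketch under stated assumptions, not the authors' simulation: every parameter value (allele frequencies, risks, cohort size) is hypothetical, and the rare variants are simply assumed to sit on the haplotype background that carries the common allele. Disease risk depends only on the rare variants, yet the common allele ends up enriched in cases.

```python
import random

def simulate(n_people=20000, common_freq=0.3, n_rare=9, rare_freq=0.005,
             base_risk=0.05, carrier_risk=0.30, seed=0):
    """Toy case/control simulation; all parameter values are hypothetical."""
    rng = random.Random(seed)
    cases = {"n": 0, "common": 0}
    controls = {"n": 0, "common": 0}
    for _ in range(n_people):
        common_count = 0
        carries_rare = False
        for _ in range(2):  # two haplotypes per person
            has_common = rng.random() < common_freq
            common_count += has_common
            # Assumption: rare causal variants arise only on haplotypes
            # that carry the common allele (complete LD in one direction).
            if has_common:
                for _ in range(n_rare):
                    if rng.random() < rare_freq / common_freq:
                        carries_rare = True
        # Risk depends ONLY on the rare variants, never on the common SNP.
        risk = carrier_risk if carries_rare else base_risk
        group = cases if rng.random() < risk else controls
        group["n"] += 1
        group["common"] += common_count
    case_freq = cases["common"] / (2 * cases["n"])
    control_freq = controls["common"] / (2 * controls["n"])
    return case_freq, control_freq

case_freq, control_freq = simulate()
print(f"common allele frequency: cases={case_freq:.3f}, controls={control_freq:.3f}")
```

The common SNP has no causal role in this model, so any case/control difference in its frequency is a synthetic association of the kind the slide describes.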

“In Silico functional profiling of human disease-associated and polymorphic amino acid substitutions” (Mort et al, Human Mutation)

• Goal: Understand how variation in proteins leads to complex disease phenotypes.

• Method: Compare disease-associated and neutral amino acid substitutions, looking for differences in protein chemical features.

• Results: Associated UMLS disease areas with different sets of predictive protein features

• Conclusion: The types of proteins used in different disease areas are sensitive to different types of mutations.

Network biomedicine

“Exploring the human genome with functional maps” (Huttenhower et al, Genome Research)

• Goal: Systems-level understanding of genetic contributions to human phenotypes.

• Method: Bayesian integration of 30K experiments on 25K genes. Creation of data-driven functional maps weighted by reliability for individual functional categories.

• Result: 200 context-specific interaction networks. Experimentally validated 5 novel predictions for genes involved in macroautophagy.

5 query genes + 1 context

“Genome-wide identification of post-translational modulators of transcription factor activity in human B cells” (Wang et al, Nat. Biotech.)

• Goal: Understand TF regulation via proteins.

• Method: Mutual information analysis to identify protein modulators of TF function on chosen targets.

• Result: Able to detect molecules that transduce signal from TF to target either as positive modulator (create correlation) or negative modulator (destroy correlation). Successful application to MYC to find ~50 significant modulators, experimentally verified.
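The modulator idea above (a protein "creates" or "destroys" the TF-target correlation) can be sketched with a conditional mutual information comparison. This is a simplified illustration, not the published algorithm: the median split, the binarization of expression values, and the function names are all assumptions made for the example.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

def median_cut(values):
    """Upper median of a list; used as a crude high/low threshold."""
    return sorted(values)[len(values) // 2]

def binarize(values):
    cut = median_cut(values)
    return [int(v >= cut) for v in values]

def modulation_score(tf, target, modulator):
    """MI(TF; target) in modulator-high samples minus MI in modulator-low samples.

    Positive: correlation appears when the modulator is high (positive modulator).
    Negative: correlation disappears when the modulator is high (negative modulator).
    """
    tf_b, target_b = binarize(tf), binarize(target)
    cut = median_cut(modulator)
    high = [i for i, m in enumerate(modulator) if m >= cut]
    low = [i for i, m in enumerate(modulator) if m < cut]
    mi_high = mutual_information([tf_b[i] for i in high], [target_b[i] for i in high])
    mi_low = mutual_information([tf_b[i] for i in low], [target_b[i] for i in low])
    return mi_high - mi_low
```

A real analysis would also need a significance test on the score (for example by permuting samples), which this sketch leaves out.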

“MiR-204 suppresses tumor invasion by regulating networks of cell adhesion and extracellular matrix remodeling” (Lee et al, PLoS Comp. Bio, in press)

• Goal: Identify microRNA regulators of cancer and opportunities for new therapies

• Method: Integrate expression, genetics, and cancer molecular phenotypes.

• Result: 18 validated targets of miR-204, experimental evidence showing that miR-204 replacement reduces tumor aggressiveness.

• Conclusion: Integrated analysis of miRNA with experimental validation yields new cancer leads.

Drugs and Genes and their relationships

“Drug discovery using chemical systems biology: repositioning the safe medicine comtan to treat multidrug and extensively drug resistant Tuberculosis” (Kinnings et al, PLoS Comp. Bio)

• Goal: Identify off-targets of major pharmaceuticals to find new uses for old drugs.

• Method: Use protein structure to characterize binding site of drug, and then look for cryptic similar sites in other proteins, including TB proteome.

• Result: Comtan (for Parkinson’s) binds InhA in TB, & inhibits TB growth--they also found evidence that Parkinson’s patients improve with TB treatment!

“Generating genome-scale candidate gene lists for pharmacogenomics” (Hansen et al, Clin. Pharm. & Ther.)

• Goal: Identify genes likely to modulate drug response.

• Method: Associate drugs with network representation of genetic interactions, rank genes based on likelihood of interacting with drugs.

• Result: AUC of 82% on independent test set. Novel gene candidates for warfarin, gefitinib, carboplatin and gemcitabine.

“Network-based elucidation of human disease similarities reveals common functional modules enriched for pluripotent drug targets” (Suthram et al, PLoS Comp. Bio.)

• Goal: Create molecular relationships between diseases, use this to find new drug opportunities.

• Method: Define 4600 co-expressed functional modules, and cluster diseases using these.

• Result: A novel disease clustering, and functional modules including known drug targets that participate in many diseases.

“Predicting new molecular targets for known drugs” (Keiser et al, Nature)

• Goal: Find new uses for old drugs

• Method: Represent drug targets by the company they keep: the drugs that bind them. Compare the list of drugs for similarity. Targets with similar lists may have cross-reactivity. Find drugs that are most similar with a new list. Careful statistics.

• Result: An off-target network that relates drugs to new targets. 5 potent new associations, e.g. Prozac as beta-blocker, Vadilex as serotonin blocker.
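The "company they keep" comparison can be sketched as a set-versus-set similarity between the ligands of two targets. The example below is a simplified stand-in, not the paper's method: it assumes each ligand is a set of "on" fingerprint bits, uses a Tanimoto coefficient with a hypothetical threshold of 0.5, and omits the statistical model that the slide calls "careful statistics". Target names and fingerprints are made-up toy data.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0

def ligand_set_score(ligands_a, ligands_b, threshold=0.5):
    """Raw set-vs-set score: sum of pairwise similarities above a threshold."""
    total = 0.0
    for fp_a in ligands_a:
        for fp_b in ligands_b:
            sim = tanimoto(fp_a, fp_b)
            if sim >= threshold:
                total += sim
    return total

# Toy data (hypothetical): each target is described by the ligands known to bind it.
known_ligands = {
    "target_X": [{1, 2, 3, 4}, {1, 2, 3, 5}],
    "target_Y": [{7, 8, 9}, {8, 9, 10}],
}
query_ligands = [{1, 2, 3, 6}]  # ligand set containing the drug of interest

# Rank candidate targets by how similar their ligand sets are to the query set.
ranking = sorted(
    known_ligands,
    key=lambda name: ligand_set_score(query_ligands, known_ligands[name]),
    reverse=True,
)
print(ranking)
```

Raw scores grow with the size of the ligand sets, which is one reason a background model is needed before calling any association significant.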

Infrastructure for translational bioinformatics

“Ontology-driven indexing of public datasets for translational bioinformatics” (Shah et al, BMC Bioinf.)

• Goal: develop infrastructure for applying controlled descriptors to datasets.

• Method: Annotate and index multiple biomedical data resources with UMLS concepts, create index, and federate these together.

• Result: Integration of multiple data sources with controlled vocabulary allowing powerful searches across data sets.

“A recent advance in the automatic indexing of the biomedical literature” (Neveol et al, J. Biomed. Info.)

• Goal: Move towards automated indexing of Medline articles

• Method: Combine methods of NLP & machine learning to assign heading/subheading pairs.

• Results: Best combination 48% precision, 30% recall. Integrated into MTI tool for NLM curators.

“Cloud computing: a new business paradigm for biomedical informatics” (Rosenthal et al, J. Biomed. Inf.)

• Goal: Examine fit of BMI to cloud computing.

• Method: Focus on specific component technologies used by the field in different types of tasks.

• Result: Clouds require careful analysis and attention to the migration path from current infrastructure to future.

“Lowering industry firewalls: pre-competitive informatics initiatives in drug discovery” (Barnes et al, Nat. Rev. Drug. Disc.)

• There are substantial challenges facing pharmaceutical industry (failed new drugs, slow pipeline).

• Opportunity for pre-competitive collaboration and engagement with public domain.

• Propose new areas for collaboration, and highlight cultural shifts that will be needed.

PROPOSED INITIATIVES

• Disease knowledge: Curating gene-disease associations, shared pathways, imaging repositories

• Target pharmacology: redefine druggability, catalog of targets/phenotypes, share data on known molecules

• Drug safety: adverse event signatures, Pgx data (!), ADME models

• Knowledge management: literature mining, patent mining, data standards

• Pharmaceutical infrastructure: gene indices/nomenclature, robust web service standards, data storage cooperatives.

Warnings and Causes for Hope

“An agenda for personalized medicine” (Ng et al, Nature)

• Goal: Compare direct-to-consumer (DTC) services.

• Method: Compare analyses from two DTC companies for 13 diseases on 5 individuals.

• Result: Raw data very accurate. Interpretations vary significantly. For 7 diseases, 50% or less of predictions agree.

• Conclusion: Focus on high risk, strong effect, direct measures. Focus on PGx. Monitor outcomes.

“Back to the future: why randomized controlled trials cannot be the answer to pharmacogenomics and personalized medicine” (Frueh, Pharmacogenomics)

• Question: RCTs are the gold standard, shouldn’t they be required for personalized medicine interventions?

• Answer: No. Not based on “averages” (by definition), better to use case-control, retrospective and other mechanisms.

• Conclusion: Insistence on RCT level evidence will unnecessarily hinder the roll out of personalized medicine.

“Computing has changed biology--biology education must catch up” (Pevzner et al, Science)

• Education Forum piece

• Computation is now essential to biology

• Undergraduate biology education has not changed

• New course proposed for all biology undergrads: “Algorithmic, mathematical, and statistical concepts in Biology”

“Distilling free-form natural laws from experimental data” (Schmidt & Lipson, Science)

• Goal: Define algorithmically what makes a correlation in observed data important and insightful.

• Method: Propose a principle for identifying nontriviality: candidate equations should predict connections between dynamics of subcomponents of the system.

• Result: Example in undergraduate physics, recovered well-known physical laws (Hamiltonian, Lagrange, Equation of Motion)

“A statistical dynamics approach to the study of human health data: resolving population scale diurnal variation in laboratory data” (Albers & Hripcsak, Physics Letters A)

• Goal: Apply statistical physics and information theory to clinical chemistry measurements.

• Method: 2.5 million data points over 20 years, look at time delay mutual information. Focus on creatinine.

• Result: Creatinine is initially measured twice a day at Columbia, and then every morning. Yesterday’s measurement predicts today’s.

• Conclusion: Sophisticated dynamic modeling methods (that physicists use) are applicable to biological systems.
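Time-delay mutual information, as named in the Method bullet, asks how much a measurement tells you about another measurement taken a fixed delay later. A minimal sketch follows, assuming measurements arrive as (time in hours, value) pairs, that pairs are matched within a hypothetical tolerance window, and that values are discretized into equal-width bins. The paper's actual estimator and binning choices are not given in the slide, and the series here is synthetic.

```python
import math
from collections import Counter

def delayed_pairs(measurements, delay, tolerance=0.5):
    """Value pairs (v1, v2) whose time difference is within tolerance of delay."""
    pairs = []
    for i, (t1, v1) in enumerate(measurements):
        for t2, v2 in measurements[i + 1:]:
            if abs((t2 - t1) - delay) <= tolerance:
                pairs.append((v1, v2))
    return pairs

def discretize(value, lo, hi, bins):
    """Equal-width bin index in [0, bins - 1]."""
    if hi == lo:
        return 0
    return min(int((value - lo) / (hi - lo) * bins), bins - 1)

def time_delay_mi(measurements, delay, bins=4, tolerance=0.5):
    """Mutual information (bits) between values separated by roughly `delay` hours."""
    values = [v for _, v in measurements]
    lo, hi = min(values), max(values)
    pairs = [(discretize(a, lo, hi, bins), discretize(b, lo, hi, bins))
             for a, b in delayed_pairs(measurements, delay, tolerance)]
    n = len(pairs)
    first = Counter(a for a, _ in pairs)
    second = Counter(b for _, b in pairs)
    joint = Counter(pairs)
    return sum((c / n) * math.log2(c * n / (first[a] * second[b]))
               for (a, b), c in joint.items())

# Synthetic hourly series with a strict 24-hour cycle (not real laboratory data).
series = [(float(h), 1.0 if h % 24 < 12 else 0.0) for h in range(240)]
print(time_delay_mi(series, delay=24), time_delay_mi(series, delay=6))
```

In this toy series the 24-hour delay is far more informative than the 6-hour delay, which mirrors the slide's "yesterday's measurement predicts today's" observation in spirit only.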

2008 Crystal ball...

Sequencing makes a comeback (watch out microarrays...)

Translational science projects will create astounding data sets (hopefully available) to catalyze research

GWAS will continue to proliferate

Consumer-oriented genetics will create demand for online resources for interpretation

Difficult decisions about when/how to bring new molecular diagnostics to practice.

2009 Crystal ball...

Focus on mechanism in interpreting genetic associations

More sophisticated mechanisms to find signal in GWAS, including data integration

Cellular dynamics of expression, metabolites, proteins

Multiple human & cancer genome sequences

Consumer sequencing (vs. genotyping)

2010 Crystal ball...

Clinical records will be linked to genomics to make discoveries.

More emphasis on drugs and ancestry in DTC companies

Whole genome sequencing for a cohort with a common disease (cancer already here?)

Consumer sequencing (vs. genotyping)

Semantics in literature mining for knowledge discovery

Cloud computing will contribute to one biomedical discovery.

Thanks. See you in 2011!

russ.altman@stanford.edu
