Translational Bioinformatics 2015: The Year in Review

Russ B. Altman, MD, PhD, Stanford University


Posted on 21-Mar-2017



Disclosures

•Founder & Consultant, Personalis Inc (genome sequencing for clinical applications). Consultant, Pfizer (pharmaceuticals).

•Funding support: NIH, NSF, Pfizer, Oracle, Microsoft, Lightspeed Ventures, PARSA Foundation.

•I am a fan of informatics, genomics, medicine & clinical pharmacology.

Goals

•Provide an overview of the scientific trends and publications in translational bioinformatics

•Create a “snapshot” of what seems to be important in spring 2015 for the amusement of future generations.

•Marvel at the progress made and the opportunities ahead.

Process

1. Follow literature through the year

2. Solicit nominations from colleagues

3. Search key journals and key topics on PubMed

4. Evaluate & ponder

5. Select papers to highlight in ~1-3 slides

Caveats

•Translational bioinformatics = informatics methods that link biological entities (genes, proteins, small molecules) to clinical entities (diseases, symptoms, drugs), or vice versa.

•Considered the last ~14 months

•Focused on human biology and clinical implications: molecules, clinical data, informatics.

•NOTE: Amazing biological papers with straightforward informatics generally not included.

•NOTE: Amazing informatics papers which don’t link clinical to molecular generally not included.

Final list

•215 Semi-Finalists, 101 Finalists

•22 Presented here + 29 “shout outs” = 51

•Apologies to those I misjudged. Mistakes are mine.

•8 TOPICS: TBI & Society, Variation Triage, Cancer, Clinical Genomics, Drugs, Systems & Networks, NLP Applications, Odds & Ends

•Slides and bibliography will be posted at rbaltman.wordpress.com

Thanks! Conversations and recommendations

Phil Bourne

Atul Butte

Andrea Califano

Josh Denny

Michel Dumontier

Peter Elkin

Emily Flynn

Lewis Frey

Mark Gerstein

George Hripcsak

John Hogenesch

Enoch Huang

Larry Hunter

Rachel Karchin

Natalia Khuri

Alan Laederach

Yong Li

Tianyun Liu

Yves Lussier

Hua Fan-Minogue

Lucila Ohno-Machado

Chirag Patel

Beth Percha

Raul Rabadan

Dan Roden

Neil Sarkar

Nigam Shah

Josh Stuart

Peter Tarczy-Hornoch

Nick Tatonetti

Jessie Tenenbaum

Olga Troyanskaya

Piet van der Graaf

Scott Waldman

Dennis Wall

Rong Xu

TBI & Society

“A new initiative on precision medicine.” (Collins & Varmus, NEJM)

• Goal: Define & advance precision medicine—treating disease considering individual variability.

• Method: Follow up on President Obama’s announcement in the State of the Union address.

• Result: Major funding effort by NIH focused on cancer initially, all diseases eventually. Create a cohort of 1 × 10⁶ (one million) individuals to support this.

• Conclusion: Translational Informatics is central to discovery and implementation of precision medicine. This conference will grow.

25635347

“Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD): the TRIPOD statement.” (Collins et al, Ann Intern Med)

• Goal: Improve the ways that prediction models (of all types) are reported in the literature.

• Method: Develop a set of recommendations for reporting studies that develop, validate, or update a prediction model—both diagnostic & prognostic

• Result: Checklist of 22 items, copublished in multiple journals.

• Conclusion: Prediction models will be key to precision medicine, and should be communicated clearly.

25560714


“Why we should care about what you get for ‘only $99’ from a personal genomic service.” (Murray, Ann Intern Med)

• Conclusion: Direct-to-consumer (DTC) testing challenges the traditional role of physicians.

“Misinterpretation of TPMT by a DTC Genetic Testing Company” (Brownstein et al, Clin Pharm & Ther)

• Conclusion: Rare variants were misinterpreted and could have caused harm.

“Regulatory changes raise troubling questions for genomic testing.” (Evans et al, Genet Med)

• Conclusion: There are logical inconsistencies between CLIA and HIPAA

25255365

24514942

24714787

CLIA v. HIPAA…kind of like King Kong v. Godzilla…

or Ninja v Samurai…


“Meaningful use of pharmacogenetics.” (Ratain & Johnson, Clin Pharm & Ther)

• Conclusion: PGx is ready for implementation

“Useless until proven effective: the clinical utility of preemptive pharmacogenetic testing.” (Janssens & Deverka, Clin Pharm & Ther)

• Conclusion: PGx is not ready for implementation

“Pharmacogenomic knowledge gaps and educational resource needs among physicians in selected specialties.” (Johansen Taber & Dickinson et al, PGx Pers Med)

• Conclusion: MDs are unsure how to use PGx.

25399712

25399713

25045280

King Kong favors PGx

Ninjas favor PGx… but EVERYONE needs more education about the issue

Variation Triage (warning: huge topic this year)

“Guidelines for investigating causality of sequence variants in human disease.” (MacArthur et al, Nature)

• Goal: Clear guidelines for reporting disease-causing variants in the genome

• Method: Discuss key challenges in establishing and documenting evidence of causality

• Result: Propose guidelines for summarizing confidence in variant pathogenicity.

• Conclusion: Harmonization of reporting expectations will assist in dissemination of good genetic annotation information.

24759409


“GRASP: analysis of genotype-phenotype results from 1390 genome-wide association studies and corresponding open access database” (Leslie et al, Bioinformatics)

• Goal: Create a publicly available database of GWAS results.

• Method: Literature search plus manual annotation of results from 1390 GWAS.

• Result: More than 6.2 million SNP-phenotype associations.

• Conclusion: A useful resource for integration with other data sets in support of precision medicine.

24931982

“Effective diagnosis of genetic disease by computational phenotype analysis of the disease-associated genome.” (Zemojtel et al, Sci Trans Med)

• Goal: Create an automated platform for diagnosis of rare Mendelian disease, including enriched “disease-associated” NGS panel.

• Method: Assess semantic similarity of phenotype to known diseases, and assess variant pathogenicity.

• Result: The correct diagnosis had a mean rank of 2.1 on 50 retrospective cases, and 2.4 on 11/40 prospective cases.

• Conclusion: Methods for automated diagnosis of novel genetic diseases are within reach.

25186178
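Zemojtel et al. rank candidate diseases by the semantic similarity between a patient's phenotype terms and each disease's annotations. As a minimal sketch, assuming Resnik similarity (information content of the most informative common ancestor) with best-match averaging, and using a hypothetical toy ontology and disease set rather than real HPO data, the ranking step might look like this. The published pipeline's scoring, and how it is combined with variant pathogenicity, may differ.

```python
import math
from collections import defaultdict

# Hypothetical toy ontology (child -> parents); NOT real HPO terms or IDs.
PARENTS = {
    "phenotypic_abnormality": [],
    "nervous_system": ["phenotypic_abnormality"],
    "seizure": ["nervous_system"],
    "focal_seizure": ["seizure"],
    "skeletal": ["phenotypic_abnormality"],
    "short_stature": ["skeletal"],
    "scoliosis": ["skeletal"],
}

# Hypothetical disease annotations: disease -> set of phenotype terms.
DISEASES = {
    "disease_A": {"focal_seizure", "short_stature"},
    "disease_B": {"scoliosis"},
    "disease_C": {"seizure"},
}


def ancestors(term):
    """Return the term itself plus all of its ancestors in the DAG."""
    seen, stack = {term}, [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(diseases):
    """IC(t) = -log(fraction of diseases annotated with t or a descendant of t)."""
    counts = defaultdict(int)
    for terms in diseases.values():
        for t in set().union(*(ancestors(x) for x in terms)):
            counts[t] += 1
    return {t: -math.log(c / len(diseases)) for t, c in counts.items()}


def resnik(t1, t2, ic):
    """Resnik similarity: IC of the most informative common ancestor."""
    return max(ic.get(t, 0.0) for t in ancestors(t1) & ancestors(t2))


def best_match_average(query, disease_terms, ic):
    """For each query term take its best match in the disease, then average."""
    return sum(max(resnik(q, d, ic) for d in disease_terms) for q in query) / len(query)


def rank_diseases(query, diseases):
    """Rank diseases from most to least similar to the patient's phenotype terms."""
    ic = information_content(diseases)
    scores = {name: best_match_average(query, terms, ic) for name, terms in diseases.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

For example, `rank_diseases({"scoliosis"}, DISEASES)` puts `disease_B` first, since it is the only disease annotated with that specific, high-information term; `disease_A` follows because it shares the broader "skeletal" ancestor.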

“Disease Risk Factors Identified Through Shared Genetic Architecture and Electronic Medical Records ” (Li et al, Sci Trans Med)

• Goal: Evaluate relationships between risk factors and diseases based on shared genetic architecture.

• Method: Using a statistical similarity measure between diseases and traits, found 120 similar pairs and evaluated EMRs for 5 of them.

• Result: Several traits appear before their associated disease, offering a potential early warning system.

• Conclusion: Shared genetic architecture can provide early clues to disease risk and prognosis.

24786325

“A network based method for analysis of lncRNA-disease associations and prediction of lncRNAs implicated in diseases.” (Yang et al, PLoS ONE)

• Goal: Understand role of long noncoding RNAs in disease

• Method: Build an lncRNA-gene network and use a propagation algorithm to infer lncRNA-disease associations.

• Result: 768 potential lncRNA-disease associations, with validation on known cases.

• Conclusion: lncRNAs are important modulators of epigenetic and genetic signals relevant to disease.

24498199
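Yang et al. infer lncRNA-disease links by propagating signal from known disease genes through an lncRNA-gene network. The sketch below assumes random walk with restart as the propagation scheme and uses a hypothetical five-node network; the paper's specific propagation algorithm and parameters may differ.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Propagate scores from seed nodes over an undirected network.

    adjacency: node -> list of neighbours (symmetric; every node has >= 1 neighbour)
    seeds: nodes that start with equal probability mass (e.g. known disease genes)
    restart: probability of jumping back to the seeds at each step
    Returns the stationary visiting probability of every node.
    """
    nodes = list(adjacency)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for n in nodes:
            # Spread the non-restarting mass evenly over the node's neighbours.
            share = (1.0 - restart) * p[n] / len(adjacency[n])
            for m in adjacency[n]:
                nxt[m] += share
        converged = sum(abs(nxt[n] - p[n]) for n in nodes) < tol
        p = nxt
        if converged:
            break
    return p


# Hypothetical network: genes G1-G3 and lncRNAs L1-L2 (undirected edges).
NETWORK = {
    "G1": ["G2", "L1"],
    "G2": ["G1", "G3", "L1"],
    "G3": ["G2", "L2"],
    "L1": ["G1", "G2"],
    "L2": ["G3"],
}
```

Seeding with a known disease gene, `random_walk_with_restart(NETWORK, {"G1"})` scores `L1` (adjacent to `G1`) above `L2` (three hops away), so `L1` would be the stronger candidate lncRNA-disease association.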


“Pathogenic variants for Mendelian and complex traits in exomes of 6,517 European and African Americans: implications for the return of incidental results.” (Tabor et al, Am J Hum Genet)

Result: Risk alleles of potential utility for both Mendelian and complex traits are found in every individual, with implications for the return of results.

“Joint Analysis of Functional Genomic Data and Genome-wide Association Studies of 18 Human Traits.” (Pickrell, Am J Hum Genet)

Result: Assessed which (of 450) genetic/epigenetic features are most associated with GWAS hits.

Shout Outs for Variant Triage

25087612

24702953

“Adjusting for heritable covariates can bias effect estimates in genome-wide association studies.” (Aschard et al, Am J Hum Genet)

“Meta-analysis of Correlated Traits via Summary Statistics from GWASs with an Application in Hypertension.” (Zhu et al, Am J Hum Genet)

“Clinical phenotype-based gene prioritization: an initial study using semantic similarity and the human phenotype ontology” (Masino et al, BMC Bioinf)

Shout Outs for Variant Triage

25640676

25047600

25500260

“SNPsea: an algorithm to identify cell types, tissues and pathways affected by risk loci.” (Slowikowski et al, Bioinformatics)

Result: Tool for assessing SNP-related gene enrichment in cell types, tissues, pathways.

Shout Outs for Variant Triage

24813542

Cancer

“Cancer etiology. Variation in cancer risk among tissues can be explained by the number of stem cell divisions.” (Tomasetti & Vogelstein, Science)

• Goal: Assess the contribution of cell division frequency to cancer risk

• Method: Assess for each tissue of origin the expected number of cell divisions

• Result: Lifetime cancer risk in a tissue is strongly associated with the number of stem cell divisions needed for normal self-renewal. Less than 1/3 of the variation among tissues is attributable to inherited mutations or environment.

• Conclusion: Cancer is mostly due to bad luck.

25554788
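The "less than 1/3" figure follows from the strength of the correlation. Assuming the reported correlation of about 0.81 between lifetime stem cell divisions and lifetime cancer risk across tissues (on a log scale), the share of variation explained is roughly:

```latex
R^2 = r^2 \approx 0.81^2 \approx 0.66, \qquad 1 - R^2 \approx 0.34 \approx \tfrac{1}{3}
```

So about two-thirds of the variation in risk among tissues tracks stem cell divisions, leaving roughly one-third for inherited, environmental, and other factors.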

“Pan-cancer network analysis identifies combinations of rare somatic mutations across pathways and protein complexes.” (Leiserson et al, Nat Gen)

• Goal: Integrate many cancer data sets to find mutated subnetworks that recur

• Method: HotNet2 algorithm uses diffusion algorithm over protein-protein interaction network

• Result: 16 significantly mutated subnetworks mixing known and unknown pathways, including rare mutations.

• Conclusion: New diagnostic and therapeutic opportunities associated with accumulating mass of cancer genomic information.

25501392
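HotNet2 diffuses mutation "heat" over a protein-protein interaction network and then looks for groups of genes that exchange a lot of heat. A simplified sketch, assuming the insulated heat diffusion form F = β(I - (1 - β)W)^-1 with column-normalized adjacency W and exchanged heat E = F·diag(h): it links two genes only when each sends the other at least δ heat and reports connected components. That is a stricter stand-in for the strongly connected components used by the published method, and it omits the permutation test for significance. Gene names, heats, β, and δ are hypothetical.

```python
def diffusion_matrix(adjacency, beta=0.4, iters=500):
    """Approximate F = beta * (I - (1 - beta) * W)^-1 by fixed-point iteration.

    W is the column-normalised adjacency matrix. F[j][i] is the heat found at
    gene i when one unit of heat is placed on gene j.
    """
    genes = list(adjacency)
    F = {}
    for j in genes:
        p = {g: (1.0 if g == j else 0.0) for g in genes}
        for _ in range(iters):
            nxt = {g: (beta if g == j else 0.0) for g in genes}
            for g in genes:
                share = (1.0 - beta) * p[g] / len(adjacency[g])
                for h in adjacency[g]:
                    nxt[h] += share
            p = nxt
        F[j] = p
    return F


def hot_subnetworks(adjacency, heat, delta, beta=0.4):
    """Group genes that exchange at least delta heat with each other in both directions."""
    F = diffusion_matrix(adjacency, beta)
    genes = list(adjacency)

    def exchanged(i, j):
        # Heat sent from gene j (mutation score heat[j]) to gene i: E = F * diag(h).
        return F[j][i] * heat.get(j, 0.0)

    parent = {g: g for g in genes}  # union-find forest

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for i in genes:
        for j in genes:
            if i < j and exchanged(i, j) >= delta and exchanged(j, i) >= delta:
                parent[find(i)] = find(j)

    groups = {}
    for g in genes:
        groups.setdefault(find(g), set()).add(g)
    return [group for group in groups.values() if len(group) > 1]


# Hypothetical PPI network and mutation heat (e.g. mutation frequency per gene).
PPI = {
    "A1": ["A2", "A3"],
    "A2": ["A1", "A3"],
    "A3": ["A1", "A2", "X"],
    "X": ["A3", "B1"],
    "B1": ["X", "B2"],
    "B2": ["B1"],
}
HEAT = {"A1": 1.0, "A2": 1.0, "A3": 1.0}
```

With these inputs, `hot_subnetworks(PPI, HEAT, delta=0.01)` returns the single module `{A1, A2, A3}`; the unmutated genes X, B1, and B2 send no heat, so no pair involving them passes the two-way threshold.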


“Genetic basis for clinical response to CTLA-4 blockade in melanoma.” (Snyder et al, NEJM)

• Goal: Understand basis for differential response to therapeutic immunomodulatory antibodies in melanoma

• Method: Sequence genomes of responders and nonresponders and compare them.

• Result: Creation of certain mutated versions of host proteins on cell surface (neo-antigens) correlates with efficacy.

• Conclusion: Immune attack may be mediated by neo-antigens (potentially similar to antigens previously presented during infections).

25409260


“Predicting cancer-specific vulnerability via data-driven detection of synthetic lethality.” (Jerby-Arnon et al, Cell)

Result: Developed a pipeline to characterize synthetic lethal genes in cancer. Captures known partners and suggests new ones, particularly those that are gain-of-function and thus amenable to potential druggability.

Shout Outs for Cancer Genomics

25171417

Clinical Genomics

“Genomic surveillance elucidates Ebola virus origin and transmission during the 2014 outbreak.” (Gire et al, Science)

• Goal: Understand the viral genomes of the 2014 Ebola outbreak in West Africa

• Method: Sequence 99 Ebola genomes from 78 patients to 2000x coverage.

• Result: The West African variant diverged from Central African lineages around 2004 and crossed over in 2014, with no evidence of new introductions from the reservoir. Many mutations, with implications for therapeutic opportunities.

• Conclusion: Genomics can be applied very rapidly and effectively in public health emergencies.

25214632


“Clinical Interpretation and Implications of Whole-Genome Sequencing” (Dewey et al, JAMA)

• Goal: Examine how well current NGS data cover the clinically relevant portions of the genome.

• Method: 12 individual genomes deeply analyzed manually.

• Result: 10-20% of key variants not interrogated adequately; manual curation of ~100 variants per genome took a median of 54 minutes per variant; 2-6 disease-causing variants per subject.

• Conclusion: Accuracy is still an issue, and manual annotation is still necessary for the best genome interpretations.

24618965


“A probabilistic model to predict clinical phenotypic traits from genome sequencing.” (Chen et al, PLoS Comp Bio)

• Goal: Assess our ability to predict binary phenotypes from genome data.

• Method: Bayesian model based on Personal Genome Project data, applied to 146 phenotypes.

• Result: 16% of phenotypes robustly predictable, best performer in CAGI assessment.

• Conclusion: Although not diagnostic, we are starting to use genetics to adjust disease probabilities.

25188385
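Chen et al. conclude that genetics can adjust disease probabilities even when it is not diagnostic. The arithmetic behind that idea is Bayes' rule on the odds scale. The sketch below is a generic illustration, not the authors' model: it assumes each variant contributes an independent likelihood ratio (a naive-Bayes simplification), and the numbers are hypothetical.

```python
def posterior_probability(prior, likelihood_ratios):
    """Update a prior disease probability with per-variant likelihood ratios.

    Assumes the variants contribute independently (naive Bayes), so their
    likelihood ratios multiply on the odds scale.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    odds = prior / (1.0 - prior)  # probability -> odds
    for lr in likelihood_ratios:
        odds *= lr
    return odds / (1.0 + odds)  # odds -> probability
```

For example, a 10% prior with likelihood ratios of 3 and 2 gives posterior odds of (1/9) × 6 = 2/3, i.e. a 40% probability: a meaningful shift, but far from a diagnosis.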

“Personalized pharmacogenomics profiling using whole-genome sequencing.” (Mizzi et al, Pharmacogenomics)

Result: Analyzed 482 genomes and found 1012 novel pharmacogene variants. Conclusion: sequencing is necessary; genotyping is not sufficient.

“Missense variants in CFTR nucleotide-binding domains predict quantitative phenotypes associated with cystic fibrosis disease severity.” (Masica et al, Hum Mol Gen)

Result: Classifier can predict disease severity from genotypes into three prognostic classes.

Shout Outs for Genomic Applications

25141897

25489051

Drugs

“A community computational challenge to predict the activity of pairs of compounds.” (Bansal et al, Nat Biotech)

• Goal: Community assessment of ability to predict synergism/antagonism of cancer drugs.

• Method: Blinded prediction based on ‘omic response profiles for each individual drug

• Result: 4/32 methods better than random. Best algorithm assumed serial drug use, modeled “residual” contribution. Ensemble of methods = best.

• Conclusion: Extrapolation is harder than interpolation. Encouraging results on hard problem.

25419740
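In the Bansal et al. challenge, an ensemble of methods performed best. One simple way to build such an ensemble, assumed here, is to average each drug pair's rank across methods ("wisdom of crowds"); the challenge's actual aggregation may differ, and the pair names and rankings are hypothetical.

```python
def ensemble_ranking(predictions):
    """Combine several methods' rankings of the same items by mean rank.

    predictions: list of orderings (most to least synergistic), one per method,
    each containing the same items. Ties in mean rank are broken alphabetically.
    """
    items = predictions[0]
    mean_rank = {
        item: sum(order.index(item) + 1 for order in predictions) / len(predictions)
        for item in items
    }
    return sorted(items, key=lambda item: (mean_rank[item], item))


# Hypothetical rankings of four drug pairs from three methods.
METHODS = [
    ["AB", "AC", "BC", "AD"],
    ["AC", "AB", "AD", "BC"],
    ["AB", "BC", "AC", "AD"],
]
```

Here `ensemble_ranking(METHODS)` yields AB, AC, BC, AD (mean ranks of about 1.33, 2, 3, and 3.67), smoothing over the disagreements between individual methods.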

“Systems pharmacology augments drug safety surveillance.” (Lorberbaum et al, Clin Pharm & Ther)

• Goal: Improve pharmacovigilance with integration of systems biology, chemical genomics data

• Method: Modular assembly of drug safety subnetworks (MADSS) algorithm.

• Result: Improved ability to predict drug associations to side effects for MI, GI, Liver, Kidney systems.

• Conclusion: Systems biology network inference can assist in prediction and understanding of side effects.

25670520


“Validating drug repurposing signals using electronic health records: a case study of metformin associated with reduced cancer mortality.” (Xu et al, JAMIA)

Result: EHR analysis suggests metformin is associated with reduced cancer mortality.

“3D Pharmacophoric Similarity improves Multi Adverse Drug Event Identification in Pharmacovigilance.” (Vilar et al, Sci Rep)

Result: Structural similarity of drugs allows them to ‘borrow’ information for improved pharmacovigilance

Shout Outs for Drugs

25053577

25744369

Systems & Networks

“Human symptoms-disease network.” (Zhou et al, Nat Comms)

• Goal: Mine PubMed to build a symptom-based network of human diseases and relate it to underlying molecular interactions.

• Method: Combine disease-symptom network with disease-gene network to evaluate overlap of both.

• Result: Symptom-based similarity correlates with shared genetic structure. Diversity of symptoms correlates to disease genetic complexity.

• Conclusion: Similarity of diseases, symptoms, genetic architecture are all highly linked and can lead to useful hypotheses about diagnosis & treatment.

24967666
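Zhou et al. relate diseases through their shared symptoms as mined from PubMed. One common way to score that overlap, assumed here, is to weight disease-symptom co-occurrence counts by TF-IDF (so near-universal symptoms such as fever count for little) and compare diseases by cosine similarity. The counts below are hypothetical, and the paper's exact weighting may differ.

```python
import math


def tfidf_vectors(disease_symptoms):
    """disease_symptoms: disease -> {symptom: co-occurrence count}.

    Weight = count * log(N / number of diseases mentioning the symptom).
    """
    n = len(disease_symptoms)
    doc_freq = {}
    for counts in disease_symptoms.values():
        for s in counts:
            doc_freq[s] = doc_freq.get(s, 0) + 1
    return {
        d: {s: c * math.log(n / doc_freq[s]) for s, c in counts.items()}
        for d, counts in disease_symptoms.items()
    }


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(s, 0.0) for s, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


# Hypothetical literature co-occurrence counts.
COUNTS = {
    "disease_X": {"fever": 10, "rash": 5, "cough": 1},
    "disease_Y": {"fever": 8, "rash": 6},
    "disease_Z": {"fever": 3, "joint_pain": 9},
}
```

Because fever appears with every disease, its TF-IDF weight is zero, and disease_X ends up similar to disease_Y (shared rash) but not to disease_Z.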

“Obesity accelerates epigenetic aging of human liver.” (Horvath et al, PNAS)

• Goal: Understand the relationship between epigenetic aging and obesity.

• Method: Use novel epigenetic biomarker of aging (measure of DNA methylation) to associate BMI and ‘effective’ age.

• Result: Epigenetic age increases 3.3 yrs for each 10 BMI units. Not clearly reversible with weight loss. 279 genes under-expressed in old livers.

• Conclusion: Epigenetic changes associated with disease may be useful for understanding disease onset, natural history, and comorbidities.

25313081
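Assuming the reported slope of 3.3 years of epigenetic age per 10 BMI units holds linearly across the BMI range, the expected acceleration is:

```latex
\Delta\text{age}_{\text{epi}} \approx \frac{3.3\ \text{yr}}{10\ \text{BMI units}} \times \Delta\text{BMI} = 0.33 \times \Delta\text{BMI}\ \text{yr}
```

For example, a BMI of 40 versus 25 (ΔBMI = 15) corresponds to roughly 0.33 × 15 ≈ 5 years of additional epigenetic liver age.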

“A circadian gene expression atlas in mammals: implications for biology and medicine.” (Zhang et al, PNAS)

• Goal: Characterize the role of the circadian clock in mammalian gene expression.

• Method: Measure tissue-specific gene expression over 24 hours.

• Result: 43% of protein-coding genes show circadian expression variation, often tissue-specific. Noncoding RNAs may be involved in control. Most drugs target products of rhythmic genes.

• Conclusion: The clock may have important implications for biological variability and drug

25349387
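Zhang et al. classify genes as rhythmic from time-course expression. A standard way to quantify a 24-hour rhythm, swapped in here for illustration (the paper's own rhythm-detection statistics may differ), is a cosinor fit of mesor (mean level), amplitude, and acrophase (peak time). This sketch assumes evenly spaced samples spanning whole periods, e.g. every 2 hours over 48 hours, which lets the least-squares fit reduce to simple projections; it does not test significance.

```python
import math


def cosinor(times, values, period=24.0):
    """Fit y = M + A*cos(w t) + B*sin(w t) to evenly spaced samples.

    Assumes the samples are evenly spaced, span a whole number of periods, and
    include at least 3 points per period, so the cosine and sine terms are
    orthogonal and the least-squares coefficients are simple projections.
    Returns (mesor, amplitude, acrophase_in_hours).
    """
    n = len(values)
    w = 2 * math.pi / period
    mesor = sum(values) / n
    a = 2 / n * sum(y * math.cos(w * t) for t, y in zip(times, values))
    b = 2 / n * sum(y * math.sin(w * t) for t, y in zip(times, values))
    amplitude = math.hypot(a, b)
    acrophase = (math.atan2(b, a) / w) % period  # time of peak expression
    return mesor, amplitude, acrophase
```

For a synthetic gene sampled every 2 hours for 48 hours with values 5 + 2·cos(2π(t - 6)/24), the fit recovers a mesor of 5, an amplitude of 2, and a peak at hour 6.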

“Robust clinical outcome prediction based on Bayesian analysis of transcriptional profiles and prior causal networks” (Zarringhalam et al, Bioinformatics)

Result: Generate differential expression profile for individual patients, and infer specific regulation model.

“A multiscale statistical mechanical framework integrates biophysical and genomic data to assemble cancer networks.” (AlQuraishi et al, Nat Gen)

Result: Combine genomic, structural, biochemical data to infer detailed impact of mutations in proteins involved in cancer signaling networks.

“Cross-species regulatory network analysis identifies a synergistic interaction between FOXM1 and CENPF that drives prostate cancer malignancy.” (Aytes et al, Cancer Cell)

Result: Compare regulatory networks for mouse/human to find conserved master regulators promoting tumor growth.

22995991

25362484

Shout Outs for Systems & Networks

24823640

NLP Applications

“dRiskKB: a large-scale disease-disease risk relationship knowledge base constructed from biomedical text” (Xu et al, BMC Bioinformatics)

• Goal: Systematically (re)characterize phenotype relationships among diseases.

• Method: Use text mining to extract disease risk pairs, analyzed correlations with underlying genetics.

• Result: 34,448 unique pairs among 12,981 diseases.

• Conclusion:

24725842

“Integrated text mining and chemoinformatics analysis associates diet to health benefit at molecular level” (Jensen et al, PLoS Comp Bio)

• Goal: Assemble knowledge of food-phytochemical and food-disease associations.

• Method: Data mining of text, classification to assemble associations, including both positive/negative associations.

• Result: 20,654 phytochemicals associated to 1,592 human disease phenotypes

• Conclusion: Systematic approach to nutrition can be incorporated into precision medicine

24453957


“NCBI disease corpus: a resource for disease name recognition and concept normalization” (Dogan et al, J Biomed Inf)

Result: Fully and carefully annotated corpus of 793 papers.

“Literome: PubMed-scale genomic knowledge base in the cloud” (Poon et al, Bioinformatics)

“A literature search tool for intelligent extraction of disease-associated genes” (Jung et al, JAMIA)

Results: Publicly available gene-gene and gene-phenotype interactions mined from PubMed.

Shout Outs for NLP Applications

24393765

24939151

23999671

Odds & Ends

“Modeling 3D facial shape from DNA.” (Claes et al, PLoS Genet)

• Goal: Assess the impact of genetic variations on facial shape.

• Method: Parameterize “face space” and associate features with Ancestry Informative Markers

• Result: 20 genes show significant effects on facial features.

• Conclusion: These allow approximation of appearance based on SNPs in genome.

24651127


[Figure: SNP effects on facial shape. SLC35D1 SNPs (extreme effect); LRP6 SNPs; FGFR1 SNPs]

“Proteomics. Tissue-based map of the human proteome.” (Uhlen et al, Science)

• Goal: Survey human proteome variation in human tissues.

• Method: Quantitative transcriptomics + immunohistochemistry for localization in 32 tissues.

• Result: Detected > 90% of putative protein coding genes. Characterized secretome, membraneome, druggome.

• Conclusion: Major resource for integrative analysis of human biology.

25613900

“A field guide to genomics research.” (Bild et al, PLoS Biol)

• Goal: Characterize common pitfalls in genomics research

• Method: Reflection on personality types

• Result: 6 genome researcher phenotypes…

• Conclusion: You can figure out which phenotype you match, and you don’t need your SNPs…

24409093


1. Farmer: storehouse of data and tools, no design

2. Gold Miner: keeps digging until finding something significant

3. Cowboy: wrangles data without analyzing it properly

4. Hermit: always isolates themselves, no collaboration

5. Master (with Servant): unreasonable expectations about the time and complexity of appropriate analysis

6. Jailer: keeps own data locked up, never shares

“Temporal disease trajectories condensed from population-wide registry data covering 6.2 million patients.” (Jensen et al, Nat Comms)

“The top 100 papers.” (Van Noorden, Nature)

“Bibliometrics: Is your most cited work your best?” (Ioannidis et al, Nature)

“Humans can discriminate more than 1 trillion olfactory stimuli.” (Bushdid et al, Science)

“Fossilized nuclei and chromosomes reveal 180 million years of genomic stasis in royal ferns.” (Bomfleur, Science)

Shout Outs for Odds & Ends

24959948

25355343

24653037

25355346

24653035

2014 Crystal ball...

Emphasis on non-European-descent populations for discovery of disease associations

Crowd-based discovery in translational bioinformatics

Methods to recommend treatment for cancer based on genome/transcriptome

Increase in “trained systems” (a la Watson) applications in translational bioinformatics

Repurposing with combinations of drugs (vs. one)

More cost-effectiveness evidence for genomics

Linking essential genes, drug targets, and drug response

2015 Crystal ball...

Increase in “trained systems” (a la IBM’s Watson) applications in translational bioinformatics

Increased attention to gene × environment analyses

Mega-cohort studies start to report out findings

Immuno-informatics and systems immunology explode

Increased integration of EMR, genomics, imaging

The term “precision medicine” will be mentioned more frequently in PubMed abstracts.

IF invited, I will give this talk in person.