alexander bisignano, recombine // the path to personalized medicine
Alexander Bisignano Co-Founder & CEO, Recombine [email protected] @alxbz
2/17/15 Copyright Recombine 2015 1
Data Science & Genomics
The Path to Personalized Medicine
Historical Approaches to Generating & Analyzing Genetic Data
Genetic Data
A LITTLE BIT ABOUT THE HUMAN GENOME
GENETICS 101 § 23 Pairs of Chromosomes § 2 Sex Chromosomes (X/Y) § Chromosomes Made of Nucleic Acids § 4 Nucleic Acids in DNA: A, T, C, & G
§ Adenine § Cytosine § Guanine § Thymine
§ Chromosomes contain Genes § Genes are instructions for Proteins § ~3 Billion Bases
WHAT DO WE UNDERSTAND? GENETIC DISEASE
ANEUPLOIDY – CHROMOSOMAL ABNORMALITIES § Ex. Trisomy 21: Down Syndrome § Errors made during Meiosis – Formation of Sperm or Eggs § INCIDENCE: 1/500 Live Births
WHAT DO WE UNDERSTAND? GENETIC DISEASE
SINGLE GENE DISORDERS – INHERITED DISEASES § Ex. Cystic Fibrosis: Broken CFTR Gene § Carriers of 1 Broken Copy are not affected; 2 Broken Copies are affected § INCIDENCE: 1/300 Live Births
Decreasing Costs & Increasing Information
The Genomic Revolution
DECREASING COSTS LEAD TO EXPONENTIALLY MORE DATA
[Chart: data points per sample by technology, log scale from 1 to 6.104E+09]
FISH: 15 Minutes
aCGH: 3.47 Days
SNPs: 208.33 Days
EXOME: 117.96 Years
WGS: 5.85 Millennia
*If each data point takes ~1 minute to analyze, how long will a single sample take?
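The arithmetic behind the chart can be sketched in a few lines of Python. The data-point counts below are not given on the slide; they are assumptions back-calculated from the durations shown (at ~1 minute per data point, with a 365-day year), and `analysis_time` is a hypothetical helper.

```python
# Sketch: how long does one sample take at ~1 minute per data point?
MINUTES_PER_HOUR = 60
MINUTES_PER_DAY = 60 * 24
MINUTES_PER_YEAR = MINUTES_PER_DAY * 365  # assumes a 365-day year


def analysis_time(data_points, minutes_per_point=1.0):
    """Return (value, unit) for the total analysis time of one sample."""
    minutes = data_points * minutes_per_point
    if minutes < MINUTES_PER_HOUR:
        return minutes, "minutes"
    if minutes < MINUTES_PER_DAY:
        return minutes / MINUTES_PER_HOUR, "hours"
    if minutes < MINUTES_PER_YEAR:
        return minutes / MINUTES_PER_DAY, "days"
    years = minutes / MINUTES_PER_YEAR
    if years < 1000:
        return years, "years"
    return years / 1000, "millennia"


# Assumed data-point counts per technology (back-calculated, not from the slide).
ASSUMED_DATA_POINTS = {
    "FISH": 15,
    "aCGH": 5_000,
    "SNPs": 300_000,
    "Exome": 62_000_000,
    "WGS": 3_075_000_000,
}

for technology, points in ASSUMED_DATA_POINTS.items():
    value, unit = analysis_time(points)
    print(f"{technology}: {value:.2f} {unit}")
```

With these assumed counts the printed durations line up with the chart, which is the point of the slide: each step up in technology multiplies the manual workload by orders of magnitude.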
DATA CHALLENGES IN GENOMICS DEALING WITH ALL THE DATA
DATA STORAGE § While great database systems exist, standard data storage remains a problem § ‘Laboratory’ diagnostic companies do not employ CS or Data Engineers
DATA INTEGRITY § There are >20 major databases with Genomic Annotation § Database differences are ubiquitous (mutation names, conventions, etc.)
DATA SHARING IS STILL CHALLENGING § Many Universities & Companies concerned over ‘forfeiting IP’ § Platforms for sharing are still very ‘young’ § Data quality varies enormously
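As one illustration of the naming differences mentioned under DATA INTEGRITY, the same protein-level change can be written with one-letter or three-letter amino acid codes, and a stop codon can appear as `X`, `*`, or `Ter`. The sketch below normalizes simple substitution names to one spelling before records from different databases are compared. The function names and the regular expression are assumptions for illustration, and they cover only simple substitutions, not the full HGVS nomenclature.

```python
import re

# One-letter to three-letter amino acid codes; "X" and "*" both denote a stop here.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "X": "Ter", "*": "Ter",
}

# Matches names such as "p.Y231X", "Y231*" or "p.Tyr231Ter".
_SUBSTITUTION = re.compile(r"^(?:p\.)?([A-Za-z*]+)(\d+)([A-Za-z*]+)$")


def _to_three_letter(token):
    """Convert a one-letter or three-letter residue token to three-letter form."""
    if len(token) == 1 and token.upper() in THREE_LETTER:
        return THREE_LETTER[token.upper()]
    candidate = token.capitalize()
    if candidate in THREE_LETTER.values():
        return candidate
    raise ValueError(f"Unrecognized residue: {token!r}")


def normalize_protein_change(name):
    """Return a simple substitution in 'p.<Ref><pos><Alt>' three-letter form."""
    match = _SUBSTITUTION.match(name.strip())
    if match is None:
        raise ValueError(f"Not a simple substitution: {name!r}")
    ref, position, alt = match.groups()
    return f"p.{_to_three_letter(ref)}{int(position)}{_to_three_letter(alt)}"


for alias in ("p.Y231X", "Y231*", "p.Tyr231Ter"):
    print(alias, "->", normalize_protein_change(alias))
```

All three spellings collapse to the same key, so two databases that disagree only on convention no longer look like they describe different mutations.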
How do we model, predict & clinically act upon complex disease?
Complex Genetic Diseases
MODELING COMPLEX DISEASE WILL THE SINGLE-GENE PARADIGM HOLD?
HOW DO MANY GENES INTERACT? § Complex Signal-Transduction pathways § How do they affect a single outcome? § Can be influenced by environmental factors § Non-100% inheritance pattern
COMPLEX GENETIC TRAITS ANALYSIS NO LONGER POSSIBLE BY HUMANS ALONE
Gene    Variants    Progressive Combinations
FSH     6           6
PKA     4           24
GAB2    5           120
R2C2    7           840
IRS1    4           3360
AKT     3           10080
The FSH signal transduction pathway in granulosa cells leads to follicle recruitment.
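The "progressive combinations" column is a running product of the variant counts, which a short sketch makes explicit. The variant counts are taken from the table above; the variable names are illustrative.

```python
from itertools import accumulate
from operator import mul

# Variant counts per gene, in pathway order, as listed in the table.
VARIANTS_PER_GENE = {
    "FSH": 6,
    "PKA": 4,
    "GAB2": 5,
    "R2C2": 7,
    "IRS1": 4,
    "AKT": 3,
}

# Running product: each added gene multiplies the number of possible combinations.
progressive = list(accumulate(VARIANTS_PER_GENE.values(), mul))

for gene, count, total in zip(VARIANTS_PER_GENE, VARIANTS_PER_GENE.values(), progressive):
    print(f"{gene:<5} {count:>2} {total:>6}")
```

Six genes with only a handful of variants each already yield 10,080 combinations, which is why the slide argues this analysis is no longer possible by humans alone.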
SAMPLE APPLICATIONS § Predicting patient response to drugs § Predicting patient disease § Forecasting treatment success rate § Predicting optimal treatment pathway
COMPLEX GENETIC TRAITS COMMUNICATING OUTCOME & MAKING MEDICAL DECISIONS
AVERAGE POPULATION RISK VS. PERSONAL RISK § Ex. Breast Cancer: Average v. BRCA1 Mutation Carrier
AVERAGE PATIENT 12.5% LIFETIME RISK
BRCA1 CARRIER 60% LIFETIME RISK
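To make the gap between population risk and personal risk concrete, the two figures can be compared as a ratio and as an absolute difference. This is a minimal sketch using only the two percentages quoted on the slide; the variable names are illustrative.

```python
# Lifetime breast cancer risk figures as quoted on the slide (percent).
AVERAGE_RISK_PCT = 12.5
BRCA1_CARRIER_RISK_PCT = 60.0

# How many times higher is the carrier's risk than the population average?
relative_risk = BRCA1_CARRIER_RISK_PCT / AVERAGE_RISK_PCT

# How many percentage points are added on top of the population average?
absolute_increase_pct = BRCA1_CARRIER_RISK_PCT - AVERAGE_RISK_PCT

print(f"Relative risk: {relative_risk:.1f}x")
print(f"Absolute increase: {absolute_increase_pct:.1f} percentage points")
```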
STEPS TOWARDS PERSONALIZATION OUTLIERS STILL INFORM THE MEAN
Step 1: Identify & Predict Extreme Cases
• Ex. Single-Gene, Population Outliers
Step 2: Segment the Slightly More Complex
• Ex. Multi-Gene, Population Segments
Step 3: Predict & Personalize
• Ex. Multi-Gene, Personal Predictions
[Chart: y-axis "Number of Individuals"]
An Unexpected Challenge due to the Unknown Unknowns of Genomics
Automation of Data Analysis
TEST 1000s OF MUTATIONS AT ONCE DNA MICROARRAYS
…TAACTGCTATTTTCGTACCA…
Hybridized Genomic DNA
ATTGACGATAAAAGCATGG_
Synthetic Probe
T
Single Base Extension
Coupled Light Reaction
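The diagram can be read as a small base-pairing exercise: the probe pairs with the genomic DNA base by base, and the single base that gets added is the complement of the next genomic base. The sketch below uses the two sequences from the slide. It is a simplification that aligns the strands position by position as drawn and ignores 5'/3' orientation and hybridization chemistry; the function names are illustrative.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

GENOMIC = "TAACTGCTATTTTCGTACCA"  # hybridized genomic DNA from the slide
PROBE = "ATTGACGATAAAAGCATGG"     # synthetic probe from the slide, without the "_"


def complement(sequence):
    """Return the base-by-base complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in sequence)


def single_base_extension(template, probe):
    """Return the base added to the probe, or None if the probe does not pair."""
    if len(probe) >= len(template):
        return None  # no template base left to extend against
    if complement(template[:len(probe)]) != probe:
        return None  # probe does not pair with the template
    return COMPLEMENT[template[len(probe)]]


print(single_base_extension(GENOMIC, PROBE))
```

The extended base is what the coupled light reaction reports, so the color of the signal identifies the genomic base at the position just past the probe.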
CAN DIFFERENTIATE 2 NUCLEIC ACIDS SAMPLE MUTATION CLUSTER
[Chart: Number of Individuals vs. Wavelength of Light, with genotype clusters AA, AC, CC]
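One way to picture how three clusters come out of a two-allele assay is a simple threshold rule on the share of signal attributed to each allele. This is a minimal sketch, assuming each allele is read out as its own intensity value; the thresholds and the function name are hypothetical and are not taken from the talk.

```python
def call_genotype(signal_a, signal_c, low=0.33, high=0.67):
    """Call AA, AC or CC from two channel intensities; thresholds are hypothetical."""
    total = signal_a + signal_c
    if total <= 0:
        return "no call"  # nothing was measured for this sample
    fraction_a = signal_a / total
    if fraction_a >= high:
        return "AA"
    if fraction_a <= low:
        return "CC"
    return "AC"


# Made-up intensity pairs, one per expected cluster.
for sample in [(900, 50), (480, 520), (30, 950)]:
    print(sample, "->", call_genotype(*sample))
```

A rule like this can only ever return one of three genotype labels, so it has no way to report an allele that the assay design did not anticipate.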
ONLY 2 LIGHT CHANNELS DNA MICROARRAYS
2 LIGHT CHANNELS, 4 NUCLEIC ACIDS. HUH? § With only 2 light channels how do we differentiate all 4 nucleic acids?
§ Paradigm 1: § Only test for highly conserved, bi-allelic Single Nucleotide Polymorphisms (SNPs) (bi-allelic SNPs are thought to differ only between two of the four nucleic acids)
§ i.e. There is an ASSUMPTION that targeted mutations only differ between 2 of the 4 nucleic acids
But…
CANAVAN DISEASE MUTATION REAL LIFE EXAMPLE
A single mutation breaks the ASPA Gene § Mutation known as p.Y231X (c.693C>A) § Causes a premature Termination Codon (aka STOP)
§ Assumption of allele frequencies based on known literature: C (>99%), A (<1%)
§ Reality of allele frequencies in CERTAIN POPULATIONS: C (~72%), A (<1%), G (~27%)
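To get a feel for how many samples an unplanned third allele could touch, the allele frequencies can be turned into expected genotype frequencies. The sketch below assumes Hardy-Weinberg proportions, which the talk does not state, and uses point values (0.99/0.01 and 0.72/0.01/0.27) as stand-ins for the slide's approximate figures.

```python
from itertools import combinations_with_replacement


def genotype_frequencies(allele_freqs):
    """Expected genotype frequencies under assumed Hardy-Weinberg proportions."""
    frequencies = {}
    for first, second in combinations_with_replacement(sorted(allele_freqs), 2):
        p, q = allele_freqs[first], allele_freqs[second]
        # Homozygotes occur at p^2, heterozygotes at 2pq.
        frequencies[first + second] = p * p if first == second else 2 * p * q
    return frequencies


# Point values standing in for the slide's ">99%", "<1%", "~72%", "~27%".
ASSUMED = {"C": 0.99, "A": 0.01}
OBSERVED = {"C": 0.72, "A": 0.01, "G": 0.27}

assumed_genotypes = genotype_frequencies(ASSUMED)
observed_genotypes = genotype_frequencies(OBSERVED)

# Share of individuals carrying at least one copy of the unplanned G allele.
carries_g = sum(freq for genotype, freq in observed_genotypes.items() if "G" in genotype)

print(f"Genotypes expected by the assay design: {sorted(assumed_genotypes)}")
print(f"Genotypes possible in the population:   {sorted(observed_genotypes)}")
print(f"Individuals with at least one G: {carries_g:.1%}")
```

Under these assumptions close to half of the individuals in such a population would carry an allele the two-channel design was never meant to see.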
WHAT ABOUT POLYMORPHISMS? DNA MICROARRAYS
[Chart: Number of Individuals vs. Wavelength of Light, with genotype clusters AA, AG, AC, GG, CC]
OVERLAP PRIMERS ACCOUNTING FOR POLYMORPHISMS
ATCGCCTATAGACCCA_
5’ ATCGTAGCGGATATCTGGGTCAAATCATGATCAAGGATA 3’
3’ TAGCATCGCCTATAGACCCAGTTTAGTACTAGTTCCTAT 5’
_CAAATCATGATCAAGG
_TAAATCATGATCAAGG
SOME PROBES ARE DESIGNED TO FAIL § If an unplanned polymorphism exists, certain primers of ours simply will not show any result.
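The "designed to fail" idea can be sketched as an exact-match check: a probe that overlaps the variable position reports a result only when every base under it matches. The sequences come from the slide; treating the probe as a literal substring of the top strand, requiring a perfect match, and the C>T change used as the unplanned polymorphism are all simplifying assumptions based on one reading of the diagram.

```python
# Top strand from the slide, written 5' to 3'.
TOP_STRAND = "ATCGTAGCGGATATCTGGGTCAAATCATGATCAAGGATA"
OVERLAP_PROBE = "CAAATCATGATCAAGG"   # probe from the slide, without the "_"
VARIANT_PROBE = "TAAATCATGATCAAGG"   # second probe from the slide, without the "_"

# 0-based position where the overlap probe aligns on the top strand.
PROBE_START = TOP_STRAND.find(OVERLAP_PROBE)


def probe_gives_result(strand, probe, start):
    """A probe reports a result only if every base matches at its position."""
    return strand[start:start + len(probe)] == probe


# Hypothetical unplanned polymorphism: C>T at the first base under the probe.
variant_strand = TOP_STRAND[:PROBE_START] + "T" + TOP_STRAND[PROBE_START + 1:]

print(probe_gives_result(TOP_STRAND, OVERLAP_PROBE, PROBE_START))
print(probe_gives_result(variant_strand, OVERLAP_PROBE, PROBE_START))
print(probe_gives_result(variant_strand, VARIANT_PROBE, PROBE_START))
```

A missing signal from the overlap probe is then itself informative: it flags that the sample differs from the reference at a position the assay assumed was fixed.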
PLAN FOR ALL POSSIBILITIES ASSUMPTIONS BASED ON INCOMPLETE DATA
Takeaways § Assumptions about the genome are often being disproved § Even great genomic technologies need sanity checking
855-OUR-GENES – [email protected] – www.recombine.com