Towards the Precision Medicine Era: Computational challenges
Ron Shamir, CS, TAU
Fall 2016 seminar
http://www.cs.tau.ac.il/~rshamir/seminar/16/precmedsem16.html
Lecture 1 Outline
• A little bit of biology
• Gene expression
• Protein-protein networks
• Protein-DNA networks
• Functional enrichment
• About the seminar
• Your opportunity to ask lots of questions!!!
Gregor Mendel: laws of inheritance, the “gene” (1866)
Watson and Crick: discovery of the DNA double-helix structure (1953)
Human Genome Project: completed (2003)
DNA and Chromosomes
• DNA: a molecule built from 4 bases: A, C, G, T
• Chromosome: a contiguous stretch of DNA
• Genome: the totality of DNA material
Genes: Recipes for Proteins
• Gene: a DNA segment that specifies the sequence of a protein.
• RNA: a copy of the DNA of a gene; the “manufacturer's instructions” for a protein.
http://morgan.rutgers.edu/MorganWebFrames/Level1/Page1/p1.html
DNA → (transcription) → RNA → (translation) → protein
Analogy: DNA is the hard disk, RNA is one program, and the protein is its output.
© Ron Shamir
The busy chef
• The profile of the cell: which genes are expressed as mRNAs, and in what quantities.
DNA: 20,000 recipes → RNA: cooking 10,000 dishes → protein: 10,000 dishes, in different quantities
Gregor Mendel: laws of inheritance, the “gene” (1866)
Watson and Crick: discovery of the DNA double-helix structure (1953)
Human Genome Project: completed (2003)
One of many computational challenges in the Human Genome project:
Assemble a puzzle of 27 million pieces
Complexity
• ~3,000,000,000 letters in the genome
• 2,278,100 letters in the Bible
• => one genome = a stack of ~1,000 Bibles
• ~20,000 genes in the genome
– Hard to identify
– Harder to figure out their function
– Even harder to figure out how they work together
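The size comparison above is a simple ratio, and a quick check shows that the "~1,000 Bibles" figure is a round-number estimate. The sketch below uses only the two letter counts quoted on the slide; the function and constant names are illustrative.

```python
# Letter counts quoted on the slide.
GENOME_LETTERS = 3_000_000_000  # approximate size of the human genome
BIBLE_LETTERS = 2_278_100       # letters in the Bible


def bibles_per_genome(genome_letters: int = GENOME_LETTERS,
                      bible_letters: int = BIBLE_LETTERS) -> float:
    """Return how many Bible-length texts fit into one genome."""
    return genome_letters / bible_letters


ratio = bibles_per_genome()
print(f"One genome is roughly {ratio:.0f} Bibles long")
```

With these numbers the ratio comes out at a bit over 1,300, which is the same order of magnitude as the slide's estimate.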
Enter Bioinformatics
• The marriage of CS and Biology
• Responds to the explosion of biological data, and builds on the IT revolution
September 15, 2016: 220,731,315,250 bases
Biology is becoming an information science
Now that we know the human genome sequence, what’s next?
• Find out the function of genes/proteins
• Understand gene regulation
• Figure out how genes and proteins interact: gene networks, development, …
• Understand human DNA variations
• Figure out the medical implications of all the above
• Research driven by new genome-wide high-throughput technologies
• Key computational challenge: integration
DNA chips / Microarrays
• Simultaneous measurement of expression levels of all genes.
• Perform 10^5 to 10^6 measurements in one experiment.
• Allow a global view of cellular processes.
• Expression is now measured primarily by deep sequencing (NGS): up to 10^10 bases in one experiment.
The Raw Data
[Figure: the “Raw Data” matrix of expression levels; rows are genes, columns are experiments]
Entries of the Raw Data matrix: ratios / absolute values / …
• Rows: expression pattern for each gene
• Columns: profile for each experiment / condition / sample / chip
Needs normalization!
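The slide does not say which normalization is meant. As a minimal sketch of one simple and commonly used option (an assumption, not necessarily the method used in the lecture), the code below log2-transforms a genes-by-experiments matrix and then subtracts each experiment's median, so that experiments measured at different overall intensities become comparable. The function name and the toy data are illustrative.

```python
import math
from statistics import median


def log2_median_center(matrix):
    """Log2-transform a genes x experiments matrix and center each column.

    matrix: list of rows (one per gene), each a list of positive values
    (one per experiment). Returns a new matrix of the same shape in which
    every experiment (column) has median 0 on the log2 scale.
    """
    logged = [[math.log2(v) for v in row] for row in matrix]
    n_cols = len(logged[0])
    # Median of each experiment (column) across all genes.
    col_medians = [median(row[j] for row in logged) for j in range(n_cols)]
    return [[row[j] - col_medians[j] for j in range(n_cols)] for row in logged]


# Toy data: the second experiment is 4x brighter overall than the first.
raw = [
    [100.0, 400.0],
    [200.0, 800.0],
    [400.0, 1600.0],
]
normalized = log2_median_center(raw)
for row in normalized:
    print([round(v, 3) for v in row])
```

After centering, the two toy experiments show the same profile, because they differed only by a global scale factor. Real data usually needs more careful treatment, for example quantile normalization or platform-specific methods.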
GEO (Gene Expression Omnibus)
• Nearly 2 million expression profiles
• All publicly available, well organized
• A vast, underutilized resource
Protein interaction networks
Protein-protein interactions (PPIs)
• Low throughput measurements: accurate, scarce
• High throughput: more abundant, noisy
• Large, readily available resource
Regulation of Transcription
• A gene’s transcription regulation is mainly encoded in the DNA, in a region called the promoter
• Each promoter contains several short DNA subsequences, called binding sites (BSs), that are bound by specific proteins called transcription factors (TFs)
[Figure: a gene drawn 5’ to 3’, with two TFs bound to two BSs]
Regulation of Transcription (II)
• By binding to a gene’s promoter, TFs promote or repress the recruitment of the transcription machinery
• The conditions that govern a gene’s transcription are determined by the specific combination of BSs in its promoter
[Figure: Gene 1 and Gene 2]
Modeling TF binding sites: Position Weight Matrix (PWM)
Position   1     2     3     4     5     6
A          0     0.2   0.7   0     0.8   0.1
C          0.6   0.4   0.1   0.5   0.1   0
G          0.1   0.4   0.1   0.5   0     0
T          0.3   0     0.1   0     0.1   0.9

ATGCAGGATACACCGATCGGTA   0.0605
GGAGTAGAGCAAGTCCCGTGA   0.0605
AAGACTCTACAATTATGGCGT   0.0151

Score: product of base probabilities. A score threshold must be set for calling hits.
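The scoring rule on this slide can be sketched in a few lines. The matrix values below are the ones shown on the slide; the function names, the forward-strand-only scan, and the example sequence and threshold are illustrative assumptions. The best possible site under this matrix (for example CGACAT) scores 0.6 · 0.4 · 0.7 · 0.5 · 0.8 · 0.9 ≈ 0.0605, which agrees with the top score listed on the slide.

```python
# PWM from the slide: probability of each base at each of the 6 positions.
PWM = {
    "A": [0.0, 0.2, 0.7, 0.0, 0.8, 0.1],
    "C": [0.6, 0.4, 0.1, 0.5, 0.1, 0.0],
    "G": [0.1, 0.4, 0.1, 0.5, 0.0, 0.0],
    "T": [0.3, 0.0, 0.1, 0.0, 0.1, 0.9],
}
WIDTH = 6  # number of positions (columns) in the matrix


def pwm_score(site):
    """Score a candidate site of length WIDTH: product of base probabilities."""
    score = 1.0
    for pos, base in enumerate(site):
        score *= PWM[base][pos]
    return score


def scan(sequence, threshold):
    """Slide the PWM along the forward strand and report hits.

    Returns a list of (start, site, score) for windows scoring >= threshold.
    """
    hits = []
    for start in range(len(sequence) - WIDTH + 1):
        site = sequence[start:start + WIDTH]
        score = pwm_score(site)
        if score >= threshold:
            hits.append((start, site, score))
    return hits


best = pwm_score("CGACAT")
hits = scan("TTCGACATTT", threshold=0.05)  # hypothetical sequence and cutoff
print(round(best, 4), hits)
```

A real scan would typically also check the reverse-complement strand, and practical tools often score with log-odds against a background model rather than raw probability products, so that a single zero entry does not rule out a site.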
Protein-DNA interactions
• Can be predicted using PWMs (look for hits in the promoters)
• Can be measured experimentally (ChIP-chip, ChIP-seq, PBM,…)
• The result in all cases: for each TF – a list of gene targets
• Presentable as a network
• We often combine the PPI and the PDI networks
Goal
• Challenge: detect active functional modules, i.e. connected subnetworks of proteins whose genes are co-expressed
• “Where is the action in the network in a particular experiment?”
What is the Gene Ontology?
• Set of biological phrases (terms) which are applied to genes:
– protein kinase
– apoptosis
– membrane
24th Feb 2006 Jane Lomax
What is the Gene Ontology?
• Genes are linked, or associated, with GO terms by trained curators at genome databases
– known as ‘gene associations’ or GO annotations
• Allows biologists to make inferences across large numbers of genes without researching each one individually
GO structure
gene A
[Figure from Clark et al., 2005: GO terms connected by part_of and is_a relations]
Reminder: Hypergeometric score
• Urn with N balls, of which m are red.
• Draw n balls at random without replacement.
• X = number of red balls drawn.

$$P(X = k) = \frac{\binom{m}{k}\binom{N-m}{n-k}}{\binom{N}{n}}$$

$$HG(N, m, n, k) = \sum_{k' \ge k} P(X = k')$$

HG is the p-value for seeing at least k red balls in a random draw; a small value indicates enrichment.
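The hypergeometric tail HG(N, m, n, k) from this slide can be computed directly from binomial coefficients. The sketch below uses only the Python standard library; the function names are illustrative, and the toy numbers in the usage example are made up for checking by hand.

```python
from math import comb


def hypergeom_pmf(N, m, n, k):
    """P(X = k): exactly k red balls when drawing n of N balls, m of them red."""
    return comb(m, k) * comb(N - m, n - k) / comb(N, n)


def hg_score(N, m, n, k):
    """HG(N, m, n, k) = P(X >= k), the upper tail of the hypergeometric."""
    upper = min(m, n)  # X cannot exceed the number of red balls or of draws
    return sum(hypergeom_pmf(N, m, n, j) for j in range(k, upper + 1))


# Toy check: N=10 balls, m=4 red, draw n=3, ask for at least k=2 red.
# By hand: (C(4,2)*C(6,1) + C(4,3)*C(6,0)) / C(10,3) = 40/120 = 1/3.
p = hg_score(10, 4, 3, 2)
print(p)
```

In the enrichment setting, N would be the number of genes in the background, m the genes annotated with the function, n the size of the gene set, and k the number of annotated genes in the set.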
GO Enrichment
• I have a set of genes/proteins. Is it enriched for a particular function?
• One function: use the hypergeometric p-value
• Testing all functions: use HG, but correct for multiple testing (Bonferroni/FDR)
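The two corrections named above can be sketched as follows. The slide says only "Bonferroni/FDR"; the Benjamini-Hochberg procedure is used here as the FDR method, which is an assumption about what is intended. Function names and the example p-values are illustrative.

```python
def bonferroni(pvalues):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    t = len(pvalues)
    return [min(1.0, p * t) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (controls the false discovery rate).

    The i-th smallest p-value is scaled by t / i, then monotonicity is
    enforced by taking a running minimum from the largest rank down.
    """
    t = len(pvalues)
    order = sorted(range(t), key=lambda i: pvalues[i])
    adjusted = [0.0] * t
    running_min = 1.0
    for rank in range(t, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * t / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical enrichment p-values for four GO terms.
pvals = [0.001, 0.01, 0.03, 0.5]
bonf = bonferroni(pvals)
bh = benjamini_hochberg(pvals)
print(bonf)
print(bh)
```

Bonferroni is the stricter of the two; with thousands of GO terms tested at once, the FDR approach usually retains more terms at the same nominal level.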
The seminar
Guidelines
• You will need to dig deeply for the methods: supplements (on journal websites), previous papers, ...
• See the seminar website for resources
• (Re)start with the basics: definitions, examples
• Papers contain more than you can cover: select your presentation focus wisely
Guidelines (2)
• Provide intuition and examples to motivate your method
• Add something original that you thought of (and don’t hide that!)
• Focus more on the algorithms than on the results (rule of thumb: 60-40)
Planning your presentation
• Start: 3:10, break: 4:00-4:10, talk ends: 4:40, followed by 5 min for questions, then open discussion
• Use mostly slides, and the board sparingly
• Rehearse your talk!
• Make contingencies in case you run out of time
• At the end, summarize the paper, repeating the main results. Discuss strengths, weaknesses, steps ahead.
The questionnaire
• Prepare a short (4-5 item) questionnaire on the paper
• The level should be basic, but require reading the paper
• Distribute it to students after the seminar
• Students will bring in their answers next week, and you will grade them.
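Determining the final grade:
• Understanding the material: 35%
• Presenting the material: 35%
• Good choice of which material to present: 10%
• Active participation in the seminar (discussions and questionnaires): 20%
• Bonus for originality: 10%
• Exceeding the allotted time: -10% !!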
Lecture 2 - Outline
• Precision medicine
• One story
• Your opportunity to ask lots of questions!!!
Precision medicine
Precision medicine
• Precisely tailoring therapies to subcategories of disease, often defined by genomics
• Unlike “personalized medicine”, avoids the (mis)interpretation of per-patient drug development
• Medicine has always been personalized – the difference is new biomedical technologies
Euan A. Ashley, The Precision Medicine Initiative: A New National Effort, JAMA. 2015;313(21):2119-2120. doi:10.1001/jama.2015.3595.
Problems with current medicine
• Even for successful drugs, the effect may be achieved in only a minority of the cohort
• High NNT: the average number of patients that need to be treated to help one patient (often >10 for drugs; >50 for prevention)
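NNT is conventionally computed as the reciprocal of the absolute risk reduction, i.e. the difference between the event rate without treatment and the event rate with treatment. The sketch below illustrates that standard definition; the event rates in the example are hypothetical and are not taken from the lecture.

```python
def number_needed_to_treat(control_event_rate, treated_event_rate):
    """NNT = 1 / absolute risk reduction (ARR).

    Both rates are fractions in [0, 1]: the probability of the bad outcome
    without treatment and with treatment, respectively.
    """
    arr = control_event_rate - treated_event_rate
    if arr <= 0:
        raise ValueError("treatment shows no absolute risk reduction")
    return 1.0 / arr


# Hypothetical prevention example: the event rate drops from 10% to 8%.
nnt = number_needed_to_treat(0.10, 0.08)
print(f"NNT is about {nnt:.0f}")
```

An absolute reduction of two percentage points already implies treating about 50 people to help one, which is the scale the slide quotes for prevention.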
PM and Genetic disease
• Cystic Fibrosis: mutated chloride channel. The drug Ivacaftor helps in case the channel reaches the cell surface. The subclass of patients that can benefit from it was identified by a mutation.
• Six mutation-dependent categories identified
Euan A. Ashley, Towards Precision Medicine, Nat Rev Genetics 16
PM and Genetic disease (2)
• Precision oncology: identifying and targeting diseased pathways expressed in a tumor may help more than histology. “A better microscope”
• A study suggested that in 96% of undiagnosed primary tumors a genomic alteration could be identified, and that in 85% of cases it is potentially treatable by a known drug.
PM and Genetic disease (3)
• Clopidogrel: highly successful for heart attack prevention during surgery, but prescribing required prior testing for mutations in CYP2C19
• Prevention! Screening high-risk families for relevant mutations can be cost-effective and life-saving
Examples of precision medicine
Pharmacogenomics
• Avoid the “one size fits all” in drug prescription
• The use of genomic information to individualize drug prescribing
• Pharmacology + Genomics
• Analyze how the genetic makeup of a person affects his/her drug response
• Develop effective, safe medications and doses that will be tailored to a person's genetic makeup
• Most genetic tests are now done after diagnosis and delay prescription – in the future: preemptive testing
• CPIC maintains a list of gene variants and actionable drugs
The inevitable conclusion
• To improve medicine and make it more precise and personal, we need to know the genome sequence of the individual and his/her medical history.
• To make use of such information we first need to collect such data on many patients and analyze it seriously
• The time is ripe to do it!
Projects around the world
• US Precision Medicine Initiative: assembly of a 1M cohort of individuals willing to share their electronic medical record data and genomic data.
– 1st generation: data from genotyping chips containing 1-2 million SNPs, or enhanced exome sequencing.
– 2nd generation: genome sequencing
Projects around the world (2)
• United Kingdom 100,000 Genomes Project
Projects around the world (3)
• The United Kingdom and Denmark already have large-scale biobanks.
• US Million Veteran Program reports recruitment currently at more than 300 000 individuals, with thousands having been sequenced and hundreds of thousands having been genotyped.
• USA eMERGE consortium combines electronic medical record data and genomic data from almost 200 000 individuals.
• USA Global Alliance for Genomics and Health aims for the establishment of a common framework of harmonized approaches for effective and responsible sharing of genomic and clinical data.
• National Human Genome Research Institute created the Electronic Medical Records and Genomics Network, which now includes 10 EHR-based DNA repositories and >350 000 subjects
Projects around the world (4)
• 23andMe collected data from ~1 million individuals willing to contribute their time and DNA to research.
• Regeneron partnered with the Geisinger Health System to connect the exome sequence with EMR data from hundreds of thousands of patients.
• Kaiser Permanente Northern California Research Program on Genes, Environment and Health biobank included ~200K consented subjects with saliva or blood samples linked to comprehensive longitudinal EHR data and self-reported demographic and behavioral information. A subset of 110K+ of these individuals have genome-wide genotype and telomere length data available, forming the Genetic Epidemiology Research on Adult Health and Aging cohort (2014 numbers)
Electronic Health Records (EHRs)
• Created and maintained by HMOs, hospitals and clinical practice environments.
• The EHR is a mix of structured and narrative text data.
• Structured data: billing codes, laboratory tests, medication prescriptions, and certain standardized document elements (e.g., height, weight, vital signs, problem lists).
• EHR billing codes:
– Diagnosis-related groups to categorize hospitalizations
– International Classification of Disease (ICD) codes to describe diagnoses and morbidities
– Current Procedural Terminology codes to describe procedures
• Narrative or text data: provider notes, especially those portions entered as “free” or unstructured text (the bulk of the data). Can be structured by NLP.
• Scanned data in analog form, e.g. radiographic images, scanned text documents. Cannot easily be searched for content.
Mobile health
• Mobile wearable devices can measure people’s activity and other factors continuously and accurately.
• A natural target: physical fitness. Easily measurable, and a greater risk factor for all-cause mortality than smoking, diabetes, and obesity
• MyHeartCounts: cardiovascular mobile health study; recruited 30 000 smartphone users in 2 weeks
What to sequence per individual
• Gene panel: capture and sequence selected genes (a few dozen to a few hundred) at great coverage (~100x)
• Exome sequencing: the exons and regulatory regions (~10 Mb, 10-50x)
• Whole genome sequencing (WGS, 30x)
– Processing requires 1T
– Final VCF file: 1G
• Tradeoffs: cost, speed, sensitivity, clinical standards
• Storage and analysis challenges!! The data size of genomics will soon surpass that of online video and particle physics
Puckelwartz et al., Supercomputing for the parallelization of whole genome analysis, Bioinformatics 2014
Stephens, Z. D. et al. Big Data: astronomical or genomical? PLOS Biol. 13, e1002195 (2015).
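The coverage figures above translate into raw data volume through a simple product: bases sequenced ≈ target size × coverage. The sketch below applies that rule of thumb using the genome size quoted earlier in the lecture (~3 billion letters) and the exome figure from this slide. The function name is illustrative, and the result counts sequenced bases only, not file sizes, which also depend on quality scores, read names, and compression.

```python
GENOME_SIZE = 3_000_000_000  # ~3 billion letters, as quoted in the lecture
EXOME_SIZE = 10_000_000      # ~10 Mb, the target size given on the slide


def sequenced_bases(target_size_bases, coverage):
    """Approximate number of bases read: target size times average coverage."""
    return target_size_bases * coverage


wgs = sequenced_bases(GENOME_SIZE, 30)    # whole genome at 30x
exome = sequenced_bases(EXOME_SIZE, 50)   # exome at the upper 50x figure
print(f"WGS:   {wgs:.1e} bases")
print(f"Exome: {exome:.1e} bases")
```

Under these assumptions a single 30x genome involves about 9 × 10^10 sequenced bases, more than two orders of magnitude above the exome, which is one way to see the cost and storage tradeoff listed above.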
The actionable genome
• 1500-2000 drugs FDA-approved to date
• Most drugs have a specific protein that they target or are otherwise linked to.
• In that case we say that the gene is druggable or actionable
• Number of druggable genes: ~4500
http://www.raps.org/Regulatory-Focus/
Mutation types
• Somatic vs germline
• SNV: single nucleotide variation
• Indel: insertion or deletion
• CNV: copy number variation
• SV: structural variation
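For small variants, the SNV and indel categories above can be told apart from the reference and alternate alleles alone. The sketch below assumes VCF-style REF/ALT allele strings, which is an illustrative convention rather than something specified on the slide; the function name is hypothetical. CNVs and SVs generally need read depth or breakpoint evidence and are not covered here.

```python
def classify_small_variant(ref, alt):
    """Classify a small variant from its reference and alternate alleles.

    ref, alt: allele strings in VCF style, e.g. REF='A', ALT='G'.
    Returns 'SNV', 'indel (insertion)', 'indel (deletion)', or 'other'.
    """
    if len(ref) == 1 and len(alt) == 1:
        return "SNV" if ref != alt else "other"
    if len(alt) > len(ref):
        return "indel (insertion)"
    if len(alt) < len(ref):
        return "indel (deletion)"
    # Same length but longer than one base: a multi-base substitution,
    # which the slide's list does not name separately.
    return "other"


examples = [("A", "G"), ("A", "ATC"), ("ATC", "A"), ("AT", "GC")]
labels = [classify_small_variant(r, a) for r, a in examples]
print(labels)
```

Whether a variant is somatic or germline cannot be read off the alleles; that distinction typically comes from comparing tumor and matched normal samples.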
Somatic mutation frequencies
Lawrence, Getz: Mutational heterogeneity in cancer, Nature 13
Many other issues
• Security and privacy:
– Need to maintain data security, patient privacy
– De-identification of the data works to some extent – but a full genome uniquely identifies the individual
• Need informed consent of the individual to use data
• Should the person be informed of his/her results?
• EHRs: noisy, incomplete, many biases
• Genomic data: still lacks clinical-level standards
Sources
• Ashley, The Precision Medicine Initiative, JAMA 2015
• Ashley, Towards precision medicine, Nature Rev Genetics, Sept 2016
• Hall et al., Merging Electronic Health Record Data and Genomics for Cardiovascular Research, Circ Cardiovasc Genet, April 2016