Open data, compound repurposing, and rare diseases -- Point Loma Nazarene University
Open data, compound repurposing, and rare diseases
Andrew Su, Ph.D.
@[email protected]
sulab.org
January 30, 2017
Slides: slideshare.net/andrewsu
Programmer/Comp sci
Statistician/Mathematician
Biologist
Data scientist
Bioinformatician
Biostatistician
Adapted from http://blog.fejes.ca/?p=2418
…teach STEM students the importance of connecting computational, mathematical, and natural sciences.
Credit: http://www.slideshare.net/PhRMA/rare-disease-infographics
Credit: http://www.slideshare.net/PhRMA/rare-disease-infographics
Rare disease case study #1
Photo: Retta Beery
Bainbridge et al., STM, 2011
Photo: Retta Beery
Rare disease case study #2
… but no obvious treatments
Bainbridge et al., STM, 2011
SPR
What differentiates SPR and NGLY1?
SPR
Photo: Sarah Olmstead, https://flic.kr/p/364dZW
NGLY1
NGLY1 (11 PubMed articles)
Congenital disorders of glycosylation (822)
PNGase (686)
ERAD (1330)
glycosylation (48,862)
alacrima (164)
Genetic interactors (3016)
symptoms (109,928)
25 million articles in PubMed
The biomedical literature is massive…
[Figure: Number of new PubMed-indexed articles per year, 1985 to 2015; y-axis from 0 to 1,400,000]
… but it is very hard to query and compute
Imatinib, Crizotinib, Erlotinib, Gefitinib, Sorafenib, Lapatinib, Dasatinib, …
AND
Acute myeloid leukemia, Acute lymphoblastic leukemia, Chronic myelogenous leukemia, Chronic lymphocytic leukemia, Hodgkin lymphoma, Non-Hodgkin lymphoma, Myeloma, …
Gleevec, Glivec, STI-571, STI 571, STI571, ST1571, ST 1571, CGP-57148, CGP 57148, CGP57148, CGP57148B, …
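The query problem on this slide can be sketched in a few lines: every drug and disease has to be expanded into its synonyms before the two groups are combined with AND. The following is a minimal illustration only. The function names `or_group` and `build_query` are hypothetical, the synonym lists are a small subset of the slide's terms, and the quoting style is an assumption rather than the syntax of any particular search engine.

```python
def or_group(terms):
    """Join a list of synonyms into one parenthesised OR clause."""
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"


def build_query(*groups):
    """Combine several synonym groups so that every group must match."""
    return " AND ".join(or_group(group) for group in groups)


# Subset of the names on the slide (imatinib and some of its synonyms).
drugs = ["Imatinib", "Gleevec", "Glivec", "STI-571", "CGP-57148"]
diseases = ["Acute myeloid leukemia", "Chronic myelogenous leukemia"]

query = build_query(drugs, diseases)
print(query)
```

Even this toy version shows why the approach is brittle: the query is only as complete as the hand-maintained synonym lists.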
… but it is very hard to query and compute
EntrezGene ID   HGNC symbol   Description
10884           MRPS30        mitochondrial ribosomal protein S30
10914           PAPOLA        poly(A) polymerase alpha
11333           PDAP1         PDGFA associated protein 1
11334           TUSC2         tumor suppressor candidate 2
130120          REG3G         regenerating islet-derived 3 gamma
5068            REG3A         regenerating islet-derived 3 alpha
50807           ASAP1         ArfGAP with SH3 domain, ankyrin repeat and PH domain 1
55              ACPP          acid phosphatase, prostate
8853            ASAP2         ArfGAP with SH3 domain, ankyrin repeat and PH domain 2
Human genes referred to as “PAP”
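The ambiguity in the table can be made concrete with a small alias index. This is a sketch under stated assumptions: the gene IDs and symbols are copied from the table, while the index layout and the `resolve` helper are hypothetical and not part of any gene database API.

```python
from collections import defaultdict

# (Entrez Gene ID, HGNC symbol) pairs copied from the table.
PAP_GENES = [
    (10884, "MRPS30"), (10914, "PAPOLA"), (11333, "PDAP1"),
    (11334, "TUSC2"), (130120, "REG3G"), (5068, "REG3A"),
    (50807, "ASAP1"), (55, "ACPP"), (8853, "ASAP2"),
]

# Map every name (official symbol or alias) to the gene IDs it may refer to.
alias_index = defaultdict(set)
for gene_id, symbol in PAP_GENES:
    alias_index[symbol.upper()].add(gene_id)
    alias_index["PAP"].add(gene_id)  # each gene in the table is also called "PAP"


def resolve(name):
    """Return all candidate Entrez Gene IDs for a name, sorted numerically."""
    return sorted(alias_index.get(name.strip().upper(), ()))


print(resolve("REG3A"))     # an official symbol resolves to a single gene
print(len(resolve("PAP")))  # the alias alone cannot pick one gene
```

A text-mining pipeline that sees only the string "PAP" needs extra context to choose among the candidates.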
Biomedical research relies on effective KNOWLEDGE MANAGEMENT
Photo: Pietro Bellini, https://flic.kr/p/k5jmja
Information extraction from biomedical text
1. Identify biomedical concepts in text
… We report a case of familial systemic mastocytosis with the rare KIT K509I germ line mutation. In vitro treatment with imatinib, dasatinib and PKC412 reduced cell viability of primary mast cells harboring KIT K509I mutation. Both patients with familial systemic mastocytosis had remarkable hematological and skin improvement after three months of imatinib treatment.
Leuk Res. 2014 Oct;38(10):1245-51. doi: 10.1016/j.leukres.
GENES
DISEASES
DRUGS
VARIANTS
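Step 1 can be illustrated with the simplest possible approach, dictionary lookup. The sketch below is an assumption-laden toy and not the method used in the talk: the `LEXICON` contents and the `tag_concepts` helper are hypothetical, matching is case-sensitive and exact, and real concept recognition has to cope with synonyms, abbreviations, and ambiguity.

```python
import re

# Hypothetical mini-lexicon covering the concepts highlighted on the slide.
LEXICON = {
    "imatinib": "DRUG",
    "dasatinib": "DRUG",
    "PKC412": "DRUG",
    "KIT": "GENE",
    "K509I": "VARIANT",
    "familial systemic mastocytosis": "DISEASE",
}

TEXT = (
    "In vitro treatment with imatinib, dasatinib and PKC412 reduced cell "
    "viability of primary mast cells harboring KIT K509I mutation. Both "
    "patients with familial systemic mastocytosis had remarkable "
    "hematological and skin improvement after three months of imatinib "
    "treatment."
)


def tag_concepts(text, lexicon=LEXICON):
    """Return (offset, matched text, concept type) tuples in reading order."""
    mentions = []
    for term, label in lexicon.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, text):
            mentions.append((match.start(), match.group(), label))
    return sorted(mentions)


mentions = tag_concepts(TEXT)
for offset, term, label in mentions:
    print(f"{offset:4d}  {label:8s} {term}")
```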
Information extraction from biomedical text
imatinib
dasatinib
PKC412
Familial systemic mastocytosis
KIT
K509I
1. Identify biomedical concepts in text
2. Identify relationships between concepts
Mutation of
Mutation causes
causes
treats
inhibits
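Once concepts and relationships are extracted, they can be stored as subject-predicate-object triples that keep a pointer back to the source text. The sketch below is illustrative only. Which concepts each relationship label connects is an assumption read off the slide's diagram, and `Triple` and `find` are hypothetical names.

```python
from collections import namedtuple

# Each statement keeps its source so that the network stays traceable.
Triple = namedtuple("Triple", "subject predicate object source")

SOURCE = "Leuk Res. 2014 Oct;38(10):1245-51"

# Assumed pairing of the slide's relationship labels with its concepts.
TRIPLES = [
    Triple("K509I", "mutation of", "KIT", SOURCE),
    Triple("K509I", "causes", "Familial systemic mastocytosis", SOURCE),
    Triple("imatinib", "treats", "Familial systemic mastocytosis", SOURCE),
    Triple("imatinib", "inhibits", "KIT", SOURCE),
]


def find(triples, subject=None, predicate=None, obj=None):
    """Return the triples that match every field that was given."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.object == obj)
    ]


treatments = [
    t.subject
    for t in find(TRIPLES, predicate="treats", obj="Familial systemic mastocytosis")
]
print(treatments)
```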
Goal: Assemble a network of biomedical knowledge that is comprehensive, current, computable and traceable.
http://www.navy.mil/management/photodb/photos/101104-N-6383T-508.jpg
The Gene Wiki project, circa 2008
Protein structure
Symbols and identifiers
Tissue expression pattern
Gene Ontology annotations
Links to structured databases
Gene summary
Protein interactions
Linked references
Huss, PLoS Biol, 2008
Lissencephaly
Gene-disease annotation databases
Query: Reelin (RELN)
Gene-disease annotation databases
Lissencephaly; Familial Temporal Lobe Epilepsy
Query: Reelin (RELN)
Gene-disease annotation databases
Lissencephaly; Familial Temporal Lobe Epilepsy; Otosclerosis; Schizophrenia
Query: Reelin (RELN)
Gene-disease annotation databases
Lissencephaly; Familial Temporal Lobe Epilepsy; Otosclerosis; Schizophrenia; Bipolar Disorder; Autistic Disorder; Alzheimer Disease; Schizophrenic Psychology; Breast Neoplasms; …
Child Development Disorders, Pervasive; Cognition; Cognition Disorders; Dominance, Cerebral; Executive Function; Field Dependence-Independence; Functional Laterality; Choice Behavior; Precursor T-Cell Lymphoblastic Leukemia-Lymphoma; Psychotic Disorders; Attention; Attention Deficit Disorder with Hyperactivity; Memory; Memory, Short-Term; Mental Disorders; Task Performance and Analysis; Tobacco Use Disorder; Weight Gain; Schizophrenia, Paranoid
27 “diseases”
Query: Reelin (RELN)
[logo] is to data as [logo] is to text
“Provide a database of the world’s [biomedical] knowledge that anyone can edit”
- Denny Vrandečić
Q13561329 (http://www.wikidata.org/wiki/Q13561329)
Subclass of (Property:P279): Protein (Q8054)
Regulates (Property:P128): Neural development (Q1345738)
Physically interacts with (Property:P129): VLDL receptor (Q1979313), Amyloid beta A4 (Q423510)
Decreased expression in (Property:P1910): Schizophrenia (Q41112), Bipolar disorder (Q131755)
https://www.wikidata.org/w/api.php?action=wbgetentities&ids=Q13561329&format=json
Q13561329
Property:P279: Q8054
Property:P128: Q1345738
Property:P129: Q1979313, Q423510
Property:P1910: Q41112, Q131755
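The machine-readable view on this slide comes from the Wikidata `wbgetentities` API action. The sketch below rebuilds the slide's URL and shows how item-valued statements could be read from the returned JSON. It makes no network call: `SAMPLE` is a hand-written fragment that imitates the response layout and is not real API output, and `entity_url` and `claim_targets` are hypothetical helper names.

```python
from urllib.parse import urlencode

API = "https://www.wikidata.org/w/api.php"


def entity_url(qid):
    """Build the wbgetentities URL for one Wikidata item."""
    params = {"action": "wbgetentities", "ids": qid, "format": "json"}
    return API + "?" + urlencode(params)


def claim_targets(response, qid, prop):
    """List the item IDs that `qid` points to through property `prop`."""
    claims = response["entities"][qid]["claims"].get(prop, [])
    return [c["mainsnak"]["datavalue"]["value"]["id"] for c in claims]


# Hand-written stand-in for a response, using the IDs shown on the slide.
SAMPLE = {
    "entities": {
        "Q13561329": {
            "claims": {
                "P279": [{"mainsnak": {"datavalue": {"value": {"id": "Q8054"}}}}],
                "P1910": [
                    {"mainsnak": {"datavalue": {"value": {"id": "Q41112"}}}},
                    {"mainsnak": {"datavalue": {"value": {"id": "Q131755"}}}},
                ],
            }
        }
    }
}

url = entity_url("Q13561329")
print(url)
print(claim_targets(SAMPLE, "Q13561329", "P1910"))
```

To work with live data, the URL could be fetched with any HTTP client and the parsed JSON passed to `claim_targets` in place of `SAMPLE`.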
Seeding Wikidata with biomedical data
• All human, mouse genes and proteins
• All Gene Ontology terms
• All FDA approved drugs
• 9,000+ human diseases
• 120 reference microbial genomes
Mitraka et al (2015) Semantic Web Applications for the Life Sciences
Burgstaller-Muehlbacher et al (2016) Database
Putman et al (2016) Database
Centralizing key data storage
287 language editions of Wikipedia
Bioinformatics community
Toxicology community
Epidemiology community… …
“Show all tyrosine kinase inhibitors that are used to treat hematologic cancers.”
“Show all human membrane proteins associated with colorectal cancer.”
“Show all monoclonal antibodies used to treat melanoma.”
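Questions like these map naturally onto SPARQL queries against Wikidata. The sketch below only builds the query text and runs nothing. Several details are assumptions to verify before use: that drug classes are reachable through P31 (instance of) and P279 (subclass of), that treatments are recorded with P2175 (medical condition treated), and the item IDs, which are deliberately left as placeholders. The name `treatment_query` is hypothetical.

```python
def treatment_query(drug_class_qid, disease_qid, limit=100):
    """Return SPARQL text asking for drugs of one class that treat one disease."""
    return f"""SELECT ?drug ?drugLabel WHERE {{
  ?drug wdt:P31/wdt:P279* wd:{drug_class_qid} .
  ?drug wdt:P2175 wd:{disease_qid} .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
LIMIT {limit}"""


# Placeholder IDs: replace with the real Wikidata items for the drug class
# (e.g. monoclonal antibody) and the disease (e.g. melanoma) before running.
query = treatment_query("Q_MONOCLONAL_ANTIBODY", "Q_MELANOMA")
print(query)
```

The resulting text could be pasted into the Wikidata Query Service once real item IDs are filled in.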
Crowdsourcing via Citizen Science
Biomedical Linked Open Data
Source: https://wilsoncommonslab.org/2014/03/06/calling-all-supporters
Question: Can a group of non-scientists collectively perform concept recognition in biomedical texts?
Experts versus crowd for concept identification
593 PubMed abstracts
6,900 mentions of “disease concepts”
F = 0.87
F = 0.78
$$$
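The F values on this slide are presumably balanced F-scores, the harmonic mean of precision and recall. That reading is an assumption, since the slide does not define F. The sketch below shows how such a score is computed from annotation counts. The counts are invented for illustration and are not the study's data.

```python
def f_score(true_pos, false_pos, false_neg):
    """Balanced F-score (F1) from counts of matched and unmatched mentions."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Invented example: 87 mentions match the gold standard, 13 are spurious,
# and 13 gold-standard mentions were missed.
score = f_score(true_pos=87, false_pos=13, false_neg=13)
print(round(score, 2))
```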
Experts versus crowd for concept identification
593 PubMed abstracts
6,900 mentions of “disease concepts”
F = 0.87
F = 0.87
$$$
• 9 days
• 145 workers
• Total: $630.96
http://mark2cure.org
Paid crowdsourcing ($$$)
• F = 0.87
• 9 days
• 145 workers
• Total: $630.96
Citizen Science (“Help science, please”)
• F = 0.84
• 28 days
• 212 workers
• Total cost: $0
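Does Citizen Science scale?
$$
\frac{\text{Number of annotation events per year}}{\text{Number of annotation events per year per volunteer}}
= \frac{1{,}000{,}000 \text{ articles} \times 10 \text{ AE/article}}{\dfrac{10{,}275 \text{ AE}}{212 \text{ annotators} \times 28 \text{ days}} \times 365 \text{ days}}
\approx 15{,}828 \text{ volunteers needed}
$$
AE = Annotation events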
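The estimate on this slide can be checked with a few lines of arithmetic. The sketch below uses only the numbers shown on the slide. The function name `volunteers_needed` is hypothetical, and the calculation assumes that volunteers keep annotating all year at the per-day rate observed in the 28-day experiment.

```python
import math


def volunteers_needed(articles_per_year=1_000_000, ae_per_article=10,
                      pilot_ae=10_275, pilot_annotators=212, pilot_days=28):
    """Volunteers required to annotate one year of new articles."""
    # Demand: annotation events (AE) needed per year.
    demand = articles_per_year * ae_per_article
    # Supply: AE per volunteer per day in the pilot, scaled to a full year.
    ae_per_volunteer_day = pilot_ae / (pilot_annotators * pilot_days)
    ae_per_volunteer_year = ae_per_volunteer_day * 365
    return demand / ae_per_volunteer_year


needed = volunteers_needed()
print(math.ceil(needed))  # rounded up to whole volunteers
```

Changing `ae_per_article` or the pilot numbers shows how sensitive the estimate is to redundancy per article and to volunteer retention.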
Does Citizen Science scale?
15,828 volunteers needed
200,000 volunteers
460,000 volunteers
37,000 volunteers
1,000,000+ volunteers
Mapping the biomedical network around NGLY1
NGLY1
http://mark2cure.org
A preliminary view of the NGLY1-focused biological network
1,200 contributors
3,200 documents
787,400 annotations
Finding new indications for existing drugs or therapies
Raynaud’s Syndrome
Fish oil
Abnormal platelet activity
Abnormal blood viscosity
High blood viscosity
Elevated RBC rigidity
Vasodilation
Low blood triglycerides
Increased prostacyclins
Finding new indications for existing drugs or therapies
Raynaud’s Syndrome
Fish oil
Abnormal platelet activity
Abnormal blood viscosity
High blood viscosity
Elevated RBC rigidity
Vasodilation
Low blood triglycerides
Increased prostacyclins
[Diagram overlay: one node labeled A, one labeled C, and seven nodes labeled B]
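The A, B, C labels suggest a simple search: given two concepts with no direct link, look for intermediate concepts connected to both. The sketch below illustrates that idea with the terms from the slide. The links themselves are an assumption, with every intermediate concept taken to connect to both fish oil and Raynaud's syndrome as the diagram implies, and `linking_concepts` is a hypothetical helper name.

```python
# Intermediate concepts shown on the slide.
INTERMEDIATES = [
    "Abnormal platelet activity", "Abnormal blood viscosity",
    "High blood viscosity", "Elevated RBC rigidity", "Vasodilation",
    "Low blood triglycerides", "Increased prostacyclins",
]

# Assumed literature links: each end point co-occurs with every intermediate
# concept but never directly with the other end point.
LINKS = {
    "Fish oil": set(INTERMEDIATES),
    "Raynaud's Syndrome": set(INTERMEDIATES),
}


def linking_concepts(links, a, c):
    """Return the concepts connected to both `a` and `c`, sorted by name."""
    return sorted(links.get(a, set()) & links.get(c, set()))


bridges = linking_concepts(LINKS, "Fish oil", "Raynaud's Syndrome")
for concept in bridges:
    print(f"Fish oil -> {concept} -> Raynaud's Syndrome")
```

On a real knowledge network the same intersection would be computed over extracted relationships, and the surviving paths would be hypotheses to review, not established findings.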
A preliminary view of the NGLY1-focused biological network
[Figure: A, B, and C node labels overlaid on the network]
Biomedical research relies on effective KNOWLEDGE MANAGEMENT
Photo: Pietro Bellini, https://flic.kr/p/k5jmja
Paul Pavlidis, UBC
Lynn Schriml, U Maryland
Matt and Cristina Might,
Crowd volunteers and partners
(Salomon) (Lotz)
(Yang, Maximov) (Topol)
Louis Gioia
Julee Adesara
Toby Li
Karthik G
Erick Scott
Adam Mark
Kevin Xin
Jake Bruggemann
Mike Mayers
Andra Waagmeester
Max Nanis
Cyrus Afrasiabi
Ian MacLeod
Julia Turner
Ginger Tsueng
Sebastien Lelong
Erik Clarke
Jennifer Fouquier
Ben Good
Chunlei Wu
Shirley Willis
Tobias Meissner
Katie Fisch
Sandip Chatterjee
Ramya Gamini
Greg Stupp
Sebastian Burgstaller
Tim Putman
Nuria Queralt Rosinach
Sal Loguercio