open data, compound repurposing, and rare diseases (iscb)
Open data, compound repurposing, and rare diseases
Andrew Su, Ph.D.
sulab.org
February 16, 2017
Slides: slideshare.net/andrewsu
Raynaud disease and fish oil
Raynaud disease
Fish oil / EPA
Abnormal platelet activity
Abnormal blood viscosity
High blood viscosity
Elevated RBC rigidity
Vasodilation
Low blood triglycerides
Increased prostacyclins
“Undiscovered public knowledge”
Endpoint concepts (labeled A and C): Raynaud disease; Fish oil / EPA
Intermediate concepts (each labeled B):
Abnormal platelet activity
Abnormal blood viscosity
High blood viscosity
Elevated RBC rigidity
Vasodilation
Low blood triglycerides
Increased prostacyclins
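The A, B, C pattern on this slide can be sketched as a search over a co-mention graph: start from one endpoint concept, walk to its neighbors (the B concepts), and report concepts that share those neighbors but have no direct edge to the starting point. This is a minimal sketch under stated assumptions, not the method used in the talk. The function names `build_graph` and `abc_candidates` are hypothetical, and the toy edges only reuse a few labels from the slide for illustration; they are not extracted from any literature.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency map from (concept, concept) pairs."""
    graph = defaultdict(set)
    for left, right in edges:
        graph[left].add(right)
        graph[right].add(left)
    return dict(graph)


def abc_candidates(graph, start):
    """Return {candidate: sorted shared intermediates} for concepts that are
    two steps away from `start` and not directly connected to it."""
    direct = graph.get(start, set())
    hypotheses = defaultdict(set)
    for intermediate in direct:
        for candidate in graph.get(intermediate, set()):
            if candidate != start and candidate not in direct:
                hypotheses[candidate].add(intermediate)
    return {c: sorted(bs) for c, bs in hypotheses.items()}


# Illustrative toy edges only (assumption): each intermediate is co-mentioned
# with both endpoints, but the endpoints are never co-mentioned directly.
toy_edges = [
    ("Fish oil / EPA", "High blood viscosity"),
    ("Fish oil / EPA", "Abnormal platelet activity"),
    ("Fish oil / EPA", "Vasodilation"),
    ("Raynaud disease", "High blood viscosity"),
    ("Raynaud disease", "Abnormal platelet activity"),
    ("Raynaud disease", "Vasodilation"),
]

toy_graph = build_graph(toy_edges)
candidates = abc_candidates(toy_graph, "Fish oil / EPA")
print(candidates)
```

Ranking candidates by the number of shared intermediates is one simple design choice; a real system would also need to weight edges by how often and how reliably each co-mention occurs.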
Building a Network of BioThings (then)
Eicosapentaenoic acid
Platelet aggregation
Fatty Acid
Edge = co-mention
x 1000s article titles
Building a Network of BioThings (now)
Eicosapentaenoic acid = PubChem:446284 = Timnodonic acid
Platelet aggregation
Fatty Acid
Edge = decreases (replacing: co-mention)
x 26 million articles … and full abstracts (replacing: x 1000s article titles)
Information extraction
1: Identify all biomedical concepts
2: Identify relationships between concepts
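The two steps above can be sketched with a small dictionary matcher followed by co-mention counting. This is a minimal sketch under stated assumptions: the lexicon, the `EXAMPLE:` identifiers, the sample sentences, and the function names `find_concepts` and `co_mention_edges` are all hypothetical. Only the synonym pair Eicosapentaenoic acid / Timnodonic acid and the identifier PubChem:446284 come from the slides. Co-mention is used here as a simple stand-in for step 2; extracting a typed relationship such as "decreases" would need more than this.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical lexicon mapping surface forms to identifiers. Synonyms share
# one identifier so that they collapse onto a single node in the network.
LEXICON = {
    "eicosapentaenoic acid": "PubChem:446284",
    "timnodonic acid": "PubChem:446284",
    "platelet aggregation": "EXAMPLE:platelet_aggregation",  # placeholder ID
    "fatty acid": "EXAMPLE:fatty_acid",  # placeholder ID
}


def find_concepts(text):
    """Step 1: return the set of concept identifiers mentioned in the text."""
    lowered = text.lower()
    found = set()
    for term, concept_id in LEXICON.items():
        if re.search(r"\b" + re.escape(term) + r"\b", lowered):
            found.add(concept_id)
    return found


def co_mention_edges(documents):
    """Step 2 (simplified): count how many documents mention each pair."""
    counts = Counter()
    for doc in documents:
        concepts = sorted(find_concepts(doc))
        counts.update(combinations(concepts, 2))
    return counts


# Made-up example sentences (assumption), not real article titles.
docs = [
    "Eicosapentaenoic acid decreases platelet aggregation.",
    "Timnodonic acid, a fatty acid, and platelet aggregation.",
]

edges = co_mention_edges(docs)
for pair, count in sorted(edges.items()):
    print(pair, count)
```

Because both synonyms resolve to the same identifier, the two sentences contribute to the same edge rather than to two separate ones.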
Pathways, Diseases, Proteins, Variants, Genes, Drugs
Goal: Assemble a network of biomedical entities that is comprehensive, current, computable, and traceable.
Source: https://wilsoncommonslab.org/2014/03/06/calling-all-supporters
Question: Can a group of non-scientists collectively perform concept recognition in biomedical texts?
Experts versus crowd for concept identification
593 PubMed abstracts
6,900 mentions of “disease concepts”
F = 0.87
F = 0.78
$$$
Experts versus crowd for concept identification
593 PubMed abstracts
6,900 mentions of “disease concepts”
F = 0.87
F = 0.87
$$$
• 9 days
• 145 workers
• Total: $630.96
http://mark2cure.org
Paid crowdsourcing ($$$)
• F = 0.87
• 9 days
• 145 workers
• Total: $630.96
Citizen Science (“Help science, please”)
• F = 0.84
• 28 days
• 212 workers
• Total cost: $0
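The slides report F values without defining them. A minimal sketch, assuming F is the balanced F-measure (harmonic mean of precision and recall) computed over exact-match annotation spans, is shown below. The function name `f_measure`, the span format, and the toy annotations are hypothetical and are not taken from the study.

```python
def f_measure(gold, predicted):
    """Balanced F-measure between two collections of annotations.

    Each annotation is any hashable item; here a (document, start, end)
    tuple, counted as correct only on an exact match (assumption).
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy annotations (assumption): three of four spans agree.
expert_spans = {("doc1", 0, 15), ("doc1", 40, 52), ("doc2", 5, 20), ("doc2", 30, 41)}
crowd_spans = {("doc1", 0, 15), ("doc1", 40, 52), ("doc2", 5, 20), ("doc2", 60, 70)}

score = f_measure(expert_spans, crowd_spans)
print(f"F = {score:.2f}")
```

Exact-match scoring is the strictest choice; partial-overlap scoring would give higher values for the same annotations.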
Does Citizen Science scale?
Volunteers needed = (number of annotation events per year) / (number of annotation events per year per volunteer)
= (1,000,000 articles * 10 AE / article) / ((10,275 AE * 365 days) / (212 annotators * 28 days))
= 15,828 volunteers needed
AE = Annotation events
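The estimate above can be checked with a few lines of arithmetic. This is a minimal sketch using only the numbers on the slide; the function name `volunteers_needed` and its parameter names are hypothetical, and rounding up to a whole volunteer is an assumption.

```python
import math


def volunteers_needed(articles, ae_per_article, observed_ae,
                      observed_annotators, observed_days):
    """Estimate volunteers needed to annotate `articles` in one year,
    extrapolating from an observed annotation rate."""
    # Annotation events one volunteer would produce in a year at the observed rate.
    ae_per_volunteer_per_year = (
        observed_ae * 365 / (observed_annotators * observed_days)
    )
    # Annotation events required per year.
    ae_required = articles * ae_per_article
    # Rounding up to a whole person is an assumption.
    return math.ceil(ae_required / ae_per_volunteer_per_year)


needed = volunteers_needed(
    articles=1_000_000,
    ae_per_article=10,
    observed_ae=10_275,
    observed_annotators=212,
    observed_days=28,
)
print(f"{needed:,} volunteers needed")
```

The calculation assumes each volunteer keeps contributing at the average rate observed over the 28-day experiment for a full year.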
Does Citizen Science scale?
15,828 volunteers needed
200,000 volunteers
460,000 volunteers
37,000 volunteers
1,000,000+ volunteers
Nina Hale https://flic.kr/p/zoVih
Rare disease case study #2
… but no obvious treatments
Mapping the biomedical network around NGLY1
NGLY1
http://mark2cure.org
A preliminary view of the NGLY1-focused biological network
1,200 contributors
3,200 documents
787,400 annotations
[Diagram: three A concepts, each linked through intermediate B concepts to a single C concept]
http://slides.com/dhimmel/big-data-seminar
Biomedical research relies on effective KNOWLEDGE MANAGEMENT
Pietro Bellini https://flic.kr/p/k5jmja
Ben Good
Chunlei Wu
Shirley Willis
Sebastien Lelong
Andra Waagmeester
Max Nanis
Cyrus Afrasiabi
Julia Turner
Ginger Tsueng
Louis Gioia
Toby Li
Karthik G
Kevin Xin
Jake Bruggemann
Mike Mayers
Julee Adesara
Ramya Gamini
Greg Stupp
Sebastian Burgstaller
Tim Putman
Nuria Queralt Rosinach
The Crowds
Funding