Open data, compound repurposing, and rare diseases (ISCB)

Open data, compound repurposing, and rare diseases Andrew Su, Ph.D. @andrewsu [email protected] http://sulab.org February 16, 2017 Slides: slideshare.net/andrewsu

Upload: andrew-su

Post on 19-Mar-2017


TRANSCRIPT

Page 1: Open data, compound repurposing, and rare diseases (ISCB)

Open data, compound repurposing, and rare diseases

Andrew Su, Ph.D. @andrewsu [email protected] http://sulab.org

February 16, 2017

Slides: slideshare.net/andrewsu

Page 2: Open data, compound repurposing, and rare diseases (ISCB)

Raynaud disease and fish oil

Raynaud disease

Page 3: Open data, compound repurposing, and rare diseases (ISCB)

Raynaud disease and fish oil

Diagram end nodes: Raynaud disease; Fish oil / EPA

Diagram intermediate nodes: Abnormal platelet activity; Abnormal blood viscosity; High blood viscosity; Elevated RBC rigidity; Vasodilation; Low blood triglycerides; Increased prostacyclins

Page 4: Open data, compound repurposing, and rare diseases (ISCB)

Raynaud disease and fish oil

Page 5: Open data, compound repurposing, and rare diseases (ISCB)

“Undiscovered public knowledge”

End nodes (labeled A and C): Raynaud disease; Fish oil / EPA

Intermediate nodes (each labeled B): Abnormal platelet activity; Abnormal blood viscosity; High blood viscosity; Elevated RBC rigidity; Vasodilation; Low blood triglycerides; Increased prostacyclins
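The A/B/C labeling on this slide can be read as a simple graph search: two concepts that are never linked directly may still be connected through shared intermediate concepts. The following is a minimal sketch of that idea, not the speaker's implementation. The edge list is a toy example, and the pairing of each intermediate concept with each end concept is an assumption made for illustration; the function names are hypothetical.

```python
from collections import defaultdict

# Toy co-mention edges using concept names from the slide. Which intermediate
# concept links to which end concept is an assumption made for illustration.
EDGES = [
    ("Fish oil / EPA", "High blood viscosity"),
    ("Fish oil / EPA", "Abnormal platelet activity"),
    ("Raynaud disease", "High blood viscosity"),
    ("Raynaud disease", "Abnormal platelet activity"),
]


def build_graph(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    graph = defaultdict(set)
    for left, right in edges:
        graph[left].add(right)
        graph[right].add(left)
    return graph


def abc_candidates(graph, a):
    """Return {C: shared B terms} for every C that has no direct edge to A."""
    direct = graph.get(a, set())
    candidates = defaultdict(set)
    for b in direct:
        for c in graph.get(b, set()):
            # Skip A itself and anything already directly linked to A.
            if c != a and c not in direct:
                candidates[c].add(b)
    return dict(candidates)


graph = build_graph(EDGES)
for c, shared in abc_candidates(graph, "Fish oil / EPA").items():
    print(f"{c}: linked via {sorted(shared)}")
```

In this toy graph the only candidate returned for "Fish oil / EPA" is "Raynaud disease", because the two never share an edge but do share intermediate nodes.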

Page 6: Open data, compound repurposing, and rare diseases (ISCB)

“Undiscovered public knowledge”


Page 7: Open data, compound repurposing, and rare diseases (ISCB)

“Undiscovered public knowledge”

Page 8: Open data, compound repurposing, and rare diseases (ISCB)

Building a Network of BioThings (then)

Nodes: Eicosapentaenoic acid; Platelet aggregation; Fatty Acid

Edge = co-mention

x 1000s article titles
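A co-mention network of this kind can be sketched by counting how often two concepts appear in the same title. The snippet below is an illustrative assumption, not the original pipeline: it uses naive substring matching against a tiny concept list, and the example titles are made up.

```python
from collections import Counter
from itertools import combinations

# Concept names from the slide; a real system would use a much larger vocabulary.
CONCEPTS = ["eicosapentaenoic acid", "platelet aggregation", "fatty acid"]


def comention_edges(titles, concepts=CONCEPTS):
    """Count, for each concept pair, the number of titles mentioning both."""
    edges = Counter()
    for title in titles:
        text = title.lower()
        found = sorted(c for c in concepts if c in text)
        for pair in combinations(found, 2):
            edges[pair] += 1
    return edges


# Made-up titles, for illustration only.
titles = [
    "Eicosapentaenoic acid and platelet aggregation",
    "Fatty acid intake and platelet aggregation",
    "A title mentioning none of the concepts",
]
edges = comention_edges(titles)
for (left, right), count in sorted(edges.items()):
    print(f"{left} -- {right}: {count} title(s)")
```

Sorting the concepts found in each title keeps every pair in a canonical order, so the same undirected edge is never counted under two different keys.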

Page 9: Open data, compound repurposing, and rare diseases (ISCB)

Building a Network of BioThings (now)

Nodes: Eicosapentaenoic acid (= PubChem:446284 = Timnodonic acid); Platelet aggregation; Fatty Acid

Edge = typed relationship (“decreases”) rather than co-mention

x 26 million articles … and full abstracts, rather than x 1000s article titles
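The two changes on this slide, mapping synonyms to one identifier and replacing co-mention with a typed edge, can be sketched as a small data structure. This is an assumption-laden illustration: the `Edge` class, the `normalize` helper, and the `source` value are hypothetical, and it assumes the “decreases” label refers to the edge between eicosapentaenoic acid and platelet aggregation.

```python
from dataclasses import dataclass

# Synonym table built only from the names shown on the slide.
SYNONYMS = {
    "eicosapentaenoic acid": "PubChem:446284",
    "timnodonic acid": "PubChem:446284",
}


def normalize(name):
    """Map a concept name to its identifier, or to a lowercased name if unknown."""
    key = name.strip().lower()
    return SYNONYMS.get(key, key)


@dataclass(frozen=True)
class Edge:
    subject: str
    predicate: str
    obj: str
    source: str  # where the statement came from, so the edge stays traceable


# "example-article-1" is a placeholder, not a real article identifier.
edge = Edge(
    subject=normalize("Timnodonic acid"),
    predicate="decreases",
    obj=normalize("Platelet aggregation"),
    source="example-article-1",
)
print(edge)
```

Because both names resolve to the same identifier, statements extracted under either name attach to a single node in the network.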

Page 10: Open data, compound repurposing, and rare diseases (ISCB)

Information extraction

1: Identify all biomedical concepts

2: Identify relationships between concepts
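The two steps can be sketched with a dictionary lookup followed by a trigger-word check. This is a deliberately naive illustration under stated assumptions: the concept list, the trigger list, and the example sentence are all hypothetical, and real systems use far more robust methods for both steps.

```python
import re

CONCEPTS = ["eicosapentaenoic acid", "platelet aggregation"]
TRIGGERS = ["decreases", "increases"]  # hypothetical relationship cues


def find_concepts(sentence):
    """Step 1: return (start, end, concept) for each dictionary hit, in text order."""
    lowered = sentence.lower()
    hits = []
    for concept in CONCEPTS:
        for match in re.finditer(re.escape(concept), lowered):
            hits.append((match.start(), match.end(), concept))
    return sorted(hits)


def find_relations(sentence):
    """Step 2: link adjacent concept mentions when a trigger word sits between them."""
    lowered = sentence.lower()
    hits = find_concepts(sentence)
    relations = []
    for (_, end_a, a), (start_b, _, b) in zip(hits, hits[1:]):
        between = lowered[end_a:start_b]
        for trigger in TRIGGERS:
            if re.search(rf"\b{trigger}\b", between):
                relations.append((a, trigger, b))
    return relations


# Made-up sentence, for illustration only.
sentence = "Eicosapentaenoic acid decreases platelet aggregation."
print(find_concepts(sentence))
print(find_relations(sentence))
```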

Page 11: Open data, compound repurposing, and rare diseases (ISCB)

Pathways, Diseases, Proteins, Variants, Genes, Drugs

Goal: Assemble a network of biomedical entities that is comprehensive, current, computable, and traceable.

Page 12: Open data, compound repurposing, and rare diseases (ISCB)


Page 13: Open data, compound repurposing, and rare diseases (ISCB)

Pathways, Diseases, Proteins, Variants, Genes, Drugs

Goal: Assemble a network of biomedical entities that is comprehensive, current, computable, and traceable.

Page 14: Open data, compound repurposing, and rare diseases (ISCB)

Source: https://wilsoncommonslab.org/2014/03/06/calling-all-supporters

Page 15: Open data, compound repurposing, and rare diseases (ISCB)

Question: Can a group of non-scientists collectively perform concept recognition in biomedical texts?


Page 16: Open data, compound repurposing, and rare diseases (ISCB)

Experts versus crowd for concept identification

593 PubMed abstracts

6,900 mentions of “disease concepts”

F = 0.87; F = 0.78

$$$

Page 17: Open data, compound repurposing, and rare diseases (ISCB)

Experts versus crowd for concept identification

593 PubMed abstracts

6,900 mentions of “disease concepts”

F = 0.87; F = 0.87

$$$

• 9 days
• 145 workers
• Total: $630.96
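The F values on these slides compare a set of annotated mentions against a reference set. The sketch below shows one common way to compute such a score, combined with a simple vote threshold for merging several workers' annotations. Both the aggregation rule and the toy spans are assumptions for illustration; the slides do not state how the crowd's answers were combined or scored.

```python
from collections import Counter


def aggregate(worker_annotations, min_votes):
    """Keep every mention marked by at least `min_votes` workers."""
    votes = Counter()
    for mentions in worker_annotations:
        votes.update(mentions)
    return {mention for mention, count in votes.items() if count >= min_votes}


def f_score(predicted, gold):
    """Harmonic mean of precision and recall over exact-match mentions."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy mentions as (document id, start offset, end offset); all values are made up.
A, B, C, D, E = ("d1", 0, 7), ("d1", 20, 35), ("d2", 5, 12), ("d2", 30, 38), ("d3", 1, 9)
workers = [{A, B}, {A, C}, {A, B, D}]
gold = {A, B, E}

crowd = aggregate(workers, min_votes=2)
print(sorted(crowd))
print(round(f_score(crowd, gold), 2))
```

Raising `min_votes` generally trades recall for precision, so the threshold is a parameter worth tuning against a reference set.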

Page 18: Open data, compound repurposing, and rare diseases (ISCB)


http://mark2cure.org

Page 19: Open data, compound repurposing, and rare diseases (ISCB)

              Paid crowdsourcing ($$$)    Citizen Science (“Help science, please”)
F             0.87                        0.84
Duration      9 days                      28 days
Workers       145                         212
Total cost    $630.96                     $0

Page 20: Open data, compound repurposing, and rare diseases (ISCB)

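Does Citizen Science scale?

\[
\text{volunteers needed}
= \frac{\text{number of annotation events per year}}{\text{number of annotation events per year per volunteer}}
= \frac{1{,}000{,}000 \text{ articles} \times 10 \text{ AE/article}}{\dfrac{10{,}275 \text{ AE} \times 365 \text{ days}}{212 \text{ annotators} \times 28 \text{ days}}}
\approx 15{,}828
\]

AE = Annotation events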
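The estimate on this slide can be re-derived with a few lines of arithmetic. This sketch assumes that 10,275 annotation events is the total contributed by the 212 annotators over the 28 days, which is how the slide's fraction reads; the variable names are illustrative.

```python
import math

articles_per_year = 1_000_000
ae_per_article = 10

observed_ae = 10_275        # annotation events collected in the observed period
observed_annotators = 212
observed_days = 28

# Annotation events needed per year.
ae_needed_per_year = articles_per_year * ae_per_article

# Annotation events one volunteer contributes per year, scaled from the 28-day rate.
ae_per_volunteer_per_year = observed_ae * 365 / (observed_annotators * observed_days)

# Round up, since a fraction of a volunteer cannot cover the remainder.
volunteers_needed = math.ceil(ae_needed_per_year / ae_per_volunteer_per_year)

print(f"{ae_per_volunteer_per_year:.1f} AE per volunteer per year")
print(f"{volunteers_needed:,} volunteers needed")
```

The result matches the 15,828 volunteers shown on the slide, at roughly 632 annotation events per volunteer per year.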

Page 21: Open data, compound repurposing, and rare diseases (ISCB)

Does Citizen Science scale?

15,828 volunteers needed

200,000 volunteers

460,000 volunteers

37,000 volunteers

1,000,000+ volunteers

Page 22: Open data, compound repurposing, and rare diseases (ISCB)


Nina Hale https://flic.kr/p/zoVih

Page 23: Open data, compound repurposing, and rare diseases (ISCB)

Rare disease case study #2

Page 24: Open data, compound repurposing, and rare diseases (ISCB)


Page 25: Open data, compound repurposing, and rare diseases (ISCB)


… but no obvious treatments

Page 26: Open data, compound repurposing, and rare diseases (ISCB)

Mapping the biomedical network around NGLY1

NGLY1

Page 27: Open data, compound repurposing, and rare diseases (ISCB)


http://mark2cure.org

Page 28: Open data, compound repurposing, and rare diseases (ISCB)

A preliminary view of the NGLY1-focused biological network

1,200 contributors, 3,200 documents, 787,400 annotations

Page 29: Open data, compound repurposing, and rare diseases (ISCB)

A preliminary view of the NGLY1-focused biological network

Diagram: three groups of nodes labeled A, B, and C overlaid on the network, in the style of the “Undiscovered public knowledge” slides

Page 30: Open data, compound repurposing, and rare diseases (ISCB)


http://slides.com/dhimmel/big-data-seminar

Page 31: Open data, compound repurposing, and rare diseases (ISCB)


http://slides.com/dhimmel/big-data-seminar

Page 32: Open data, compound repurposing, and rare diseases (ISCB)


http://slides.com/dhimmel/big-data-seminar

Page 33: Open data, compound repurposing, and rare diseases (ISCB)


http://slides.com/dhimmel/big-data-seminar

Page 34: Open data, compound repurposing, and rare diseases (ISCB)


http://slides.com/dhimmel/big-data-seminar

Page 35: Open data, compound repurposing, and rare diseases (ISCB)

Biomedical research relies on effective KNOWLEDGE MANAGEMENT

Pietro Bellini https://flic.kr/p/k5jmja

Page 36: Open data, compound repurposing, and rare diseases (ISCB)

Ben Good, Chunlei Wu, Shirley Willis, Sebastien Lelong, Andra Waagmeester, Max Nanis, Cyrus Afrasiabi, Julia Turner, Ginger Tsueng, Louis Gioia, Toby Li, Karthik G, Kevin Xin, Jake Bruggemann, Mike Mayers, Julee Adesara, Ramya Gamini, Greg Stupp, Sebastian Burgstaller, Tim Putman, Nuria Queralt Rosinach

Team labels shown on the slide: M2C, DR

The Crowds

Funding