scholarly communication for bioinformatics students

The Changing Face of Scholarly Communication and the Opportunities it Affords the Bioinformatics/Systems Biology Student. Philip E. Bourne, University of California San Diego, [email protected], http://www.sdsc.edu/pb. Third UCSD Bioinformatics and Systems Biology Expo – 2/28/2011


DESCRIPTION

Presentation made to the incoming bioinformatics and systems biology students at UCSD on how they could get involved in changing scholarly communication. Given on February 28, 2011.

TRANSCRIPT

Page 1

The Changing Face of Scholarly Communication and the Opportunities it Affords the Bioinformatics/Systems Biology Student

Philip E. Bourne, University of California San Diego

[email protected]

http://www.sdsc.edu/pb

Third UCSD Bioinformatics and Systems Biology Expo – 2/28/2011

Page 2

Observation 1: Everyone in this Room is Driven by One Thing Above All Else

Page 3

Observation 2: We Are a Field That Uses/Produces Public On-Line Data Like No Other

Page 4

Observation 3: We Have Shaped the Way Data Are Shared – We Have Had Very Little Impact on Publications

Page 5

Perhaps it is Time We Thought Less About a Publication as a Reward and More About How it Can be Presented to Maximize its Use

Page 6

So What Needs to Happen?

– We need data and knowledge about that data to interoperate, i.e., we need new kinds of fast, versatile publications and data archives
– We need to be more open with both
– We need to think more about the tools that analyze, visualize and annotate data to maximize knowledge discovery
– Reward systems need to change
– We need scientist management tools
– We need to be less fixated on the big data problems
– We need to unleash the full power of the Internet

(Slide scale: Easy to Hard)

Page 7

One Personal Example of Why This Needs to Happen Now

Page 8

Josh Sommer – A Remarkable Young Man

Co-founder & Executive Director of the Chordoma Foundation

http://sagecongress.org/Presentations/Sommer.pdf

Page 9

Chordoma

• A rare bone cancer of the skull base and spine

• No known drugs

• Treatment – surgical resection followed by intense radiation therapy

http://upload.wikimedia.org/wikipedia/commons/2/2b/Chordoma.JPG

Page 10

http://sagecongress.org/Presentations/Sommer.pdf

Page 11

http://sagecongress.org/Presentations/Sommer.pdf

Page 12

http://sagecongress.org/Presentations/Sommer.pdf

Page 13

Adapted: http://sagecongress.org/Presentations/Sommer.pdf


If I have seen further it is only by standing on the shoulders of giants

Isaac Newton

From Josh’s point of view, the climb up just takes too long:

more than 15 years and more than $850M, to be more precise

Page 14

http://sagecongress.org/Presentations/Sommer.pdf

Page 15

http://sagecongress.org/Presentations/Sommer.pdf

Page 16

http://fora.tv/2010/04/23/Sage_Commons_Josh_Sommer_Chordoma_Foundation

Page 17

So We Have Seen What Needs to Change and Why. What About the How?

Page 18

We Need Data and Knowledge About That Data to Interoperate

0. Full text of PLoS papers stored in a database
1. A link brings up figures from the paper
2. Clicking the paper figure retrieves data from the PDB, which is analyzed
3. A composite view of journal and database content results
4. The composite view has links to pertinent blocks of literature text and back to the PDB

The Knowledge and Data Cycle

1. User clicks on content
2. Metadata and web services to data provide an interactive view that can be annotated
3. Selecting features provides a data/knowledge mashup
4. Analysis leads to new content I can share

PLoS Comp. Biol. 2005 1(3) e34
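At its core, the cycle above is a two-way mapping: from a paper figure to the database entries it depicts, and from an entry back to the literature. A minimal sketch, assuming an in-memory index; the class `LiteratureDataIndex` and the figure identifiers (`paperA-fig1`, `paperB-fig3`) are hypothetical and not part of any PLoS or PDB service.

```python
from collections import defaultdict


class LiteratureDataIndex:
    """Hypothetical two-way index between paper figures and PDB entries."""

    def __init__(self) -> None:
        # figure ID -> PDB IDs shown in that figure
        self._entries_by_figure: dict[str, set[str]] = defaultdict(set)
        # PDB ID -> figures that show that entry
        self._figures_by_entry: dict[str, set[str]] = defaultdict(set)

    def link(self, figure_id: str, pdb_id: str) -> None:
        """Record one figure/entry association in both directions."""
        pdb_id = pdb_id.upper()  # PDB IDs are case-insensitive
        self._entries_by_figure[figure_id].add(pdb_id)
        self._figures_by_entry[pdb_id].add(figure_id)

    def entries_for_figure(self, figure_id: str) -> set[str]:
        """Literature to data: which entries does this figure depict?"""
        return set(self._entries_by_figure.get(figure_id, ()))

    def figures_for_entry(self, pdb_id: str) -> set[str]:
        """Data to literature: which figures depict this entry?"""
        return set(self._figures_by_entry.get(pdb_id.upper(), ()))


index = LiteratureDataIndex()
index.link("paperA-fig1", "1tim")
index.link("paperB-fig3", "1TIM")
print(sorted(index.figures_for_entry("1TIM")))  # ['paperA-fig1', 'paperB-fig3']
```

Keeping both directions in one object is what makes the "back to the PDB" and "back to the literature" links cheap; a real system would persist the index and populate it from full-text mining or author-supplied markup.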

Page 19

We Need Data and Knowledge About That Data to Interoperate – What is Stopping Us?

• Open Access

• Governance – publishers vs. database providers

• Reward

• Metadata standards for provenance, privacy, etc.

• Exemplars

• …

Page 20

A Small Example – The World Wide Protein Data Bank

• The single worldwide repository for data on the structure of biological macromolecules

• Vital for drug discovery and the life sciences

• 39 years old

• Free to all

http://www.wwpdb.org

We need data and knowledge about that data to interoperate

PLoS Comp. Biol. 2005 1(3) e34

Page 21

The World Wide Protein Data Bank – The Best Case Scenario

• Paper not published unless data are deposited – strong data to literature correspondence

• Highly structured data conforming to an extensive ontology

• DOIs assigned to every structure

http://www.wwpdb.org

PLoS Comp. Biol. 2005 1(3) e34
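The "best case" properties on this slide (a paper is not published unless its data are deposited, and every structure has a DOI) can be sketched as two small checks. This is an illustration only: it assumes the classic four-character PDB ID format and the wwPDB DOI pattern `10.2210/pdbXXXX/pdb`, and the function `data_deposited` is a hypothetical editorial gate rather than any journal's actual workflow. IDs other than 1TIM are placeholders.

```python
import re

# Classic PDB ID: a digit 1-9 followed by three letters or digits.
# Assumption: the longer extended-ID format is not handled here.
PDB_ID = re.compile(r"[1-9][A-Za-z0-9]{3}")


def structure_doi(pdb_id: str) -> str:
    """Build a DOI string, assuming the wwPDB pattern 10.2210/pdbXXXX/pdb."""
    if not PDB_ID.fullmatch(pdb_id):
        raise ValueError(f"not a PDB ID: {pdb_id!r}")
    return f"10.2210/pdb{pdb_id.upper()}/pdb"


def data_deposited(cited_ids: list[str], deposited_ids: set[str]) -> bool:
    """Hypothetical editorial check: every structure a paper cites must be deposited."""
    deposited = {d.upper() for d in deposited_ids}
    return all(c.upper() in deposited for c in cited_ids)


print(structure_doi("1tim"))                        # 10.2210/pdb1TIM/pdb
print(data_deposited(["1tim", "9zzz"], {"1TIM"}))   # False
```

DOIs are case-insensitive, so upper-casing the ID is a presentation choice, not a correctness requirement.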

Page 22

www.rcsb.org/pdb/explore/literature.do?structureId=1TIM

Example Interoperability: The Database View

BMC Bioinformatics 2010 11:220

Page 23

Example Interoperability: The Literature View

http://biolit.ucsd.edu

Nucleic Acids Research 2008 36(S2) W385-389

Page 24

ICTP Trieste, December 10, 2007

Page 25

Semantic Tagging & Widgets are Powerful Tools to Integrate Data and Knowledge of that Data, But as Yet are Not Used Much

Will Widgets and Semantic Tagging Change Computational Biology? PLoS Comp. Biol. 6(2) e1000673


Page 26

Semantic Tagging of Database Content in The Literature or Elsewhere

http://www.rcsb.org/pdb/static.do?p=widgets/widgetShowcase.jsp

PLoS Comp. Biol. 6(2) e1000673
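Semantic tagging of database content can start as simply as recognizing identifiers in running text and turning them into links a widget can resolve. A minimal sketch, assuming the classic four-character PDB ID format and reusing the RCSB literature-view URL pattern from the Database View slide (that 2011-era URL may no longer resolve). The all-digit filter is a heuristic of this sketch, not a rule from the talk; a production tagger would confirm each candidate against the PDB.

```python
import re

# Candidate PDB IDs: a whole word made of a digit 1-9 plus three letters/digits.
CANDIDATE = re.compile(r"\b[1-9][A-Za-z0-9]{3}\b")
# Link pattern taken from the RCSB literature view cited earlier in the talk.
LINK = "http://www.rcsb.org/pdb/explore/literature.do?structureId={id}"


def tag_pdb_ids(text: str) -> str:
    """Wrap likely PDB IDs in hyperlinks; everything else is left unchanged."""

    def replace(match: re.Match) -> str:
        token = match.group(0)
        # Heuristic: all-digit tokens such as years are probably not PDB IDs.
        if token.isdigit():
            return token
        return f'<a href="{LINK.format(id=token.upper())}">{token}</a>'

    return CANDIDATE.sub(replace, text)


print(tag_pdb_ids("Compare 1tim with the 2011 data."))
```

Running this links `1tim` and leaves `2011` alone.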

Page 27


Page 28

The Publishers are Starting to Do It

From Anita de Waard, Elsevier

Page 29

This is Literature Post-processing – Better to Get the Authors Involved

• Authors are the absolute experts on the content

• More effective distribution of labor

• Add metadata before the article enters the publishing process


Page 30

Word 2007 Add-in for authors

• Allows authors to add metadata as they write, before they submit the manuscript

• Authors are assisted by automated term recognition
  – OBO ontologies
  – Database IDs

• Metadata are embedded directly into the manuscript document via XML tags, OOXML format
  – Open
  – Machine-readable

• Open source, Microsoft Public License

http://www.codeplex.com/ucsdbiolit

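The add-in's two ideas, automated term recognition and metadata embedded as XML tags, can be illustrated with a small dictionary-based tagger. This is a sketch, not the add-in's implementation: it emits plain XML with a hypothetical `<term ref="…">` element rather than OOXML custom XML, and the `EX:` identifiers are placeholders standing in for real OBO ontology accessions.

```python
import re
import xml.etree.ElementTree as ET

# Placeholder dictionary standing in for OBO ontology terms. Keys must be
# lowercase; the "EX:" IDs are made up and are not real ontology accessions.
TERMS = {
    "kinase domain": "EX:0001",
    "kinase": "EX:0002",
    "apoptosis": "EX:0003",
}


def annotate(sentence: str, terms: dict[str, str]) -> ET.Element:
    """Return a <p> element in which recognized terms are wrapped in <term> tags."""
    # Regex alternation takes the first alternative that matches, so longer
    # terms go first: "kinase domain" must be tried before "kinase".
    alternatives = sorted(terms, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(map(re.escape, alternatives)) + r")\b", re.IGNORECASE
    )
    para = ET.Element("p")
    para.text = ""
    previous = None  # most recent <term>; text after it goes in its .tail
    last_end = 0
    for match in pattern.finditer(sentence):
        gap = sentence[last_end:match.start()]
        if previous is None:
            para.text += gap
        else:
            previous.tail = gap
        previous = ET.SubElement(para, "term", {"ref": terms[match.group(0).lower()]})
        previous.text = match.group(0)
        last_end = match.end()
    rest = sentence[last_end:]
    if previous is None:
        para.text += rest
    else:
        previous.tail = rest
    return para


print(ET.tostring(annotate("A kinase domain controls apoptosis.", TERMS), encoding="unicode"))
```

Tagging at writing time, as the slide proposes, means the author can accept or reject each suggested term instead of a post-processor guessing later.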

Page 31

Challenges

• Authors – Carrot: if one or more publishers fast-tracked papers that had semantic markup, it might catch on

• Publishers – Carrot: competitive advantage


Page 32

The Promise – A Hypothetical Example

Immunology Literature

Cardiac Disease Literature

Shared Function

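Once papers in two fields carry semantic markup, a shared function can surface by intersecting their annotations. A toy sketch of that idea, assuming each paper has already been reduced to a set of function terms; all paper names and terms below are placeholders, not data from the talk.

```python
def index_by_term(corpus: dict[str, set[str]]) -> dict[str, set[str]]:
    """Invert paper -> terms into term -> papers."""
    by_term: dict[str, set[str]] = {}
    for paper, terms in corpus.items():
        for term in terms:
            by_term.setdefault(term, set()).add(paper)
    return by_term


def shared_functions(
    corpus_a: dict[str, set[str]], corpus_b: dict[str, set[str]]
) -> dict[str, tuple[set[str], set[str]]]:
    """Map each term found in both corpora to the papers on each side that use it."""
    a, b = index_by_term(corpus_a), index_by_term(corpus_b)
    return {term: (a[term], b[term]) for term in a.keys() & b.keys()}


# Placeholder annotations produced by (hypothetical) semantic markup.
immunology = {"immuno-paper-1": {"function-A", "function-B"}, "immuno-paper-2": {"function-C"}}
cardiac = {"cardiac-paper-1": {"function-B", "function-D"}, "cardiac-paper-2": {"function-E"}}
print(shared_functions(immunology, cardiac))
```

The result pairs `function-B` with the one paper on each side that mentions it; the value of markup is that these links appear without anyone reading both literatures.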

Page 33

High-throughput Biology Requires High-throughput Knowledge Discovery

Consider an Example from Our Own Work…

Roger Chang Will Give You Another Example

Page 34

The TB-Drugome

1. Determine the TB structural proteome

2. Determine all known drug binding sites from the PDB

3. Determine which of the sites found in 2 exist in 1

4. Call the result the TB-drugome

High-throughput Data Requires High-throughput Knowledge

Kinnings et al 2010 PLoS Comp Biol 6(11): e1000976
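The four steps above reduce to an all-against-all comparison between TB binding sites and drug binding sites, keeping the pairs that look alike. A toy sketch only: each site is modeled as a set of placeholder features, and Jaccard similarity with a 0.5 cutoff stands in for the binding-site comparison method used in the published work, which is not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    """Toy binding site: an owner (TB protein or drug) plus a set of features."""
    owner: str
    features: frozenset[str]


def similarity(a: Site, b: Site) -> float:
    """Jaccard similarity of feature sets (a stand-in for real site comparison)."""
    union = a.features | b.features
    return len(a.features & b.features) / len(union) if union else 0.0


def drugome(
    tb_sites: list[Site], drug_sites: list[Site], threshold: float = 0.5
) -> set[tuple[str, str]]:
    """Steps 3 and 4: keep (drug, TB protein) pairs whose sites look similar."""
    return {
        (d.owner, t.owner)
        for d in drug_sites
        for t in tb_sites
        if similarity(d, t) >= threshold
    }


# Step 1 (TB structural proteome) and step 2 (drug sites from the PDB), as toy data.
tb_sites = [
    Site("protein-A", frozenset({"f1", "f2", "f3"})),
    Site("protein-B", frozenset({"f7", "f8"})),
]
drug_sites = [Site("drug-X", frozenset({"f1", "f2"}))]
print(drugome(tb_sites, drug_sites))  # {('drug-X', 'protein-A')}
```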

Page 35

1. Determine the TB Structural Proteome

[Figure: structural coverage of the TB proteome (3,996 proteins): 284 solved structures, 1,446 homology models, 2,266 with neither]

• High-quality homology models from ModBase (http://modbase.compbio.ucsf.edu) increase structural coverage from 7.1% to 43.3%

Kinnings et al 2010 PLoS Comp Biol 6(11): e1000976
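The two coverage figures follow directly from the counts in the chart: solved structures alone, then solved structures plus homology models, each divided by the size of the proteome. A short check of that arithmetic, taking the counts from the slide:

```python
# Counts read from the slide's chart.
proteome = 3996   # proteins in the TB proteome
solved = 284      # proteins with solved structures
models = 1446     # proteins covered by ModBase homology models


def coverage(covered: int, total: int) -> float:
    """Percentage of proteins with structural coverage, to one decimal place."""
    return round(100 * covered / total, 1)


print(coverage(solved, proteome))           # 7.1
print(coverage(solved + models, proteome))  # 43.3
print(proteome - solved - models)           # 2266
```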

Page 36

2. Determine All Known Drug Binding Sites in the PDB

• Searched the PDB for protein crystal structures bound with FDA-approved drugs

• 268 drugs bound in a total of 931 binding sites

[Figure: histogram of the number of drugs (y-axis) against the number of drug binding sites per drug (x-axis); labeled drugs include methotrexate, chenodiol, alitretinoin, conjugated estrogens, darunavir and acarbose]

Kinnings et al 2010 PLoS Comp Biol 6(11): e1000976
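The histogram on this slide is a tally of a tally: first count binding sites per drug, then count how many drugs share each site count. A minimal sketch, assuming the PDB search has already produced one (structure, drug) record per binding site; the records below are placeholders.

```python
from collections import Counter


def sites_per_drug(records: list[tuple[str, str]]) -> Counter:
    """Count binding sites per drug from (structure_id, drug) records."""
    return Counter(drug for _structure_id, drug in records)


def drugs_by_site_count(per_drug: Counter) -> dict[int, int]:
    """Histogram: k -> number of drugs observed in exactly k binding sites."""
    return dict(sorted(Counter(per_drug.values()).items()))


# Placeholder records, one per binding site; a real run would come from a PDB search.
records = [("entry-1", "drug-X"), ("entry-2", "drug-X"), ("entry-3", "drug-Y")]
per_drug = sites_per_drug(records)
print(len(per_drug), sum(per_drug.values()))  # 2 3  (drugs, binding sites)
print(drugs_by_site_count(per_drug))          # {1: 1, 2: 1}
```

On the real data, the first line would correspond to the slide's "268 drugs bound in a total of 931 binding sites".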

Page 37

Map 2 onto 1 – The TB-Drugome

http://funsite.sdsc.edu/drugome/TB/

Similarities between the binding sites of M.tb proteins (blue), and binding sites containing approved drugs (red).

Page 38

From a Drug Repositioning Perspective

• Similarities between drug binding sites and TB proteins are found for 61/268 drugs

• 41 of these drugs could potentially inhibit more than one TB protein

[Figure: histogram of the number of drugs (y-axis) against the number of potential TB targets per drug (x-axis); labeled drugs include raloxifene, alitretinoin, conjugated estrogens & methotrexate, ritonavir, testosterone, levothyroxine and chenodiol]

Kinnings et al 2010 PLoS Comp Biol 6(11): e1000976
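Both bullet points on this slide are simple counts over a drug-to-predicted-targets mapping: drugs with at least one TB hit (61 of 268 on the slide) and drugs with more than one (41). A sketch of that summary, assuming the mapping has already been computed; the drugs and proteins below are placeholders.

```python
def repositioning_counts(hits: dict[str, set[str]]) -> tuple[int, int]:
    """Return (drugs with at least one TB hit, drugs with more than one)."""
    with_any = sum(1 for targets in hits.values() if targets)
    with_many = sum(1 for targets in hits.values() if len(targets) > 1)
    return with_any, with_many


# Placeholder drug -> predicted TB targets mapping.
hits = {
    "drug-X": {"protein-A", "protein-B"},
    "drug-Y": {"protein-C"},
    "drug-Z": set(),
}
any_hit, many_hits = repositioning_counts(hits)
print(f"{any_hit}/{len(hits)} drugs hit TB proteins; {many_hits} hit more than one")
# 2/3 drugs hit TB proteins; 1 hit more than one
```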

Page 39

Top 5 Most Highly Connected Drugs

levothyroxine
  Intended targets: transthyretin, thyroid hormone receptor α & β-1, thyroxine-binding globulin, mu-crystallin homolog, serum albumin
  Indications: hypothyroidism, goiter, chronic lymphocytic thyroiditis, myxedema coma, stupor
  No. of connections: 14
  TB proteins: adenylyl cyclase, argR, bioD, CRP/FNR trans. reg., ethR, glbN, glbO, kasB, lrpA, nusA, prrA, secA1, thyX, trans. reg. protein

alitretinoin
  Intended targets: retinoic acid receptor RXR-α, β & γ, retinoic acid receptor α, β & γ-1&2, cellular retinoic acid-binding protein 1&2
  Indications: cutaneous lesions in patients with Kaposi's sarcoma
  No. of connections: 13
  TB proteins: adenylyl cyclase, aroG, bioD, bpoC, CRP/FNR trans. reg., cyp125, embR, glbN, inhA, lppX, nusA, pknE, purN

conjugated estrogens
  Intended targets: estrogen receptor
  Indications: menopausal vasomotor symptoms, osteoporosis, hypoestrogenism, primary ovarian failure
  No. of connections: 10
  TB proteins: acetylglutamate kinase, adenylyl cyclase, bphD, CRP/FNR trans. reg., cyp121, cysM, inhA, mscL, pknB, sigC

methotrexate
  Intended targets: dihydrofolate reductase, serum albumin
  Indications: gestational choriocarcinoma, chorioadenoma destruens, hydatidiform mole, severe psoriasis, rheumatoid arthritis
  No. of connections: 10
  TB proteins: acetylglutamate kinase, aroF, cmaA2, CRP/FNR trans. reg., cyp121, cyp51, lpd, mmaA4, panC, usp

raloxifene
  Intended targets: estrogen receptor, estrogen receptor β
  Indications: osteoporosis in post-menopausal women
  No. of connections: 9
  TB proteins: adenylyl cyclase, CRP/FNR trans. reg., deoD, inhA, pknB, pknE, Rv1347c, secA1, sigC
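The "No. of connections" column is simply the size of each drug's TB protein list, and the same lists show which TB proteins recur across these five drugs. A short check using the TB protein lists from the table above (the ranking tie-break by name is a choice of this sketch):

```python
from collections import Counter

# TB proteins listed for each drug in the table above.
TB_TARGETS = {
    "levothyroxine": {"adenylyl cyclase", "argR", "bioD", "CRP/FNR trans. reg.", "ethR",
                      "glbN", "glbO", "kasB", "lrpA", "nusA", "prrA", "secA1", "thyX",
                      "trans. reg. protein"},
    "alitretinoin": {"adenylyl cyclase", "aroG", "bioD", "bpoC", "CRP/FNR trans. reg.",
                     "cyp125", "embR", "glbN", "inhA", "lppX", "nusA", "pknE", "purN"},
    "conjugated estrogens": {"acetylglutamate kinase", "adenylyl cyclase", "bphD",
                             "CRP/FNR trans. reg.", "cyp121", "cysM", "inhA", "mscL",
                             "pknB", "sigC"},
    "methotrexate": {"acetylglutamate kinase", "aroF", "cmaA2", "CRP/FNR trans. reg.",
                     "cyp121", "cyp51", "lpd", "mmaA4", "panC", "usp"},
    "raloxifene": {"adenylyl cyclase", "CRP/FNR trans. reg.", "deoD", "inhA", "pknB",
                   "pknE", "Rv1347c", "secA1", "sigC"},
}

# Rank by number of connections (most first), breaking ties alphabetically.
ranking = sorted(TB_TARGETS, key=lambda drug: (-len(TB_TARGETS[drug]), drug))
# How many of the five drugs are predicted to hit each TB protein.
shared = Counter(protein for proteins in TB_TARGETS.values() for protein in proteins)

for drug in ranking:
    print(drug, len(TB_TARGETS[drug]))
print(shared.most_common(2))  # [('CRP/FNR trans. reg.', 5), ('adenylyl cyclase', 4)]
```

The counts reproduce the table's 14, 13, 10, 10 and 9, and show that the CRP/FNR transcriptional regulator is connected to all five drugs.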

Page 40

We Need Better Ways to Associate Data and Knowledge and it's More than Just Text Mining of PubMed Abstracts – it's About Changing the System

Our Future is in Your Hands!

Page 41

Acknowledgements

• BioLit Team
  – Lynn Fink
  – Parker Williams
  – Marco Martinez
  – Rahul Chandran
  – Greg Quinn

• Microsoft Scholarly Communications
  – Pablo Fernicola
  – Lee Dirks
  – Savas Parastatidis
  – Alex Wade
  – Tony Hey

• RCSB PDB team
  – Andreas Prlić
  – Dimitris Dimitropoulos

• TB Drugome Team
  – Lei Xie
  – Sarah Kinnings
  – Li Xie

http://funsite.sdsc.edu/drugome/TB/

http://biolit.ucsd.edu

http://www.pdb.org

http://www.codeplex.com/ucsdbiolit