The Changing Face of Scholarly Communication and the Opportunities It Affords the Bioinformatics/Systems Biology Student
Philip E. Bourne
University of California San Diego
http://www.sdsc.edu/pb
Third UCSD Bioinformatics and Systems Biology Expo – 2/28/2011
Observation 1: Everyone in this Room is Driven by One Thing Above All Else
Observation 2: We Are a Field That Uses/Produces Public On-Line Data Like No Other
Observation 3: We Have Shaped the Way Data Are Shared – We Have Had Very Little Impact on Publications
Perhaps it is Time We Thought Less About a Publication as a Reward and More About How it Can be Presented to Maximize its Use
So What Needs to Happen?
– We need data and knowledge about that data to interoperate, i.e., we need new kinds of fast, versatile publications and data archives
– We need to be more open with both
– We need to think more about the tools that analyze, visualize and annotate data to maximize knowledge discovery
– Reward systems need to change
– We need scientist management tools
– We need to be less fixated on the big data problems
– We need to unleash the full power of the Internet
[Figure: scale alongside the list, running from Easy to Hard]
One Personal Example of Why This Needs to Happen Now
Josh Sommer – A Remarkable Young Man
Co-founder & Executive Director of the Chordoma Foundation
http://sagecongress.org/Presentations/Sommer.pdf
Chordoma
• A rare bone cancer of the skull base and spine
• No known drugs
• Treatment – surgical resection followed by intense radiation therapy
http://upload.wikimedia.org/wikipedia/commons/2/2b/Chordoma.JPG
http://sagecongress.org/Presentations/Sommer.pdf
Adapted: http://sagecongress.org/Presentations/Sommer.pdf
If I have seen further it is only by standing on the shoulders of giants
Isaac Newton
From Josh’s point of view the climb up just takes too long
> 15 years and > $850M to be more precise
http://sagecongress.org/Presentations/Sommer.pdf
http://fora.tv/2010/04/23/Sage_Commons_Josh_Sommer_Chordoma_Foundation
So We Have Seen What Needs to Change and Why. What About the How?
0. Full text of PLoS papers stored in a database
1. A link brings up figures from the paper
2. Clicking the paper figure retrieves data from the PDB, which is analyzed
3. A composite view of journal and database content results
4. The composite view has links to pertinent blocks of literature text and back to the PDB
We Need Data and Knowledge About That Data to Interoperate
1. User clicks on content
2. Metadata and webservices to data provide an interactive view that can be annotated
3. Selecting features provides a data/knowledge mashup
4. Analysis leads to new content I can share
The Knowledge and Data Cycle
PLoS Comp. Biol. 2005 1(3) e34
We Need Data and Knowledge About That Data to Interoperate – What is Stopping Us?
• Open access
• Governance – publishers vs. database providers
• Reward
• Metadata standards for provenance, privacy, etc.
• Exemplars
• …
A Small Example - The World Wide Protein Data Bank
• The single worldwide repository for data on the structure of biological macromolecules
• Vital for drug discovery and the life sciences
• 39 years old
• Free to all
http://www.wwpdb.org
We need data and knowledge about that data to interoperate
PLoS Comp. Biol. 2005 1(3) e34
The World Wide Protein Data Bank – The Best Case Scenario
• Paper not published unless data are deposited – strong data-to-literature correspondence
• Highly structured data conforming to an extensive ontology
• DOIs assigned to every structure
http://www.wwpdb.org
www.rcsb.org/pdb/explore/literature.do?structureId=1TIM
Example Interoperability: The Database View
BMC Bioinformatics 2010 11:220
Example Interoperability: The Literature View
http://biolit.ucsd.edu
Nucleic Acids Research 2008 36(S2) W385-389
ICTP Trieste, December 10, 2007
Semantic Tagging & Widgets Are Powerful Tools to Integrate Data and Knowledge of that Data, But as Yet Not Used Much
Will Widgets and Semantic Tagging Change Computational Biology? PLoS Comp. Biol. 6(2) e1000673
Semantic Tagging of Database Content in The Literature or Elsewhere
http://www.rcsb.org/pdb/static.do?p=widgets/widgetShowcase.jsp
PLoS Comp. Biol. 6(2) e1000673
Semantic Tagging
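Semantic tagging of database content starts with recognizing database identifiers in running text. The sketch below is a minimal, hypothetical illustration, not the RCSB widget code: it finds candidate PDB IDs with a regular expression and pairs each with a link built on the literature-view URL pattern quoted in this talk. The ID format rule and the name `tag_pdb_ids` are assumptions made for the example.

```python
import re

# Assumption: classic four-character PDB IDs, i.e. a digit 1-9 followed by
# three alphanumerics. The lookahead also requires at least one letter among
# those three characters, so years such as "2010" are not taken for IDs.
PDB_ID = re.compile(r"\b[1-9](?=[A-Za-z0-9]{0,2}[A-Za-z])[A-Za-z0-9]{3}\b")

# Link pattern copied from the RCSB literature view URL quoted in this talk.
LINK = "http://www.rcsb.org/pdb/explore/literature.do?structureId={}"


def tag_pdb_ids(text: str) -> list[tuple[str, str]]:
    """Return (PDB ID, link) pairs for candidate IDs in text.

    IDs are upper-cased and de-duplicated, keeping first-occurrence order.
    """
    ids = dict.fromkeys(m.group(0).upper() for m in PDB_ID.finditer(text))
    return [(pdb_id, LINK.format(pdb_id)) for pdb_id in ids]


if __name__ == "__main__":
    sentence = "Triosephosphate isomerase (1TIM) was compared with 4hhb and 1tim in 2010."
    for pdb_id, url in tag_pdb_ids(sentence):
        print(pdb_id, url)
```

A pure pattern match still yields false positives (any digit followed by three letters), so a real tagger would confirm each candidate against the PDB before linking it.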
The Publishers are Starting to Do It
From Anita de Waard, Elsevier
This is Literature Post-processing
Better to Get the Authors Involved
• Authors are the absolute experts on the content
• More effective distribution of labor
• Add metadata before the article enters the publishing process
Word 2007 Add-in for authors
• Allows authors to add metadata as they write, before they submit the manuscript
• Authors are assisted by automated term recognition
– OBO ontologies
– Database IDs
• Metadata are embedded directly into the manuscript document via XML tags, OOXML format
– Open
– Machine-readable
• Open source, Microsoft Public License
http://www.codeplex.com/ucsdbiolit
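The add-in embeds term metadata directly in the manuscript as XML. The sketch below shows only the general idea, under stated assumptions: a tiny hand-made dictionary of three Gene Ontology terms stands in for automated term recognition against OBO ontologies, and the `<term ref="...">` element is a hypothetical tag, not the add-in's actual OOXML markup. `tag_terms` is likewise a hypothetical name.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical stand-in for ontology-backed term recognition:
# lower-case term -> Gene Ontology identifier.
TERMS = {
    "apoptosis": "GO:0006915",
    "nucleus": "GO:0005634",
    "cytoplasm": "GO:0005737",
}


def tag_terms(sentence: str, terms: dict[str, str]) -> str:
    """Wrap each dictionary term in sentence in a hypothetical <term ref="..."> element."""
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, terms)) + r")\b", re.IGNORECASE
    )
    para = ET.Element("p")
    para.text = ""
    last = None  # most recent <term>; text following it belongs in its .tail
    pos = 0
    for m in pattern.finditer(sentence):
        gap = sentence[pos:m.start()]
        if last is None:
            para.text += gap
        else:
            last.tail += gap
        last = ET.SubElement(para, "term", {"ref": terms[m.group(1).lower()]})
        last.text = m.group(1)  # keep the author's original wording and case
        last.tail = ""
        pos = m.end()
    if last is None:
        para.text += sentence[pos:]
    else:
        last.tail += sentence[pos:]
    return ET.tostring(para, encoding="unicode")


if __name__ == "__main__":
    print(tag_terms("Caspases trigger apoptosis in the cytoplasm.", TERMS))
```

Because the identifiers travel inside the document as markup rather than in a separate file, any XML-aware tool can read them back out after submission.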
Challenges
• Authors – Carrot: if one or more publishers fast-tracked papers that had semantic markup, it might catch on
• Publishers – Carrot: competitive advantage
The Promise – A Hypothetical Example
[Figure: immunology literature and cardiac disease literature connected through a shared function]
High-throughput Biology Requires High-throughput Knowledge Discovery
Consider an Example from Our Own Work…
Roger Chang Will Give You Another Example
The TB-Drugome
1. Determine the TB structural proteome
2. Determine all known drug binding sites from the PDB
3. Determine which of the sites found in 2 exist in 1
4. Call the result the TB-drugome
High-throughput Data Requires High-throughput Knowledge
Kinnings et al 2010 PLoS Comp Biol 6(11): e1000976
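The four steps amount to two set constructions followed by a pairwise comparison of binding sites. A minimal sketch under loud assumptions: the protein and drug names are hypothetical placeholders, each binding site is reduced to a made-up set of features, and Jaccard similarity with an arbitrary 0.5 cutoff stands in for the binding-site comparison used in the paper, which this sketch does not reproduce.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    owner: str           # TB protein (step 1) or drug (step 2) the site belongs to
    features: frozenset  # hypothetical site descriptor, e.g. residue types lining the pocket


def structural_proteome(proteome, solved, modeled):
    """Step 1: proteins covered by a solved structure or a homology model."""
    return set(proteome) & (set(solved) | set(modeled))


def jaccard(a: frozenset, b: frozenset) -> float:
    """Stand-in site similarity: shared features divided by all features."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def drugome(tb_sites, drug_sites, covered, threshold=0.5):
    """Step 3: (TB protein, drug) pairs whose binding sites are similar enough.

    Step 2 is assumed done: drug_sites holds drug-bound sites taken from the PDB.
    The returned set is what step 4 names the TB-drugome.
    """
    return {
        (t.owner, d.owner)
        for t in tb_sites
        if t.owner in covered
        for d in drug_sites
        if jaccard(t.features, d.features) >= threshold
    }


if __name__ == "__main__":
    covered = structural_proteome(["protA", "protB", "protC"], ["protA"], ["protB"])
    tb_sites = [Site("protA", frozenset("HDSGY")), Site("protB", frozenset("KLMN"))]
    drug_sites = [Site("drugX", frozenset("HDSGW")), Site("drugY", frozenset("PQRS"))]
    print(drugome(tb_sites, drug_sites, covered))
```

Keeping step 3 as a pure function of two site collections makes it easy to swap in a different similarity measure or cutoff without touching steps 1 and 2.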
1. Determine the TB Structural Proteome
[Figure: pie chart of the TB proteome (3,996 proteins): solved structures (284), homology models (1,446), no structural coverage (2,266)]
• High-quality homology models from ModBase (http://modbase.compbio.ucsf.edu) increase structural coverage from 7.1% to 43.3%
Kinnings et al 2010 PLoS Comp Biol 6(11): e1000976
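Both coverage figures follow directly from the counts on this slide. A quick arithmetic check, assuming the 284 slice is the solved structures and the 1,446 slice is proteins covered only by homology models (the reading that reproduces the quoted 7.1% and 43.3%):

```python
# Counts read from the pie chart on this slide (Kinnings et al. 2010).
TB_PROTEOME = 3996  # all TB proteins
SOLVED = 284        # proteins with a solved structure
MODELS = 1446       # assumed: proteins covered only by a ModBase homology model


def coverage(covered: int, total: int) -> float:
    """Structural coverage as a percentage, rounded to one decimal place."""
    return round(100 * covered / total, 1)


print(coverage(SOLVED, TB_PROTEOME))           # 7.1  (solved structures only)
print(coverage(SOLVED + MODELS, TB_PROTEOME))  # 43.3 (plus homology models)
print(TB_PROTEOME - SOLVED - MODELS)           # 2266 proteins still without coverage
```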
2. Determine all Known Drug Binding Sites in the PDB
• Searched the PDB for protein crystal structures bound with FDA-approved drugs
• 268 drugs bound in a total of 931 binding sites
[Figure: number of drugs (y axis) against number of drug binding sites (x axis); labeled drugs include methotrexate, chenodiol, alitretinoin, conjugated estrogens, darunavir and acarbose]
Kinnings et al 2010 PLoS Comp Biol 6(11): e1000976
Map 2 onto 1 – The TB-Drugome
http://funsite.sdsc.edu/drugome/TB/
Similarities between the binding sites of M. tb proteins (blue) and binding sites containing approved drugs (red).
From a Drug Repositioning Perspective
• Similarities between drug binding sites and TB proteins are found for 61/268 drugs
• 41 of these drugs could potentially inhibit more than one TB protein
[Figure: number of drugs (y axis) against number of potential TB targets (x axis); labeled drugs include levothyroxine, alitretinoin, conjugated estrogens, methotrexate, raloxifene, ritonavir, testosterone and chenodiol]
Kinnings et al 2010 PLoS Comp Biol 6(11): e1000976
Top 5 Most Highly Connected Drugs
• levothyroxine – 14 connections
– Intended targets: transthyretin, thyroid hormone receptor α & β-1, thyroxine-binding globulin, mu-crystallin homolog, serum albumin
– Indications: hypothyroidism, goiter, chronic lymphocytic thyroiditis, myxedema coma, stupor
– TB proteins: adenylyl cyclase, argR, bioD, CRP/FNR trans. reg., ethR, glbN, glbO, kasB, lrpA, nusA, prrA, secA1, thyX, trans. reg. protein
• alitretinoin – 13 connections
– Intended targets: retinoic acid receptor RXR-α, β & γ, retinoic acid receptor α, β & γ-1&2, cellular retinoic acid-binding protein 1&2
– Indications: cutaneous lesions in patients with Kaposi's sarcoma
– TB proteins: adenylyl cyclase, aroG, bioD, bpoC, CRP/FNR trans. reg., cyp125, embR, glbN, inhA, lppX, nusA, pknE, purN
• conjugated estrogens – 10 connections
– Intended targets: estrogen receptor
– Indications: menopausal vasomotor symptoms, osteoporosis, hypoestrogenism, primary ovarian failure
– TB proteins: acetylglutamate kinase, adenylyl cyclase, bphD, CRP/FNR trans. reg., cyp121, cysM, inhA, mscL, pknB, sigC
• methotrexate – 10 connections
– Intended targets: dihydrofolate reductase, serum albumin
– Indications: gestational choriocarcinoma, chorioadenoma destruens, hydatidiform mole, severe psoriasis, rheumatoid arthritis
– TB proteins: acetylglutamate kinase, aroF, cmaA2, CRP/FNR trans. reg., cyp121, cyp51, lpd, mmaA4, panC, usp
• raloxifene – 9 connections
– Intended targets: estrogen receptor, estrogen receptor β
– Indications: osteoporosis in post-menopausal women
– TB proteins: adenylyl cyclase, CRP/FNR trans. reg., deoD, inhA, pknB, pknE, Rv1347c, secA1, sigC
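The "No. of connections" for each drug is simply the length of its list of TB proteins. A short sketch that recomputes those counts from these five rows and also counts which TB proteins are shared across the drugs; the function names are hypothetical, and the data are transcribed from this table only, not from the full drugome.

```python
from collections import Counter

# TB protein connections for the five most highly connected drugs,
# transcribed from the table (Kinnings et al. 2010).
CONNECTIONS = {
    "levothyroxine": ["adenylyl cyclase", "argR", "bioD", "CRP/FNR trans. reg.", "ethR",
                      "glbN", "glbO", "kasB", "lrpA", "nusA", "prrA", "secA1", "thyX",
                      "trans. reg. protein"],
    "alitretinoin": ["adenylyl cyclase", "aroG", "bioD", "bpoC", "CRP/FNR trans. reg.",
                     "cyp125", "embR", "glbN", "inhA", "lppX", "nusA", "pknE", "purN"],
    "conjugated estrogens": ["acetylglutamate kinase", "adenylyl cyclase", "bphD",
                             "CRP/FNR trans. reg.", "cyp121", "cysM", "inhA", "mscL",
                             "pknB", "sigC"],
    "methotrexate": ["acetylglutamate kinase", "aroF", "cmaA2", "CRP/FNR trans. reg.",
                     "cyp121", "cyp51", "lpd", "mmaA4", "panC", "usp"],
    "raloxifene": ["adenylyl cyclase", "CRP/FNR trans. reg.", "deoD", "inhA", "pknB",
                   "pknE", "Rv1347c", "secA1", "sigC"],
}


def rank_drugs(connections):
    """Drugs sorted by number of connected TB proteins (stable sort keeps ties in input order)."""
    return sorted(((drug, len(p)) for drug, p in connections.items()), key=lambda x: -x[1])


def shared_targets(connections, min_drugs=2):
    """TB proteins connected to at least min_drugs of the listed drugs, most shared first."""
    counts = Counter(p for proteins in connections.values() for p in set(proteins))
    return {p: n for p, n in counts.most_common() if n >= min_drugs}


if __name__ == "__main__":
    print(rank_drugs(CONNECTIONS))
    print(shared_targets(CONNECTIONS))
```

On these five rows, CRP/FNR trans. reg. is the only TB protein connected to all five drugs, and adenylyl cyclase is connected to four of them.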
We Need Better Ways to Associate Data and Knowledge, and It's More than Just Text Mining of PubMed Abstracts – It's About Changing the System
Our Future is in Your Hands!
Acknowledgements
• BioLit Team – Lynn Fink, Parker Williams, Marco Martinez, Rahul Chandran, Greg Quinn
• Microsoft Scholarly Communications – Pablo Fernicola, Lee Dirks, Savas Parastatidis, Alex Wade, Tony Hey
• RCSB PDB team – Andreas Prlić, Dimitris Dimitropoulos
• TB Drugome Team – Lei Xie, Sarah Kinnings, Li Xie
http://funsite.sdsc.edu/drugome/TB/
http://biolit.ucsd.edu
http://www.pdb.org
http://www.codeplex.com/ucsdbiolit