ICBP Houston April 27, 2012
Communicating Systems Biology – Why and How We Should Do Better in a Digital World?
Philip E. Bourne
University of California San Diego
[email protected]
http://www.slideshare.net/pebourne/
Why We Should Do Better
• Discovery processes are increasingly complex and broad in scope
• Data must be connected more closely to the methods under study
• Science is an increasingly social endeavor
http://www.discoveryinformaticsinitiative.org/
Yolanda Gil and Haym Hirsh
Why We Should Do Better
The Scientific Process is Too Slow to Respond to a Crisis – Either Global or Personal
Motivation
http://knol.google.com/k/plos-currents-influenza#
By the time the paper is published we could all be dead
* http://www.cdc.gov/h1n1flu/estimates/April_March_13.htm
[Figure: Structure Summary page activity for H1N1 influenza-related structures, Jan. 2008 to Jul. 2010. Series: 1RUZ (1918 H1 hemagglutinin) and 3B7E (neuraminidase of A/Brevig Mission/1/1918 H1N1 strain in complex with zanamivir).]
In a time of crisis, the need for fast access to accurate data, and to any knowledge associated with that data, is paramount
If that is not enough…
For some people the scientific process may be too slow to save their life
Josh Sommer – A Remarkable Young Man
Co-founder & Executive Director of the Chordoma Foundation
http://sagecongress.org/Presentations/Sommer.pdf
Chordoma
• A rare form of brain cancer
• No known drugs
• Treatment – surgical resection followed by intense radiation therapy
http://upload.wikimedia.org/wikipedia/commons/2/2b/Chordoma.JPG
http://fora.tv/2010/04/23/Sage_Commons_Josh_Sommer_Chordoma_Foundation
Science is an Increasingly Social Endeavor
Witness the Story of Meredith
A Requirement is More Open Science
But …
Openness is Misunderstood by Scientists
• Witness the confusion regarding open access
• Witness PubMed Central
What Are the Impediments to Open Science?
Change Reward
You don’t get tenure for starting a blog!
How Can We Do Better? …
How Can We Do Better?
• Better communication, data and knowledge access, and new modes of discovery, which means:
– We need data and knowledge about that data to interoperate, i.e. we need new kinds of fast, versatile publications and data archives
– We need to be more open with both
– We need to think more about the tools that analyze, visualize and annotate data to maximize knowledge discovery
– Reward systems need to change
– We need scientist management and discovery tools
– We need to be less fixated on the big data problems
– We need to unleash the full power of the Internet
Easy Hard
Both Are Under Stress
• PubMed contains ~21M entries (May 2011)
• ~100,000 papers indexed per month
• In Feb 2009:
– 67,406,898 interactive searches were done
– 92,216,786 entries were viewed
• 1330 databases reported in NAR 2011
• MetaBase http://biodatabase.org reports 2,651 entries edited 12,587 times
PLoS Comp. Biol. 2005 1(3) e34
Some More Comparisons
Journals
• Journals have a pretty standardized interface
• Journals have a business model
• The quality is declining as numbers increase (?)
• Audience believes they are sustainable
Databases
• Efforts to make the interfaces different!
• Little attempt at a business model compared to the Web 2.0 world
• Quality is increasing (?)
• Not well sustained
PLoS Comp. Biol. 2008 4(7): e1000136
Databases versus journals
We Need Data and Knowledge About That Data to Interoperate
The Knowledge and Data Cycle
0. Full text of PLoS papers stored in a database
1. A link brings up figures from the paper
2. Clicking the paper figure retrieves data from the PDB which is analyzed
3. A composite view of journal and database content results
4. The composite view has links to pertinent blocks of literature text and back to the PDB

1. User clicks on content
2. Metadata and web services to data provide an interactive view that can be annotated
3. Selecting features provides a data/knowledge mashup
4. Analysis leads to new content I can share
PLoS Comp. Biol. 2005 1(3) e34
We Need Data and Knowledge About That Data to Interoperate – What is Stopping Us?
• Governance – publishers vs. database providers
• Reward
• Metadata standards for provenance, privacy etc.
• Exemplars
• ….
Caveat: Each discipline is different – I speak very much from a biomedical sciences perspective
A Small Example - The World Wide Protein Data Bank
• The single worldwide repository for data on the structure of biological macromolecules
• Vital for drug discovery and the life sciences
• 41 years old
• Free to all
http://www.wwpdb.org
PLoS Comp. Biol. 2005 1(3) e34
The World Wide Protein Data Bank – The Best Case Scenario
• Paper not published unless data are deposited – strong data to literature correspondence
• Highly structured data conforming to extensive ontologies
• DOIs assigned to every structure
http://www.wwpdb.org
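One practical consequence of a DOI per structure is that a citable, resolvable link can be built mechanically from a PDB identifier. The sketch below assumes the wwPDB entry DOI pattern `10.2210/pdb<id>/pdb`; the helper names `pdb_doi` and `pdb_doi_url` are illustrative and not part of any wwPDB software.

```python
import re

# A classic PDB ID is four characters: a digit followed by three alphanumerics.
PDB_ID = re.compile(r"[0-9][A-Za-z0-9]{3}")


def pdb_doi(pdb_id: str) -> str:
    """Return the DOI string for a PDB entry (illustrative helper)."""
    if not PDB_ID.fullmatch(pdb_id):
        raise ValueError(f"not a four-character PDB ID: {pdb_id!r}")
    # DOIs are case-insensitive; lower case is used here for consistency.
    return f"10.2210/pdb{pdb_id.lower()}/pdb"


def pdb_doi_url(pdb_id: str) -> str:
    """Return a link that resolves the entry DOI through doi.org."""
    return "https://doi.org/" + pdb_doi(pdb_id)


if __name__ == "__main__":
    # Structures mentioned elsewhere in these slides.
    for entry in ("1TIM", "1RUZ", "3B7E"):
        print(entry, pdb_doi_url(entry))
```

A journal or database page could emit such links wherever a structure is mentioned, so that the data to literature correspondence stays machine-followable.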
PLoS Comp. Biol. 2005 1(3) e34
www.rcsb.org/pdb/explore/literature.do?structureId=1TIM
Example Interoperability: The Database View
BMC Bioinformatics 2010 11:220
Example Interoperability: The Literature View
http://biolit.ucsd.edu
Nucleic Acids Research 2008 36(S2) W385-389
Semantic Tagging & Widgets are a Powerful Tool to Integrate Data and Knowledge of that Data, But as Yet Not Used Much
Will Widgets and Semantic Tagging Change Computational Biology? PLoS Comp. Biol. 6(2) e1000673
Semantic Tagging of Database Content in The Literature or Elsewhere
http://www.rcsb.org/pdb/static.do?p=widgets/widgetShowcase.jsp
PLoS Comp. Biol. 6(2) e1000673
Where Will It All End?
http://richard.cyganiak.de/2007/10/lod/lod-datasets_2010-09-22_colored.html
This is Literature Post-processing
Better to Get the Authors Involved
• Authors are the absolute experts on the content
• More effective distribution of labor
• Add metadata before the article enters the publishing process
Word Add-in for Authors
• Allows authors to add metadata as they write, before they submit the manuscript
• Authors are assisted by automated term recognition
– OBO ontologies
– Database IDs
• Metadata are embedded directly into the manuscript document via XML tags, OOXML format
– Open
– Machine-readable
• Open source, Microsoft Public License
http://www.codeplex.com/ucsdbiolit
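To make the idea of automated term recognition plus embedded XML tags concrete, here is a minimal sketch and not the add-in's actual implementation. It assumes a crude textual cue ("PDB", "PDB ID" or "PDB:" before a four-character identifier), and the `<dbref>` element and its attributes are invented for illustration; the real add-in stores its metadata in OOXML.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

# Hypothetical cue: tag a four-character ID only when it follows
# "PDB", "PDB ID" or "PDB:". This is deliberately crude and will
# mis-tag text such as "PDB 2010".
PDB_MENTION = re.compile(r"\b(PDB(?:\s+ID)?:?\s+)([0-9][A-Za-z0-9]{3})\b")


def tag_pdb_ids(text: str) -> str:
    """Return the text as a <p> element with PDB IDs wrapped in <dbref>."""
    parts = []
    last = 0
    for match in PDB_MENTION.finditer(text):
        # Copy everything up to the identifier, escaping XML specials.
        parts.append(escape(text[last:match.start(2)]))
        pdb_id = match.group(2).upper()
        parts.append(
            f'<dbref source="PDB" id={quoteattr(pdb_id)}>'
            f"{escape(match.group(2))}</dbref>"
        )
        last = match.end(2)
    parts.append(escape(text[last:]))
    return "<p>" + "".join(parts) + "</p>"


if __name__ == "__main__":
    tagged = tag_pdb_ids("The enzyme (PDB ID 1tim) binds A & B.")
    print(tagged)
    # The result is well-formed XML, so downstream tools can read the tags.
    print([ref.get("id") for ref in ET.fromstring(tagged).iter("dbref")])
```

The design point is the one made on the slide: the author confirms the tags while writing, so the publisher receives a manuscript that is already machine-readable.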
Challenges
• Authors
– Carrot: if one or more publishers fast-tracked a paper that had semantic markup, it might catch on
• Publishers
– Carrot: competitive advantage
The Promise – A Hypothetical Example
Immunology Literature
Cardiac Disease Literature
Shared Function
One Small Example of the Problem
• Jmol, VMD … are de facto standard, important tools for rendering biological molecules, but
• They are not versatile, i.e. they do not, for example:
– Respond to the data they are reading
– Offer views that match the user's interests
– Allow the user to annotate the data
– Allow those annotations to be shared (published?)
Think More About the Tools
GitHub is Great But We Need Apps for Science
Computational Biology Resources Lack Persistence and Usability. PLoS Comp. Biol. 2008 4(7): e1000136
Reward Systems Need to Change
What is Needed?
• Author disambiguation
• Auditing (identification and metrics) of all scholarship - means new tools
• Seniors need to promote alternative forms of scholarship
• Juniors need to respond
Ten Simple Rules for Getting Ahead as a Computational Biologist in Academia. PLoS Comp. Biol. 2011 7(1) e1002001
What Are these Alternative Forms of Scholarship?
Research [Grants]
Journal Article
Conference Paper
Poster Session
Reviews
Blogs
Community Service/Data Curation
A Unique Identifier is Going to Happen
• It is DOIs for people
• Some scientists will resist
• The winner is ORCID?
Ideally the ID will be Tagged to Every Piece of Scholarly Communication
I Am Not a Scientist, I Am a Number. PLoS Comp. Biol. 2008 4(12) e1000247
One Solution: Use the Traditional Reward System in New Ways
The Wikipedia Experiment – Topic Pages
• Identify areas of Wikipedia that relate to the journal and are either missing or stubs
• Develop a Wikipedia page in the sandbox
• Have a Topic Page Editor review the page
• Publish the copy of record with associated rewards
• Release the living version into Wikipedia
The Truth About My Laboratory
• I have ?? mail folders!
• The intellectual memory of my laboratory is in those folders
• This is an unhealthy hub and spoke mentality
We Need Scientist Management Tools
The Truth About My Laboratory
• I generate way more negative than positive data, but where is it?
• Content management is a mess
– Slides, posters …
– Data, lab notebooks …
– Collaborations, journal clubs …
• Software is open but where is it?
• Farewell is for the data too
Computational Biology Resources Lack Persistence and Usability. PLoS Comp. Biol. 2008 4(7): e1000136
http://artbyvida.com/portfolio.php
Many Great Tools Out There
Taverna
The Dream of Discovery Informatics
• At the end of the day a software agent reviews all of our lab's electronic notebooks. Common themes and individual interests are extracted and searched against recent literature, public data, blogs and other social media, and the results are returned and ranked for perusal the next morning over coffee.
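A toy sketch of that nightly agent, to show how little machinery the core loop needs. Everything here is an assumption made for illustration: themes are simply the most frequent non-stopword terms across notebook entries, and candidate items (papers, blog posts) are ranked by the summed weight of the themes they mention. A real agent would need far better text analysis and real data sources.

```python
import re
from collections import Counter

# Minimal, hand-picked stopword list (an assumption, not a standard one).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "for", "with",
             "on", "is", "we", "that", "into"}


def tokens(text):
    """Lower-case alphanumeric terms with stopwords removed."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower())
            if w not in STOPWORDS]


def extract_themes(notebooks, top_n=5):
    """Count terms across all notebook entries and keep the most frequent."""
    counts = Counter()
    for entry in notebooks:
        counts.update(tokens(entry))
    return dict(counts.most_common(top_n))


def rank_items(themes, items):
    """Rank (title, text) items by the summed weight of themes they contain."""
    scored = []
    for title, text in items:
        present = set(tokens(text))
        score = sum(w for term, w in themes.items() if term in present)
        if score > 0:
            scored.append((score, title))
    # Highest score first; ties broken alphabetically by title.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored]


if __name__ == "__main__":
    notebooks = [
        "Docking zanamivir into neuraminidase binding site",
        "Neuraminidase mutants and zanamivir resistance",
        "Chordoma cell line expression data",
    ]
    items = [
        ("Unrelated", "Graph theory results"),
        ("Blog post", "Zanamivir supply news"),
        ("Flu paper", "New neuraminidase inhibitors beyond zanamivir"),
    ]
    themes = extract_themes(notebooks, top_n=2)
    for title in rank_items(themes, items):
        print(title)
```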
Yes YouTube Can Increase the Rate of Discovery
Unleash the full power of the Internet
The Lab Experiment
Paper + Rich Media
• My students enjoyed the experience
• The shyest student was actually the boldest in front of the camera
• “We will become a generation of ‘science casters’”
• They liked the exposure for the most part – it puts them, rather than the PI, out in front
Organic Growth
• Some of their work viewed 20,000+ times
• Global audience of researchers, educators and academic/research institutions
– 60,000 unique visitors & 2M pageviews/month
– 16,000 registered users & 600 communities
– 5,000 uploads of video content (about journal articles, conferences, research news and classes)
– Growing 4-5% monthly
• Sustainability - evolving a business model supporting journals and conferences
3 Years Later
www.scivee.tv
Products
Application   Product                        Primary Customers
Journals      PubCast                        Journals, publishers, societies
Meetings      PosterCast, SlideCast          Societies, conference orgs.
Comm.         PaperCast, Podcast, SlideCast  Societies, journals
Education     PosterCast, SlideCast          Societies, universities
Books         BookCast                       Publishers, book sellers
What Emerged: SciveeCasts
Proposal - The TeachU Workflow
Step 1: presenter starts PowerPoint
Step 2: presenter starts recording on smart phone
Step 3: presenter stops recording and initiates upload
Step 4: slides are uploaded
Step 5: slides and podcast are automatically synchronized
Step 6: listener plays back synchronized presentation
Diagram labels: Android, iPhone, Windows Phone 7; Mac, PC; Slides; Podcast; Sync File; Website
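The synchronization and playback steps of the workflow can be sketched in a few lines, assuming the sync file boils down to a list of times (in seconds from the start of the recording) at which each slide appeared. That file format and the function name `slide_at` are assumptions for illustration; the slides do not describe how the sync file is actually encoded.

```python
from bisect import bisect_right


def slide_at(slide_start_times, playback_seconds):
    """Return the 0-based index of the slide showing at a given audio time.

    slide_start_times is an ascending list of the seconds at which each
    slide appeared, with the first entry equal to 0.0 (hypothetical
    sync-file content).
    """
    if playback_seconds < 0:
        raise ValueError("playback time must be non-negative")
    # bisect_right counts how many slides have started by this time.
    return max(bisect_right(slide_start_times, playback_seconds) - 1, 0)


if __name__ == "__main__":
    sync = [0.0, 42.5, 90.0, 151.2]  # made-up slide change times
    for t in (0, 42.5, 100, 1000):
        print(t, "-> slide", slide_at(sync, t) + 1)
```

Because the lookup is a binary search over the slide start times, the player can jump to any point in the podcast and still show the matching slide.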
Acknowledgements
• BioLit Team
– Lynn Fink
– Parker Williams
– Marco Martinez
– Rahul Chandran
– Greg Quinn
• MBT
– John Moreland
– John Beaver
• Microsoft Scholarly Communications
– Pablo Fernicola
– Lee Dirks
– Savas Parastatidis
– Alex Wade
– Tony Hey
• wwPDB team
– Andreas Prlić
– Dimitris Dimitropoulos
• SciVee Team
– Apryl Bailey
– Leo Chalupa
– Lynn Fink
– Marc Friedman (CEO)
– Ken Liu
– Alex Ramos
– Willy Suwanto
– Ben Yukich
http://www.scivee.tv
http://biolit.ucsd.edu
http://www.pdb.org
http://www.codeplex.com/ucsdbiolit
What Is Open Science
• Unrestricted access and reuse of scientific knowledge as found in the literature and elsewhere provided attribution is given
• Ditto the data, protocols, software etc. from which that knowledge is derived
• Something catalyzed by the Fourth Paradigm
What Motivates Me to Talk About Open Science?
• I am a domain (life) scientist not a computer or information scientist
• I have been co-directing a major open and freely accessible biological data source – the Protein Data Bank (PDB) for the past 11 years.
• Almost 6 years ago I co-founded and remain the founding Editor in Chief of the open access journal PLoS Computational Biology
• I co-founded SciVee.tv to disseminate science in new ways
• There must be a business model to enable persistence and growth
What Are the Promises of Open Science?
• To accelerate the rate of scientific discovery worldwide
• To enable contributions from a broader geographic and economic base
• To approach learning and comprehension in new ways
• To reach a broader audience including the general public
MBT Features
http://mbt.sdsc.edu
• Offers a framework, not an end-user application
• Responds to the data type
• Supports read/write access
• Encourages others to write end-user applications
• Discourages feature creep
Immunome Research, 2007 3(1):3
Immunologists
Medicinal Chemists
BMC Bioinformatics 2005, 6:21.