Society for Biocuration panel discussion, April 2013
Biocuration and scholarly communication cycle: roles and opportunities for biocurators
Panel Discussion
Theo Bloom, Editorial Director for Biology, PLOS
Hinxton, April 2013
Take-home / talking points / provocation
• The needs and motivations of authors and ‘users’ of the literature differ
• Some studies output structured data well
• Many/most studies don’t, and here’s our biggest problem
• We need to move towards universal solutions and away from bespoke ones
• In the meantime there is a lot of help needed
What authors want
Publication credit
Kudos - “good journal”
First / fast
Easy
Compliant
What readers/users want
Reusability
Thorough
Complete
Replicable
Compliant
Growth in the cost of traditional publishing
PLOS Biology: October 2003
PLOS Medicine: October 2004
PLOS Community Journals: June-September 2005; October 2007
PLOS ONE: December 2006
Growth of PLOS journals and of Open Access
[Chart, x-axis 2000 to 2012. Series: PLOS; Open access total (secondary axis). Axis scales: 0 to 30,000 and 0 to 120,000.]
PLOS ONE’s Key Innovation: the editorial process
Editorial criteria
• Scientifically rigorous
• Ethical
• Properly reported
• Conclusions supported by the data
Editors and reviewers do not ask
• How important is the work?
• Which is the relevant audience?
Everything that deserves to be published, will be published
Two types of study generating data
Type 1: the structured data is the output
• e.g. DNA sequence, protein structure, clinical trial results
• Often large-scale / high-throughput
• Curators and databases support these really well, even small-scale studies
Type 2: the “paper” is the output
• No structured database exists / no widespread agreement on standards
Provocation: the solutions we’re all proposing don’t deal with the main problems
• Adding more steps and checks to publication makes it slow (unpopular with authors)
• Assuming an expert editor handling each article makes it slow and expensive
• Some authors are moving towards preprints, blog-style publications, and definitely away from traditional journals (e.g. PLOS ONE)
• We need to fix the problems at the time the studies are done
Do some science
Write a description
Store some of the data somewhere…
Do some science
Write a narrative description that is inextricably linked to the data and methods
Integrated collection of methods, results, data, metadata
Store all of the data somewhere useful and link to publication
Where should data go?
• Curated, subject-specific, open access, long-term databases (GenBank, ArrayExpress)
• General non-specific repositories: Dryad, FigShare, Institutional (bigger is better? Can we have a ‘kite-marked’ list?)
• Supplementary files with the article (heterogeneous, poorly formatted, hard to collate/mine)
• NOT: the author’s website or file drawer
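The preference order on this slide can be sketched as a small lookup, purely as an illustration. The database and repository names come from the slides (PDB is named on a later slide); the data-type keys and the `suggest_home` function are hypothetical, and treating the bullet order as a strict priority is an assumption rather than something the talk states.

```python
# Hypothetical mapping from a data type to a curated, subject-specific database.
# Database names appear in the slides; the data-type keys are illustrative only.
CURATED_HOMES = {
    "dna_sequence": "GenBank",
    "microarray": "ArrayExpress",
    "protein_structure": "PDB",
}

# General repositories named in the talk, used when no curated home exists.
GENERAL_REPOSITORIES = ["Dryad", "figshare", "institutional repository"]


def suggest_home(data_type: str) -> str:
    """Return a suggested deposit location for a dataset of the given type."""
    curated = CURATED_HOMES.get(data_type)
    if curated is not None:
        return curated
    # Assumption: fall back to a general repository rather than
    # supplementary files or the author's own website.
    return GENERAL_REPOSITORIES[0]
```

Calling `suggest_home("protein_structure")` picks the curated database, while an unrecognised type such as `"field_notes"` falls through to a general repository.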
Steps towards better data handling - 1
Partnership with Dryad (www.datadryad.org)
• Unstructured data ‘packages’ associated with published articles
• Freely available - CC0
• A unique identifier (DOI) for each package
• Statistics for access
• Seamless tying together of article and data
Partnership with figshare (www.figshare.org)
• figshare widget displays Supporting Information files directly in the article
• search, magnify, download singly or as a package
What to do with ‘homeless’ data?
Steps towards better data handling - 2
Planning in hand for ‘data papers’
• Describes reusable dataset to support reuse
• Publishes associated metadata
• Structured data cross-referenced to its “natural home” (e.g., protein structures to PDB)
• Unstructured data in PLOS Dataverse instance
• Ensures valuable data actionable for reuse
  • actionable formats
  • curated to reasonable standard
  • accessible in a recognized, stable repository
• Inherently reusable data
  • Valid experimental / observational design
  • Good quality control, ethical experiments
  • Data perceived to have “standalone” value
PLOS + partnerships: Reproducibility Initiative
Publish your Big Paper
Send it to Science Exchange to reproduce
Independent scientists attempt to reproduce the study
Success! Science Exchange issues a certificate of validation, which is posted on the paper
Reproduction is published in PLOS ONE and data is stored at figshare
Hopefully publish in PLOS ONE although this is not required
Failure! Authors think long and hard about what they’ve done
Take-home / talking points / provocation
• The needs and motivations of authors and ‘users’ of the literature differ
• Some studies output structured data well
• Many/most studies don’t, and here’s our biggest problem
• We need to move towards universal solutions and away from bespoke ones
• In the meantime there is a lot of help needed