Scott Edmunds Open data examples, from the Science as an Open Enterprise session at Wikisym 2013

Upload: gigascience-bgi-hong-kong

Post on 28-Jan-2015

DESCRIPTION

Scott Edmunds Open data examples from the Science as an Open Enterprise workshop at Wikisym, 7th August 2013 at the Cyberport, Hong Kong

TRANSCRIPT

Page 1: Scott Edmunds Open data examples, from the Science as an Open Enterprise session at Wikisym 2013

Open Science Examples

2013

Page 2:

www.gigasciencejournal.com

Journal, data-platform and database for large-scale data

Editor-in-Chief: Laurie Goodman

Executive Editor: Scott Edmunds

Commissioning Editor: Nicole Nogoy

Lead Curator: Chris Hunter

Data Platform: Peter Li

in conjunction with

Page 3:

GigaProject: deconstructing the paper

www.gigadb.org

www.gigasciencejournal.com

Utilizes big-data infrastructure and expertise from:

World's largest genomics organisation, with 20 PB storage, 20.5K cores, 212 TFlops, >1000 bioinformaticians

Combining and integrating:

Open-access journal

Data Publishing Platform

Data Analysis Platform

Page 4:

Lessons Learned from Genomics: Sharing is important…

Bermuda Accords 1996/1997/1998

Fort Lauderdale Agreement, 2003

Page 5:

Sharing aids individuals…

Sharing Detailed Research Data Is Associated with Increased Citation Rate.

Piwowar HA, Day RS, Fridsma DB (2007) PLoS ONE 2(3): e308. doi:10.1371/journal.pone.0000308

Every 10 datasets collected contributes to at least 4 papers in the following 3 years.

Piwowar HA, Vision TJ, Whitlock MC (2011) Data archiving is a good investment. Nature 473(7347): 285. doi:10.1038/473285a

Page 6:

Sharing aids specific communities…

Rice v Wheat: consequences of publicly available genome data.

[Figure: papers per year for rice and wheat, 1996 to 2008; y-axis "Papers", 0 to 700]

Page 7:

...cool stuff you can do with:

Open-Data

Open-Access

Page 8:

Our first DOI:

To maximize its utility to the research community and aid those fighting the current epidemic, genomic data is released here into the public domain under a CC0 license. Until the publication of research papers on the assembly and whole-genome analysis of this isolate we would ask you to cite this dataset as:

Li, D; Xi, F; Zhao, M; Liang, Y; Chen, W; Cao, S; Xu, R; Wang, G; Wang, J; Zhang, Z; Li, Y; Cui, Y; Chang, C; Cui, C; Luo, Y; Qin, J; Li, S; Li, J; Peng, Y; Pu, F; Sun, Y; Chen, Y; Zong, Y; Ma, X; Yang, X; Cen, Z; Zhao, X; Chen, F; Yin, X; Song, Y; Rohde, H; Li, Y; Wang, J; Wang, J and the Escherichia coli O104:H4 TY-2482 isolate genome sequencing consortium (2011) Genomic data from Escherichia coli O104:H4 isolate TY-2482. BGI Shenzhen. doi:10.5524/100001 http://dx.doi.org/10.5524/100001

To the extent possible under law, BGI Shenzhen has waived all copyright and related or neighboring rights to Genomic Data from the 2011 E. coli outbreak. This work is published from: China.

Page 11:

“The way that the genetic data of the 2011 E. coli strain were disseminated globally suggests a more effective approach for tackling public health problems. Both groups put their sequencing data on the Internet, so scientists the world over could immediately begin their own analysis of the bug's makeup. BGI scientists also are using Twitter to communicate their latest findings.”

“German scientists and their colleagues at the Beijing Genomics Institute in China have been working on uncovering secrets of the outbreak. BGI scientists revised their draft genetic sequence of the E. coli strain and have been sharing their data with dozens of scientists around the world as a way to "crowdsource" this data. By publishing their data publicly and freely, these other scientists can have a look at the genetic structure, and try to sort it out for themselves.”

Page 13:

Downstream consequences:

1. Citations (~160)
2. Therapeutics (primers, antimicrobials)
3. Platform Comparisons
4. Example for faster & more open science

“Last summer, biologist Andrew Kasarskis was eager to help decipher the genetic origin of the Escherichia coli strain that infected roughly 4,000 people in Germany between May and July. But he knew that it might take days for the lawyers at his company — Pacific Biosciences — to parse the agreements governing how his team could use data collected on the strain. Luckily, one team had released its data under a Creative Commons licence that allowed free use of the data, allowing Kasarskis and his colleagues to join the international research effort and publish their work without wasting time on legal wrangling.”

Page 14:

1.3 The power of intelligently open data

The benefits of intelligently open data were powerfully illustrated by events following an outbreak of a severe gastro-intestinal infection in Hamburg in Germany in May 2011. This spread through several European countries and the US, affecting about 4000 people and resulting in over 50 deaths. All tested positive for an unusual and little-known Shiga-toxin–producing E. coli bacterium. The strain was initially analysed by scientists at BGI-Shenzhen in China, working together with those in Hamburg, and three days later a draft genome was released under an open data licence. This generated interest from bioinformaticians on four continents. 24 hours after the release of the genome it had been assembled. Within a week two dozen reports had been filed on an open-source site dedicated to the analysis of the strain. These analyses provided crucial information about the strain’s virulence and resistance genes – how it spreads and which antibiotics are effective against it. They produced results in time to help contain the outbreak. By July 2011, scientists published papers based on this work. By opening up their early sequencing results to international collaboration, researchers in Hamburg produced results that were quickly tested by a wide range of experts, used to produce new knowledge and ultimately to control a public health emergency.

Page 18:

New funding models

Page 19:

The rise of the independent scientist?

http://www.perlsteinlab.com/

Page 20:

Biggest crowdfunding successes

Page 21:

The People's Parrot: Amazona vittata

Puerto Rican Parrot Genome Project

Rarest parrot, national bird of Puerto Rico

Community funded from artworks, fashion shows, beer, crowdfunding…

Genome annotated by students in community college as part of bioinformatics education

Paper and Data published in GigaScience and GigaDB

Taras K Oleksyk, et al. (2012) A Locally Funded Puerto Rican Parrot (Amazona vittata) Genome Sequencing Project Increases Avian Data and Advances Young Researcher Education. GigaScience 2012, 1:14

Steven J. O’Brien (2012) Genome empowerment for the Puerto Rican parrot – Amazona vittata. GigaScience 2012, 1:13

Oleksyk et al. (2012) Genomic data of the Puerto Rican Parrot (Amazona vittata) from a locally funded project. GigaScience. http://dx.doi.org/10.5524/100039

Page 22:

Thanks to:

Our collaborators: Ruibang Luo (BGI/HKU), Shaoguang Liang (BGI-SZ), Tin-Lap Lee (CUHK), Huayen Gao (CUHK), Qiong Luo (HKUST), Senghong Wang (HKUST), Yan Zhou (HKUST)

team: Peter Li, Chris Hunter, Jesse Si Zhe, Nicole Nogoy, Tam Sneddon, Alexandra Basford, Laurie Goodman

Follow us:

@gigascience

facebook.com/GigaScience

blogs.openaccesscentral.com/blogs/gigablog/

www.gigadb.org

galaxy.cbiit.cuhk.edu.hk

www.gigasciencejournal.com

CBIIT

Funding from: