Data Infrastructure for Coastal and Estuarine Science

Data Infrastructures for Estuarine and Coastal Science, Anne E. Thessen, http://www.slideshare.net/athessen

Upload: anne-thessen

Post on 10-May-2015


DESCRIPTION

This talk was given to the Atlantic Estuarine Research Society at its 2014 Spring meeting in Ocean City, Maryland, USA.

TRANSCRIPT

Page 1: Data Infrastructure for Coastal and Estuarine Science

Data Infrastructures for Estuarine and Coastal Science

Anne E. Thessen

http://www.slideshare.net/athessen

Page 2: Data Infrastructure for Coastal and Estuarine Science

Photo Credit: NASA/ GSFC/ NOAA/ USGS

Page 3: Data Infrastructure for Coastal and Estuarine Science

Outline

• Why are we talking about data infrastructures?
• What are the challenges?
• What are the requirements?
• What parts are already available?
• How do we get there?
• PSA

Page 4: Data Infrastructure for Coastal and Estuarine Science

Data Type            Important   Easy
Atmospheric Data     52.2%       21.6%
Climate Data         56.0%       23.3%
Oceanographic Data   42.5%       18.9%
Geophysical Data     55.5%       22.0%
Geological Data      56.3%       19.8%
Critical Zone Data   19.3%       8.2%
Hydrology Data       48.4%       20.1%

Results from EarthCube Stakeholder Alignment Survey

Why Are We Talking About Data Infrastructure?
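
The point of the survey table above is the distance between the two columns. As a minimal sketch, the gap can be computed per data type in percentage points. The percentages are transcribed from the slide; the function name `importance_ease_gap` is an illustrative assumption and not part of the talk.

```python
# Percentages transcribed from the survey table: (important, easy).
survey = {
    "Atmospheric Data": (52.2, 21.6),
    "Climate Data": (56.0, 23.3),
    "Oceanographic Data": (42.5, 18.9),
    "Geophysical Data": (55.5, 22.0),
    "Geological Data": (56.3, 19.8),
    "Critical Zone Data": (19.3, 8.2),
    "Hydrology Data": (48.4, 20.1),
}


def importance_ease_gap(table):
    """Return {data type: important - easy} in percentage points, rounded to 0.1."""
    return {
        name: round(important - easy, 1)
        for name, (important, easy) in table.items()
    }


gaps = importance_ease_gap(survey)

# Print the data types from largest gap to smallest.
for name, gap in sorted(gaps.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<20} {gap:5.1f}")
```

On these numbers the gap is largest for geological data (36.5 points) and smallest for critical zone data (11.1 points), where both columns are low.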

Page 5: Data Infrastructure for Coastal and Estuarine Science

Working with multiple data sets from many disciplines?

Working with multiple data sets within a discipline?

88.1% say it is important
23.5% say it is easy

70.7% say it is important
9.8% say it is easy

Results from EarthCube Stakeholder Alignment Survey

Why Are We Talking About Data Infrastructure?

Page 6: Data Infrastructure for Coastal and Estuarine Science

Why Are We Talking About Data Infrastructure?

• “Data Deluge”
• Large-scale problems
• Maturation of the internet
• Increased investment (e.g., EarthCube)
• Estuarine and coastal science has an interdisciplinary nature and a strong sharing culture

Page 7: Data Infrastructure for Coastal and Estuarine Science

User Needs

Where Do We Start?

Available Technology

Existing Infrastructure

Incentives

Page 8: Data Infrastructure for Coastal and Estuarine Science

Sociological

• Data sharing
• Incentives
• Data cultures
• Science practices
• Massive heterogeneity

Technological

• Storage capacity
• Moving data around
• Efficient query
• Processing speed
• Knowledge representation

Page 9: Data Infrastructure for Coastal and Estuarine Science

Stakeholder Assessment

Data producers

Photo Credit: The University of Nottingham

Photo Credit: Kay Nietfeld/EPA

Data consumers

Page 10: Data Infrastructure for Coastal and Estuarine Science

What is the current state of sharing?

• Data sharing varies widely by discipline
  – No universal rules or agreements
  – Sharing in marine science is 40%
  – Other disciplines: 10% to 100%

Page 11: Data Infrastructure for Coastal and Estuarine Science

What is the current state of sharing?

• Data sharing varies widely and by discipline
• Far more scientists say they are willing to share data than actually do
  – Time to prepare
  – Concerns about misuse

Page 12: Data Infrastructure for Coastal and Estuarine Science

What is the current state of sharing?

• Data sharing varies widely and by discipline
• Far more scientists say they are willing to share data than actually do
• Lack of access to data is a major impediment

Page 13: Data Infrastructure for Coastal and Estuarine Science

If sharing is so important, why aren’t more people doing it?

The large proportion of researchers who claim to be willing to share data and the low numbers of researchers who actually make their data easily available suggest that data sharing would increase substantially if the proper infrastructure were in place.

Page 14: Data Infrastructure for Coastal and Estuarine Science

Reasons for Not Sharing

• Not enough time or funding
• No place to put the data
• No standards or policies for sharing
• Others have no need for the data
• Loss of control
• No way to get credit
• Sensitive data cannot be shared
• Errors will be exposed
• Loss of competitiveness

Page 15: Data Infrastructure for Coastal and Estuarine Science

Social Infrastructure Requirements

• Repository capability
• Place conditions on access
• Mechanisms for data citation and credit
• Data sharing policy
• Value added services
• Requirements from publishers and funders
• Respect for confidentiality
• Ease of use
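
One of the requirements above, a mechanism for data citation and credit, can be illustrated with a small sketch that turns a repository metadata record into a citation string. The `DatasetRecord` fields, the `format_citation` helper, and every value in the example record are hypothetical. The creator, year, title, repository, identifier ordering is one common pattern for data citations and is an assumption here, not something the talk prescribes.

```python
from dataclasses import dataclass


@dataclass
class DatasetRecord:
    """Hypothetical minimal metadata a repository might hold for a data set."""
    creators: list[str]
    year: int
    title: str
    repository: str
    identifier: str  # a persistent identifier such as a DOI


def format_citation(record: DatasetRecord) -> str:
    """Build a citation string: Creators (Year): Title. Repository. Identifier"""
    authors = "; ".join(record.creators)
    return (
        f"{authors} ({record.year}): {record.title}. "
        f"{record.repository}. {record.identifier}"
    )


# All values below are placeholders for illustration only.
example = DatasetRecord(
    creators=["Doe J", "Roe R"],
    year=2014,
    title="Hypothetical estuary salinity survey",
    repository="Example Data Repository",
    identifier="doi:10.0000/example",
)
citation = format_citation(example)
print(citation)
```

A persistent identifier in the citation is what lets reuse be counted and credited back to the data producer.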

Page 16: Data Infrastructure for Coastal and Estuarine Science

We need a system that can

• Share
• Preserve
• Digitize
• Automate
• Integrate
  – Data
  – Infrastructure

Page 17: Data Infrastructure for Coastal and Estuarine Science

Data Set Size

Page 18: Data Infrastructure for Coastal and Estuarine Science

Data Set Heterogeneity

• Data format
• Data file format
• Data quality and completeness
• Physical samples

Page 19: Data Infrastructure for Coastal and Estuarine Science

What Will We Do With the Data?

• Preserve Data
  – Format migration
  – Redundancy
  – Self-Repair

• Serve Data
  – Discoverable
  – Accessible
  – Usable
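
Redundancy and self-repair, listed under "Preserve Data" above, are often combined: keep several copies, compare their checksums, and overwrite any copy that disagrees with the majority. The sketch below shows that idea under simple assumptions (local files, SHA-256, majority vote). The function names, file names, and sample contents are hypothetical and do not describe any particular repository's implementation.

```python
import hashlib
import tempfile
from collections import Counter
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def audit_and_repair(replicas: list[Path]) -> list[Path]:
    """Overwrite replicas whose checksum differs from the majority.

    Returns the list of replicas that were repaired.
    """
    digests = {path: sha256_of(path) for path in replicas}
    majority_digest, votes = Counter(digests.values()).most_common(1)[0]
    if votes <= len(replicas) / 2:
        # Without a strict majority we cannot tell which copy is intact.
        raise RuntimeError("no majority copy; manual intervention needed")

    good_copy = next(p for p, d in digests.items() if d == majority_digest)
    good_bytes = good_copy.read_bytes()

    repaired = []
    for path, digest in digests.items():
        if digest != majority_digest:
            path.write_bytes(good_bytes)
            repaired.append(path)
    return repaired


# Demonstration with three replicas of a made-up file, one silently corrupted.
with tempfile.TemporaryDirectory() as tmp:
    copies = [Path(tmp) / f"replica_{i}.csv" for i in range(3)]
    for copy in copies:
        copy.write_text("station,salinity\nCB1,12.4\n")
    copies[1].write_text("station,salinity\nCB1,99.9\n")  # simulated corruption

    repaired = audit_and_repair(copies)
    all_match = len({sha256_of(c) for c in copies}) == 1

print([p.name for p in repaired], all_match)
```

Majority voting needs at least three copies to repair a single bad one. With only two copies a mismatch can be detected but not resolved, which is why the sketch raises an error instead of guessing.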

Page 20: Data Infrastructure for Coastal and Estuarine Science

Technical Infrastructure Requirements

• Preservation
• Layered service architecture
• Repository functions
• Accommodate heterogeneity
• Bridge digital and physical

Page 21: Data Infrastructure for Coastal and Estuarine Science

Review Requirements

Sociological

• Repository capability
• Place conditions on access
• Mechanisms for data citation and credit
• Data sharing policy
• Value added services
• Requirements from publishers and funders
• Respect for confidentiality
• Ease of use

Technological

• Preservation
• Layered service architecture
• Repository functions
• Accommodate heterogeneity
• Bridge digital and physical

Page 22: Data Infrastructure for Coastal and Estuarine Science

What is Available?

Repositories

Page 23: Data Infrastructure for Coastal and Estuarine Science

What is Available?

Citation

Repositories

Page 24: Data Infrastructure for Coastal and Estuarine Science

What is Available?

Preservation

Repositories

Citation

Page 25: Data Infrastructure for Coastal and Estuarine Science

What is Available?

Quality Control and Usage Metrics

Repositories

Citation

Preservation

Crowd Sourcing

Web 2.0

Page 26: Data Infrastructure for Coastal and Estuarine Science

What is Available?

Integration

Repositories

Citation

Preservation

Quality and Metrics

Web 3.0

Page 27: Data Infrastructure for Coastal and Estuarine Science

What is Available?

Mobilization

Repositories

Citation

Preservation

Quality and Metrics

Integration

Page 28: Data Infrastructure for Coastal and Estuarine Science

What is Available?

Access Protocols

Web Services

Data Brokers

Repositories

Citation

Preservation

Quality and Metrics

Integration

Mobilization

Page 29: Data Infrastructure for Coastal and Estuarine Science

What is Available?

Standards

Repositories

Citation

Preservation

Quality and Metrics

Integration

Mobilization

Access

Page 30: Data Infrastructure for Coastal and Estuarine Science

How Can it all Fit Together?

Quality and Metrics

Access

Citation

Preservation

Mobilization

Integration

Repositories

Standards

Page 31: Data Infrastructure for Coastal and Estuarine Science

Who Should Be Doing All This Work?

• Librarians
• Data Scientists
• Informaticians
• Ontologists
• Computer Scientists
• Software Developers
• Standards Groups

Image by Michael Krigsman

Page 32: Data Infrastructure for Coastal and Estuarine Science

PSA

Page 33: Data Infrastructure for Coastal and Estuarine Science

Why Share Data?

• Increased recognition
• Increased economic opportunities
• Improved data set
• Improved science
• Time and money saved

Page 34: Data Infrastructure for Coastal and Estuarine Science

Photo Credit: Emergency Cleaning Solutions

Page 35: Data Infrastructure for Coastal and Estuarine Science

Photo Credit: The Collared Sheep

Page 36: Data Infrastructure for Coastal and Estuarine Science
Page 37: Data Infrastructure for Coastal and Estuarine Science

Acknowledgements

• Benjamin Fertig
• David Patterson
• Mike Kemp
• John Milliman
• Melissa Cragin
• Sayeed Choudhury
• Tim DiLauro
• Carol Palmer
• Nathan Wilson
• Alan Renear
• Ruth Duerr
• Cyndy Chandler
• Peter Fox
• Krishna Sinha
• Janet Fredericks
• Carl Lagoze

Page 38: Data Infrastructure for Coastal and Estuarine Science

Questions?
