Open Data is Not Enough: Making Data Sharing Work

Unless otherwise noted, the slides in this presentation are licensed by Mark A. Parsons under a Creative Commons Attribution-Share Alike 3.0 License.

Open Data is Not Enough: Making Data Sharing Work

Mark A. Parsons (ORCID 0000-0002-7723-0950), Secretary General

American Chemical Society, San Diego, California, USA, 13 March 2016

Upload: research-data-alliance

Post on 15-Apr-2017


TRANSCRIPT


All of society’s grand challenges require diverse (often large) data to be shared and integrated across cultures, scales, and technologies.

Research Data Alliance

Vision: Researchers and innovators openly share data across technologies, disciplines, and countries to address the grand challenges of society.

Mission: RDA builds the social and technical bridges that enable open sharing of data.

Dynamics of Infrastructure

Edwards et al. 2007. Understanding Infrastructure: Dynamics, Tensions, and Design.

• Infrastructures become “ubiquitous, accessible, reliable, and transparent” as they mature.

• Systems → Networks → Inter-networks

• “system-building, characterized by the deliberate and successful design of technology-based services.”

• “technology transfer across domains and locations results in variations on the original design, as well as the emergence of competing systems.”

• Finally, “a process of consolidation characterized by gateways that allow dissimilar systems to be linked into networks.”

Not what, but When is infrastructure?

Not what, but When and Who is infrastructure?

Bridges and Gateways

Gateways are often wrongly understood as “technologies,” i.e. hardware or software alone. A more accurate approach conceives them as combining a technical solution with a social choice, i.e. a standard, both of which must be integrated into existing users’ communities of practice. Because of this, gateways rarely perform perfectly. — Edwards et al. 2007

Infrastructure is relationships, interactions, and connections between people, technologies, and institutions.

Fran Berman, Research Data Alliance

“Create - Adopt - Use” (in 12-18 months)

Systems Interoperability

Adopted Policy

Sustainable Economics

Common Types, Standards, Metadata

Traffic image: Mike Gonzalez

Adopted Community Practice

Training, Education, Workforce

Shared Principles

• Openness

• Consensus

• Balance

• Harmonization

• Community Driven

• Non-profit

Solving the problem must include adopters in the process.

Image courtesy bigthink.com

Open problem solving is key.

Figure courtesy webbirdmedia.com

No defined architecture.

Architecture figure courtesy edrawsoft.com

rd-alliance.org

• South America: 1%
• North America: 34%
• Europe: 49%
• Australasia: 4%
• Asia: 9%
• Africa: 3%

Organizational Type: Members (Feb 2016)

• Press & Media: 22
• Policy/Funding Agency: 58
• Large Enterprise: 85
• IT Consultancy/Development: 119
• Small and Medium Enterprise: 212
• Other: 198
• Government/Public Services: 583
• Academia/Research: 2447
• TOTAL: 3724

The RDA Community: 3700+ members from 110 countries (February 2016)

Members by period (chart): May-July 392; Aug-Oct 991; Nov-Jan 1274; Feb-Apr 1656; May-July 2048; Aug-Oct 2404; Nov-Jan 2636; Feb-Apr 2881; May-July 3126; Aug-Oct 3434; Nov-Jan 3698; Feb-Apr 3724

60+ Working and Interest Groups

RDA Organisational Members

RDA Affiliate Members

https://rd-alliance.org/organisation/rda-organisation-affiliate-members.html

RDA Organisational & Affiliate members

Represent the interests of RDA’s organisational members and ensure that their input and needs play a role in guiding the programs and activities of the RDA.

Fran Berman, Research Data Alliance

RDA: Accelerate Data Sharing and Interoperability Across Cultures, Communities, Scales, Technologies

▪ Technical parts of the data engine:
  ▪ Data type registries reference model
  ▪ Wheat data interoperability framework

▪ Rules of the road:
  ▪ Common agreement on data citation
  ▪ Common practice for data repositories
  ▪ Principles of legal interoperability

▪ Better drivers
  • Summer schools in data science and cloud computing in the developing world (with CODATA)
  • Active data management plan development and monitoring

Policy and Practice

Systems Interoperability

Sustainable Economics

Common Types, Standards, Metadata

Training, Education, Workforce

Working Glocally—Bridging across scales

Glocalization “means the simultaneity, the co-presence, of both universalizing and particularizing tendencies.”

— Roland Robertson

Glocalism is playing at multiple scales at once.

The Wheat Data Interoperability WG

Active members: Alaux Michael (INRA, France), Aubin Sophie (INRA, France), Arnaud Elizabeth (Bioversity, France), Baumann Ute (Adelaide Uni, Australia), Buche Patrice (INRA, France), Cooper Laurel (Planteome, USA), Fulss Richard (CIMMYT, Mexico), Hologne Odile (INRA, France), Laporte Marie-Angélique (Bioversity, France), Larmand Pierre (IRD, France), Letellier Thomas (INRA, France), Lucas Hélène (INRA, France), Pommier Cyril (INRA, France), Protonotarios Vassilis (Agro-Know, Greece), Quesneville Hadi (INRA, France), Shrestha Rosemary (INRA, France), Subirats Imma (FAO of the United Nations, Italy), Aravind Venkatesan (IBC, France), Whan Alex (CSIRO, Australia)

Co-chairs: Esther Dzalé Yeumo Kaboré (INRA, France), Richard Allan Fulss (CIMMYT, Mexico)

Aims: contribute to the improvement of wheat-related data interoperability by
• Building a common interoperability framework (metadata, data formats and vocabularies)
• Providing guidelines for describing, representing and linking wheat-related data

Contributors

Sponsors

slide courtesy Esther Dzalé

• Guidelines (http://wheatis.org/DataStandards.php)
  • Data exchange formats
    • Example: VCF (Variant Call Format) for sequence variation data, GFF3 for genome annotation data, etc.
  • Data description best practices
    • Consistent use of ontologies, consistent use of external database cross references
  • Data sharing best practices
    • Share data matrices along with relevant metadata (example: trait along with method, units and scales or environmental ones)
  • Useful tools and use cases that highlight data formats and vocabularies issues

• A portal of wheat related ontologies and vocabularies (http://agroportal.lirmm.fr/ontologies?filter=WHEAT)
  • Allows the access to the ontologies and vocabularies through APIs.

• A prototype
  • Implementation of use cases of wheat data integration within the AgroLD (Agronomic Linked Data) tool: http://volvestre.cirad.fr:8080/agrold/

The deliverables

slide courtesy Esther Dzalé
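The deliverables above name VCF (Variant Call Format) as an exchange format for sequence variation data. As a rough illustration of what such a record looks like to a consumer, the sketch below splits one tab-separated VCF data line into its eight fixed columns and unpacks the INFO field. The function name `parse_vcf_line`, the `VcfRecord` dataclass, and the sample record are illustrative assumptions, not part of the working group's guidelines; a real pipeline would normally rely on an established VCF library.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union

# The eight fixed, tab-separated columns of a VCF data line.
VCF_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


@dataclass
class VcfRecord:
    chrom: str
    pos: int
    id: Optional[str]
    ref: str
    alt: List[str]
    qual: Optional[float]
    filter: str
    info: Dict[str, Union[str, bool]]


def parse_vcf_line(line: str) -> VcfRecord:
    """Parse one VCF data line (not a '#' header line) into a VcfRecord."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < len(VCF_COLUMNS):
        raise ValueError(
            f"expected at least {len(VCF_COLUMNS)} columns, got {len(fields)}"
        )
    chrom, pos, vid, ref, alt, qual, flt, info = fields[:8]

    # INFO holds ';'-separated entries: either KEY=VALUE or a bare flag.
    parsed_info: Dict[str, Union[str, bool]] = {}
    if info != ".":
        for item in info.split(";"):
            key, sep, value = item.partition("=")
            parsed_info[key] = value if sep else True

    # A lone '.' marks a missing value in VCF.
    return VcfRecord(
        chrom=chrom,
        pos=int(pos),
        id=None if vid == "." else vid,
        ref=ref,
        alt=[] if alt == "." else alt.split(","),
        qual=None if qual == "." else float(qual),
        filter=flt,
        info=parsed_info,
    )


# Hypothetical sample record, for illustration only.
sample = "20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DP=14;AF=0.5;DB;H2"
record = parse_vcf_line(sample)
print(record.chrom, record.pos, record.ref, record.alt, record.info["DP"])
```

INFO values are left as strings here because their types are declared in the VCF header, which this sketch does not read.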

RDA Chemistry Data Interest Group

• Involves International Union of Pure & Applied Chemistry

• Building connections to instrument makers

• Connecting to other RDA groups including Materials Data and Data Citation

• Planning Working Groups

Some themes amidst the difference

1. Persistent Identifiers for data, documents, people, organisations, instruments—Everything!

2. Certifying Trust in assertions, evidence, organisations, processes…

3. The value of Conversations, Relationships, and Mediation — an agile network effect.

An Area of Convergence and Agreement

Internet Domain

nodes with IP numbers

packages being exchanged

standardized protocols

Data Domain

objects with PID numbers

objects being exchanged

standardized protocols

Slide courtesy P. Wittenberg from L. Lannom from D. Clark
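The analogy above treats persistent identifiers (PIDs) for data objects much like IP numbers for network nodes. One small, concrete property of some PID schemes is a built-in check character that lets software catch typos before attempting resolution. The sketch below checks an ORCID iD, the kind of identifier given for the speaker at the top of this transcript, using the ISO 7064 MOD 11-2 check-character scheme that ORCID documents. The function names are illustrative assumptions, and passing the check only shows that the identifier is well formed, not that it is registered.

```python
def orcid_check_char(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the 15 base digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_well_formed_orcid(orcid: str) -> bool:
    """Return True if the iD has 16 characters and a matching check character."""
    compact = orcid.replace("-", "")
    base = compact[:15]
    if len(compact) != 16 or not (base.isascii() and base.isdigit()):
        return False
    return orcid_check_char(base) == compact[15].upper()


print(is_well_formed_orcid("0000-0002-7723-0950"))  # iD from the title slide
print(is_well_formed_orcid("0000-0002-7723-0951"))  # last character altered
```

A check like this is cheap to run at data entry, which is one reason identifier schemes that carry a check character are convenient to embed in metadata.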


Increasing Complexity of Mediation

From: C. Borgman, 2008, NSF Cyberlearning Report

Trust

• When do we need to certify trust? Do we?

• We must preserve the freedom to tinker.

• Build in decentralization where possible. Any centralization must be community governed.

• Trust is built through
  • shared experience: e.g., RDA Plenaries
  • shared perspectives: RDA is a forum for engagement and constructive disagreement
  • actual reuse and adoption: in RDA, consensus is defined through use
  • sustained performance: RDA seeks to build a broad coalition of international support

Some amateur thoughts on trust and sharing and infrastructure

Getting involved

Individuals: ✓ Observers ✓ Contributors ✓ Drivers
• Members
• WGs-IGs-BoFs
• Requests for Comments
• Plenaries

Organisations: ✓ Insight ✓ Adopt ✓ Drive
• Member
• WGs-IGs-BoFs
• RfCs
• Funded projects
• Adoption/Uptake

National level: ✓ Coordination & Knowledge Exchange, Strategy &/or Implementation
• Papers & Events
• Meetings & Fora
• Training & Workshops
• Uptake pilots

https://rd-alliance.org/about/get-involved.html

12-16 September 2016 in Denver, Colorado, USA

RDA Interest (IG) and Working Groups (WG) by Focus 1

RDA Interest (IG) and Working Groups (WG) by Focus 2