from open data to open science, by geoffrey boulton

From Open Data to Open Science Geoffrey Boulton University of Edinburgh & CODATA “Learn” Workshop University College, London January 2016

Upload: learn-project

Post on 11-Jan-2017


TRANSCRIPT

Page 1: From Open Data to Open Science, by Geoffrey Boulton

From Open Data to Open Science

Geoffrey Boulton, University of Edinburgh & CODATA

“Learn” Workshop, University College, London, January 2016

Page 2: From Open Data to Open Science, by Geoffrey Boulton


Knowledge and understanding, the engines of material progress, depend on technologies that enable their accumulation and communication

1454 2002

Page 3: From Open Data to Open Science, by Geoffrey Boulton

Openness – the bedrock of science in the modern era

Henry Oldenburg

Page 4: From Open Data to Open Science, by Geoffrey Boulton

Scientific self correction

Page 5: From Open Data to Open Science, by Geoffrey Boulton


The Challenge: the “Data Storm” is undermining “self correction”

THEN AND NOW

Page 6: From Open Data to Open Science, by Geoffrey Boulton

A crisis of reproducibility and credibility?

Why such low levels of reproducibility?

• Misconduct/fraud
• Invalid reasoning
• Absent or inadequate data and/or metadata

Page 7: From Open Data to Open Science, by Geoffrey Boulton

The digital revolution: global information storage capacity, in optimally compressed bytes

Explosion of the digital revolution (chart axis years: 1986, 1993, 2000, 2007, 2014), split into analogue storage and digital storage. Values labelled on the chart: 19 exabytes, 280 exabytes, 4 000 exabytes.

Based on: http://www.martinhilbert.net/WorldOnfoCapacity.html (1 exabyte = 10^18 bytes)

Page 8: From Open Data to Open Science, by Geoffrey Boulton

http://www.wired.co.uk/news/archive/2014-01/15/1000-dollar-genome/viewgallery/331679

Data acquisition: Cost down – Flux up

Page 9: From Open Data to Open Science, by Geoffrey Boulton

Information: how much is crystallised into knowledge?

Page 10: From Open Data to Open Science, by Geoffrey Boulton

Reinventing reproducibility for the digital age

How do we retain an essential principle?

The data providing the evidence for a published concept MUST be concurrently published, together with necessary metadata and computer code. To do otherwise is scientific MALPRACTICE

Page 11: From Open Data to Open Science, by Geoffrey Boulton
Page 12: From Open Data to Open Science, by Geoffrey Boulton

Ozone Levels

Four key drivers of change for science

• Big data
• Semantically-linked data
• Open data
• Cost reduction

Micro-satellite

Looking at clouds

Page 13: From Open Data to Open Science, by Geoffrey Boulton

Pillars of the Digital Revolution

Big Data

Volume

Velocity

Variety

Linked Open Data

Many databases

Semantic Relations

Deeper meaning

Foundations : Openness

Machine analysis & learning Text and data mining

Page 14: From Open Data to Open Science, by Geoffrey Boulton

The opportunity: data from “simple” to complex systems from uncoupled to highly coupled behaviour

Uncoupled systems

Simulating behaviour of highly coupled systems

Page 15: From Open Data to Open Science, by Geoffrey Boulton

Simulating system dynamics Mapping a complex state

Image of brain cells in a rat

Emergent behaviour of a specific 6-component coupled system

• patterns not hitherto seen
• unsuspected relationships
• complex systems

e.g. complexity: dynamic evolution and system state

Scientific opportunities

Page 16: From Open Data to Open Science, by Geoffrey Boulton

Satellite observation Surface monitoring

The opportunity: data-modelling: iterative integration

Initial conditions

Model forecast

Model-data iteration - forecast correction
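The model-data iteration named on this slide can be sketched in a few lines. This is a minimal illustration only, not the method used in the talk: the function names `assimilate` and `toy_model`, the fixed `gain`, and the numbers are all assumptions chosen for the example. Each cycle runs the model forward, compares the forecast with the observation, and nudges the state towards the data.

```python
def assimilate(initial_state, observations, model_step, gain=0.5):
    """Iterate: model forecast -> compare with observation -> correct the forecast.

    `gain` (assumed constant here) sets how far the forecast is pulled
    towards each observation: 0 ignores the data, 1 replaces the forecast.
    """
    state = initial_state
    history = []
    for obs in observations:
        forecast = model_step(state)          # model forecast from the current state
        innovation = obs - forecast           # mismatch between data and forecast
        state = forecast + gain * innovation  # forecast correction
        history.append((forecast, state))
    return history


def toy_model(x):
    # Hypothetical dynamics for illustration: the state decays by 10% per step.
    return 0.9 * x


history = assimilate(10.0, [9.5, 8.0, 7.5], toy_model, gain=0.5)
for forecast, corrected in history:
    print(f"forecast={forecast:.3f} corrected={corrected:.3f}")
```

Operational systems weight the correction by the estimated uncertainties of model and observations rather than by a fixed gain; the constant here only keeps the loop readable.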

Page 17: From Open Data to Open Science, by Geoffrey Boulton

Linear regression

Cluster analysis

Dynamic/complex behaviour

Complex systems: No mathematical pipeline

Simple relationships: Classical statistics

System characterisations: from simple to complex

Glucose in type II diabetes

Topological analysis

Page 18: From Open Data to Open Science, by Geoffrey Boulton

A barrier to openness? Analytic overload, e.g. the Global Earth Observation System of Systems

• What is the human role?
• Can we analyse & scrutinise what is in the black box? And who owns the box?
• What does it mean to be a researcher in a data-intensive age?

A disconnect between machine analysis & human cognition?

Page 19: From Open Data to Open Science, by Geoffrey Boulton

Mathematics related discussions

Tim Gowers: crowd-sourced mathematics

An unsolved problem posed on his blog.

32 days – 27 people – 800 substantive contributions

Emerging contributions rapidly developed or discarded

Problem solved!

“It's like driving a car whilst normal research is like pushing it”

What inhibits such processes? The criteria for credit and promotion – ALTMETRICS THE ANSWER?

New modes of technology-enabled creativity:

e.g. crowd-sourcing

Page 20: From Open Data to Open Science, by Geoffrey Boulton

The Open Data Iceberg

The Technical Challenge

The Consent Challenge

The Ecosystem Challenge

The Funding Challenge

The Support Challenge

The Skills Challenge

The Incentives Challenge

The Mindset Challenge

Processes & Organisation

People

motivation and ethos.

Developed from: Deetjen, U., E. T. Meyer and R. Schroeder (2015). OECD Digital Economy Papers, No. 246, OECD Publishing.

A National Infrastructure

Technology

Page 21: From Open Data to Open Science, by Geoffrey Boulton

The “Science International” Accord: principles of open data

(www.icsu.org/science-international)

Responsibilities

1-2. Scientists
3. Research institutions & universities
4. Publishers
5. Funding agencies
6. Scholarly societies and academies
7. Libraries & repositories

8. Boundaries of openness

Enabling practices

9. Citation and provenance
10. Interoperability
11. Non-restrictive re-use
12. Linkability

Page 22: From Open Data to Open Science, by Geoffrey Boulton

Responsibilities

Scientists

i. Publicly funded scientists have a responsibility to contribute to the public good through the creation and communication of new knowledge, of which associated data are intrinsic parts. They should make such data openly available to others as soon as possible after their production in ways that permit them to be re-used and re-purposed.

ii. The data that provide evidence for published scientific claims should be made concurrently and publicly available in an intelligently open form. This should permit the logic of the link between data and claim to be rigorously scrutinised and the validity of the data to be tested by replication of experiments or observations. To the extent possible, data should be deposited in well-managed and trusted repositories with low access barriers.

Page 23: From Open Data to Open Science, by Geoffrey Boulton

African Open Data/Open Science Platform

Platform Forum: Coordination

Government: Priority setting

Funders: Funding

Incentives

Capacity Building

Training and Skills

Infrastructure Roadmaps

Flagship Co-Designed

Data Intensive Projects

International Standards

Programmes

Shared infrastructure investment; shared good practice; capacity building; system development

Page 24: From Open Data to Open Science, by Geoffrey Boulton

Disciplinary communities can lead the way

e.g. Elixir programme in life sciences/bio-informatics

Page 25: From Open Data to Open Science, by Geoffrey Boulton

Regional Platforms for Open Science

African Platform?

Asian Platform?

European Platform?

Australian Platform

Shared investment in infrastructure; harvesting and circulating good ideas; spreading and supporting good practice; capacity building; promoting applications; linking to international programmes and standards.

S. American Platform?

Page 26: From Open Data to Open Science, by Geoffrey Boulton

Inputs Outputs

Open access

Administrative data (held by public authorities, e.g. prescription data)

Public Sector Research data (e.g. Met Office weather data)

Research Data (e.g. CERN, generated in universities)

Research publications (i.e. papers in journals)

Open data

Open science “science as a public enterprise”

Collecting the data

Doing research

Doing science openly

Researchers - Govt & Public sector - Businesses - Citizens - Citizen scientists

(communication/dialogue – joint production of knowledge)

Stakeholders

• Communication/dialogue must be audience-sensitive
• Is it – with all stakeholder groups?

Page 27: From Open Data to Open Science, by Geoffrey Boulton

Open Science: Data / Publications

Researchers

Mono/Multi, Inter, Transdisciplinary

Stakeholders

Rigour, Innovation, Policy, Solutions

Open Knowledge

Page 28: From Open Data to Open Science, by Geoffrey Boulton

A national data-intensive system

Page 29: From Open Data to Open Science, by Geoffrey Boulton

International Research Data Collaboration

CODATA
• Policies & practice
• Frontiers of data science
• Capacity Building

WDS
• Data stewardship
• Data standards

RDA
• Interoperability

Page 30: From Open Data to Open Science, by Geoffrey Boulton

1. Maintaining “self-correction”

2. Open knowledge is creative & productive

“If you have an apple and I have an apple and we exchange these apples, then you and I will still each have one apple. But if you have an idea and I have an idea and we exchange these ideas, then each of us will have two ideas.” George Bernard Shaw

3. Open data enables semantic linking

Why openness & sharing?

Page 31: From Open Data to Open Science, by Geoffrey Boulton

• Openly collected science is already helping policy makers.
• AshTag app allows users to submit photos and locations of sightings to a team who will refer them on to the Forestry Commission, which is leading efforts to stop the disease's spread with the Department for Environment, Food and Rural Affairs (Defra).

Chalara spread: 1992-2012

Citizen Science