“ELIXIR and Open Data: View from an ELIXIR Node”

DESCRIPTION

“ELIXIR and Open Data: View from an ELIXIR Node”, presentation given by Barend Mons, Scientific Director NBIC, at the ELIXIR Launch event, 18th December 2013

TRANSCRIPT

Page 1

European Life Sciences Infrastructure for Biological Information www.elixir-europe.org

Barend Mons, Prof. Biosemantics, LUMC, Scientific Director NBIC, Head of ELIXIR Node

18 December 2013 ELIXIR Launch, Brussels

“ELIXIR and Open Data”

View from an ELIXIR Node

Page 2

Outline

• The Dutch Node

• Data in the eScience era: Pattern Recognition and Excavation

• Data Collection, Archiving and Reduction

• Why is ELIXIR important for Open Data?

• What are the needs of clinical institutes and industry? Open and Managed Data

• Why training as part of ELIXIR?

Page 3

Nodes and a Hub in NL (DTL). Node in:

Data interoperability and exchange

Training & Education

Compute and storage infrastructure services

Page 4

The Data cycle in eScience

Data Stewardship covers the entire data cycle

Page 5

Cells and Organs

Organism

Page 6

The Data Challenge

• Computer speed and storage capacity are doubling every 18 months, and this rate is steady

• DNA sequence data has been doubling every 6-8 months over the last 3 years, and this looks set to continue for this decade

Guy Cochrane, ENA, EMBL-EBI

Proper Data stewardship and analysis may be THE limiting factor in eScience
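
The gap between the two doubling rates quoted on this slide compounds quickly. As a rough illustration only (the ten-year horizon and the helper name `growth_factor` are assumptions made for this sketch, not figures from the talk), the following Python compares how much compute and sequence data would each grow if the stated rates held:

```python
def growth_factor(months: float, doubling_months: float) -> float:
    """Multiplicative growth over `months` for a quantity that doubles
    every `doubling_months`."""
    return 2.0 ** (months / doubling_months)


# Assumed horizon for the illustration: one decade.
HORIZON_MONTHS = 120

# Doubling times taken from the slide: 18 months for compute and storage,
# 6 to 8 months for DNA sequence data.
compute_growth = growth_factor(HORIZON_MONTHS, 18)
data_growth_slow = growth_factor(HORIZON_MONTHS, 8)
data_growth_fast = growth_factor(HORIZON_MONTHS, 6)

print(f"compute and storage: x{compute_growth:,.0f}")
print(f"sequence data (8-month doubling): x{data_growth_slow:,.0f}")
print(f"sequence data (6-month doubling): x{data_growth_fast:,.0f}")
print(f"data outgrows compute by x{data_growth_slow / compute_growth:,.0f} "
      f"to x{data_growth_fast / compute_growth:,.0f}")
```

Under these assumptions, data volume outpaces raw capacity by two to four orders of magnitude within the decade, which is the sense in which stewardship and analysis, rather than hardware, become the bottleneck.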

Page 7

Simplified eScience

All Legacy information

User

New dataset

New Insights

The Goal is Knowledge Discovery, not Data Collection

Page 8

AREAL SURVEY DEEP EXCAVATION

Pattern Recognition in Open Data and detailed Excavation should be separated

Page 9

How do we discover patterns in ‘Ridiculograms’?

The Explicitome: 10^14 individual explicit associations

Page 10

The Semantic Web approach to interoperability

Cardinal Assertion

n identical assertions

‘n’ different provenances

The Unique Explicitome: 10^11 Cardinal Assertions
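
The step from many identical assertions with different provenances to a single Cardinal Assertion can be sketched as a grouping operation. This is only an illustration in plain Python, not the Semantic Web tooling the talk refers to; the `Assertion` type, the function name and every identifier in the sample data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Set, Tuple


class Assertion(NamedTuple):
    """One explicit statement plus where it was found (hypothetical model)."""
    subject: str
    predicate: str
    obj: str
    provenance: str


Triple = Tuple[str, str, str]


def to_cardinal_assertions(assertions: Iterable[Assertion]) -> Dict[Triple, Set[str]]:
    """Collapse identical triples into one entry each, keeping every provenance."""
    cardinal: Dict[Triple, Set[str]] = defaultdict(set)
    for a in assertions:
        cardinal[(a.subject, a.predicate, a.obj)].add(a.provenance)
    return dict(cardinal)


# Made-up sample data: the same statement reported by three sources.
raw = [
    Assertion("gene:A", "associated_with", "disease:X", "paper:1"),
    Assertion("gene:A", "associated_with", "disease:X", "paper:2"),
    Assertion("gene:A", "associated_with", "disease:X", "database:1"),
    Assertion("gene:B", "inhibits", "gene:A", "paper:3"),
]

cardinal = to_cardinal_assertions(raw)
for triple, sources in cardinal.items():
    print(triple, "supported by", len(sources), "source(s)")
```

Keeping the provenance set alongside each unique triple is what lets the reduced form stay traceable to its original sources.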

Page 11

We publish about fewer than a million life science (LS) concepts

10^6 concept clusters (Knowlets)

Page 12

≈99.999996% reduction of infoburden

Zipping the Explicitome

10^14 individual explicit associations

10^11 CAs

5 × 10^5 Knowlets
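
The size of each "zipping" step can be checked with simple arithmetic. The sketch below reads the slide's figures as rounded powers of ten (10^14, 10^11 and 5 × 10^5); the variable and function names are assumptions for illustration. With these rounded inputs the overall reduction works out to about 99.9999995%, the same order as the ≈99.999996% shown on the slide, which may have been computed from unrounded counts.

```python
from fractions import Fraction

# Figures as read from the slide, treated as rounded orders of magnitude.
individual_associations = 10**14
cardinal_assertions = 10**11
knowlets = 5 * 10**5


def reduction_percent(before: int, after: int) -> float:
    """Percentage by which the item count shrinks going from `before` to `after`."""
    return float((1 - Fraction(after, before)) * 100)


to_cardinal = reduction_percent(individual_associations, cardinal_assertions)
to_knowlets = reduction_percent(cardinal_assertions, knowlets)
overall = reduction_percent(individual_associations, knowlets)

print(f"associations -> cardinal assertions: {to_cardinal:.1f}%")
print(f"cardinal assertions -> knowlets:     {to_knowlets:.4f}%")
print(f"overall:                             {overall:.7f}%")
```

Exact fractions are used before converting to a float so that the many trailing nines are not distorted by intermediate rounding.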

Page 13

In silico knowledge discovery for the millions...

Reasoning takes place on aggregated and zipped data

In silico hypothesis generation

In cerebro rationalisation and confirmational reading

Experimentation

Enrichment of the explicitome

Page 14

The Implicitome

> 1 M hypotheses hidden in the implicitome

Page 15

eScience & ELIXIR

All Legacy information

User

New dataset

New Insights

The Goal is Knowledge Discovery, not Data Collection

Page 16

The Role of ELIXIR: Open versus Managed

All Legacy information

ELIXIR Data sets

Private Data sets

Tools and standards for interoperability

Clinical Data sets

Public Data sets

Data Collection(s): anywhere, everywhere

Page 17

For Big Data to become huge, however, there are still hurdles to leap. For one thing, the tools to analyse data are not yet good enough. And people with the skills to analyse data are scarce and will become scarcer. By 2018 there will be a “talent gap” of between 140,000 and 190,000 people,

Not only ‘hardware’

Only three things count: Experts, Experts, Experts

Page 18

Vision (personal, not necessarily ELIXIR)

• Data Collections are not a goal in themselves

• They ultimately serve Knowledge Discovery

• E-datastewardship in a time of plenty is thus also data zipping

• E-datastewardship should address the entire data cycle

• ELIXIR is more about being a trusted partner for the ‘Tools & Rules’ of data stewardship and interoperability than about data archiving.

• Interoperability is for both humans and computers.

• Open data needs to talk to ‘closed data’

• Data experts need the place in the egosystem they deserve

Page 19

SHARED knowledge is Double Knowledge

‘Knowledge is like love: it multiplies when shared’