Real-World Data Challenges: Moving Towards Richer Data Ecosystems

Anita de Waard, 0000-0002-9034-4119, VP Research Data Collaborations, Elsevier RDM Services, [email protected]. Big Data PI Meeting, March 16, 2016. Real-World Data Challenges: Moving Towards Richer Data Ecosystems

Upload: anita-de-waard

Post on 11-Apr-2017


Category: Science



TRANSCRIPT

Page 1: Real-World Data Challenges: Moving Towards Richer Data Ecosystems


Anita de Waard, 0000-0002-9034-4119
VP Research Data Collaborations
Elsevier RDM Services
[email protected]

Big Data PI Meeting, March 16, 2016

Real-World Data Challenges: Moving Towards Richer Data Ecosystems

Page 2: Real-World Data Challenges: Moving Towards Richer Data Ecosystems


Trend # 1: Repositories are becoming virtual labs

[Figure: Planned Earth System Grid System Evolution; Planned Earth System Grid System Data Archival Model]

• System generations: ESG-I, ESG-II, ESG-CET, ESGF, ESGF-VL
• Periods: 1999-2001, 2001-2006, 2006-2011, 2011-2020, 2020-
• Capability stages: prototype capabilities, usable capabilities, future capabilities
• Archival model (1999, 2017, 2022): Centralized Archive, Distributed Data Ecosystem, Virtual Laboratory
• Data volume: Petabytes (10^15), Exabytes (10^18)
• Data sources: Intercomparison Projects; Remote Sensing, In Situ, Climatology, Diagnostics, Ecosystem, Hydrology, Biology, etc.
• Capabilities: In Situ Analysis, Data Management, Distributed Search, Federation, Distributed Computation, Provenance Capture, Authentication & Authorization, Network, Analytical Modeling, Dynamic Resources, Data Transfer, Long-tail Publication, Data Citation, Machine Learning, Workflow, Quality Control & Assurance, Metrics, User Notification, User Interface, Version Control, Replication

Source: Dean Williams, Lawrence Livermore/ESGF, March 1st 2017

Page 3: Real-World Data Challenges: Moving Towards Richer Data Ecosystems


Trend # 2: Scientists are Moving ‘Beyond Downloads’

Page 4: Real-World Data Challenges: Moving Towards Richer Data Ecosystems


Trend # 3: Computers are scientists, too!

“intelligent systems for computer-aided discovery can complement and integrate into the insight generation loop in scalable ways…”

http://ieeexplore.ieee.org/abstract/document/7515118/: Computer-Aided Discovery: Toward Scientific Insight Generation with Machine Support

“This work combines time series Principal Component Analysis with InSAR to constrain the space of possible model explanations on current empirical data sets and achieve a better identification of deformation patterns”

Page 5: Real-World Data Challenges: Moving Towards Richer Data Ecosystems


Raising many technical/organisational/policy questions:

• Is Long-Tail Data + Semantics = Big Data?
• Is Data Science a field, or a skill? (A department, or a class?)
• Are supercomputing centers research departments or bits of infrastructure? (And if infrastructure, are they part of IT? “Oh, no, anything but that!”)
• Are repositories places to store outputs, or places where science is conducted?
• If so, how are repositories and HPCs recognised and rewarded?
• How can we keep track of (micro)provenance of parts of data sets?
• Should we explore Blockchain technology for this? (“Oh no, anything but that!”)
• Is a piece of software part of the University’s Research Outputs?
• If so, how do we reward brilliant coders who blog, but don’t write?
• How do we reward (virtual) collaboration?
• Why won’t those damn scientists share their data?
• Who will own the Data Science Cloud: Amazon? Or the joint HPCs (NDS??)? Is the NIH Data Commons the model? Or is this a free-for-all? What is the role of commercial parties?
• Is data curation/stewardship a part of science, or a glorified administrator's job?
• What is the role of libraries in all this?
• And why the hell is a publisher talking about it?

Page 6: Real-World Data Challenges: Moving Towards Richer Data Ecosystems


What Elsevier is Interested in: Supporting RDM Networks

Research workflow steps:
• Find topic
• Identify gaps
• Plan & fund
• Discover data, people, methods & protocols
• Collect, analyze & visualize
• Store, preserve & share
• Publish
• Prepare, reproduce, re-use & benchmark

Systems in the network: Faculty LIMS, Lab ELN(s), data center, institutional data repositories, domain-specific repositories, data journal, journal, link to article, data search, general search

RDM support:
• Data Management Plans
• Plan each step from experiment to publication
• Metadata, methods & protocols ready for preservation and publishing
• Publish data (under embargo)
• Secure discoverability in & outside the institution

Page 7: Real-World Data Challenges: Moving Towards Richer Data Ecosystems


Biological Pathways extracted via semantic text mining

A upregulates B

B upregulates C

C increases disease D

Normalizing vocabularies required: proteins, diseases, drugs, chemicals

A → B → C → D
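The pathway statements above can be read as subject-relation-object triples that chain into a small graph once names are normalized. The following is a minimal sketch of that idea, not a description of Elsevier's actual pipeline: the triples, the `SYNONYMS` vocabulary, and the function names are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict, deque

# Hypothetical extracted statements; "b" and "disease D" stand in for
# the spelling variants that text mining typically produces.
RAW_TRIPLES = [
    ("A", "upregulates", "b"),
    ("B", "upregulates", "C"),
    ("C", "increases", "disease D"),
]

# Hypothetical vocabulary mapping variants to one canonical identifier.
SYNONYMS = {"b": "B", "disease D": "D"}


def normalize(name):
    """Map a mention to its canonical name, or keep it unchanged."""
    return SYNONYMS.get(name, name)


def build_graph(triples):
    """Build an adjacency list: subject -> [(relation, object), ...]."""
    graph = defaultdict(list)
    for subject, relation, obj in triples:
        graph[normalize(subject)].append((relation, normalize(obj)))
    return graph


def find_path(graph, start, goal):
    """Breadth-first search; returns the list of triples linking start to goal, or None."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for relation, target in graph.get(node, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, path + [(node, relation, target)]))
    return None


if __name__ == "__main__":
    graph = build_graph(RAW_TRIPLES)
    for step in find_path(graph, "A", "D") or []:
        print(" ".join(step))
```

Without the normalization step, the first two statements would not connect ("b" and "B" would be different nodes), which is why the slide lists normalized vocabularies as a requirement.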

Bioactivities through text analysis

IC50 6.3nM, kinase binding assay 10mM concentration
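As a rough illustration of how values like these might be pulled out of running text, here is a minimal sketch using a regular expression. The example sentence, the pattern, and the function name are assumptions made for illustration; production text mining would need to handle far more variation in labels, units, and phrasing.

```python
import re

# Hypothetical sentence built around the values shown on the slide.
SENTENCE = (
    "Compound X had an IC50 of 6.3nM in a kinase binding assay "
    "at 10mM concentration."
)

# Optional activity label, then a number, then a molar concentration unit.
MEASUREMENT = re.compile(
    r"(?:(?P<label>IC50|EC50|Ki)\s*(?:of|=|:)?\s*)?"
    r"(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>nM|uM|mM|M)\b"
)


def extract_measurements(text):
    """Return one dict per concentration found; label is None when absent."""
    return [
        {
            "label": match.group("label"),
            "value": float(match.group("value")),
            "unit": match.group("unit"),
        }
        for match in MEASUREMENT.finditer(text)
    ]


if __name__ == "__main__":
    for measurement in extract_measurements(SENTENCE):
        print(measurement)
```

Keeping the unit as a separate field leaves room for a later normalization step, in the same spirit as the vocabulary normalization the slide calls for.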

Chemical Structures and Properties

InChI, Name

NCBI, UniProt

EMTREE, ReaxysTree, Structures

What Elsevier is Interested in: Knowledge Graphs in Life Science

Page 8: Real-World Data Challenges: Moving Towards Richer Data Ecosystems


What Elsevier is Interested in: Knowledge Graphs in Research

Page 9: Real-World Data Challenges: Moving Towards Richer Data Ecosystems


Thank you!

Links to things we’re involved with:
• https://www.elsevier.com/connect/10-aspects-of-highly-effective-research-data
• https://www.elsevier.com/about/open-science/research-data
• https://www.hivebench.com
• https://data.mendeley.com/
• https://datasearch.elsevier.com/
• https://www.elsevier.com/books-and-journals/content-innovation/data-base-linking
• http://www.journals.elsevier.com/softwarex/
• https://www.elsevier.com/physical-sciences/earth-and-planetary-sciences/the-2015-international-data-rescue-award-in-the-geosciences
• https://rd-alliance.org/groups/rdawds-publishing-data-services-wg.html
• https://www.force11.org/
• http://www.nationaldataservice.org/
• https://rd-alliance.org/

Anita de Waard, [email protected]