data for science: how elsevier is using data science to empower researchers

Post on 15-Apr-2017


DATA FOR SCIENCE: HOW ELSEVIER IS USING DATA SCIENCE TO EMPOWER RESEARCHERS

Paul Groth | @pgroth | pgroth.com

Disruptive Technology Director

Elsevier Labs | @elsevierlabs

European Data Forum 2016

12 million people per month

40 million reactions

75 million compounds

500 million facts

3 EXAMPLES

• Personalized: what should I read?

• Actionable: who should I collaborate with?

• Consumable: how do I make my data available?

RECOMMENDATIONS AT MENDELEY

• Maya Hristakeva

• Data Scientist at Mendeley

• @mayahhf

• Spark Summit 2015

• http://www.slideshare.net/SparkSummit/sparking-science-up-with-research-recommendations-by-maya-hristakeva

Read & Organize

Search & Discover

Collaborate & Network

Experiment & Synthesize

MENDELEY BUILDS TOOLS TO HELP RESEARCHERS …

BEING THE BEST RESEARCHER YOU CAN BE!

• Good researchers are on top of their game

• Large amount of research produced

• Takes time to get what you need

• Help researchers by recommending relevant research

PERSONALIZED ARTICLE RECOMMENDATION

Input: User libraries

Output: Suggested articles to read

Algorithms:

• Collaborative Filtering

– Item-based

– User-based

– Matrix Factorization

• Content-based
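As a rough illustration of the item-based collaborative filtering option listed above, here is a minimal Python sketch. It is not Mendeley's implementation (the slides name Mahout and Spark); the toy libraries, the function names `item_similarities` and `recommend`, and the choice of cosine similarity over co-readership are all assumptions made for the example.

```python
from collections import defaultdict
from math import sqrt


def item_similarities(libraries):
    """Cosine similarity between articles, based on which users hold them."""
    # Invert the input: article -> set of users whose library contains it.
    readers = defaultdict(set)
    for user, articles in libraries.items():
        for article in articles:
            readers[article].add(user)

    sims = defaultdict(dict)
    items = sorted(readers)
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            overlap = len(readers[a] & readers[b])
            if overlap:
                score = overlap / sqrt(len(readers[a]) * len(readers[b]))
                sims[a][b] = score
                sims[b][a] = score
    return sims


def recommend(user, libraries, sims, top_n=3):
    """Score unseen articles by summed similarity to the user's own articles."""
    owned = libraries[user]
    scores = defaultdict(float)
    for a in owned:
        for b, score in sims.get(a, {}).items():
            if b not in owned:
                scores[b] += score
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda b: (-scores[b], b))[:top_n]


# Hypothetical user libraries (input), not real Mendeley data.
libraries = {
    "alice": {"paper_a", "paper_b"},
    "bob": {"paper_a", "paper_b", "paper_c"},
    "carol": {"paper_b", "paper_c", "paper_d"},
}
sims = item_similarities(libraries)
print(recommend("alice", libraries, sims))  # suggested articles to read (output)
```

The pairwise loop is quadratic in the number of articles, which is fine for a toy example but is presumably one reason a production system would lean on a distributed framework instead.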

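[Figure: cost versus quality of recommender implementations. Quadrants: Costly & Good, Costly & Bad, Cheap & Good, Cheap & Bad. Plotted: Tuned IB Mahout, Tuned UB Mahout, Tuned UB Spark, Tuned IB Spark, UB DimSum (Spark MLlib), ALS Matrix Fact. (Spark MLlib). Annotations: Performance +100%, +150%, ~$50.]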

CALCULATING 75 TRILLION METRICS

• Benchmark 4600 institutions & 220 countries updated weekly

• 40 terabytes of data

• HPCC massively parallel compute system – 40 node system

ALL DATA ISN’T CURATED

60% OF TIME IS SPENT ON DATA PREPARATION

10 ASPECTS OF HIGHLY EFFECTIVE RESEARCH DATA

https://www.elsevier.com/connect/10-aspects-of-highly-effective-research-data

http://data.mendeley.com/

Each dataset receives a versioned DOI, so it can be cited

The citation for the associated article is displayed

ACADEMIC COLLABORATIONS

CONCLUSION

• Researchers are faced with an ever-growing amount of data and content

• Data Science is key to making systems that help them

• I’ve shown three Elsevier examples. Many more!

• Antonio Gulli’s codingplayground.blogspot.nl

• labs.elsevier.com

• Of course, we’re hiring

Contact: Paul Groth @pgroth
