Data Science with R for Java Developers

Data Science With R ~ for ~ Java Developers @Sander_Mak

Upload: sander-mak-sandermak

Post on 12-May-2015


DESCRIPTION

As presented at JavaOne 2013

TRANSCRIPT

Page 1: Data Science with R for Java developers

Data Science With R

~ for ~

Java Developers

@Sander_Mak

Page 2: Data Science with R for Java developers

Agenda

Data Science

The R language

Gimme some Java!


Page 3: Data Science with R for Java developers

90% of the world’s data was produced in the last 2 years

- SINTEF/ScienceDaily, June 2013

We need more than just CRUD

Page 4: Data Science with R for Java developers

Stand back.

I know Data Science!

Page 5: Data Science with R for Java developers

Software Engineering

Domain Expertise

Math & Statistics

Data Science

Machine Learning

Operations Research

Danger! Perl ahead!


Page 7: Data Science with R for Java developers

Data Science: Achievement Unlocked

Page 8: Data Science with R for Java developers

R, RStudio

Today

Data Science: Achievement Unlocked

Page 9: Data Science with R for Java developers

Agenda

Data Science

The R language

Gimme some Java!


Page 10: Data Science with R for Java developers

Language Designers

Statisticians

Page 11: Data Science with R for Java developers

Language Designers?

Statisticians?

The best thing about R is that it was developed by statisticians. The worst thing about R is that... it was developed by statisticians. - Bo Cowgill, Google

Page 12: Data Science with R for Java developers

Why R, then?

Open Source

De-facto standard (in statistical research)

“It’s a DSL posing as general purpose language”

Interactive data exploration

Page 13: Data Science with R for Java developers

Why not R, then?

Slow

Memory Bound

(Did I mention it’s a quirky language?)

Try googling for R...

Page 14: Data Science with R for Java developers

Why not R, then?

‘If you are using R and you think you’re in hell, this is a map for you.’

- The R Inferno

Slow

Memory Bound

(Did I mention it’s a quirky language?)

Try googling for R...

Page 15: Data Science with R for Java developers

Apparently, statisticians aren’t designers, either...

Page 16: Data Science with R for Java developers

VS

Page 17: Data Science with R for Java developers

R: Dynamic (eval) vs. Java: Static types

R: Interpreted vs. Java: Compiled

R: Functional/OO/Procedural vs. Java: OO

Page 18: Data Science with R for Java developers

R: Factor vs. Java: Enum

R: numeric vs. Java: Integer/Double/...

R: character vs. Java: String

Page 19: Data Science with R for Java developers

R: Factor vs. Java: Enum

R: numeric vs. Java: Integer/Double/...

R: character vs. Java: String

R: vector, list, data frame
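
The Factor/Enum pairing can be made concrete from the Java side. The following is only a sketch under assumptions: the `Sex` enum and the `countLevels` helper are hypothetical names, and the counting is a rough analogue of what R's `table()` reports for a factor, not a reimplementation of it.

```java
import java.util.Arrays;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

final class FactorSketch {
    // Hypothetical enum playing the role of an R factor with two levels.
    enum Sex { FEMALE, MALE }

    // Counts observations per level, roughly like table() on an R factor.
    static Map<Sex, Integer> countLevels(List<Sex> values) {
        Map<Sex, Integer> counts = new EnumMap<>(Sex.class);
        for (Sex level : Sex.values()) {
            counts.put(level, 0); // every level is listed, even with zero observations
        }
        for (Sex value : values) {
            counts.merge(value, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Sex> column = Arrays.asList(Sex.MALE, Sex.FEMALE, Sex.FEMALE);
        System.out.println(countLevels(column)); // prints {FEMALE=2, MALE=1}
    }
}
```

Like a factor, the enum fixes the set of allowed levels up front, which is why levels with no observations still show up in the result.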

Page 20: Data Science with R for Java developers

R: 1-based (1 2 3 4) vs. Java: 0-based (0 1 2 3)

Page 21: Data Science with R for Java developers

R: 1-based (1 2 3 4) vs. Java: 0-based (0 1 2 3)

R: higher-order functions vs. Java: for-loops

sapply(vec, function(elm) { elm + 1 })
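
For comparison, here is a sketch of what the `sapply` call looks like on the Java side. It assumes Java 8 or later for the stream version; the class and method names are made up for illustration. The loop version is included to show the 0-based index that R developers would write starting from 1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class SapplySketch {
    // Stream version: closest in spirit to sapply(vec, function(elm) { elm + 1 }).
    static List<Integer> addOne(List<Integer> vec) {
        return vec.stream()
                  .map(elm -> elm + 1)
                  .collect(Collectors.toList());
    }

    // Classic for-loop version; note the 0-based index, where R would start at 1.
    static List<Integer> addOneWithLoop(List<Integer> vec) {
        List<Integer> result = new ArrayList<>(vec.size());
        for (int i = 0; i < vec.size(); i++) {
            result.add(vec.get(i) + 1);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> vec = Arrays.asList(1, 2, 3, 4);
        System.out.println(addOne(vec));         // prints [2, 3, 4, 5]
        System.out.println(addOneWithLoop(vec)); // prints [2, 3, 4, 5]
    }
}
```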

Page 22: Data Science with R for Java developers

R: lazy evaluation vs. Java: eager evaluation

Page 23: Data Science with R for Java developers

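R: lazy evaluation vs. Java: eager evaluation

R: pass-by-value (copy-on-write) vs. Java: pass-by-reference

[Diagram: call F(A) passes Value A to Function F; when F modifies it, the change goes to a copy, Value A’, and the caller’s Value A is unchanged]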
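
The difference in call semantics can be sketched in plain Java. All names here are hypothetical. Strictly speaking Java passes object references by value, which is what lets a method mutate the caller's object. The copying method only imitates the effect R gives by default: it copies up front, whereas R delays the copy until the first modification. The `Supplier` parameter is one common way to imitate a lazily evaluated argument.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

final class CallSemanticsSketch {
    // Java style: the method receives a reference to the caller's list and mutates it.
    static void appendInPlace(List<Integer> values) {
        values.add(99);
    }

    // R-like effect: work on a copy so the caller's list stays untouched.
    // (Eager copy; R itself would copy only when the modification happens.)
    static List<Integer> appendToCopy(List<Integer> values) {
        List<Integer> copy = new ArrayList<>(values);
        copy.add(99);
        return copy;
    }

    // Imitates a lazy argument: the fallback is computed only if it is needed.
    static int firstOrElse(List<Integer> values, Supplier<Integer> fallback) {
        return values.isEmpty() ? fallback.get() : values.get(0);
    }

    public static void main(String[] args) {
        List<Integer> original = new ArrayList<>(Arrays.asList(1, 2));
        List<Integer> copy = appendToCopy(original);
        System.out.println(original); // prints [1, 2]
        System.out.println(copy);     // prints [1, 2, 99]

        appendInPlace(original);
        System.out.println(original); // prints [1, 2, 99]

        // The supplier is never invoked because the list is not empty.
        System.out.println(firstOrElse(original, () -> 0)); // prints 1
    }
}
```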

Page 24: Data Science with R for Java developers

RStudio

Page 25: Data Science with R for Java developers

Central

Comprehensive R Archive Network (CRAN)

RStudio

Page 26: Data Science with R for Java developers

Coding time!

Page 29: Data Science with R for Java developers

Titanic Competition: Machine Learning from Disaster

Decision Tree

[Figure: decision tree with tests Sex == Female, Age > 50, Age > 16 and Fare > 100; leaves labeled T or F]
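
A decision tree is just nested conditionals, which the following sketch makes explicit. The four tests come from the slide, but how they are arranged and which leaf is T or F cannot be recovered from the transcript, so the arrangement and the leaf values below are assumptions chosen only for illustration.

```java
final class DecisionTreeSketch {
    // Hypothetical arrangement of the slide's four tests with five leaves.
    // The leaf values are illustrative assumptions, not the trained model.
    static boolean predict(boolean female, double age, double fare) {
        if (female) {
            return !(age > 50); // leaf pair under Sex == Female
        }
        if (age > 16) {
            return fare > 100;  // leaf pair under Age > 16
        }
        return true;            // remaining leaf
    }

    public static void main(String[] args) {
        System.out.println(predict(true, 30, 10));   // prints true
        System.out.println(predict(false, 40, 10));  // prints false
        System.out.println(predict(false, 40, 200)); // prints true
    }
}
```

In practice the tree is learned from the training data rather than written by hand; the sketch only shows what the learned model amounts to at prediction time.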

Page 30: Data Science with R for Java developers

Titanic Competition: Machine Learning from Disaster

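Decision Tree

[Figure: decision tree with tests Sex == Female, Age > 50, Age > 16 and Fare > 100; leaves labeled T or F]

Random Forest

[Figure: several decision trees side by side, each with leaves labeled T or F]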
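
A random forest combines many trees by letting them vote. The sketch below shows only that voting step, with three hand-written stand-in rules instead of trained trees. The `Passenger` class, the rules and all names are hypothetical. A real random forest would train each tree on a random sample of the rows and consider random subsets of the features.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

final class ForestVoteSketch {
    // Minimal passenger record with the three features used on the slide.
    static final class Passenger {
        final boolean female;
        final double age;
        final double fare;

        Passenger(boolean female, double age, double fare) {
            this.female = female;
            this.age = age;
            this.fare = fare;
        }
    }

    // Three hand-written stand-ins for trained trees (hypothetical rules).
    static List<Predicate<Passenger>> handWrittenTrees() {
        Predicate<Passenger> bySex = p -> p.female;
        Predicate<Passenger> byAgeOrFare = p -> p.age <= 16 || p.fare > 100;
        Predicate<Passenger> mixed = p -> p.female ? p.age <= 50 : p.fare > 100;
        return Arrays.asList(bySex, byAgeOrFare, mixed);
    }

    // The forest predicts true when more than half of the trees vote true.
    static boolean majorityVote(List<Predicate<Passenger>> trees, Passenger p) {
        long votesForTrue = trees.stream().filter(tree -> tree.test(p)).count();
        return votesForTrue * 2 > trees.size();
    }

    public static void main(String[] args) {
        List<Predicate<Passenger>> trees = handWrittenTrees();
        System.out.println(majorityVote(trees, new Passenger(true, 30, 10)));  // prints true
        System.out.println(majorityVote(trees, new Passenger(false, 40, 10))); // prints false
    }
}
```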

Page 31: Data Science with R for Java developers

Demo time!

Page 32: Data Science with R for Java developers

...

...

Page 33: Data Science with R for Java developers

Agenda

Data Science

The R language

Gimme some Java!


Page 34: Data Science with R for Java developers

Bridging R and Java

Integrate

Assimilate

Replace

Page 35: Data Science with R for Java developers

Integrate

rJava & Java/R interface

Two-way native interface - JNI: libjri - or TCP to Rserve

import org.rosuda.JRI.REXP;
import org.rosuda.JRI.RVector;
import org.rosuda.JRI.Rengine;

Rengine re = new Rengine(new String[] {}, false, null);

// wait until engine is ready
if (!re.waitForR()) {
    throw new IllegalStateException("Can't load R engine");
}

// load the built-in cars dataset, then fetch it as an R expression
re.eval("data(cars)", false);
REXP cars = re.eval("cars");

RVector carsVector = cars.asVector();
// dissect carsVector...

Page 36: Data Science with R for Java developers

Assimilate

Reimplementation of R on JVM

Fast & lean

Parallelized

Just-another-lib

... not production ready yet...

Page 37: Data Science with R for Java developers

Assimilate

import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;

// create a script engine manager
ScriptEngineManager factory = new ScriptEngineManager();

// create an R engine
ScriptEngine engine = factory.getEngineByName("Renjin");

// load package from classpath
engine.eval("library(survey)");

// evaluate R code from String
engine.eval("print('Hello from R')");

Reimplementation of R on JVM

Fast & lean

Parallelized

Just-another-lib

... not production ready yet...

Page 38: Data Science with R for Java developers

Reimplementation of R on JVM

Share data:

Integer[] data = {1, 2, 3};

engine.put("data", data);
engine.eval("print(sum(data))");

Assimilate
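
One practical detail when going through `javax.script`: `getEngineByName` returns `null` if no engine with that name is registered, for example when the Renjin jar is missing from the classpath. The sketch below wraps the data-sharing snippet with that check. The class name, the method name and the fallback message are made up for illustration, and the exact text Renjin returns for the sum is not asserted here.

```java
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

final class RenjinGuardSketch {
    // Sums the array in R when an engine named "Renjin" is available;
    // otherwise returns a message instead of failing with a NullPointerException.
    static String sumInR(Integer[] data) {
        ScriptEngine engine = new ScriptEngineManager().getEngineByName("Renjin");
        if (engine == null) {
            // getEngineByName returns null when no matching engine is registered.
            return "Renjin engine not found on the classpath";
        }
        try {
            engine.put("data", data);
            return String.valueOf(engine.eval("sum(data)"));
        } catch (ScriptException e) {
            throw new IllegalStateException("R evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sumInR(new Integer[] {1, 2, 3}));
    }
}
```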

Page 39: Data Science with R for Java developers

Reimplementation of R on JVM

Share data:

Integer[] data = {1, 2, 3};

engine.put("data", data);
engine.eval("print(sum(data))");

Use Java from Renjin:

import(com.foo.User)

# instantiate Java beans
tim <- User$new(name='Tim', age=23)
tom <- User$new(name='Tom', age=45)

# invoke setter
tim$name <- "Timmy"

Assimilate

Page 40: Data Science with R for Java developers

Big Data?

Page 41: Data Science with R for Java developers

Replace

JVM libraries/platforms

Page 42: Data Science with R for Java developers

Replace

Scalable R distributions (non-JVM)

Revolution Analytics

Oracle R Enterprise

Page 43: Data Science with R for Java developers

Wrap-up

Data Science

The R language

Gimme some Java!


Page 44: Data Science with R for Java developers

Sanitize

Explore

Model

Predict

Scale

Page 45: Data Science with R for Java developers

Next steps

Computing for Data Analysis starts Sept. 23rd

Install R

Read

Page 46: Data Science with R for Java developers

Questions?

Data Science

The R language

Gimme some Java!

@Sander_Mak

branchandbound.net