RLondon 2014 06 17...


Page 1: R / TERR

Ana Costa e Silva, PhD

Senior Data Scientist

TIBCO

© Copyright 2000-2013 TIBCO Software Inc.

Page 2: Tower of Big and Fast Data

• Hundreds of Records: Key performance indicators
• Millions of Records: Visual Data Discovery
• Billions of Records (Big Data): Data Mining
• Trillions of Records (Fast Data): Real Time Analytics

© Copyright 2000-2014 TIBCO Software Inc.

Page 3: Tower of Big and Fast Data

• Hundreds of Records: Key performance indicators
• Millions of Records: Visual Data Discovery
• Billions of Records (Big Data): Data Mining
• Trillions of Records (Fast Data): Real Time Analytics

Products shown on the tower:

• Spotfire Event Analytics
• TIBCO Enterprise Runtime for R
• Spotfire Mobile Metrics
• Spotfire Analyst
• Spotfire Business Author
• Spotfire Consumer

Page 4: TERR

• TIBCO Enterprise Runtime for R (TERR)

• Latest in family of statistics scripting engines: S, S-PLUS®, R, TERR

• Commercial Releases: v1.0 Nov 2012, v2.0 Nov 2013, v2.1 Feb 2014, …

• Developer Edition: www.TIBCOmmunity.com/community/products/analytics/terr

• Engine internals rebuilt from scratch

• Redesigned data object representation

• Redesigned memory management facilities

• Addresses long-standing problems with S language

• Fast and scalable engine!

Page 5: TERR Performance

• Model Fitting: 5 Million Rows (TERR 7X faster)
• Model Scoring: 20 Million Rows (TERR 84X faster)

Page 6: TERR: The Fastest Road to Big Data

• TERR: TIBCO Enterprise Runtime for R

• Most stable and performant access to analytics

• Zero learning curve for R programmers

• Supports in-database, in-Hadoop functionality

• Teradata, Oracle, …; Apache, Horton, Cloudera, MapR, …

• Deployment

• TERR Server execution: TIBCO Spotfire Statistics Services

• CEP Integration: TIBCO BusinessEvents, StreamBase

• Grid Integration: TIBCO GridServer

• Infrastructure Integration: TIBCO BusinessWorks, …

Page 7: TERR integration with RStudio IDE

• RStudio integration

– TERR now compatible with the most popular IDE in the R Community

– Professional-quality development environment to use with TERR

• Features

– Syntax highlighting, code completion, and smart indentation

– Execute R code directly from the source editor

– Manage multiple working directories using projects

– Quickly navigate code

Page 8: Demo 1

Page 9: Hadoop / TERR: Write Your Mapper

mapper <- function(d) {
  # split on punctuation and spaces
  words <- strsplit(paste(d, collapse = ' '),
                    '[[:punct:][:space:]]+')[[1]]
  # get rid of empty words caused by whitespace at beginning of lines
  words <- words[!(words == '')]
  df <- data.frame(word = words)
  df$cnt <- 1
  # hsWriteTable() is provided by the HadoopStreaming package
  hsWriteTable(df, sep = "\t")
}

Use Standard R Syntax; Run using TERR

If you can understand this, you can write MapReduce:

cat input | mapper | sort | reducer
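To see what the mapper stage emits without a Hadoop cluster, the following is a minimal sketch that keeps the slide's splitting logic but swaps hsWriteTable() for base R's write.table() and captures the output. The name mapper_local and the sample lines are illustrative assumptions, not part of the talk.

```r
# Hypothetical stand-in for the slide's mapper: same splitting logic, but it
# returns the tab-separated lines instead of writing them with hsWriteTable().
mapper_local <- function(d) {
  # split on punctuation and spaces
  words <- strsplit(paste(d, collapse = " "), "[[:punct:][:space:]]+")[[1]]
  # drop empty strings produced by leading whitespace or punctuation
  words <- words[words != ""]
  df <- data.frame(word = words, cnt = rep(1, length(words)),
                   stringsAsFactors = FALSE)
  # capture what write.table() prints: one "word<TAB>count" line per word
  capture.output(
    write.table(df, sep = "\t", quote = FALSE,
                row.names = FALSE, col.names = FALSE)
  )
}

# Made-up sample input: two lines of text, the second with leading whitespace
sample_lines <- c("the cat sat", "  on the mat.")
mapped <- mapper_local(sample_lines)
print(mapped)
```

Each word is emitted with a count of 1; nothing is aggregated yet. That is left to the sort and reducer stages of the pipeline.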

Page 10: Write Your Reducer

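reducer <- function(d) {
  # d holds every record for one key, so d$word is a single repeated value
  # sep = "\t" (not collapse) puts the tab between the word and its count
  cat(paste(d$word[1], sum(d$cnt), sep = "\t"), "\n", sep = "")
}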

Use Standard R Syntax; Run using TERR

If you can understand this, you can write MapReduce:

cat input | mapper | sort | reducer
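The whole "cat input | mapper | sort | reducer" pipeline can be imitated inside a single R session, which may help when testing the logic before involving Hadoop. This is only a sketch in base R: map_words, reduce_word and run_local are hypothetical names, the input lines are made up, and the functions return values rather than writing to standard output as the streaming scripts would.

```r
# Map stage: one (word, 1) record per word
map_words <- function(lines) {
  words <- strsplit(paste(lines, collapse = " "), "[[:punct:][:space:]]+")[[1]]
  words <- words[words != ""]
  data.frame(word = words, cnt = rep(1, length(words)),
             stringsAsFactors = FALSE)
}

# Reduce stage: all records in d share one word; return "word<TAB>total"
reduce_word <- function(d) {
  paste(d$word[1], sum(d$cnt), sep = "\t")
}

# Local imitation of: cat input | mapper | sort | reducer
run_local <- function(lines) {
  mapped <- map_words(lines)
  sorted <- mapped[order(mapped$word), , drop = FALSE]  # the "sort" stage
  groups <- split(sorted, sorted$word)                  # one group per key
  unname(vapply(groups, reduce_word, character(1)))     # reducer once per key
}

# Made-up sample input
result <- run_local(c("the cat sat", "on the mat"))
print(result)
```

The sort stage is what guarantees that all records for a given word arrive together, which is why the reducer can simply take the first word and sum the counts.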

Page 11: TERR Map Reduce

From the command line:

$ hadoop-streaming -map mapper.R -reduce reducer.R -input 'inputfile' -output 'outputfile'

From TERR (optionally call remotely via TIBCO Spotfire Statistics Services):

Return.code <-
  system(paste("hadoop-streaming -map mapper.R -reduce reducer.R",
               "-input 'inputfile' -output 'outputfile'"))
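When the call is made from a script, it can help to build the command from its parts and to check the value that system() returns. The sketch below assumes the simplified command exactly as written on the slide; build_streaming_command and run_streaming are hypothetical helpers, and the real streaming invocation on a given Hadoop installation may look different.

```r
# Hypothetical helper: assemble the slide's simplified command line,
# quoting each argument for a POSIX shell.
build_streaming_command <- function(mapper, reducer, input, output) {
  paste("hadoop-streaming",
        "-map", shQuote(mapper, type = "sh"),
        "-reduce", shQuote(reducer, type = "sh"),
        "-input", shQuote(input, type = "sh"),
        "-output", shQuote(output, type = "sh"))
}

# Hypothetical wrapper: run the command and stop on a non-zero exit status
run_streaming <- function(cmd) {
  status <- system(cmd)
  if (status != 0) {
    stop("streaming job failed with exit status ", status)
  }
  invisible(status)
}

cmd <- build_streaming_command("mapper.R", "reducer.R",
                               "inputfile", "outputfile")
print(cmd)
# run_streaming(cmd)  # only meaningful where the command is actually installed
```

Quoting every argument with shQuote() keeps paths containing spaces from being split by the shell, and checking the return code turns a failed job into an R error instead of a silently ignored number.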

Page 12: Hadoop Big Data Tools

• Complex
• Technical
• Confusing

TIBCO Approach

• Authors and Consumers: Hide Complexity, Empower Users
• Visual Query: data on demand
• Fit interface to User skills

Page 13: TERR Map Reduce

Hadoop Streaming:

$ hadoop-streaming -map mapper.R -reduce reducer.R -input 'inputfile' -output 'outputfile'

• Spotfire via Statistics Services
• Mapper.R and Reducer.R run via TERRscript
• HDFS: each Data Node processes its own data using TERR
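The idea that each node processes only its own data, with results combined per key afterwards, can be illustrated in plain R. This is a toy sketch: the three "nodes" are just list elements with made-up text, map_chunk is a hypothetical name, and no Hadoop or HDFS is involved.

```r
# Map work done independently on one node's local lines
map_chunk <- function(lines) {
  words <- strsplit(paste(lines, collapse = " "), "[[:punct:][:space:]]+")[[1]]
  words <- words[words != ""]
  data.frame(word = words, cnt = rep(1, length(words)),
             stringsAsFactors = FALSE)
}

# Made-up data, already split across three hypothetical data nodes
node_data <- list(
  node1 = c("big data", "fast data"),
  node2 = c("big tower"),
  node3 = c("fast fast data")
)

# Each node maps only the lines it holds
per_node <- lapply(node_data, map_chunk)

# Shuffle: gather the records from all nodes so equal keys can meet
all_records <- do.call(rbind, per_node)

# Reduce: total count per word across all nodes
totals <- sapply(split(all_records$cnt, all_records$word), sum)
print(totals)
```

Only the small (word, count) records move between nodes in this picture; the raw text stays where it is stored.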

Page 14: Demo 2

Page 15: TERR MapReduce from Spotfire

• Parameterize MapReduce, Generate and Edit MapReduce code, Test Locally, I/O from Spotfire
• Deploy through Hadoop Streaming MapReduce Interface from/to Spotfire
• Receive analysis results directly back into Spotfire for visualisation and further analysis

Page 16: Thank you!

Contact

Ana Costa e Silva, PhD
Senior Data Scientist
[email protected]

TERR Developer Edition:
www.TIBCOmmunity.com/community/products/analytics/terr