#TDPARTNERS16 GEORGIA WORLD CONGRESS CENTER WELLS FARGO DISCUSSES TRANSFORMING R INTO AN ENTERPRISE LEVEL TOOL FOR MULTI-GENRE ANALYTICS Eric Legrand Data Scientist, Wells Fargo Roger Fried Data Scientist, Teradata Copyright © 2016 Wells Fargo Bank, N.A. Used with permission.


#TDPARTNERS16 GEORGIA WORLD CONGRESS CENTER

WELLS FARGO DISCUSSES TRANSFORMING R INTO AN ENTERPRISE LEVEL TOOL FOR MULTI-GENRE ANALYTICS

Eric Legrand, Data Scientist, Wells Fargo

Roger Fried, Data Scientist, Teradata

Copyright © 2016 Wells Fargo Bank, N.A. Used with permission.


Agenda

• How Wells Fargo uses Aster Data
• Introducing Aster R
• Walk through recent multi-genre project using Aster R


How Wells Fargo Uses Aster Data

BIG!

SMALL!


Introducing Aster R

What:

• Interface to Aster's SQL and SQL-MapReduce
• Execute R-in-database operations and return results to:
  • R object
  • Aster R virtual data object
  • Aster table

Why:

• Expands data vocabulary
• Encourages code reuse
• Simplifies scripts
• Allows for dynamic documents


SQL & SQL-MR are great but…

[Diagram: Queen and four vWorkers; Aster Analytics: SQL, SQL-MR]

Challenges with SQL/SQL-MR:

• Solution lacks:
  • Workflow
  • Visualization
  • Documentation
• Requires many tables
• Verbose and less flexible than R


R fills in the gaps on the client

[Diagram: Queen and four vWorkers; Aster Analytics: SQL, SQL-MR]

Aster R Package

R interface to SQL/SQL-MR:

• Table and file views
• Fast IO between Aster and laptop
• Virtual object data manipulation
• Visualization
• Documentation


Working with the virtual data frame

# create ta.data.frame
tadf <- ta.data.frame("table_name", schemaName = "my_schema")

# column names
ta.names(tadf)

# show first 6 rows
ta.head(tadf)

# subset
ta.subset(tadf, col2 <= 5)

# aggregate count
ta.table(tadf$col1)
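
For readers without an Aster cluster, the following sketch runs the analogous base R calls on an ordinary in-memory data frame. It assumes that the ta.* functions on the slide behave like their base R namesakes, which the slide suggests but does not state. The data and the names df, small, and counts are hypothetical.

```r
# Hypothetical in-memory data standing in for the Aster table.
df <- data.frame(
  col1 = c("a", "b", "a", "c"),
  col2 = c(3, 7, 5, 9),
  stringsAsFactors = FALSE
)

# column names (base R counterpart of ta.names)
names(df)

# first rows, up to 6 (counterpart of ta.head)
head(df)

# row subset (counterpart of ta.subset)
small <- subset(df, col2 <= 5)

# count per distinct value (counterpart of ta.table)
counts <- table(df$col1)
```

The difference that matters on the slide is where the work happens: base R operates on data already in local memory, while the virtual data frame refers to a table that stays in the database.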


…and the server

[Diagram: Queen and four vWorkers; Aster Analytics: SQL, SQL-MR]

Aster R Package

Execute R on server:

• Easily parallelize custom functions
• R runner functions ta.tapply, ta.eval reduce overhead
• Results return to client, object or table


R Runner Example

Write an R function to be called on the server:

first3 <- function(tadf) {
  path <- gsub("\\[|\\]", "", tadf$path)     # remove brackets
  s <- strsplit(path, ", ")[[1]]             # get tokenized list
  s <- s[1:3]                                # first 3
  s <- s[!is.na(s)]                          # remove NAs
  return(first3 = paste(s, collapse = ","))  # return character
}

Perform in-database execution of the R function using the R Runner:

out <- ta.tapply(
  data, INDEX = data$id, FUN = first3,
  out.tadf = list(
    table = "my_table", schemaName = "sbx_aster_24hour",
    tableType = "analytic", partitionKey = "id",
    columns = c("id", "first3"),
    colTypes = c("int", "varchar")
  )
)
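
The partition-then-apply pattern above can be tried locally before it is sent to the server. The sketch below is a base R stand-in, not Aster R: it splits a small data frame by id and applies the same token logic to each partition. The sample paths are hypothetical, and it assumes one path string per id, as the `[[1]]` in the slide's function implies.

```r
# Same token logic as the slide's function, written for a local data.frame.
first3 <- function(df) {
  path <- gsub("\\[|\\]", "", df$path)  # remove brackets
  s <- strsplit(path, ", ")[[1]]        # tokenize the partition's first path
  s <- s[1:3]                           # first 3 tokens (pads with NA if fewer)
  s <- s[!is.na(s)]                     # drop the NA padding
  paste(s, collapse = ",")              # single comma-separated string
}

# Hypothetical sample data: one bracketed path per id.
data <- data.frame(
  id = c(1L, 2L),
  path = c("[home, login, balance, transfer]", "[home, atm]"),
  stringsAsFactors = FALSE
)

# Local analogue of partitioning by INDEX: split by id, apply per partition.
parts <- split(data, data$id)
out <- data.frame(
  id = as.integer(names(parts)),
  first3 = unname(vapply(parts, first3, character(1))),
  stringsAsFactors = FALSE
)
```

Testing the function on a laptop-sized sample this way matches the workflow described in the talk: clean up samples locally, then run the same function across the full table in the database.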


How did Aster R help me?

• Big data problems: speech transcriptions
  • Speech transcriptions are messy
  • I’ve cleaned up samples on my laptop, but need to run across millions of phone sojourns
  • Aster R makes this possible

• Small data problems: the survey
  • Survey metadata doesn’t exist
  • but online demo does and …
  • there are R packages to extract data from HTML and…
  • R’s list processing and “computing on the language” simplify convoluted processes
  • Aster R handles all access to Aster


Text standardization and transformation

Text standardization (ta.tapply, text-morph SQL-MR):

“deposit it in the atm”
“deposited in the a t m”
“deposited in the atm”
“deposit in a t eme”

Standardized form: “deposit atm”

Transformation from long to wide (ta.pivot()):

id  txt_id  metric
1   4       50
1   2       10
2   1       40
2   3       80

id  m_1  m_2  m_3  m_4
1        10        50
2   40        80
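
The long-to-wide step can be reproduced on the slide's four rows with base R. This is a local sketch of the reshaping idea only; it does not use ta.pivot(), whose arguments the slide does not show. The names long, m, and wide are hypothetical.

```r
# The long table from the slide: one row per (id, txt_id) pair.
long <- data.frame(
  id     = c(1L, 1L, 2L, 2L),
  txt_id = c(4L, 2L, 1L, 3L),
  metric = c(50, 10, 40, 80)
)

# Cross-tabulate metric by id (rows) and txt_id (columns).
# Combinations that do not occur are filled with NA.
m <- tapply(long$metric, list(long$id, long$txt_id), sum)
colnames(m) <- paste0("m_", colnames(m))

# Wide table: one row per id, one column per txt_id.
wide <- data.frame(id = as.integer(rownames(m)), m, check.names = FALSE)
```

The NA cells correspond to the blanks in the slide's wide table, for example m_1 for id 1.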


Raw data vs. medium rare

• Lists
• Meta-programming
• Dynamic output

[Diagram: HTML forms; Aster R; Survey response table]


Gluing the workflow together…

# list of HTML form text and values
forms <- get_forms(html_dir)

# all survey responses
resp <- survey_partition(forms, table, partition_def)

# function dynamically selects where clause condition
# response is 25 SQL queries
resp_sql <- paste("select * from survey where", to_boolean(resp))

# list of 25 ta.data.frame objects
rfc <- lapply(resp_sql, ta.data.frame, source = "query")

# 25 models
models <- lapply(rfc, function(d) ta.glm(y ~ ., data = d))
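
The query-generation step relies on paste() being vectorized: one template plus a vector of conditions yields one query per condition. The sketch below illustrates that idea in plain R. The partition definitions and the helper to_condition are hypothetical stand-ins; the author's get_forms, survey_partition, and to_boolean are not shown in the slides and are not reproduced here.

```r
# Hypothetical partition definitions: column name -> required value.
partitions <- list(
  list(q1 = "yes", region = "east"),
  list(q1 = "no",  region = "east"),
  list(q1 = "yes", region = "west")
)

# Hypothetical helper: turn one partition definition into a boolean
# condition such as "q1 = 'yes' and region = 'east'".
# No SQL escaping is done; values are assumed to be trusted literals.
to_condition <- function(p) {
  paste(sprintf("%s = '%s'", names(p), unlist(p)), collapse = " and ")
}

conditions <- vapply(partitions, to_condition, character(1))

# paste() recycles the template across the vector of conditions,
# giving one SQL string per partition.
queries <- paste("select * from survey where", conditions)
```

Each element of queries could then be passed through lapply in the way the slide passes resp_sql to ta.data.frame, so adding a partition means adding a list element, not writing another query by hand.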


Programs Help Manage Complexity

The key challenge is dealing with complexity and growth: how to expand the computing capabilities in a way that is easy to use and leads to trustworthy software.

-- John Chambers, Software for Data Analysis


Thank You

Questions/Comments
Email: [email protected]

Follow Me
Twitter @elegrand

Rate This Session #705 with the PARTNERS Mobile App

Remember To Share Your Virtual Passes