TRANSCRIPT
#TDPARTNERS16 GEORGIA WORLD CONGRESS CENTER
WELLS FARGO DISCUSSES TRANSFORMING R INTO AN ENTERPRISE LEVEL TOOL FOR MULTI-GENRE ANALYTICS
Eric Legrand, Data Scientist, Wells Fargo
Roger Fried, Data Scientist, Teradata
Copyright © 2016 Wells Fargo Bank, N.A. Used with permission.
Agenda
• How Wells Fargo uses Aster Data
• Introducing Aster R
• Walk through recent multi-genre project using Aster R
How Wells Fargo Uses Aster Data
BIG!
SMALL!
Introducing Aster R
What:
• Interface to Aster's SQL and SQL-MapReduce
• Execute R-in-database operations and return results to:
  • R object
  • Aster R virtual data object
  • Aster table
Why:
• Expands data vocabulary
• Encourages code reuse
• Simplifies scripts
• Allows for dynamic documents
SQL & SQL-MR are great but…
[Diagram: Aster Analytics (SQL, SQL-MR) on a Queen node with four vWorkers]
Challenges with SQL/SQL-MR:
• Solution lacks:
  • Workflow
  • Visualization
  • Documentation
• Requires many tables
• Verbose and less flexible than R
R fills in the gaps on the client
[Diagram: Aster R Package alongside Aster Analytics (SQL, SQL-MR) on a Queen node with four vWorkers]
R interface to SQL/SQL-MR:
• Table and file views
• Fast IO between Aster and laptop
• Virtual object data manipulation
• Visualization
• Documentation
Working with the virtual data frame
# create ta.data.frame
tadf <- ta.data.frame("table_name", schemaName = "my_schema")
# column names
ta.names(tadf)
# show first 6
ta.head(tadf)
# subset
ta.subset(tadf, col2 <= 5)
# aggregate count
ta.table(tadf$col1)
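
For readers without an Aster cluster at hand, the same five operations can be tried on an ordinary local data frame. This is a minimal base R sketch, not Aster R: `local_df` and its values are made up for illustration, and the base functions only mirror the `ta.*` calls above by name, with no claim that the two behave identically in every detail.

```r
# Local base R analogue of the ta.* calls on the slide.
# Assumption: a small made-up data frame stands in for the Aster table.
local_df <- data.frame(
  col1 = c("a", "b", "a", "c", "a", "b", "c", "a"),
  col2 = c(1, 7, 3, 9, 5, 2, 8, 6),
  stringsAsFactors = FALSE
)

# column names (compare ta.names)
col_names <- names(local_df)

# first 6 rows (compare ta.head)
first_rows <- head(local_df)

# subset (compare ta.subset)
small_rows <- subset(local_df, col2 <= 5)

# aggregate count (compare ta.table)
counts <- table(local_df$col1)

print(small_rows)
print(counts)
```

The point of the virtual data frame is that the `ta.*` versions keep the data in the database, while the local versions above pull everything into client memory.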
…and the server
[Diagram: Aster R Package alongside Aster Analytics (SQL, SQL-MR) on a Queen node with four vWorkers]
Execute R on server:
• Easily parallelize custom functions
• R runner functions ta.tapply, ta.eval reduce overhead
• Results return to client, object or table
R Runner Example
# Write an R function to be called on the server
first3 <- function(tadf) {
  path <- gsub("\\[|\\]", "", tadf$path)      # remove brackets
  s <- strsplit(path, ", ")[[1]]              # get tokenized list
  s <- s[1:3]                                 # first 3
  s <- s[!is.na(s)]                           # remove NAs
  return(first3 = paste(s, collapse = ","))   # return character
}
# Perform in-database execution of the R function using the R Runner
out <- ta.tapply(data, INDEX = data$id, FUN = first3,
  out.tadf = list(table = "my_table", schemaName = "sbx_aster_24hour",
                  tableType = "analytic", partitionKey = "id",
                  columns = c("id", "first3"),
                  colTypes = c("int", "varchar"))
)
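
Before sending a function like `first3` to the server, it can be checked on a laptop sample. The sketch below uses base R only: `split()` plus `vapply()` is a local stand-in for the group-wise call, not the R Runner itself. The `sample_paths` rows are invented, and one row per `id` is an assumption.

```r
# Same logic as first3 on the slide; expects one group's rows with a
# character column `path` holding a bracketed, comma-separated list.
first3 <- function(tadf) {
  path <- gsub("\\[|\\]", "", tadf$path)   # remove brackets
  s <- strsplit(path, ", ")[[1]]           # tokenize
  s <- s[1:3]                              # first 3 (pads with NA if shorter)
  s <- s[!is.na(s)]                        # remove NAs
  paste(s, collapse = ",")                 # return one character string
}

# Made-up sample rows (assumption: one row per id).
sample_paths <- data.frame(
  id = c(1L, 2L),
  path = c("[home, login, balance, transfer]", "[home, logout]"),
  stringsAsFactors = FALSE
)

# Local stand-in for the group-wise call: split by id, apply, collect.
groups <- split(sample_paths, sample_paths$id)
result <- data.frame(
  id = as.integer(names(groups)),
  first3 = vapply(groups, first3, character(1)),
  stringsAsFactors = FALSE
)
print(result)
```

Checking the short-path case locally (the second row has only two tokens) is the kind of edge case that is cheaper to catch before running across the full table.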
How did Aster R help me?
• Big data problems: speech transcriptions
  • Speech transcriptions are messy
  • I’ve cleaned up samples on my laptop, but need to run across millions of phone sojourns
  • Aster R makes this possible
• Small data problems: the survey
  • Survey metadata doesn’t exist
  • but online demo does and …
  • there are R packages to extract data from HTML and…
  • R’s list processing and “computing on the language” simplify convoluted processes
  • Aster R handles all access to Aster
Text standardization and transformation
Raw transcriptions:
“deposit it in the atm”
“deposited in the a t m”
“deposited in the atm”
“deposit in a t eme”
Standardized: “deposit atm”
Long table:
id  txt_id  metric
1   4       50
1   2       10
2   1       40
2   3       80
Pivoted table:
id  m_1  m_2  m_3  m_4
1        10        50
2   40        80
• ta.pivot()
• ta.tapply
• text-morph SQL-MR
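
The two steps on this slide can be tried locally in base R. This is a rough sketch under stated assumptions, not the text-morph SQL-MR function or `ta.pivot()`: the `standardize()` helper and its three rules (acronym repair, a one-word stand-in for stemming, a four-word stop list) are hypothetical and tuned only to the slide's four examples. The pivot uses `tapply()` on the slide's long table.

```r
# Hypothetical standardization rules, fitted to the slide's examples only.
standardize <- function(txt) {
  txt <- tolower(txt)
  txt <- gsub("\\ba t (m|eme)\\b", "atm", txt, perl = TRUE)  # spelled-out acronym
  txt <- gsub("\\bdeposited\\b", "deposit", txt, perl = TRUE) # crude stand-in for stemming
  words <- strsplit(txt, " +")[[1]]
  words <- words[!words %in% c("it", "in", "the", "a")]       # drop stop words
  paste(words, collapse = " ")
}

raw <- c("deposit it in the atm", "deposited in the a t m",
         "deposited in the atm", "deposit in a t eme")
clean <- vapply(raw, standardize, character(1), USE.NAMES = FALSE)
print(unique(clean))

# Long-to-wide pivot of the slide's table; missing cells come back as NA.
long <- data.frame(
  id     = c(1, 1, 2, 2),
  txt_id = c(4, 2, 1, 3),
  metric = c(50, 10, 40, 80)
)
wide <- tapply(long$metric, list(id = long$id, txt_id = long$txt_id), sum)
colnames(wide) <- paste0("m_", colnames(wide))
print(wide)
```

Standardizing first matters for the pivot: each distinct text gets its own `txt_id`, so every spelling variant left uncollapsed becomes another sparse column.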
Raw data vs. medium rare
[Diagram: HTML forms, Aster R, survey response table]
• Lists
• Meta-programming
• Dynamic output
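
The slides do not show how the HTML forms were read, so the following is only an illustration of the "lists" idea: turning form markup into a named list of question codes and labels. The `html` snippet, the `parse_form()` helper and the one-tag-per-line layout are all assumptions. The regex approach is a shortcut for a tidy toy input, and real pages would more likely call for an HTML parsing package.

```r
# Made-up form markup, one tag per line (assumption).
html <- c(
  '<form>',
  '<select name="q1">',
  '<option value="1">Very satisfied</option>',
  '<option value="2">Neutral</option>',
  '</select>',
  '<select name="q2">',
  '<option value="1">Yes</option>',
  '<option value="0">No</option>',
  '</select>',
  '</form>'
)

# Hypothetical helper: returns a list with one named character vector per
# question, mapping answer label -> coded value.
parse_form <- function(lines) {
  forms <- list()
  current <- NULL
  for (ln in lines) {
    sel <- regmatches(ln, regexec('<select name="([^"]+)"', ln))[[1]]
    opt <- regmatches(ln, regexec('<option value="([^"]+)">([^<]*)</option>', ln))[[1]]
    if (length(sel) == 2) {
      current <- sel[2]                 # start a new question
      forms[[current]] <- character(0)
    } else if (length(opt) == 3 && !is.null(current)) {
      forms[[current]][opt[3]] <- opt[2] # label -> value
    }
  }
  forms
}

forms <- parse_form(html)
str(forms)
```

A list like this can serve as the survey metadata that, per the earlier slide, does not otherwise exist: it records which coded values are valid for each question.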
Gluing the workflow together…
# list of HTML form text and values
forms <- get_forms(html_dir)
# all survey responses
resp <- survey_partition(forms, table, partition_def)
# function dynamically selects where clause condition
# response is 25 SQL queries
resp_sql <- paste("select * from survey where", to_boolean(resp))
# list of 25 ta.data.frame objects
rfc <- lapply(resp_sql, ta.data.frame, source = "query")
# 25 models
models <- lapply(rfc, function(d) ta.glm(y ~ ., data = d))
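
The helpers `get_forms`, `survey_partition` and `to_boolean` are not shown in the slides, so the sketch below substitutes hypothetical versions to show the shape of the pattern: a list of segment definitions becomes a vector of SQL strings, then a list of data sets, then a list of models. Everything runs locally in base R. The simulated `survey` data, the two-segment `resp` list and the `to_r_filter()` helper are assumptions, `glm()` stands in for `ta.glm()`, and the formula is `y ~ x` rather than `y ~ .` because the segmenting column is constant within each subset.

```r
set.seed(1)
# Simulated survey data (assumption).
survey <- data.frame(
  q1 = sample(c("yes", "no"), 200, replace = TRUE),
  x  = rnorm(200),
  stringsAsFactors = FALSE
)
survey$y <- rbinom(200, 1, plogis(0.5 * survey$x))

# Hypothetical stand-in for to_boolean(): one SQL condition per segment.
to_boolean <- function(resp) {
  vapply(resp, function(r) {
    paste(sprintf("%s = '%s'", names(r), unlist(r)), collapse = " and ")
  }, character(1))
}

# Two segments here; the talk used 25.
resp <- list(list(q1 = "yes"), list(q1 = "no"))
resp_sql <- paste("select * from survey where", to_boolean(resp))
print(resp_sql)

# Local stand-in for ta.data.frame(source = "query"): build the same
# condition as an R call ("computing on the language") and evaluate it.
to_r_filter <- function(r) {
  conds <- Map(function(col, val) call("==", as.name(col), val), names(r), r)
  Reduce(function(a, b) call("&", a, b), conds)
}
rfc <- lapply(resp, function(r) survey[eval(to_r_filter(r), survey), ])

# One model per segment.
models <- lapply(rfc, function(d) glm(y ~ x, data = d, family = binomial))
```

Because every stage is a list, adding a segment means adding one entry to `resp`; nothing downstream changes.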
Programs Help Manage Complexity
The key challenge is dealing with complexity and growth: how to expand the computing capabilities in a way that is easy to use and leads to trustworthy software.
-- John Chambers, Software for Data Analysis
Thank You
Questions/Comments
Email:
Follow Me
Twitter @
Rate This Session # with the PARTNERS Mobile App
Remember To Share Your Virtual Passes
elegrand
705