#analyticsx
Copyright © 2016, SAS Institute Inc. All rights reserved.
Scoring SAS® Text Analytics Models in Hadoop
Simran Bagga
Product Manager – Text Analytics
Call Center Notes, Survey Feedback
Online Forums, Blogs, Consumer Reviews, Online News, Social Networks
Associate Comments, Claims & Case Notes, Research & Publications
Live Chat, Factory/Tech’n Notes, HR Data, Medical/Health Records, Contracts & Applications
Text is common to every industry, every region, every type of organization
THE CHALLENGE: UNLOCKING YOUR
Source: IDC Digital Universe Study, sponsored by EMC, May 2010
Why Do Text Analysis?
Operational, website, and transactional sales data provide good awareness about customers’ past behaviors and actions.
Textual inputs provide an additional layer of insight into consumer attitudes, customer experience, and future intentions. They often deliver the “why” factor.
1. Customer Experience: Better understand what’s driving customer behavior, satisfaction, and preferences
2. Early Warning: Detect potential design, safety, or service issues sooner than conventional methods
3. Monitoring & Exploration: Monitor the volume of known topics, and effectively conduct ad hoc analyses as needed
How Does Text Analytics Work?
Natural Language Processing + Machine Learning + Domain-specific rules (human input)
Read “between the lines”
Detect sarcasm, resolve slang
Infer what the author really meant
NLP: Exploring Terms and Associations
• Single and multi-word terms automatically detected and listed by frequency
• Surface forms (plurals, verb tenses, etc.) grouped under parent term
• Misspellings automatically detected and resolved to parent term
• Part of speech and entities identified
Visualize commonly co-occurring terms and term relationships
Review how terms are being used in context
Contextual Extraction Of Entities and Facts
• Identify and extract elements of interest
• 18 pre-defined entities provided out of the box
• Use a wide range of powerful linguistic and Boolean rules to ensure accuracy and contextual specificity
Machine Learning Based Topic Discovery
Categorizing Content
DS2 in 30 Seconds
Procedural programming language
Focused mainly on parallel execution
Supports ANSI SQL data types
Allows Embedded SQL
Allows modular programming: Scope and Methods
Supports Packages and Threads
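proc ds2 ds2accel=yes;
  /* Thread program: defines the logic that runs in parallel */
  thread compute;
    method run();
      set hdfs.emp_donations;
      total = sum(jan--dec);  /* sum of the columns jan through dec */
    end;
  endthread;
  /* Data program: reads rows from the thread and writes hdfs.totals */
  data hdfs.totals;
    dcl thread compute t;
    method run();
      set from t;
    end;
  enddata;
run; quit;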
DS2 Code: Three system methods
init() – Runs once on program start
Initializes the linguistic binaries with the desired options
run() – Runs once for every row
Processes the document using the transaction object
Loops through each match and obtains the desired variables
term() – Runs once on program termination
Cleans up artifacts created in the init() method
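The three system methods can be sketched with core DS2 only. This is a minimal sketch, not the text-scoring program itself: the table work.documents, its columns doc_id and doc_text, and the variables docs_seen and text_length are hypothetical, and the text analytics package calls that would load the linguistic binaries and process each document are replaced by comments and a simple length() stand-in.

```sas
proc ds2;
data work.scored (overwrite=yes);
  dcl double docs_seen;     /* hypothetical counter, kept across rows */
  dcl double text_length;   /* stand-in for a real text score */
  retain docs_seen;
  keep doc_id text_length;

  /* init(): runs once, before the first row is read */
  method init();
    /* a real scoring program would initialize the linguistic
       binaries here; this sketch only resets a counter */
    docs_seen = 0;
  end;

  /* run(): runs once for every row of the input table */
  method run();
    set work.documents;     /* hypothetical input: doc_id, doc_text */
    docs_seen = docs_seen + 1;
    /* a real scoring program would process doc_text and loop over
       the matches here; length() is only a placeholder */
    text_length = length(doc_text);
  end;

  /* term(): runs once, after the last row */
  method term();
    /* a real scoring program would release what init() created */
    put 'Documents processed:' docs_seen;
  end;
enddata;
run;
quit;
```

Each pass through run() writes one output row implicitly; init() and term() write nothing, which is why they suit setup and cleanup work.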
SAS Intersection With Hadoop
FROM: SAS treats Hadoop like any other data source, pulling data from Hadoop and writing it back using ACCESS engines.
WITH: SAS works with Hadoop, lifting data into a purpose-built, in-memory advanced analytics environment, in both symmetric and asymmetric ways, using In-Memory engines.
IN: SAS can work directly in Hadoop, leveraging the distributed processing capabilities of Hadoop, using the SAS Code and Scoring Accelerators.
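As an illustration of the first pattern (pulling data FROM Hadoop and writing it back), the sketch below assigns a libref through the SAS/ACCESS Interface to Hadoop and copies a table in each direction. The server name, port, schema, user, password, and the table emp_donations_copy are all hypothetical placeholders, and the client-side Hadoop JAR and configuration setup that the engine needs is assumed to be in place already.

```sas
/* Hypothetical connection values: replace every one of them. */
libname hdfs hadoop
  server="hive.example.com"   /* assumed Hive server host */
  port=10000                  /* assumed Hive port */
  schema=default
  user=sasdemo
  password="XXXXXXXX";

/* Pull data FROM Hadoop into a local SAS table. */
data work.emp_donations;
  set hdfs.emp_donations;
run;

/* Write a table back to Hadoop through the same libref. */
data hdfs.emp_donations_copy;
  set work.emp_donations;
run;
```

In this pattern the rows travel to the SAS server and back, which is the data movement the in-Hadoop approach is meant to avoid.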
In-Database Code Accelerator for Hadoop
Thread program defines the parallel logic
Thread program runs inside MapReduce task
MapReduce shuffle/sort separates BY groups
DS2 executes in Map/Reduce stage
Two MapReduce jobs may be used
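As a sketch of how a BY statement in the thread program relates to the shuffle/sort step, the following DS2 program counts rows per BY group. The table hdfs.reviews, its column product, the variable n_docs, and the output table hdfs.review_counts are hypothetical names, and the hdfs libref is assumed to be assigned already.

```sas
proc ds2 ds2accel=yes;
  /* Thread program: the parallel logic */
  thread count_by_product / overwrite=yes;
    dcl double n_docs;      /* hypothetical per-group counter */
    retain n_docs;
    keep product n_docs;

    method run();
      set hdfs.reviews;     /* hypothetical input table */
      by product;           /* rows arrive grouped by product */
      if first.product then n_docs = 0;
      n_docs = n_docs + 1;
      if last.product then output;   /* one row per BY group */
    end;
  endthread;

  /* Data program: collects the per-group rows from the thread */
  data hdfs.review_counts (overwrite=yes);
    dcl thread count_by_product t;
    method run();
      set from t;
    end;
  enddata;
run;
quit;
```

Because first.product and last.product are only meaningful when a whole group is seen by one thread, the BY statement is what makes the grouping step necessary.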
SAS Embedded Process (EP) for Hadoop
Lightweight execution container for DS2
Written in C and Java
Runs inside a MapReduce task
Orchestrated by Hadoop MapReduce framework
Resource allocation managed by YARN
Text extensions added to EP to enable text scoring in Hadoop
Text Scoring In Hadoop
[Diagram. SAS: PROC DS2 ... RUN; with Accelerator=YES, Code Accelerator. YARN: Resource Manager, Application Master. Hadoop: three nodes, each showing Data, Code Accelerator, and Embedded Process.]
Code Accelerator Execution
YARN Resource Manager Interface
Conclusion - Run Faster. Run Embedded!
SAS provides the platform for big data
Move the computation to the data
Run faster inside the database
Eliminates massive data movement
Exploits the parallel processing power of Hadoop
Delivers faster time to results