#analyticsx

Copyright © 2016, SAS Institute Inc. All rights reserved.

Scoring SAS® Text Analytics Models in Hadoop

Simran Bagga
Product Manager – Text Analytics


Call Center Notes, Survey Feedback, Online Forums, Blogs, Consumer Reviews, Online News, Social Networks, Associate Comments, Claims & Case Notes, Research & Publications, Live Chat, Factory/Tech’n Notes, HR data, Medical/Health Records, Contracts & Applications

Text is common to every industry, every region, every type of organization


THE CHALLENGE: UNLOCKING YOUR

Source: IDC Digital Universe Study, sponsored by EMC, May 2010


Why Do Text Analysis?

Operational, website, and transactional sales data provide good awareness about customers’ past behaviors and actions.

Textual inputs provide an additional layer of insight into consumer attitudes, customer experience, and future intentions. They often deliver the “why” factor.

1. Customer Experience: Better understand what’s driving customer behavior, satisfaction, and preferences

2. Early Warning: Detect potential design, safety, or service issues sooner than conventional methods

3. Monitoring & Exploration: Monitor the volume of known topics, and effectively conduct ad hoc analyses as needed


How Does Text Analytics Work?

Natural Language Processing + Machine Learning + Domain-specific rules (Human Input)

• Read “between the lines”

• Detect sarcasm, resolve slang

• Infer what the author really meant


NLP: Exploring Terms and Associations

• Single and multi-word terms automatically detected and listed by frequency

• Surface forms (plurals, verb tenses, etc.) grouped under parent term

• Misspellings automatically detected and resolved to parent term

• Part of speech and entities identified

Visualize commonly co-occurring terms and term relationships

Review how terms are being used in context
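The grouping of surface forms and misspellings under a parent term can be pictured with a small Python sketch. This is only an analogy, not how the SAS linguistic engine works: the `PARENT_TERM` lookup, the sample documents, and the function name are all hypothetical, and a real system would derive the mapping from stemming and spell-checking rather than a hand-written table.

```python
from collections import Counter

# Hypothetical lookup: surface forms and misspellings -> parent term.
PARENT_TERM = {
    "batteries": "battery",
    "battry": "battery",      # misspelling
    "charges": "charge",
    "charged": "charge",
    "charging": "charge",
}


def parent_term_counts(docs):
    """Count terms across documents after resolving each to its parent term."""
    counts = Counter()
    for doc in docs:
        for token in doc.lower().split():
            token = token.strip(".,!?;:\"'")
            if token:
                counts[PARENT_TERM.get(token, token)] += 1
    return counts


docs = [
    "Battery charges slowly.",
    "The batteries stopped charging!",
    "battry never charged",
]
counts = parent_term_counts(docs)
print(counts.most_common(2))
```

Listing `counts.most_common()` gives the frequency-ordered term list the slide describes, with every variant rolled up into its parent.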


Contextual Extraction Of Entities and Facts

• Identify and extract elements of interest

• 18 pre-defined entities provided out of the box

• Use a wide range of powerful linguistic and Boolean rules to ensure accuracy and contextual specificity


Machine Learning Based Topic Discovery


Categorizing Content


DS2 in 30 Seconds

Procedural programming language

Mainly focused on parallel execution

Supports ANSI SQL data types

Allows Embedded SQL

Allows modular programming: Scope and Methods

Supports Packages and Threads

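proc ds2 ds2accel=yes;
   /* Thread program: defines the logic that runs in parallel */
   thread compute;
      method run();
         set hdfs.emp_donations;
         total = sum(jan--dec);
      end;
   endthread;

   /* Data program: reads the rows produced by the thread instances */
   data hdfs.totals;
      dcl thread compute t;
      method run();
         set from t;
      end;
   enddata;
run; quit;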


DS2 Code: Three system methods

init() – Runs once on program start

• Initializes the linguistic binaries with the desired options

run() – Runs once for every row

• Processes the document using the transaction object

• Loops through each match and obtains the desired variables

term() – Runs once on program termination

• Cleans up artifacts created in the init() method
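The three-method lifecycle above can be sketched in Python as an analogy. This is not DS2 and not the SAS text scoring API: the `TextScorer` class, its keyword rule table, and the sample rows are hypothetical stand-ins for the linguistic binaries and the input table.

```python
class TextScorer:
    """Hypothetical scorer mirroring the init()/run()/term() lifecycle."""

    def init(self):
        # Runs once: load the (hypothetical) rules used for matching.
        self.rules = {"refund": "BILLING", "crash": "DEFECT"}
        self.rows_seen = 0

    def run(self, row):
        # Runs once per row: emit one output record per rule match.
        self.rows_seen += 1
        words = row["text"].lower().split()
        return [
            {"id": row["id"], "term": w, "category": self.rules[w]}
            for w in words
            if w in self.rules
        ]

    def term(self):
        # Runs once: release whatever init() created.
        self.rules = None


def score(rows):
    """Drive the lifecycle: init once, run per row, term once."""
    scorer = TextScorer()
    scorer.init()
    try:
        matches = []
        for row in rows:
            matches.extend(scorer.run(row))
        return matches, scorer.rows_seen
    finally:
        scorer.term()


rows = [
    {"id": 1, "text": "App crash after update"},
    {"id": 2, "text": "Please process my refund"},
    {"id": 3, "text": "Great service"},
]
matches, rows_seen = score(rows)
print(matches)
print(rows_seen)
```

Keeping the expensive setup in `init()` matters because `run()` is called for every row; reloading the rules per document would dominate the scoring cost.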


SAS Intersection With Hadoop

SAS treats Hadoop just as any other data source, pulling data FROM Hadoop and writing it back using ACCESS engines.

SAS works WITH Hadoop, lifting data into a purpose-built, in-memory advanced analytics environment in both symmetric and asymmetric ways, using In-Memory engines.

SAS can work directly IN Hadoop, leveraging the distributed processing capabilities of Hadoop, using SAS Code and Scoring Accelerators.


In-Database Code Accelerator for Hadoop

Thread program defines the parallel logic

Thread program runs inside MapReduce task

MapReduce shuffle/sort separates BY groups

DS2 executes in Map/Reduce stage

Two MapReduce jobs may be used
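The role of the shuffle/sort step in separating BY groups can be illustrated with a single-process Python sketch. It is a conceptual analogy only and assumes nothing about the accelerator's internals: the function names, the `product` BY variable, and the sample rows are hypothetical.

```python
from collections import defaultdict


def map_stage(rows, by):
    # Emit (BY value, row) pairs, as a mapper would.
    for row in rows:
        yield row[by], row


def shuffle_sort(pairs):
    # Collect rows per key and order the keys so each BY group is contiguous.
    groups = defaultdict(list)
    for key, row in pairs:
        groups[key].append(row)
    return sorted(groups.items())


def reduce_stage(grouped, fn):
    # Apply the per-group logic (the thread program's role in this analogy).
    return {key: fn(rows) for key, rows in grouped}


rows = [
    {"product": "phone", "score": -1},
    {"product": "tablet", "score": 1},
    {"product": "phone", "score": 1},
    {"product": "phone", "score": 1},
]
result = reduce_stage(
    shuffle_sort(map_stage(rows, "product")),
    lambda group: sum(r["score"] for r in group),
)
print(result)
```

Because every row of a BY group reaches the same reducer, the per-group logic can assume it sees the whole group, which is why BY-group processing needs the reduce stage rather than the map stage alone.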


SAS Embedded Process (EP) for Hadoop

Lightweight execution container for DS2

Written in C and Java

Runs inside a MapReduce task

Orchestrated by Hadoop MapReduce framework

Resource allocation managed by YARN

Text extensions added to EP to enable text scoring in Hadoop


Text Scoring In Hadoop

[Figure: architecture diagram. Labels: SAS (PROC DS2 ... RUN; with Accelerator=YES, Code accelerator); YARN (Resource Manager, Application Master); Hadoop (three nodes, each with Data, Code accelerator, and Embedded Process).]


Code Accelerator Execution


YARN Resource Manager Interface


Conclusion - Run Faster. Run Embedded!

SAS provides the platform for big data

Move the computation to the data

Run faster inside the database

Eliminates massive data movement

Exploits the parallel processing power of Hadoop

Delivers faster time to results
