Deep Learning for Text Analytics

SAS User Group Malaysia

3rd May 2018

Company Confidential – For Internal Use Only. Copyright © SAS Institute Inc. All rights reserved.

Agenda

• The Natural Language in Machines

o What is it?

o Why do we need it?

• Deep Learning with Recurrent Neural Network (RNN)

o Basic RNN architecture

o Text classification

o Text generation

• Creating, training and scoring an RNN in Jupyter Notebook

Natural Language in Machines

SAS in a Chatbot

Source: https://becominghuman.ai/chatbots-using-aws-sas-viya-e8a7410ec256

Welcome! I’m your virtual assistant.

How can I help you?

In the Oil and Gas Industry: AI to Assist Customers

Provide answers on more than 3,000 company products, drawing on 100,000 information data sheets, 1,000 different pack options and 1,100 different physical characteristics.

The information needed to answer each possible question correctly was spread across a variety of sources, including an external vehicle database with over a million vehicle and engine combinations.

The assistant needs to recognize when a piece of information is missing and ask the customer follow-up questions to clarify or confirm.
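The missing-information check described above can be sketched as a simple slot-filling loop. This is only an illustration, not the assistant's actual logic: the slot names, the follow-up questions and the `next_question` helper are all hypothetical.

```python
from typing import Optional

# Hypothetical slots a product-recommendation question might need.
REQUIRED_SLOTS = ("vehicle_make", "vehicle_model", "engine_type")

# Hypothetical follow-up questions, one per slot.
FOLLOW_UP = {
    "vehicle_make": "Which make is your vehicle?",
    "vehicle_model": "Which model is it?",
    "engine_type": "Is the engine petrol or diesel?",
}


def next_question(known: dict) -> Optional[str]:
    """Return a follow-up question for the first missing slot, or None."""
    for slot in REQUIRED_SLOTS:
        if not known.get(slot):
            return FOLLOW_UP[slot]
    return None


# The customer has named the make and model but not the engine.
partial = {"vehicle_make": "Proton", "vehicle_model": "Saga"}
print(next_question(partial))

# Once every slot is filled there is nothing left to ask.
complete = dict(partial, engine_type="petrol")
print(next_question(complete))
```

A real assistant would fill the slots from the NLU output rather than from a hand-written dictionary; the point here is only the "ask until nothing is missing" control flow.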

Natural Language Processing: Interaction

Natural Language Processing (NLP)

Natural Language Understanding (NLU)

Natural Language Generation (NLG)

Natural Language Interaction (NLI)

Natural Language Processing

NLP Layer (Natural Language Processing)

Knowledge Base (Source Content)

Data Storage (Interaction History & Analytics)

Natural Language Processing: Other Use Cases

• Text-based applications

• Searching for a certain topic in the database

• Extracting information from a large document
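As a rough baseline for the topic-search use case, the sketch below ranks documents by how often they contain the query terms. It is a plain keyword match, not an NLP model, and the document ids and texts are made up for illustration.

```python
import re
from collections import Counter

# Hypothetical mini "database" of documents.
DOCUMENTS = {
    "doc1": "Engine oil recommendations for diesel engines",
    "doc2": "Brake fluid replacement schedule",
    "doc3": "Choosing engine oil for petrol vehicles",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def search(query, documents):
    """Rank documents by how many query-term occurrences they contain."""
    terms = set(tokenize(query))
    scores = {}
    for doc_id, text in documents.items():
        counts = Counter(tokenize(text))
        score = sum(counts[t] for t in terms)
        if score:
            scores[doc_id] = score
    # Highest score first; ties broken by document id for stable output.
    return sorted(scores, key=lambda d: (-scores[d], d))


print(search("engine oil", DOCUMENTS))
```

Exact keyword matching misses synonyms and inflected forms ("engine" does not match "engines" here), which is one motivation for the vector-based representations discussed later in the deck.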

Deep Learning with Recurrent Neural Network (RNN)

Deep Learning: Recurrent Neural Network (RNN)

• Designed to handle sequential data

o Text

o Speech

o Time series

• Performs the same task for every element of a sequence

• The output for each element depends on the computations for the preceding elements

• Common variants

o Gated Recurrent Unit (GRU)

o Long Short-Term Memory (LSTM)

[Diagram: RNN unit with Input and Output]
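The two properties listed above (the same computation at every step, and each output depending on what came before) can be seen in a minimal one-unit recurrent network. This is a sketch of a plain Elman-style RNN in pure Python, not a GRU or LSTM, and the weights are illustrative values rather than trained ones.

```python
import math


def rnn_forward(inputs, w_x, w_h, b, h0=0.0):
    """Run a one-unit Elman-style RNN over a sequence of numbers.

    The same weights (w_x, w_h, b) are applied at every step, and each
    hidden state depends on the previous one:
        h_t = tanh(w_x * x_t + w_h * h_{t-1} + b)
    """
    h = h0
    states = []
    for x in inputs:
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states


# Illustrative weights, not trained values.
states = rnn_forward([1.0, 0.0, 0.0], w_x=0.5, w_h=0.8, b=0.0)
print(states)
```

Only the first input is non-zero, yet the later hidden states are non-zero too: the network carries information forward through `h`. GRU and LSTM units replace the single `tanh` update with gated updates so that this memory survives over longer sequences.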

Text Classification

The 16th American President

number

order

entity

context

Word Vector

Unlabeled Corpus

The 15th American President

The 16th American President

The 17th American President

Alex reads this sentence

Alex read this sentence

Alex is reading this sentence

[Diagram: two groups of word vectors: (15th, 16th, 17th) and (read, reads, reading)]

Word Vector Algorithm

Words with similar context should have similar vectors
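The idea that similar contexts give similar vectors can be illustrated on the six corpus sentences from this slide. The sketch below uses simple neighbour counts and cosine similarity; this is an assumption made for illustration, since the slide does not say which word vector algorithm is used, and learned embeddings would be dense rather than count-based.

```python
import math
from collections import Counter, defaultdict

CORPUS = [
    "The 15th American President",
    "The 16th American President",
    "The 17th American President",
    "Alex reads this sentence",
    "Alex read this sentence",
    "Alex is reading this sentence",
]


def context_vectors(sentences, window=1):
    """Count, for each word, the words appearing within `window` positions."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


vectors = context_vectors(CORPUS)
print(cosine(vectors["15th"], vectors["16th"]))      # identical neighbours
print(cosine(vectors["reads"], vectors["reading"]))  # one shared neighbour
print(cosine(vectors["16th"], vectors["reads"]))     # no shared neighbours
```

With a window of one word, "15th" and "16th" have exactly the same neighbours ("the" and "american"), so their similarity is at its maximum, while "16th" and "reads" share no neighbours at all.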

Text Generation

Translating vector back to text

Convert text into vectorized input

RNN

Calculate vector weight

Vector representing a sentence based on the text

Use weight vector to refine model
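The two outer steps of this flow, turning text into numeric input and turning numbers back into text, can be sketched with a plain vocabulary lookup. The RNN in the middle is deliberately left out, and the helper names (`build_vocab`, `encode`, `decode`) and the `<unk>` placeholder token are assumptions made for this example.

```python
def build_vocab(sentences):
    """Assign an integer id to each distinct lowercase token."""
    vocab = {"<unk>": 0}
    for sentence in sentences:
        for token in sentence.lower().split():
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(text, vocab):
    """Text -> list of token ids (unknown words map to <unk>)."""
    return [vocab.get(token, vocab["<unk>"]) for token in text.lower().split()]


def decode(ids, vocab):
    """List of token ids -> text."""
    inverse = {i: token for token, i in vocab.items()}
    return " ".join(inverse[i] for i in ids)


vocab = build_vocab(["love this game", "very easy to use"])
ids = encode("love this app", vocab)
print(ids)
print(decode(ids, vocab))
```

In a full model each id would be mapped to a word vector before entering the RNN, and the decoder would pick output ids from predicted probabilities rather than echoing the input.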

Text Generation: Text Structure – Word Order

Who is the 16th American President

The 16th President who is American
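A quick check shows why word order matters for these two sentences: they contain exactly the same words, so an order-insensitive (bag-of-words) representation cannot tell them apart, while a sequence representation can. The lowercase whitespace tokenizer is a simplifying assumption.

```python
from collections import Counter

a = "Who is the 16th American President"
b = "The 16th President who is American"


def tokens(text):
    """Lowercase whitespace tokenization."""
    return text.lower().split()


# Order-insensitive view: both sentences contain exactly the same words.
print(Counter(tokens(a)) == Counter(tokens(b)))  # True

# Order-sensitive view: the token sequences differ.
print(tokens(a) == tokens(b))  # False
```

Because an RNN consumes tokens one at a time, it sees the second, order-sensitive view.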

Creating, Training and Scoring an RNN Using Deep Learning

Sample RNN Model: Loading the Action Sets

Sample RNN Model: The Dataset

Sample RNN Model: Text Classification Model

Sample RNN Model: Training the Text Classification Model

Sample RNN Model: Scoring the Text Classification Model

Sample RNN Model: Word Order

Sample RNN Model: Text Generation Model

Sample RNN Model: Training the Text Generation Model

NOTE: The Synchronous mode is enabled.

NOTE: The total number of parameters is 3822440.

NOTE: The approximate memory cost is 19739.00 MB.

NOTE: Loading weights cost 0.00 (s).

NOTE: Initializing each layer cost 147.37 (s).

NOTE: The total number of threads on each worker is 32.

NOTE: The total number of minibatch size per thread on each worker is 16.

NOTE: The maximum number of minibatch size across all workers for the synchronous mode is 512.

NOTE: Target variable: title

NOTE: Number of input variables: 1

NOTE: Number of numeric input variables: 2

NOTE:  Batch  nUsed  Learning Rate    Loss  Fit Error  Time (s) (Training)
NOTE:      0    512           0.05  11.679          1      0.79
NOTE:      1    512           0.05  11.414          1      0.77
NOTE:      2    512           0.05    11.2          1      1.39
NOTE:      3    512           0.05  11.093          1      0.72
NOTE:      4    512           0.05  10.858     0.9983      0.85
NOTE:      5    512           0.05   10.64     0.9835      1.18
NOTE:      6    512           0.05   10.64     0.9835      0.91
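The batch size of 512 in the nUsed column follows from the thread settings reported earlier in the log. The arithmetic below assumes a single worker, which the log does not state explicitly but which is consistent with the reported maximum of 512 across all workers.

```python
threads_per_worker = 32    # "total number of threads on each worker"
minibatch_per_thread = 16  # "minibatch size per thread on each worker"
workers = 1                # assumption: not stated in the log

# Observations consumed per synchronous update across all threads.
effective_batch = threads_per_worker * minibatch_per_thread * workers
print(effective_batch)  # 512
```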

Sample RNN Model: Scoring the Text Generation Model

Out[15]: Scoreinfo

   Descr                          Value
0  Number of Observations Read    9635
1  Number of Observations Used    9635
2  Misclassification Error (%)    90.71156
3  Loss Error                     9.162203

Sample RNN Model: Text Generation Output

review: Really Really enjoy playing this game. It makes you feel very smart as you solve some puzzles quickly and then the next one will be a real stumper… play it ALL the time.

ground truth: Crazy Addicted

prediction: love this game

review: Don’t bother with this app. I wish I could delete it from my purchased app history so I’m not reminded that it was every on my phone. I should have known better.

ground truth: Really?

prediction: i love this

review: I really like this app! very easy to use and the audio is great. it is nice to see the spreading of God’s word is still free for some people. a++++.

ground truth: awesome!

prediction: very app

Useful Links

• What’s New In SAS Deep Learning (Documentation)

http://go.documentation.sas.com/?docsetId=casdlpg&docsetTarget=n0gv3jjm5obouun1uvducbzl8nlf.htm&docsetVersion=8.2&locale=en

• Understanding Recurrent Neural Networks

http://karpathy.github.io/2015/05/21/rnn-effectiveness/

• RNN Simplified

https://www.youtube.com/watch?v=_aCuOwF1ZjU

sas.com

Thank You