Company Confidential – For Internal Use Only. Copyright © SAS Institute Inc. All rights reserved.
Deep Learning for Text Analytics
SAS User Group Malaysia
3rd May 2018
Agenda
• The Natural Language in Machines
o What is it?
o Why do we need it?
• Deep Learning with Recurrent Neural Network (RNN)
o Basic RNN architecture
o Text classification
o Text generation
• Creating, training and scoring an RNN in Jupyter Notebook
Natural Language in Machines
SAS in a Chatbot
Source: https://becominghuman.ai/chatbots-using-aws-sas-viya-e8a7410ec256
Welcome! I’m your virtual assistant.
How can I help you?
In The Oil and Gas Industry: AI to Assist Customers
Provide answers and information on over 3,000 company products, drawing on 100,000 data sheets, 1,000 pack options and 1,100 physical characteristics.
The information needed to answer each possible question was spread across a variety of sources, including an external vehicle database with over a million vehicle and engine combinations.
The assistant needs to recognize when a piece of information is missing and ask the customer follow-up questions to clarify or confirm.
Natural Language Processing: Interaction
Natural Language Processing (NLP)
Natural Language Understanding (NLU)
Natural Language Generation (NLG)
Natural Language Interaction (NLI)
Natural Language Processing
NLP Layer (Natural Language Processing)
Knowledge Base (Source Content)
Data Storage (Interaction History & Analytics)
Natural Language Processing: Other Use Cases
• Text-based applications
• Searching for a certain topic in the database
• Extracting information from a large document
Deep Learning with Recurrent Neural Network (RNN)
Deep Learning: Recurrent Neural Network (RNN)
• Designed to handle sequential data
o Text
o Speech
o Time series
• Performs the same task for every element of a sequence
• The output for each element depends on the computations for the preceding elements
• Common variants
o Gated Recurrent Unit (GRU)
o Long Short-Term Memory (LSTM)
[Figure: RNN diagram with Input and Output labels]
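The bullets above can be sketched as a toy forward pass. This is a minimal illustration of a plain (non-gated) RNN, not the SAS implementation: the weight values, sizes and the names `rnn_step` and `rnn_forward` are all assumptions chosen for the example. It shows the same step being applied to every element, with each state depending on the previous one.

```python
import math

def rnn_step(x_t, h_prev, W_x, W_h, b):
    """One recurrent step: h_t = tanh(W_x x_t + W_h h_prev + b)."""
    return [
        math.tanh(
            sum(w * x for w, x in zip(W_x[i], x_t))
            + sum(w * h for w, h in zip(W_h[i], h_prev))
            + b[i]
        )
        for i in range(len(b))
    ]

def rnn_forward(inputs, W_x, W_h, b):
    """Apply the same step to every element and collect the hidden states."""
    h = [0.0] * len(b)  # initial hidden state
    states = []
    for x_t in inputs:
        h = rnn_step(x_t, h, W_x, W_h, b)  # same weights at every position
        states.append(h)
    return states

# Hypothetical weights: 2 input features, 2 hidden units.
W_x = [[0.5, -0.3], [0.8, 0.2]]
W_h = [[0.1, 0.4], [-0.2, 0.3]]
b = [0.0, 0.0]

sequence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
states = rnn_forward(sequence, W_x, W_h, b)

# Change only the first element: the last state changes too,
# because each state is computed from the one before it.
altered = [[0.0, 0.0]] + sequence[1:]
altered_states = rnn_forward(altered, W_x, W_h, b)
print(states[-1])
print(altered_states[-1])
```

GRU and LSTM variants replace `rnn_step` with gated updates but keep the same loop over the sequence.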
Text Classification
The 16th American President
number
order
entity
context
Word Vector
Unlabeled Corpus
The 15th American President
The 16th American President
The 17th American President
Alex reads this sentence
Alex read this sentence
Alex is reading this sentence
15th
17th
16th
read
reading
reads
Word Vector Algorithm
Words with similar context should have similar vectors
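To make the idea concrete, here is a small count-based sketch that uses the six sentences from this slide. It is not the word vector algorithm used in the talk (which is not specified here); it simply counts neighbouring words within a window of 2, an assumed setting, and compares the resulting vectors with cosine similarity.

```python
from collections import Counter, defaultdict
import math

corpus = [
    "The 15th American President",
    "The 16th American President",
    "The 17th American President",
    "Alex reads this sentence",
    "Alex read this sentence",
    "Alex is reading this sentence",
]

def cooccurrence_vectors(sentences, window=2):
    """Represent each word by the counts of the words that appear near it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

vectors = cooccurrence_vectors(corpus)
print(cosine(vectors["15th"], vectors["16th"]))    # same neighbours
print(cosine(vectors["read"], vectors["reading"])) # mostly shared neighbours
print(cosine(vectors["16th"], vectors["read"]))    # no shared neighbours
```

In this toy corpus the ordinals share all of their neighbours and the verb forms share most of theirs, while an ordinal and a verb share none.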
Text Generation
Translating vector back to text
Convert text into vectorized input
RNN
Calculate vector weight
Vector representing a sentence
based on the text
Use weight vector to refine model
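The two outer steps of this flow, converting text into a numeric input and translating the model's numeric output back into text, can be sketched with a simple vocabulary lookup. This is an assumption-laden simplification: it uses integer token ids rather than dense word vectors, and `build_vocab`, `encode` and `decode` are hypothetical helper names, not SAS functions.

```python
def build_vocab(sentences):
    """Assign an integer id to every distinct lowercase token."""
    vocab = {"<unk>": 0}  # id 0 is reserved for unknown words
    for sentence in sentences:
        for token in sentence.lower().split():
            vocab.setdefault(token, len(vocab))
    return vocab

def encode(sentence, vocab):
    """Text -> list of token ids (the numeric input to a model)."""
    return [vocab.get(token, vocab["<unk>"]) for token in sentence.lower().split()]

def decode(ids, vocab):
    """List of token ids -> text (turning model output back into words)."""
    inverse = {index: token for token, index in vocab.items()}
    return " ".join(inverse[i] for i in ids)

vocab = build_vocab(["love this game", "i love this"])
ids = encode("i love this", vocab)
print(ids)                  # [4, 1, 2]
print(decode(ids, vocab))   # i love this
print(encode("great game", vocab))  # [0, 3], "great" is unknown
```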
Text Generation: Text Structure – Word Order
Who is the 16th American President
The 16th President who is American
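A short check shows why word order matters for these two examples: they contain exactly the same words, so an order-free (bag-of-words) representation cannot tell them apart, while a sequence representation can. This is only an illustrative sketch of the point on the slide.

```python
from collections import Counter

question = "Who is the 16th American President"
statement = "The 16th President who is American"

question_tokens = question.lower().split()
statement_tokens = statement.lower().split()

# Bag of words: word counts only, order discarded.
same_bag = Counter(question_tokens) == Counter(statement_tokens)

# Sequence: the order of the tokens is kept.
same_sequence = question_tokens == statement_tokens

print(same_bag)       # True
print(same_sequence)  # False
```

A sequence model such as an RNN reads the tokens in order, so it can in principle distinguish the two.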
Creating, Training, and Scoring an RNN Using Deep Learning
Sample RNN Model: Loading the Action Sets
Sample RNN Model: The Dataset
Sample RNN Model: The Dataset (continued)
Sample RNN Model: Text Classification Model
Sample RNN Model: Training the Text Classification Model
Sample RNN Model: Scoring the Text Classification Model
Sample RNN Model: Word Order
Sample RNN Model: Text Generation Model
Sample RNN Model: Training the Text Generation Model
NOTE: The Synchronous mode is enabled.
NOTE: The total number of parameters is 3822440.
NOTE: The approximate memory cost is 19739.00 MB.
NOTE: Loading weights cost 0.00 (s).
NOTE: Initializing each layer cost 147.37 (s).
NOTE: The total number of threads on each worker is 32.
NOTE: The total number of minibatch size per thread on each worker is 16.
NOTE: The maximum number of minibatch size across all workers for the synchronous mode is 512.
NOTE: Target variable: title
NOTE: Number of input variables: 1
NOTE: Number of numeric input variables: 2
NOTE:  Batch  nUsed  Learning Rate  Loss    Fit Error  Time (s) (Training)
NOTE:  0      512    0.05           11.679  1          0.79
NOTE:  1      512    0.05           11.414  1          0.77
NOTE:  2      512    0.05           11.2    1          1.39
NOTE:  3      512    0.05           11.093  1          0.72
NOTE:  4      512    0.05           10.858  0.9983     0.85
NOTE:  5      512    0.05           10.64   0.9835     1.18
NOTE:  6      512    0.05           10.64   0.9835     0.91
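The numbers in this log can be cross-checked with a little arithmetic. The sketch below copies values from the log; the reading that the maximum minibatch size equals threads per worker times minibatch size per thread times the number of workers is an assumption, under which the log implies a single worker.

```python
# Values copied from the training log above.
threads_per_worker = 32
minibatch_per_thread = 16
reported_max_minibatch = 512
losses = [11.679, 11.414, 11.2, 11.093, 10.858, 10.64, 10.64]

# Observations processed per step on one worker.
per_worker_batch = threads_per_worker * minibatch_per_thread
print(per_worker_batch)  # 512, matches the nUsed column

# Assumed relationship: total = per-worker batch * number of workers.
implied_workers = reported_max_minibatch // per_worker_batch
print(implied_workers)  # 1

# The loss never increases across the seven batches shown.
loss_non_increasing = all(a >= b for a, b in zip(losses, losses[1:]))
print(loss_non_increasing)  # True
print(round(losses[0] - losses[-1], 3))  # 1.039 total drop in loss
```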
Sample RNN Model: Scoring the Text Generation Model
Out[15]: ScoreInfo
   Descr                          Value
0  Number of Observations Read    9635
1  Number of Observations Used    9635
2  Misclassification Error (%)    90.71156
3  Loss Error                     9.162203
Sample RNN Model: Text Generation Output
review: Really Really enjoy playing this game. It makes you feel very smart as you solve some puzzles quickly and then the next one will be a real stumper… play it ALL the time.
ground truth: Crazy Addicted
prediction: love this game
review: Don’t bother with this app. I wish I could delete it from my purchased app history so I’m not reminded that it was every on my phone. I should have known better.
ground truth: Really?
prediction: i love this
Sample RNN Model: Text Generation Output (continued)
review: I really like this app! very easy to use and the audio is great. it is nice to see the spreading of God’s word is still free for some people. a++++.
ground truth: awesome!
prediction: very app
Useful Links
• What’s New In SAS Deep Learning (Documentation)
http://go.documentation.sas.com/?docsetId=casdlpg&docsetTarget=n0gv3jjm5obouun1uvducbzl8nlf.htm&docsetVersion=8.2&locale=en
• Understanding Recurrent Neural Networks
http://karpathy.github.io/2015/05/21/rnn-effectiveness/
• RNN Simplified
https://www.youtube.com/watch?v=_aCuOwF1ZjU