
© 2018 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential.

VoC-DL: Revisiting Voice of Customer using Deep Learning

Susheel Suresh, Guru Rajan TS and Vipin Gopinath | Marketing Services | Adobe Systems, Bangalore

IAAI 2018 | Emerging Track Paper


Talk

Motivation | Prior Work | Data Collection | Approach | Comparisons | Demo | Q&A

Presenter

Presentation Notes

This is how I would like to organize my talk today.


Motivation: Customers’ Comments

trouble installing photoshop. Already paid for it

I'm trying to get a trial for LiveCycle I am considering buying it and want to try it first.

I just purchased the 'Photoshop' but couldn't open it. please help me out !

Unfortunately, we have been unsuccessful with getting in touch with our Adobe representative for the past week and we urgently need to start the conversation for this platform. We are eager to customize a quote. Let's setup a proper demo

This form does not fit for an educational model. I am department chair and also teach graphic design. So answering questions for your data will always be slightly wrong. I usually have 20 students per class.

Hello,I have spent the last hour trying to find the customer support page for Adobe Analytics. I am going in circles. Where do I go to submit issues and ask questions? I have tried all the avenues. Please tell me how to contact Customer Care.

Do you offer discounted rates for Adobe stock image subscriptions for non-profit organizations?

Presenter

Presentation Notes

What problem are we solving, and why is it important? Most enterprises have varied data sources for capturing the customer's voice (text written by customers, chats with call centers, etc.). This text is written by thousands or millions of customers who are trying to voice their concerns or intents. It is not possible for a human to sit and channel every issue customers face, which is bad for business and for customer experience. The text may come from third-party review websites/apps, social media, in-house help forums, company websites, etc., and customers can write about a whole host of things. Can an organization use a text classification system that provides a deeper, holistic understanding of the intentions behind the written text, rather than just sentiment? The benefit: better engagement with customers.


Prior Work

Sentiment Analysis

• Pang et al. 2008

Desire Analysis

• Goldberg et al. 2009 – Identifying ‘Wishes’

• Ramanand et al. 2010 – Identifying ‘Buy’ signals

Purchase Intent Analysis

• Gupta et al. 2014 – Quora & Yahoo Answers

• Korpusik et al. 2016 – Twitter

Presenter

Presentation Notes

What approaches have been tried, and why have they not fully solved the problem? Goldberg: template system. Ramanand: rule-based system. Gupta: hand-engineered features followed by an SVM, which is not scalable. Korpusik: RNN. With the universal acceptance of word embedding techniques (there has been recent study in using fastText, GloVe, etc.), this is an active research field. Highlight: the main problem still remains, because these approaches do not delve into getting a deep, holistic meaning.


Dataset

• Product Enquiry (PE) : Expression signifying an act of asking for product/service related information.

• Buying Intent (BI) : Expression signifying an intention to purchase or consume a product/service.

• Feedback (F) : Expression signifying some reaction to a product/service.

• Seeking Help (SH) : Expression signifying an act of seeking help related to a product/service.

• Pricing Query (PQ) : Expression signifying a question directed specifically towards pricing of a product/service.

Cohen's kappa coefficient : 0.701

Presenter

Presentation Notes

15 thousand data points; we crowdsourced the annotation task. Each text was annotated by 3 different annotators, and we used a majority voting scheme to come up with the final label for each text. Cohen's kappa coefficient of inter-annotator agreement between the sets of annotations was 0.701. Data: customer-written text on the Adobe website.
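
The labeling scheme in these notes can be sketched in a few lines of plain Python. This is an illustration, not the authors' pipeline: the toy annotations are made up, the function names are hypothetical, and returning `None` when all three annotators disagree is an assumption (the notes do not say how ties were resolved). The kappa function implements the standard pairwise Cohen's kappa between two annotators.

```python
from collections import Counter


def majority_label(votes):
    """Return the label chosen by most annotators, or None on a tie.

    Assumption: ties (e.g. three different labels) are left unresolved;
    the talk does not describe how such cases were handled.
    """
    counts = Counter(votes).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]


def cohens_kappa(a, b):
    """Cohen's kappa between two annotators' label sequences."""
    assert len(a) == len(b) and len(a) > 0
    n = len(a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of each annotator's label frequencies.
    freq_a, freq_b = Counter(a), Counter(b)
    expected = sum(freq_a[l] * freq_b[l] for l in set(a) | set(b)) / (n * n)
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy annotations: one tuple of three votes per text,
# using the class abbreviations from the Dataset slide.
annotations = [
    ("SH", "SH", "F"),
    ("BI", "BI", "BI"),
    ("PQ", "PE", "PQ"),
    ("PE", "F", "SH"),
]

final_labels = [majority_label(votes) for votes in annotations]

first = [votes[0] for votes in annotations]
second = [votes[1] for votes in annotations]
kappa = cohens_kappa(first, second)
```

With three annotators, a pairwise kappa like this would be computed per annotator pair; how the single reported value of 0.701 was aggregated is not stated in the notes.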


Approach

Stage 1 Word Embedding

Stage 2 Baseline Model – SVM

CNN

RNN

LSTM

Presenter

Presentation Notes

Word Embedding: word vectors (words projected onto a lower-dimensional vector space) are essentially feature extractors that encode semantic features of words in their dimensions. CBOW takes the average of the context of a word; the skip-gram model can capture two semantics for a single word. We use the publicly available word2vec vectors that were trained on 100 billion words from Google News, using the skip-gram model with negative sampling, which causes each training sample to update only a small percentage of the model's weights. It's worth noting that subsampling frequent words and applying negative sampling not only reduced the compute burden of the training process but also improved the quality of the resulting word vectors. Words not in the pre-trained vocabulary are initialized randomly. Side note: in future, maybe use GloVe, fastText or LexVec.
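
A minimal sketch of the lookup described above: words found in the pre-trained vectors keep their vector, and everything else gets a random one. The tiny `pretrained` dictionary stands in for the real Google News vectors, `build_embedding_table` is a hypothetical helper, and the uniform range of ±0.25 is an assumption (the notes only say "initialized randomly").

```python
import random

EMBED_DIM = 300  # dimensionality of the word2vec vectors used in the talk


def build_embedding_table(vocab, pretrained, dim=EMBED_DIM, scale=0.25, seed=0):
    """Map each vocabulary word to a vector.

    Known words reuse the pre-trained vector; unknown words get a random
    vector drawn uniformly from [-scale, scale] (scale is an assumption).
    """
    rng = random.Random(seed)
    table = {}
    for word in vocab:
        if word in pretrained:
            table[word] = list(pretrained[word])
        else:
            table[word] = [rng.uniform(-scale, scale) for _ in range(dim)]
    return table


# Stand-in for the real pre-trained vectors (values are placeholders).
pretrained = {
    "photoshop": [0.1] * EMBED_DIM,
    "trial": [0.2] * EMBED_DIM,
}
vocab = ["photoshop", "trial", "livecycle"]  # "livecycle" is out of vocabulary here

table = build_embedding_table(vocab, pretrained)
```

In a trainable model the randomly initialized rows would normally be updated during training, which is one way to soften the missing-word problem raised in the Q&A notes.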


Architectures

[Figures: LSTM Architecture, CNN Architecture]

Presenter

Presentation Notes

Input words, represented as 300-dimensional word2vec vectors, are given as input, and 128 filters each of sizes 3, 4 and 5 are applied on the embeddings. One-time max pooling is performed for all three convolutions, and 128 feature maps are generated for each region size. The state size was set to 10. Implemented in PyTorch. General hyperparameters: the Adam algorithm with a learning rate of 0.01 was used for performing gradient descent, with a cross-entropy loss function.
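
To make the CNN description concrete, here is a dependency-free sketch of the feature extractor: filters of width 3, 4 and 5 slide over the word vectors, and each filter's activations are max-pooled over time, giving 3 × 128 = 384 features. It is written in plain Python so it runs anywhere; the talk's implementation used PyTorch. The ReLU activation, the random weights, and the requirement that the sentence has at least 5 words (no padding) are assumptions made for the illustration.

```python
import random


def conv_max_over_time(sentence, filters, width):
    """Apply each filter to every window of `width` words, then max-pool.

    sentence: list of word vectors (lists of floats, all the same length).
    filters:  list of (weights, bias); weights has width * dim entries.
    Returns one pooled value per filter.
    """
    # Each window is the concatenation of `width` consecutive word vectors.
    windows = [
        [x for vec in sentence[i:i + width] for x in vec]
        for i in range(len(sentence) - width + 1)
    ]
    pooled = []
    for weights, bias in filters:
        activations = [
            max(0.0, sum(w * x for w, x in zip(weights, win)) + bias)  # ReLU (assumed)
            for win in windows
        ]
        pooled.append(max(activations))  # max over all window positions
    return pooled


def cnn_features(sentence, banks):
    """Concatenate pooled features from every filter width."""
    features = []
    for width, filters in banks.items():
        features.extend(conv_max_over_time(sentence, filters, width))
    return features


rng = random.Random(0)
DIM, N_FILTERS, WIDTHS = 300, 128, (3, 4, 5)

# Randomly initialized filter banks; a trained model would learn these.
banks = {
    w: [([rng.gauss(0.0, 0.01) for _ in range(w * DIM)], 0.0) for _ in range(N_FILTERS)]
    for w in WIDTHS
}

# A hypothetical 7-word sentence, already mapped to 300-dimensional vectors.
sentence = [[rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(7)]

features = cnn_features(sentence, banks)  # 384 values fed to the classifier layer
```

The pooled feature vector has the same length regardless of sentence length, which is what lets a single output layer sit on top of variable-length text.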


Results & Remarks

Table 1. Experimental results (from our task)

Model    Accuracy (%)
SVM      61.7
CNN      72.2
RNN      75.1
LSTM     82.3

Table 2. Results in % accuracy for different standard datasets and our models

Dataset  CNN   RNN   LSTM
TREC     92.8  93.6  95.4
SST-1    43.0  42.9  46.3
SST-2    82.1  83.4  87.4
MR       75.4  77.2  82.7

Presenter

Presentation Notes

In addition to reporting the accuracy percentages for our task, we also performed an evaluation on other standard datasets: TREC, Stanford Sentiment Treebank and Movie Review. As shown above, the LSTM performs the best based on 10-fold cross validation because of its ability to hold key contextual information compared to vanilla RNNs or CNNs. We have replicated the results for the different standard datasets, and they are shown in Table 2. The LSTM outperforms the other models on every dataset.
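
The 10-fold cross validation mentioned here can be sketched as follows. This is a generic illustration in plain Python, not the authors' evaluation code: the helper names are hypothetical, the shuffle seed is arbitrary, and `evaluate` stands for any function that trains on one index set and returns accuracy on the other.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k near-equal, disjoint folds
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx


def cross_validated_accuracy(n_samples, evaluate, k=10):
    """Average the accuracy returned by `evaluate(train_idx, test_idx)` over k folds."""
    scores = [evaluate(train, test) for train, test in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)


# 15,000 is the dataset size given in the data collection notes.
splits = list(k_fold_indices(15000))
```

Each text is used for testing exactly once and for training nine times, so the reported accuracy is an average over ten train/test partitions rather than a single split.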


Model Accuracy

[Accuracy plots: LSTM, CNN, RNN]

Presenter

Presentation Notes

Explain what the graphs mean. Both test and train curves are shown. RNN: the x-axis is epochs. LSTM, CNN: the x-axis is the number of iterations. 80:20 split on train/test data.


Demo

Presenter

Presentation Notes

Now I will show a quick demo of the proposed model.


Key References

Gupta, V.; Varshney, D.; Jhamtani, H.; Kedia, D.; and Karwa, S. 2014. Identifying purchase intent from social posts. In ICWSM.

Kim, Y. 2014. Convolutional neural networks for sentence classification. arXiv preprint arXiv:1408.5882.

Ramanand, J.; Bhavsar, K.; and Pedanekar, N. 2010. Wishful thinking: finding suggestions and ’buy’ wishes from product reviews. In Proceedings of the NAACL HLT 2010 Workshop on Computational Approaches to Analysis and Generation of Emotion in Text, 54–61. ACL.

Korpusik, M.; Sakaki, S.; Chen, F.; and Chen, Y.-Y. 2016. Recurrent neural networks for customer purchase prediction on twitter. In CBRecSys@ RecSys, 47–50.

http://pytorch.org/

https://code.google.com/p/word2vec/

Presenter

Presentation Notes

The Yoon Kim paper (Kim 2014) introduced the use of CNNs with pre-trained embeddings for text classification tasks. Concluding remarks: with the use of a generic model, we are able to overcome two major drawbacks of hard-coded features, namely the high cost of creating rules through manual analysis and their low coverage. We reiterate that word2vec word embeddings act as universal features and that CNNs are very capable of text classification; they give results similar to basic RNNs. CNNs were originally designed for computer vision problems, but with the advent of superior vector representations they can be used to solve specific NLP problems. The best deep learning model for our task is an RNN+LSTM architecture. With its ability to capture temporal relationships in a text and to solve the vanishing gradient problem of vanilla RNNs, the LSTM gave us an accuracy increase of about 7 to 10 percentage points over the basic RNN and CNN (82.3% versus 75.1% and 72.2% in Table 1).
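
As a rough illustration of why the LSTM copes better with long-range context, the sketch below implements one step of a standard LSTM cell in plain Python and runs it over a short sentence. The cell state is updated additively (`f * c_prev + i * g`), which is what lets information and gradients survive across many time steps. The state size of 10 and the 300-dimensional inputs follow the notes; the weight initialization, gate names and toy inputs are assumptions, and the talk's actual model was built in PyTorch.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step.

    params maps a gate name to (W, b): W has one row per state unit, each
    row of length len(x) + len(h_prev); b has one bias per state unit.
    """
    z = x + h_prev  # concatenate input and previous hidden state

    def affine(name):
        W, b = params[name]
        return [sum(w * v for w, v in zip(row, z)) + bias for row, bias in zip(W, b)]

    i = [sigmoid(a) for a in affine("input")]        # how much new content to write
    f = [sigmoid(a) for a in affine("forget")]       # how much old state to keep
    o = [sigmoid(a) for a in affine("output")]       # how much state to expose
    g = [math.tanh(a) for a in affine("candidate")]  # proposed new content
    # Additive cell update: old state is scaled, not repeatedly squashed.
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c


rng = random.Random(0)
INPUT_DIM, STATE_SIZE = 300, 10


def init_gate():
    """Random weights and zero biases for one gate (initialization is an assumption)."""
    W = [[rng.uniform(-0.1, 0.1) for _ in range(INPUT_DIM + STATE_SIZE)]
         for _ in range(STATE_SIZE)]
    return W, [0.0] * STATE_SIZE


params = {name: init_gate() for name in ("input", "forget", "output", "candidate")}

# A hypothetical 6-word sentence, already mapped to 300-dimensional vectors.
sentence = [[rng.gauss(0.0, 1.0) for _ in range(INPUT_DIM)] for _ in range(6)]

h, c = [0.0] * STATE_SIZE, [0.0] * STATE_SIZE
for x in sentence:
    h, c = lstm_step(x, h, c, params)
# `h` is the final hidden state that a classifier layer would consume.
```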


Presenter

Presentation Notes

Q&A: What kinds of things do we still get wrong? Examples: the possibility of a contextual error (if a word is not present in the pre-trained model, its context might be lost).
