Deep Learning Applications (in industries and elsewhere)

Abhishek Thakur (@abhi1thakur)

TRANSCRIPT

Page 1: Deep Learning Applications (dadada2017)

Deep Learning Applications (in industries and elsewhere)

Abhishek Thakur (@abhi1thakur)

Page 2: Deep Learning Applications (dadada2017)

About me
● Chief Data Scientist @ Boost AI
● Machine learning enthusiast
● Kaggle junkie (highest world rank #3)
● Interested in:
  ○ Automatic machine learning
  ○ Large-scale classification of text data
  ○ Chatbots

I like big data and I cannot lie

Page 3: Deep Learning Applications (dadada2017)

Agenda
● Brief introduction to deep learning
● Implementation of deepnets
● Fine-tuning of pre-trained networks
● 4 different industrial use cases
● No maths!!!!

Pages 4-8: Deep Learning Applications (dadada2017)

What is deep learning?
● A buzzword
● Neural networks
● Removes manual feature extraction steps
● Not a black box

Pages 9-13: Deep Learning Applications (dadada2017)

How have convnets evolved? (figure slides tracing architectures from 1989, 2012 and 2014; images not captured in the transcript)
Pages 14-21: Deep Learning Applications (dadada2017)

What can deep learning do?
● Natural language processing
● Speech processing
● Computer vision
● And more and more

Pages 22-32: Deep Learning Applications (dadada2017)

How can I implement my own DeepNets?
● Implement them on your own
  ○ Decompose into smaller parts
  ○ Implement layers
  ○ Start training
● Save yourself some time and finetune
  ○ Convert data
  ○ Define net
  ○ Define solver
  ○ Train
● Caffe (caffe.berkeleyvision.org)
● Keras (www.keras.io)

Pages 33-38: Deep Learning Applications (dadada2017)

Caffe
● Speed
● Openness
● Modularity
● Expression - No coding knowledge? No problem!
● Community

Pages 39-43: Deep Learning Applications (dadada2017)

What do you need for Caffe?
● Convert data
● Define a network (prototxt)
● Define a solver (prototxt)
● Train the network (with or without pre-trained weights)

Pages 44-46: Deep Learning Applications (dadada2017)

Prototxt
● solver.prototxt
● train.prototxt
● train_val.prototxt

Pages 47-48: Deep Learning Applications (dadada2017)

Training a net using Caffe

/PATH_TO_CAFFE/caffe train --solver=solver.prototxt

Page 49: Deep Learning Applications (dadada2017)

Fine Tuning!
● Fine tuning using GoogLeNet
● Why?
  ○ It has Google in its name
  ○ It won ILSVRC 2014
  ○ It's complicated and I wanted to play with it
● Caffe model zoo offers a lot of pretrained nets, including GoogLeNet
● Model Zoo: https://github.com/BVLC/caffe/wiki/Model-Zoo

Pages 50-51: Deep Learning Applications (dadada2017)

Honey Bee vs. Bumble Bee

The Metis Challenge: Naive Bees Classifier @ DrivenData.org

Pages 52-53: Deep Learning Applications (dadada2017)

An initial model (model diagram omitted from the transcript)

Pages 54-58: Deep Learning Applications (dadada2017)

Steps to finetune
● Create training and test files
● Get the prototxt files from model zoo
● Modify them
● Run the caffe solver

Pages 59-60: Deep Learning Applications (dadada2017)

Generating training and validation sets (the slides' code is not captured in the transcript)
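A minimal sketch of what generating those files might look like, assuming images stored in per-class folders and that Caffe's ImageData layer will read "path label" lines; the folder names, labels and the 80/20 split are illustrative assumptions:

import os
import random

# Assumed layout: data/honey_bee/*.jpg and data/bumble_bee/*.jpg
labels = {'honey_bee': 0, 'bumble_bee': 1}
samples = []
for name, label in labels.items():
    folder = os.path.join('data', name)
    for fn in os.listdir(folder):
        samples.append('%s %d' % (os.path.join(folder, fn), label))

random.shuffle(samples)
split = int(0.8 * len(samples))  # 80/20 train/validation split

with open('train.txt', 'w') as f:
    f.write('\n'.join(samples[:split]))
with open('val.txt', 'w') as f:
    f.write('\n'.join(samples[split:]))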

Pages 61-64: Deep Learning Applications (dadada2017)

Changes in train_val.prototxt (the slides step through the edits; screenshots not captured in the transcript)

Pages 65-67: Deep Learning Applications (dadada2017)

Changes in solver.prototxt (screenshots not captured in the transcript)

And that's all

Page 68: Deep Learning Applications (dadada2017)

Finetune your network

/PATH_TO_CAFFE/caffe train -solver ./solver.prototxt -weights ./models/bvlc_googlenet.caffemodel

Pages 69-72: Deep Learning Applications (dadada2017)

Did the net learn something new? (result figures omitted from the transcript)

Page 73: Deep Learning Applications (dadada2017)

Breaking down the various layers of GoogLeNet

Random / Pretrained / Finetuned activations compared across:

● inception_3a
● inception_3b
● inception_4a
● inception_4b
● inception_4c
● inception_4d
● inception_4e
● inception_5a
● inception_5b

Pages 74-77: Deep Learning Applications (dadada2017)

Why finetune?
● It is faster
● It is better (most of the time)
● Why reinvent the wheel?

Pages 78-85: Deep Learning Applications (dadada2017)

Tell me how to train a deepnet in Python!
● Caffe has a Python interface
● Tensorflow
● Theano
● Lasagne
● Keras
● Neon
● And lots more…..

Page 86: Deep Learning Applications (dadada2017)

Classifying Search Queries

Pages 87-88: Deep Learning Applications (dadada2017)

Why classify search queries?
● For businesses
  ○ Find out user intent
  ○ Track keywords according to the user's transactional buying cycle
  ○ Optimize website content and focus on a smaller keyword set
● For data scientists
  ○ 100s of millions of unlabeled keywords to play with
  ○ Why not!

Pages 89-91: Deep Learning Applications (dadada2017)

Word2Vec in Search Queries (embedding visualizations omitted from the transcript)
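As a rough sketch of the idea, query word vectors might be trained with gensim like this (the toy corpus and hyperparameters are illustrative assumptions; gensim 3.x API, where size= later became vector_size=):

from gensim.models import Word2Vec

# Each search query is a list of tokens; this toy corpus is a stand-in
queries = [['buy', 'iphone', '7', 'online'],
           ['iphone', '7', 'review'],
           ['cheap', 'flights', 'berlin']]

model = Word2Vec(queries, size=300, window=5, min_count=1, workers=4)
vec = model.wv['iphone']                  # 300-d vector for one token
print(model.wv.most_similar('iphone'))    # nearby words in the space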

Pages 92-98: Deep Learning Applications (dadada2017)

Feeding Data to LSTMs

the white house

Sequence for LSTM

❖ United States
❖ President
❖ Politician
❖ Washington
❖ Lawyer
❖ Secretary
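A sketch of how such a query might become a fixed-length sequence of word vectors for an LSTM (the max_len, the left-padding scheme and the w2v lookup are assumptions, reusing a gensim-style model as above):

import numpy as np

def query_to_sequence(query, w2v, dim=300, max_len=10):
    # keep only in-vocabulary tokens, truncate to max_len
    vecs = [w2v[w] for w in query.lower().split() if w in w2v][:max_len]
    seq = np.zeros((max_len, dim), dtype='float32')
    if vecs:
        seq[-len(vecs):] = vecs   # left-pad with zeros
    return seq

x = query_to_sequence('the white house', model.wv)  # shape (10, 300)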

Page 99: Deep Learning Applications (dadada2017)

Performance of the Network

Page 100: Deep Learning Applications (dadada2017)

[Diagram] Query types (Navigational, Transactional, Informational) mapped to buying-cycle stages (Awareness, Decision, Evaluation, Retention)

Page 101: Deep Learning Applications (dadada2017)

Representing Queries as Images

Word2Vec representations of the top search result titles, rendered as images for example queries: "David Villa", "Apple juice", "Irish" (images omitted from the transcript)

Page 102: Deep Learning Applications (dadada2017)

I don’t see much difference!

Guild Wars or Apple Juice

Page 103: Deep Learning Applications (dadada2017)
Page 104: Deep Learning Applications (dadada2017)

Machine Learning Models
● Boosted trees
  ○ Word2vec embeddings
  ○ Titles from top results
  ○ Additional features of the SERP
  ○ TF-IDF
  ○ XGBoost!!!! (https://github.com/dmlc/xgboost)

Pages 105-107: Deep Learning Applications (dadada2017)

Machine Learning Models
● Convolutional Neural Networks:
  ○ Using images directly
  ○ Using random crops from the image

(Convolutional neural network architecture diagrams omitted from the transcript)

Pages 108-109: Deep Learning Applications (dadada2017)

Neural Networks with Keras (convolutional network diagrams omitted from the transcript)

https://github.com/fchollet/keras
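A minimal Keras-1-style convnet of the general kind shown (input shape, filter counts and layer sizes are illustrative assumptions, not the slides' actual model):

from keras.models import Sequential
from keras.layers import Convolution2D, MaxPooling2D, Flatten, Dense, Dropout

model = Sequential()
model.add(Convolution2D(32, 3, 3, activation='relu', input_shape=(3, 64, 64)))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(2, activation='softmax'))
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])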

Pages 110-111: Deep Learning Applications (dadada2017)

Approaching "any" ML problem

AutoCompete: A Framework for Machine Learning Competitions, A. Thakur and A. Krohn-Grimberghe, ICML AutoML Workshop, 2015

Pages 112-113: Deep Learning Applications (dadada2017)

Optimizing neural networks

AutoML Challenge: Rules for tuning Neural Networks, A. Thakur, ICML AutoML Workshop, System Desc Track, 2016

Pages 114-125: Deep Learning Applications (dadada2017)

Selecting NNet Architecture
● Always use SGD or Adam (for fast convergence)
● Start low:
  ○ Single layer with 120-500 neurons
  ○ Batch normalization + ReLU
  ○ Dropout: 10-20%
● Add new layer:
  ○ 1200-1500 neurons
  ○ High dropout: 40-50%
● Very big network:
  ○ 8000-10000 neurons in each layer
  ○ 60-80% dropout
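As a sketch, the "start low" recipe above might look like this in Keras (input dimension and class count are illustrative assumptions; the layer width is picked from the suggested 120-500 range):

from keras.models import Sequential
from keras.layers import Dense, Dropout, BatchNormalization, Activation

model = Sequential()
model.add(Dense(256, input_dim=100))   # single layer, 120-500 neurons
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Dropout(0.2))                # dropout in the 10-20% range
model.add(Dense(10, activation='softmax'))
model.compile(optimizer='adam', loss='categorical_crossentropy')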

Page 126: Deep Learning Applications (dadada2017)

The AutoML Challenge

Pages 127-129: Deep Learning Applications (dadada2017)

Some Results: AutoML Final1, Final4 and GPU Track results (leaderboard screenshots omitted from the transcript)

Page 130: Deep Learning Applications (dadada2017)

10 Things You Didn't Know About Clickbaits!

Page 131: Deep Learning Applications (dadada2017)

What are clickbaits?
● 10 things Apple didn't tell you about the new iPhone
● What happened next will surprise you
● This is what the actor/actress from the 90s looks like now
● What did Donald Trump just say about Obama and Clinton
● 9 things you must have to be a good data scientist


Page 133: Deep Learning Applications (dadada2017)

What are clickbaits?
● Interesting titles
● Frustrating titles
● Content that is seldom good enough
● Google penalizes clickbait content
● Facebook does the same

Page 134: Deep Learning Applications (dadada2017)

The data
● Crawl BuzzFeed, ClickHole
● Crawl The New York Times, CNN
● ~10000 titles
  ○ Clickbaits: BuzzFeed, ClickHole
  ○ Non-clickbaits: The New York Times, CNN
  ○ ~5000 from each category

Page 135: Deep Learning Applications (dadada2017)

Good old TF-IDF
● Very powerful
● Used both character and word analyzers
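A sketch of combining word and character analyzers with scikit-learn (the n-gram ranges and the titles variable are illustrative assumptions):

from scipy.sparse import hstack
from sklearn.feature_extraction.text import TfidfVectorizer

word_tfidf = TfidfVectorizer(analyzer='word', ngram_range=(1, 2))
char_tfidf = TfidfVectorizer(analyzer='char', ngram_range=(2, 5))

# titles: list of headline strings
X = hstack([word_tfidf.fit_transform(titles),
            char_tfidf.fit_transform(titles)])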

Pages 136-137: Deep Learning Applications (dadada2017)

Some interesting words (word-frequency charts omitted from the transcript)

Page 138: Deep Learning Applications (dadada2017)

Let’s build some models


Page 139: Deep Learning Applications (dadada2017)

Logistic Regression
● ROC AUC Score = 0.987319021551
● Precision Score = 0.950326797386
● Recall Score = 0.939276485788
● F1 Score = 0.944769330734
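A sketch of how such a baseline might be trained and scored, assuming X from the TF-IDF sketch above and binary labels y (the split and hyperparameters are illustrative):

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

xtr, xte, ytr, yte = train_test_split(X, y, test_size=0.2, random_state=42)
clf = LogisticRegression(C=1.0)
clf.fit(xtr, ytr)
print(roc_auc_score(yte, clf.predict_proba(xte)[:, 1]))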

Page 140: Deep Learning Applications (dadada2017)

XGBoost
● ROC AUC Score = 0.969700677962
● Precision Score = 0.95756718529
● Recall Score = 0.874677002584
● F1 Score = 0.914247130317

Page 141: Deep Learning Applications (dadada2017)

Is that it?
● No!
● Model predictions:
  ○ "Donald Trump": 15% clickbait
  ○ "Barack Obama": 80% clickbait
● Something was very wrong!
● TF-IDF didn't capture the meanings

Page 142: Deep Learning Applications (dadada2017)

Word2Vec
● Shallow neural networks
● Generates high-dimensional vectors for every word
● Every word gets a position in space
● Similar words cluster together

Page 143: Deep Learning Applications (dadada2017)

Word2Vec (embedding-space visualization omitted from the transcript)

Page 144: Deep Learning Applications (dadada2017)

XGBoost + W2V
● ROC AUC Score = 0.981312768055
● Precision Score = 0.939947780679
● Recall Score = 0.93023255814
● F1 Score = 0.935064935065

Page 145: Deep Learning Applications (dadada2017)

Performance
● Fast to train
● Good results


Page 147: Deep Learning Applications (dadada2017)

Does word2vec capture everything?

Do we have all we need only from titles?

What if the content of the website isn't clickbait-y?

Page 148: Deep Learning Applications (dadada2017)

The data
● Crawl BuzzFeed, NYT, CNN, ClickHole, etc.
● Too much work
● Simple models
● Doubts about results
● Crawl public Facebook pages instead:
  ○ Buzzfeed
  ○ CNN
  ○ The New York Times
  ○ Clickhole
  ○ StopClickBaitOfficial
  ○ Upworthy
  ○ Wikinews

Facebook page scraper: https://github.com/minimaxir/facebook-page-post-scraper

Page 149: Deep Learning Applications (dadada2017)

The data
● link_name (the title of the shared URL)
● status_type (whether it's a link, photo or a video)
● status_link (the actual URL)


Page 151: Deep Learning Applications (dadada2017)

Data Processing
● Get the HTML content too
● Clean the mess up!

Page 152: Deep Learning Applications (dadada2017)

Feature Generation
● Size of the HTML (in bytes)
● Length of HTML
● Total number of links
● Total number of buttons
● Total number of inputs
● Total number of unordered lists
● Total number of ordered lists
● Total number of lists (ordered + unordered)
● Total number of H1 tags
● Total number of H2 tags
● Full length of all text in all H1 tags that were found
● Full length of all text in all H2 tags that were found
● Total number of images
● Total number of HTML tags
● Number of unique HTML tags
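The extraction code is not in the transcript; a sketch with BeautifulSoup covering most of the counts above (the function and key names are mine):

from bs4 import BeautifulSoup

def html_features(html):
    soup = BeautifulSoup(html, 'html.parser')
    return {
        'html_bytes': len(html.encode('utf-8')),
        'num_links': len(soup.find_all('a')),
        'num_buttons': len(soup.find_all('button')),
        'num_inputs': len(soup.find_all('input')),
        'num_ul': len(soup.find_all('ul')),
        'num_ol': len(soup.find_all('ol')),
        'num_h1': len(soup.find_all('h1')),
        'num_h2': len(soup.find_all('h2')),
        'len_h1_text': sum(len(h.get_text()) for h in soup.find_all('h1')),
        'len_h2_text': sum(len(h.get_text()) for h in soup.find_all('h2')),
        'num_images': len(soup.find_all('img')),
        'num_tags': len(soup.find_all(True)),           # all HTML tags
        'num_unique_tags': len({t.name for t in soup.find_all(True)}),
    }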

Page 153: Deep Learning Applications (dadada2017)

More Features
● All H1 text
● All H2 text
● Meta description

Pages 154-158: Deep Learning Applications (dadada2017)

Feature Generation, and distribution plots for the number of lists, links, images and buttons (slide contents omitted from the transcript)

Page 159: Deep Learning Applications (dadada2017)

Customary word clouds: Clickbaits vs. Non-Clickbaits (word clouds omitted from the transcript)

Page 160: Deep Learning Applications (dadada2017)

Final Features


Page 161: Deep Learning Applications (dadada2017)

Deep Learning Models
● Simple LSTM
● Two dense layers
● Dropout + Batch Normalization
● Softmax Activation
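The model diagrams on the following slides are not in the transcript; a sketch of the architecture as described (vocabulary size, sequence length and layer widths are illustrative assumptions):

from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense, Dropout, BatchNormalization

model = Sequential()
model.add(Embedding(20000, 300, input_length=40))
model.add(LSTM(128))                       # simple LSTM
model.add(Dense(256, activation='relu'))   # dense layer 1
model.add(Dropout(0.3))
model.add(BatchNormalization())
model.add(Dense(256, activation='relu'))   # dense layer 2
model.add(Dropout(0.3))
model.add(BatchNormalization())
model.add(Dense(2, activation='softmax'))  # softmax output
model.compile(optimizer='adam', loss='categorical_crossentropy')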


Page 165: Deep Learning Applications (dadada2017)

Results (results table omitted from the transcript)

Page 166: Deep Learning Applications (dadada2017)

Detecting Duplicates in Quora Questions

Page 167: Deep Learning Applications (dadada2017)

The Problem
➢ ~13 million questions (as of March 2017)
➢ Many duplicate questions
➢ Cluster and join duplicates together
➢ Remove clutter
➢ First public data release: 24th January 2017

Page 168: Deep Learning Applications (dadada2017)

Duplicate Questions

➢ "How does Quora quickly mark questions as needing improvement?" / "Why does Quora mark my questions as needing improvement/clarification before I have time to give it details? Literally within seconds…"

➢ "What practical applications might evolve from the discovery of the Higgs Boson?" / "What are some practical benefits of discovery of the Higgs Boson?"

➢ "Why did Trump win the Presidency?" / "How did Donald Trump win the 2016 Presidential Election?"

Page 169: Deep Learning Applications (dadada2017)

Non-Duplicate Questions

➢ "Who should I address my cover letter to if I'm applying for a big company like Mozilla?" / "Which car is better from safety view? 'swift or grand i10'. My first priority is safety?"

➢ "Mr. Robot (TV series): Is Mr. Robot a good representation of real-life hacking and hacking culture? Is the depiction of hacker societies realistic?" / "What mistakes are made when depicting hacking in 'Mr. Robot' compared to real-life cybersecurity breaches or just a regular use of technologies?"

➢ "How can I start an online shopping (e-commerce) website?" / "Which web technology is best suitable for building a big e-commerce website?"

Page 170: Deep Learning Applications (dadada2017)

The Data
➢ 400,000+ pairs of questions
➢ Initially the data was very skewed
➢ Negative samples drawn from related questions
➢ Not the real distribution on Quora's website
➢ Noise exists (as usual)

https://data.quora.com/First-Quora-Dataset-Release-Question-Pairs

Page 171: Deep Learning Applications (dadada2017)

The Data
➢ 255045 negative samples (non-duplicates)
➢ 149306 positive samples (duplicates)
➢ ~37% positive samples

Page 172: Deep Learning Applications (dadada2017)

The Data
➢ Average number of characters in question1: 59.57
➢ Minimum number of characters in question1: 1
➢ Maximum number of characters in question1: 623
➢ Average number of characters in question2: 60.14
➢ Minimum number of characters in question2: 1
➢ Maximum number of characters in question2: 1169

Page 173: Deep Learning Applications (dadada2017)

Basic Feature Engineering
➢ Length of question1
➢ Length of question2
➢ Difference in the two lengths
➢ Character length of question1 without spaces
➢ Character length of question2 without spaces
➢ Number of words in question1
➢ Number of words in question2
➢ Number of common words in question1 and question2

Page 174: Deep Learning Applications (dadada2017)

Basic Feature Engineering

➢ Basic feature set: fs-1

import pandas as pd

# data is a DataFrame with 'question1' and 'question2' columns
data['len_q1'] = data.question1.apply(lambda x: len(str(x)))
data['len_q2'] = data.question2.apply(lambda x: len(str(x)))
data['diff_len'] = data.len_q1 - data.len_q2
# note: set() means these count *unique* characters, spaces removed
data['len_char_q1'] = data.question1.apply(lambda x: len(''.join(set(str(x).replace(' ', '')))))
data['len_char_q2'] = data.question2.apply(lambda x: len(''.join(set(str(x).replace(' ', '')))))
data['len_word_q1'] = data.question1.apply(lambda x: len(str(x).split()))
data['len_word_q2'] = data.question2.apply(lambda x: len(str(x).split()))
data['common_words'] = data.apply(lambda x: len(set(str(x['question1']).lower().split()).intersection(set(str(x['question2']).lower().split()))), axis=1)

Page 175: Deep Learning Applications (dadada2017)

Fuzzy Features
➢ pip install fuzzywuzzy
➢ Uses Levenshtein distance
➢ QRatio
➢ WRatio
➢ Token set ratio
➢ Token sort ratio
➢ Partial token set ratio
➢ Partial token sort ratio
➢ etc. etc. etc.

https://github.com/seatgeek/fuzzywuzzy

Page 176: Deep Learning Applications (dadada2017)

Fuzzy Features

➢ Fuzzy feature set: fs-2

from fuzzywuzzy import fuzz

data['fuzz_qratio'] = data.apply(lambda x: fuzz.QRatio(str(x['question1']), str(x['question2'])), axis=1)
data['fuzz_WRatio'] = data.apply(lambda x: fuzz.WRatio(str(x['question1']), str(x['question2'])), axis=1)
data['fuzz_partial_ratio'] = data.apply(lambda x: fuzz.partial_ratio(str(x['question1']), str(x['question2'])), axis=1)
data['fuzz_partial_token_set_ratio'] = data.apply(lambda x: fuzz.partial_token_set_ratio(str(x['question1']), str(x['question2'])), axis=1)
data['fuzz_partial_token_sort_ratio'] = data.apply(lambda x: fuzz.partial_token_sort_ratio(str(x['question1']), str(x['question2'])), axis=1)
data['fuzz_token_set_ratio'] = data.apply(lambda x: fuzz.token_set_ratio(str(x['question1']), str(x['question2'])), axis=1)
data['fuzz_token_sort_ratio'] = data.apply(lambda x: fuzz.token_sort_ratio(str(x['question1']), str(x['question2'])), axis=1)

Page 177: Deep Learning Applications (dadada2017)

TF-IDF
➢ TF(t) = (number of times term t appears in a document) / (total number of terms in the document)
➢ IDF(t) = log(total number of documents / number of documents containing term t)
➢ TF-IDF(t) = TF(t) * IDF(t)

from sklearn.feature_extraction.text import TfidfVectorizer

tfidf = TfidfVectorizer(min_df=3, max_features=None,
                        strip_accents='unicode', analyzer='word',
                        token_pattern=r'\w{1,}', ngram_range=(1, 2),
                        use_idf=1, smooth_idf=1, sublinear_tf=1,
                        stop_words='english')

Page 178: Deep Learning Applications (dadada2017)

SVD
➢ Latent semantic analysis
➢ scikit-learn version of SVD
➢ 120 components

from sklearn import decomposition

svd = decomposition.TruncatedSVD(n_components=120)
xtrain_svd = svd.fit_transform(xtrain)
xtest_svd = svd.transform(xtest)

Page 179: Deep Learning Applications (dadada2017)

Fuzzy Features
➢ Also known as approximate string matching
➢ Number of "primitive" operations required to convert a string to an exact match
➢ Primitive operations:
  ○ Insertion
  ○ Deletion
  ○ Substitution
➢ Typically used for:
  ○ Spell checking
  ○ Plagiarism detection
  ○ DNA sequence matching
  ○ Spam filtering
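For intuition, the Levenshtein distance underlying these ratios can be computed with a small dynamic program (this implementation is mine, not fuzzywuzzy's):

def levenshtein(a, b):
    # edit distance: insertions, deletions and substitutions cost 1 each
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

levenshtein('kitten', 'sitting')  # -> 3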

Pages 180-184: Deep Learning Applications (dadada2017)

A Combination of TF-IDF & SVD
➢ TF-IDF features: fs3-1, fs3-2
➢ TF-IDF + SVD features: fs3-3, fs3-4, fs3-5
(The code for each variant was shown on the slides but is not captured in the transcript.)
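One plausible shape for such a combination, reusing the tfidf and svd objects defined above (the exact pipelines behind fs3-1 to fs3-5 are not captured, so this is an assumption):

# Fit TF-IDF on both question columns, then reduce with truncated SVD
corpus = list(data.question1.values.astype(str)) + \
         list(data.question2.values.astype(str))
tfidf.fit(corpus)
q1_tfidf = tfidf.transform(data.question1.values.astype(str))
q2_tfidf = tfidf.transform(data.question2.values.astype(str))
q1_svd = svd.fit_transform(q1_tfidf)
q2_svd = svd.transform(q2_tfidf)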

Page 185: Deep Learning Applications (dadada2017)

Word2Vec Features
➢ Multi-dimensional vector for all the words in any dictionary
➢ Always great insights
➢ Very popular in natural language processing tasks
➢ Google News vectors, 300d

Page 186: Deep Learning Applications (dadada2017)

Word2Vec Features
➢ Representing words
➢ Representing sentences

import numpy as np
from nltk import word_tokenize
from nltk.corpus import stopwords

stop_words = set(stopwords.words('english'))
# model: word2vec KeyedVectors (e.g. the Google News 300d vectors)

def sent2vec(s):
    words = str(s).lower().decode('utf-8')   # Python 2-era normalization
    words = word_tokenize(words)
    words = [w for w in words if not w in stop_words]
    words = [w for w in words if w.isalpha()]
    M = []
    for w in words:
        M.append(model[w])
    M = np.array(M)
    v = M.sum(axis=0)
    return v / np.sqrt((v ** 2).sum())        # L2-normalized sum

Page 187: Deep Learning Applications (dadada2017)

W2V Features: WMD (Word Mover's Distance)

Kusner, M., Sun, Y., Kolkin, N. & Weinberger, K. (2015). From Word Embeddings to Document Distances.

Page 188: Deep Learning Applications (dadada2017)

W2V Features: Skew
➢ Skew = 0 for a normal distribution
➢ Skew > 0: more weight in the right tail

Page 189: Deep Learning Applications (dadada2017)

W2V Features: Kurtosis
➢ 4th central moment over the square of the variance
➢ Types:
  ○ Pearson
  ○ Fisher: subtract 3.0 from the result so that it is 0 for a normal distribution

Page 190: Deep Learning Applications (dadada2017)

W2V Features
➢ Word2Vec feature set: fs-4
➢ From scipy.spatial.distance: minkowski, jaccard, manhattan, braycurtis, euclidean, cosine, canberra
➢ From scipy.stats: skew, kurtosis
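A sketch of computing fs-4 from two sent2vec vectors v1 and v2 (built with the sent2vec function above; the Minkowski order of 3 is an assumption):

from scipy.spatial.distance import (cosine, cityblock, jaccard, canberra,
                                    euclidean, minkowski, braycurtis)
from scipy.stats import skew, kurtosis

features = [cosine(v1, v2), cityblock(v1, v2), jaccard(v1, v2),
            canberra(v1, v2), euclidean(v1, v2),
            minkowski(v1, v2, 3), braycurtis(v1, v2)]
features += [skew(v1), skew(v2), kurtosis(v1), kurtosis(v2)]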

Page 191: Deep Learning Applications (dadada2017)

Raw Word2Vec Vectors
➢ Raw W2V feature set: fs-5

https://www.kaggle.com/jeffd23/visualizing-word-vectors-with-t-sne

Pages 192-193: Deep Learning Applications (dadada2017)

Feature snapshot (feature tables omitted from the transcript)

Page 194: Deep Learning Applications (dadada2017)

Machine Learning Models

Page 195: Deep Learning Applications (dadada2017)

Machine Learning Models
➢ Logistic regression
➢ XGBoost
➢ 5-fold cross-validation
➢ Accuracy as a comparison metric (also precision + recall)
➢ Why accuracy?

Page 196: Deep Learning Applications (dadada2017)

Results

Page 197: Deep Learning Applications (dadada2017)

Deep Learning

Page 198: Deep Learning Applications (dadada2017)

LSTM
➢ Long short-term memory
➢ A type of RNN
➢ Learns long-term dependencies
➢ Used two LSTM layers

Page 199: Deep Learning Applications (dadada2017)

1D CNN
➢ One-dimensional convolutional layer
➢ Temporal convolution
➢ Simple to implement:

def conv1d(x, h):
    # y[i] = sum over j of x[i-j] * h[j]  (naive temporal convolution)
    y = [0.0] * len(x)
    for i in range(len(x)):
        for j in range(min(i + 1, len(h))):
            y[i] += x[i - j] * h[j]
    return y

Page 200: Deep Learning Applications (dadada2017)

Embedding Layers
➢ Simple layer
➢ Converts indexes to vectors
➢ [[4], [20]] -> [[0.25, 0.1], [0.6, -0.2]]

Page 201: Deep Learning Applications (dadada2017)

Time Distributed Dense Layer
➢ TimeDistributed wrapper around a dense layer
➢ TimeDistributed applies the layer to every temporal slice of the input
➢ Followed by a Lambda layer
➢ Implements the "translation" layer used by Stephen Merity (keras snli model)

from keras.models import Sequential
from keras.layers import Embedding, TimeDistributed, Dense, Lambda
from keras import backend as K

model1 = Sequential()
model1.add(Embedding(len(word_index) + 1, 300,
                     weights=[embedding_matrix],
                     input_length=40,
                     trainable=False))
model1.add(TimeDistributed(Dense(300, activation='relu')))
# sum over the time axis: one 300-d vector per question
model1.add(Lambda(lambda x: K.sum(x, axis=1), output_shape=(300,)))

Page 202: Deep Learning Applications (dadada2017)

GloVe Embeddings
➢ Count-based model
➢ Dimensionality reduction on the co-occurrence counts matrix
➢ word-context matrix -> word-feature matrix
➢ Common Crawl: 840B tokens, 2.2M vocab, 300d vectors

Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global Vectors for Word Representation.

Page 203: Deep Learning Applications (dadada2017)

Basis of Deep Learning Model
➢ Keras-snli model: https://github.com/Smerity/keras_snli

Page 204: Deep Learning Applications (dadada2017)

Before Training DeepNets
➢ Tokenize data
➢ Convert text data to sequences

from keras.preprocessing import text, sequence

tk = text.Tokenizer(nb_words=200000)
max_len = 40

tk.fit_on_texts(list(data.question1.values) + list(data.question2.values.astype(str)))
x1 = tk.texts_to_sequences(data.question1.values)
x1 = sequence.pad_sequences(x1, maxlen=max_len)
x2 = tk.texts_to_sequences(data.question2.values.astype(str))
x2 = sequence.pad_sequences(x2, maxlen=max_len)
word_index = tk.word_index

Page 205: Deep Learning Applications (dadada2017)

Before Training DeepNets
➢ Initialize GloVe embeddings

import numpy as np
from tqdm import tqdm

embeddings_index = {}
f = open('data/glove.840B.300d.txt')
for line in tqdm(f):
    values = line.split()
    word = values[0]
    coefs = np.asarray(values[1:], dtype='float32')
    embeddings_index[word] = coefs
f.close()

Page 206: Deep Learning Applications (dadada2017)

Before Training DeepNets
➢ Create the embedding matrix

embedding_matrix = np.zeros((len(word_index) + 1, 300))
for word, i in tqdm(word_index.items()):
    embedding_vector = embeddings_index.get(word)
    if embedding_vector is not None:
        embedding_matrix[i] = embedding_vector

Pages 207-209: Deep Learning Applications (dadada2017)

Final Deep Learning Model (architecture diagram omitted from the transcript)

Page 210: Deep Learning Applications (dadada2017)

Model 1 and Model 2

model1 = Sequential()
model1.add(Embedding(len(word_index) + 1, 300,
                     weights=[embedding_matrix],
                     input_length=40,
                     trainable=False))
model1.add(TimeDistributed(Dense(300, activation='relu')))
model1.add(Lambda(lambda x: K.sum(x, axis=1), output_shape=(300,)))

# model2: an identical second tower for question2
model2 = Sequential()
model2.add(Embedding(len(word_index) + 1, 300,
                     weights=[embedding_matrix],
                     input_length=40,
                     trainable=False))
model2.add(TimeDistributed(Dense(300, activation='relu')))
model2.add(Lambda(lambda x: K.sum(x, axis=1), output_shape=(300,)))

Page 213: Deep Learning Applications (dadada2017)

Model 3 and Model 4

model3 = Sequential()
model3.add(Embedding(len(word_index) + 1, 300,
                     weights=[embedding_matrix],
                     input_length=40,
                     trainable=False))
model3.add(Convolution1D(nb_filter=nb_filter,
                         filter_length=filter_length,
                         border_mode='valid',
                         activation='relu',
                         subsample_length=1))
model3.add(Dropout(0.2))
# ... (remaining convolution blocks elided on the slide)
model3.add(Dense(300))
model3.add(Dropout(0.2))
model3.add(BatchNormalization())


Page 215: Deep Learning Applications (dadada2017)

Model 5 and Model 6

model5 = Sequential()
model5.add(Embedding(len(word_index) + 1, 300, input_length=40, dropout=0.2))
model5.add(LSTM(300, dropout_W=0.2, dropout_U=0.2))

model6 = Sequential()
model6.add(Embedding(len(word_index) + 1, 300, input_length=40, dropout=0.2))
model6.add(LSTM(300, dropout_W=0.2, dropout_U=0.2))

Pages 216-217: Deep Learning Applications (dadada2017)

Merged Model (diagram omitted from the transcript)
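The merged architecture diagram is not in the transcript; in the Keras 1 era the six towers would typically be joined with a Merge layer roughly like this (head widths and the final sigmoid are assumptions consistent with a binary duplicate/non-duplicate output):

from keras.models import Sequential
from keras.layers import Merge, Dense, Dropout, BatchNormalization

merged_model = Sequential()
merged_model.add(Merge([model1, model2, model3, model4, model5, model6],
                       mode='concat'))
merged_model.add(BatchNormalization())
merged_model.add(Dense(300, activation='relu'))
merged_model.add(Dropout(0.2))
merged_model.add(Dense(1, activation='sigmoid'))
merged_model.compile(loss='binary_crossentropy', optimizer='adam',
                     metrics=['accuracy'])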

Page 218: Deep Learning Applications (dadada2017)

Time to Train the DeepNet
➢ Total params: 174,913,917
➢ Trainable params: 60,172,917
➢ Non-trainable params: 114,741,000
➢ NVIDIA Titan X

Page 219: Deep Learning Applications (dadada2017)
Page 220: Deep Learning Applications (dadada2017)

Combined Results

The deep network was trained on an NVIDIA Titan X; each epoch took approximately 300 seconds and the full training took 10-15 hours. The network achieved an accuracy of 0.848 (~0.85).

Page 221: Deep Learning Applications (dadada2017)

Improving Further
➢ Cleaning the text data, e.g. correcting misspellings
➢ POS tagging
➢ Entity recognition
➢ Combining the deepnet with traditional ML models

Page 222: Deep Learning Applications (dadada2017)

Conclusion & References
➢ The deepnet gives a near state-of-the-art result
➢ BiMPM model accuracy: 88%

Some references:

➢ Zhiguo Wang, Wael Hamza and Radu Florian. "Bilateral Multi-Perspective Matching for Natural Language Sentences" (BiMPM).

➢ Matthew Honnibal. "Deep text-pair classification with Quora's 2017 question dataset," 13 February 2017. Retrieved from https://explosion.ai/blog/quora-deep-text-pair-classification

➢ Bradley Pallen's work: https://github.com/bradleypallen/keras-quora-question-pairs

Page 223: Deep Learning Applications (dadada2017)
Page 224: Deep Learning Applications (dadada2017)

Natural Language Processing (chatbot architecture diagram labels):
● Pre-trained domain knowledge
● Classification of intent
● Identification of entities (extracting information)
● API
● Analytics
● Delegation to customer support
● Delegation to back-end robots
● Instant processing and end-to-end automation
● Monitoring and AI training
● Chat avatar
● Text (speech)

Page 225: Deep Learning Applications (dadada2017)

Conversation without API (flow): Enquiry → Pre-processing of the enquiry (stemming, misspellings, cross-language algorithm) → Intent classification (1. Insurance, 2. Vehicle, 3. Car, 4. Rules for practice driving) → Pre-defined reply

User: "Hey you, do you knoww if my car insruacne covers practice driving??"

Bot: "You don't need to adjust your car insurance when practise driving with a learner's permit. In case of damage it's the supervisor with a full driver's license that shall write and sign the insurance claim."

Page 226: Deep Learning Applications (dadada2017)

Conversation with API: redirect to API (Weather)

User: "Hi James, what's the weather in Berlin on Thursday?"
(Required value: Location; optional value: Date)

Bot: "Thursday's forecast for Berlin is partly sunny and mostly clouds."

Page 228: Deep Learning Applications (dadada2017)

Thank you! Questions / Comments?

All the code:
❖ github.com/abhishekkrthakur

Get in touch:
➢ E-mail: [email protected]
➢ LinkedIn: bit.ly/thakurabhishek
➢ Kaggle: kaggle.com/abhishek
➢ Twitter: @abhi1thakur

If everything fails, use XGBoost