Machine Learning in NLP
TRANSCRIPT
Meandering through the machine learning maze
NLP & Machine Learning
Vijay Ganti
** Full credit for the content of this deck goes to various sources, including Stanford NLP & Deep Learning materials, the Machine Learning lectures by Andrew Ng, Natural Language Processing with Python, and many more. This is purely an educational presentation, with no monetary gain for any party.
NLP powered by ML will change the way business gets done
❖ Conversational agents are becoming an important form of human-computer communication (Customer support interactions using chat-bots)
❖ Much of human-human communication is now mediated by computers (Email, Social Media, Messaging)
❖ An enormous amount of knowledge is now available in machine readable form as natural language text (web, proprietary enterprise content)
What do I hope to achieve with this talk?
My 3 cents
Make it accessible & real
Get you excited
Give you a sense of what it takes
Let’s start with a typical ML workflow
Training: Input + Labels → Feature Extractor → Features + Labels → ML Algo → Classifier Model
Prediction: Input only → Feature Extractor → Features → Classifier Model → Prediction
Training Data (Input + Labels)

import nltk
from nltk.corpus import names

names_data = ([(n, 'male') for n in names.words('male.txt')] +
              [(n, 'female') for n in names.words('female.txt')])
Feature Extractor

def gender_feature1(name):
    return {'last_letter': name[-1]}

def gender_feature2(name):
    return {'last_letter': name[-1],
            'last_two_letters': name[-2:]}
Features

feature_set = [(gender_feature1(n), g) for n, g in names_data]
feature_set = [(gender_feature2(n), g) for n, g in names_data]
Partition Data

train, devtest, test = feature_set[1500:], feature_set[500:1500], feature_set[:500]
ML Algo → Classifier Model

classifier = nltk.NaiveBayesClassifier.train(train)
Prediction_Accuracy = nltk.classify.accuracy(classifier, devtest)
Prediction

prediction = classifier.classify(gender_feature2('Kathryn'))
prediction = classifier.classify(gender_feature2('Vijay'))
prediction = classifier.classify(gender_feature2('Jenna'))
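Putting the slide's pipeline together: below is a minimal, dependency-free sketch of the same idea. TinyNaiveBayes is a hypothetical stand-in for nltk.NaiveBayesClassifier, and the six-name training set is invented for illustration (the slides train on NLTK's names corpus).

```python
import math
from collections import Counter, defaultdict

def gender_feature(name):
    # Same shape of feature dict as the slide's gender_feature1.
    return {'last_letter': name[-1].lower()}

class TinyNaiveBayes:
    """Laplace-smoothed Naive Bayes over dict-shaped feature sets."""

    def train(self, labeled_featuresets):
        self.label_counts = Counter(label for _, label in labeled_featuresets)
        self.value_counts = defaultdict(Counter)  # (feature, label) -> value counts
        self.seen_values = defaultdict(set)       # feature -> observed values
        for feats, label in labeled_featuresets:
            for f, v in feats.items():
                self.value_counts[(f, label)][v] += 1
                self.seen_values[f].add(v)
        self.total = sum(self.label_counts.values())
        return self

    def classify(self, feats):
        best_label, best_logp = None, float('-inf')
        for label, count in self.label_counts.items():
            logp = math.log(count / self.total)        # log prior
            for f, v in feats.items():                 # add-1 smoothed likelihoods
                c = self.value_counts[(f, label)][v]
                logp += math.log((c + 1) / (count + len(self.seen_values[f]) + 1))
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label

# Invented toy data; the real deck uses ~8000 names from nltk.corpus.names.
names_data = [('Anna', 'female'), ('Maria', 'female'), ('Julia', 'female'),
              ('John', 'male'), ('Kevin', 'male'), ('Martin', 'male')]
clf = TinyNaiveBayes().train([(gender_feature(n), g) for n, g in names_data])
print(clf.classify(gender_feature('Lena')))    # an 'a'-ending name
print(clf.classify(gender_feature('Steven')))  # an 'n'-ending name
```

With this tiny dataset the last-letter feature alone separates the two classes, which is exactly why the slides start from such a simple feature extractor.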
Why is NLP hard?
Violinist Linked to JAL Crash Blossoms
Teacher Strikes Idle Kids
Red Tape Holds Up New Bridges
Hospitals Are Sued by 7 Foot Doctors
Juvenile Court to Try Shooting Defendant
Local High School Dropouts Cut in Half
Ambiguity
Tricky NEs
Informal English
Why is NLP hard..er?
Segmentation issues
the New York-New Haven Railroad
the New York-New Haven Railroad
Idioms
Kodak moment
As cool as a cucumber
Hold your horses
Kick the bucket
Neologisms
Gamify
Double-click
Bromance
Unfriend
Great job @justinbieber! Were SOO PROUD of what youve accomplished! U taught us 2 #neversaynever & you yourself should never give up either♥
Where is A Bug’s Life playing …
Let It Be was recorded …
… a mutation on the for gene …
State of the language technology
Let’s go to Agra
Buy V1AGRA …
Spam Detection
big/Adj  dog/Noun  quickly/Adv  China/Noun  trump/Verb
Part of Speech Tagging (POS)
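The tagging shown above can be sketched as a toy lookup tagger; the tag dictionary below is made up for the slide's example, and real taggers (e.g. NLTK's trained taggers) learn from corpora and handle ambiguity like "trump" as noun vs verb.

```python
# Hypothetical tag dictionary covering only the slide's example words.
TAGS = {'big': 'Adj', 'dog': 'Noun', 'quickly': 'Adv',
        'china': 'Noun', 'trump': 'Verb'}

def tag(tokens):
    # Look each token up; default unknown words to 'Noun' (a common baseline).
    return [(t, TAGS.get(t.lower(), 'Noun')) for t in tokens]

print(tag(['big', 'dog', 'quickly', 'China', 'trump']))
```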
Satya/Person met with LinkedIn/Company in Palo Alto/Place
Named Entity Recognition (NER)
Sentiment Analysis
Vijay told Arnaud that he was wrong
Coreference Resolution
I need new batteries for my mouse
Word sense disambiguation
Information Extraction
Mostly solved: spam detection, POS tagging, NER. Good progress: sentiment analysis, coreference resolution, word sense disambiguation, information extraction. Still very hard: question answering, paraphrasing, summarization, chat/dialog.
Question Answering (QA)
What kind of cable do I need to project from my Mac?
Paraphrasing
Dell acquired EMC last year
EMC was taken over by Dell 8 months back.
Summarization
Brexit vote shocked everyone
Financial markets are in turmoil
Young people are not happy
Brexit has been disruptive
Chat/Dialog
What movie is playing this evening?
Casablanca. Do you want to buy tickets?
Why is this a good time to solve hard problems in NLP?
❖ What has changed since 2006?
❖ New methods for unsupervised pre-training have been developed (Restricted Boltzmann Machines, auto-encoders, contrastive estimation, etc.)
❖ More efficient parameter estimation methods
❖ Better understanding of model regularization
❖ Changes in computing technology favor deep learning
❖ In NLP, speed has traditionally come from exploiting sparsity
❖ But with modern machines, branches and widely spaced memory accesses are costly
❖ Uniform parallel operations on dense vectors are faster. These trends are even stronger with multi-core CPUs and GPUs
❖ Wide availability of ML implementations and computing infrastructure to run them
Come back to this slide
What you need to learn to make machines learn?
❖ Machine Learning Process & Concepts
❖ Feature engineering
❖ Bias / Variance (overfitting, underfitting)
❖ Performance testing (accuracy, precision, recall, f-score)
❖ Regularization
❖ Parameter selection
❖ Algo selection (wait for the next slide for a quick summary of algos used at Netflix)
❖ Probability
❖ Linear Algebra (vector & matrices)
❖ Some calculus
❖ Python
❖ Octave/Matlab
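The performance-testing metrics listed above (accuracy, precision, recall, F-score) can be computed by hand; the two label lists below are made up to illustrate the arithmetic.

```python
# Invented gold labels and predictions for a binary task.
y_true = ['male', 'male', 'female', 'female', 'female', 'male']
y_pred = ['male', 'female', 'female', 'female', 'male', 'male']

def scores(y_true, y_pred, positive='female'):
    # Count true positives, false positives, false negatives for the
    # chosen positive class, then derive the four metrics.
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

print(scores(y_true, y_pred))
```

Accuracy alone can be misleading on skewed classes, which is why precision, recall, and F-score appear on the checklist alongside it.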
Why is this a good time to solve hard problems in NLP?
Now this old slide will make sense
ML Sequence model approach to NER
❖ Training
1. Collect a set of representative training documents
2. Label each token for its entity class or other (O)
3. Design feature extractors appropriate to the text and classes
4. Train a sequence classifier to predict the labels from the data
❖ Testing
1. Receive a set of testing documents
2. Run sequence model inference to label each token
3. Appropriately output the recognized entities
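Steps 1-3 above can be sketched concretely: token-level labels ('O' for other) plus a per-token feature extractor. The sentence is the slide's NER example; the label names and feature choices are illustrative assumptions.

```python
# The slide's example sentence, labeled token by token ('O' = other).
sentence = ['Satya', 'met', 'with', 'LinkedIn', 'in', 'Palo', 'Alto']
labels   = ['PER',   'O',   'O',    'ORG',      'O',  'LOC',  'LOC']

def token_features(tokens, i):
    # Hypothetical features: the word itself, its capitalization,
    # and a one-token window of context.
    word = tokens[i]
    return {
        'word': word.lower(),
        'is_capitalized': word[0].isupper(),
        'prev_word': tokens[i - 1].lower() if i > 0 else '<START>',
        'next_word': tokens[i + 1].lower() if i < len(tokens) - 1 else '<END>',
    }

training_data = [(token_features(sentence, i), labels[i])
                 for i in range(len(sentence))]
# Step 4 would feed training_data to a sequence classifier
# (e.g. an NLTK classifier-based tagger or a CRF).
print(training_data[0])
```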
Old skills are still very relevant - the role of regular expressions
Find me all instances of the word "the" in a text.
the — misses capitalized examples (The)
[tT]he — incorrectly returns other or theology
[^a-zA-Z][tT]he[^a-zA-Z] — avoids both problems (though it still misses "the" at the very start or end of the text)
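The successive attempts at matching "the" can be run directly with Python's re module; the sample sentence below is made up for illustration.

```python
import re

text = "The theology student said the answer was there on the other page."

# Attempt 1: misses "The" and matches inside theology/there/other.
print(re.findall(r'the', text))
# Attempt 2: catches "The" but still matches inside words.
print(re.findall(r'[tT]he', text))
# Attempt 3: the slide's pattern - requires a non-letter on each side.
print(re.findall(r'[^a-zA-Z]([tT]he)[^a-zA-Z]', text))
```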
❖ Regular expressions play a surprisingly large role
❖ Sophisticated sequences of regular expressions are often the first model for any text processing
❖ For many hard tasks, we use machine learning classifiers
❖ But regular expressions are used as features in the classifiers
❖ Can be very useful in capturing generalizations
Example - a regular expression for matching international phone numbers (with optional extension):
/^(?:(?:\(?(?:00|\+)([1-4]\d\d|[1-9]\d?)\)?)?[\-\.\ \\\/]?)?((?:\(?\d{1,}\)?[\-\.\ \\\/]?){0,})(?:[\-\.\ \\\/]?(?:#|ext\.?|extension|x)[\-\.\ \\\/]?(\d+))?$/i
Algo Selection example - Logistic Regression vs SVM (n = number of features, m = number of training examples)
❖ If n is large (relative to m), use logistic regression, or an SVM without a kernel (the "linear kernel")
❖ If n is small and m is intermediate, use an SVM with a Gaussian kernel
❖ If n is small and m is large, manually create/add more features, then use logistic regression or an SVM without a kernel
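The rule of thumb above can be encoded as a small decision function; the numeric thresholds below are illustrative assumptions, not values from the slide.

```python
def choose_model(n_features, m_examples):
    # "n large relative to m" or very high-dimensional: linear models suffice.
    if n_features >= m_examples or n_features > 10_000:
        return 'logistic regression or linear-kernel SVM'
    # n small, m intermediate: a Gaussian kernel can capture nonlinearity.
    if m_examples <= 50_000:
        return 'SVM with Gaussian (RBF) kernel'
    # n small, m large: kernel SVMs get slow; engineer features instead.
    return 'add features, then logistic regression or linear-kernel SVM'

print(choose_model(50_000, 1_000))   # many features, few examples
print(choose_model(1_000, 10_000))   # few features, intermediate examples
print(choose_model(100, 1_000_000))  # few features, many examples
```

The underlying trade-off is that kernel SVMs scale poorly with m, while linear models scale well but rely on the feature set being rich enough.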