Text Classification
Natural Language Processing: Lecture 9
02.11.2017
Kairit Sirts
Text/document classification
• Spam detection
• Topic classification (sport/finance/travel, etc.)
• Genre classification (news/sports/fiction/social media, etc.)
• Sentiment analysis
Authorship attribution
• Native language identification
• Clinical text classification – diagnosing psychiatric or cognitive impairments
• Identification of gender, dialect, educational background
Types of classification tasks
• Binary classification (true/false, 1/0, 1/-1)
  • Naïve Bayes
  • Logistic regression
• Multi-class classification (politics/finance/travel)
• Multi-label classification (image captioning)
• Clustering: mostly unsupervised
  • Topic modeling – important, but I will not talk about it today
Topics
• Generative vs discriminative classifiers
• Document representation
• Naïve Bayes classification
• Logistic regression
• Neural text classification
• Evaluation
• Slides mostly based on:
  • Text classification with Naïve Bayes by Sharon Goldwater
  • Text classification by Karl Moritz Hermann
Generative task formulation
• Given document d and a set of class labels C, assign to d the most probable label ĉ.

• By Bayes rule:

  ĉ = argmax_{c ∈ C} P(c|d) = argmax_{c ∈ C} P(d|c)·P(c)

  where P(d|c) is the likelihood, P(c) is the prior probability, and P(c|d) is the posterior probability.

• This is the formulation used by Naïve Bayes.
Discriminative task formulation
• Given document d and a set of class labels C, assign to d the most probable label ĉ by modelling the conditional probability directly:

  ĉ = argmax_{c ∈ C} P(c|d)

• This is the formulation used by logistic regression.
Generative vs discriminative models
• Generative (joint) models – P(c, d)
  • Model the probability of both the input document and the output label
  • Can be used to generate a new document with a particular label
  • N-gram models, HMMs, PCFGs, Naïve Bayes
• Discriminative (conditional) models – P(c|d)
  • Learn boundaries between classes; the input data is taken as given
  • Logistic regression, maximum entropy models, CRFs
How to represent document d?
• Bag of Words (BOW)
  • Easy, no effort required
  • Ignores sentence structure
• Hand-crafted features
  • Can use the NLP pipeline and class-specific features
  • Incomplete, and depends on the quality of the NLP pipeline
• Learned feature representation
  • Can learn to contain all relevant information
  • Needs to be learned
BOW representations
• Binary BOW:

  the  your  model  cash  Viagra  class  account  orderz
   1     1      0     1       1      0        1       1

• Multinomial BOW:

  the  your  model  cash  Viagra  class  account  orderz
   14    2      0     1       3      0        1       1

• Tf-idf
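A minimal sketch of the two count-based representations over the vocabulary above (the example document is reconstructed from the doc4 counts, purely for illustration):

```python
from collections import Counter

# Fixed vocabulary from the spam example above
vocab = ["the", "your", "model", "cash", "Viagra", "class", "account", "orderz"]

def multinomial_bow(tokens, vocab):
    """Raw term counts over a fixed vocabulary."""
    counts = Counter(tokens)
    return [counts[w] for w in vocab]

def binary_bow(tokens, vocab):
    """1 if the term occurs at least once, else 0."""
    counts = Counter(tokens)
    return [1 if counts[w] > 0 else 0 for w in vocab]

doc = ["the"] * 14 + ["your"] * 2 + ["cash", "account", "orderz"] + ["Viagra"] * 3
print(multinomial_bow(doc, vocab))  # [14, 2, 0, 1, 3, 0, 1, 1]
print(binary_bow(doc, vocab))       # [1, 1, 0, 1, 1, 0, 1, 1]
```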
Tf-idf
• Tf-idf = tf (term frequency) × idf (inverse document frequency)
• Term frequency – several variants:
  • Raw frequency
  • Binary
  • Raw frequency normalised by the document length
• Inverse document frequency:

  idf(t) = log(N / n_t)

  • N – the number of documents
  • n_t – the number of documents containing term t
Tf-idf example
• Tf-idf(doc4, the) =
• Tf-idf(doc4, your) =
• Tf-idf(doc4, model) =
• Tf-idf(doc4, cash) =
• Tf-idf(doc4, Viagra) =
• Tf-idf(doc4, class) =
• Tf-idf(doc4, account) =
• Tf-idf(doc4, orderz) =
       the  your  model  cash  Viagra  class  account  orderz
doc1    12    3      1     0       0      2        0       0
doc2    10    4      0     4       0      0        2       0
doc3    25    4      0     0       0      1        1       0
doc4    14    2      0     1       3      0        1       1
doc5    17    5      0     2       0      0        1       1
Tf-idf example
• Tf-idf(doc4, the) = 14 · log(5/5) = 0
• Tf-idf(doc4, your) = 2 · log(5/5) = 0
• Tf-idf(doc4, model) = 0 · log(5/1) = 0
• Tf-idf(doc4, cash) = 1 · log(5/3) = 0.51
• Tf-idf(doc4, Viagra) = 3 · log(5/1) = 4.83
• Tf-idf(doc4, class) = 0 · log(5/2) = 0
• Tf-idf(doc4, account) = 1 · log(5/4) = 0.22
• Tf-idf(doc4, orderz) = 1 · log(5/2) = 0.92

(Raw term frequency and the natural logarithm are used here.)
       the  your  model  cash  Viagra  class  account  orderz
doc1    12    3      1     0       0      2        0       0
doc2    10    4      0     4       0      0        2       0
doc3    25    4      0     0       0      1        1       0
doc4    14    2      0     1       3      0        1       1
doc5    17    5      0     2       0      0        1       1
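The table values can be reproduced in a few lines of Python (raw term frequency and the natural logarithm, which match the numbers above):

```python
import math

vocab = ["the", "your", "model", "cash", "Viagra", "class", "account", "orderz"]
docs = [
    [12, 3, 1, 0, 0, 2, 0, 0],   # doc1
    [10, 4, 0, 4, 0, 0, 2, 0],   # doc2
    [25, 4, 0, 0, 0, 1, 1, 0],   # doc3
    [14, 2, 0, 1, 3, 0, 1, 1],   # doc4
    [17, 5, 0, 2, 0, 0, 1, 1],   # doc5
]

def tf_idf(docs, doc_idx, term_idx):
    n = len(docs)
    # number of documents that contain the term at least once
    n_t = sum(1 for d in docs if d[term_idx] > 0)
    tf = docs[doc_idx][term_idx]   # raw term frequency
    return tf * math.log(n / n_t)

for i, term in enumerate(vocab):
    print(f"tf-idf(doc4, {term}) = {tf_idf(docs, 3, i):.2f}")
```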
Naïve Bayes
Naïve Bayes classifier
• Bayes rule:

  P(c|d) = P(d|c)·P(c) / P(d)

• Ignore the denominator, because it does not depend on the class label c:

  ĉ = argmax_{c ∈ C} P(d|c)·P(c)

• Assume that the document is represented as a list of features: d = (f1, f2, …, fn)
• How to compute the likelihood P(f1, f2, …, fn|c)?
Naïve Bayes assumption
• Assume that the features are conditionally independent given the class label c:

  P(f1, f2, …, fn|c) = ∏_{i=1}^{n} P(fi|c)
Full model
• Given a document with features f1, …, fn and a set of labels C, choose:

  ĉ = argmax_{c ∈ C} P(c) ∏_{i=1}^{n} P(fi|c)
Naïve Bayes as generative model
• The Naïve Bayes model describes a generative process
• Assumes the data (features in each document) were generated as follows:
1. For each document, sample its class c from the prior P(c)
2. For each document, sample the value of each feature from P(fi|c)
Learning the class priors
• P(c) is usually estimated with MLE:

  P(c) = Nc / N

• Nc – the number of training documents with class c
• N – the total number of training documents
Learning the class priors: example
• Given training documents with class labels:
• P(spam) =
• P(not spam) =
       the  your  model  cash  Viagra  class  account  orderz  Spam?
doc1    12    3      1     0       0      2        0       0     -
doc2    10    4      0     4       0      0        2       0     +
doc3    25    4      0     0       0      1        1       0     -
doc4    14    2      0     1       3      0        1       1     +
doc5    17    5      0     2       0      0        1       1     +
Learning the class priors: example
• Given training documents with class labels:
• P(spam) = 3/5
• P(not spam) = 2/5
       the  your  model  cash  Viagra  class  account  orderz  Spam?
doc1    12    3      1     0       0      2        0       0     -
doc2    10    4      0     4       0      0        2       0     +
doc3    25    4      0     0       0      1        1       0     -
doc4    14    2      0     1       3      0        1       1     +
doc5    17    5      0     2       0      0        1       1     +
Learning the feature probabilities
• P(fi|c) is normally estimated with add-α smoothing:

  P(fi|c) = (count(fi, c) + α) / (Σ_{f ∈ F} count(f, c) + α·|F|)

• count(fi, c) – the number of times feature fi occurs with class c
• F – the set of possible features
• α – smoothing parameter, optimised on the development set
Learning the feature probabilities: example
       the  your  model  cash  Viagra  class  account  orderz  Spam?
doc1    12    3      1     0       0      2        0       0     -
doc2    10    4      0     4       0      0        2       0     +
doc3    25    4      0     0       0      1        1       0     -
doc4    14    2      0     1       3      0        1       1     +
doc5    17    5      0     2       0      0        1       1     +
Classifying a new document: example
• Test document d: "get your cash and your orderz"
• Features present in the table: [your, cash, your, orderz]
• Suppose that there are no features other than those given in the previous table
• Compute P(spam) · P(your|spam)² · P(cash|spam) · P(orderz|spam)
• Compute P(not spam) · P(your|not spam)² · P(cash|not spam) · P(orderz|not spam)
• Choose the class with the larger value
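Putting the pieces together, the whole example can be sketched in Python. The smoothing value α = 1 is an assumption for illustration (the slides leave α unspecified), and unknown words such as "get" and "and" are simply ignored:

```python
import math

vocab = ["the", "your", "model", "cash", "Viagra", "class", "account", "orderz"]
train = [
    ([12, 3, 1, 0, 0, 2, 0, 0], "not spam"),  # doc1
    ([10, 4, 0, 4, 0, 0, 2, 0], "spam"),      # doc2
    ([25, 4, 0, 0, 0, 1, 1, 0], "not spam"),  # doc3
    ([14, 2, 0, 1, 3, 0, 1, 1], "spam"),      # doc4
    ([17, 5, 0, 2, 0, 0, 1, 1], "spam"),      # doc5
]
alpha = 1.0  # hypothetical smoothing value

def train_nb(train, alpha):
    classes = {c for _, c in train}
    prior, cond = {}, {}
    for c in classes:
        docs = [f for f, lab in train if lab == c]
        prior[c] = len(docs) / len(train)                 # MLE prior: Nc / N
        counts = [sum(d[i] for d in docs) for i in range(len(vocab))]
        total = sum(counts)
        # add-alpha smoothed feature probabilities P(fi|c)
        cond[c] = [(cnt + alpha) / (total + alpha * len(vocab)) for cnt in counts]
    return prior, cond

def classify(tokens, prior, cond):
    scores = {}
    for c in prior:
        logp = math.log(prior[c])                         # work in log space
        for t in tokens:
            if t in vocab:                                # unknown words are ignored
                logp += math.log(cond[c][vocab.index(t)])
        scores[c] = logp
    return max(scores, key=scores.get)

prior, cond = train_nb(train, alpha)
print(classify(["get", "your", "cash", "and", "your", "orderz"], prior, cond))  # spam
```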
Advantages of Naïve Bayes
• Very easy to implement (scikit-learn also provides it)
• Very fast to train and test
• Can work reasonably well even when there is not much training data
• One of the default baselines for any text classification task
Problems with Naïve Bayes
• It is naïve! The features usually are not conditionally independent
• Consider the categories travel, finance, and sport
• Are the following features independent given the category?
  • beach, sun, ski, snow, pitch, palm, football, relax, ocean
• Many feature types are themselves correlated (e.g. words and morphemes)
• The accuracy can still be acceptable, but the classifier may become over-confident, because it treats the information given by correlated features as independent sources of evidence
Logistic regression
Logistic regression
• Model the conditional probability with the logistic (sigmoid) function:

  P(c = 1|d) = 1 / (1 + e^(−w·f(d)))

• w – parameter vector
• f(d) – document representation

(https://en.wikipedia.org/wiki/Logistic_function)
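A minimal sketch of binary logistic regression prediction. The weight vector here is hypothetical and chosen only for illustration; in practice w is learned from data:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, features):
    """P(c = 1 | d) for a binary logistic regression classifier."""
    z = sum(wi * fi for wi, fi in zip(w, features))  # w · f(d)
    return sigmoid(z)

# Hypothetical weights over the spam vocabulary, and the doc4 BOW vector
w   = [0.0, 0.3, -0.5, 1.2, 2.0, -0.8, 0.1, 1.5]
f_d = [14, 2, 0, 1, 3, 0, 1, 1]
p_spam = predict(w, f_d)
print(p_spam > 0.5)  # True: classify as spam if the probability exceeds 0.5
```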
Advantages of logistic regression
• Still reasonably simple
• Results are interpretable (although regularization makes interpretation harder!)
• Features can be arbitrarily complex
• Does not assume statistical independence between features
Drawbacks of logistic regression
• Harder to train than Naïve Bayes
• Feature engineering is difficult
• Extracting complex features can be expensive
Neural text classification
Recurrent neural language model
• Agnostic to actual recurrent function (vanilla RNN, LSTM, GRU, ..)
• Reads inputs xi to accumulate state hi and predicts output yi
Text representation with RNN
• hi is a function of the inputs x0, …, xi and the previous states h0, …, h(i−1)
• It contains information about all the text read up to point i
• hi is thus a representation of the text x0, …, xi
• Given input text x0, …, xn, the final state hn is a representation of the whole text
• This representation can be viewed as an alternative to the other representations (BOW, tf-idf, …)
Text classification with an RNN
• The final state hn is given as input to a logistic regression (softmax) classifier
38
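A minimal pure-Python sketch of this idea: a vanilla RNN accumulates a state over the token sequence, and a softmax classifier reads the final state. The vocabulary size, hidden size, number of classes, and the random weights are all hypothetical; a real model would learn the weights by backpropagation:

```python
import math, random

random.seed(0)

# Hypothetical sizes: vocabulary 10, hidden size 4, 3 classes
V, H, C = 10, 4, 3
def mat(r, c):
    return [[random.uniform(-0.1, 0.1) for _ in range(c)] for _ in range(r)]
E, Wx, Wh, Wo = mat(V, H), mat(H, H), mat(H, H), mat(H, C)

def matvec(M, v):
    """Row-vector times matrix: (v @ M)[j] = sum_i v[i] * M[i][j]."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]

def softmax(z):
    m = max(z)
    e = [math.exp(x - m) for x in z]
    s = sum(e)
    return [x / s for x in e]

def classify(token_ids):
    """Vanilla RNN: accumulate state hi over the text, classify from hn."""
    h = [0.0] * H
    for t in token_ids:
        pre = [a + b for a, b in zip(matvec(Wx, E[t]), matvec(Wh, h))]
        h = [math.tanh(x) for x in pre]
    return softmax(matvec(Wo, h))  # P(c|d) from the final state hn

probs = classify([1, 5, 2, 7])
print([round(p, 3) for p in probs])
```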
Training with cross-entropy loss
• yc’ = 1 if c’ is equal to the correct class label c and 0 otherwise
• The loss is 0 when the model predicts correctly P(c|d) = 1
• The loss is large when P(c|d) is small, e.g. the model predicts incorrectly
Cross-entropy loss: example
• L(doc1) =
• L(doc2) =
           Sports  Travel  Finance  Politics  Health
Doc1          0       0       1        0        0
Doc2          0       1       0        0        0
Predicted    0.1    0.02     0.87    0.006    0.004
Cross-entropy loss: example
• L(doc1) = -log(0.87) = 0.14
• L(doc2) = -log(0.02) = 3.91
           Sports  Travel  Finance  Politics  Health
Doc1          0       0       1        0        0
Doc2          0       1       0        0        0
Predicted    0.1    0.02     0.87    0.006    0.004
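The computation can be checked in a few lines of Python:

```python
import math

labels = ["Sports", "Travel", "Finance", "Politics", "Health"]
predicted = [0.1, 0.02, 0.87, 0.006, 0.004]
doc1_gold = [0, 0, 1, 0, 0]   # Finance
doc2_gold = [0, 1, 0, 0, 0]   # Travel

def cross_entropy(gold, predicted):
    # Only the term for the correct class survives, since gold is one-hot
    return -sum(y * math.log(p) for y, p in zip(gold, predicted))

print(round(cross_entropy(doc1_gold, predicted), 2))  # 0.14
print(round(cross_entropy(doc2_gold, predicted), 2))  # 3.91
```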
Multi-label classification
• What if each document can have multiple labels?
  • Turn the multi-label classification problem into several binary classification problems
    • Use binary cross-entropy
  • Form “super-classes” from label combinations and perform multi-class classification
Dual Objective RNN
• In practice it may make sense to combine an LM objective with classifier training and to optimise the two losses jointly
Bidirectional RNN classifier
Non-sequential neural networks
• Recursive neural networks model the intrinsic hierarchical structure of language
• Convolutional neural networks were designed for image classification but can be adapted to sequential language data too
Recursive neural networks
• Composition follows the syntax tree
• Composition function (one common choice):

  x_parent = tanh(Wl·xl + Wr·xr)

• xl – representation of the left branch
• xr – representation of the right branch
• Wl, Wr – parameter matrices
Convolutional neural networks
Evaluation
(We have talked about this before)
Evaluation methods and measures
• Extrinsic evaluation: measure the effect on a downstream task
  • Difficult
• Intrinsic evaluation: measure the accuracy on a test set
  • Simple, but doesn’t always correlate with the tasks we are interested in
• Accuracy: not suitable when the classes are unbalanced
• Precision: the proportion of correct predictions among all positive predictions
• Recall: the proportion of correctly predicted positives among all actual positives
• F1-score: combines precision and recall
Precision, recall and F1-score
• TP – true positives – correctly predicted positives
• FP – false positives – negatives that were predicted as positives
• FN – false negatives – positives that were predicted as negatives

  Precision = TP / (TP + FP)
  Recall = TP / (TP + FN)
  F1 = 2 · Precision · Recall / (Precision + Recall)
Precision and recall: example
• True Positives =
• False Positives =
• False Negatives =
• Precision =
• Recall =
• F1-score =
Document about Sports?
Gold       Y Y N N Y N N N Y N N
Predicted  N Y N Y N N N N Y N N
Precision and recall: example
• True Positives = 2
• False Positives = 1
• False Negatives = 2
• Precision = 2/3
• Recall = 2/4
• F1-score = 4/7
Document about Sports?
Gold       Y Y N N Y N N N Y N N
Predicted  N Y N Y N N N N Y N N
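These counts and scores can be verified programmatically:

```python
gold      = ["Y", "Y", "N", "N", "Y", "N", "N", "N", "Y", "N", "N"]
predicted = ["N", "Y", "N", "Y", "N", "N", "N", "N", "Y", "N", "N"]

tp = sum(1 for g, p in zip(gold, predicted) if g == "Y" and p == "Y")
fp = sum(1 for g, p in zip(gold, predicted) if g == "N" and p == "Y")
fn = sum(1 for g, p in zip(gold, predicted) if g == "Y" and p == "N")

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(tp, fp, fn)                                          # 2 1 2
print(round(precision, 2), round(recall, 2), round(f1, 2))  # 0.67 0.5 0.57
```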
Tuning precision vs recall
• The default classification threshold is 0.5
  • P(spam|doc) > 0.5 --> the email is spam
  • P(spam|doc) <= 0.5 --> the email is not spam
• The threshold can be tuned to trade precision against recall:
  • Raise the threshold --> higher precision, lower recall
  • Lower the threshold --> lower precision, higher recall
Precision-recall curve
(https://stackoverflow.com/questions/33294574/good-roc-curve-but-poor-precision-recall-curve)
Conclusion
• For solving a text classification task, several decisions have to be made:
  • Representation: BOW, linguistic features, distributed representations
  • Classifier: Naïve Bayes, logistic regression, SVM, decision tree
  • In case of a neural classifier, the architecture: sequential, recursive, CNN
  • Evaluation measure: accuracy, precision, recall, F1-score
    • Preferably compute all of them