TRANSCRIPT
Information Retrieval
Lecture 4: Introduction to Information Retrieval (Manning et al. 2007), Chapter 13
For the MSc Computer Science Programme
Dell Zhang, Birkbeck, University of London
Is this spam?
Text Classification/Categorization
Given:
- A document d ∈ D.
- A set of classes C = {c1, c2, …, cn}.
Determine:
- The class of d: c(d) ∈ C, where c(d) is a classification function (“classifier”).
Classification Methods (1)
Manual Classification
- For example, Yahoo! Directory, DMOZ, Medline, etc.
- Very accurate when the job is done by experts.
- Difficult to scale up.
Classification Methods (2)
Hand-Coded Rules
- For example, CIA, Reuters, SpamAssassin, etc.
- Accuracy is often quite high, if the rules have been carefully refined over time by experts.
- Expensive to build and maintain the rules.
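To make the idea concrete, here is a toy sketch of a score-based rule classifier in the spirit of SpamAssassin. This is my own illustration, not from the lecture; the rules, weights, and threshold are all invented:

```python
# Toy hand-coded rule classifier (illustrative only).
# Each matching rule adds its weight to a score; a threshold decides the class.
RULES = [
    ("free money", 2.5),
    ("click here", 1.5),
    ("viagra", 3.0),
]

def rule_classify(doc: str, threshold: float = 3.0) -> str:
    text = doc.lower()
    score = sum(weight for phrase, weight in RULES if phrase in text)
    return "spam" if score >= threshold else "ham"
```

Real systems have thousands of such rules, which is precisely why they are expensive to maintain.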
Classification Methods (3)
Machine Learning (ML)
- For example:
  - Automatic email classification: PopFile http://popfile.sourceforge.net/
  - Automatic webpage classification: MindSet http://mindset.research.yahoo.com/
- There is no free lunch: hand-classified training data are required.
- But the training data can be built up (and refined) easily by amateurs.
Text Classification via ML
[Diagram: labelled training documents (L) feed the learning step, which builds a classifier; the classifier is then used in the predicting step to label unlabelled test documents (U).]
Text Classification via ML - Example
Classes: ML, Planning (AI); Semantics, Garb.Coll. (Programming); Multimedia, GUI (HCI)
Training documents (characteristic terms per class):
- ML: learning, intelligence, algorithm, reinforcement, network, …
- Planning: planning, temporal, reasoning, plan, language, …
- Semantics: programming, semantics, language, proof, …
- Garb.Coll.: garbage, collection, memory, optimization, region, …
Test document: “planning language proof intelligence”
Evaluating Classification
- Classification Accuracy: the proportion of correct predictions.
- Precision, Recall, F1 (for each class).
  - Macro-averaging: computes the performance measure for each class, then takes a simple average over classes.
  - Micro-averaging: pools per-document predictions across classes, then computes the performance measure on the pooled contingency table.
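The two averaging schemes can be sketched as follows. This is a minimal illustration, assuming single-label classification; the function names are mine:

```python
def per_class_counts(y_true, y_pred, cls):
    # True positives, false positives, false negatives for one class.
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == cls and t == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == cls and t != cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if p != cls and t == cls)
    return tp, fp, fn

def f1(tp, fp, fn):
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

def macro_f1(y_true, y_pred, classes):
    # Compute F1 per class, then average the scores.
    return sum(f1(*per_class_counts(y_true, y_pred, c)) for c in classes) / len(classes)

def micro_f1(y_true, y_pred, classes):
    # Pool the contingency counts across classes, then compute F1 once.
    tp = fp = fn = 0
    for c in classes:
        t, p, n = per_class_counts(y_true, y_pred, c)
        tp, fp, fn = tp + t, fp + p, fn + n
    return f1(tp, fp, fn)
```

Note that for single-label multi-class problems, micro-averaged F1 coincides with accuracy, while macro-averaging gives rare classes the same weight as frequent ones.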
Sample Learning Curve
[Figure: learning curve on the Yahoo! Science data.]
Bayesian Methods for Classification
Before seeing the content of document d:
- Classify d to the class with maximum prior probability Pr[c].
- For each class c_j ∈ C, Pr[c_j] can be estimated from the training data:

  Pr[c_j] = N_j / Σ_j' N_j'

  where N_j is the number of training documents in class c_j.
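The prior estimate amounts to counting class labels in the training set, e.g. (a minimal sketch, not from the lecture):

```python
from collections import Counter

def estimate_priors(labels):
    # Pr[c_j] = N_j / sum_j' N_j', where N_j is the number of
    # training documents labelled with class c_j.
    counts = Counter(labels)
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}
```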
Bayesian Methods for Classification
After seeing the content of document d:
- Classify d to the class with maximum a posteriori probability Pr[c|d].
- For each class c_j ∈ C, Pr[c_j|d] can be computed by Bayes’ Theorem.
Bayes’ Theorem

  Pr[c|d] = Pr[d|c] Pr[c] / Pr[d]

- Pr[c]: prior probability
- Pr[d|c]: class-conditional probability
- Pr[c|d]: a posteriori probability
- Pr[d]: a constant (independent of the class)
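As a small computational illustration of the theorem (my own sketch, not from the lecture): given priors Pr[c] and likelihoods Pr[d|c], the evidence Pr[d] is the sum of the joint probabilities over all classes, and normalising by it yields the posteriors:

```python
def posterior(prior, likelihoods):
    # prior: {class: Pr[c]}, likelihoods: {class: Pr[d|c]}
    # Pr[c|d] = Pr[d|c] Pr[c] / Pr[d], with Pr[d] = sum_c Pr[d|c] Pr[c]
    joint = {c: likelihoods[c] * prior[c] for c in prior}
    evidence = sum(joint.values())
    return {c: j / evidence for c, j in joint.items()}
```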
Naïve Bayes: Classification

  c(d) = argmax_{c_j ∈ C} Pr[c_j|d]
       = argmax_{c_j ∈ C} Pr[d|c_j] Pr[c_j] / Pr[d]
       = argmax_{c_j ∈ C} Pr[d|c_j] Pr[c_j]   (as Pr[d] is a constant)

How can we compute Pr[d|c_j]?
Naïve Bayes Assumptions
To facilitate the computation of Pr[d|c_j], two simplifying assumptions are made.
- Conditional Independence Assumption: given the document’s topic, a word in one position tells us nothing about the words in other positions.
- Positional Independence Assumption: each document is treated as a bag of words, i.e. the occurrence of a word does not depend on its position.
Then Pr[d|c_j] is given by the class-specific unigram language model, which is essentially a multinomial distribution.
Unigram Language Model

Model for c_j:
  Pr[the]   = 0.2
  Pr[a]     = 0.1
  Pr[man]   = 0.01
  Pr[woman] = 0.01
  Pr[said]  = 0.03
  Pr[likes] = 0.02
  …

  Pr[d|c_j] = Π_{w_i ∈ d} Pr[w_i|c_j]

Example: d = “the man likes the woman”

  Pr[d|c_j] = 0.2 × 0.01 × 0.02 × 0.2 × 0.01 = 0.00000008
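One practical note, not on the slide: a product of many small probabilities underflows floating-point arithmetic quickly, so implementations usually sum log-probabilities instead. A minimal sketch using the slide’s model:

```python
import math

def doc_log_likelihood(doc_tokens, word_probs):
    # log Pr[d|c_j] = sum_i log Pr[w_i|c_j]; computed in log space
    # to avoid underflow. Assumes every token is in word_probs.
    return sum(math.log(word_probs[w]) for w in doc_tokens)

model = {"the": 0.2, "a": 0.1, "man": 0.01, "woman": 0.01,
         "said": 0.03, "likes": 0.02}
logp = doc_log_likelihood(["the", "man", "likes", "the", "woman"], model)
# math.exp(logp) recovers the product 0.2 * 0.01 * 0.02 * 0.2 * 0.01
```

Since argmax is unchanged by the monotonic log transform, classification can be done entirely in log space.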
Naïve Bayes: Learning
Given the training data, for each class c_j ∈ C:
- estimate Pr[c_j] (as before);
- for each term w_i in the vocabulary V, estimate Pr[w_i|c_j]:

  Pr[w_i|c_j] = (T_ji + 1) / (Σ_i' T_ji' + |V|)

  where T_ji is the number of occurrences of term w_i in the training documents of class c_j.
Smoothing
Why not just use the maximum-likelihood estimate (MLE)?

  Pr[w_i|c_j] = T_ji / Σ_i' T_ji'

If a term w (in a test document d) did not occur in the training documents of class c_j, Pr[w|c_j] would be 0, and then Pr[d|c_j] would be 0 no matter how strongly the other terms in d are associated with class c_j.

Add-One (Laplace) Smoothing:

  Pr[w_i|c_j] = (T_ji + 1) / (Σ_i' T_ji' + |V|)
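Add-one smoothing can be sketched as follows (the function and argument names are mine, not from the lecture):

```python
def smoothed_word_probs(term_counts, vocab):
    # Add-one (Laplace) smoothing:
    # Pr[w_i|c_j] = (T_ji + 1) / (sum_i' T_ji' + |V|)
    # term_counts: {term: count in class c_j's training docs}
    total = sum(term_counts.get(w, 0) for w in vocab)
    denom = total + len(vocab)
    return {w: (term_counts.get(w, 0) + 1) / denom for w in vocab}
```

Every vocabulary term now gets a non-zero probability, and the probabilities still sum to 1 over the vocabulary.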
Naïve Bayes is Not So Naïve
Fairly Effective
- It is the Bayes-optimal classifier if the independence assumptions do hold.
- It often performs well even if the independence assumptions are badly violated.
- It usually yields highly accurate classification (though the estimated probabilities are not so accurate).
- It took the 1st & 2nd places in the KDD-CUP 97 competition, among 16 (then) state-of-the-art algorithms.
- It is a good, dependable baseline for text classification (though not the best).
Naïve Bayes is Not So Naïve
Very Efficient
- Linear time complexity for learning/classification.
- Low storage requirements.
Take Home Messages
- Text Classification via Machine Learning
- Bayes’ Theorem

    Pr[c|d] = Pr[d|c] Pr[c] / Pr[d]

- Naïve Bayes
  - Learning:

    Pr[c_j] = N_j / Σ_j' N_j'
    Pr[w_i|c_j] = (T_ji + 1) / (Σ_i' T_ji' + |V|)

  - Classification:

    c(d) = argmax_{c_j ∈ C} Pr[c_j] Π_{w_i ∈ d} Pr[w_i|c_j]
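Putting the learning and classification formulas together, a minimal multinomial Naïve Bayes can be sketched as follows. This is my own sketch, not code from the lecture; for simplicity it ignores out-of-vocabulary terms at classification time:

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    # docs: list of (tokens, class) pairs.
    # Learning: Pr[c_j] = N_j / N and, with add-one smoothing,
    # Pr[w_i|c_j] = (T_ji + 1) / (sum_i' T_ji' + |V|).
    class_counts = Counter(c for _, c in docs)
    term_counts = defaultdict(Counter)
    vocab = set()
    for tokens, c in docs:
        term_counts[c].update(tokens)
        vocab.update(tokens)
    n = len(docs)
    log_prior = {c: math.log(k / n) for c, k in class_counts.items()}
    log_cond = {}
    for c in class_counts:
        denom = sum(term_counts[c].values()) + len(vocab)
        log_cond[c] = {w: math.log((term_counts[c][w] + 1) / denom)
                       for w in vocab}
    return log_prior, log_cond

def classify_nb(tokens, log_prior, log_cond):
    # Classification: argmax_c log Pr[c] + sum_i log Pr[w_i|c],
    # computed in log space to avoid underflow; OOV tokens are skipped.
    def score(c):
        return log_prior[c] + sum(log_cond[c][w] for w in tokens
                                  if w in log_cond[c])
    return max(log_prior, key=score)
```

Training is a single counting pass over the documents and classification is a single pass over the test document's tokens, which is exactly the linear time complexity noted above.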