Information Retrieval
Search Engine Technology (5&6)
http://tangra.si.umich.edu/clair/ir09
Prof. Dragomir R. Radev
[email protected]
Final projects
• Two formats:
– A software system that performs a specific search-engine-related task. We will create a web page with all such code and make it available to the IR community.
– A research experiment documented in the form of a paper. Look at the proceedings of the SIGIR, WWW, or ACL conferences for a sample format. I will encourage the authors of the most successful papers to consider submitting them to one of the IR-related conferences.
• Deliverables:
– System (code + documentation + examples) or paper (+ code, data)
– Poster (to be presented in class)
– Web page that describes the project
SET/IR – W/S 2009

…9. Text classification
• Naïve Bayesian classifiers
• Decision trees…
Introduction

• Text classification: assigning documents to predefined categories, e.g., topics, languages, users
• Given a set of classes C and an object x, determine the class of x in C
• Hierarchical vs. flat classification
• Overlapping (soft) vs. non-overlapping (hard) classification
Introduction

• Idea: manual classification using rules, e.g.,
– Columbia AND University → Education
– Columbia AND “South Carolina” → Geography
• Popular techniques: generative (kNN, Naïve Bayes) vs. discriminative (SVM, regression)
• Generative: model the joint probability p(x,y) and use Bayesian prediction to compute p(y|x)
• Discriminative: model p(y|x) directly
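In symbols, the Bayesian prediction step that a generative model relies on recovers the posterior from the joint (standard probability; the notation here is ours):

$$p(y \mid x) = \frac{p(x, y)}{p(x)} = \frac{p(x, y)}{\sum_{y'} p(x, y')}$$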
Bayes formula

$$P(B \mid A) = \frac{P(A \mid B)\, P(B)}{P(A)}$$

Full probability:

$$P(A) = \sum_i P(A \mid B_i)\, P(B_i)$$
Example (performance-enhancing drug)

• Drug (D) with values y/n
• Test (T) with values +/−
• P(D=y) = 0.001
• P(T=+|D=y) = 0.8
• P(T=+|D=n) = 0.01
• Given: an athlete tests positive
• P(D=y|T=+) = P(T=+|D=y)P(D=y) / (P(T=+|D=y)P(D=y) + P(T=+|D=n)P(D=n)) = (0.8 × 0.001)/(0.8 × 0.001 + 0.01 × 0.999) ≈ 0.074
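As a sanity check, the same computation as a minimal Python sketch (function and variable names are our own, not from the course materials):

```python
def posterior_positive(p_d, p_pos_given_d, p_pos_given_not_d):
    """Bayes rule: P(D=y | T=+) for a binary test."""
    evidence = p_pos_given_d * p_d + p_pos_given_not_d * (1 - p_d)
    return p_pos_given_d * p_d / evidence

# Values from the slide: rare drug use, a fairly sensitive test,
# and a 1% false-positive rate.
print(posterior_positive(0.001, 0.8, 0.01))  # ~0.074
```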
Naïve Bayesian classifiers

• Naïve Bayesian classifier:

$$P(C = d \mid F_1, F_2, \ldots, F_k) = \frac{P(F_1, F_2, \ldots, F_k \mid C = d)\, P(C = d)}{P(F_1, F_2, \ldots, F_k)}$$

• Assuming statistical independence:

$$P(C = d \mid F_1, F_2, \ldots, F_k) = \frac{\prod_{j=1}^{k} P(F_j \mid C = d)\, P(C = d)}{\prod_{j=1}^{k} P(F_j)}$$

• Features = typically words (or phrases)
Example

• p(well) = 0.9, p(cold) = 0.05, p(allergy) = 0.05
– p(sneeze|well) = 0.1
– p(sneeze|cold) = 0.9
– p(sneeze|allergy) = 0.9
– p(cough|well) = 0.1
– p(cough|cold) = 0.8
– p(cough|allergy) = 0.7
– p(fever|well) = 0.01
– p(fever|cold) = 0.7
– p(fever|allergy) = 0.4

Example from Ray Mooney
Example (cont’d)

• Features: sneeze, cough, no fever
• P(well|e) = (.9) × (.1)(.1)(.99) / p(e) = 0.0089/p(e)
• P(cold|e) = (.05) × (.9)(.8)(.3) / p(e) = 0.01/p(e)
• P(allergy|e) = (.05) × (.9)(.7)(.6) / p(e) = 0.019/p(e)
• P(e) = 0.0089 + 0.01 + 0.019 = 0.0379
• P(well|e) = .23
• P(cold|e) = .26
• P(allergy|e) = .50

Example from Ray Mooney
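A minimal Python sketch of this computation (the dictionaries and names are our own illustration):

```python
priors = {"well": 0.9, "cold": 0.05, "allergy": 0.05}
cond = {  # P(symptom present | class)
    "well":    {"sneeze": 0.1, "cough": 0.1, "fever": 0.01},
    "cold":    {"sneeze": 0.9, "cough": 0.8, "fever": 0.7},
    "allergy": {"sneeze": 0.9, "cough": 0.7, "fever": 0.4},
}
# Evidence e: sneeze and cough present, fever absent.
observed = {"sneeze": True, "cough": True, "fever": False}

scores = {}
for c, prior in priors.items():
    score = prior
    for feat, present in observed.items():
        p = cond[c][feat]
        score *= p if present else (1 - p)
    scores[c] = score

p_e = sum(scores.values())  # ~0.0386 exact; the slide's 0.0379 uses rounded terms
for c in scores:
    # exact: well 0.23, cold 0.28, allergy 0.49
    # (the slide rounds intermediates, giving .23/.26/.50)
    print(c, round(scores[c] / p_e, 2))
```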
Issues with NB

• Where do we get the prior values P(C=d)? Use maximum-likelihood estimation: N_i/N
• Same for the conditionals: these are based on a multinomial generator, and the MLE estimate is

$$\hat{P}(F_j \mid C_i) = \frac{T_{ji}}{\sum_{j'} T_{j'i}}$$

• Smoothing is needed – why?
• Laplace smoothing:

$$\hat{P}(F_j \mid C_i) = \frac{T_{ji} + 1}{\sum_{j'} (T_{j'i} + 1)}$$

• Implementation: how to avoid floating-point underflow?
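A standard answer to the underflow question is to work in log space. A sketch combining this with the Laplace smoothing above (the function name and signature are our own assumptions):

```python
import math

def nb_log_score(doc_tokens, prior, token_counts, vocab_size):
    """log P(C) + sum_j log P(F_j|C), with add-one smoothing.

    token_counts: occurrences T_ji of each token in class C's training text.
    """
    total = sum(token_counts.values())
    score = math.log(prior)
    for tok in doc_tokens:
        score += math.log((token_counts.get(tok, 0) + 1) / (total + vocab_size))
    return score  # compare scores across classes; the largest wins
```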
Spam recognition

Return-Path: <[email protected]>
X-Sieve: CMU Sieve 2.2
From: "Ibrahim Galadima" <[email protected]>
Reply-To: [email protected]
To: [email protected]
Date: Tue, 14 Jan 2003 21:06:26 -0800
Subject: Gooday
DEAR SIR
FUNDS FOR INVESTMENTS
THIS LETTER MAY COME TO YOU AS A SURPRISE SINCE I HAD NO PREVIOUS CORRESPONDENCE WITH YOU

I AM THE CHAIRMAN TENDER BOARD OF INDEPENDENT NATIONAL ELECTORAL COMMISSION INEC I GOT YOUR CONTACT IN THE COURSE OF MY SEARCH FOR A RELIABLE PERSON WITH WHOM TO HANDLE A VERY CONFIDENTIAL TRANSACTION INVOLVING THE ! TRANSFER OF FUND VALUED AT TWENTY ONE MILLION SIX HUNDRED THOUSAND UNITED STATES DOLLARS US$20M TO A SAFE FOREIGN ACCOUNT

THE ABOVE FUND IN QUESTION IS NOT CONNECTED WITH ARMS, DRUGS OR MONEY LAUNDERING IT IS A PRODUCT OF OVER INVOICED CONTRACT AWARDED IN 1999 BY INEC TO A
SpamAssassin

• http://spamassassin.apache.org/
• http://spamassassin.apache.org/tests_3_1_x.html
Feature selection: the χ² test

• For a term t, form the 2×2 contingency table:

|       | I_t = 0 | I_t = 1 |
|-------|---------|---------|
| C = 0 | k00     | k01     |
| C = 1 | k10     | k11     |

• C = class, I_t = indicator for feature t
• Testing for independence: P(C=0, I_t=0) should equal P(C=0) P(I_t=0)
– P(C=0) = (k00+k01)/n
– P(C=1) = 1 − P(C=0) = (k10+k11)/n
– P(I_t=0) = (k00+k10)/n
– P(I_t=1) = 1 − P(I_t=0) = (k01+k11)/n
Feature selection: the χ² test

• High values of χ² indicate lower belief in independence.
• In practice, compute χ² for all words and pick the top k among them.

$$X^2 = \frac{n\,(k_{11}k_{00} - k_{10}k_{01})^2}{(k_{00}+k_{01})(k_{10}+k_{11})(k_{00}+k_{10})(k_{01}+k_{11})}$$
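A small sketch of per-term feature selection with this statistic (the helper and the toy counts are our own illustration):

```python
def chi_square(k00, k01, k10, k11):
    """Chi-square statistic for a 2x2 term/class contingency table."""
    n = k00 + k01 + k10 + k11
    num = n * (k11 * k00 - k10 * k01) ** 2
    den = (k00 + k01) * (k10 + k11) * (k00 + k10) * (k01 + k11)
    return num / den if den else 0.0

# Rank candidate terms and keep the top k as features.
tables = {"prime": (70, 30, 10, 90), "the": (50, 50, 50, 50)}  # toy counts
ranked = sorted(tables, key=lambda t: chi_square(*tables[t]), reverse=True)
print(ranked)  # a discriminative term like "prime" outranks "the"
```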
Feature selection: mutual information

• No document-length scaling is needed
• Documents are assumed to be generated according to the multinomial model
• Measures the amount of information: if the distribution is the same as the background distribution, then MI = 0
• X = word; Y = class

$$MI(X, Y) = \sum_x \sum_y P(x, y) \log \frac{P(x, y)}{P(x)\,P(y)}$$
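A sketch of this computation (the helper name and toy distribution are ours):

```python
import math

def mutual_information(p_xy):
    """MI from a joint distribution {(x, y): P(x, y)}."""
    px, py = {}, {}
    for (x, y), p in p_xy.items():
        px[x] = px.get(x, 0) + p
        py[y] = py.get(y, 0) + p
    return sum(p * math.log(p / (px[x] * py[y]))
               for (x, y), p in p_xy.items() if p > 0)

# An independent joint distribution gives MI = 0, as the slide notes.
print(mutual_information({(0, 0): 0.25, (0, 1): 0.25,
                          (1, 0): 0.25, (1, 1): 0.25}))  # 0.0
```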
Well-known datasets

• 20 newsgroups
– http://people.csail.mit.edu/u/j/jrennie/public_html/20Newsgroups/
• Reuters-21578
– http://www.daviddlewis.com/resources/testcollections/reuters21578/
– Categories: grain, acquisitions, corn, crude, wheat, trade…
• WebKB
– http://www-2.cs.cmu.edu/~webkb/
– Classes: course, student, faculty, staff, project, dept, other
– NB performance (2000):

| Class | course | student | faculty | staff | project | dept | other |
|-------|--------|---------|---------|-------|---------|------|-------|
| P     | 26     | 43      | 18      | 6     | 13      | 2    | 94    |
| R     | 83     | 75      | 77      | 9     | 73      | 100  | 35    |
Evaluation of text classification

• Microaveraging – uses a pooled contingency table (aggregate the counts over all classes, then compute the metric)
• Macroaveraging – averages the per-class metrics over classes
Vector space classification

[Figure: documents from topic1 and topic2 plotted in a two-dimensional feature space (x1, x2)]
Decision surfaces

[Figure: a decision surface separating topic1 from topic2 in the (x1, x2) feature space]
Decision trees

[Figure: decision-tree boundaries over (x1, x2) separating topic1 from topic2]
Classification using decision trees

• Expected information needed to classify a sample:

$$I(s_1, s_2, \ldots, s_m) = -\sum_{i=1}^{m} p_i \log_2(p_i)$$

• s = data samples; p_i = s_i / s
• m = number of classes
| RID | Age    | Income | Student | Credit    | Buys? |
|-----|--------|--------|---------|-----------|-------|
| 1   | <= 30  | High   | No      | Fair      | No    |
| 2   | <= 30  | High   | No      | Excellent | No    |
| 3   | 31..40 | High   | No      | Fair      | Yes   |
| 4   | > 40   | Medium | No      | Fair      | Yes   |
| 5   | > 40   | Low    | Yes     | Fair      | Yes   |
| 6   | > 40   | Low    | Yes     | Excellent | No    |
| 7   | 31..40 | Low    | Yes     | Excellent | Yes   |
| 8   | <= 30  | Medium | No      | Fair      | No    |
| 9   | <= 30  | Low    | Yes     | Fair      | Yes   |
| 10  | > 40   | Medium | Yes     | Fair      | Yes   |
| 11  | <= 30  | Medium | Yes     | Excellent | Yes   |
| 12  | 31..40 | Medium | No      | Excellent | Yes   |
| 13  | 31..40 | High   | Yes     | Fair      | Yes   |
| 14  | > 40   | Medium | No      | Excellent | No    |
Decision tree induction

• I(s1, s2) = I(9, 5) = −(9/14) log2(9/14) − (5/14) log2(5/14) = 0.940
Entropy and information gain

• Entropy = expected information based on the partitioning into subsets by attribute A:

$$E(A) = \sum_{j=1}^{v} \frac{s_{1j} + \cdots + s_{mj}}{s}\; I(s_{1j}, \ldots, s_{mj})$$

$$\mathrm{Gain}(A) = I(s_1, s_2, \ldots, s_m) - E(A)$$
Entropy

• Age <= 30: s11 = 2, s21 = 3, I(s11, s21) = 0.971
• Age in 31..40: s12 = 4, s22 = 0, I(s12, s22) = 0
• Age > 40: s13 = 3, s23 = 2, I(s13, s23) = 0.971
Entropy (cont’d)

• E(age) = 5/14 I(s11, s21) + 4/14 I(s12, s22) + 5/14 I(s13, s23) = 0.694
• Gain(age) = I(s1, s2) − E(age) = 0.246
• Gain(income) = 0.029, Gain(student) = 0.151, Gain(credit) = 0.048
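A compact sketch that reproduces these numbers from the table above (the helper names are ours):

```python
import math

def info(*counts):
    """I(s1,...,sm) = -sum p_i log2 p_i, with p_i = s_i / s."""
    s = sum(counts)
    return -sum(c / s * math.log2(c / s) for c in counts if c)

# Class counts (yes, no) overall and within each Age partition.
total = (9, 5)
age_parts = [(2, 3), (4, 0), (3, 2)]  # <=30, 31..40, >40

e_age = sum(sum(p) / sum(total) * info(*p) for p in age_parts)
print(round(info(*total), 3))          # 0.940
print(round(info(*total) - e_age, 3))  # Gain(age) ~0.247 (slide reports 0.246)
```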
Final decision tree

age?
├─ <= 30 → student?
│   ├─ no → no
│   └─ yes → yes
├─ 31..40 → yes
└─ > 40 → credit?
    ├─ excellent → no
    └─ fair → yes
Other techniques

• Bayesian classifiers
• X: age <= 30, income = medium, student = yes, credit = fair
• P(yes) = 9/14 = 0.643
• P(no) = 5/14 = 0.357
Example

• P(age <= 30 | yes) = 2/9 = 0.222
• P(age <= 30 | no) = 3/5 = 0.600
• P(income = medium | yes) = 4/9 = 0.444
• P(income = medium | no) = 2/5 = 0.400
• P(student = yes | yes) = 6/9 = 0.667
• P(student = yes | no) = 1/5 = 0.200
• P(credit = fair | yes) = 6/9 = 0.667
• P(credit = fair | no) = 2/5 = 0.400
Example (cont’d)

• P(X | yes) = 0.222 × 0.444 × 0.667 × 0.667 = 0.044
• P(X | no) = 0.600 × 0.400 × 0.200 × 0.400 = 0.019
• P(X | yes) P(yes) = 0.044 × 0.643 = 0.028
• P(X | no) P(no) = 0.019 × 0.357 = 0.007
• Answer: yes/no?
SET/IR – W/S 2009

…10. Linear classifiers
• Kernel methods
• Support vector machines…
Linear boundary

[Figure: a linear boundary separating topic1 from topic2 in the (x1, x2) feature space]
Vector space classifiers

• Using centroids
• Boundary = the line that is equidistant from the two centroids
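A minimal centroid-classifier sketch along these lines (Rocchio-style; all names and the toy data are our own):

```python
import numpy as np

def train_centroids(X, y):
    """One centroid per class: the mean of that class's document vectors."""
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def classify(x, centroids):
    """Assign x to the class with the nearest centroid."""
    return min(centroids, key=lambda c: np.linalg.norm(x - centroids[c]))

X = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]])
y = np.array(["topic1", "topic1", "topic2", "topic2"])
print(classify(np.array([0.7, 0.3]), train_centroids(X, y)))  # topic1
```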
Generative models: kNN

• Assign each element to the closest cluster
• k-nearest neighbors:

$$\mathrm{score}(c, d_q) = b + \sum_{d \in kNN(d_q)} s_c(d, d_q)$$

where s_c(d, d_q) is the similarity of neighbor d to d_q if d belongs to class c (and 0 otherwise), and b is a bias term

• Very easy to program
• Tessellation; nonlinearity
• Issues: choosing k, b?
• Demo:
– http://www-2.cs.cmu.edu/~zhuxj/courseproject/knndemo/KNN.html
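A small scorer in this spirit, using cosine similarity (the names and the b = 0 default are our own choices):

```python
import numpy as np

def knn_score(query, docs, labels, c, k=3, b=0.0):
    """score(c, d_q) = b + sum of similarities of the k nearest
    neighbors of the query that carry label c."""
    sims = docs @ query / (np.linalg.norm(docs, axis=1) * np.linalg.norm(query))
    top = np.argsort(-sims)[:k]
    return b + sims[top][labels[top] == c].sum()

docs = np.array([[1.0, 0.1], [0.9, 0.3], [0.1, 1.0], [0.2, 0.9]])
labels = np.array(["t1", "t1", "t2", "t2"])
q = np.array([1.0, 0.2])
print(knn_score(q, docs, labels, "t1"), knn_score(q, docs, labels, "t2"))
```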
Linear separators

• Two-dimensional line: w1x1 + w2x2 = b is the linear separator
• w1x1 + w2x2 > b for the positive class
• In n-dimensional spaces:

$$w^T x = b$$
Example 1

[Figure: a separating hyperplane with normal vector w between topic1 and topic2 in the (x1, x2) feature space]
Example 2

• Classifier for “interest” in Reuters-21578
• b = 0
• If the document is “rate discount dlrs world”, its score will be 0.67×1 + 0.46×1 + (−0.71)×1 + (−0.35)×1 = 0.07 > 0

| w_i  | x_i        | w_i   | x_i   |
|------|------------|-------|-------|
| 0.70 | prime      | −0.71 | dlrs  |
| 0.67 | rate       | −0.35 | world |
| 0.63 | interest   | −0.33 | sees  |
| 0.60 | rates      | −0.25 | year  |
| 0.46 | discount   | −0.24 | group |
| 0.43 | bundesbank | −0.24 | dlr   |

Example from MSR
Example: perceptron algorithm

Input: S = ((x_1, y_1), …, (x_n, y_n)), x_i ∈ R^N, y_i ∈ {−1, +1}

Algorithm:
    w_0 = 0; k = 0
    FOR i = 1 TO n
        IF y_i (w_k · x_i) <= 0    // mistake
            w_{k+1} = w_k + y_i x_i
            k = k + 1
        END
    END

Output: w_k
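A runnable version of the pseudocode (our own translation; the single data pass above, repeated for a few epochs):

```python
import numpy as np

def perceptron(X, y, epochs=10):
    """Mistake-driven updates: w += y_i * x_i on each misclassified point."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (w @ xi) <= 0:  # mistake (or on the boundary)
                w = w + yi * xi
    return w

X = np.array([[2.0, 1.0], [1.0, 2.0], [-1.0, -1.5], [-2.0, -0.5]])
y = np.array([1, 1, -1, -1])
w = perceptron(X, y)
print(w, np.sign(X @ w))  # all signs should match y on separable data
```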
[Slide from Chris Bishop]
Linear classifiers

• What is the major shortcoming of a perceptron?
• How to determine the dimensionality of the separator?
– Bias–variance tradeoff (example)
• How to deal with multiple classes?
– Any-of: build a classifier for each class
– One-of: harder (J hyperplanes do not divide R^M into J regions); instead, use class complements and scoring
Support vector machines
• Introduced by Vapnik in the early 90s.
Issues with SVM

• Soft margins (inseparability)
• Kernels (non-linearity)
The kernel idea

[Figure: data that are not linearly separable before the mapping become separable after it]
Example

$$\Phi: \mathbb{R}^2 \to \mathbb{R}^3, \quad (x_1, x_2) \mapsto (z_1, z_2, z_3) = (x_1^2,\; \sqrt{2}\,x_1 x_2,\; x_2^2)$$

(mapping to a higher-dimensional space)
The kernel trick

$$\langle \Phi(x), \Phi(x') \rangle = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2) \cdot (x_1'^2, \sqrt{2}\,x_1' x_2', x_2'^2) = \langle x, x' \rangle^2 = k(x, x')$$

Polynomial kernel:
$$k(x, x') = (\langle x, x' \rangle + c)^d$$

Sigmoid kernel:
$$k(x, x') = \tanh(\kappa \langle x, x' \rangle + \theta)$$

RBF kernel:
$$k(x, x') = \exp(-\|x - x'\|^2 / (2\sigma^2))$$

Many other kernels are useful for IR: e.g., string kernels, subsequence kernels, tree kernels, etc.
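A quick numeric check of the degree-2 identity above (our own illustration):

```python
import numpy as np

def phi(x):
    """Explicit feature map for the degree-2 example above."""
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

x, xp = np.array([1.0, 2.0]), np.array([3.0, 0.5])
print(phi(x) @ phi(xp))  # inner product in feature space
print((x @ xp) ** 2)     # kernel computed in the original space: identical
```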
SVM (cont’d)

• Evaluation:
– SVM > kNN > decision tree > NB
• Implementation:
– Quadratic optimization
– Use a toolkit (e.g., Thorsten Joachims’s SVMlight)
Semi-supervised learning

• EM
• Co-training
• Graph-based
Exploiting Hyperlinks – Co-training

• Each document instance has two alternate views (Blum and Mitchell 1998):
– terms in the document, x1
– terms in the hyperlinks that point to the document, x2
• Each view is sufficient to determine the class of the instance:
– the labeling function that classifies examples is the same whether applied to x1 or x2
– x1 and x2 are conditionally independent, given the class

[Slide from Pierre Baldi]
Co-training Algorithm

• Labeled data are used to infer two Naïve Bayes classifiers, one for each view
• Each classifier will:
– examine the unlabeled data
– pick its most confidently predicted positive and negative examples
– add these to the labeled examples
• The classifiers are then retrained on the augmented set of labeled examples

[Slide from Pierre Baldi]
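A schematic sketch of one co-training round with two Naïve Bayes views, in the spirit of the algorithm above (scikit-learn's MultinomialNB; the function and data layout are our own assumptions):

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

def cotraining_round(X1, X2, y, U1, U2):
    """One round: each view's NB classifier picks the unlabeled example it
    predicts most confidently for each class; both views absorb the picks.

    X1, X2: term-count matrices for the two views of the labeled data.
    U1, U2: the two views of the unlabeled pool (same row order).
    """
    picks = {}
    for Xv, Uv in ((X1, U1), (X2, U2)):
        clf = MultinomialNB().fit(Xv, y)
        proba = clf.predict_proba(Uv)
        for j, cls in enumerate(clf.classes_):
            i = int(proba[:, j].argmax())  # most confident example for cls
            picks[i] = cls
    idx = sorted(picks)
    keep = np.setdiff1d(np.arange(len(U1)), idx)
    X1, X2 = np.vstack([X1, U1[idx]]), np.vstack([X2, U2[idx]])
    y = np.concatenate([y, [picks[i] for i in idx]])
    return X1, X2, y, U1[keep], U2[keep]
```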
Conclusion

• SVMs are widely considered to be the best method for text classification (see papers by Sebastiani, Cristianini, Joachims), e.g., 86% accuracy on Reuters.
• NB is also good in many circumstances
Readings

• MRS18
• MRS17, MRS19