Ranking Definitions with Supervised Learning Methods
J.Xu, Y.Cao, H.Li and M.Zhao
WWW 2005
Presenter: Baoning Wu
Motivation
People often need to find definitions of terms on the Web.
Traditional information retrieval is designed to find relevant documents, so it is not well suited to this task.
Google’s definition search may suffer from relying on glossary pages and from ranking results in alphabetical order.
Task for definition search
Receive a query term, usually a noun.
Extract definition candidates from the document collection.
Rank the candidates according to the degree to which each one is good.
Output the result.
Definition search is useful
Candidates are not all good definitions
Three categories of definitions
Good: must contain the general notion of the term and several important properties.
Bad: neither describes the general notion nor the properties of the term.
Indifferent: between good and bad.
First step: collecting candidates
Parse all sentences with a Base NP (base noun phrase) parser and identify <term> using two rules:
<term> is the first Base NP of the first sentence.
Two Base NPs separated by “of” or “for” are considered as <term>.
Extract definition candidates with patterns:
<term> is a|an|the *
<term>, *, a|an|the *
<term> is one of *
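The three patterns above can be approximated with regular expressions. The following is only a minimal sketch: it assumes the query term is already known and skips the Base NP parsing step entirely. The function names `build_patterns` and `extract_candidates` and the sample sentences are hypothetical, not from the paper.

```python
import re

def build_patterns(term):
    """Compile the three slide patterns for one term (case-insensitive)."""
    t = re.escape(term)
    return [
        re.compile(rf"^{t} is (a|an|the) .+", re.IGNORECASE),    # <term> is a|an|the *
        re.compile(rf"^{t}, .+, (a|an|the) .+", re.IGNORECASE),  # <term>, *, a|an|the *
        re.compile(rf"^{t} is one of .+", re.IGNORECASE),        # <term> is one of *
    ]

def extract_candidates(term, sentences):
    """Keep the sentences that match at least one pattern."""
    patterns = build_patterns(term)
    return [s for s in sentences if any(p.match(s.strip()) for p in patterns)]

# Hypothetical sample sentences, for illustration only.
sentences = [
    "Linux is an open source operating system.",
    "Linux, created in 1991, a Unix-like kernel, runs on many devices.",
    "Linux is one of the most widely ported kernels.",
    "I installed Linux yesterday.",
    "Linux is great.",
]
candidates = extract_candidates("Linux", sentences)
for c in candidates:
    print(c)
```

A pattern match only makes a sentence a candidate; as the earlier slide notes, candidates are not all good definitions, which is why the ranking step follows.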
Second step: Ranking candidates
Ranking based on ordinal regression (ordinal classification): Ranking SVM is used.
Ranking based on classification: SVM is used.
Ranking based on Ordinal Regression
Ordinal regression is a problem in which a classifier assigns instances to a number of ordered categories.
Ranking SVM is used as the model.
For each candidate x, U(x) = w^T x, where w represents a vector of weights.
The higher U(x) is, the better x is as a definition.
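To make the utility function concrete, here is a rough sketch of the pairwise idea behind Ranking SVM: every pair of candidates with different grades yields a difference vector, and w is fitted so that the better candidate scores higher by a margin. The training loop is plain subgradient descent on a regularized pairwise hinge loss, a simplified stand-in for a real Ranking SVM solver. The feature vectors, the grade encoding (2 = good, 1 = indifferent, 0 = bad), and the hyperparameters are all assumptions for illustration.

```python
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def make_pairs(X, y):
    """Difference vectors x_i - x_j for every pair where y_i > y_j."""
    return [[a - b for a, b in zip(X[i], X[j])]
            for i in range(len(X)) for j in range(len(X)) if y[i] > y[j]]

def train_ranker(X, y, epochs=200, lr=0.1, lam=0.01):
    """Subgradient descent on lam/2 * |w|^2 + sum of max(0, 1 - w.(x_i - x_j))."""
    w = [0.0] * len(X[0])
    pairs = make_pairs(X, y)
    for _ in range(epochs):
        for d in pairs:
            violated = dot(w, d) < 1.0  # the pair is not yet separated by the margin
            for k in range(len(w)):
                grad = lam * w[k] - (d[k] if violated else 0.0)
                w[k] -= lr * grad
    return w

# Hypothetical 2-dimensional features; grades: 2 = good, 1 = indifferent, 0 = bad.
X = [[0.0, 0.1], [1.0, 0.8], [0.5, 0.5]]
y = [0, 2, 1]

w = train_ranker(X, y)
# Rank candidate indices by U(x) = w^T x, highest first.
ranking = sorted(range(len(X)), key=lambda i: dot(w, X[i]), reverse=True)
print(ranking)
```

Only the ordering induced by U(x) matters here; the absolute scores carry no meaning on their own.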
Ranking based on Classification
Only good and bad definitions are used. It is a binary classification.
SVM is used as the model.
F(x) = w^T x + b
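In the classification setting the sign of F(x) separates good from bad, while its value can still be used to order candidates. A small sketch of that use follows, assuming w and b have already been learned; the weights, bias, feature vectors, and candidate names below are hypothetical placeholders, not values from the paper.

```python
def decision(w, b, x):
    """F(x) = w^T x + b for a linear SVM."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

# Hypothetical learned parameters and candidate feature vectors.
w = [1.5, 0.8]
b = -1.0
features = {
    "cand A": [1.0, 0.9],
    "cand B": [0.0, 0.2],
    "cand C": [1.0, 0.1],
}

scores = {name: decision(w, b, x) for name, x in features.items()}
# Rank by F(x), highest first; F(x) > 0 means "classified as good".
ranked = sorted(scores, key=scores.get, reverse=True)
for name in ranked:
    label = "good" if scores[name] > 0 else "bad"
    print(name, round(scores[name], 2), label)
```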
Features
Removing redundant candidates
After ranking, duplicate definitions may exist.
Use edit distance to detect near-duplicate candidates; of each such pair, remove the one with the lower ranking score.
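One way to sketch this step: compute the Levenshtein edit distance between candidates and, walking down the ranked list, drop any candidate that is too close to one already kept. The slides do not give the similarity threshold, so `max_ratio` (distance relative to the longer string) is an assumption, as are the function names and sample data.

```python
def edit_distance(a, b):
    """Character-level Levenshtein distance, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def remove_duplicates(scored, max_ratio=0.2):
    """scored: list of (score, text). Keep the higher-scored of near-duplicates."""
    kept = []
    for score, text in sorted(scored, key=lambda p: p[0], reverse=True):
        is_dup = any(
            edit_distance(text, k) <= max_ratio * max(len(text), len(k))
            for _, k in kept
        )
        if not is_dup:
            kept.append((score, text))
    return kept

# Hypothetical ranked candidates (score, definition text).
scored = [
    (0.7, "Linux is an open-source operating system."),
    (0.9, "Linux is an open source operating system."),
    (0.4, "Linux is one of the most widely ported kernels."),
]
kept = remove_duplicates(scored)
for score, text in kept:
    print(score, text)
```

Processing candidates in descending score order guarantees that whenever two candidates are near-duplicates, the survivor is the one with the higher ranking score.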
Sample result
Evaluation metric
Results: For intranet data
Results: For TREC.gov data
Results: for definitional sentences
Conclusions
Addresses the issue of searching for definitions by ranking definition candidates.
Results are better than traditional IR.
An enterprise search system has been developed.
The approach is not limited to searching for definitions.