
Post on 18-Jan-2016


Ranking Definitions with Supervised Learning Methods

J.Xu, Y.Cao, H.Li and M.Zhao

WWW 2005

Presenter: Baoning Wu

Motivation

People often need to find definitions of terms on the Web.

Traditional information retrieval is designed to search for relevant documents, so it is not well suited to this task.

Google’s definition search may suffer from its reliance on glossary pages and from ranking results in alphabetical order.

Task for definition search

Receive a query term, usually a noun.

Extract definition candidates from the document collection.

Rank the candidates according to the degree to which each one is good.

Output the result.

Definition search is useful

Candidates are not all good definitions

Three categories of definitions

Good: must contain the general notion of the term and several important properties.

Bad: describes neither the general notion nor the properties of the term.

Indifferent: between good and bad.

First step: collecting candidates

Parse all sentences with a Base NP (base noun phrase) parser and identify <term> using two rules:

<term> is the first Base NP of the first sentence.

Two Base NPs separated by “of” or “for” are together considered as <term>.

Extract definition candidates with patterns:

<term> is a|an|the *

<term>, *, a|an|the *

<term> is one of *
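The three patterns can be approximated with regular expressions. The following is only a minimal sketch, not the authors' implementation: it assumes the term has already been identified and is passed in as a plain string, it treats `*` as "any non-empty text", and the function names (`build_patterns`, `is_candidate`, `extract_candidates`) are hypothetical.

```python
import re

def build_patterns(term):
    """Compile the three slide patterns for an already-identified term.

    Assumption: '*' in the slide patterns means any non-empty text.
    """
    t = re.escape(term)
    return [
        re.compile(rf"^{t} is (?:a|an|the) .+", re.IGNORECASE),   # <term> is a|an|the *
        re.compile(rf"^{t}, .+, (?:a|an|the) .+", re.IGNORECASE),  # <term>, *, a|an|the *
        re.compile(rf"^{t} is one of .+", re.IGNORECASE),          # <term> is one of *
    ]

def is_candidate(term, sentence):
    """Return True if the sentence matches any of the three patterns."""
    text = sentence.strip()
    return any(p.match(text) for p in build_patterns(term))

def extract_candidates(term, sentences):
    """Keep only the sentences that look like definition candidates."""
    return [s for s in sentences if is_candidate(term, s)]

sentences = [
    "Linux is an open source operating system.",
    "Linux, developed by Linus Torvalds, a Finnish student at the time.",
    "Linux is one of the most popular kernels.",
    "Linux is fast.",
    "I installed Linux yesterday.",
]
candidates = extract_candidates("Linux", sentences)
```

With these sample sentences, the first three are kept as candidates and the last two are dropped. Matching is purely lexical, which is why a separate ranking step is still needed.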

Second step: Ranking candidates

Ranking based on ordinal regression (ordinal classification). Ranking SVM is used.

Ranking based on classification. SVM is used.

Ranking based on Ordinal Regression

Ordinal regression is a problem in which a classifier assigns instances to one of a number of ordered categories.

Ranking SVM is used as the model.

For each candidate x, U(x) = w^T x, where w represents a vector of weights.

The higher U(x) is, the better x is as a definition.
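As a rough illustration of the utility function, the sketch below scores candidates with U(x) = w^T x and sorts them. It also shows the pairwise-difference view commonly used to train a Ranking SVM, where each pair of examples with different grades becomes one training instance. The weights, feature vectors, the numeric grades (good = 2, indifferent = 1, bad = 0), and the function names are assumptions made for this example, not values from the paper.

```python
def utility(w, x):
    """U(x) = w^T x for a weight vector w and a feature vector x."""
    return sum(wi * xi for wi, xi in zip(w, x))

def rank_candidates(w, candidates):
    """Sort (text, feature_vector) pairs by descending utility."""
    return sorted(candidates, key=lambda c: utility(w, c[1]), reverse=True)

def pairwise_differences(examples):
    """Turn graded examples into pairwise training instances.

    examples: list of (feature_vector, grade), higher grade = better.
    For each pair with different grades, emit (x_i - x_j, label), where
    label is +1 if example i is better than example j, else -1.
    """
    pairs = []
    for i in range(len(examples)):
        for j in range(i + 1, len(examples)):
            (xi, gi), (xj, gj) = examples[i], examples[j]
            if gi == gj:
                continue  # ties carry no ordering information
            diff = [a - b for a, b in zip(xi, xj)]
            pairs.append((diff, 1 if gi > gj else -1))
    return pairs

# Hypothetical weights and two-dimensional feature vectors.
w = [2.0, -1.0]
candidates = [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [1.0, 1.0])]
ranked = [text for text, _ in rank_candidates(w, candidates)]

# Hypothetical grades: 2 = good, 1 = indifferent, 0 = bad.
training_pairs = pairwise_differences([([1.0, 0.0], 2), ([0.0, 1.0], 0)])
```

Learning w itself would be done by an SVM solver on the pairwise instances; that step is omitted here.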

Ranking based on Classification

Only good and bad definitions are used. It is a binary classification.

SVM is used as the model.

F(x) = w^T x + b
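A small sketch of how a binary decision function can double as a ranking score: the sign of F(x) gives the class, and sorting by the value of F(x) gives an ordering. Using the decision value for ranking is an assumption read off the slide title; the weights, bias, and function names below are hypothetical.

```python
def decision_value(w, b, x):
    """F(x) = w^T x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(w, b, x):
    """Label a candidate by the sign of F(x)."""
    return "good" if decision_value(w, b, x) >= 0 else "bad"

def rank_by_decision_value(w, b, candidates):
    """Sort (text, feature_vector) pairs by descending F(x)."""
    return sorted(candidates, key=lambda c: decision_value(w, b, c[1]), reverse=True)

# Hypothetical model parameters and feature vectors.
w, b = [1.0, -2.0], -0.5
candidates = [("weak", [0.0, 1.0]), ("strong", [1.0, 0.0])]
order = [text for text, _ in rank_by_decision_value(w, b, candidates)]
```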

Features

Removing redundant candidates

After ranking, duplicate definitions may still exist.

Use edit distance to detect near-duplicate definitions and remove the one with the lower ranking score.
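The redundancy filter can be sketched as follows. This is an illustration under stated assumptions: it uses character-level Levenshtein distance and a fixed threshold (`max_distance`, a hypothetical parameter), while the slides do not say which granularity or threshold the authors used.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (row-by-row DP)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def remove_redundant(ranked, max_distance=3):
    """Drop near-duplicates from a list sorted by descending score.

    ranked: list of (text, score). Because higher-scored items are seen
    first, the lower-scored member of each near-duplicate pair is removed.
    """
    kept = []
    for text, score in ranked:
        if all(edit_distance(text, k) > max_distance for k, _ in kept):
            kept.append((text, score))
    return kept

ranked = [
    ("Linux is an operating system.", 0.9),
    ("Linux is an operating system", 0.7),
    ("Linux is one of the Unix-like kernels.", 0.5),
]
deduplicated = remove_redundant(ranked)
```

In this sample, the second definition differs from the first by one character and is dropped; the third is kept.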

Sample result

Evaluation metric

Results: For intranet data

Results: For TREC.gov data

Results: for definitional sentences

Conclusions

Addresses the problem of searching for definitions through definition ranking.

Results are better than traditional IR.

An enterprise search system has been developed.

The approach is not limited to the search of definitions.
