WIKIPEDIA AS SENSE INVENTORY TO IMPROVE DIVERSITY IN WEB SEARCH RESULTS
Celina Santamaría, Julio Gonzalo and Javier Artiles
UNED (National University of Distance Education), c/ Juan del Rosal, 16, 28040 Madrid, Spain
ACL 2010
OUTLINE
Abstract
Motivation
Test Set
  Set of Words
  Set of Documents
  Manual Annotation
Coverage of Web Search Results: Wikipedia vs Wordnet
Diversity in Google Search Results
Sense Frequency Estimators for Wikipedia
Association of Wikipedia Senses to Web Pages
  VSM Approach
  WSD Approach
  Classification Results
  Precision/Coverage Trade-off
Using Classification to Promote Diversity
Related Work
Conclusions
ABSTRACT
Is it possible to use sense inventories to improve Web search results diversity for one-word queries?
The study focuses on two broad-coverage lexical resources: WordNet and Wikipedia.
Results indicate that:
(i) Wikipedia has a much better coverage of senses in search results;
(ii) the distribution of senses in search results can be estimated using the internal graph structure of Wikipedia and the relative number of visits received by each sense in Wikipedia;
(iii) by associating Web pages to Wikipedia senses, we can produce modified rankings that cover 70% more Wikipedia senses than the original search engine rankings.
MOTIVATION
The application of Word Sense Disambiguation (WSD) to Information Retrieval (IR) has been the subject of a significant research effort in the recent past.
In practice, however, difficulties arise in two cases: (i) long queries and (ii) very short queries.
FOR VERY SHORT QUERIES
For very short (one-word) queries, disambiguation may not be possible.
In Web search, there are at least three ways to deal with ambiguity:
(i) Promoting diversity in the search results (Clarke et al., 2008). Search engines are supposed to handle diversity as one of the multiple factors that influence the ranking.
(ii) Presenting the results as a set of (labelled) clusters rather than as a ranked list (Carpineto et al., 2009).
(iii) Complementing search results with search suggestions (Anick, 2003).
FOR VERY SHORT QUERIES
All of them rely on the ability of the search engine to cluster search results, detecting topic similarities.
In all of them, disambiguation is implicit: a side effect of the process but not its explicit target.
Example: clustering may detect that documents about the Oasis band and the Oasis fashion store deal with unrelated topics, but it may as well detect a group of documents discussing why one of the Oasis band members is leaving the band, and another group of documents about Oasis band lyrics; both are different aspects of the broad topic Oasis band.
A perfect hierarchical clustering should distinguish between the different Oasis senses at a first level, and then discover different topics within each of the senses.
QUESTION
Is it possible to use sense inventories to improve Web search results diversity for one-word queries?
The focus is on two broad-coverage lexical resources: WordNet and Wikipedia.
Hypothesis: any of the above mechanisms might benefit from an explicit disambiguation using a wide-coverage sense inventory.
FOUR RELEVANT ASPECTS OF THE PROBLEM
(1) Coverage: are Wikipedia/Wordnet senses representative of search results?
(2) Diversity: if the answer to (1) is positive, can we estimate search results diversity using our sense inventories?
(3) Sense frequencies: is it possible to estimate Web sense frequencies from currently available information?
(4) Classification: the association of Web pages to word senses must be done with some unsupervised algorithm. Can this classification be done accurately? Can it be effective to promote diversity in search results?
TEST SET
Set of Words
Set of Documents
Manual Annotation
SET OF WORDS
Queries consist of one word.
The focus is on nouns which are used to denote one or more named entities (NE).
CORPUS ANNOTATION
Two annotators handled 40 nouns:
15 nouns from the Senseval-3 lexical sample dataset;
25 additional words which satisfy two conditions: they are all ambiguous, and they are all names for music bands in one of their senses.
CORPUS
The Senseval set is: {argument, arm, atmosphere, bank, degree, difference, disc, image, paper, party, performance, plan, shelter, sort, source}.
The bands set is: {amazon, apple, camel, cell, columbia, cream, foreigner, fox, genesis, jaguar, oasis, pioneer, police, puma, rainbow, shell, skin, sun, tesla, thunder, total, traffic, trapeze, triumph, yes}.
TABLE 1: COVERAGE OF SEARCH RESULTS: WIKIPEDIA VS. WORDNET
For each noun, we looked up all its possible senses in WordNet 3.0 and in Wikipedia (using Wikipedia disambiguation pages).
Wikipedia has an average of 22 senses per noun (25.2 in the Bands set and 16.1 in the Senseval set).
Wordnet has a much smaller figure, 4.5 (3.12 for the Bands set and 6.13 for the Senseval set).
SET OF DOCUMENTS
Step 1: We retrieved the 150 first ranked documents for each noun, by submitting the nouns as queries to a Web search engine (Google).
Step 2: For each document, we stored both the snippet and the whole HTML document.
The annotation assumes "one sense per document".
MANUAL ANNOTATION
Documents were annotated with Wordnet and Wikipedia senses by two annotators.
The annotators had to decide, for every document, whether there was one or more appropriate senses in each of the dictionaries.
They provided annotations for 100 documents per noun. Of the 150 documents retrieved per noun, 100 were kept: if a URL in the list was corrupt or not available, it had to be discarded.
Each judge provided annotations for the 4,000 documents in the final data set.
COVERAGE OF WEB SEARCH RESULTS: WIKIPEDIA VS WORDNET
Top 100: Wikipedia senses cover many more search results (56%) than Wordnet senses (32%).
Top 10: Wikipedia covers 68% of the results.
TOP TEN RESULTS NOT COVERED BY WIKIPEDIA
We manually examined all documents in the top ten results that are not covered by Wikipedia.
A majority of the missing senses consists of names of (generally not well-known) companies (45%) and products or services (26%).
The other frequent type (12%) of non-annotated document is disambiguation pages.
DEGREE OF OVERLAP BETWEEN WIKIPEDIA AND WORDNET SENSES
Being two different types of lexical resource, they might have some degree of complementarity.
However, just 3% of the documents fit a Wordnet sense only.
Therefore, Wikipedia seems to extend the coverage of Wordnet rather than providing complementary sense information.
DIVERSITY IN GOOGLE SEARCH RESULTS
Once we know that Wikipedia senses are a representative subset of actual Web senses, we can test how well search results respect diversity in terms of this subset of senses.
TABLE 3
Diversity is not a major priority for ranking results.
The top ten results only cover, on average, 3 Wikipedia senses, while the average number of senses listed in Wikipedia is 22.
For the first 100 documents, this number grows to 6.85 senses per noun.
THE MOST FREQUENT SENSE FOR EACH WORD
On average, 63% of the pages in search results belong to the most frequent sense of the query word.
This is roughly comparable with most frequent sense figures in standard annotated corpora such as Semcor (Miller et al., 1993) and the Senseval/Semeval data sets.
Diversity may not play a major role in the current Google ranking algorithm.
SENSE FREQUENCY ESTIMATORS FOR WIKIPEDIA
Wikipedia disambiguation pages do not contain the relative importance of senses for a given word. Two estimators are considered:
Internal relevance: number of incoming links for the URL of a given sense in Wikipedia (stable).
External relevance: number of visits for the URL of a given sense, as reported in http://stats.grok.se (not stable).
MEASURED CORRELATION
For each noun w and for each sense wi, we consider three values:
(i) the proportion of documents retrieved for w which are manually assigned to each sense wi;
(ii) inlinks(wi): the relative number of incoming links to each sense wi;
(iii) visits(wi): the relative number of visits to the URL for each sense wi.
MEASURED CORRELATION
We have measured the correlation between these three values using a linear regression correlation coefficient:
a correlation value of .54 for the number of visits;
a correlation value of .71 for the number of incoming links.
Both estimators seem to be positively correlated with the manually annotated sense proportions.
MEASURED CORRELATION
We have experimented with weighted combinations of both indicators, using weights of the form (k, 1−k), k ∈ {0, 0.1, 0.2, ..., 1}, reaching a maximal correlation of .73 for the best-performing weights.
This weighted estimator provides a slight advantage over the use of incoming links only (.73 vs .71).
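The weight search described above can be sketched as follows. This is only an illustration: the function names and the toy values are hypothetical, the Pearson coefficient stands in for the "linear regression correlation coefficient", and applying k to inlinks (and 1−k to visits) is an assumption, since the slide does not say which indicator receives which weight.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def best_weight(observed, inlinks, visits, steps=10):
    """Try weights (k, 1-k) for k = 0, 0.1, ..., 1 and return (k, correlation)."""
    best = None
    for i in range(steps + 1):
        k = i / steps
        # Assumption: k weights inlinks and (1 - k) weights visits.
        combined = [k * a + (1 - k) * b for a, b in zip(inlinks, visits)]
        r = pearson(combined, observed)
        if best is None or r > best[1]:
            best = (k, r)
    return best

# Made-up toy values for the senses of one noun (not data from the study).
observed = [0.60, 0.25, 0.10, 0.05]  # proportion of annotated documents per sense
inlinks = [0.50, 0.30, 0.15, 0.05]   # relative incoming links per sense
visits = [0.70, 0.10, 0.05, 0.15]    # relative visits per sense

k, r = best_weight(observed, inlinks, visits)
print(f"best k = {k:.1f}, correlation = {r:.3f}")
```

By construction the best combined correlation is never lower than that of either indicator alone, because k = 1 and k = 0 are part of the grid.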
ASSOCIATION OF WIKIPEDIA SENSES TO WEB PAGES
We want to test whether the information provided by Wikipedia can be used to classify search results accurately.
Given a Web page p returned by the search engine for the query w, and the set of senses w1 ... wn listed in Wikipedia, the task is to assign the best candidate sense to p.
TWO DIFFERENT TECHNIQUES
Two different techniques:
Vector Space Model (VSM)
WSD system
Two baselines:
A random assignment of senses. Its precision is computed as the inverse of the number of senses, for every test case.
A most frequent sense heuristic, which assigns the same sense to all documents.
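As a small illustration of the two baselines, the sketch below computes their precision on made-up inputs. The function names are hypothetical, and averaging the random baseline over test cases is an assumption about how the per-case values are aggregated.

```python
def random_baseline_precision(sense_counts):
    """Expected precision of random assignment: mean of 1/n over test cases,
    where n is the number of candidate senses for that test case."""
    return sum(1 / n for n in sense_counts) / len(sense_counts)

def mfs_baseline_precision(gold_senses, most_frequent_sense):
    """Precision when every document receives the same (most frequent) sense."""
    hits = sum(1 for g in gold_senses if g == most_frequent_sense)
    return hits / len(gold_senses)

# Toy example: two test cases with 2 and 4 candidate senses.
print(random_baseline_precision([2, 4]))

# Toy example: four documents, three of which belong to sense "a".
print(mfs_baseline_precision(["a", "a", "b", "a"], "a"))
```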
VSM APPROACH
For each word sense, we represent its Wikipedia page in a (unigram) vector space model, assigning standard tf*idf weights to the words in the document.
idf weights are computed in two different ways:
VSM: idf is computed in the collection of retrieved documents.
VSM-GT: uses the statistics provided by the Google Terabyte collection.
VSM-mixed: combines VSM and VSM-GT.
VSM APPROACH
The document p is represented in the same vector space as the Wikipedia senses, and it is compared with each of the candidate senses wi via the cosine similarity metric.
The sense with the highest similarity to p is assigned to the document.
In case of ties (which are rare), we pick the first sense in the Wikipedia disambiguation page.
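A minimal sketch of the plain VSM variant is given below, assuming whitespace-tokenized text and idf computed over the retrieved documents. All names and the toy texts are hypothetical; the `+ 1.0` smoothing in the idf and the choice to ignore words without an idf value are assumptions, not details taken from the slides.

```python
import math
from collections import Counter

def compute_idf(docs):
    """idf over a collection of tokenized documents (smoothing is an assumption)."""
    n = len(docs)
    df = Counter(w for d in docs for w in set(d))
    return {w: math.log(n / c) + 1.0 for w, c in df.items()}

def tfidf_vector(tokens, idf):
    """Sparse unigram tf*idf vector; words without an idf value are ignored."""
    tf = Counter(tokens)
    return {w: c * idf[w] for w, c in tf.items() if w in idf}

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def assign_sense(page_tokens, sense_pages, idf):
    """sense_pages: list of (sense, tokens) in disambiguation-page order."""
    p = tfidf_vector(page_tokens, idf)
    best_sense, best_score = None, -1.0
    for sense, tokens in sense_pages:
        score = cosine(p, tfidf_vector(tokens, idf))
        if score > best_score:  # strict '>' keeps the first listed sense on ties
            best_sense, best_score = sense, score
    return best_sense, best_score

# Made-up toy data.
retrieved = [
    "oasis band announces new album".split(),
    "oasis fashion store opens new shop".split(),
    "oasis band song lyrics".split(),
]
senses = [
    ("Oasis (band)", "oasis band rock album song".split()),
    ("Oasis (retailer)", "oasis fashion store clothes shop".split()),
]
idf = compute_idf(retrieved)
for page in retrieved:
    print(assign_sense(page, senses, idf)[0])
```

The VSM-GT variant would differ only in where `idf` comes from: an external collection instead of `retrieved`.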
VSM-GT+FREQ
VSM-GT+freq: we consider those cases where two or more senses have a similar score; in particular, all senses with a score greater than or equal to 80% of the highest score.
Among them, we pick the one which has the largest frequency according to our estimator.
This heuristic is applied on top of VSM-GT.
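The frequency-based selection can be sketched as below. The function name and the toy scores are hypothetical, and the similarity scores and frequency estimates are assumed to be precomputed dictionaries keyed by sense.

```python
def pick_with_frequency(scores, freq, ratio=0.8):
    """Among senses scoring at least `ratio` times the top score,
    return the one with the largest estimated frequency."""
    top = max(scores.values())
    candidates = [s for s, v in scores.items() if v >= ratio * top]
    return max(candidates, key=lambda s: freq.get(s, 0.0))

# Made-up toy values: two senses are within 80% of the best similarity.
scores = {"band": 0.50, "retailer": 0.45, "desert": 0.10}
freq = {"band": 0.3, "retailer": 0.6, "desert": 0.1}
print(pick_with_frequency(scores, freq))
```

When only one sense is within the 80% band, the function falls back to the plain highest-similarity choice.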
WSD APPROACH
TiMBL: a state-of-the-art supervised WSD system that uses Memory-Based Learning.
TiMBL-core: uses only the examples found in the page for the sense being trained.
TiMBL-inlinks: uses the examples found in Wikipedia pages pointing to the sense being trained.
TiMBL-all: uses both sources of examples.
TIMBL
First: disambiguate all occurrences of w in the page p.
Then: choose the sense which appears most frequently in the page according to TiMBL results.
In case of ties: pick the first sense listed in the Wikipedia disambiguation page.
TIMBL-CORE+FREQ
(i) When there is a tie between two or more senses, we pick the sense with the highest frequency according to our estimator.
(ii) When no sense reaches 30% of the cases in the page to be disambiguated, we also resort to the most frequent sense heuristic.
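The page-level decision with both fallbacks might look like the sketch below. It does not call TiMBL: the per-occurrence labels are assumed to come from whatever classifier was run beforehand, and the function name, the inputs, and the behaviour on a page with no labelled occurrences are all hypothetical.

```python
from collections import Counter

def page_sense(occurrence_labels, freq, min_share=0.30):
    """Pick one sense for a page from per-occurrence predictions.

    occurrence_labels: predicted sense for each occurrence of w in the page.
    freq: estimated frequency per sense (used for ties and as fallback).
    """
    most_frequent = max(freq, key=freq.get)
    if not occurrence_labels:
        return most_frequent  # assumption: no evidence, use the fallback
    counts = Counter(occurrence_labels)
    top = max(counts.values())
    # (ii) no sense reaches the minimum share of occurrences in the page
    if top / len(occurrence_labels) < min_share:
        return most_frequent
    # (i) ties are broken with the frequency estimator
    tied = [s for s, c in counts.items() if c == top]
    return max(tied, key=lambda s: freq.get(s, 0.0))

# Made-up toy values.
freq = {"a": 0.2, "b": 0.5, "c": 0.2, "d": 0.1}
print(page_sense(["a", "a", "b"], freq))       # clear majority
print(page_sense(["a", "b"], freq))            # tie, resolved by frequency
print(page_sense(["a", "b", "c", "d"], freq))  # no sense reaches 30%
```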
TABLE 4:
Precision: the number of pages correctly classified divided by the total number of predictions.
TABLE 4:
All systems are significantly better than the random and most frequent sense baselines.
Overall, both approaches lead to similar results (.67 vs. .69; p = 0.066 according to the t-test), which would make VSM preferable because it is a simpler and more efficient approach.
VSM is unsupervised; WSD is supervised.
FIGURE 1
Solid line: the 40 nouns, excluding documents that were not assigned a sense.
Dotted line: the algorithm attempts to disambiguate all documents retrieved by Google, whether they can in fact be assigned to a Wikipedia sense or not.
USING CLASSIFICATION TO PROMOTE DIVERSITY
We fill each position in the rank (starting at rank 1) with the document which has the highest similarity to some of the senses which are not yet represented in the rank.
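The greedy filling procedure can be sketched as follows, assuming a precomputed table of document-to-sense similarities. The function name and toy values are hypothetical, and starting a new round once every sense is represented is an assumption, since the slide does not say what happens in that case.

```python
def diversify(sim, k=10):
    """Greedy re-ranking. sim maps each document to {sense: similarity}."""
    all_senses = sorted({s for scores in sim.values() for s in scores})
    if not all_senses:
        return list(sim)[:k]
    remaining = dict(sim)
    ranking, covered = [], set()
    while remaining and len(ranking) < k:
        uncovered = [s for s in all_senses if s not in covered]
        if not uncovered:
            # Assumption: once every sense is represented, start a new round.
            covered.clear()
            uncovered = all_senses
        best_doc, best_sense, best_score = None, None, float("-inf")
        for doc, scores in remaining.items():
            for sense in uncovered:
                score = scores.get(sense, 0.0)
                if score > best_score:
                    best_doc, best_sense, best_score = doc, sense, score
        ranking.append(best_doc)
        covered.add(best_sense)
        del remaining[best_doc]
    return ranking

# Made-up toy similarities for three documents and two senses.
sim = {
    "d1": {"band": 0.9, "shop": 0.1},
    "d2": {"band": 0.8, "shop": 0.2},
    "d3": {"band": 0.1, "shop": 0.7},
}
print(diversify(sim, k=2))
```

In the toy example the second slot goes to `d3` rather than `d2`, because the band sense is already represented after `d1` is placed.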
ALTERNATIVE RANKING FOR COMPARISON
clustering (centroids): this method applies Hierarchical Agglomerative Clustering, which proved to be the most competitive clustering algorithm in a similar task (Artiles et al., 2009).
clustering (top ranked): this time the top ranked document (in the original Google rank) of each cluster is selected.
random: randomly selects ten documents from the set of retrieved results.
upper bound: coverage is not 100% because some words have more than ten meanings in Wikipedia and we are only considering the top ten documents.
Coverage: the ratio of senses that appear in the top ten results compared to the number of senses that appear in all search results.
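The coverage measure defined above can be computed as in this sketch, assuming each ranked result carries its annotated sense (or `None` when no sense applies). The function name and the toy ranking are hypothetical.

```python
def sense_coverage(ranked_senses, k=10):
    """Senses seen in the top k results divided by senses seen in all results."""
    all_senses = {s for s in ranked_senses if s is not None}
    top_senses = {s for s in ranked_senses[:k] if s is not None}
    return len(top_senses) / len(all_senses) if all_senses else 0.0

# Made-up toy ranking: four senses overall, two of them in the top three.
ranking = ["a", "a", "b", None, "c", "d"]
print(sense_coverage(ranking, k=3))
```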
Coverage of senses goes from 49% to 70%.
The coverage of Wikipedia senses in the top ten results is 70% larger than in the original ranking.
Using Wikipedia to enhance diversity seems to work much better than clustering.
Bias: only Wikipedia senses are considered to estimate diversity, so our results do not imply that the Wikipedia modified rank is better than the original Google rank.
Wikipedia can be used as a reference to improve search results diversity for one-word queries.
CONCLUSIONS
We have investigated whether generic lexical resources can be used to promote diversity in Web search results for one-word, ambiguous queries.
We have compared WordNet and Wikipedia:
(i) unsurprisingly, Wikipedia has a much better coverage of senses in search results, and is therefore more appropriate for the task;
(ii) the distribution of senses in search results can be estimated using the internal graph structure of Wikipedia and the relative number of visits received by each sense in Wikipedia;
(iii) associating Web pages to Wikipedia senses with simple and efficient algorithms, we can produce modified rankings that cover 70% more Wikipedia senses than the original search engine rankings.