Post on 21-Dec-2015
• Problems for classical IR models
• Introduction & Background (LSI, SVD, etc.)
• Example
• Standard query method
• Analysis of the standard query method
• Seeking the best
• Experimental results
• SVR vs. IRR
• SVR
• Conclusion
• Future work
Problems for classical IR models
Synonymy: Various words and phrases refer to the same concept (lowers recall).
Polysemy: Individual words have more than one meaning (lowers precision).
Independence: No significance is given to two terms that frequently appear together.
Latent Semantic Analysis
• General idea
– Map documents (and terms) to a low-dimensional representation.
– Design a mapping such that the low-dimensional space reflects semantic associations (latent semantic space).
– Compute document similarity based on the inner product in the latent semantic space.
• Goals
– Similar terms map to similar locations in the low-dimensional space.
– Noise reduction by dimension reduction.
Vector Model
• Set of documents: $D = \{D_1, D_2, \ldots, D_m\}$
• A finite set of terms: $T = \{t_1, t_2, \ldots, t_n\}$
• Every document can be represented as a vector: $d_j = (w_{1j}, w_{2j}, \ldots, w_{nj})$
• The same applies to the query: $q = (w_{1q}, w_{2q}, \ldots, w_{nq})$
• Similarity of query q and document d, where $\theta$ is the angle between the two vectors: $\mathrm{similarity}(q, d) = \cos(\theta) = \dfrac{(d, q)}{\|d\| \cdot \|q\|}$
• Given a threshold, all documents with similarity > threshold are retrieved.
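The vector model above can be sketched in a few lines of Python. This is only an illustration: the function names `cosine_similarity` and `retrieve`, the toy term weights, and the convention of returning 0 for an all-zero vector are assumptions, not part of the slides.

```python
import math

def cosine_similarity(d, q):
    """Cosine of the angle between document vector d and query vector q."""
    dot = sum(wd * wq for wd, wq in zip(d, q))
    norm_d = math.sqrt(sum(w * w for w in d))
    norm_q = math.sqrt(sum(w * w for w in q))
    if norm_d == 0.0 or norm_q == 0.0:
        return 0.0  # assumed convention for all-zero vectors
    return dot / (norm_d * norm_q)

def retrieve(docs, q, threshold):
    """Indices of the documents whose similarity to q exceeds the threshold."""
    return [j for j, d in enumerate(docs) if cosine_similarity(d, q) > threshold]

# Toy weights over n = 3 terms (made-up numbers, one list per document)
docs = [[1.0, 0.0, 1.0], [0.0, 2.0, 0.0], [1.0, 1.0, 0.0]]
query = [1.0, 0.0, 0.0]
hits = retrieve(docs, query, threshold=0.5)
```

Because the cosine normalizes by vector length, the second toy document scores 0 no matter how large its single weight is; only the direction of the vector matters.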
SVD and low-rank approximations
Truncate the SVD by keeping k ≤ r terms: $A_k = U_k S_k V_k^T$
• $U_k$ ($V_k$): orthogonal matrix containing the top k left (right) singular vectors of A.
• $S_k$: diagonal matrix containing the top k singular values of A, ordered non-increasingly.
• r: rank of A, the number of non-zero singular values.
• $A_k$: the “best” matrix among all rank-k matrices with respect to the spectral and Frobenius norms.
• This optimality property is very useful in, e.g., Principal Component Analysis (PCA), LSI, etc.
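A minimal NumPy sketch of the truncation and of the optimality property follows. The helper name `truncated_svd` and the random toy matrix are assumptions for illustration. The two error values are checked against the Eckart-Young result: the spectral-norm error equals the (k+1)-th singular value, and the Frobenius-norm error equals the root of the sum of squares of the discarded singular values.

```python
import numpy as np

def truncated_svd(A, k):
    """Return the top-k left singular vectors, singular values, and right singular vectors (transposed)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)  # s is sorted non-increasingly
    return U[:, :k], s[:k], Vt[:k, :]

rng = np.random.default_rng(0)
A = rng.random((6, 4))          # toy matrix, made-up data
k = 2

U_k, s_k, Vt_k = truncated_svd(A, k)
A_k = U_k @ np.diag(s_k) @ Vt_k  # rank-k approximation of A

s_all = np.linalg.svd(A, compute_uv=False)
spectral_err = np.linalg.norm(A - A_k, 2)       # equals s_all[k]
frobenius_err = np.linalg.norm(A - A_k, "fro")  # equals sqrt(sum(s_all[k:] ** 2))
```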
• TREC-4 data set (http://trec.nist.gov/).
• 5305 documents were chosen at random.
• Tested with 20 queries.
• Stemming (Porter Stemmer, http://www.tartarus.org/~martin/PorterStemmer/) and a stop-word list (http://www.lextek.com/manuals/onix/stopwords1.html) were used.
• The term-by-document matrix had dimension 16,571 x 5305 and was determined through the SVD process to have full rank, 5305.
• T: the area between the IRP curve and the horizontal (recall) axis, representing the average interpolated precision over the full range of recall ([0, 1]).
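One way such an area could be computed is sketched below. It assumes the usual definition of interpolated precision (at recall level r, the highest precision observed at any recall ≥ r) and integrates the resulting step curve exactly; the slides do not say how T was actually evaluated, so the function name `interpolated_area` and this procedure are assumptions.

```python
def interpolated_area(ranked_relevance, total_relevant):
    """Area under the interpolated precision curve over recall in [0, 1].

    ranked_relevance: booleans in ranked order, True where the document is relevant.
    total_relevant: number of relevant documents for the query.
    """
    # (recall, precision) at every rank where a relevant document appears
    points = []
    found = 0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            found += 1
            points.append((found / total_relevant, found / rank))

    # Interpolate: precision at a recall level is the best precision at any higher recall
    interpolated = []
    best_later = 0.0
    for recall, precision in reversed(points):
        best_later = max(best_later, precision)
        interpolated.append((recall, best_later))
    interpolated.reverse()

    # Sum the rectangles of the step curve; recall never reached contributes 0
    area = 0.0
    previous_recall = 0.0
    for recall, precision in interpolated:
        area += (recall - previous_recall) * precision
        previous_recall = recall
    return area

# Toy ranking: relevant documents at ranks 1 and 3, two relevant documents in total
t_value = interpolated_area([True, False, True, False], total_relevant=2)
```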
SVR (singular value rescaling) vs. IRR (iterative residual rescaling)
• What is scaled
– SVR: the singular values in matrix S.
– IRR: the residual vectors.
• Relation to SVD
– SVR: a technique built on top of SVD.
– IRR: independent of SVD.
• Number of scalings
– SVR: only once (though with some trial-and-error process) to produce all the basis vectors.
– IRR: multiple times, the number of which is determined by the number of desired basis vectors; each time, the scaling of residual vectors produces the next basis vector.
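As a rough illustration of scaling the singular values once on top of the SVD, the sketch below raises the k retained singular values to an exponent before building the document vectors. Reading the rescaling as an exponent is an assumption, as are the function names, the way the query is projected, and the exponent value 1.5, which is arbitrary and not a result from the slides.

```python
import numpy as np

def svr_document_vectors(A, k, exponent):
    """Top-k left singular vectors and document vectors built from rescaled singular values."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_rescaled = s[:k] ** exponent            # assumed form of the rescaling
    return U[:, :k], s_rescaled[:, None] * Vt[:k, :]

def rank_documents(A, q, k, exponent):
    """Document indices ordered by decreasing cosine similarity to the projected query."""
    U_k, docs = svr_document_vectors(A, k, exponent)
    q_hat = U_k.T @ q                          # assumed projection of the query into the k-dim space
    sims = (q_hat @ docs) / (np.linalg.norm(q_hat) * np.linalg.norm(docs, axis=0))
    return np.argsort(-sims)

rng = np.random.default_rng(0)
A = rng.random((6, 4))   # toy term-by-document matrix (terms x documents), made-up data
q = rng.random(6)        # toy query vector over the same terms

ranking_plain = rank_documents(A, q, k=2, exponent=1.0)   # no rescaling
ranking_svr = rank_documents(A, q, k=2, exponent=1.5)     # arbitrary exponent
```

With `exponent=1.0` the document vectors reduce to the ordinary truncated-SVD ones, so the rescaling adds only an element-wise power on k numbers to the cost of the query step.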
• Mathematical analysis showed that:
– The difference between the results of version A and version B is a factor of $S^2$, with S being the diagonal matrix of singular values in the dimension-reduced model.
– The retrieval results from version B and version B’ are always identical if the Equivalency Principle is satisfied.
– Version B (B’) should be a better option than version A.
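The first two points can be checked numerically on a toy matrix. The definitions of the three versions are not given in this part of the transcript, so the ones used below are assumptions: version A compares $S_k^{-1} U_k^T q$ with the columns of $V_k^T$, version B compares $U_k^T q$ with the columns of $S_k V_k^T$, and version B’ compares the raw query q with the columns of $A_k$.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.random((8, 5))   # toy term-by-document matrix (terms x documents), made-up data
q = rng.random(8)        # toy query vector
k = 3

U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_k, S_k, Vt_k = U[:, :k], np.diag(s[:k]), Vt[:k, :]
A_k = U_k @ S_k @ Vt_k

# Assumed version A: query and documents both scaled by S_k^{-1}
q_a = np.linalg.inv(S_k) @ U_k.T @ q
docs_a = Vt_k
# Assumed version B: plain projection of query and documents onto U_k
q_b = U_k.T @ q
docs_b = S_k @ Vt_k

inner_a = q_a @ docs_a                      # q^T U_k S_k^{-1} V_k^T
inner_b = q_b @ docs_b                      # q^T U_k S_k V_k^T
inner_a_times_s2 = q_a @ S_k @ S_k @ docs_a # version A with a factor S_k^2 inserted

# Cosine scores for version B and for the assumed version B'
cos_b = inner_b / (np.linalg.norm(q_b) * np.linalg.norm(docs_b, axis=0))
cos_b_prime = (q @ A_k) / (np.linalg.norm(q) * np.linalg.norm(A_k, axis=0))
```

Under these assumed definitions, `inner_b` equals `inner_a_times_s2`, and `cos_b_prime` is `cos_b` multiplied by the same positive constant for every document, so both produce the same ranking on this toy data.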
• Experiments on the standardized TREC data set confirmed that:
– Using SVR in addition to conventional LSI gives an improvement ratio of 5.9% over using conventional LSI alone.
– SVR is computationally as efficient as the best standard query method, “Version B”.
– SVR performs better than IRR.
• Applying SVR to other fields of IR such as image retrieval and video/audio retrieval.
• Seeking mathematical justification of SVR, including the relationship between the optimal rescaling factor S_exp and the characteristics of any particular data set.