ir presentation
By Bushra Al-Za’areer
introducing
Signature File – Suffix Tree & Suffix Array
Chapter 9 Indexing & Searching
1 Signature File
2 Suffix Tree
3 Suffix Array
1 Signature File
Signature File chapter 9
• Consider:
• H(information) = 010001
• H(text) = 010010
• H(data) = 110000
• H(retrieval) = 100010
• The block signatures of a document D containing the text “textual retrieval and information retrieval” (after removing stop words and stemming), for a block size of two terms, would be:
o B1D = H(text) OR H(retrieval) = 110010
o B2D = H(information) OR H(retrieval) = 110011
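The block signatures above can be reproduced with a bitwise OR over the term signatures in each block. The following is a minimal sketch: the function name `block_signatures` and the dictionary `H` are illustrative, and the term list assumes that stemming reduces “textual” to “text” and that “and” is removed as a stop word.

```python
# Term signatures copied from the slide; the dictionary name H is illustrative.
H = {
    "information": 0b010001,
    "text": 0b010010,
    "data": 0b110000,
    "retrieval": 0b100010,
}

def block_signatures(terms, block_size=2):
    """OR together the signatures of each consecutive group of block_size terms."""
    signatures = []
    for start in range(0, len(terms), block_size):
        sig = 0
        for term in terms[start:start + block_size]:
            sig |= H[term]
        signatures.append(sig)
    return signatures

# "textual retrieval and information retrieval" after stop-word removal and stemming
# (assumed result of preprocessing).
terms = ["text", "retrieval", "information", "retrieval"]
print([format(sig, "06b") for sig in block_signatures(terms)])
# ['110010', '110011']
```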
To search for a given term, we check whether the term’s bit string could be “inside” the block signatures:
• Consider we are searching for “text” in document D
o H(text) = 010010 and B1D = 110010
o H(text) bit-wise-AND B1D = 010010 = H(text)
o Therefore “text” could be in B1D (it is in this particular case)
• Consider we are now searching for “data”
o H(data) bit-wise-AND B1D = 110000 = H(data)
o H(data) bit-wise-AND B2D = 110000 = H(data)
o Though “data” is not in either block!
• Signature files may yield false hits …
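The test described above can be sketched as a bitwise AND followed by a check against the actual block contents, which is how false hits get filtered out. The names `may_contain`, `search`, and `blocks` are illustrative, and storing the block's terms next to its signature is an assumption made only to keep the example short.

```python
# Each block: (signature, terms actually stored in the block).
blocks = [
    (0b110010, ["text", "retrieval"]),         # B1D
    (0b110011, ["information", "retrieval"]),  # B2D
]

def may_contain(block_sig, term_sig):
    """True when every bit set in the term signature is also set in the block signature."""
    return (block_sig & term_sig) == term_sig

def search(term, term_sig):
    """Return (candidate blocks by signature, blocks confirmed by looking at the terms)."""
    candidates = [i for i, (sig, _) in enumerate(blocks) if may_contain(sig, term_sig)]
    confirmed = [i for i in candidates if term in blocks[i][1]]
    return candidates, confirmed

print(search("text", 0b010010))  # ([0, 1], [0])
print(search("data", 0b110000))  # ([0, 1], [])
```

With these signatures, “text” also passes the signature test on B2D although it only occurs in the first block, and “data” passes on both blocks although it occurs in neither. Every candidate therefore has to be verified against the text.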
How can the probability of a false alarm be kept low?
How can we predict how good a signature is?
o A false drop occurs when a document signature matches a query’s signature but the query’s word does not match any word in the document.
• The rate of false drops depends on:
o The size of the signature.
o The number of words per block.
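To get a feel for how these two factors interact, here is a rough estimate under a simplifying model that is not taken from the slides: each word sets `bits_per_word` bits chosen independently and uniformly at random in a signature of `signature_bits` bits. All names are illustrative, and real hash functions only approximate this behaviour.

```python
def false_drop_estimate(signature_bits, words_per_block, bits_per_word):
    """Estimated probability that an absent word still passes the signature test.

    Assumption: every word sets bits_per_word bits independently and uniformly
    at random, so this is an approximation, not an exact rate.
    """
    # Probability that one particular bit of the block signature is set.
    p_bit_set = 1 - (1 - 1 / signature_bits) ** (words_per_block * bits_per_word)
    # The absent word passes only if all of its bits happen to be set.
    return p_bit_set ** bits_per_word

# Parameters of the slide example: 6-bit signatures, 2 words per block, 2 bits per word.
small = false_drop_estimate(signature_bits=6, words_per_block=2, bits_per_word=2)
# A longer signature lowers the estimate; more words per block raises it.
longer = false_drop_estimate(signature_bits=64, words_per_block=2, bits_per_word=2)
fuller = false_drop_estimate(signature_bits=6, words_per_block=4, bits_per_word=2)
print(longer < small < fuller)  # True
```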
• Inverted or signature? Inverted files:
1. Slower retrieval
2. More accurate
3. Easier to maintain
• In fact, inverted files are still the most popular storage for information retrieval.
2 Suffix Tree summary
• Example:
3 Suffix Array summary
• Suffix tree and suffix array indexes treat the text as one long string. Each position in the text is considered as a text suffix, that is, the string running from that position to the end of the text. Each suffix is thus uniquely identified by its position.
• Index points are selected from the text; they mark the beginnings of the text positions that will be retrievable.
• This structure can be used to index words or characters.
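As a small illustration of suffixes and index points, the sketch below picks word beginnings as index points and prints the suffix that starts at each one. The helper names `word_index_points` and `suffix_at` are illustrative, and using a `\w+` regular expression to find word starts is an assumption about tokenization.

```python
import re

def word_index_points(text):
    """Start offset of every word: one possible choice of index points."""
    return [m.start() for m in re.finditer(r"\w+", text)]

def suffix_at(text, position):
    """The suffix identified by a position: everything from there to the end."""
    return text[position:]

text = "textual retrieval and information retrieval"
for pos in word_index_points(text):
    print(pos, repr(suffix_at(text, pos)))
# 0 'textual retrieval and information retrieval'
# 8 'retrieval and information retrieval'
# 18 'and information retrieval'
# 22 'information retrieval'
# 34 'retrieval'
```

Indexing characters instead of words would simply use every offset, `range(len(text))`, as an index point.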
• Suffix arrays provide essentially the same functionality as suffix trees with much lower space requirements.
• A suffix array is simply an array containing all the pointers to the text suffixes, listed in lexicographical order of the suffixes.
• Suffix arrays are designed to allow binary search, performed by comparing the query against the text that each pointer refers to.
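The two ideas above, sorted pointers and binary search over the text they point to, can be sketched as follows. This is a naive construction (it sorts whole suffixes, which is fine for a toy text but not how large indexes are built), and the names `build_suffix_array` and `contains` are illustrative. Word beginnings are assumed as index points.

```python
import re

def build_suffix_array(text):
    """Word-start positions sorted by the suffix that begins at each position."""
    points = [m.start() for m in re.finditer(r"\w+", text)]
    return sorted(points, key=lambda pos: text[pos:])

def contains(text, suffix_array, pattern):
    """Binary search for an indexed suffix that starts with pattern."""
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        pos = suffix_array[mid]
        # Compare the pattern with the text the pointer refers to.
        window = text[pos:pos + len(pattern)]
        if window == pattern:
            return True
        if window < pattern:
            lo = mid + 1
        else:
            hi = mid
    return False

text = "textual retrieval and information retrieval"
sa = build_suffix_array(text)
print(sa)                                  # [18, 22, 34, 8, 0]
print(contains(text, sa, "information"))   # True
print(contains(text, sa, "data"))          # False
```

Only the array of positions is stored in addition to the text, which is where the space saving over a tree comes from.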
• With suffix trees and suffix arrays we can search for:
– Words
– Prefixes & suffixes
– Phrases
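A word, a prefix, and a phrase are all handled the same way in a suffix array: every indexed suffix that starts with the pattern sits in one contiguous range, which two binary searches can delimit. The sketch below is self-contained and uses illustrative names (`build_suffix_array`, `find_all`); it assumes word beginnings as index points and a naive construction.

```python
import re

def build_suffix_array(text):
    """Word-start positions sorted by the suffix that begins at each position."""
    points = [m.start() for m in re.finditer(r"\w+", text)]
    return sorted(points, key=lambda pos: text[pos:])

def find_all(text, suffix_array, pattern):
    """Sorted text positions of all indexed suffixes that start with pattern."""
    def window(i):
        pos = suffix_array[i]
        return text[pos:pos + len(pattern)]

    # Lower bound: first entry whose window is >= pattern.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if window(mid) < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo
    # Upper bound: first entry whose window is > pattern.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if window(mid) <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(suffix_array[first:lo])

text = "textual retrieval and information retrieval"
sa = build_suffix_array(text)
print(find_all(text, sa, "retrieval"))              # word   -> [8, 34]
print(find_all(text, sa, "text"))                   # prefix -> [0]
print(find_all(text, sa, "information retrieval"))  # phrase -> [22]
```

With word-level index points only patterns that begin at a word start can be found; matching the end of a word in this way would need character-level index points.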
Any questions? Ask me!
Conclusion
The most popular storage for information retrieval: inverted files…
Thank You