IR: The power of “failing”

Post on 20-Dec-2015

TRANSCRIPT

Page 1: IR The power of “failing”. TTT 2 Not perfectly true but

IR

The power of “failing”

TTT 2

Not perfectly true but...

[Figure: false positive rate (y-axis, 0 to 0.1) as a function of the number of hash functions k (x-axis, 0 to 10), for m/n = 8. Optimal k = 8 ln 2 ≈ 5.545.]

We do have an explicit formula for the optimal k
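
The curve in the plot can be reproduced numerically. The sketch below assumes the standard Bloom filter approximation, with m the number of bits and n the number of stored keys: the false positive rate is about (1 - e^(-kn/m))^k, minimised at k = (m/n) ln 2. The function names are illustrative and do not come from the slides.

```python
import math


def false_positive_rate(bits_per_key: float, k: int) -> float:
    """Standard approximation (1 - e^(-k n / m))^k, where bits_per_key = m / n."""
    return (1.0 - math.exp(-k / bits_per_key)) ** k


def optimal_k(bits_per_key: float) -> float:
    """Real-valued k that minimises the approximation: (m / n) * ln 2."""
    return bits_per_key * math.log(2)


if __name__ == "__main__":
    bits_per_key = 8  # m / n, as in the plot
    for k in range(1, 11):
        print(k, round(false_positive_rate(bits_per_key, k), 4))
    print("optimal k:", round(optimal_k(bits_per_key), 3))
```

Since k must be an integer in practice, one would compare the two integers around the real-valued optimum; for m/n = 8 both give a rate of roughly 2%.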

Other advantage: no key storage
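
To make the "no key storage" point concrete, here is a minimal Bloom filter sketch. It is an illustration under assumptions, not the slides' implementation: the class name, the use of SHA-256, and the derivation of the k positions by double hashing (h1 + i*h2 mod m) are all choices made for this example. The only state kept is the bit array; the keys themselves are never stored.

```python
import hashlib


class BloomFilter:
    """Fixed-size bit array plus k hash positions per key; keys are not kept."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.m = num_bits
        self.k = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str) -> list:
        # Assumption: derive k positions from one digest via double hashing.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.bits[p >> 3] |= 1 << (p & 7)

    def __contains__(self, key: str) -> bool:
        # True means "probably present"; False means "definitely absent".
        return all(self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(key))
```

With 8000 bits the filter occupies 1000 bytes whether the inserted keys are 10 or 10,000 characters long.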

Crawling

What data structure should we use to keep track of the visited URLs of a crawler?

URLs are long

Check should be very fast

Small errors are acceptable (a false positive ≈ a page not crawled)

Bloom Filter over crawled URLs
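
A minimal sketch of how a crawler loop might use such a filter. Everything named here is hypothetical: `VisitedUrls`, `crawl`, and the `fetch_links` callable (which stands in for downloading a page and extracting its links) are assumptions for illustration, as are the default sizes. A false positive in `check_and_add` makes the crawler skip a URL it has never visited, which is exactly the small error the slide accepts.

```python
import hashlib
from collections import deque


class VisitedUrls:
    """Bloom filter over crawled URLs: a fixed bit array, no URL string is stored."""

    def __init__(self, num_bits: int = 1 << 20, num_hashes: int = 6) -> None:
        self.m = num_bits
        self.k = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, url: str) -> list:
        digest = hashlib.sha256(url.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def check_and_add(self, url: str) -> bool:
        """Return True if url was (probably) seen before; mark it as seen either way."""
        positions = self._positions(url)
        seen = all(self.bits[p >> 3] & (1 << (p & 7)) for p in positions)
        for p in positions:
            self.bits[p >> 3] |= 1 << (p & 7)
        return seen


def crawl(seeds, fetch_links, limit: int = 1000) -> list:
    """Breadth-first crawl; fetch_links(url) is a hypothetical link extractor."""
    visited = VisitedUrls()
    queue = deque(u for u in seeds if not visited.check_and_add(u))
    order = []
    while queue and len(order) < limit:
        url = queue.popleft()
        order.append(url)
        for link in fetch_links(url):
            if not visited.check_and_add(link):
                queue.append(link)
    return order
```

For example, with a toy link graph `{"a": ["b", "c"], "b": ["a", "c"], "c": ["a"]}` and `fetch_links = lambda u: graph.get(u, [])`, each page is queued once even though it is linked from several places.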

Anti-virus detection

D is a dictionary of virus checksums, each of a given length z. For each position i, check…

Brute-force check: O(|D| * |F|) time

Trie check: O(z * |F|) time

Better solution?

Build a BF on D.

Check whether F[i, i+z-1] ∈ D: if the BF answers YES, then “warn the user” or explicitly scan D

[Figure: the file F, with a window spanning positions i to i+z.]

O(k*|F|)
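
The scan described above can be sketched as follows. This is an illustration under assumptions: the function names, the default sizes, and the SHA-256 based positions are not from the slides, and signatures are modelled as byte strings of length z. Note that hashing each window from scratch costs time proportional to z, so this sketch does not reach the O(k * |F|) bound by itself; that bound presumes each hash of a window is obtained in constant time, for instance with a rolling hash.

```python
import hashlib


def _positions(chunk: bytes, m: int, k: int) -> list:
    # Assumption: k positions derived from one digest via double hashing.
    digest = hashlib.sha256(chunk).digest()
    h1 = int.from_bytes(digest[:8], "big")
    h2 = int.from_bytes(digest[8:16], "big") | 1
    return [(h1 + i * h2) % m for i in range(k)]


def build_filter(signatures, m: int, k: int) -> bytearray:
    """Build the Bloom filter bit array over the dictionary D."""
    bits = bytearray((m + 7) // 8)
    for sig in signatures:
        for p in _positions(sig, m, k):
            bits[p >> 3] |= 1 << (p & 7)
    return bits


def scan(data: bytes, signatures: set, z: int, m: int = 1 << 16, k: int = 6) -> list:
    """Return every position i where data[i:i+z] is a confirmed signature."""
    bits = build_filter(signatures, m, k)
    hits = []
    for i in range(len(data) - z + 1):
        window = data[i:i + z]
        if all(bits[p >> 3] & (1 << (p & 7)) for p in _positions(window, m, k)):
            # The filter said YES, which may be a false positive:
            # confirm with an explicit lookup in D before reporting.
            if window in signatures:
                hits.append(i)
    return hits
```

Most windows of a clean file are rejected by the filter alone, so the explicit lookup in D runs only on the rare YES answers.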

or even better...
