peer-to-peer search that works, djoerd hiemstra

PEER-TO-PEER SEARCH THAT WORKS Djoerd Hiemstra http://www.cs.utwente.nl/~hiemstra Yandex, Moscow, 27 April 2011

Upload: yaevents

Post on 05-Dec-2014


Page 1: Peer-to-peer search that works, Djoerd Hiemstra

PEER-TO-PEER SEARCH THAT WORKS

Djoerd Hiemstra
http://www.cs.utwente.nl/~hiemstra

Yandex, Moscow, 27 April 2011

WHAT DOES A SEARCH ENGINE LOOK LIKE?

A DATA CENTER...?

Goose Creek, California


In Eemshaven...?

Biggest data center in Europe
100,000 servers, 19,000 m²
Uses as much electricity as 80,000 households

… where the * is Eemshaven?

Close to a power plant
Close to the sea (cooling!)

WHAT ELSE DOES A SEARCH ENGINE LOOK LIKE?


A “BIG BROTHER” ?


NO, REALLY, WHAT DOES A SEARCH ENGINE LOOK LIKE?


… FINDS WHAT YOU NEED ?



SO, NOT NECESSARILY...

Green and environmentally friendly, respecting privacy, objective... nor democratic.

WHAT SHOULD A SEARCH ENGINE LOOK LIKE?


YOUR PERSONAL SYSTEM:

PEER-TO-PEER SEARCH

YOUR PERSONAL SYSTEM:

Each user brings processing power, as both search consumer and search supplier

Green!
Democratic
No “big brother”

PEER-TO-PEER SEARCH

Moscow

Results for “Moscow”

RuSSIR

Go to peer 74

RuSSIR

Go to peer 74

RuSSIR

Results for “RuSSIR”


RuSSIR

Go to peer 2

RuSSIR

Results for “RuSSIR”

RuSSIR

Go to peer 2
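
The sequence on these slides suggests that a peer either answers a query itself or refers the asker to another peer ("Go to peer 74"). A minimal sketch of such a referral loop follows. The `Peer` class, its `referrals` table, and the hop limit are assumptions made for illustration, not details given in the talk.

```python
from dataclasses import dataclass, field


@dataclass
class Peer:
    """A hypothetical peer that answers some queries and refers others."""
    peer_id: int
    index: dict = field(default_factory=dict)      # query -> list of results
    referrals: dict = field(default_factory=dict)  # query -> peer id to ask

    def handle(self, query):
        """Return ("results", list), ("goto", peer_id) or ("unknown", None)."""
        if query in self.index:
            return "results", self.index[query]
        if query in self.referrals:
            return "goto", self.referrals[query]
        return "unknown", None


def search(network, start_id, query, max_hops=5):
    """Follow "go to peer N" referrals until some peer returns results."""
    current = start_id
    path = [current]
    for _ in range(max_hops):
        kind, payload = network[current].handle(query)
        if kind == "results":
            return payload, path
        if kind != "goto":
            break  # nobody known to hold this query
        current = payload
        path.append(current)
    return [], path


# Toy network: peer 1 answers "Moscow" itself and refers "RuSSIR" to peer 74.
network = {
    1: Peer(1, index={"Moscow": ["moscow-page"]}, referrals={"RuSSIR": 74}),
    74: Peer(74, index={"RuSSIR": ["russir-summer-school"]}),
}
results, path = search(network, 1, "RuSSIR")
```

Here `path` records which peers were contacted, so a query answered locally has a path of length one and a referred query has a longer one.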

OVERVIEW

1. Caching in P2P networks

2. Query-based sampling using snippets

3. Deep web querying

P2P LOAD BALANCING BY CACHING

If you do not index documents, cache them!

Handles query bursts (e.g., “michael jackson's death”)
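
To make the idea concrete, here is a minimal sketch of a bounded search-result cache that absorbs a burst of identical queries. The least-recently-used eviction policy, the `ResultCache` class, and the `origin_peer` function are assumptions for illustration; the talk does not say which policy was used.

```python
from collections import OrderedDict


class ResultCache:
    """Bounded least-recently-used cache mapping a query to its results."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def lookup(self, query, fetch):
        """Return cached results, calling fetch(query) only on a miss."""
        if query in self.entries:
            self.hits += 1
            self.entries.move_to_end(query)  # mark as most recently used
            return self.entries[query]
        self.misses += 1
        results = fetch(query)
        self.entries[query] = results
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        return results


def origin_peer(query):
    """Stand-in for the peer that actually holds the index (hypothetical)."""
    return [f"result for {query}"]


cache = ResultCache(capacity=2)
burst = ["michael jackson's death"] * 4 + ["moscow", "russir"]
for q in burst:
    cache.lookup(q, origin_peer)
hit_ratio = cache.hits / (cache.hits + cache.misses)
```

Only the first query of the burst reaches the origin peer; the repeats are served from the cache, which is the load-balancing effect the slide describes.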


QUERY LOG & CACHING POTENTIAL


SHARE RATIOS


CACHE SIZES


EFFECT OF TEXT PROCESSING


CHURN

DISCUSSION

About 55% from cache in ideal case
About 78% from cache with subsumption, stemming, etc.
About 33% from cache if bounded cache and churn (but no subsumption)
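
One way to see why text processing raises the share of queries served from cache: if queries are rewritten to a canonical key before lookup, differently phrased queries can share one cache entry. The sketch below is only an illustration. The stopword list and the `toy_stem` plural stripper are assumptions standing in for real text processing, the query log is made up, and subsumption is not modelled.

```python
STOPWORDS = {"the", "a", "an", "of", "in"}  # illustrative list, an assumption


def toy_stem(term):
    """Very crude plural stripping; a stand-in for a real stemmer."""
    if len(term) > 3 and term.endswith("s") and not term.endswith("ss"):
        return term[:-1]
    return term


def cache_key(query):
    """Canonical form: lowercase, drop stopwords, stem, sort and dedupe terms."""
    terms = [t for t in query.lower().split() if t not in STOPWORDS]
    return " ".join(sorted({toy_stem(t) for t in terms}))


def hit_ratio(queries, key=lambda q: q):
    """Fraction of queries whose key was already seen (unbounded cache)."""
    seen, hits = set(), 0
    for q in queries:
        k = key(q)
        if k in seen:
            hits += 1
        seen.add(k)
    return hits / len(queries)


# Made-up query log: three phrasings of one information need, plus one other.
log = ["Data Centers", "data center", "the center of data", "peer search"]
raw = hit_ratio(log)                    # exact string matching
normalized = hit_ratio(log, cache_key)  # matching on the canonical key
```

On this toy log, exact matching finds no repeats, while the canonical key lets the second and third queries reuse the first query's entry.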

OVERVIEW

1. Caching in P2P networks

2. Query-based sampling using snippets

3. Deep web querying

QUERY-BASED SAMPLING

Never download any documents
Instead, use the search result snippets to learn about documents
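
The loop behind this idea can be sketched as follows: send a probe query, read only the returned snippets, add their terms to a model of the remote collection, and pick the next probe from what has been learned so far. The `DOCS` collection, the `toy_search` interface, and the uniform random choice of the next term are assumptions for illustration and not the exact procedure from the talk.

```python
import random
from collections import Counter

# Hypothetical remote collection: the sampler never reads these documents,
# only the snippets that toy_search returns.
DOCS = [
    "peer to peer search distributes the index over many peers",
    "a data center needs power and cooling near the sea",
    "query based sampling learns a vocabulary from search results",
    "snippets summarise documents in search results",
]


def toy_search(term, k=2, width=5):
    """Stand-in for a peer's search interface: return up to k snippets."""
    snippets = []
    for doc in DOCS:
        words = doc.split()
        if term in words:
            i = words.index(term)
            start = max(0, i - width // 2)
            snippets.append(" ".join(words[start:start + width]))
    return snippets[:k]


def sample_by_snippets(seed_term, rounds, rng):
    """Build a term-frequency model of the collection from snippets only."""
    model = Counter()
    term = seed_term
    for _ in range(rounds):
        for snippet in toy_search(term):
            model.update(snippet.split())
        if model:
            # Next probe query: a random term from what was learned so far.
            term = rng.choice(sorted(model))
    return model


model = sample_by_snippets("search", rounds=10, rng=random.Random(0))
```

Because snippets come back with ordinary search results anyway, building `model` this way needs no document downloads at all.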

DO SAMPLES RESEMBLE THE FULL INDEX?



CAN WE DO BETTER THAN RANDOM?


DISCUSSION

1. Sampling snippets is as effective as sampling full documents

2. Can be done at no extra cost(!)

3. Random sampling is an effective strategy

OVERVIEW

1. Caching in P2P networks

2. Query-based sampling using snippets

3. Deep web querying

DEEP WEB QUERYING

Opportunity: while we are sending queries to search engines directly... we might as well search the deep web!

YOUR TYPICAL DEEP WEB SITE

http://www.ns.nl


NATURAL LANGUAGE QUERYING


EASY TO SPECIFY

USER STUDY


A = from
B = to
V = via
D = date
T = time
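
To illustrate what free-text querying of a web form involves, the sketch below maps a typed travel query onto the five fields in the legend above. The field letters come from the slide; the English keyword patterns and the `parse_trip` function are assumptions for illustration and say nothing about how the system in the study was built.

```python
import re

# Field letters follow the slide's legend; the keyword patterns are assumptions.
PATTERNS = {
    "A": r"\bfrom\s+(?P<v>[\w ]+?)(?=\s+(?:to|via|on|at)\b|$)",
    "B": r"\bto\s+(?P<v>[\w ]+?)(?=\s+(?:from|via|on|at)\b|$)",
    "V": r"\bvia\s+(?P<v>[\w ]+?)(?=\s+(?:from|to|on|at)\b|$)",
    "D": r"\bon\s+(?P<v>\d{1,2}\s+\w+|\w+day)\b",
    "T": r"\bat\s+(?P<v>\d{1,2}[:.]\d{2})\b",
}


def parse_trip(query):
    """Map a free-text travel query onto form fields A, B, V, D, T."""
    fields = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, query, flags=re.IGNORECASE)
        if match:
            fields[name] = match.group("v").strip()
    return fields


a = parse_trip("from Enschede to Amsterdam via Utrecht on 27 April at 9:30")
b = parse_trip("to Amsterdam from Enschede at 9:30")
```

Each field is matched independently, so the order in which a user types the parts does not matter and missing parts are simply left out of the result.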

DISCUSSION

1. Users like the interface

2. Users perform the tasks faster

3. Considerable query variation between subjects: No “one size fits all”!

CONCLUSIONS

Peer-to-peer is a viable approach to large-scale search

Peer-to-peer search will make Google, Yahoo, Bing and Yandex irrelevant ;-)

PUBLICATIONS

Almer Tigelaar, Djoerd Hiemstra, and Dolf Trieschnigg, Search Result Caching in P2P Information Retrieval Networks, Proceedings of the 2nd Information Retrieval Facility Conference (IRFC), 2011.

Almer Tigelaar and Djoerd Hiemstra, Query-Based Sampling using Snippets, Proceedings of the SIGIR 2010 Workshop on Large-Scale Distributed Systems for Information Retrieval, 2010.

Kien Tjin-Kam-Jet, Dolf Trieschnigg, and Djoerd Hiemstra, Free-Text Search versus Complex Web Forms, Proceedings of the European Conference on Information Retrieval (ECIR), 2011.

ACKNOWLEDGEMENTS

Netherlands Organization for Scientific Research

Almer Tigelaar
Kien Tjin-Kam-Jet
Dolf Trieschnigg



“MAIL” RESULTS FROM YANDEX ?