Hypersearching the Web

Soumen Chakrabarti, Byron Dom, S. Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopalan, Andrew Tomkins

Jacob Kalakal Joseph
CS 572 (Spring 2011) | Class Presentation | June 21, 2011


Outline
• Characteristics of the WWW
• Motivation for building search engines
• Traditional search engines and their challenges
• Improvements and the associated problems
• CLEVER
• Power of hyperlinks
• Hubs and authorities
• Algorithm
• Evaluating CLEVER
• Future scope
• Answer questions and class discussion

WWW ~ Universe

Motivation for search engines

Initial Attempts

• Ranking functions based on simple heuristics

Challenges: Synonymy

Challenges: Polysemy

Challenges: Spamming

• Cheap airtickets Cheap airtickets Cheap airtickets Cheap airtickets Cheap airtickets

• White text on a white background (keywords hidden from human readers)
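To make the spamming problem concrete, here is a minimal Python sketch of the kind of simple heuristic mentioned under Initial Attempts, assuming that heuristic is a plain count of query-term occurrences. The functions `term_count_score` and `rank` and the two example pages are hypothetical. A page that repeats the query many times (possibly in white-on-white text that visitors never see) outscores a legitimate page:

```python
import re
from collections import Counter

def term_count_score(query, page_text):
    """Naive heuristic (assumed): total occurrences of the query terms in the page."""
    words = Counter(re.findall(r"[a-z]+", page_text.lower()))
    return sum(words[term] for term in query.lower().split())

def rank(query, pages):
    """Return page names sorted by descending term-count score."""
    return sorted(pages, key=lambda name: term_count_score(query, pages[name]), reverse=True)

# Hypothetical pages: a real airline and a keyword-stuffed spam page.
pages = {
    "airline.example": "Book cheap flights and air tickets with our airline.",
    "spam.example": "cheap airtickets " * 50,  # keyword stuffing, possibly hidden text
}
print(rank("cheap airtickets", pages))  # ['spam.example', 'airline.example']
```

The legitimate page also writes "air tickets" as two words and so gets no credit for the second query term, which is the synonymy problem in miniature.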

Improvements

• Semantic networks: help with synonymy but worsen polysemy
• Human selectors: impractical
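The semantic-network trade-off can be illustrated with query expansion over a tiny thesaurus standing in for a resource such as WordNet. The dictionary `THESAURUS`, the helper names and the two pages are hypothetical; a real system would draw word senses from a lexical database rather than a hard-coded table:

```python
# Hypothetical hand-built thesaurus standing in for a semantic network such as WordNet.
# Each term maps to other words that share at least one of its senses.
THESAURUS = {
    "car": {"automobile", "auto"},
    "jaguar": {"cat", "automobile"},  # two senses: the animal and the car make
}

# Hypothetical pages to search.
PAGES = {
    "dealer": "used automobile dealers in town",
    "zoo": "the big cat exhibit at the city zoo",
}

def expand(terms):
    """Add every thesaurus neighbour of every query term."""
    expanded = set(terms)
    for term in terms:
        expanded |= THESAURUS.get(term, set())
    return expanded

def search(query, use_thesaurus=True):
    """Return names of pages sharing at least one word with the (optionally expanded) query."""
    terms = set(query.lower().split())
    if use_thesaurus:
        terms = expand(terms)
    return sorted(name for name, text in PAGES.items()
                  if terms & set(text.lower().split()))

print(search("car", use_thesaurus=False))  # []
print(search("car"))                       # ['dealer']
print(search("jaguar"))                    # ['dealer', 'zoo']
```

Expanding "car" finds the dealer page that plain matching misses (synonymy helped), but expanding "jaguar" pulls in the car sense next to the animal sense, so someone looking for the animal also gets the car dealer (polysemy worsened).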

Hyperlinks - What a CLEVER idea!

Hubs & Authorities
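The mutual reinforcement between hubs and authorities can be written as a pair of update rules. This follows Kleinberg's formulation (listed in the references) as we understand it; the symbols are notation introduced here: $h(p)$ and $a(p)$ are the hub and authority scores of page $p$, and $M$ is the adjacency matrix with $M_{pq} = 1$ when page $p$ links to page $q$. A good authority is pointed to by good hubs, and a good hub points to good authorities:

```latex
\begin{aligned}
a(p) &\leftarrow \sum_{q \,:\, q \to p} h(q), &
h(p) &\leftarrow \sum_{q \,:\, p \to q} a(q), \\
\mathbf{a}^{(k+1)} &\propto M^{\top}\mathbf{h}^{(k)} \propto M^{\top}M\,\mathbf{a}^{(k)}, &
\mathbf{h}^{(k+1)} &\propto M\,\mathbf{a}^{(k+1)} \propto MM^{\top}\,\mathbf{h}^{(k)}.
\end{aligned}
```

Because both vectors are rescaled to unit length after every step, repeated updates behave like power iteration, so (assuming a unique principal eigenvalue) $\mathbf{a}$ and $\mathbf{h}$ converge to the principal eigenvectors of $M^{\top}M$ and $MM^{\top}$.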

How it works
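The hub/authority procedure can be sketched in Python: grow a root set into a base set using its out-links and in-links, then alternate the authority and hub updates, normalizing after each. This is a minimal sketch, assuming the link graph fits in memory as dictionaries (`out_links`, `in_links`) and a fixed iteration count. In practice the root set would come from an ordinary text-based search; the toy graph and all function names here are hypothetical:

```python
import math

def base_set(root, out_links, in_links):
    """Grow the root set by the pages it links to and the pages linking to it."""
    base = set(root)
    for page in root:
        base |= set(out_links.get(page, ()))
        base |= set(in_links.get(page, ()))
    return base

def normalize(scores):
    """Scale scores so their squares sum to 1 (leaves an all-zero vector unchanged)."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {p: v / norm for p, v in scores.items()}

def hits(pages, out_links, iterations=20):
    """Alternate authority and hub updates on the subgraph induced by `pages`."""
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority of q: sum of the hub scores of base-set pages that link to q.
        new_auth = {p: 0.0 for p in pages}
        for p in pages:
            for q in out_links.get(p, ()):
                if q in new_auth:
                    new_auth[q] += hub[p]
        auth = normalize(new_auth)
        # Hub score of p: sum of the authority scores of base-set pages p links to.
        hub = normalize({p: sum(auth[q] for q in out_links.get(p, ()) if q in auth)
                         for p in pages})
    return hub, auth

# Hypothetical toy link graph; in-links are derived by inverting it.
out_links = {
    "hub1": ["a1", "a2"], "hub2": ["a1", "a2"], "hub3": ["a1"],
    "other": ["hub1"], "a1": [], "a2": [],
}
in_links = {}
for src, targets in out_links.items():
    for dst in targets:
        in_links.setdefault(dst, []).append(src)

pages = base_set({"hub1", "a1"}, out_links, in_links)
hub, auth = hits(pages, out_links)
print(sorted(auth, key=auth.get, reverse=True)[:2])  # ['a1', 'a2']
```

Page `a1` comes out as the top authority because three hubs point to it, while `hub1` and `hub2` tie as the best hubs because each points to both authorities.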

Clever vs. Google

Google’s faster! Clever also looks back.
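For contrast, Google's PageRank (Brin and Page, in the references) gives each page a single query-independent score that can be computed ahead of time, which is one likely reason it answers queries faster; Clever computes hub and authority scores per query on a small base set. Below is a minimal power-iteration sketch of a commonly used normalized PageRank variant. The damping factor 0.85, the even spreading of rank from pages without out-links, and the toy graph are assumptions of this sketch, not details from the slides:

```python
def pagerank(out_links, damping=0.85, iterations=50):
    """Power iteration for a normalized PageRank variant; scores sum to 1."""
    pages = set(out_links) | {q for targets in out_links.values() for q in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page first receives the "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = out_links.get(p, [])
            if targets:
                # Split p's rank evenly across the pages it links to.
                share = damping * rank[p] / len(targets)
                for q in targets:
                    new_rank[q] += share
            else:
                # Assumption: a page with no out-links spreads its rank over all pages.
                for q in pages:
                    new_rank[q] += damping * rank[p] / n
        rank = new_rank
    return rank

# Hypothetical toy graph; this would be computed once, before any query arrives.
out_links = {
    "hub1": ["a1", "a2"], "hub2": ["a1", "a2"], "hub3": ["a1"],
    "other": ["hub1"], "a1": [], "a2": [],
}
rank = pagerank(out_links)
print(max(rank, key=rank.get))  # a1
```

Unlike the hub/authority iteration, this produces one score per page and only follows links forward to the pages they point to; there is no separate hub score rewarding a page for what it links to.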

Pros

• Rapid convergence (5 iterations for a root set of 3000 pages)
• Independent of the initial H and A (hub and authority) scores
• Gets information about pages even before we actually crawl them
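The claim that results do not depend on the initial H and A scores can be checked numerically. The sketch below runs the same hub/authority iteration on a hypothetical toy graph from two different all-positive starting hub vectors and compares the final authority scores. This independence holds when the dominant eigenvalue of the iteration is unique; a starting vector with no component along the dominant eigenvector could behave differently. The function `hits_from` and the graph are hypothetical:

```python
import math

def normalize(scores):
    """Scale scores so their squares sum to 1 (leaves an all-zero vector unchanged)."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {p: v / norm for p, v in scores.items()}

def hits_from(hub, out_links, iterations=50):
    """Run hub/authority updates starting from the given hub scores."""
    pages = list(hub)
    for _ in range(iterations):
        # Authority of p: total hub score of the pages that link to p.
        auth = normalize({p: sum(hub[q] for q in pages if p in out_links.get(q, ()))
                          for p in pages})
        # Hub score of p: total authority score of the pages p links to.
        hub = normalize({p: sum(auth[q] for q in out_links.get(p, ())) for p in pages})
    return hub, auth

# Hypothetical toy graph: every link target is also a key.
out_links = {
    "hub1": ["a1", "a2"], "hub2": ["a1", "a2"], "hub3": ["a1"],
    "other": ["hub1"], "a1": [], "a2": [],
}
start_uniform = {p: 1.0 for p in out_links}
start_skewed = {p: float(i + 1) for i, p in enumerate(out_links)}

_, auth_uniform = hits_from(start_uniform, out_links)
_, auth_skewed = hits_from(start_skewed, out_links)
# Largest difference between the two final authority vectors (effectively zero).
print(max(abs(auth_uniform[p] - auth_skewed[p]) for p in out_links))
```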

Segregation of the Web into clusters

Cons

• The underlying assumption, “Web links confer authority”, could be incorrect! Links are also used for:
– Navigation
– Advertisement
– Disapproval
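One partial remedy for navigational links, described in Kleinberg's paper in the references, is to discard "intrinsic" links between pages on the same site before running the iteration. The sketch below approximates "same site" by comparing URL host names, which is an assumption of this sketch (it treats `news.example.com` and `www.example.com` as different sites). Advertisement and disapproval links pass through untouched, since nothing in the link structure alone marks them. The function name and URLs are hypothetical:

```python
from urllib.parse import urlparse

def drop_intrinsic_links(out_links):
    """Keep only links whose source and target are on different hosts."""
    filtered = {}
    for src, targets in out_links.items():
        src_host = urlparse(src).hostname
        filtered[src] = [t for t in targets if urlparse(t).hostname != src_host]
    return filtered

links = {
    "http://news.example.com/story": [
        "http://news.example.com/",       # navigation back to the site's home page
        "http://www.example.org/report",  # cross-site link: a possible endorsement
    ],
}
print(drop_intrinsic_links(links))
# {'http://news.example.com/story': ['http://www.example.org/report']}
```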

Cons

• Ignores the anchor text
• Not every page is necessarily either a hub or an authority
• Universally popular websites like Wikipedia will be an authority on almost everything
• May return a general result for a narrow topic search

What’s next?

References
• S. Chakrabarti, B. Dom, D. Gibson, J. Kleinberg, S. R. Kumar, P. Raghavan, S. Rajagopalan, A. Tomkins. Hypersearching the Web. Scientific American, June 1999.
• CLEVER project (http://www.almaden.ibm.com/projects/clever.shtml)
• J. Kleinberg. Authoritative sources in a hyperlinked environment. Proc. 9th ACM-SIAM Symposium on Discrete Algorithms, 1998.
• S. Brin, L. Page. The anatomy of a large-scale hypertextual Web search engine. Computer Networks and ISDN Systems, Vol. 30, No. 1-7, pp. 107-117, 1998.
• WordNet Project (http://wordnet.princeton.edu/)

Group Discussion