Applications of Data Structures and Algorithms
Danfeng Yao, CS 16, 3/13/2006


Page 1: cs.brown.edu/courses/cs016/2010/Resource/old_lectures/Google.pdf

Page 2

Overview

Data structures in web interface
Google
– Indexing
– PageRanking
– Crawling

Page 3

Forward/backward buttons of Browser

Page 4

Forward/backward buttons and Stack

www.brown.edu

Back Forward

Page 5

Forward/backward buttons and Stack

www.dam.brown.edu

www.cs.brown.edu

www.brown.edu

www.engin.brown.edu

Back Forward

Old pages

Page 6

Then click the Back button once

www.cs.brown.edu

www.engin.brown.edu

www.brown.edu

www.dam.brown.edu

Back Forward

Page 7

Then click the Forward button once

www.dam.brown.edu

www.cs.brown.edu

www.engin.brown.edu

www.brown.edu

Back Forward
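The Back/Forward behavior on these slides can be sketched with two stacks. This is a minimal sketch, not how any particular browser is implemented: the class name `BrowserHistory`, its method names, and the order in which the Brown pages are visited are assumptions made for illustration.

```python
class BrowserHistory:
    """Back/Forward navigation modeled with two stacks (Python lists)."""

    def __init__(self, start_url):
        self.current = start_url
        self.back_stack = []      # pages visited before the current one
        self.forward_stack = []   # pages we left by pressing Back

    def visit(self, url):
        """Follow a link: the current page becomes history, Forward is emptied."""
        self.back_stack.append(self.current)
        self.current = url
        self.forward_stack.clear()

    def back(self):
        """Press Back: push the current page on the forward stack, pop the back stack."""
        if self.back_stack:
            self.forward_stack.append(self.current)
            self.current = self.back_stack.pop()
        return self.current

    def forward(self):
        """Press Forward: push the current page on the back stack, pop the forward stack."""
        if self.forward_stack:
            self.back_stack.append(self.current)
            self.current = self.forward_stack.pop()
        return self.current


# Assumed visiting order, for illustration only.
history = BrowserHistory("www.brown.edu")
history.visit("www.cs.brown.edu")
history.visit("www.engin.brown.edu")
history.back()      # current page is www.cs.brown.edu again
history.forward()   # current page is www.engin.brown.edu again
```

Every operation is a constant number of pushes and pops, which is why a stack fits this interface well.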

Page 8

Planar point location: where is the mouse click?

Page 9

Simple point location: Binary decision tree

Build a binary tree
– internal nodes corresponding to line segments
– external nodes corresponding to regions

[Figure: a planar subdivision with line segments a, b, c, d and its binary decision tree; branches are labeled left/right and above/below]
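A decision tree of this kind might be sketched as follows. To keep the example short it assumes every splitting line is axis-aligned (vertical or horizontal) and that y grows upward; the slide's segments a, b, c, d need not be. The names `Region`, `Split`, and `locate`, and the example layout, are hypothetical.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Region:
    """External node: a region of the plane."""
    name: str


@dataclass
class Split:
    """Internal node: a vertical (axis='x') or horizontal (axis='y') line."""
    axis: str
    value: float
    low: "Node"    # left of a vertical line, or below a horizontal one
    high: "Node"   # right of a vertical line, or above a horizontal one


Node = Union[Region, Split]


def locate(node, x, y):
    """Walk from the root down to the region containing the point (x, y)."""
    while isinstance(node, Split):
        coord = x if node.axis == "x" else y
        node = node.low if coord < node.value else node.high
    return node.name


# Hypothetical layout: a vertical line at x=100, and to its right
# a horizontal line at y=50.
tree = Split("x", 100, Region("sidebar"),
             Split("y", 50, Region("footer"), Region("content")))

print(locate(tree, 30, 80))    # sidebar
print(locate(tree, 150, 20))   # footer
print(locate(tree, 150, 80))   # content
```

Each mouse click costs one comparison per level, so the query time is proportional to the height of the tree.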

Page 10

The need for search engines to scale up

What a search engine faces
– Storage for index files (and maybe documents themselves too)
– Index system processes hundreds of gigabytes
– Queries at a rate of thousands per second
Advances in hardware technology
– Faster CPU
– Cheaper memory and disk space
But still, slow disk seek (~10 ms) and operating system instability
What will make today’s search engine scale?

Page 11

Google facts

Will be on the stock market soon
– Estimated annual profit $150 million to $350 million
– Estimated annual revenue $500 million to $1 billion
– Estimated market value $12 billion to $20 billion
The heart of Google software is PageRank™
Google has integrity
– No one can buy a higher PageRank

Sergey Brin

Larry Page

Page 12

Google’s approach

Data structures in Google
– Compact data structures
– Avoid disk seeks whenever possible
Data structures for indexing
– Link structures
– Inverted index
Page ranking
Crawling

Page 13

Web as a directed graph

(Brown, Brown CS)

(Brown CS, CS16)

(Brown CS, rt)

A hyperlink is an edge

A web page is a vertex
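The three edges on this slide can be stored as adjacency lists. The sketch below uses the page names as they appear on the slide; the variable names `out_links` and `in_links` are assumptions for illustration.

```python
from collections import defaultdict

# Each pair (u, v) is a hyperlink from page u to page v.
edges = [
    ("Brown", "Brown CS"),
    ("Brown CS", "CS16"),
    ("Brown CS", "rt"),
]

out_links = defaultdict(list)   # vertex -> pages it links to
in_links = defaultdict(list)    # vertex -> pages that link to it
for source, target in edges:
    out_links[source].append(target)
    in_links[target].append(source)

print(out_links["Brown CS"])   # ['CS16', 'rt']
print(in_links["Brown CS"])    # ['Brown']
```

Keeping both directions is a design choice: outgoing links are what a crawler follows, while incoming links are what a ranking scheme counts.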

Page 14

Storing the link structures

Want to maintain the link relationship of two pages
– Used for crawling, ranking, …
Main problem: how to store the set of pairs efficiently
URL is too long and has variable length
– Storing (URLi, URLj) has too much overhead and is slow
Use a more compact docID and support fast docID/URL conversion

docID URL

http://www.cs.brown.edu/people/rt/
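One way to get fast docID/URL conversion in both directions is a list plus a dictionary. This is a small in-memory sketch, not Google's on-disk format: the class name `DocTable`, the choice of consecutive integers as docIDs, and the second URL are assumptions.

```python
class DocTable:
    """Two-way mapping between URLs and compact integer docIDs."""

    def __init__(self):
        self._urls = []   # docID -> URL (the list index is the docID)
        self._ids = {}    # URL -> docID

    def doc_id(self, url):
        """Return the docID for url, assigning the next integer if it is new."""
        if url not in self._ids:
            self._ids[url] = len(self._urls)
            self._urls.append(url)
        return self._ids[url]

    def url(self, doc_id):
        """Return the URL stored under doc_id."""
        return self._urls[doc_id]


table = DocTable()
rt = table.doc_id("http://www.cs.brown.edu/people/rt/")
cs = table.doc_id("http://www.cs.brown.edu/")   # illustrative second page

# A link is now a pair of small fixed-size integers instead of two long strings.
links = {(cs, rt)}
print(links)            # {(1, 0)}
print(table.url(rt))    # http://www.cs.brown.edu/people/rt/
```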

Page 15

Forward index and inverted index

[Figure: forward and inverted index.
Forward: word lists per page (vitae, security, information, design, algorithm; vitae, graph, drawing, design, algorithm).
Inverted: words (algorithm, graph) pointing to their hits (Hit 1: algorithm, Hit 2: algorithm, Hit 3: algorithm).]
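Building an inverted index from a forward index could look like the sketch below. The two word lists are taken from the slide; the docIDs 0 and 1, the function name `invert`, and the representation of a hit as a (docID, position) pair are assumptions.

```python
from collections import defaultdict

# Forward index: docID -> words in the order they appear on the page.
forward_index = {
    0: ["vitae", "security", "information", "design", "algorithm"],
    1: ["vitae", "graph", "drawing", "design", "algorithm"],
}


def invert(forward):
    """Build word -> list of hits, where a hit is (docID, position)."""
    inverted = defaultdict(list)
    for doc_id, words in sorted(forward.items()):
        for position, word in enumerate(words):
            inverted[word].append((doc_id, position))
    return dict(inverted)


inverted_index = invert(forward_index)
print(inverted_index["algorithm"])   # [(0, 4), (1, 4)]
print(inverted_index["graph"])       # [(1, 1)]
```

A query for a word then becomes a single dictionary lookup instead of a scan over every page.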

Page 16

PageRanking: bringing order to the web
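The slide gives only the title, so the following is a generic sketch of the power-iteration form of PageRank as described in the paper listed in the bibliography, not Google's production code. The damping factor 0.85, the fixed iteration count, the treatment of pages without outgoing links, and the three-page example graph are all assumptions.

```python
def pagerank(out_links, damping=0.85, iterations=50):
    """Approximate PageRank by repeated redistribution of rank along links."""
    pages = set(out_links)
    for targets in out_links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives a small base amount of rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = out_links.get(page, [])
            if targets:
                # A page splits its damped rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no links spreads its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical three-page web: A links to B and C, B links to C, C links to A.
ranks = pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]})
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this example C ends up ranked highest because it receives rank from both A and B, while B receives only half of A's share.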

Page 17

GoogleBot: where to crawl?

Through addURL forms or links
Deep crawling
– BFS or DFS?
– Rumored to crawl in PageRank order
Fresh crawling
– Recrawling to keep index updated
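The BFS-or-DFS question comes down to whether the crawl frontier is a queue or a stack. The sketch below walks an in-memory link graph instead of fetching real pages; the function name `crawl_order`, the graph, and the "Engineering" page are hypothetical, and it makes no claim about how GoogleBot actually orders its crawl.

```python
from collections import deque


def crawl_order(links, seed, breadth_first=True):
    """Return the order in which pages are visited, starting from seed."""
    frontier = deque([seed])
    seen = {seed}
    order = []
    while frontier:
        # Queue (FIFO) gives breadth-first; stack (LIFO) gives a depth-first style.
        page = frontier.popleft() if breadth_first else frontier.pop()
        order.append(page)
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                frontier.append(target)
    return order


links = {
    "Brown": ["Brown CS", "Engineering"],
    "Brown CS": ["CS16", "rt"],
    "Engineering": [],
}

print(crawl_order(links, "Brown"))
# ['Brown', 'Brown CS', 'Engineering', 'CS16', 'rt']
print(crawl_order(links, "Brown", breadth_first=False))
# ['Brown', 'Engineering', 'Brown CS', 'rt', 'CS16']
```

Crawling in rank order, as the slide mentions is rumored, would amount to replacing the queue with a priority queue keyed on a page score.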

Page 18

More Google facts

Free lunch every day!
Bring pets to work
On-site massage

Page 19

Bibliography

The Anatomy of a Large-Scale Hypertextual Web Search Engine. Sergey Brin and Lawrence Page
The PageRank Citation Ranking: Bringing Order to the Web. Lawrence Page