link counts


Upload: castor-rivas

Post on 02-Jan-2016



Page 1: Link Counts

[Figure: Sep's Home Page and Taher's Home Page. One is linked by 2 important pages, the other by 2 unimportant pages; the linking pages shown are Yahoo!, CNN, DB Pub Server, and CS361.]

Google's PageRank engine needs a speedup.

Adapted from G. Golub et al.

Page 2: Definition of PageRank

The importance of a page is given by the importance of the pages that link to it:

$$x_i = \sum_{j \in B_i} \frac{1}{N_j}\, x_j$$

where $x_i$ is the importance of page $i$, $B_i$ is the set of pages $j$ that link to page $i$, $N_j$ is the number of outlinks from page $j$, and $x_j$ is the importance of page $j$.
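The definition can be read directly as an update rule. Below is a minimal Python sketch. It assumes the link graph is stored in two dictionaries with the hypothetical names `in_links` (the sets $B_i$) and `out_degree` (the counts $N_j$). The 3-page graph is illustrative only. Applying the rule to x = (0.4, 0.4, 0.2) returns the same vector, so that vector satisfies the definition for this graph.

```python
def pagerank_update(in_links, out_degree, x):
    """Apply x_i = sum over j in B_i of x_j / N_j once, for every page i.

    in_links[i]:   pages j that link to page i (the set B_i)
    out_degree[j]: number of outlinks N_j of page j
    x[j]:          current importance of page j
    """
    return {
        i: sum(x[j] / out_degree[j] for j in sources)
        for i, sources in in_links.items()
    }


# Hypothetical 3-page graph: A links to B and C, B links to A, C links to B.
in_links = {"A": ["B"], "B": ["A", "C"], "C": ["A"]}
out_degree = {"A": 2, "B": 1, "C": 1}

x = {"A": 0.4, "B": 0.4, "C": 0.2}
print(pagerank_update(in_links, out_degree, x))
```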

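Page 3: Definition of PageRank

[Figure: worked example. Yahoo!, CNN, and DB Pub Server each have importance 0.1. The links carry weights 1/2, 1/2, 1, and 1. The two home pages (Taher, Sep) receive importances of 0.05 and 0.25.]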
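This is a worked check of these numbers under one assumed link structure. The arrows did not survive extraction, so this is a reconstruction and not the original figure. The assumption is that Yahoo! has two outlinks of weight 1/2, one to each home page, while CNN and DB Pub Server each have a single outlink of weight 1 to the same home page. The definition then gives the two values shown:

```latex
x_{\text{one in-link}} = \tfrac{1}{2}(0.1) = 0.05,
\qquad
x_{\text{three in-links}} = \tfrac{1}{2}(0.1) + 1\,(0.1) + 1\,(0.1) = 0.25
```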

Page 4: PageRank Diagram

Initialize all nodes to rank $x_i^{(0)} = \frac{1}{n}$.

[Figure: three nodes, each with rank 0.333.]

Page 5: PageRank Diagram

Propagate ranks across links (multiplying by link weights).

[Figure: the values carried along the four links are 0.167, 0.167, 0.333, and 0.333.]
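Here is a small Python sketch of this propagation step. It assumes one 3-node graph that reproduces the numbers on these diagram slides: A → B, A → C, B → A, C → B. The node names and arrows are hypothetical, since the drawing did not survive extraction. Each link j → i carries $x_j^{(0)}/N_j$, and `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

# Hypothetical 3-node graph consistent with the diagram numbers:
# A -> B, A -> C, B -> A, C -> B.
out_links = {"A": ["B", "C"], "B": ["A"], "C": ["B"]}


def link_flows(out_links, x):
    """Value carried along each link j -> i: the rank x_j times the link weight 1/N_j."""
    return {
        (j, i): x[j] / len(targets)
        for j, targets in out_links.items()
        for i in targets
    }


x0 = {page: Fraction(1, 3) for page in out_links}  # x_i^(0) = 1/n with n = 3
for (src, dst), value in link_flows(out_links, x0).items():
    print(f"{src} -> {dst}: {float(value):.3f}")
```

For this graph the script prints A -> B: 0.167, A -> C: 0.167, B -> A: 0.333, and C -> B: 0.333. Node A has two outlinks, so each carries half of its rank.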

Page 6: PageRank Diagram

$$x_i^{(1)} = \sum_{j \in B_i} \frac{1}{N_j}\, x_j^{(0)}$$

[Figure: node ranks after one step are 0.333, 0.5, and 0.167.]

Page 7: PageRank Diagram

[Figure: the values carried along the four links in the next step are 0.167, 0.167, 0.5, and 0.167.]

Page 8: PageRank Diagram

$$x_i^{(2)} = \sum_{j \in B_i} \frac{1}{N_j}\, x_j^{(1)}$$

[Figure: node ranks after two steps are 0.5, 0.333, and 0.167.]
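The next sketch repeats the propagation twice on a hypothetical 3-node graph (A → B, A → C, B → A, C → B). This graph is one that is consistent with the numbers on these diagram slides. The loop pushes $x_j/N_j$ along each outlink and accumulates it at the target. This computes the same sum as $\sum_{j \in B_i} x_j/N_j$ without storing in-link lists.

```python
from fractions import Fraction

# Hypothetical 3-node graph consistent with the diagram numbers:
# A -> B, A -> C, B -> A, C -> B.
out_links = {"A": ["B", "C"], "B": ["A"], "C": ["B"]}


def step(out_links, x):
    """One PageRank step: x_i^(k+1) = sum over j in B_i of x_j^(k) / N_j."""
    new_x = {page: Fraction(0) for page in out_links}
    for j, targets in out_links.items():
        share = x[j] / len(targets)  # rank pushed along each outlink of j
        for i in targets:
            new_x[i] += share
    return new_x


x = {page: Fraction(1, 3) for page in out_links}  # x^(0)
for k in (1, 2):
    x = step(out_links, x)
    print(f"x^({k}):", {p: round(float(v), 3) for p, v in x.items()})
```

For this graph the printed iterates are x^(1) = {A: 0.333, B: 0.5, C: 0.167} and x^(2) = {A: 0.5, B: 0.333, C: 0.167}.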

Page 9: PageRank Diagram

After a while…

[Figure: node ranks settle at 0.4, 0.4, and 0.2.]

$$x_i = \sum_{j \in B_i} \frac{1}{N_j}\, x_j$$
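The following sketch checks, with exact fractions, that the limiting values satisfy the definition. It uses a hypothetical 3-node graph consistent with the diagram slides (A → B, A → C, B → A, C → B). It also runs the plain floating-point iteration from $x_i^{(0)} = 1/3$ to show the iterates approaching the same values.

```python
from fractions import Fraction

# Hypothetical graph consistent with the diagram slides: A -> B, A -> C, B -> A, C -> B.
out_links = {"A": ["B", "C"], "B": ["A"], "C": ["B"]}


def step(out_links, x):
    """One update: x_i <- sum over pages j linking to i of x_j / N_j."""
    new_x = dict.fromkeys(out_links, 0)
    for j, targets in out_links.items():
        for i in targets:
            new_x[i] += x[j] / len(targets)
    return new_x


# The values from the slide as exact fractions: 0.4, 0.4, 0.2.
limit = {"A": Fraction(2, 5), "B": Fraction(2, 5), "C": Fraction(1, 5)}
print(step(out_links, limit) == limit)  # True: the definition holds exactly

# Floating-point iteration from x^(0) = 1/3 approaches the same values.
x = dict.fromkeys(out_links, 1 / 3)
for _ in range(60):
    x = step(out_links, x)
print({page: round(value, 4) for page, value in x.items()})
```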

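Page 10: Computing PageRank

Initialize:

$$x_i^{(0)} = \frac{1}{n}$$

Repeat until convergence:

$$x_i^{(k+1)} = \sum_{j \in B_i} \frac{1}{N_j}\, x_j^{(k)}$$

where $x_i$ is the importance of page $i$, $B_i$ is the set of pages $j$ that link to page $i$, $N_j$ is the number of outlinks from page $j$, and $x_j$ is the importance of page $j$.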
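A compact sketch of the full loop is given below. The slides leave "until convergence" unspecified. This sketch assumes an L1 change below a tolerance `tol` as the stopping criterion, with a cap `max_iter`. Both names, the function name `compute_pagerank`, and the example graph are hypothetical. It also assumes every page has at least one outlink, because the basic iteration on this slide does not say how pages without outlinks are handled.

```python
def compute_pagerank(out_links, tol=1e-10, max_iter=1000):
    """Iterate x_i^(k+1) = sum over j in B_i of x_j^(k) / N_j until the L1 change < tol.

    Assumes every page has at least one outlink (no dangling pages).
    """
    n = len(out_links)
    x = {page: 1.0 / n for page in out_links}  # x_i^(0) = 1/n
    for _ in range(max_iter):
        new_x = dict.fromkeys(out_links, 0.0)
        for j, targets in out_links.items():
            share = x[j] / len(targets)  # x_j / N_j sent along each outlink
            for i in targets:
                new_x[i] += share
        change = sum(abs(new_x[p] - x[p]) for p in out_links)
        x = new_x
        if change < tol:
            break
    return x


# Hypothetical graph: A -> B, A -> C, B -> A, C -> B.
ranks = compute_pagerank({"A": ["B", "C"], "B": ["A"], "C": ["B"]})
print({p: round(v, 6) for p, v in ranks.items()})
```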

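Page 11: Matrix Notation

$$x_i = \sum_{j \in B_i} \frac{1}{N_j}\, x_j$$

In matrix form this reads $x = P^T x$, where $P_{ji} = 1/N_j$ if page $j$ links to page $i$ and $P_{ji} = 0$ otherwise.

[Figure: $P^T$ times $x$, drawn with sample numeric entries; one row reads 0 .2 0 .3 0 0 .1 .4 0 .1.]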

Page 12: Matrix Notation

[Figure: the same product $P^T x$, shown equal to $x$.]

Find x that satisfies:

$$x = P^T x$$
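The following NumPy sketch builds $P^T$ from an adjacency list, using the convention $P_{ji} = 1/N_j$ when page $j$ links to page $i$. It then checks that a candidate vector satisfies $x = P^T x$. The helper name `transition_matrix_T` and the 3-node graph (A → B, A → C, B → A, C → B) are hypothetical. Each column of $P^T$ sums to 1, which is why the equation can have a probability vector as a solution.

```python
import numpy as np


def transition_matrix_T(out_links, pages):
    """Build P^T, where P[j, i] = 1/N_j if page j links to page i (else 0)."""
    index = {page: k for k, page in enumerate(pages)}
    PT = np.zeros((len(pages), len(pages)))
    for j, targets in out_links.items():
        for i in targets:
            PT[index[i], index[j]] = 1.0 / len(targets)
    return PT


pages = ["A", "B", "C"]
PT = transition_matrix_T({"A": ["B", "C"], "B": ["A"], "C": ["B"]}, pages)
x = np.array([0.4, 0.4, 0.2])

print(PT)
print(np.allclose(PT @ x, x))  # True: x satisfies x = P^T x
print(PT.sum(axis=0))          # each column sums to 1
```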

Page 13: Power Method

Initialize:

$$x^{(0)} = \left(\frac{1}{n}, \dots, \frac{1}{n}\right)^T$$

Repeat until convergence:

$$x^{(k+1)} = P^T x^{(k)}$$
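Below is the same iteration written with a matrix, as a minimal NumPy sketch. The function name `power_method`, the L1 stopping rule, and the tolerance are assumptions. $P^T$ is given for the hypothetical graph A → B, A → C, B → A, C → B. The function works for any matrix `M` that is passed in.

```python
import numpy as np


def power_method(M, tol=1e-10, max_iter=1000):
    """Iterate x^(k+1) = M x^(k) from the uniform vector until the L1 change < tol."""
    n = M.shape[0]
    x = np.full(n, 1.0 / n)  # x^(0) = (1/n, ..., 1/n)^T
    for k in range(1, max_iter + 1):
        x_next = M @ x
        if np.abs(x_next - x).sum() < tol:
            return x_next, k
        x = x_next
    return x, max_iter


# P^T for the hypothetical graph A -> B, A -> C, B -> A, C -> B (order A, B, C).
PT = np.array([[0.0, 1.0, 0.0],
               [0.5, 0.0, 1.0],
               [0.5, 0.0, 0.0]])
x, iterations = power_method(PT)
print(np.round(x, 6), iterations)
```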

Page 14: A side note

PageRank doesn't actually use $P^T$. Instead, it uses $A = cP^T + (1-c)E^T$.

So the PageRank problem is really:

Find x that satisfies: $$x = Ax$$

not:

Find x that satisfies: $$x = P^T x$$
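The next NumPy sketch shows one way A could be assembled. The slide does not fix c or E, so both are assumptions here: c = 0.85 and a uniform E, with every entry of $E^T$ equal to 1/n. The sketch uses the hypothetical 3-node graph A → B, A → C, B → A, C → B. The checks show that the columns of A still sum to 1 and that A has no zero entries. The equation $x = Ax$ is then solved directly as the eigenvector of A for eigenvalue 1, scaled to sum to 1.

```python
import numpy as np

# Hypothetical choices, not given on the slide: damping factor c = 0.85 and a
# uniform E, i.e. every entry of E^T equals 1/n.
c = 0.85
PT = np.array([[0.0, 1.0, 0.0],   # P^T for A -> B, A -> C, B -> A, C -> B
               [0.5, 0.0, 1.0],
               [0.5, 0.0, 0.0]])
n = PT.shape[0]
ET = np.full((n, n), 1.0 / n)
A = c * PT + (1 - c) * ET

print(A.sum(axis=0))  # columns still sum to 1
print((A > 0).all())  # every entry is positive

# Solve x = A x directly: the eigenvector of A for eigenvalue 1, scaled to sum to 1.
eigenvalues, eigenvectors = np.linalg.eig(A)
k = np.argmin(np.abs(eigenvalues - 1.0))
x = np.real(eigenvectors[:, k])
x = x / x.sum()
print(np.round(x, 4))
```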

Page 15: Power Method

And the algorithm is really . . .

Initialize:

$$x^{(0)} = \left(\frac{1}{n}, \dots, \frac{1}{n}\right)^T$$

Repeat until convergence:

$$x^{(k+1)} = A x^{(k)}$$
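Here is a sketch of this iteration that never forms A explicitly. It assumes a uniform E, with every entry of $E^T$ equal to 1/n; this is not specified on the slide. Under that assumption $E^T x$ has every entry equal to sum(x)/n, so each step needs only one product with $P^T$, which could stay sparse in practice. The function name `pagerank_power`, c = 0.85, the tolerance, and the 3-node graph are hypothetical choices.

```python
import numpy as np


def pagerank_power(PT, c=0.85, tol=1e-10, max_iter=1000):
    """Power method x^(k+1) = A x^(k) with A = c P^T + (1 - c) E^T.

    Assumes a uniform E (every entry of E^T is 1/n), so E^T x equals
    sum(x) / n in every entry and the dense matrix A is never built.
    """
    n = PT.shape[0]
    x = np.full(n, 1.0 / n)  # x^(0) = (1/n, ..., 1/n)^T
    for k in range(1, max_iter + 1):
        x_next = c * (PT @ x) + (1 - c) * x.sum() / n
        if np.abs(x_next - x).sum() < tol:
            return x_next, k
        x = x_next
    return x, max_iter


# P^T for the hypothetical graph A -> B, A -> C, B -> A, C -> B.
PT = np.array([[0.0, 1.0, 0.0],
               [0.5, 0.0, 1.0],
               [0.5, 0.0, 0.0]])
x, iterations = pagerank_power(PT)
print(np.round(x, 6), iterations)
```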

Page 16: Power Method

[Figure: $x^{(0)}$ drawn as the sum of its components along the eigenvectors: $u_1, \alpha_2 u_2, \dots, \alpha_5 u_5$.]

Express $x^{(0)}$ in terms of eigenvectors of $A$.

Page 17: Power Method

[Figure: $x^{(1)}$ drawn as its components $u_1, \alpha_2 \lambda_2 u_2, \dots, \alpha_5 \lambda_5 u_5$.]

Page 18: Power Method

[Figure: $x^{(2)}$ drawn as its components $u_1, \alpha_2 \lambda_2^2 u_2, \dots, \alpha_5 \lambda_5^2 u_5$.]

Page 19: Power Method

[Figure: $x^{(k)}$ drawn as its components $u_1, \alpha_2 \lambda_2^k u_2, \dots, \alpha_5 \lambda_5^k u_5$.]

Page 20: Power Method

[Figure: the limit $x^{(\infty)}$, drawn against the eigenvectors $u_1, \dots, u_5$.]

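Page 21: Why does it work?

Imagine our $n \times n$ matrix $A$ has $n$ linearly independent eigenvectors $u_i$:

$$A u_i = \lambda_i u_i$$

Then you can write any $n$-dimensional vector as a linear combination of the eigenvectors of $A$. In particular, scaling $u_1$ so that its coefficient is 1:

$$x^{(0)} = u_1 + \alpha_2 u_2 + \dots + \alpha_n u_n$$

[Figure: $x^{(0)}$ split into its components $u_1, \alpha_2 u_2, \dots, \alpha_5 u_5$.]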
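The next NumPy sketch performs this decomposition on a small example. The matrix is a hypothetical $A = cP^T + (1-c)E^T$ built from the 3-node graph A → B, A → C, B → A, C → B, with c = 0.85 and a uniform E; all of these are assumptions. Solving $U\,a = x^{(0)}$ gives the coefficients. Rescaling $u_1$ by its coefficient puts $x^{(0)}$ in the form $u_1 + \alpha_2 u_2 + \dots + \alpha_n u_n$. This matrix has complex eigenvalues, so NumPy returns complex arrays.

```python
import numpy as np

# Hypothetical example: A = c P^T + (1 - c) E^T for the 3-node graph
# A -> B, A -> C, B -> A, C -> B, with c = 0.85 and uniform E (assumptions).
PT = np.array([[0.0, 1.0, 0.0],
               [0.5, 0.0, 1.0],
               [0.5, 0.0, 0.0]])
n, c = 3, 0.85
A = c * PT + (1 - c) * np.full((n, n), 1.0 / n)

# Eigen-decomposition A u_i = lambda_i u_i, sorted so |lambda_1| >= |lambda_2| >= ...
eigenvalues, U = np.linalg.eig(A)
order = np.argsort(-np.abs(eigenvalues))
eigenvalues, U = eigenvalues[order], U[:, order]

# Write x^(0) as a linear combination of the eigenvectors: solve U @ coeffs = x^(0).
x0 = np.full(n, 1.0 / n)
coeffs = np.linalg.solve(U, x0)

# Rescale u_1 so its coefficient is 1: x^(0) = u_1 + alpha_2 u_2 + ... + alpha_n u_n.
u1 = coeffs[0] * U[:, 0]
alphas = coeffs[1:]
reconstructed = u1 + U[:, 1:] @ alphas
print(np.allclose(reconstructed, x0))  # True
print(np.round(eigenvalues, 4))
```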

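Page 22: Why does it work?

From the last slide:

$$x^{(0)} = u_1 + \alpha_2 u_2 + \dots + \alpha_n u_n$$

To get the first iterate, multiply $x^{(0)}$ by $A$:

$$x^{(1)} = A x^{(0)} = A u_1 + \alpha_2 A u_2 + \dots + \alpha_n A u_n$$

First eigenvalue is 1:

$$\lambda_1 = 1, \qquad 1 > |\lambda_2| \ge \dots \ge |\lambda_n|$$

Therefore:

$$x^{(1)} = u_1 + \alpha_2 \lambda_2 u_2 + \dots + \alpha_n \lambda_n u_n$$

where $\lambda_2, \dots, \lambda_n$ are all less than 1 in magnitude.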
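The following is a numerical check of this step and of its repetition. It uses a hypothetical A (3-node graph A → B, A → C, B → A, C → B, c = 0.85, uniform E; all assumptions). Each multiplication by A multiplies the coefficient of $u_i$ by $\lambda_i$. The sketch compares the power-method iterate $x^{(k)}$ with $\sum_i \alpha_i \lambda_i^k u_i$ for k = 1 to 5 and prints how fast $|\lambda_i|^k$ shrinks for i ≥ 2.

```python
import numpy as np

# Hypothetical A = c P^T + (1 - c) E^T: 3-node graph A -> B, A -> C, B -> A,
# C -> B, with c = 0.85 and uniform E (assumptions, not from the slides).
PT = np.array([[0.0, 1.0, 0.0],
               [0.5, 0.0, 1.0],
               [0.5, 0.0, 0.0]])
n, c = 3, 0.85
A = c * PT + (1 - c) * np.full((n, n), 1.0 / n)

# Eigenpairs sorted by modulus, and the coefficients of x^(0) in that basis.
eigenvalues, U = np.linalg.eig(A)
order = np.argsort(-np.abs(eigenvalues))
eigenvalues, U = eigenvalues[order], U[:, order]
x0 = np.full(n, 1.0 / n)
coeffs = np.linalg.solve(U, x0)

x = x0.copy()
for k in range(1, 6):
    x = A @ x  # power-method iterate x^(k)
    # Eigen form: each coefficient has been multiplied by lambda_i once per step.
    predicted = U @ (coeffs * eigenvalues**k)
    print(k, np.allclose(x, predicted), np.round(np.abs(eigenvalues[1:]) ** k, 4))
```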

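Page 23: Power Method

$$x^{(0)} = u_1 + \alpha_2 u_2 + \dots + \alpha_n u_n$$

$$x^{(1)} = u_1 + \alpha_2 \lambda_2 u_2 + \dots + \alpha_n \lambda_n u_n$$

$$x^{(2)} = u_1 + \alpha_2 \lambda_2^2 u_2 + \dots + \alpha_n \lambda_n^2 u_n$$

[Figure: the components of $x^{(0)}$, $x^{(1)}$, and $x^{(2)}$ along $u_1, \dots, u_5$, with coefficients $\alpha_i$, $\alpha_i \lambda_i$, and $\alpha_i \lambda_i^2$.]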

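Page 24: Convergence

$$x^{(k)} = u_1 + \alpha_2 \lambda_2^k u_2 + \dots + \alpha_n \lambda_n^k u_n$$

[Figure: the components of $x^{(k)}$ along $u_1, \dots, u_5$, with coefficients $\alpha_i \lambda_i^k$.]

The smaller $|\lambda_2|$, the faster the convergence of the Power Method: the error $x^{(k)} - u_1$ is dominated by the term $\alpha_2 \lambda_2^k u_2$, which shrinks like $|\lambda_2|^k$.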
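This is a numerical illustration of the claim on a hypothetical 3-node graph (A → B, A → C, B → A, C → B). The damping factors c are also hypothetical, and E is taken as uniform, which is an assumption. The sketch relies on a standard result that is not stated on these slides: with a uniform E, the eigenvalues of A other than 1 are c times those of $P^T$, so $|\lambda_2(A)| = c\,|\lambda_2(P^T)|$. For each c the script prints $|\lambda_2|$ and the number of power-method steps needed to reach an L1 change below 1e-10. Setting c = 1.0 reduces A to $P^T$.

```python
import numpy as np

# P^T for the hypothetical graph A -> B, A -> C, B -> A, C -> B.
PT = np.array([[0.0, 1.0, 0.0],
               [0.5, 0.0, 1.0],
               [0.5, 0.0, 0.0]])
n = PT.shape[0]


def second_eigenvalue_modulus(M):
    """|lambda_2|: the second-largest eigenvalue modulus of M."""
    return np.sort(np.abs(np.linalg.eigvals(M)))[-2]


def iterations_to_converge(M, tol=1e-10, max_iter=10_000):
    """Number of power-method steps until the L1 change drops below tol."""
    x = np.full(M.shape[0], 1.0 / M.shape[0])
    for k in range(1, max_iter + 1):
        x_next = M @ x
        if np.abs(x_next - x).sum() < tol:
            return k
        x = x_next
    return max_iter


for c in (1.0, 0.85, 0.5):  # hypothetical damping factors; c = 1.0 gives A = P^T
    A = c * PT + (1 - c) * np.full((n, n), 1.0 / n)
    print(f"c={c}: |lambda_2|={second_eigenvalue_modulus(A):.4f}, "
          f"iterations={iterations_to_converge(A)}")
```

Smaller values of $|\lambda_2|$ should show up as fewer iterations.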