
The math behind PageRank

A detailed analysis of the mathematical aspects of PageRank
Computational Mathematics class presentation

Ravi S Sinha
LIT lab, UNT

Partial citations of references

• The Anatomy of a Large-Scale Hypertextual Web Search Engine. Sergey Brin and Lawrence Page

• Inside PageRank. Monica Bianchini, Marco Gori, and Franco Scarselli

• Deeper Inside PageRank. Amy Langville and Carl Meyer

• Efficient Computation of PageRank. Taher Haveliwala

• Topic Sensitive PageRank. Taher Haveliwala

Overview of the talk

• Why PageRank
• What is PageRank
• How PageRank is used
• Math
• More math
• Remaining math

Why PageRank

• Need to build a better automatic search engine

Why?

• Human-maintained lists are subjective and expensive to build (non-automatic)
• Automatic engines based on keyword matching do a horrible job (page content alone is not enough; cleverly placed words in a page can mislead search engines)
• Advertisers sometimes mislead search engines

• Solution: Google [modern day: much more than PageRank; getting smarter]
  Exact technology: not public
  Core technology: PageRank (utilizes link structure)

• Other uses
  Any problem that can be visualized as a graph problem where the centrality of the vertices needs to be computed (NLP, etc.)

What is PageRank

• A way to find the most ‘important’ vertices in a graph

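• PR(A) = (1-d) + d [ PR(T1) / C(T1) + … + PR(Tn) / C(Tn) ]
  T1 … Tn = pages that link to A; C(T) = number of outgoing links on page T; d = damping factor (0 < d < 1)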

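• Forms a probability distribution over the vertices [sum = 1] once the scores are divided by the number of vertices; in the form above they sum to the number of vertices when every page has an outgoing link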

• How does this relate to Web search?
  Vertices = pages
  Incoming edges = hyperlinks from other pages
  Outgoing edges = hyperlinks to other pages
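The formula on this slide can be tried out directly on a tiny graph. The following is a minimal sketch, not Google's implementation: the four-page web `toy_links`, the function name `pagerank`, the choice d = 0.85, and the fixed iteration count are all assumptions made for illustration.

```python
def pagerank(links, d=0.85, iterations=100):
    """Iterate PR(A) = (1 - d) + d * sum(PR(T) / C(T)) over pages T linking to A.

    `links` maps each page to the list of pages it links to.
    """
    pages = list(links)
    pr = {page: 1.0 for page in pages}  # arbitrary starting scores
    for _ in range(iterations):
        new_pr = {}
        for page in pages:
            # Sum the shares PR(T) / C(T) from every page T that links to `page`.
            incoming = sum(pr[t] / len(links[t]) for t in pages if page in links[t])
            new_pr[page] = (1 - d) + d * incoming
        pr = new_pr
    return pr


# Hypothetical four-page web; every page has at least one outgoing link.
toy_links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}

scores = pagerank(toy_links)
for page, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{page}: {score:.3f}")
```

On this toy graph, page D has no incoming links, so its score settles at the floor value 1 - d. In this unnormalized form the four scores sum to the number of pages (4); dividing each score by that count gives values that sum to 1.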

Simple visualization: the simplest variant of PageRank in use [user behavior]

Random surfer

Damping factor

Only one incoming link, yet high PageRank

Lexical Substitution: A crash course

There are different types of managed care systems

Trivial for humans, not for machines

Math, statistics, linguistics wrapped within computer programs and algorithms

Information retrieval, machine translation, question answering, information security [information hiding in text]

PageRank in use: Lexical Substitution

Weights: word similarity
Directed/undirected: whole other realm

And now, the cool stuff

The math behind PageRank

• Intuitive correctness
• Mathematical foundation
• Stability
• Complexity of computational scheme
• Critical role of the parameters involved
• The distribution of the page score
• Role of dangling pages
• How to promote certain vertices (Web pages)

Intuitive correctness

• Concept of ‘voting’
  Related to citation in scientific literature
  More citations indicate a great/important piece of work
• Random surfer / random walk
• A page with many links to it must be important
• A very important page must point to something equally important
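The random-surfer intuition can be checked by simulation. The sketch below is an illustration under stated assumptions, not part of the original talk: the graph `toy_links`, the function name `random_surfer`, the step count, and the seed are hypothetical, and the surfer is assumed to jump to a uniformly random page whenever it does not follow a link.

```python
import random
from collections import Counter


def random_surfer(links, d=0.85, steps=200_000, seed=0):
    """Estimate how often a random surfer visits each page.

    With probability d the surfer follows a random outgoing link;
    otherwise it jumps to a page chosen uniformly at random.
    """
    rng = random.Random(seed)
    pages = list(links)
    visits = Counter()
    current = rng.choice(pages)
    for _ in range(steps):
        visits[current] += 1
        if links[current] and rng.random() < d:
            current = rng.choice(links[current])  # follow an outgoing link
        else:
            current = rng.choice(pages)  # get bored (or stuck) and jump anywhere
    return {page: visits[page] / steps for page in pages}


# Hypothetical four-page web, used only for illustration.
toy_links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
freq = random_surfer(toy_links)
for page in sorted(freq, key=freq.get, reverse=True):
    print(f"{page}: {freq[page]:.3f}")
```

In this toy graph, page A has a single incoming link, but that link comes from the heavily visited page C, so A is still visited far more often than B. This is the "an important page points to something important" effect in miniature.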

Mathematical foundation

• Most researchers: Markov chains
  Caveat: Only applicable in absence of dangling nodes

• Basic idea: authority of a Web page unrelated to its contents [comes from the link structure]

• Simple representation

• Vector representation

$x = d\,W x + (1 - d)\,\mathbf{1}_N$

$\mathbf{1}_N = [1, 1, 1, \ldots, 1]'$

Transition matrix W: ∑(each column) = 1, or 0 for a page with no outgoing links (a dangling page)
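As a worked step, the vector form can be solved for x in closed form. The derivation below writes $\mathbf{1}_N$ for the all-ones vector and assumes $0 < d < 1$; since every column of W sums to at most 1, the 1-norm of $dW$ is at most $d < 1$, which is what makes $I - dW$ invertible.

```latex
\begin{aligned}
x &= d\,W x + (1 - d)\,\mathbf{1}_N \\
(I - d\,W)\,x &= (1 - d)\,\mathbf{1}_N \\
x &= (1 - d)\,(I - d\,W)^{-1}\,\mathbf{1}_N
\end{aligned}
```

For Web-scale graphs the inverse is never formed explicitly; the closed form mainly shows that the solution exists and is unique.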

Mathematical foundation (2)

$x(t) = d\,W x(t-1) + (1 - d)\,\mathbf{1}_N$

Google’s iterative version: converges to a stationary solution

Jacobi algorithm
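The Jacobi-style iteration x(t) = d W x(t-1) + (1 - d) 1_N can be sketched in a few lines of plain Python. The helper names `build_transition_matrix` and `jacobi_pagerank`, the toy graph, the tolerance, and the stopping rule (L1 change between iterates) are assumptions for illustration; W is stored column-wise as on the slides, so column j holds the out-link shares of page j.

```python
def build_transition_matrix(links, pages):
    """W[i][j] = 1 / outdegree(j) if page j links to page i, else 0."""
    index = {page: i for i, page in enumerate(pages)}
    n = len(pages)
    W = [[0.0] * n for _ in range(n)]
    for source, targets in links.items():
        for target in targets:
            W[index[target]][index[source]] = 1.0 / len(targets)
    return W


def jacobi_pagerank(W, d=0.85, tol=1e-10, max_iter=1000):
    """Iterate x(t) = d W x(t-1) + (1 - d) 1_N until the L1 change is below tol."""
    n = len(W)
    x = [1.0] * n
    for iteration in range(1, max_iter + 1):
        new_x = [d * sum(W[i][j] * x[j] for j in range(n)) + (1 - d) for i in range(n)]
        change = sum(abs(a - b) for a, b in zip(new_x, x))
        x = new_x
        if change < tol:
            break
    return x, iteration


# Hypothetical four-page web without dangling pages.
toy_links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
pages = ["A", "B", "C", "D"]
W = build_transition_matrix(toy_links, pages)
x, iterations = jacobi_pagerank(W)
for page, score in zip(pages, x):
    print(f"{page}: {score:.4f}")
print("iterations:", iterations)
```

Because each step shrinks the error by a factor of at most d, a smaller damping factor converges in fewer iterations, at the cost of weighting the link structure less.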

Alternative computation

$x(t) = d\,W x(t-1) + \frac{\alpha(t-1)}{N}\,\mathbf{1}_N$

$\alpha(t-1) = \|x(t-1)\|_1 - \|d\,W x(t-1)\|_1$

$\|x(t)\|_1 = 1$; normalized
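A normalized iteration of this kind can be sketched as follows. This is an illustration under assumptions rather than the exact scheme from the references: the score that is not propagated through links (lost to damping and to dangling pages) is assumed to be spread uniformly over all N pages, which keeps the 1-norm of x(t) equal to 1. The function name `normalized_pagerank` and the three-page matrix are hypothetical.

```python
def normalized_pagerank(W, d=0.85, tol=1e-12, max_iter=1000):
    """Iterate x(t) = d W x(t-1) + (alpha / N) 1_N, where
    alpha = ||x(t-1)||_1 - ||d W x(t-1)||_1, so that ||x(t)||_1 stays 1."""
    n = len(W)
    x = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        propagated = [d * sum(W[i][j] * x[j] for j in range(n)) for i in range(n)]
        # All entries are non-negative, so the 1-norm is just the sum.
        alpha = sum(x) - sum(propagated)
        new_x = [p + alpha / n for p in propagated]
        change = sum(abs(a - b) for a, b in zip(new_x, x))
        x = new_x
        if change < tol:
            break
    return x


# Hypothetical 3-page graph in column form: page 0 links to pages 1 and 2,
# page 1 links to page 2, and page 2 is dangling (its column is all zeros).
W = [
    [0.0, 0.0, 0.0],
    [0.5, 0.0, 0.0],
    [0.5, 1.0, 0.0],
]
x = normalized_pagerank(W)
print([round(v, 4) for v in x], "sum =", round(sum(x), 6))
```

Unlike the plain Jacobi form, this variant tolerates the all-zero column of the dangling page and still returns scores that sum to 1.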

Web communities: Energy balance [measure of authority]
