![Page 1: PageRank - UESTCdm.uestc.edu.cn/wp-content/uploads/seminar/20151209_PageRank.… · Data Mining Lab, Big Data Research Center, UESTC School of Computer Science and Engineering](https://reader034.vdocuments.site/reader034/viewer/2022051408/600d91f6302361685711cbb3/html5/thumbnails/1.jpg)
PageRank
Data Mining Lab, Big Data Research Center, UESTC
School of Computer Science and Engineering
Xiaolin Yang
DM: LESS IS MORE
2015/12/09
CONTENTS
01 Background
02 Markov Chain
03 The Basic PageRank Model
04 The Power Method
05 Discussion about the Model
06 Other Topics
01 Background
At the Seventh International World Wide Web Conference (WWW98), Sergey Brin and Larry Page’s paper “The PageRank citation ranking: Bringing order to the Web” made small ripples in the information science community that quickly turned into waves.
Great Success Hyperlink Structure PageRank
02 Markov Chain
Markov Chain
Example: Stock Market
Irreducible: any state can be reached from any other state.
Absorbing state: a state that, once entered, is never left (the probability of leaving it is zero).
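Both definitions can be checked mechanically. The sketch below assumes a row-stochastic NumPy matrix; the three-state "stock market" probabilities (bull, bear, stagnant) and the helper names `reachable`, `is_irreducible` and `absorbing_states` are hypothetical illustrations, not taken from the slide.

```python
import numpy as np

# Hypothetical 3-state "stock market" chain (bull, bear, stagnant); illustrative numbers only.
P_market = np.array([
    [0.90, 0.075, 0.025],
    [0.15, 0.80, 0.05],
    [0.25, 0.25, 0.50],
])

# Hypothetical chain in which state 2 is absorbing.
P_absorb = np.array([
    [0.5, 0.5, 0.0],
    [0.3, 0.3, 0.4],
    [0.0, 0.0, 1.0],
])

def reachable(P, start):
    """Set of states reachable from `start` (depth-first search on P > 0)."""
    seen, stack = {start}, [start]
    while stack:
        i = stack.pop()
        for j in np.nonzero(P[i] > 0)[0]:
            j = int(j)
            if j not in seen:
                seen.add(j)
                stack.append(j)
    return seen

def is_irreducible(P):
    """Irreducible: every state can be reached from every other state."""
    n = P.shape[0]
    return all(len(reachable(P, i)) == n for i in range(n))

def absorbing_states(P):
    """A state is absorbing when P[i, i] == 1, so it can never be left."""
    return [i for i in range(P.shape[0]) if P[i, i] == 1.0]

print(is_irreducible(P_market), absorbing_states(P_market))  # True []
print(is_irreducible(P_absorb), absorbing_states(P_absorb))  # False [2]
```

A chain with an absorbing state (and more than one state) can never be irreducible, because nothing else is reachable from the absorbing state.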
Markov Chain: Limit Theorem
A homogeneous, irreducible, aperiodic and positive recurrent Markov chain has:
• a limiting distribution: $\lim_{n \to \infty} p_{ij}^{(n)} = \pi_j > 0$ for all states $i, j$ (independent of the starting state $i$);
• which is also its unique stationary distribution: $\pi_j = \sum_i \pi_i p_{ij}$ with $\sum_j \pi_j = 1$.
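A quick numerical check of the theorem, assuming a hypothetical three-state chain whose entries are all positive (hence irreducible and aperiodic): high powers of $P$ have identical rows, and each row solves the stationary equations $\pi^T P = \pi^T$, $\sum_j \pi_j = 1$.

```python
import numpy as np

# Hypothetical irreducible, aperiodic chain (every entry positive, rows sum to 1).
P = np.array([
    [0.90, 0.075, 0.025],
    [0.15, 0.80, 0.05],
    [0.25, 0.25, 0.50],
])
n = P.shape[0]

# Limiting distribution: every row of P^n approaches the same vector.
Pn = np.linalg.matrix_power(P, 200)

# Stationary distribution: solve pi^T (P - I) = 0 together with sum(pi) = 1.
A = np.vstack([(P - np.eye(n)).T, np.ones(n)])
b = np.concatenate([np.zeros(n), [1.0]])
pi_stat, *_ = np.linalg.lstsq(A, b, rcond=None)

print(np.round(Pn, 4))       # three (numerically) identical rows
print(np.round(pi_stat, 4))  # equals each row: 0.625, 0.3125, 0.0625
```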
Markov Chain: Aperiodicity
Period of a state $i$: $d(i) = \mathrm{g.c.d.}\{n \ge 1 : p_{ii}^{(n)} > 0\}$, where g.c.d. denotes the greatest common divisor. State $i$ is aperiodic if $d(i) = 1$.
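The period $d(i) = \gcd\{n \ge 1 : p_{ii}^{(n)} > 0\}$ depends only on which transitions are possible, so it can be computed from the 0/1 pattern of $P$. The function `period` below is a hypothetical helper, not from the slides; it scans walk lengths up to $3n$, which is enough because every cycle of the state's class lies on a closed walk through $i$ of length at most $3n$.

```python
from math import gcd
import numpy as np

def period(P, i, max_len=None):
    """d(i) = gcd{n >= 1 : (P^n)_ii > 0}; returns 0 if i is never revisited."""
    n_states = P.shape[0]
    if max_len is None:
        max_len = 3 * n_states              # walks this long expose every cycle length
    A = (P > 0).astype(int)                 # possible one-step transitions
    reach = np.eye(n_states, dtype=int)
    d = 0
    for n in range(1, max_len + 1):
        reach = ((reach @ A) > 0).astype(int)  # possible n-step transitions
        if reach[i, i]:
            d = gcd(d, n)                   # gcd(0, n) == n for the first return
    return d

flip = np.array([[0.0, 1.0], [1.0, 0.0]])   # deterministic 2-cycle
lazy = np.array([[0.5, 0.5], [1.0, 0.0]])   # self-loop on state 0
print(period(flip, 0), period(lazy, 0))     # 2 1
```

One self-loop is enough to make an irreducible chain aperiodic, which is why "lazy" variants of a chain are a common fix for periodicity.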
Markov Chain: Positive Recurrence
The first return probability: $f_{ii}^{(n)} = P(X_n = i,\ X_k \neq i \text{ for } 1 \le k < n \mid X_0 = i)$.
Recurrent: state $i$ is recurrent if a chain started in $i$ returns to $i$ with probability 1, i.e. $f_{ii} = \sum_{n=1}^{\infty} f_{ii}^{(n)} = 1$; otherwise it is transient.
Positive recurrent: a recurrent state whose mean return time is finite, $\mu_i = \sum_{n=1}^{\infty} n f_{ii}^{(n)} < \infty$.
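For a finite chain the first return probabilities $f_{ii}^{(n)}$ can be computed exactly: leave $i$, wander for $n-2$ steps among the other states (the sub-matrix $Q$ of $P$ that avoids $i$), then step back into $i$. The sketch below assumes the same hypothetical three-state chain as before and truncates the series at 2000 terms; the helper name `first_return_probs` is ours. For an irreducible, positive recurrent chain the mean return time also satisfies $\mu_i = 1/\pi_i$, which gives a check.

```python
import numpy as np

def first_return_probs(P, i, n_max):
    """f_ii^(n) for n = 1..n_max (first return to state i at step n)."""
    others = [j for j in range(P.shape[0]) if j != i]
    Q = P[np.ix_(others, others)]   # transitions that avoid state i
    out = P[i, others]              # first step: leave i
    back = P[others, i]             # last step: re-enter i
    f = [P[i, i]]                   # n = 1: immediate self-loop
    v = out.copy()
    for _ in range(2, n_max + 1):
        f.append(float(v @ back))
        v = v @ Q
    return np.array(f)

# Hypothetical chain from the previous example (stationary pi_0 = 0.625).
P = np.array([
    [0.90, 0.075, 0.025],
    [0.15, 0.80, 0.05],
    [0.25, 0.25, 0.50],
])
f = first_return_probs(P, 0, 2000)
f_00 = f.sum()                                  # = 1 -> state 0 is recurrent
mu_0 = (np.arange(1, len(f) + 1) * f).sum()     # finite -> positive recurrent
print(round(f_00, 6), round(mu_0, 4))           # 1.0 1.6  (= 1 / 0.625)
```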
Markov Chain: Example
[Figure: a Markov chain on states 1, 2, 3, 4 with transition probabilities p and 1 on its edges]
Positive recurrence:
State 1: positive recurrent.
State 2: transient.
03 The Basic PageRank Model
The Basic PageRank Model
Page links: a page gains a high PageRank if
• it is linked to by many pages, or
• the pages that link to it are themselves authoritative.
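Both bullets are captured by the standard recursive definition $r(P_i) = \sum_{P_j \in B_{P_i}} r(P_j)/|P_j|$, where $B_{P_i}$ is the set of pages linking to $P_i$ and $|P_j|$ is the number of out-links of $P_j$. The sketch below iterates this definition on a hypothetical four-page web with no dangling pages and no damping; the graph and the name `basic_pagerank` are assumptions for illustration.

```python
# Hypothetical 4-page web: page -> pages it links to (no dangling pages).
links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}

def basic_pagerank(links, iters=100):
    """Iterate r(P_i) = sum over in-links P_j of r(P_j) / |P_j|."""
    pages = sorted(links)
    r = {p: 1 / len(pages) for p in pages}   # start from the uniform vector
    for _ in range(iters):
        new = {p: 0.0 for p in pages}
        for p, outs in links.items():
            share = r[p] / len(outs)          # r(P_j) / |P_j|
            for q in outs:
                new[q] += share
        r = new
    return r

r = basic_pagerank(links)
print({p: round(v, 3) for p, v in r.items()})
# {'A': 0.4, 'B': 0.2, 'C': 0.4, 'D': 0.0}
```

C has the most in-links, yet A ties with it because A's single in-link comes from the authoritative page C. D, which nobody links to, ends with zero rank: this pure iteration only behaves well on favourable graphs, which motivates the adjustments discussed later.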
The Basic PageRank Model: Transition Probability Matrix
Markov matrix (stochastic matrix):
• every element is nonnegative;
• every row sums to 1 (every column, under the column-stochastic convention).
Properties:
• the spectral radius (the largest absolute value among its eigenvalues) is 1.
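The row-normalized hyperlink matrix $H$ ($H_{ij} = 1/|P_i|$ when page $i$ links to page $j$) is such a stochastic matrix whenever no page is dangling. A minimal sketch, assuming integer page ids and a hypothetical four-page graph; `hyperlink_matrix` is an illustrative helper name.

```python
import numpy as np

def hyperlink_matrix(links, n):
    """H[i, j] = 1/|P_i| if page i links to page j, else 0 (no duplicate links assumed)."""
    H = np.zeros((n, n))
    for i, outs in links.items():
        for j in outs:
            H[i, j] = 1.0 / len(outs)
    return H

# Hypothetical graph: 0 -> 1, 2; 1 -> 2; 2 -> 0; 3 -> 2 (no dangling pages).
links = {0: [1, 2], 1: [2], 2: [0], 3: [2]}
H = hyperlink_matrix(links, 4)

nonnegative = bool((H >= 0).all())
row_sums = H.sum(axis=1)
spectral_radius = max(abs(np.linalg.eigvals(H)))
print(nonnegative, row_sums, round(spectral_radius, 10))  # True [1. 1. 1. 1.] 1.0
```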
The Basic PageRank Model: Transition Probability Matrix
Every irreducible Markov matrix $P$ has a unique stationary vector $\pi$ with positive entries: $\pi^T P = \pi^T$, $\pi^T e = 1$.
If $P$ is also aperiodic, the limit $\lim_{k \to \infty} \pi^{(0)T} P^k = \pi^T$ is independent of the initial distribution $\pi^{(0)}$.
If every element of $P$ is positive, $P$ is irreducible (its graph is strongly connected) and aperiodic.
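Since $\pi^T P = \pi^T$ is equivalent to $P^T \pi = \pi$, the stationary vector is the eigenvector of $P^T$ for eigenvalue 1. This sketch assumes a randomly generated, strictly positive $5 \times 5$ Markov matrix (hypothetical data) and uses `numpy.linalg.eig`; normalizing by the sum also removes the arbitrary sign of the returned eigenvector.

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.random((5, 5)) + 0.01          # strictly positive entries
P = M / M.sum(axis=1, keepdims=True)   # rows sum to 1: a positive Markov matrix

# Eigenvector of P^T belonging to the eigenvalue closest to 1.
vals, vecs = np.linalg.eig(P.T)
k = np.argmin(np.abs(vals - 1))
pi = np.real(vecs[:, k])
pi = pi / pi.sum()                     # normalize to a probability vector

print(np.round(pi, 4))                 # every entry is positive
```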
The Basic PageRank Model: Transition Probability Matrix
The independence of the initial state:
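Independence of the initial state can be observed directly: for a positive Markov matrix, iterating $x^{(k+1)T} = x^{(k)T} P$ from very different starting distributions lands on the same vector. The matrix and starting vectors below are hypothetical; `iterate` is an illustrative helper.

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.random((5, 5)) + 0.01
P = M / M.sum(axis=1, keepdims=True)   # positive stochastic matrix

def iterate(x0, P, steps=500):
    """Apply x^T <- x^T P repeatedly, starting from x0 normalized to sum 1."""
    x = x0 / x0.sum()
    for _ in range(steps):
        x = x @ P
    return x

# Start fully concentrated on state 0, fully on state 4, or at random.
starts = [np.eye(5)[0], np.eye(5)[4], rng.random(5)]
limits = [iterate(x0, P) for x0 in starts]
print(np.max(np.abs(limits[0] - limits[1])), np.max(np.abs(limits[0] - limits[2])))
```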
The Basic PageRank Model: Transition Probability Matrix
Practical web graphs are not necessarily strongly connected. Starting from the hyperlink matrix $H$ ($H_{ij} = 1/|P_i|$ if page $i$ links to page $j$, else 0):
• For dangling nodes (pages without out-links): replace each zero row of $H$ by $e^T/n$, i.e. $S = H + a e^T / n$, where $a_i = 1$ if page $i$ is dangling and 0 otherwise.
• For irreducibility: add uniform teleportation, $G = \alpha S + (1 - \alpha)\, e e^T / n$ with $0 < \alpha < 1$, so that every entry of $G$ is positive.
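Both repairs are short matrix expressions. The sketch assumes the usual construction $S = H + a e^T/n$ and $G = \alpha S + (1 - \alpha) e e^T / n$ with $a$ the dangling-node indicator; the four-page graph and the helper name `stochastic_and_google` are hypothetical.

```python
import numpy as np

def stochastic_and_google(H, alpha=0.85):
    """Return S (dangling rows made uniform) and G (teleportation added)."""
    n = H.shape[0]
    a = (H.sum(axis=1) == 0).astype(float)              # a_i = 1 for dangling pages
    S = H + np.outer(a, np.ones(n)) / n                 # S = H + a e^T / n
    G = alpha * S + (1 - alpha) * np.ones((n, n)) / n   # G = alpha S + (1 - alpha) e e^T / n
    return S, G

# Hypothetical graph: 0 -> 1, 3; 1 -> 2; 2 -> 1; page 3 is dangling.
H = np.array([
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
])
S, G = stochastic_and_google(H)
print(S.sum(axis=1), (G > 0).all())  # [1. 1. 1. 1.] True
```

Note that $S$ alone is still reducible here (pages 1 and 2 form a closed loop that never leads back to 0 or 3); only the teleportation step makes $G$ positive and therefore irreducible.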
The Basic PageRank Model: Transition Probability Matrix
Another interpretation of the teleportation term: although the surfer usually browses by following hyperlinks, he can also type a URL into the address bar to “teleport” to a new page.
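The random-surfer reading can be simulated: with probability $\alpha$ follow a random out-link, otherwise jump to a uniformly random page. The visit frequencies should approach the PageRank vector computed from $G = \alpha H + (1 - \alpha) e e^T / n$. The graph, $\alpha = 0.85$, the seed and the step count are assumptions for this sketch.

```python
import random
import numpy as np

links = {0: [1, 2], 1: [2], 2: [0], 3: [2]}   # hypothetical graph, no dangling pages
n, alpha = 4, 0.85

def surf(steps, seed=0):
    """Fraction of steps the random surfer spends on each page."""
    rng = random.Random(seed)
    page, visits = 0, [0] * n
    for _ in range(steps):
        if rng.random() < alpha:
            page = rng.choice(links[page])    # follow a hyperlink
        else:
            page = rng.randrange(n)           # "teleport" by typing a URL
        visits[page] += 1
    return [v / steps for v in visits]

# Exact PageRank of the same model via repeated multiplication by G.
H = np.zeros((n, n))
for i, outs in links.items():
    H[i, outs] = 1 / len(outs)
G = alpha * H + (1 - alpha) / n
pi = np.full(n, 1 / n)
for _ in range(200):
    pi = pi @ G

est = surf(200_000)
print(np.round(pi, 3), np.round(est, 3))      # close agreement
```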
04 The Power Method
The Power Method
The Power Method: $\pi^{(k+1)T} = \pi^{(k)T} G = \alpha \pi^{(k)T} H + (\alpha \pi^{(k)T} a + 1 - \alpha)\, e^T / n$, where $H$ is the sparse hyperlink matrix, $a$ the dangling-node indicator vector and $\alpha$ the damping factor.
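The iteration $\pi^{(k+1)T} = \alpha \pi^{(k)T} H + (\alpha \pi^{(k)T} a + 1 - \alpha)\, e^T/n$ touches only the link lists: the dangling mass and the teleportation become one uniform correction per step. This is a minimal sketch under our own assumptions (dictionary of out-link lists, no duplicate links, L1 stopping rule); `power_method` is a hypothetical name, not a library API.

```python
import numpy as np

def power_method(out_links, n, alpha=0.85, tol=1e-10, max_iter=1000):
    """PageRank from sparse link lists; S and G are never formed."""
    dangling = [i for i in range(n) if not out_links.get(i)]
    pi = np.full(n, 1.0 / n)
    for k in range(1, max_iter + 1):
        new = np.zeros(n)
        for i, outs in out_links.items():          # alpha * pi^T H, one row at a time
            if outs:
                new[outs] += alpha * pi[i] / len(outs)
        new += (alpha * pi[dangling].sum() + 1 - alpha) / n  # dangling + teleport
        if np.abs(new - pi).sum() < tol:           # L1 change between iterates
            return new, k
        pi = new
    return pi, max_iter

links = {0: [1, 3], 1: [2], 2: [1]}   # hypothetical; page 3 is dangling
pi, iters = power_method(links, 4)
print(np.round(pi, 4), iters)
```

Each iterate keeps summing to 1, because the correction term re-injects exactly the mass that $\alpha \pi^T H$ does not distribute.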
The Power Method
Four advantages:
• $S$ (the dangling-node-corrected matrix) and $G$ (the Google matrix) are never formed or stored; only the sparse $H$ and the dangling-node vector $a$ are needed.
• At each iteration, the power method only requires the storage of one vector, the current iterate.
• Converges quickly: the asymptotic rate is governed by $|\lambda_2(G)| \le \alpha$.
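The convergence claim can be checked on the change between successive iterates. With a uniform start and no dangling pages, $\pi^{(k+1)} - \pi^{(k)} = \alpha(\pi^{(k)} - \pi^{(k-1)}) H$, and multiplying by a stochastic matrix never increases the L1 norm, so each step shrinks the change by at least the factor $\alpha$. The graph below is hypothetical and `residuals` is an illustrative helper.

```python
import numpy as np

def residuals(H, alpha, iters=20):
    """L1 change ||pi^(k+1) - pi^(k)||_1 at each power-method step on G."""
    n = H.shape[0]
    G = alpha * H + (1 - alpha) / n   # assumes H has no zero rows, so S = H
    pi = np.full(n, 1.0 / n)
    res = []
    for _ in range(iters):
        new = pi @ G
        res.append(np.abs(new - pi).sum())
        pi = new
    return np.array(res)

# Hypothetical graph: 0 -> 1, 2; 1 -> 2; 2 -> 0; 3 -> 2 (no dangling pages).
H = np.array([
    [0.0, 0.5, 0.5, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
])
alpha = 0.85
res = residuals(H, alpha)
ratios = res[1:] / res[:-1]
print(np.round(ratios, 3))   # every ratio is at most alpha
```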
05 Discussion about the Model
Discussion about the Model
Google reportedly uses $\alpha = 0.85$. Why this choice of the damping factor $\alpha$?
1. A trade-off:
• The larger $\alpha$ is, the more the true hyperlink structure of the Web is used to determine webpage importance.
• The smaller $\alpha$ is, the faster the power method converges.
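The speed side of the trade-off is easy to see on a graph where $|\lambda_2(G)| = \alpha$ exactly. In the hypothetical three-page graph below, pages 0 and 1 link to each other and page 2 links to 0, so $H$ has eigenvalues $1, -1, 0$ and each power-method step shrinks the change by exactly $\alpha$. The tolerance and helper name are assumptions.

```python
import numpy as np

def iterations_to_converge(H, alpha, tol=1e-8, max_iter=100_000):
    """Power-method steps on G until the L1 change drops below tol."""
    n = H.shape[0]
    G = alpha * H + (1 - alpha) / n   # no dangling pages assumed, so S = H
    pi = np.full(n, 1.0 / n)
    for k in range(1, max_iter + 1):
        new = pi @ G
        if np.abs(new - pi).sum() < tol:
            return k
        pi = new
    raise RuntimeError("power method did not converge")

# Hypothetical graph: 0 <-> 1 form a 2-cycle, 2 -> 0.
H = np.array([
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
])
iters = {alpha: iterations_to_converge(H, alpha) for alpha in (0.5, 0.85, 0.99)}
print(iters)   # grows roughly like log(tol) / log(alpha)
```

Pushing $\alpha$ from 0.85 toward 1 makes the iteration count explode, which is one practical reason not to choose $\alpha$ too close to 1.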
Discussion about the Model
2. Intuitive reality:
• 5/6 of the time a Web surfer randomly clicks on hyperlinks;
• 1/6 of the time the surfer goes to the URL line and types the address of a new page.
This corresponds to $\alpha = 5/6 \approx 0.83$, close to 0.85.
Discussion about the Model
• Controlling spam produced by so-called link farms.
[Figure: a link farm]
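A toy calculation shows how $\alpha$ caps what a link farm can buy. In the hypothetical setup below, target page 3 has no genuine in-links, and 20 farm pages each link only to it. Since the farm pages themselves receive only teleportation mass $(1-\alpha)/24$, the target's rank rises by the exact factor $(1 + 20\alpha)/6$, so a smaller $\alpha$ means a smaller payoff. Graph sizes and helper names are assumptions.

```python
import numpy as np

def pagerank(out_links, n, alpha, tol=1e-12, max_iter=10_000):
    """Power method on G = alpha*H + (1 - alpha)/n; assumes no dangling pages."""
    H = np.zeros((n, n))
    for i, outs in out_links.items():
        H[i, outs] = 1.0 / len(outs)
    G = alpha * H + (1 - alpha) / n
    pi = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        new = pi @ G
        if np.abs(new - pi).sum() < tol:
            return new
        pi = new
    return pi

# Hypothetical web: page 3 is the spammer's target and nobody links to it.
base = {0: [1], 1: [0, 2], 2: [0], 3: [0]}
# Link farm: 20 extra pages (4..23) whose only link points at page 3.
farm = dict(base)
farm.update({f: [3] for f in range(4, 24)})

boost = {}
for alpha in (0.5, 0.85):
    before = pagerank(base, 4, alpha)[3]
    after = pagerank(farm, 24, alpha)[3]
    boost[alpha] = after / before
print({a: round(float(b), 3) for a, b in boost.items()})  # {0.5: 1.833, 0.85: 3.0}
```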
Discussion about the Model
Forcing irreducibility:
• Connecting every node directly to every other node (as the teleportation term does) alters the true nature of the Web.
• Alternatively, add a dummy node to the Web that links to every other node and to which every other node links.
Discussion about the Model
Forcing irreducibility: LeaderRank
[Figure: the ground node, linked to and from every node]
• self-adaptive
• parameter-free
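A minimal sketch of LeaderRank as it is usually described: add a ground node with links to and from every node, start with score 1 on each real node and 0 on the ground node, run the plain random walk (no damping parameter) to its steady state, then share the ground node's final score equally among the real nodes. The helper name, tolerance and four-node graph are our assumptions, not details from the slide.

```python
import numpy as np

def leaderrank(out_links, n, tol=1e-12, max_iter=10_000):
    """LeaderRank sketch: ground node g = n, parameter-free random walk."""
    g = n
    A = np.zeros((n + 1, n + 1))
    for i, outs in out_links.items():
        A[i, outs] = 1.0
    A[:n, g] = 1.0                         # every node -> ground
    A[g, :n] = 1.0                         # ground -> every node
    P = A / A.sum(axis=1, keepdims=True)   # no zero rows thanks to the ground node
    s = np.ones(n + 1)
    s[g] = 0.0                             # initial scores: 1 per real node, 0 for ground
    for _ in range(max_iter):
        new = s @ P
        done = np.abs(new - s).sum() < tol
        s = new
        if done:
            break
    return s[:n] + s[g] / n                # share the ground node's score equally

links = {0: [1, 2], 1: [2], 2: [0], 3: [2]}   # hypothetical 4-node graph
scores = leaderrank(links, 4)
print(np.round(scores, 3))   # node 2 (most in-links) scores highest, node 3 lowest
```

The walk conserves total score, so the final scores always sum to the number of real nodes; irreducibility comes from the ground node rather than from a tuned $\alpha$.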
06 Other Topics
Other Topics
• Storage and speed: “The World’s Largest Matrix Computation”
• Spam
• The evolution and dynamics of the Web
• The Web’s structure: how to use the scale-free structure to improve PageRank computations
• Community: how do changes within a community affect the PageRank of community pages?