cs224w: social and information network analysis jure...

58
CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University http://cs224w.stanford.edu

Upload: others

Post on 21-Mar-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2011/slides/01-intro.pdf · Each such system can be represented as a network, that defines the

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University

http://cs224w.stanford.edu

Page 2: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2011/slides/01-intro.pdf · Each such system can be represented as a network, that defines the

We are surrounded by hopelessly complex systems: Society is a collection of six billion individuals Communication systems link electronic devices Information and knowledge is organized and linked Thousands of genes in our cells work together in a

seamless fashion Our thoughts are hidden in the connections

between billions of neurons in our brain

These systems, random looking at first, display signatures of order and self-organization

9/27/2011 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 2

Page 3: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2011/slides/01-intro.pdf · Each such system can be represented as a network, that defines the

Each such system can be represented as a network, that defines the interactions between the components

9/27/2011 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 3

Page 4: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2011/slides/01-intro.pdf · Each such system can be represented as a network, that defines the

9/27/2011 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 4

Graph of the Internet (Autonomous Systems)

Page 5: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2011/slides/01-intro.pdf · Each such system can be represented as a network, that defines the

9/27/2011 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 5

Connections between political blogs

Page 6: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2011/slides/01-intro.pdf · Each such system can be represented as a network, that defines the

9/27/2011 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 6

Seven Bridges of Königsberg (Euler 1735)

Return to the starting point by traveling each link of the graph once and only once.

London Underground

Page 7: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2011/slides/01-intro.pdf · Each such system can be represented as a network, that defines the

9/27/2011 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 7

: departments

: consultants

: external experts

Page 8: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2011/slides/01-intro.pdf · Each such system can be represented as a network, that defines the

9/27/2011 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 8

Nodes:

Links:

Companies

Investment

Pharma

Research Labs

Public

Biotechnology

Collaborations

Financial

R&D

Bio-tech companies, 1991

Page 9: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2011/slides/01-intro.pdf · Each such system can be represented as a network, that defines the

9/27/2011 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 9

Human brain has between 10-100 billion neurons

Page 10: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2011/slides/01-intro.pdf · Each such system can be represented as a network, that defines the

9/27/2011 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 10

Metabolic networks: Nodes: Metabolites and enzymes

Edges: Chemical reactions

Protein-Protein Interaction Networks: Nodes: Proteins

Edges: ‘physical’ interactoins

Page 11: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2011/slides/01-intro.pdf · Each such system can be represented as a network, that defines the

Behind each such system there is an intricate wiring diagram, a network, that

defines the interactions between the components

We will never understand a complex

system unless we understand the networks behind it

9/27/2011 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 11

Page 12: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2011/slides/01-intro.pdf · Each such system can be represented as a network, that defines the

How do we reason about networks Empirical: Study network data to find organizational

principles Mathematical models: Probabilistic, graph theory Algorithms for analyzing graphs

What do we hope to achieve from models of networks? Patterns and statistical properties of network data Design principles and models Understand why networks are organized the way they

are (Predict behavior of networked systems)

9/27/2011 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 12

Page 13: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2011/slides/01-intro.pdf · Each such system can be represented as a network, that defines the

What do we study in networks? Structure and evolution: What is the structure of a network? Why and how did it became to have

such structure? Processes and dynamics: Networks provide “skeleton”

for spreading of information, behavior, diseases

9/27/2011 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 13

Page 14: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2011/slides/01-intro.pdf · Each such system can be represented as a network, that defines the

Age and size of networks 9/27/2011 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 14

Page 15: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2011/slides/01-intro.pdf · Each such system can be represented as a network, that defines the

9/27/2011 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 15

Page 16: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2011/slides/01-intro.pdf · Each such system can be represented as a network, that defines the

9/27/2011 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 16

Page 17: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2011/slides/01-intro.pdf · Each such system can be represented as a network, that defines the

Why is the role of networks expanding? Data availability Rise of the Web 2.0 and Social media

Universality Networks from various domains of science, nature,

and technology are more similar than one would expect

Shared vocabulary between fields Computer Science, Social science, Physics,

Economics, Statistics, Biology

9/27/2011 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 17

Page 18: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2011/slides/01-intro.pdf · Each such system can be represented as a network, that defines the

9/27/2011 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 18

Page 19: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2011/slides/01-intro.pdf · Each such system can be represented as a network, that defines the

Intelligence and fighting (cyber) terrorism

9/27/2011 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 19

Page 20: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2011/slides/01-intro.pdf · Each such system can be represented as a network, that defines the

Predicting epidemics

9/27/2011 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 20

Real Predicted

Page 21: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2011/slides/01-intro.pdf · Each such system can be represented as a network, that defines the

Interactions of human disease Drug design

9/27/2011 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 21

Page 22: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2011/slides/01-intro.pdf · Each such system can be represented as a network, that defines the

If you were to understand the spread of diseases, can you do it without networks?

If you were to understand the WWW

structure and information, hopeless without invoking the Web’s topology.

If you want to understand human diseases, it

is hopeless without considering the wiring diagram of the cell.

9/27/2011 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 22

Page 23: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2011/slides/01-intro.pdf · Each such system can be represented as a network, that defines the
Page 24: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2011/slides/01-intro.pdf · Each such system can be represented as a network, that defines the

Covers a wide range of network analysis techniques – from basic to state-of-the-art

You will learn about things you heard about: Six degrees of separation, small-world, page rank, network effects, P2P

networks, network evolution, spectral graph theory, virus propagation, link prediction, power-laws, scale free networks, core-periphery, network communities, hubs and authorities, bipartite cores, information cascades, influence maximization, …

Covers algorithms, theory and applications It’s going to be fun

9/27/2011 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 24

Page 25: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2011/slides/01-intro.pdf · Each such system can be represented as a network, that defines the

Good background in: Algorithms Graph theory Probability and Statistics Linear algebra

Programming: You should be able to write non-trivial programs

4 recitation sessions: 2 to review basic mathematical concepts 2 to review programming tools (SNAP, NetworkX)

9/27/2011 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 25

Page 26: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2011/slides/01-intro.pdf · Each such system can be represented as a network, that defines the

Course website: http://cs224w.stanford.edu

Slides posted at least 30 min before the class Required readings: Mostly chapters from Easley&Kleinberg book Papers

Optional readings: Papers and pointers to additional literature This will be very useful for reaction paper and

project proposal

9/27/2011 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 26

Page 27: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2011/slides/01-intro.pdf · Each such system can be represented as a network, that defines the

Recommended textbook: D. Easley, J. Kleinberg: Networks, Crowds,

and Markets: Reasoning About a Highly Connected World Freely available at:

http://www.cs.cornell.edu/home/kleinber/networks-book/

Optional books: Matthew Jackson: Social and Economic

Networks Mark Newman: Networks: An introduction

9/27/2011 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 27

Page 28: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2011/slides/01-intro.pdf · Each such system can be represented as a network, that defines the

Assignment Due on

Homework 1 October 13

Reaction paper October 20

Project proposal October 27

Homework 2 November 3

Competition November 10

Project milestone November 17

Project write-up December 11 (no late days!)

Project poster presentation

December 16 12:15-3;15pm

9/27/2011 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 28

Page 29: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2011/slides/01-intro.pdf · Each such system can be represented as a network, that defines the

Final grade will be composed of: 2 homeworks: 15% each Reaction paper: 10% Substantial class project: 60% Proposal: 15% Project milestone: 15% Final report: 60% Poster session: 10%

9/27/2011 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 29

Page 30: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2011/slides/01-intro.pdf · Each such system can be represented as a network, that defines the

Assignments (homeworks, write-ups, reports) take time. Start early!

How to submit? Paper: Box outside the class and in Gates basement We will grade on paper! You should also submit electronic copy: 1 PDF/ZIP file (writeups, experimental results, code) Submission website: http://www.stanford.edu/class/cs224w/submit/

SCPD: Only submit electronic copy & send us email 7 late days for the quarter: Max 4 late days per assignment

9/27/2011 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 30

Page 31: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2011/slides/01-intro.pdf · Each such system can be represented as a network, that defines the

Substantial course project: Experimental evaluation of algorithms and models

on an interesting network dataset A theoretical project that considers a model, an

algorithm and derives a rigorous result about it An in-depth critical survey of one of the course

topics and offering a novel perspective on the area Performed in groups of (exactly) 3 students

Project is the main work for the class

9/27/2011 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 31

Page 32: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2011/slides/01-intro.pdf · Each such system can be represented as a network, that defines the

9/27/2011 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 32

Borja Peleato (head TA)

Evan Rosen

Chenguang Zhu

Dakan Wang

Page 33: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2011/slides/01-intro.pdf · Each such system can be represented as a network, that defines the

Piazza Q&A website: http://piazza.com/stanford/fall2011/cs224w If you don’t have @stanford.edu email address, send us

email and we will register you to Piazza

For e-mailing course staff, always use: [email protected]

For course announcements subscribe to: [email protected]

9/27/2011 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 33

Page 34: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2011/slides/01-intro.pdf · Each such system can be represented as a network, that defines the

You are welcome to sit-in and audit the class Please send us email saying that you will be

auditing the class

To receive announcements, subscribe to the mailing list: [email protected]

9/27/2011 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 34

Page 35: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2011/slides/01-intro.pdf · Each such system can be represented as a network, that defines the
Page 36: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2011/slides/01-intro.pdf · Each such system can be represented as a network, that defines the

Network is a collection of objects where some pairs of objects are connected by links What is the structure of the network?

9/27/2011 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 36

Page 37: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2011/slides/01-intro.pdf · Each such system can be represented as a network, that defines the

Objects: nodes, vertices N Interactions: links, edges E System: network, graph G(N,E)

9/27/2011 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 37

Page 38: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2011/slides/01-intro.pdf · Each such system can be represented as a network, that defines the

Network often refers to real systems Web, Social network, Metabolic network Language: Network, node, link

Graph: mathematical representation of a network Web graph, Social graph (a Facebook term) Language: Graph, vertex, edge

We will try to make this distinction whenever it is appropriate, but in most cases we will use the two terms interchangeably

9/27/2011 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 38

Page 39: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2011/slides/01-intro.pdf · Each such system can be represented as a network, that defines the

9/27/2011 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 39

Peter Mary

Albert

Albert

co-worker

friend brothers

friend

Protein 1 Protein 2 Protein 5

Protein 9

Movie 1

Movie 3 Movie 2

Actor 3

Actor 1 Actor 2

Actor 4

|N|=4 |E|=4

Page 40: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2011/slides/01-intro.pdf · Each such system can be represented as a network, that defines the

Choice of the proper network representation determines our ability to use networks successfully: In some cases there is a unique, unambiguous

representation In other cases, the representation is by no means

unique The way you assign links will determine the nature

of the question you can study

9/27/2011 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 40

Page 41: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2011/slides/01-intro.pdf · Each such system can be represented as a network, that defines the

If you connect individuals that work with each other, you will explore a professional network

If you connect those that have a sexual relationship, you will be exploring sexual networks

If you connect scientific papers that cite each other, you will be studying the citation network

If you connect all papers with the same word in the title, you will be exploring what? It is a network, nevertheless 9/27/2011 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 41

Page 42: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2011/slides/01-intro.pdf · Each such system can be represented as a network, that defines the

Undirected Links: undirected

(symmetrical)

Undirected links: Collaborations Friendship on Facebook

Directed Links: directed

(arcs)

Directed links: Phone calls Following on Twitter

9/27/2011 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 42

A

B

D

C

L

M F

G

H

I

A G

F

B C

D

E

Page 43: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2011/slides/01-intro.pdf · Each such system can be represented as a network, that defines the

Bridge edge: If we erase it, the graph becomes disconnected. Articulation point: If we erase it, the graph becomes disconnected.

Largest Component: Giant Component Isolated node (F)

Connected (undirected) graph: Any two vertices can be joined by a path.

A disconnected graph is made up by two or more connected components

D C

A

B

F

F

G

D C

A

B

F

F

G

9/27/2011 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 43

Page 44: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2011/slides/01-intro.pdf · Each such system can be represented as a network, that defines the

E

C

A

B

G

F

D

Strongly connected directed graph has a path from each node to every other node

and vice versa (e.g., A-B path and B-A path) Weakly connected directed graph is connected if we disregard the edge directions

Graph on the left is not strongly connected.

9/27/2011 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 44

Page 45: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2011/slides/01-intro.pdf · Each such system can be represented as a network, that defines the

Q: What does Web “look like” at a global level? Web as a graph: Nodes = pages Edges = hyperlinks

What is a node? Problems: Dynamic pages created on the fly “dark matter” – inaccessible

database generated pages

9/27/2011 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 45

Page 46: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2011/slides/01-intro.pdf · Each such system can be represented as a network, that defines the

9/27/2011 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 46

I teach a class on

Networks. CS224W:

Classes are in the

Computer Science

building Computer

Science Department at Stanford

Stanford University

Page 47: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2011/slides/01-intro.pdf · Each such system can be represented as a network, that defines the

In early days of the Web links were navigational Today many links are transactional

9/27/2011 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 47

I teach a class on

Networks. CS224W:

Classes are in the

Computer Science

building Computer

Science Department at Stanford

Stanford University

Page 48: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2011/slides/01-intro.pdf · Each such system can be represented as a network, that defines the

9/27/2011 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 48

Citations References in an Encyclopedia

Page 49: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2011/slides/01-intro.pdf · Each such system can be represented as a network, that defines the

9/27/2011 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 49

Page 50: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2011/slides/01-intro.pdf · Each such system can be represented as a network, that defines the

How is the Web linked? What is the “map” of the Web?

Web as a directed graph [Broder et al. 2000]: Given node v, what can v reach? What other nodes can reach v?

9/27/2011 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 50

In(v) = {w | w can reach v} Out(v) = {w | v can reach w}

E

C

A

B

G

F

D

In(A) = {B,C,E,G} Out(A)={B,C,D,F}

Page 51: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2011/slides/01-intro.pdf · Each such system can be represented as a network, that defines the

Two types of directed graphs: Strongly connected: Any node can reach any node

via a directed path In(A)=Out(A)={A,B,C,D,E}

DAG – Directed Acyclic Graph: Has no cycles: if u can reach v,

then v can not reach u

Any directed graph can be expressed in terms of these two types

9/27/2011 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 51

E

C

A

B

D

E

C

A

B

D

Page 52: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2011/slides/01-intro.pdf · Each such system can be represented as a network, that defines the

Strongly connected component (SCC) is a set of nodes S so that: Every pair of nodes in S can reach each other There is no larger set containing S with this

property

9/27/2011 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 52

E

C

A

B

G

F

D

Strongly connected components of the graph: {A,B,C,G}, {D}, {E}, {F}

Page 53: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2011/slides/01-intro.pdf · Each such system can be represented as a network, that defines the

Fact: Every directed graph is a DAG on its SCCs (1) SCCs partitions the nodes of G Each node is in exactly one SCC

(2) If we build a graph G’ whose nodes are SCCs, and with an edge between nodes of G’ if there is an edge between corresponding SCCs in G, then G’ is a DAG

9/27/2011 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 53

E

C

A

B

G

F

D

(1) Strongly connected components of graph G: {A,B,C,G}, {D}, {E}, {F}

(2) G’ is a DAG:

G

G’ {A,B,C,G}

{E}

{D}

{F}

Page 54: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2011/slides/01-intro.pdf · Each such system can be represented as a network, that defines the

Why is (1) true? SCCs partitions the nodes of G. Suppose node v is a member of 2 SCCs S and S’. Then S∪S’ is one large SCC:

Why is (2) true? G’ (graph of SCCs) is a DAG If G’ is not a DAG, then we have a

directed cycle. Now all nodes on the cycle are

mutually reachable, and all are part of the same SCC.

9/27/2011 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 54

v S S’

{A,B,C,G}

{E}

{D}

{F}

Now {A,B,C,G,E,F} is a SCC

G’

Page 55: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2011/slides/01-intro.pdf · Each such system can be represented as a network, that defines the

Take a large snapshot of the Web and try to understand how it’s SCCs “fit together” as a DAG

Computational issue: Want to find SCC containing node v? Observation: Out(v) … nodes that can be reached from v SCC containing v is: Out(v) ∩ In(v) = Out(v,G) ∩ Out(v,G), where G is G with all edge directions flipped

9/27/2011 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 55

v

Out(v)

Page 56: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2011/slides/01-intro.pdf · Each such system can be represented as a network, that defines the

There is a giant SCC There won’t be 2 giant SCCs: Just takes 1 page from one SCC to link to the

other SCC If the components have millions of pages the

likelihood of this is very large

9/27/2011 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 56

[Broder et al., ‘00]

Giant SCC1 Giant SCC2

Page 57: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2011/slides/01-intro.pdf · Each such system can be represented as a network, that defines the

250 million pages, 1.5 billion links 9/27/2011 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 57

[Broder et al., ‘00]

Page 58: CS224W: Social and Information Network Analysis Jure ...snap.stanford.edu/class/cs224w-2011/slides/01-intro.pdf · Each such system can be represented as a network, that defines the

Learn: Some conceptual organization of the Web (i.e., bowtie)

Not learn: Treats all pages as equal Google’s homepage == my homepage

What are the most important pages How many pages have k in-links as a function of k? The degree distribution: ~1/k2+ε

Link analysis ranking -- as done by search engines (PageRank) Internal structure inside giant SCC Clusters, implicit communities?

How far apart are nodes in the giant SCC: Distance = # of edges in shortest path Avg = 16 [Broder et al.]

9/27/2011 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis 58