stanford university  · universal language for describing complex data we are surrounded by...

30
CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University http://cs224w.stanford.edu

Upload: others

Post on 16-Oct-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Stanford University  · Universal language for describing complex data We are surrounded by hopelessly complex systems Societyis a collection of six billion individuals Communication

CS224W: Social and Information Network AnalysisJure Leskovec, Stanford University

http://cs224w.stanford.edu

Page 2: Stanford University  · Universal language for describing complex data We are surrounded by hopelessly complex systems Societyis a collection of six billion individuals Communication

12/5/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 2

How do we reason about networks?

Page 3: Stanford University  · Universal language for describing complex data We are surrounded by hopelessly complex systems Societyis a collection of six billion individuals Communication

How do we reason about networks? Empirical: Study network data to find organizational principles Mathematical models: Probabilistic, graph theory Algorithms for analyzing graphs

What do we hope to achieve from models of networks? Patterns and statistical properties of network data Design principles and models Understand why networks are organized the way they are (Predict behavior of networked systems)

12/5/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 3

Page 4: Stanford University  · Universal language for describing complex data We are surrounded by hopelessly complex systems Societyis a collection of six billion individuals Communication

What do we study in networks? Structure and evolution: What is the structure of a network? Why and how did it became to have such structure?

Processes and dynamics: Networks provide “skeleton”for spreading of information,behavior, diseases

12/5/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 4

Page 5: Stanford University  · Universal language for describing complex data We are surrounded by hopelessly complex systems Societyis a collection of six billion individuals Communication

Network diameter Edge clustering Scale‐free networks Strength of weak ties Core‐periphery structure Densification power law Shrinking diameters Structural Balance Status Theory Memetracking Small‐world model Erdös‐Renyi model Preferential attachment  Network cascades

Independent cascade model Decentralized search PageRank Hubs and authorities Girvan‐Newman Modularity Clique percolation Supervised random walks Influence maximization Outbreak detection Linear Influence Model Network Inference Kronecker Graphs Bow‐tie structure

12/5/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 5

Page 6: Stanford University  · Universal language for describing complex data We are surrounded by hopelessly complex systems Societyis a collection of six billion individuals Communication

12/5/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 6

ObservationsObservations

Small diameter, Edge clusteringSmall diameter, Edge clustering

Scale‐free

Strength of weak ties, Core‐periphery

Densification power law,Shrinking diameters

Patterns of signed edge creation

Viral Marketing, Blogosphere, Memetracking

ModelsModels

Small‐world model,Erdös‐Renyi model

Preferential attachment, Copying model

Kronecker Graphs

Microscopic model of evolving networks

Structural balance, Theory of status

Independent cascade model, Game theoretic model

AlgorithmsAlgorithms

Decentralized search

PageRank, Hubs and authorities

Community detection: Girvan‐Newman, Modularity

Link prediction,Supervised random walks

Models for predicting edge signs

Influence maximization, Outbreak detection, LIM

Page 7: Stanford University  · Universal language for describing complex data We are surrounded by hopelessly complex systems Societyis a collection of six billion individuals Communication

Observations: Six degrees of separation Networks have small diameters

Edges in the networks cluster Clustering coefficient

Models: Erdös‐Renyi model Baseline model for networks

The Small‐World model Small diameter and clustered edges

Algorithms: Decentralized search in networks Kleinberg’s model and algorithm

12/5/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 7

Ci=1/3

→ ~ ,

Page 8: Stanford University  · Universal language for describing complex data We are surrounded by hopelessly complex systems Societyis a collection of six billion individuals Communication

Observations: Power‐law degrees Degrees are heavily skewed

Network resilience Networks are resilient to random attacks

Models: Preferential attachment Rich get richer

Algorithms: Hubs and Authorities Recursive:  ∑ → ,  ∑ →

PageRank Recursive formulation, Random jumps

12/5/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 8

Page 9: Stanford University  · Universal language for describing complex data We are surrounded by hopelessly complex systems Societyis a collection of six billion individuals Communication

Observations: Strength of weak ties Core‐periphery structure

Models: Kronecker graphs model

Algorithms: Girvan‐Newman (Betweeness centrality) Modularity optimization #edges within group – E[#edges within group] Clique Percolation Method Overlapping communities

12/5/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 9

Page 10: Stanford University  · Universal language for describing complex data We are surrounded by hopelessly complex systems Societyis a collection of six billion individuals Communication

Observations: Densification Power Law

Shrinking Diameter Models: Microscopic Network Evolution Exponential life‐times, Evolving sleeping times Random‐Random edge attachment

Algorithms: Link prediction Supervised Random Walks

12/5/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 10

a=1.2

1st edgeof node i

last edgeof node i

Edge creation events

ssss

Page 11: Stanford University  · Universal language for describing complex data We are surrounded by hopelessly complex systems Societyis a collection of six billion individuals Communication

Observations: Signed link creation +links are more embedded

Models: Structural Balance Coalition structure of networks

Status Theory Global node status ordering

Algorithms: Predicting edge signs

12/5/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 11

BA

X ++

‐ ‐+

+ +‐

‐ ‐‐

+ ++

Balanced Unbalanced

+ +‐

3

1

2u v

‐+++

‐‐+‐

Page 12: Stanford University  · Universal language for describing complex data We are surrounded by hopelessly complex systems Societyis a collection of six billion individuals Communication

Observations: Tracking contagions Viral Marketing Hyperlinks

Models – Decision Based Game theoretic model: Payoffs, Competing products

12/5/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 12

wA B

Page 13: Stanford University  · Universal language for describing complex data We are surrounded by hopelessly complex systems Societyis a collection of six billion individuals Communication

Models – Probabilistic Independent Cascade Model Each node infects a neighborwith some probability

Algorithms: Influence Maximization Set of k nodes producing largest expected cascade size if activated Submodularity Greedy hill‐climbing Outbreak Detection

12/5/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 13

0.4

0.4

0.4

0.4 0.2

0.2

0.2

0.4 0.30.3

0.3

0.3

0.30.3

0.2

c

be

a d

g

fh

i

Influence set of b

Influence set of a

Page 14: Stanford University  · Universal language for describing complex data We are surrounded by hopelessly complex systems Societyis a collection of six billion individuals Communication

Observations: Meme‐tracking Blogs trail mass media

Models/Algorithms: Linear Influence Model Predict information popularitybased on influence functions

12/5/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 14

Volu

me

Iv

Iw

Iu

tu tv tw

Page 15: Stanford University  · Universal language for describing complex data We are surrounded by hopelessly complex systems Societyis a collection of six billion individuals Communication

12/5/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 15

ObservationsObservations

Small diameter, Edge clusteringSmall diameter, Edge clustering

Scale‐free

Strength of weak ties, Core‐periphery

Densification power law,Shrinking diameters

Patterns of signed edge creation

Viral Marketing, Blogosphere, Memetracking

ModelsModels

Small‐world model,Erdös‐Renyi model

Preferential attachment, Copying model

Kronecker Graphs

Microscopic model of evolving networks

Structural balance, Theory of status

Independent cascade model, Game theoretic model

AlgorithmsAlgorithms

Decentralized search

PageRank, Hubs and authorities

Community detection: Girvan‐Newman, Modularity

Link prediction,Supervised random walks

Models for predicting edge signs

Influence maximization, Outbreak detection, LIM

Page 16: Stanford University  · Universal language for describing complex data We are surrounded by hopelessly complex systems Societyis a collection of six billion individuals Communication

12/5/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 16

Page 17: Stanford University  · Universal language for describing complex data We are surrounded by hopelessly complex systems Societyis a collection of six billion individuals Communication

Link prediction Suggest friends in networks

Trust and distrust Predict who are your friends/foes, who you trust

Community detection Find clusters and communities in social networks

12/5/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 17

Page 18: Stanford University  · Universal language for describing complex data We are surrounded by hopelessly complex systems Societyis a collection of six billion individuals Communication

Marketing and advertising Finding influencers Tracing information flows

Diffusion of information How to trace information asit spreads How to efficiently detect epidemics and information outbreaks

12/5/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 18

Page 19: Stanford University  · Universal language for describing complex data We are surrounded by hopelessly complex systems Societyis a collection of six billion individuals Communication

Intelligence and fighting (cyber) terrorism

12/5/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 19

Page 20: Stanford University  · Universal language for describing complex data We are surrounded by hopelessly complex systems Societyis a collection of six billion individuals Communication

Predicting epidemics

12/5/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 20

Real Predicted

Page 21: Stanford University  · Universal language for describing complex data We are surrounded by hopelessly complex systems Societyis a collection of six billion individuals Communication

Interactions of human disease Drug design

12/5/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 21

Page 22: Stanford University  · Universal language for describing complex data We are surrounded by hopelessly complex systems Societyis a collection of six billion individuals Communication

12/5/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 22

ObservationsObservations

Small diameter, Edge clusteringSmall diameter, Edge clustering

Scale‐free

Strength of weak ties, Core‐periphery

Densification power law,

Shrinking diameters

Patterns of signed edge creation

Viral Marketing, Blogosphere, Memetracking

ModelsModels

Small‐world model,Erdös‐Renyi model

Preferential attachment, Copying 

model

Kronecker Graphs

Microscopic model of evolving networks

Structural balance, Theory of status

Independent cascade model, Game theoretic 

model

AlgorithmsAlgorithms

Decentralized search

PageRank, Hubs and authorities

Community detection: Girvan‐Newman, 

Modularity

Link prediction,Supervised random 

walks

Models for predicting edge signs

Influence maximization, Outbreak detection, 

LIM

Page 23: Stanford University  · Universal language for describing complex data We are surrounded by hopelessly complex systems Societyis a collection of six billion individuals Communication

Availability of network data: Web & Social media: a “telescope” into humanity

Task: Find patterns, rules, clusters, … … in large static and evolving graphs … in processes spreading over the networks

Goal: Predict/anticipate future behaviors Detect outliers Design novel applications

12/5/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 23

Page 24: Stanford University  · Universal language for describing complex data We are surrounded by hopelessly complex systems Societyis a collection of six billion individuals Communication

Universal language for describing complex data We are surrounded by hopelessly complex systems Society is a collection of six billion individuals Communication systems link electronic devices Information and knowledge is organized and linked

Networks from various domains of science, nature, and technology are more similar than expected

Shared vocabulary between fields Computer Science, Social science, Physics, Economics, Statistics, Biology

12/5/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 24

Page 25: Stanford University  · Universal language for describing complex data We are surrounded by hopelessly complex systems Societyis a collection of six billion individuals Communication

Project writeups Due Monday Dec 10 at 11:59AM pacific time Due just before the poster session NO LATE DAYS!

Poster session Monday Dec 10 from 12:15 ‐ 3:15 in Packard Atrium All groups which have at least one non‐SCPD member are expected to present Prepare a 3‐minute elevator pitch of your poster One group member should be at the poster at all times, but the goal of this is to give you a chance to see what your classmates have been working on, so make sure to explore There will be snacks!

12/5/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 25

Page 26: Stanford University  · Universal language for describing complex data We are surrounded by hopelessly complex systems Societyis a collection of six billion individuals Communication

Seminars: RAIN Seminar: http://rain.stanford.edu InfoSeminar: http://i.stanford.edu/infoseminar

Conferences: WWW: ACM World Wide Web Conference WSDM: ACM Web search and Data Mining ICWSM: AAAI Int. Conference on Web‐blogs and Social Media KDD: ACM Conference on Knowledge Discovery and Data Mining

12/5/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 26

Page 27: Stanford University  · Universal language for describing complex data We are surrounded by hopelessly complex systems Societyis a collection of six billion individuals Communication

CS246:Mining Massive Datasets (Winter 2013) Data Mining & Machine Learning for big data  (big=does’ fit in memory/single machine) MapReduce, Hadoop and similar

CS341: Project in Data Mining (Spring 2013) Do a research project on big data Groups of 3 students We provide interesting data, projects and unlimited access to the Amazon computing infrastructure Nice way to finish up your class project & publish it!

12/5/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 27

Page 28: Stanford University  · Universal language for describing complex data We are surrounded by hopelessly complex systems Societyis a collection of six billion individuals Communication

Other relevant courses: CS276: Information Retrieval and Web Search CS229: Machine Learning CS245: Database System Principles CS347: Transaction Processing and Distributed Databases CS448g: Interactive Data Analysis

12/5/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 28

Page 29: Stanford University  · Universal language for describing complex data We are surrounded by hopelessly complex systems Societyis a collection of six billion individuals Communication

You Have Done a Lot!!! And (hopefully) learned a lot!!! Answered questions and proved many interesting results Implemented a number of methods And did excellently on the final project!

12/5/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 29

Thank You for theHard Work!!!

Page 30: Stanford University  · Universal language for describing complex data We are surrounded by hopelessly complex systems Societyis a collection of six billion individuals Communication

30Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu12/5/2012