CSE 701: Fast Algorithms for Graph Analytics
A. Erdem Sariyuce
sariyuce.com/sem/firstclass.pdf

Post on 21-Jun-2020


Who am I?
• My name is Erdem
• Office: 323 Davis Hall
• Office hours: Tuesday 10-12
• Research on graph (network) mining & management
  • Practical algorithms
  • Streaming, distributed, parallel
  • Leverage the characteristics of real-world data
  • Fast graph analytics

Heard About Big Data?
• Yes, I do that
  • For graphs, mostly
• Not only big, but also
  • Dynamic
  • Incomplete
  • Noisy
  • Distributed

Graphs are everywhere
[Figure: example networks: social, information, protein-interaction, routers]

What’s this class about?
• Mining graphs to get hidden insights
  • By finding patterns in complex structure
  • New models and algorithms
• On large data that cannot be examined manually
  • Computationally challenging
  • Fast algorithms needed
• On dynamic, incomplete, noisy data

What’s this class about?
• We will cover a range of topics:
  • Structure of real-world networks (graphs)
    • Small-world
    • Community structure
    • …
  • Practical algorithms for fast graph analytics
    • Centrality computation
    • Community detection
    • Graph partitioning
    • …

You?
• MS or PhD? What year?

• Any particular objective in this course?

• Any research interests?

Course Structure
• Presentations
• Questions before class
• Discussion in class
• Literature survey (or one additional presentation)
  • If taking 3 credits

Presentations
• Two papers each week
• The paper list is available on the course website
  • http://sariyuce.com/seminar.html
• Each paper will be presented by a single student
  • No groups
• Each student will present one paper (more on this later)

Before class
• Questions on Piazza before each class (everyone except the presenter)
  • Everyone will read the papers!
  • I’ll post some guides on how to read a paper
• Questions are due Tuesday night at 11:59 pm
  • Open-ended
  • Thought-provoking
  • Unique
  • ‘What does Fig. 4 tell us?’ is NOT a question

In class: Presentation
• For each paper, a presentation of at least 1 hour
  • It will be highly interactive, with questions from me and others
  • You can find slides online and ask the authors if needed
    • But don’t rely on those too much!
    • Conference presentations are only 15-20 minutes long
• Citation analysis for at least 10 minutes
  • References: Which papers are cited in this paper?
  • Cited by: Which papers have cited this paper?
    • Google Scholar, Microsoft Academic Search
• You can get feedback on your slides/talk before the class!
  • Ask in good time

In class: Citation Analysis
• References: Which papers are cited in this paper?
  • Briefly explain 5 references that form the basis of the paper
• Cited by: Which papers have cited this paper?
  • Google Scholar: https://scholar.google.com/
  • Microsoft Academic Search: https://academic.microsoft.com/
  • Check the ones at top venues
    • SIGKDD, WWW, WSDM, VLDB, SIGMOD, Nature, Science …
  • Check the ones that got the most citations
    • Microsoft Academic Search has that
  • Briefly explain 5 of those: what’s new there?

In class: Discussion
• The presenter will read the posted questions
  • From Piazza
• And initiate the discussion
  • Give his/her opinion; others should chip in as well
• Class participation points will be earned here
• I might force you to be a volunteer :)
• Each class is 150 minutes, so we have plenty of time

Grading is S/U
• This is a seminar class! 75% is needed for an S
• 1 or 2 credits:
  • Paper presentation: 40%
  • Piazza questions: 30%
  • Class participation: 30%
• 3 credits:
  • Paper presentation: 40%
  • Piazza questions: 20%
  • Class participation: 20%
  • Literature survey (individual): 20%
    • Or one additional presentation (we have 26 papers in total)
    • 16 students in total; 8 with 3 credits
    • Students can do an extra presentation to waive the literature survey
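The grading rule is a weighted average compared against a 75% cut-off. A minimal sketch, assuming each component is scored as a fraction between 0 and 1 (the slides do not say how components are scored); `final_grade` is a hypothetical helper, and the extra-presentation waiver is not modelled.

```python
# Illustrative sketch only: weights copied from the grading slide; component
# scores are ASSUMED to be fractions in [0, 1]. The waiver option is not modelled.
WEIGHTS = {
    "1-2 credits": {"presentation": 0.40, "piazza": 0.30, "participation": 0.30},
    "3 credits": {"presentation": 0.40, "piazza": 0.20,
                  "participation": 0.20, "survey": 0.20},
}
S_THRESHOLD = 0.75  # "75% is needed for an S"


def final_grade(credits, scores):
    """Return (weighted total, 'S' or 'U') for a hypothetical set of component scores."""
    plan = WEIGHTS["3 credits"] if credits == 3 else WEIGHTS["1-2 credits"]
    total = sum(weight * scores[name] for name, weight in plan.items())
    return total, ("S" if total >= S_THRESHOLD else "U")
```

For example, a 2-credit student with 90% on the presentation, 80% on Piazza questions and 60% on participation ends up at 78%, which clears the S threshold.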

Literature Survey
• On a particular subject
  • Find, read, and summarize/categorize the previous work
• Talk to me about the topic
• A report is required by the end of the semester
  • Progress update in the 6th or 7th week; I will let you know
• If done well:
  • We can go for a paper!
  • Survey papers are often highly cited and can be quite impactful

Paper List and Schedule
• We will decide on Piazza
  • I’ll post it at the end of the class
• First come, first served; be quick!

Papers
• Sep 11: Four Degrees of Separation
  • Web Science Conference, 2012
  • Cited by 500+
  • Small-world phenomena
  • Analysis of the entire Facebook network!
• Sep 11: Graph structure in the Web
  • Computer Networks, 2000
  • Cited by 3900+
  • Characterizes the Web
  • Bow-tie structure
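The "degrees of separation" measured in the first paper is the average shortest-path distance between pairs of users. On a small undirected graph it can be computed exactly with one BFS per node, as in this minimal sketch (adjacency given as a dict of neighbour lists; function names are hypothetical). At Facebook scale this is infeasible, and the paper relies on approximate neighbourhood-function techniques (HyperANF) instead.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source to every reachable node (unweighted BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def average_distance(adj):
    """Mean shortest-path length over ordered pairs of distinct, mutually reachable nodes."""
    total = pairs = 0
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0


path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(average_distance(path))  # 20 / 12 ≈ 1.667
```

Unreachable pairs are simply skipped here; how to treat them is a modelling choice the sketch does not try to settle.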

Papers
• Sep 18: Statistical properties of community structure in large social and information networks
  • World Wide Web Conference (WWW), 2008
  • Cited by 860+
  • Detailed analysis of real communities in a variety of domains
  • Interesting conclusions on community size
• Sep 18: Uncovering the overlapping community structure of complex networks in nature and society
  • Nature, 2005
  • Cited by 5000+
  • Overlapping communities
  • Clique-based formulation
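The clique-based formulation of the Nature paper is k-clique percolation: two k-cliques are adjacent when they share k-1 nodes, and a community is the union of a maximal chain of adjacent k-cliques, so a node can sit in several communities. A minimal sketch restricted to k = 3 (triangles that share an edge), assuming a dict-of-sets adjacency with comparable node ids; the function names are hypothetical.

```python
from itertools import combinations


def triangles(adj):
    """Each triangle once, as a sorted tuple (u < v < w); adj maps node -> set of neighbours."""
    found = []
    for u in adj:
        higher = sorted(n for n in adj[u] if n > u)
        for v, w in combinations(higher, 2):
            if w in adj[v]:
                found.append((u, v, w))
    return found


def clique_percolation_k3(adj):
    """Communities = unions of triangles linked through shared edges (k = 3 only)."""
    tris = triangles(adj)
    parent = list(range(len(tris)))  # union-find over triangle indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    owner = {}  # edge -> index of the first triangle seen containing it
    for idx, (a, b, c) in enumerate(tris):
        for edge in ((a, b), (a, c), (b, c)):
            if edge in owner:
                parent[find(idx)] = find(owner[edge])  # triangles share k - 1 = 2 nodes
            else:
                owner[edge] = idx

    groups = {}
    for idx, tri in enumerate(tris):
        groups.setdefault(find(idx), set()).update(tri)
    return sorted(groups.values(), key=sorted)
```

On a graph where triangles {0, 1, 2} and {0, 2, 3} share edge (0, 2) while triangle {3, 4, 5} touches them only at node 3, this returns two communities, and node 3 belongs to both: the overlap the paper is about.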

Papers
• Sep 25: Authoritative Sources in a Hyperlinked Environment
  • Journal of the ACM, 1999
  • Cited by 9400+
  • Hubs and authorities
  • One of the most influential works
• Sep 25: Graphs over time: densification laws, shrinking diameters and possible explanations
  • SIGKDD, 2005
  • Cited by 2200+
  • Best paper in ’05, Test-of-time award in ’16
  • First work on graph evolution
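Hubs and authorities reinforce each other: a good hub links to good authorities, and a good authority is linked from good hubs. A minimal power-iteration sketch of these mutual updates, assuming the link graph is a dict from page to the pages it links to; the paper first builds a focused subgraph from search results for a query, which this sketch skips, and `hits` is a hypothetical name.

```python
import math


def hits(out_links, iterations=50):
    """Hub and authority scores by power iteration; out_links maps page -> pages it links to."""
    nodes = set(out_links) | {v for targets in out_links.values() for v in targets}
    hub = dict.fromkeys(nodes, 1.0)
    auth = dict.fromkeys(nodes, 1.0)
    for _ in range(iterations):
        # Authority of v: sum of hub scores of the pages linking to v.
        auth = dict.fromkeys(nodes, 0.0)
        for u, targets in out_links.items():
            for v in targets:
                auth[v] += hub[u]
        # Hub of u: sum of authority scores of the pages u links to.
        hub = dict.fromkeys(nodes, 0.0)
        for u, targets in out_links.items():
            hub[u] = sum(auth[v] for v in targets)
        # L2-normalise both vectors so the values stay bounded.
        for scores in (auth, hub):
            norm = math.sqrt(sum(x * x for x in scores.values())) or 1.0
            for n in scores:
                scores[n] /= norm
    return hub, auth
```

Pages with no in-links get authority 0, and pages with no out-links get hub score 0, which matches the intuition that the two roles are separate.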

Papers
• Oct 2: The Link Prediction Problem for Social Networks
  • JASIST, 2007
  • Cited by 9400+
  • Predicting which new links will form, using network proximity measures
  • One of the most influential works
• Oct 2: Simplicial closure and higher-order link prediction
  • PNAS, 2018
  • Beyond pairwise: higher-order relationships
  • Adapting triangle closure
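The link prediction paper compares proximity scores for unlinked pairs, including common neighbours and Adamic/Adar (which down-weights high-degree shared neighbours). A minimal sketch of ranking candidate links under either score, assuming a dict-of-sets adjacency with comparable node ids; the function names are hypothetical.

```python
import math


def common_neighbors(adj, u, v):
    return len(adj[u] & adj[v])


def adamic_adar(adj, u, v):
    # Rare shared neighbours count more; any shared neighbour has degree >= 2, so log > 0.
    return sum(1.0 / math.log(len(adj[z])) for z in adj[u] & adj[v])


def rank_non_edges(adj, score):
    """Score every unlinked pair (u < v) and sort best-first (ties broken by node ids)."""
    nodes = sorted(adj)
    candidates = []
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            if v not in adj[u]:
                candidates.append((score(adj, u, v), u, v))
    candidates.sort(key=lambda t: (-t[0], t[1], t[2]))
    return candidates
```

Passing the scoring function as a parameter keeps the ranking loop separate from the proximity measure, so other scores can be swapped in without touching it.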

Papers
• Oct 9: A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs
  • SIAM Journal on Scientific Computing (SISC), 1998
  • Cited by 4800+, widely used
  • Ground-breaking work on graph partitioning
  • Efficient multilevel heuristics
• Oct 9: Experimental Analysis of Streaming Algorithms for Graph Partitioning
  • SIGMOD, 2019
  • Streaming graph partitioning
  • Comparative survey
  • Very important for production-level deployments
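Streaming partitioners see each vertex once, with its neighbour list, and must place it immediately. One classic heuristic of this kind is Linear Deterministic Greedy (LDG, Stanton and Kliot, 2012). It is sketched below purely as an illustration, not as the exact variant benchmarked in the SIGMOD paper. The per-part `capacity` and the tie-breaking rule are assumptions of this sketch.

```python
def ldg_partition(stream, k, capacity):
    """Linear Deterministic Greedy: place each arriving vertex in one of k parts of bounded size.

    stream yields (vertex, neighbours) pairs in arrival order. Illustrative sketch only.
    """
    sizes = [0] * k
    assignment = {}
    for v, neighbours in stream:
        best, best_key = None, None
        for i in range(k):
            if sizes[i] >= capacity:
                continue  # part is full
            already_there = sum(1 for n in neighbours if assignment.get(n) == i)
            score = already_there * (1 - sizes[i] / capacity)  # penalise fuller parts
            key = (score, -sizes[i])  # ties go to the emptier part, then the lower index
            if best_key is None or key > best_key:
                best, best_key = i, key
        if best is None:
            raise ValueError("all parts are full")
        sizes[best] += 1
        assignment[v] = best
    return assignment


def edge_cut(edges, assignment):
    """Number of edges whose endpoints landed in different parts."""
    return sum(1 for u, v in edges if assignment[u] != assignment[v])
```

The multiplicative penalty trades locality (neighbours already placed in a part) against balance; multilevel schemes like the 1998 paper instead see the whole graph at once.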

Papers
• Oct 16: Incremental k-core decomposition: algorithms and evaluation
  • Very Large Data Bases Journal (VLDBJ), 2016
  • Maintaining graph analytics for streaming graphs
  • k-core decomposition is a fundamental operation
  • Density pointers
• Oct 16: A Fast Order-Based Approach for Core Maintenance
  • ICDE, 2017
  • Better performance than the paper above
  • Additional indexing
  • Runtime vs. space trade-off
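The k-core is the maximal subgraph in which every vertex has at least k neighbours, and a vertex's core number is the largest such k. Both Oct 16 papers maintain core numbers incrementally as edges change. The from-scratch peeling they avoid re-running can be sketched as follows (dict-of-sets adjacency assumed; `core_numbers` is a hypothetical name).

```python
def core_numbers(adj):
    """Core number of every vertex by repeatedly peeling a minimum-degree vertex.

    adj maps vertex -> set of neighbours (undirected). This version scans for the
    minimum, O(n^2 + m); bucket queues bring the same peeling down to O(n + m).
    """
    degree = {v: len(adj[v]) for v in adj}
    remaining = set(adj)
    core, k = {}, 0
    while remaining:
        v = min(remaining, key=degree.__getitem__)
        k = max(k, degree[v])  # the current k never decreases during peeling
        core[v] = k
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                degree[u] -= 1
    return core
```

A single edge insertion can change core numbers only locally, which is what makes incremental maintenance much cheaper than rerunning this loop.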

Papers
• Oct 23: Maximizing the Spread of Influence through a Social Network
  • SIGKDD, 2003
  • Cited by 6300+
  • Formalizes viral marketing
  • Very influential
• Oct 23: Influence Maximization on Social Graphs: A Survey
  • TKDE, 2018
  • Comprehensive survey
  • Covers the follow-up work since the paper above
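The SIGKDD 2003 paper shows that under models such as Independent Cascade the expected spread is monotone and submodular, so greedily adding the seed with the largest marginal gain achieves a (1 - 1/e) approximation. A minimal Monte Carlo sketch, assuming one uniform activation probability `p` on every edge and a fixed number of simulation runs (both assumptions of this sketch); the function names are hypothetical.

```python
import random


def simulate_ic(out_adj, seeds, p, rng):
    """One Independent Cascade run with uniform activation probability p; returns #active nodes."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in out_adj.get(u, ()):
                # each newly active u gets one chance to activate each inactive out-neighbour
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return len(active)


def estimate_spread(out_adj, seeds, p, runs, seed):
    rng = random.Random(seed)  # same seed for every candidate: common random numbers
    return sum(simulate_ic(out_adj, seeds, p, rng) for _ in range(runs)) / runs


def greedy_influence_max(out_adj, k, p=0.1, runs=200, seed=0):
    """Greedily add the node with the largest estimated spread, k times."""
    nodes = set(out_adj) | {v for vs in out_adj.values() for v in vs}
    chosen = []
    for _ in range(k):
        candidates = sorted(nodes - set(chosen))
        best = max(candidates,
                   key=lambda v: estimate_spread(out_adj, chosen + [v], p, runs, seed))
        chosen.append(best)
    return chosen
```

Each greedy step runs many simulations per candidate, which is exactly the cost that much of the follow-up work covered by the survey tries to cut.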

Papers
• Oct 30: Signed Networks in Social Media
  • CHI, 2010
  • Cited by 940+
  • What happens if edges carry +/- labels?
  • Structural balance theory verified
• Oct 30: A Survey of Signed Network Mining in Social Media
  • ACM CSUR, 2016
  • Another comprehensive survey
  • Covers graph mining work on signed networks
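Structural balance theory ("the friend of my friend is my friend; the enemy of my enemy is my friend") calls a triangle balanced when the product of its three edge signs is positive. A minimal sketch that counts balanced and unbalanced triangles, assuming an undirected signed graph stored as a dict from `frozenset({u, v})` to +1 or -1; edge direction is ignored, and `triangle_balance` is a hypothetical name.

```python
from itertools import combinations


def triangle_balance(signs):
    """Count balanced and unbalanced triangles in an undirected signed graph.

    signs maps frozenset({u, v}) -> +1 or -1. A triangle is balanced when the product of
    its three signs is positive. Brute force over node triples: fine for small examples only.
    """
    nodes = sorted({n for edge in signs for n in edge})
    balanced = unbalanced = 0
    for a, b, c in combinations(nodes, 3):
        sides = (frozenset((a, b)), frozenset((a, c)), frozenset((b, c)))
        if all(side in signs for side in sides):
            product = signs[sides[0]] * signs[sides[1]] * signs[sides[2]]
            if product > 0:
                balanced += 1
            else:
                unbalanced += 1
    return balanced, unbalanced
```

Comparing the observed share of balanced triangles with the share expected under randomly shuffled signs is the usual way to check whether balance actually holds in data.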

Papers
• Nov 6: Finding a Maximum Density Subgraph
  • Berkeley TR, 1984
  • Beautiful theory paper
  • Finding a subgraph with the largest average degree
  • Influenced many works
• Nov 6: Denser than the Densest Subgraph: Extracting Optimal Quasi-Cliques with Quality Guarantees
  • SIGKDD, 2013
  • How to generalize the paper above for triangles?
  • Novel quasi-clique formulation
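The 1984 report finds the subgraph maximizing |E(S)| / |S| exactly, using minimum-cut computations. The sketch below deliberately swaps in a simpler, different technique: Charikar's greedy peeling, which repeatedly deletes a minimum-degree vertex and keeps the densest intermediate subgraph, giving a 2-approximation. A dict-of-sets adjacency with comparable vertex ids is assumed, and `greedy_densest_subgraph` is a hypothetical name.

```python
def greedy_densest_subgraph(adj):
    """Charikar's greedy peeling: a 2-approximation for max |E(S)| / |S|.

    adj maps vertex -> set of neighbours (undirected). Returns (vertex set, density).
    """
    degree = {v: len(adj[v]) for v in adj}
    remaining = set(adj)
    edges = sum(degree.values()) // 2
    best_density = edges / len(remaining) if remaining else 0.0
    best_size = len(remaining)
    order = []  # removal order
    while len(remaining) > 1:
        v = min(remaining, key=lambda x: (degree[x], x))  # lowest current degree
        remaining.remove(v)
        order.append(v)
        edges -= degree[v]  # degree[v] counts only edges to still-present vertices
        for u in adj[v]:
            if u in remaining:
                degree[u] -= 1
        density = edges / len(remaining)
        if density > best_density:
            best_density, best_size = density, len(remaining)
    kept = set(adj) - set(order[: len(adj) - best_size])
    return kept, best_density
```

On a 4-clique with a two-edge tail attached, peeling strips the tail first and returns the clique with density 6 / 4 = 1.5.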

Papers
• Nov 13: Trusses: Cohesive Subgraphs for Social Network Analysis
  • NSA TR, 2008
  • Generalization of the k-core model to triangles
  • Influenced many works
• Nov 13: Finding the Hierarchy of Dense Subgraphs using Nucleus Decompositions
  • WWW, 2015
  • Unification of core/truss models for higher orders
  • Hierarchical dense subgraph discovery
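The truss replaces the degree of the k-core with triangle support: in a k-truss every edge lies in at least k - 2 triangles inside the subgraph. Peeling the edge with the lowest support gives each edge's truss number, mirroring core peeling one level up. A minimal sketch, assuming a dict-of-sets adjacency with comparable vertex ids; the scan for the minimum makes it O(m^2), and `truss_numbers` is a hypothetical name.

```python
def truss_numbers(adj):
    """Truss number of every edge by peeling the edge in the fewest triangles.

    adj maps vertex -> set of neighbours (undirected). An edge in a k-truss lies in at
    least k - 2 triangles of that subgraph; edges are keyed (u, v) with u < v.
    """
    adj = {v: set(ns) for v, ns in adj.items()}  # private copy: edges are deleted below
    support = {(u, v): len(adj[u] & adj[v]) for u in adj for v in adj[u] if u < v}
    truss, k = {}, 2
    while support:
        edge = min(support, key=lambda e: (support[e], e))
        k = max(k, support[edge] + 2)
        truss[edge] = k
        u, v = edge
        for w in adj[u] & adj[v]:
            # triangle (u, v, w) disappears, so its two other edges lose one support each
            support[(min(u, w), max(u, w))] -= 1
            support[(min(v, w), max(v, w))] -= 1
        del support[edge]
        adj[u].discard(v)
        adj[v].discard(u)
    return truss
```

Swapping "vertices in edges" (cores) for "edges in triangles" (trusses) is the pattern the nucleus decomposition paper generalizes to larger cliques.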

Papers
• Nov 20: Network Motifs: Simple Building Blocks of Complex Networks
  • Science, 2002
  • Cited by 6100+
  • Small induced subgraphs
  • Fundamental units of complex networks
• Nov 20: Uncovering Biological Network Function via Graphlet Degree Signatures
  • Cancer Informatics, 2008
  • Cited by 270+
  • Extends the degree concept to motifs
  • Very simple local statistics to capture node function
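Motif analysis starts from counts of small induced subgraphs, and graphlet degree signatures count, for each node, the graphlets it touches in each position (orbit). A minimal sketch for the smallest case: the two connected 3-node induced subgraphs (triangle and open wedge) plus per-node triangle counts. Dict-of-sets adjacency and comparable node ids are assumed. The comparison against randomized networks that the Science paper uses to call a pattern a motif, and the larger graphlets, are left out; the function names are hypothetical.

```python
from itertools import combinations


def three_node_motifs(adj):
    """Counts of the two connected 3-node induced subgraphs of an undirected graph."""
    closed = open_wedges = 0
    for centre in adj:
        for u, w in combinations(sorted(adj[centre]), 2):
            if w in adj[u]:
                closed += 1        # a triangle is seen once from each of its 3 corners
            else:
                open_wedges += 1   # induced path u - centre - w has a unique centre
    return {"triangle": closed // 3, "open_wedge": open_wedges}


def triangle_degree(adj):
    """Per-node count of triangles touched: one simple graphlet-style local statistic."""
    return {c: sum(1 for u, w in combinations(adj[c], 2) if w in adj[u]) for c in adj}
```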

Papers
• Nov 27: A Faster Algorithm for Betweenness Centrality
  • Journal of Mathematical Sociology, 2001
  • Cited by 3100+
  • Finding nodes that are central in the graph
  • Reduces complexity from O(V^3) to O(V·E) for unweighted graphs (O(V·E + V^2 log V) for weighted ones)
• Nov 27: Centrality and network flow
  • Social Networks, 2004
  • Cited by 2700+
  • Considers flows on edges
  • Not all paths are equally useful
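Brandes' speed-up comes from accumulating each source's dependencies in reverse BFS order, δ_s(v) = Σ_{w : v ∈ pred_s(w)} (σ_sv / σ_sw)(1 + δ_s(w)), where σ counts shortest paths. This avoids summing over all pairs explicitly. A minimal sketch for unweighted graphs (dict adjacency assumed; `betweenness` is a hypothetical name); weighted graphs would replace the BFS with Dijkstra.

```python
from collections import deque


def betweenness(adj, undirected=True):
    """Brandes' algorithm for unweighted graphs: one BFS plus one back-propagation per source."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        # Phase 1: BFS counting shortest paths (sigma) and recording predecessors.
        order, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Phase 2: accumulate dependencies in reverse BFS order.
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    if undirected:
        # each unordered pair {s, t} was counted once from each endpoint
        bc = {v: c / 2 for v, c in bc.items()}
    return bc
```

Each source costs one BFS, O(V + E), which gives the O(V·E) total quoted on the slide.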

Papers
• Dec 4: Higher-order organization of complex networks
  • Science, 2016
  • Cited by 150+
  • Analyzing higher-order structures (not pairwise relations)
  • How triangles and other small motifs impact the structure
• Dec 4: Representing higher-order dependencies in networks
  • Science Advances, 2016
  • A different approach to modeling higher-order structures
  • Non-Markovian property
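The Science 2016 paper clusters on motifs rather than edges, building a motif adjacency matrix whose (i, j) entry counts motif instances containing both i and j. For the triangle motif on an undirected graph, that is the number of triangles on each edge. A minimal sketch of that weighting plus conductance measured on it, assuming a dict-of-sets adjacency with comparable node ids. The spectral clustering step the paper applies on top is omitted, and the function names are hypothetical.

```python
from itertools import combinations


def triangle_motif_adjacency(adj):
    """W[(i, j)] = number of triangles containing edge (i, j); keys have i < j, zeros omitted."""
    weights = {}
    for u in adj:
        higher = sorted(n for n in adj[u] if n > u)
        for v, w in combinations(higher, 2):
            if w in adj[v]:  # u < v < w form a triangle
                for edge in ((u, v), (u, w), (v, w)):
                    weights[edge] = weights.get(edge, 0) + 1
    return weights


def weighted_conductance(weights, subset):
    """cut(S) / min(vol(S), vol(complement)) on the weighted graph given by `weights`."""
    cut = vol_in = vol_out = 0
    for (i, j), wt in weights.items():
        a, b = i in subset, j in subset
        if a != b:
            cut += wt
        vol_in += wt * (int(a) + int(b))
        vol_out += wt * (int(not a) + int(not b))
    denom = min(vol_in, vol_out)
    return cut / denom if denom else 0.0
```

For two triangles joined by a single bridge edge, ordinary conductance of one triangle is positive because of the bridge, while its triangle-weighted conductance is 0: the bridge takes part in no triangle, so the motif view separates the two groups cleanly.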