Visual Analytics of Big Data. Seok-Hee Hong, University of Sydney. Bioinformatics Winter School 2014.

Upload: australian-bioinformatics-network

Post on 10-May-2015


DESCRIPTION

Recent technological advances have led to the production of big data, and consequently to massive, complex network models in many domains, including science and engineering. Examples include biological networks such as phylogenetic networks, gene regulatory networks, metabolic pathways, biochemical networks and protein-protein interaction networks. Other examples are social networks such as the Facebook, Twitter and LinkedIn networks, telephone call networks, patent networks, citation networks and collaboration networks. Visualization is an effective analysis tool for such networks. Good visualization reveals the hidden structure of the networks and amplifies human understanding, thus leading to new insights, new findings and predictions. However, constructing good visualizations of big data can be very challenging. In this talk, I will present a framework for visual analytics of big data. Visual analytics is the science of analytical reasoning facilitated by interactive visual interfaces. Our framework is based on the tight integration of network analysis methods with visualization methods to address the scalability and complexity issues. I will present a number of case studies using various networks derived from big data, in particular social networks and biological networks. First presented at the 2014 Winter School in Mathematical and Computational Biology: http://bioinformatics.org.au/ws14/program/

TRANSCRIPT

Page 1: Seok Hee Hong - Visual analytics of big data

Visual Analytics of

Big Data

Seok-Hee Hong

University of Sydney

Bioinformatics Winter School 2014

Page 2: Seok Hee Hong - Visual analytics of big data

Big Data and

The Scale Problem

Page 3: Seok Hee Hong - Visual analytics of big data

Social networks: Facebook users

[Chart: number of Facebook users, 2004 to 2007; vertical axis from 0 to 50M]

Page 4: Seok Hee Hong - Visual analytics of big data

Biological networks: KEGG database

[Chart: 1982 to 2006; vertical axis on a logarithmic scale from 10^2 to 10^8]

Page 5: Seok Hee Hong - Visual analytics of big data

Internet Movie Data Base

Year 1937

Page 6: Seok Hee Hong - Visual analytics of big data

1995

Page 7: Seok Hee Hong - Visual analytics of big data

The scale problem

Data sets are growing much faster than the computing systems/tools to analyse them.

Existing algorithms/methods do not scale well enough to be efficient/effective on big data sets.

Page 8: Seok Hee Hong - Visual analytics of big data

Big Graph/Network

Page 9: Seok Hee Hong - Visual analytics of big data
Page 10: Seok Hee Hong - Visual analytics of big data

Erdos networks Lincoln Lu

Page 11: Seok Hee Hong - Visual analytics of big data
Page 12: Seok Hee Hong - Visual analytics of big data

Visual Analytics

Page 13: Seok Hee Hong - Visual analytics of big data

Good visualisation can enable users:

to understand the structure

to discover new knowledge/insight

to find regular/abnormal patterns/behavior

to generate/confirm/reject hypotheses

to confirm expected and discover unexpected

to reveal the hidden truth

to predict the future

Visual Analytics

Page 14: Seok Hee Hong - Visual analytics of big data

Visual Data Mining

Page 15: Seok Hee Hong - Visual analytics of big data

Key Scientific Challenges

1. Scalability

2. Visual Complexity

3. Domain Complexity

Page 16: Seok Hee Hong - Visual analytics of big data

Visual Analysis Framework for Big Graph

[Diagram: Big Data, Graph, Picture; stages labelled analysis, visualisation and interaction]

Page 17: Seok Hee Hong - Visual analytics of big data

GEOMI (GEOmetry for Maximum Insight)

Visual analytic tool for large and complex networks

Developed by NICTA and University of Sydney

Page 18: Seok Hee Hong - Visual analytics of big data

GEOMI (GEOmetry for Maximum Insight)

Network Analysis

Interaction

Graph Layout

Page 19: Seok Hee Hong - Visual analytics of big data

GEOMI Features

Network/graph generator: scale-free networks, clustered graph, hierarchical graph

Network analysis

Centrality: degree, betweenness, closeness, eccentricity, eigenvector, random-walk betweenness, uniqueness

Group analysis: blockmodelling, clustering, k-core, structural equivalence

Graph algorithms: filtering, shortest path, giant component

Interaction/Navigation: zoom, panning, rotation; selection; graph layout interaction/navigation; animation; head gesture interaction
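Three of the analyses named on this slide (degree centrality, closeness centrality and the giant component) can be sketched in a few lines. This is a minimal illustration in plain Python, not GEOMI's implementation: the function names, the dictionary-of-sets graph representation and the toy graph are all assumptions made for the example, and closeness is computed within each node's own connected component.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def degree_centrality(adj):
    """Degree divided by the largest possible degree, n - 1."""
    n = len(adj)
    if n < 2:
        return {u: 0.0 for u in adj}
    return {u: len(nbrs) / (n - 1) for u, nbrs in adj.items()}


def closeness_centrality(adj):
    """(reachable nodes - 1) / (sum of distances), per component."""
    result = {}
    for u in adj:
        dist = bfs_distances(adj, u)
        total = sum(dist.values())
        result[u] = (len(dist) - 1) / total if total > 0 else 0.0
    return result


def giant_component(adj):
    """Node set of the largest connected component."""
    seen, best = set(), set()
    for u in adj:
        if u not in seen:
            component = set(bfs_distances(adj, u))
            seen |= component
            if len(component) > len(best):
                best = component
    return best


# Hypothetical toy graph: a star centred on "a" plus a separate pair.
toy = {
    "a": {"b", "c", "d"},
    "b": {"a"}, "c": {"a"}, "d": {"a"},
    "e": {"f"}, "f": {"e"},
}
degree = degree_centrality(toy)
closeness = closeness_centrality(toy)
largest = giant_component(toy)
```

Each closeness value needs one breadth-first search, so the total cost grows with nodes times edges, which is one concrete form of the scalability challenge listed earlier in the talk.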

Page 20: Seok Hee Hong - Visual analytics of big data

Graph/Network Layout

Node-link representation: trees, planar graphs, general undirected graphs, directed graphs, clustered graphs, hierarchical graphs, scale-free networks, dynamic/temporal networks, multi-relational networks, multi-variate networks, overlapping networks

Map representation: tree/radial tree map, Voronoi map, temporal map

Hybrid representation

Page 21: Seok Hee Hong - Visual analytics of big data
Page 22: Seok Hee Hong - Visual analytics of big data

Interaction with Cool Toys

Page 23: Seok Hee Hong - Visual analytics of big data

IMDB (Internet Movie Data Base) Network Analysis

Kevin Bacon Network

Page 24: Seok Hee Hong - Visual analytics of big data

Days of Thunder (1990)

Far and Away (1992)

A Few Good Men

Hollywood Movie Actor Collaboration Network

Kevin Bacon Network

IMDB (Internet Movie DataBase)

Page 25: Seok Hee Hong - Visual analytics of big data

Kevin Bacon

Tom Cruise: Bacon #1

Nicole Kidman: Bacon #2
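A Bacon number is the shortest-path distance to Kevin Bacon in the actor collaboration network, where two actors are linked if they appeared in the same film. The sketch below reproduces the two values on this slide using the three films named on the previous one. The cast lists are deliberately partial (only the actors mentioned in the slides), and the function names are illustrative assumptions.

```python
from collections import deque
from itertools import combinations


def costar_graph(casts):
    """Link every pair of actors who share at least one film."""
    adj = {}
    for cast in casts.values():
        for actor in cast:
            adj.setdefault(actor, set())
        for a, b in combinations(cast, 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def bacon_numbers(adj, root="Kevin Bacon"):
    """Breadth-first search: hop distance from root to each actor."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        actor = queue.popleft()
        for costar in adj[actor]:
            if costar not in dist:
                dist[costar] = dist[actor] + 1
                queue.append(costar)
    return dist


# Partial cast lists, restricted to the actors named in the slides.
casts = {
    "A Few Good Men": ["Kevin Bacon", "Tom Cruise"],
    "Days of Thunder": ["Tom Cruise", "Nicole Kidman"],
    "Far and Away": ["Tom Cruise", "Nicole Kidman"],
}
numbers = bacon_numbers(costar_graph(casts))
```

Actors who cannot be reached from the root are simply absent from the result, which is one way to represent an undefined Bacon number.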

Page 26: Seok Hee Hong - Visual analytics of big data

Evolution of Kevin Bacon Network

Page 27: Seok Hee Hong - Visual analytics of big data

GD05: Evolution of IMDB Kevin Bacon #1: 2000

Page 28: Seok Hee Hong - Visual analytics of big data

WOS (Web of Science) Analysis

Social Network Co-citation Network
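In a co-citation network, two papers are linked when a third paper cites both, and the link weight is usually the number of papers that do so. A minimal sketch of that construction follows; the paper identifiers are hypothetical placeholders, not Web of Science records, and the function name is an assumption made for the example.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(references):
    """Map each unordered pair of cited papers to the number of
    citing papers whose reference list contains both."""
    counts = Counter()
    for cited in references.values():
        # sorted(set(...)) removes duplicates and fixes the pair order
        for a, b in combinations(sorted(set(cited)), 2):
            counts[(a, b)] += 1
    return counts


# Hypothetical reference lists: citing paper -> cited papers.
references = {
    "X": ["A", "B", "C"],
    "Y": ["A", "B"],
    "Z": ["C"],
}
weights = cocitation_counts(references)
```

Restricting `references` to the citing papers published up to a given year and recomputing the counts gives one snapshot per year, which is one plausible way to obtain a sequence of networks like the yearly ones shown in the following slides.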

Page 29: Seok Hee Hong - Visual analytics of big data

Evolution of Co-citation Network in WOS

Page 30: Seok Hee Hong - Visual analytics of big data

co-citation network of year 2003

Page 31: Seok Hee Hong - Visual analytics of big data

co-citation network of Year 2006

Page 32: Seok Hee Hong - Visual analytics of big data

Information Visualisation

Network Analysis

Page 33: Seok Hee Hong - Visual analytics of big data

Evolution of research area

Page 34: Seok Hee Hong - Visual analytics of big data

Info Vis Collaboration Network

Page 35: Seok Hee Hong - Visual analytics of big data

Email Network Virus Detection

Page 36: Seok Hee Hong - Visual analytics of big data
Page 37: Seok Hee Hong - Visual analytics of big data

History of World Cup

Page 38: Seok Hee Hong - Visual analytics of big data
Page 39: Seok Hee Hong - Visual analytics of big data

World Cup 2002

Page 40: Seok Hee Hong - Visual analytics of big data

Edge Bundling with centrality analysis & k-core analysis
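The k-core of a graph is the largest subgraph in which every node has at least k neighbours inside the subgraph. A minimal peeling sketch of k-core analysis is shown below. The transcript does not say how the talk combines this with edge bundling, so the code covers only the analysis step; the names and the toy graph are assumptions.

```python
def core_numbers(adj):
    """Core number of each node, found by repeatedly removing a
    node of minimum remaining degree (simple quadratic version)."""
    degree = {u: len(nbrs) for u, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        u = min(remaining, key=degree.__getitem__)
        k = max(k, degree[u])  # core numbers never decrease while peeling
        core[u] = k
        remaining.remove(u)
        for v in adj[u]:
            if v in remaining:
                degree[v] -= 1
    return core


def k_core(adj, k):
    """Nodes whose core number is at least k."""
    return {u for u, c in core_numbers(adj).items() if c >= k}


# Hypothetical toy graph: a triangle a-b-c with a pendant node d.
toy = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
dense_part = k_core(toy, 2)
```

Filtering a large network down to a high k-core is a common way to isolate its densely connected part before drawing it.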

Page 41: Seok Hee Hong - Visual analytics of big data

US Airline Network Analysis

Page 42: Seok Hee Hong - Visual analytics of big data
Page 43: Seok Hee Hong - Visual analytics of big data
Page 44: Seok Hee Hong - Visual analytics of big data
Page 45: Seok Hee Hong - Visual analytics of big data
Page 46: Seok Hee Hong - Visual analytics of big data
Page 47: Seok Hee Hong - Visual analytics of big data

Integration with Clustering

Clustered Graph Layout

Page 48: Seok Hee Hong - Visual analytics of big data
Page 49: Seok Hee Hong - Visual analytics of big data

Metabolic Pathway Visualisation

Page 50: Seok Hee Hong - Visual analytics of big data

GO-defined Protein Interaction Network

Page 51: Seok Hee Hong - Visual analytics of big data

2.5D Scale-free Network Visualisation

Page 52: Seok Hee Hong - Visual analytics of big data

Scale-free Network

[Barabasi and Albert 99]

Exponential growth

Preferential attachment

Properties

Power-law degree distribution

Sparse, but locally dense

Small-world property: O(log log n) average path length

High clustering coefficient

Resilient to random attack, but vulnerable to designed attack

Examples: Webgraph, social networks, biological networks
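The growth and preferential-attachment rules can be sketched as a small generator in which each new node attaches to m existing nodes chosen with probability proportional to their current degree. This is an illustrative sketch in the spirit of the Barabasi and Albert model, not code from the talk; the function names, the seeding scheme and the choice of initial targets are assumptions.

```python
import random
from collections import Counter


def preferential_attachment_graph(n, m, seed=None):
    """Edge list of an n-node graph grown one node at a time.

    Each new node links to m distinct existing nodes, picked with
    probability proportional to their current degree.
    """
    if not 1 <= m < n:
        raise ValueError("need 1 <= m < n")
    rng = random.Random(seed)
    edges = []
    endpoints = []             # one entry per edge endpoint
    targets = list(range(m))   # the first new node links to nodes 0..m-1
    for new in range(m, n):
        for t in targets:
            edges.append((new, t))
            endpoints.extend((new, t))
        # A uniform draw from `endpoints` is a degree-proportional draw.
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(endpoints))
        targets = list(chosen)
    return edges


def degree_counts(edges):
    """Degree of every node that appears in the edge list."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return degree


edges = preferential_attachment_graph(200, 2, seed=1)
degree = degree_counts(edges)
```

Sorting `degree.values()` for a larger n typically shows a few very high-degree hubs and many low-degree nodes, the heavy-tailed pattern that the power-law property describes.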

Page 53: Seok Hee Hong - Visual analytics of big data

Parallel Plane/Concentric Sphere Layout

[Figure: two layouts, each with subgraphs labelled G1, G2 and G3]

Page 54: Seok Hee Hong - Visual analytics of big data

PPI networks Hawoong Jeong

Page 55: Seok Hee Hong - Visual analytics of big data
Page 56: Seok Hee Hong - Visual analytics of big data
Page 57: Seok Hee Hong - Visual analytics of big data

Visualisation of Patterns

Motif

Page 58: Seok Hee Hong - Visual analytics of big data
Page 59: Seok Hee Hong - Visual analytics of big data

Overlapping Network Visualisation for

Integrated Analysis

Page 60: Seok Hee Hong - Visual analytics of big data

[Diagram labels: GENOME; PROTEOME; METABOLISM (Citrate Cycle); protein-gene interactions; protein-protein interactions; bio-chemical reactions]

Page 61: Seok Hee Hong - Visual analytics of big data

Two Overlapping Networks

Page 62: Seok Hee Hong - Visual analytics of big data

Glycolysis Pathway [KEGG] and PPI [DIP]: E. coli

9 overlap

1-neighborhood network
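The two operations named on this slide, finding the nodes shared by two networks and extracting the 1-neighborhood of those nodes, can be sketched as follows. The identifiers P1 to P6 are hypothetical placeholders rather than E. coli proteins, and the function names are assumptions; this is not the tool's implementation.

```python
def shared_nodes(adj_a, adj_b):
    """Nodes that occur in both networks."""
    return set(adj_a) & set(adj_b)


def one_neighborhood(adj, seeds):
    """Subgraph induced by the seed nodes and their direct neighbours."""
    present = {s for s in seeds if s in adj}
    keep = set(present)
    for s in present:
        keep |= adj[s]
    return {u: adj.get(u, set()) & keep for u in keep}


# Hypothetical toy networks sharing the nodes P2 and P3.
pathway = {"P1": {"P2"}, "P2": {"P1", "P3"}, "P3": {"P2"}}
ppi = {
    "P2": {"P4"},
    "P3": set(),
    "P4": {"P2", "P5"},
    "P5": {"P4", "P6"},
    "P6": {"P5"},
}
overlap = shared_nodes(pathway, ppi)
context = one_neighborhood(ppi, overlap)
```

Showing only the 1-neighborhood keeps the picture small while still revealing which interaction partners surround the overlapping nodes.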

Page 63: Seok Hee Hong - Visual analytics of big data
Page 64: Seok Hee Hong - Visual analytics of big data

Gene Regulatory Network [RegulonDB] and PPI: E. coli

periphery proteins

6 hubs: no overlap

Page 65: Seok Hee Hong - Visual analytics of big data

bottleneck proteins
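In the network biology literature, "bottleneck" commonly refers to nodes with high betweenness centrality, in contrast to hubs, which have high degree. The transcript does not state which measure the talk used, so the sketch below assumes betweenness and computes it with Brandes' algorithm for unweighted, undirected graphs. The toy graph and all names are illustrative.

```python
from collections import deque


def betweenness_centrality(adj):
    """Unnormalised betweenness via Brandes' algorithm (BFS version)."""
    score = dict.fromkeys(adj, 0.0)
    for s in adj:
        order = []                      # nodes in order of discovery
        preds = {v: [] for v in adj}    # shortest-path predecessors
        sigma = dict.fromkeys(adj, 0)   # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):       # accumulate dependencies
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: b / 2 for v, b in score.items()}


def clique(nodes):
    """Adjacency sets of a complete graph on the given nodes."""
    return {u: set(nodes) - {u} for u in nodes}


# Hypothetical toy graph: two 4-cliques joined through a bridge node "x".
toy = {**clique("abcd"), **clique("efgh"), "x": {"d", "e"}}
toy["d"].add("x")
toy["e"].add("x")
scores = betweenness_centrality(toy)
bottleneck = max(scores, key=scores.get)
```

In this toy graph the bridge node has the lowest degree but the highest betweenness, which illustrates why a bottleneck need not be a hub.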

Page 66: Seok Hee Hong - Visual analytics of big data

Three Overlapping Networks

Page 67: Seok Hee Hong - Visual analytics of big data

GRN [RegulonDB]: PPI [DIP]: MN [KEGG] (E. coli)

Page 68: Seok Hee Hong - Visual analytics of big data

6 hubs in GRN: crp, arcA, fis, hns, ihfAB, lrp

No overlap

Page 69: Seok Hee Hong - Visual analytics of big data

3

aceE

Page 70: Seok Hee Hong - Visual analytics of big data

3

aceE

aceF

Page 71: Seok Hee Hong - Visual analytics of big data

3 GRN [RegulonDB]: PPI [DIP]: MN [KEGG] (E. coli)

Page 72: Seok Hee Hong - Visual analytics of big data
Page 73: Seok Hee Hong - Visual analytics of big data

3

ptsG: overlap between 3 layers

Page 74: Seok Hee Hong - Visual analytics of big data

Propagation Animation in Diffusion Network

Page 75: Seok Hee Hong - Visual analytics of big data
Page 76: Seok Hee Hong - Visual analytics of big data