Page 1: Seok Hee Hong - Visual analytics of big data

Visual Analytics of

Big Data

Seok-Hee Hong

University of Sydney

Bioinformatics Winter School 2014

Page 2: Seok Hee Hong - Visual analytics of big data

Big Data and

The Scale Problem

Page 3: Seok Hee Hong - Visual analytics of big data

Social networks: Facebook users

[Figure: number of Facebook users by year, 2004 to 2007; vertical axis from 0 to 50M]

Page 4: Seok Hee Hong - Visual analytics of big data

Biological networks: KEGG database

[Figure: growth over time, 1982 to 2006; vertical axis on a log scale from 10^2 to 10^8]

Page 5: Seok Hee Hong - Visual analytics of big data

Internet Movie Database

Year 1937

Page 6: Seok Hee Hong - Visual analytics of big data

1995

Page 7: Seok Hee Hong - Visual analytics of big data

The scale problem

Data sets are growing much faster than the computing systems and tools needed to analyse them.

Existing algorithms and methods do not scale well enough to be efficient or effective on big data sets.

Page 8: Seok Hee Hong - Visual analytics of big data

Big Graph/Network

Page 9: Seok Hee Hong - Visual analytics of big data
Page 10: Seok Hee Hong - Visual analytics of big data

Erdos networks Lincoln Lu

Page 11: Seok Hee Hong - Visual analytics of big data
Page 12: Seok Hee Hong - Visual analytics of big data

Visual Analytics

Page 13: Seok Hee Hong - Visual analytics of big data

Good visualisation can enable users:

to understand the structure

to discover new knowledge/insight

to find regular/abnormal patterns/behavior

to generate/confirm/reject hypotheses

to confirm the expected and discover the unexpected

to reveal the hidden truth

to predict the future

Visual Analytics

Page 14: Seok Hee Hong - Visual analytics of big data

Visual Data Mining

Page 15: Seok Hee Hong - Visual analytics of big data

Key Scientific Challenge

1. Scalability

2. Visual Complexity

3. Domain Complexity

Page 16: Seok Hee Hong - Visual analytics of big data

Visual Analysis Framework for Big Graph

[Diagram: Big Data, Graph and Picture as stages, linked by analysis, visualisation and interaction]

Page 17: Seok Hee Hong - Visual analytics of big data

GEOMI (Geometry for Maximum Insight)

Visual analytic tool for large and complex networks

Developed by NICTA and University of Sydney

Page 18: Seok Hee Hong - Visual analytics of big data

GEOMI (GEOmetry for Maximum Insight)

Network Analysis

Interaction

Graph Layout

Page 19: Seok Hee Hong - Visual analytics of big data

GEOMI Features

Network/graph generator
  Scale-free networks
  Clustered graph
  Hierarchical graph

Network analysis
  Centrality: degree, betweenness, closeness, eccentricity, eigenvector, random-walk betweenness, uniqueness
  Group analysis: blockmodelling, clustering, k-core, structural equivalence
  Graph algorithms: filtering, shortest path, giant component

Interaction/Navigation
  Zoom, panning, rotation
  Selection
  Graph layout interaction/navigation
  Animation
  Head gesture interaction
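
Note: three of the centrality measures listed above (degree, closeness, eccentricity) can be illustrated with a few lines of plain Python. This is a minimal sketch on a hypothetical five-node toy graph, assuming an undirected, connected graph stored as an adjacency dictionary. It is not GEOMI's implementation, and the function names are chosen here for illustration only.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distance from source to every reachable node (unweighted BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def degree_centrality(adj):
    """Degree of each node divided by the largest possible degree, n - 1."""
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}


def closeness_centrality(adj):
    """(n - 1) divided by the sum of distances to all other nodes.

    Assumes the graph is connected.
    """
    n = len(adj)
    result = {}
    for v in adj:
        total = sum(bfs_distances(adj, v).values())
        result[v] = (n - 1) / total if total > 0 else 0.0
    return result


def eccentricity(adj):
    """Largest hop distance from each node to any other node."""
    return {v: max(bfs_distances(adj, v).values()) for v in adj}


# Hypothetical toy graph: hub "a" joined to "b", "c", "d", plus a tail d-e.
toy = {
    "a": {"b", "c", "d"},
    "b": {"a"},
    "c": {"a"},
    "d": {"a", "e"},
    "e": {"d"},
}

if __name__ == "__main__":
    print(degree_centrality(toy))
    print(closeness_centrality(toy))
    print(eccentricity(toy))
```

On the toy graph the hub "a" scores highest on both degree and closeness, while the leaf nodes have the largest eccentricity.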

Page 20: Seok Hee Hong - Visual analytics of big data

Graph/Network Layout

Node-link representation
  Trees
  Planar graphs
  General undirected graphs
  Directed graphs
  Clustered graphs
  Hierarchical graphs
  Scale-free networks
  Dynamic/Temporal networks
  Multi-relational networks
  Multi-variate networks
  Overlapping networks

Map representation
  Tree/Radial tree map
  Voronoi map
  Temporal map

Hybrid representation

Page 21: Seok Hee Hong - Visual analytics of big data
Page 22: Seok Hee Hong - Visual analytics of big data

Interaction with Cool Toys

Page 23: Seok Hee Hong - Visual analytics of big data

IMDB (Internet Movie Database) Network Analysis

Kevin Bacon Network

Page 24: Seok Hee Hong - Visual analytics of big data

Days of Thunder (1990)

Far and Away (1992)

A Few Good Men

Hollywood Movie Actor Collaboration Network

Kevin Bacon Network

IMDB (Internet Movie Database)

Page 25: Seok Hee Hong - Visual analytics of big data

Kevin Bacon

Tom Cruise: Bacon #1

Nicole Kidman: Bacon #2
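
Note: a Bacon number is the shortest-path distance from Kevin Bacon in the actor collaboration network, so it can be computed with a breadth-first search. The sketch below uses only the three films and three actors named on these slides. The cast lists are deliberately restricted to those names, so the result holds for this subset only, and the function names are illustrative.

```python
from collections import deque

# Casts restricted to the actors named on the slides (not full cast lists).
casts = {
    "A Few Good Men": ["Kevin Bacon", "Tom Cruise"],
    "Days of Thunder (1990)": ["Tom Cruise", "Nicole Kidman"],
    "Far and Away (1992)": ["Tom Cruise", "Nicole Kidman"],
}


def collaboration_graph(casts):
    """Link two actors whenever they appear in the same film."""
    adj = {}
    for actors in casts.values():
        for a in actors:
            adj.setdefault(a, set()).update(b for b in actors if b != a)
    return adj


def bacon_numbers(adj, root="Kevin Bacon"):
    """Breadth-first search: hop distance from root to every reachable actor."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


if __name__ == "__main__":
    print(bacon_numbers(collaboration_graph(casts)))
    # {'Kevin Bacon': 0, 'Tom Cruise': 1, 'Nicole Kidman': 2}
```

Within this subset the search reproduces the labels on the slide: Tom Cruise at distance 1 and Nicole Kidman at distance 2.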

Page 26: Seok Hee Hong - Visual analytics of big data

Evolution of Kevin Bacon Network

Page 27: Seok Hee Hong - Visual analytics of big data

GD05: Evolution of IMDB Kevin Bacon #1: 2000

Page 28: Seok Hee Hong - Visual analytics of big data

WOS (Web of Science) Analysis

Social Network Co-citation Network

Page 29: Seok Hee Hong - Visual analytics of big data

Evolution of Co-citation Network in WOS

Page 30: Seok Hee Hong - Visual analytics of big data

Co-citation network of year 2003

Page 31: Seok Hee Hong - Visual analytics of big data

Co-citation network of year 2006

Page 32: Seok Hee Hong - Visual analytics of big data

Information Visualisation

Network Analysis

Page 33: Seok Hee Hong - Visual analytics of big data

Evolution of research area

Page 34: Seok Hee Hong - Visual analytics of big data

Info Vis Collaboration Network

Page 35: Seok Hee Hong - Visual analytics of big data

Email Network Virus Detection

Page 36: Seok Hee Hong - Visual analytics of big data
Page 37: Seok Hee Hong - Visual analytics of big data

History of World Cup

Page 38: Seok Hee Hong - Visual analytics of big data
Page 39: Seok Hee Hong - Visual analytics of big data

World Cup 2002

Page 40: Seok Hee Hong - Visual analytics of big data

Edge Bundling with centrality analysis & k-core analysis
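
Note: the k-core part of this analysis can be sketched as a peeling procedure: repeatedly remove a node of minimum remaining degree and record the largest degree seen so far as its core number. The example is a simple quadratic-time version on a hypothetical toy graph, assuming an undirected graph stored as an adjacency dictionary. It does not cover edge bundling, and it is not the implementation used for the figures.

```python
def core_numbers(adj):
    """Core number of every node, computed by minimum-degree peeling."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Peel a node with the smallest remaining degree.
        v = min(remaining, key=degree.__getitem__)
        k = max(k, degree[v])
        core[v] = k
        remaining.remove(v)
        for w in adj[v]:
            if w in remaining:
                degree[w] -= 1
    return core


def k_core(adj, k):
    """Nodes of the k-core: those whose core number is at least k."""
    return {v for v, c in core_numbers(adj).items() if c >= k}


# Hypothetical toy graph: triangle a-b-c with a pendant node d attached to a.
toy = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}

if __name__ == "__main__":
    print(core_numbers(toy))
    print(k_core(toy, 2))
```

Filtering a large network down to its higher cores is one way to reduce the number of nodes and edges that have to be drawn.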

Page 41: Seok Hee Hong - Visual analytics of big data

US Airline Network Analysis

Page 42: Seok Hee Hong - Visual analytics of big data
Page 43: Seok Hee Hong - Visual analytics of big data
Page 44: Seok Hee Hong - Visual analytics of big data
Page 45: Seok Hee Hong - Visual analytics of big data
Page 46: Seok Hee Hong - Visual analytics of big data
Page 47: Seok Hee Hong - Visual analytics of big data

Integration with Clustering

Clustered Graph Layout

Page 48: Seok Hee Hong - Visual analytics of big data
Page 49: Seok Hee Hong - Visual analytics of big data

Metabolic Pathway Visualisation

Page 50: Seok Hee Hong - Visual analytics of big data

GO-defined Protein Interaction Network

Page 51: Seok Hee Hong - Visual analytics of big data

2.5D Scale-free Network Visualisation

Page 52: Seok Hee Hong - Visual analytics of big data

Scale-free Network

[Barabási and Albert 99]
  Exponential growth
  Preferential attachment

Properties
  Power-law degree distribution
  Sparse, but locally dense
  Small-world property: O(log log n) average path length
  High clustering coefficient
  Resilient to random attack, but vulnerable to designed attack

Examples
  Webgraph
  Social networks
  Biological networks
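
Note: growth with preferential attachment can be sketched in plain Python. Each new node links to m distinct existing nodes, chosen with probability proportional to their current degree. The starting graph (a complete graph on m + 1 nodes), the parameter names and the seed handling are assumptions made for this illustration, not details taken from the slides.

```python
import random
from collections import Counter


def preferential_attachment_graph(n, m, seed=0):
    """Grow a graph to n nodes; each new node adds m edges to existing nodes.

    Targets are drawn with probability proportional to degree.
    Assumes 1 <= m < n.
    """
    rng = random.Random(seed)
    # Assumed starting point: a complete graph on m + 1 nodes.
    edges = [(u, v) for u in range(m + 1) for v in range(u + 1, m + 1)]
    # Every node appears in this list once per incident edge, so a uniform
    # draw from the list is a degree-proportional draw over nodes.
    endpoints = [x for e in edges for x in e]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for t in targets:
            edges.append((new, t))
            endpoints.extend((new, t))
    return edges


def degree_histogram(edges):
    """Map each degree value to the number of nodes having that degree."""
    degree = Counter(x for e in edges for x in e)
    return Counter(degree.values())


if __name__ == "__main__":
    hist = degree_histogram(preferential_attachment_graph(2000, 2, seed=1))
    for d in sorted(hist):
        print(d, hist[d])
```

Printing the histogram for a few thousand nodes should show many low-degree nodes and a small number of high-degree hubs, which is the heavy-tailed pattern the power-law property refers to.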

Page 53: Seok Hee Hong - Visual analytics of big data

Parallel Plane/Concentric Sphere Layout

[Figure: layers G1, G2 and G3, shown once for each of the two layouts]

Page 54: Seok Hee Hong - Visual analytics of big data

PPI networks Hawoong Jeong

Page 55: Seok Hee Hong - Visual analytics of big data
Page 56: Seok Hee Hong - Visual analytics of big data
Page 57: Seok Hee Hong - Visual analytics of big data

Visualisation of Patterns

Motif

Page 58: Seok Hee Hong - Visual analytics of big data
Page 59: Seok Hee Hong - Visual analytics of big data

Overlapping Network Visualisation for

Integrated Analysis

Page 60: Seok Hee Hong - Visual analytics of big data

protein-gene interactions

protein-protein interactions

PROTEOME

GENOME

Citrate Cycle

METABOLISM

Bio-chemical reactions

Page 61: Seok Hee Hong - Visual analytics of big data

Two Overlapping Networks

Page 62: Seok Hee Hong - Visual analytics of big data

Glycolysis Pathway [KEGG] and PPI [DIP]: E. coli

9 overlap

1-neighborhood network
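
Note: one way to read the overlap and the 1-neighborhood network is sketched below. The overlap is the set of nodes present in both networks, and the 1-neighborhood is those nodes plus their direct neighbours in the second network. That reading is an assumption, and the node names and edges are hypothetical placeholders, not KEGG or DIP data.

```python
def build_adjacency(edges):
    """Undirected adjacency dictionary from a list of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def overlap_nodes(adj_a, adj_b):
    """Nodes that appear in both networks."""
    return set(adj_a) & set(adj_b)


def one_neighborhood(adj, seeds):
    """Seed nodes found in adj, their direct neighbours, and the edges among them."""
    nodes = set()
    for s in seeds:
        if s in adj:
            nodes.add(s)
            nodes |= adj[s]
    edges = {tuple(sorted((u, v))) for u in nodes for v in adj[u] if v in nodes}
    return nodes, edges


# Hypothetical placeholder networks (not real pathway or interaction data).
pathway = build_adjacency([("p1", "p2"), ("p2", "p3")])
ppi = build_adjacency([("p2", "x1"), ("p3", "x2"), ("x2", "x3"), ("x4", "x5")])

if __name__ == "__main__":
    shared = overlap_nodes(pathway, ppi)
    print(sorted(shared))
    print(one_neighborhood(ppi, shared))
```

Restricting the second network to this neighbourhood keeps the drawing small while still showing how the shared nodes connect the two networks.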

Page 63: Seok Hee Hong - Visual analytics of big data
Page 64: Seok Hee Hong - Visual analytics of big data

Gene Regulatory Network [RegulonDB] and PPI: E. coli

periphery proteins

6 hubs: no overlap

Page 65: Seok Hee Hong - Visual analytics of big data

bottleneck proteins

Page 66: Seok Hee Hong - Visual analytics of big data

Three Overlapping Networks

Page 67: Seok Hee Hong - Visual analytics of big data

GRN [RegulonDB]: PPI [DIP]: MN [KEGG] (E. coli)

Page 68: Seok Hee Hong - Visual analytics of big data

6 hubs in GRN: crp, arcA, fis, hns, ihfAB, lrp

No overlap

Page 69: Seok Hee Hong - Visual analytics of big data

3

aceE

Page 70: Seok Hee Hong - Visual analytics of big data

3

aceE

aceF

Page 71: Seok Hee Hong - Visual analytics of big data

3

GRN [RegulonDB]: PPI [DIP]: MN [KEGG] (E. coli)

Page 72: Seok Hee Hong - Visual analytics of big data
Page 73: Seok Hee Hong - Visual analytics of big data

3

ptsG: overlap between 3 layers

Page 74: Seok Hee Hong - Visual analytics of big data

Propagation Animation in Diffusion Network

Page 75: Seok Hee Hong - Visual analytics of big data
Page 76: Seok Hee Hong - Visual analytics of big data
