Seok-Hee Hong - Visual Analytics of Big Data

Post on 10-May-2015


DESCRIPTION

Recent technological advances have led to the production of big data, and consequently to massive, complex network models in many domains, including science and engineering. Examples include biological networks such as phylogenetic networks, gene regulatory networks, metabolic pathways, biochemical networks and protein-protein interaction networks. Other examples are social networks such as the Facebook, Twitter and LinkedIn networks, telephone call networks, patent networks, citation networks and collaboration networks. Visualization is an effective analysis tool for such networks. Good visualization reveals the hidden structure of a network and amplifies human understanding, leading to new insights, findings and predictions. However, constructing a good visualization of big data can be very challenging. In this talk, I will present a framework for visual analytics of big data. Visual analytics is the science of analytical reasoning facilitated by interactive visual interfaces. Our framework is based on the tight integration of network analysis methods with visualization methods to address scalability and complexity issues. I will present a number of case studies using various networks derived from big data, in particular social networks and biological networks. First presented at the 2014 Winter School in Mathematical and Computational Biology http://bioinformatics.org.au/ws14/program/

TRANSCRIPT

Visual Analytics of Big Data

Seok-Hee Hong

University of Sydney

Bioinformatics Winter School 2014

Big Data and the Scale Problem

Social networks: Facebook users

[Chart: Facebook users by year, 2004 to 2007; vertical axis from 0 to 50M]

Biological networks: KEGG database

[Chart: KEGG database by year, 1982 to 2006; vertical axis from 10^2 to 10^8]

Internet Movie Data Base

[Figure: year 1937 and year 1995]

The scale problem

Data sets are growing much faster than the computing systems/tools to analyse them.

Existing algorithms/methods do not scale well enough to be efficient/effective on big data sets.

Big Graph/Network

Erdos networks Lincoln Lu

Visual Analytics

Good visualisation can enable users:

to understand the structure

to discover new knowledge/insight

to find regular/abnormal patterns/behavior

to generate/confirm/reject hypothesis

to confirm expected and discover unexpected

to reveal the hidden truth

to predict the future

Visual Analytics

Visual Data Mining

Key Scientific Challenge

1. Scalability

2. Visual Complexity

3. Domain Complexity

Visual Analysis Framework for Big Graph

[Diagram: Big Data, Graph, Picture; labels: analysis, visualisation, interaction]

GEOMI (Geometry for Maximum Insight)

Visual analytic tool for large and complex networks Developed by NICTA and University of Sydney

GEOMI (GEOmetry for Maximum Insight)

Network Analysis

Interaction

Graph Layout

GEOMI Features

Network/graph generator: scale-free networks, clustered graphs, hierarchical graphs

Network analysis:

Centrality: degree, betweenness, closeness, eccentricity, eigenvector, random-walk betweenness, uniqueness

Group analysis: blockmodelling, clustering, k-core, structural equivalence

Graph algorithms: filtering, shortest path, giant component

Interaction/Navigation: zoom, panning, rotation; selection; graph layout interaction/navigation; animation; head gesture interaction
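Two of the centrality measures listed above, degree and closeness, can be illustrated with a minimal sketch. This is not GEOMI code: the function names, the normalisation choices and the toy graph are assumptions made for illustration only.

```python
from collections import deque

def degree_centrality(adj):
    """Degree of each node divided by (n - 1)."""
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}

def bfs_distances(adj, source):
    """Hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def closeness_centrality(adj):
    """(number of other reachable nodes) / (sum of distances to them)."""
    result = {}
    for v in adj:
        dist = bfs_distances(adj, v)
        total = sum(dist.values())
        result[v] = (len(dist) - 1) / total if total > 0 else 0.0
    return result

# Hypothetical toy graph: hub "a" with a short tail d-e.
toy = {
    "a": {"b", "c", "d"},
    "b": {"a"},
    "c": {"a"},
    "d": {"a", "e"},
    "e": {"d"},
}
deg = degree_centrality(toy)
clo = closeness_centrality(toy)
```

In this toy graph the hub "a" ranks highest on both measures, which is the kind of node a centrality-driven layout would emphasise.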

Graph/Network Layout

Node-link representation: trees, planar graphs, general undirected graphs, directed graphs, clustered graphs, hierarchical graphs, scale-free networks, dynamic/temporal networks, multi-relational networks, multi-variate networks, overlapping networks

Map representation: tree/radial tree map, Voronoi map, temporal map

Hybrid representation

Interaction with Cool Toys

IMDB (Internet Movie Data Base) Network Analysis

Kevin Bacon Network

Days of Thunder (1990)

Far and Away (1992)

A Few Good Men

Hollywood Movie Actor Collaboration Network

Kevin Bacon Network

IMDB (Internet Movie DataBase)

Kevin Bacon

Tom Cruise: Bacon #1

Nicole Kidman: Bacon #2
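The Bacon numbers on this slide are shortest-path distances in the actor collaboration network. A minimal sketch, assuming a breadth-first search over a tiny cast table: the cast lists are deliberately partial and cover only the three films named on the slides, and the function names are illustrative, not taken from the talk.

```python
from collections import deque
from itertools import combinations

# Partial cast lists, restricted to the actors named on the slides.
casts = {
    "A Few Good Men": ["Kevin Bacon", "Tom Cruise"],
    "Days of Thunder": ["Tom Cruise", "Nicole Kidman"],
    "Far and Away": ["Tom Cruise", "Nicole Kidman"],
}

def collaboration_graph(casts):
    """Link two actors whenever they appear in the same film."""
    adj = {}
    for actors in casts.values():
        for a in actors:
            adj.setdefault(a, set())
        for a, b in combinations(actors, 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj

def bacon_numbers(adj, source="Kevin Bacon"):
    """Breadth-first search: hop distance from the source actor."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

numbers = bacon_numbers(collaboration_graph(casts))
```

With this table, `numbers` assigns 1 to Tom Cruise and 2 to Nicole Kidman, matching the slide.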

Evolution of Kevin Bacon Network

GD05: Evolution of IMDB Kevin Bacon #1: 2000

WOS (Web of Science) Analysis

Social Network Co-citation Network

Evolution of Co-citation Network in WOS

co-citation network of year 2003

co-citation network of Year 2006
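For readers unfamiliar with the term: two papers are co-cited when a third paper cites both, and the edge weight is the number of such citing papers. A minimal sketch of building those weights follows; the paper identifiers are hypothetical and nothing here is drawn from the Web of Science data shown in the talk.

```python
from collections import Counter
from itertools import combinations

def cocitation_counts(references):
    """Map each unordered pair of cited papers to the number of
    citing papers whose reference list contains both."""
    counts = Counter()
    for cited in references.values():
        for a, b in combinations(sorted(set(cited)), 2):
            counts[(a, b)] += 1
    return counts

# Hypothetical citing papers P1..P3 and their reference lists.
refs = {
    "P1": ["A", "B", "C"],
    "P2": ["A", "B"],
    "P3": ["B", "C"],
}
weights = cocitation_counts(refs)
```

Restricting `refs` to the citing papers of a given year would give one network per year, which is one way to obtain snapshots such as those for 2003 and 2006.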

Information Visualisation

Network Analysis

Evolution of research area

Info Vis Collaboration Network

Email Network Virus Detection

History of World Cup

World Cup 2002

Edge Bundling with centrality analysis & k-core analysis
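The k-core used in this analysis is the largest subgraph in which every node has at least k neighbours. A minimal sketch of the standard peeling procedure, assuming an adjacency-set representation; the function name and toy graph are illustrative and are not GEOMI's implementation.

```python
def k_core(adj, k):
    """Return the node set of the k-core by repeatedly removing
    nodes whose remaining degree is below k."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    alive = set(adj)
    stack = [v for v in alive if degree[v] < k]
    while stack:
        v = stack.pop()
        if v not in alive:
            continue  # already removed via an earlier stack entry
        alive.remove(v)
        for w in adj[v]:
            if w in alive:
                degree[w] -= 1
                if degree[w] < k:
                    stack.append(w)
    return alive

# Hypothetical toy graph: triangle a-b-c with a tail a-d-e.
toy = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a", "e"},
    "e": {"d"},
}
core2 = k_core(toy, 2)
```

Here the tail is peeled away and only the triangle survives in the 2-core, which is how k-core filtering thins a dense drawing before edge bundling.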

US Airline Network Analysis

Integration with Clustering

Clustered Graph Layout

Metabolic Pathway Visualisation

GO-defined Protein Interaction Network

2.5D Scale-free Network Visualisation

Scale-free Network

[Barabási and Albert 99]: exponential growth, preferential attachment

Properties:

Power-law degree distribution

Sparse, but locally dense

Small-world property: O(log log n) average path length

High clustering coefficient

Resilient to random attack, but vulnerable to designed attack

Examples: Webgraph, social networks, biological networks
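Growth with preferential attachment can be sketched in a few lines: each new node links to existing nodes with probability proportional to their current degree. This is a simplified generator in the spirit of the Barabási-Albert model, not the generator used in GEOMI; the parameters `n`, `m` and `seed` and the seed-graph choice are assumptions.

```python
import random

def preferential_attachment_graph(n, m, seed=0):
    """Grow a graph to n nodes; each new node attaches to m distinct
    existing nodes chosen with probability proportional to degree."""
    rng = random.Random(seed)
    # Assumed starting point: a complete graph on m + 1 nodes.
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # Each node appears in `endpoints` once per incident edge, so a
    # uniform draw from this list is a degree-proportional draw.
    endpoints = [v for e in edges for v in e]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for t in targets:
            edges.append((new, t))
            endpoints.extend((new, t))
    return edges

edges = preferential_attachment_graph(n=200, m=2)
degree = {}
for u, v in edges:
    degree[u] = degree.get(u, 0) + 1
    degree[v] = degree.get(v, 0) + 1
```

Sorting `degree` by value typically shows a few early nodes with far more links than the rest, the hub structure behind the power-law degree distribution.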

Parallel Plane/Concentric Sphere Layout

[Figure: two layouts, each with layers G1, G2 and G3]

PPI networks Hawoong Jeong

Visualisation of Patterns

Motif

Overlapping Network Visualisation for Integrated Analysis

protein-gene interactions

protein-protein interactions

PROTEOME

GENOME

Citrate Cycle

METABOLISM

Bio-chemical reactions

Two Overlapping Networks

Glycolysis Pathway [KEGG] and PPI [DIP]: E. coli

9 overlap

1-neighborhood network
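One plausible reading of the overlap and 1-neighborhood on this slide is: take the nodes shared by the two networks, then keep those nodes plus their direct neighbours in one of the networks. A minimal sketch under that assumption; the node names are hypothetical placeholders, not E. coli data, and the function names are illustrative.

```python
def overlap(*networks):
    """Nodes present in every network (each given as an adjacency dict)."""
    return set.intersection(*map(set, networks))

def one_neighbourhood(adj, seeds):
    """Subgraph induced by the seed nodes and their direct neighbours."""
    nodes = set(seeds)
    for s in seeds:
        nodes |= adj.get(s, set())
    return {v: adj.get(v, set()) & nodes for v in nodes}

# Hypothetical placeholder networks.
pathway = {"p1": {"p2"}, "p2": {"p1", "p3"}, "p3": {"p2"}}
ppi = {"p2": {"p4"}, "p4": {"p2", "p5"}, "p5": {"p4"}, "p3": set()}

shared = overlap(pathway, ppi)
sub = one_neighbourhood(ppi, shared)
```

The same `overlap` call accepts three adjacency dicts, which covers the three-layer case discussed later in the deck.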

Gene Regulatory Network [RegulonDB] and PPI: E. coli

periphery proteins

6 hubs: no overlap

bottleneck proteins

Three Overlapping Networks

GRN [RegulonDB]: PPI [DIP]: MN [KEGG] (E. coli)

6 hubs in GRN: crp, arcA, fis, hns, ihfAB, lrp

No overlap

[Figure: nodes aceE and aceF, labelled 3]

ptsG: overlap between 3 layers

Propagation Animation in Diffusion Network
