
Page 1: The Future of Big Data Visualization - Universitetet i Bergenii.uib.no › ... › 2017-01-26--VCT2017--BigDataVisFuture... · Big Data – Challenges & Opportunities Selected Challenges


Helwig Hauser, University of Bergen

The Future of Big Data Visualization

... also attributed to others,

including the Danish poet Piet Hein:

"det er svært at spå – især om fremtiden" ("it is difficult to make predictions, especially about the future")

[wikiquote.org/wiki/Niels_Bohr]


Plan

Some context

– visualization, vis. in Bergen

– typical data visualization

Big data

– a definition

– a note on relevance

Big data visualization & its future

– big data visualization challenges

– upscaling visualization

– new approaches


Visualization – from data / models / … to insight

[Figure: data, models, etc.; visualization; application areas: Earth Sciences, Medicine, Biology, Engineering]

Visualization & Vis.-Research at UiB

Visualization

– how to get insight into heterogeneous, high-dimensional, time-dependent, multi-modal, and/or large data (models)?

– CS research, usually targeting interactive visual means to aid the exploration, analysis, and presentation of data, etc.

Research group at UiB’s Dept. of Informatics (one of six) [1]

– profs. Helwig Hauser (>20 years in vis.) & Stefan Bruckner (>10)

– group established 2007 at UiB, currently 15 members (Dec. 2016), 12 graduated visualization PhDs so far (since 2009)

– internationally recognized in visualization

– collaborative research with medicine, biology, geosciences, etc.

[1] www.ii.UiB.no/vis


(Typical) Data Visualization (selected examples)

[D3js.org]

Tabular Data Visualization

Tableau (was Polaris), see Tableau.com

also Visplore by


Graph Visualization

[van den Elzen, Holten, Blaas, van Wijk: Reducing Snapshots to Points: ... (IEEE VAST 2015)]

See also: Synerscope demo, youtu.be/SDb19tK0Mg0

Scientific Data Visualization

[Kohlmann, Bruckner, Kanitsar, Gröller: LiveSync: ... (IEEE TVCG 2007)]


Big Data

[Blogs.Gartner.com/doug-laney/big-datas-10-biggest-vision-and-strategy-questions]

C++ is like teenage sex.

– it is on everyone's mind all the time.

– everyone talks about it all the time.

– everyone thinks everyone else is doing it.

– almost no one is really doing it.

– the few who are doing it are:

▪ doing it poorly,

▪ sure it will be better next time,

▪ not practicing it safely.

Graffiti in a toilet at the Faculty of Comp. Sci., Technion, IIT, Haifa, Israel; Nov. 8, 1993.

[panix.com/~clp/humor/computers/programming/OOP_jokes.html]

[WhatsThePont.com/2014/04/06/google-flu-big-data-and-the-woozle-effect]


Big Data

What is “Big Data”?

– well, lots of data, right? … we will come back to this...

– a new(?) buzz-word! … but a relevant one!

Examples

– big data from large sensor networks (Internet of Things, …)

– big data from healthcare (National Health Database, …)

Broadly used definition

– 3V-def.: “Big data” is high-volume, -velocity & -variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making. [Doug Laney, 2001 / Gartner]

Big Data, V#1: Volume

Big Data (usually) refers to a lot of data!

“Big data” refers to datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze. [McKinsey Global Institute 2011]

Available data grows exponentially

– Exabytes of data available world-wide

• 1 EB = 1000 PB = 1 million TB = 1 billion GB

• hundreds of EB transferred via the Internet, annually

• EB of new information stored, annually
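To get a feel for these magnitudes, the unit relations above can be checked with a few lines of Python. This is a minimal sketch: the helper names `convert` and `years_to_transfer` and the 10 Gbit/s link speed are illustrative assumptions, not figures from the talk.

```python
# Decimal (SI) storage units, as used on the slide: 1 EB = 1000 PB, etc.
UNITS = {"GB": 10**9, "TB": 10**12, "PB": 10**15, "EB": 10**18}


def convert(value, src, dst):
    """Convert a quantity between decimal storage units."""
    return value * UNITS[src] / UNITS[dst]


def years_to_transfer(size_eb, gbit_per_s):
    """Years needed to move size_eb exabytes over a sustained link.

    The link speed is a hypothetical parameter chosen for illustration.
    """
    bits = size_eb * UNITS["EB"] * 8
    seconds = bits / (gbit_per_s * 10**9)
    return seconds / (365 * 24 * 3600)


print("1 EB in PB:", convert(1, "EB", "PB"))
print("1 EB in TB:", convert(1, "EB", "TB"))
print("1 EB in GB:", convert(1, "EB", "GB"))
print("years for 1 EB at 10 Gbit/s:", round(years_to_transfer(1, 10), 1))
```

Under these assumptions, moving a single exabyte over a sustained 10 Gbit/s link would take roughly 25 years, which illustrates why such volumes are usually processed where they are stored.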


Big Data, V#2: Variety

Big Data beyond numbers

– text, images & sound, relational data, …

unstructured data

– 30 billion pieces of information on Facebook per month!

400 million tweets per day

4 billion hours of videos are watched on YouTube / month

>400 million wearable, wireless health monitors

– Daniel Keim, 2007: 100 million FedEx transactions per day,

150 million VISA credit card transactions per day,

300 million long-distance calls in AT&T's network per day,

50 billion e-mails worldwide per day, 600 billion IP packets

per day on the DE-CIX backbone

Dark Data: available, but unused data

[MultiVis.net]

Big Data, V#3: Velocity

Real-time Big Data / Streaming Data Analysis, but also

– rapidly changing data

– data at different speeds and uneven rates (bursts)

Big Data – a moving target!

– lots of generated information cannot be stored!

• 90% of health care data is discarded (videos, etc.)
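When a stream cannot be stored, one common workaround is to keep only running summaries. The sketch below uses Welford's online algorithm for mean and variance as an example; this technique is not named in the talk, and the class name `RunningStats` is an illustrative assumption.

```python
class RunningStats:
    """Running mean and sample variance in O(1) memory (Welford's algorithm)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        """Fold one new value into the summary; the value itself is not kept."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Sample variance; 0.0 until at least two values have been seen."""
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0


stats = RunningStats()
for value in (2, 4, 4, 4, 5, 5, 7, 9):  # stands in for an unbounded stream
    stats.update(value)
print(stats.n, stats.mean, stats.variance)
```

Such summaries can feed a continuously updated display even when the raw values arrive in bursts and are discarded immediately.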


Big Data, V#4(?): Veracity [IBM, …]

Uncertain / low-quality data

– >$3 trillion loss to US economy

due to bad data quality

– high degree of uncertainty

D. Laney blogs:

– Batman on Big Data:

Even more Vs: [K. Normandeau]

– validity: the right data for the right decisions?

– volatility: when valid, storing for how long, etc.?


Big Data in Practice

Big data is

– generated, aggregated, analyzed, and consumed

– sensed, collected (networks), stored (cloud), and analyzed

(machine learning, visualization, …)

– process-mediated (“nicer” data),

machine-generated (Internet of Things),

human-sourced (from messages to videos)

[EMA 2013, Operationalizing the Buzz]

Big Data – Challenges & Opportunities

Selected Challenges

– shortage of Big Data talent (up to 200,000 people needed in the US, plus 1.5 million «data-savvy» managers)

– contextualization of Big Data: Big Data needs to be complemented by Big Judgment [Harvard Business Review]

– prediction difficult without theory

Selected Opportunities

– potential annual value of $300 billion to the US health care system, incl. cost savings of up to 8%

– potential annual value of $250 billion to the European public sector administration

– job opportunity (analysts, managers, et al.)!

[McKinsey GI, 2011]


Big Data Visualization

Big Data Visualization

A theme along with Big Data

– often information graphics,

or even information design

– tools (many not really suited for big data)

Also a (new) topic in Vis.-Research

[inspire.blufra.me/big-data-visualization-review-of-the-20-best-tools]


Big Data Vis.-Challenges

Big data visualization ≠ straightforward

– significant practical challenges, incl.

• technical limitations (memory, bandwidth, ...)

• distributed out-of-core solutions (cloud, ...)

• managing big data architectures (Hadoop, ...)

• data wrangling (tremendous challenge!)

– lots of actual visualization challenges, incl.

• technical challenges (too big, too fast, ...)

• conceptual challenges (invalid principles, ...)

• user-side challenges (perceptual, cognitive, ...)

User-side Challenges

Physiological, perceptual, and cognitive “limitations”

– the power of our eyes (if we are so lucky):

• about 120M rods (luminance)

and 6–7M cones (color),

unevenly distributed

• about 10–20Hz (–30–90Hz)

– visual perception

• pre-attentive processing

• optical illusions

(Gestalt laws)

– cognition

• memory

• literacy

[Purves et al.; Neuroscience, 2nd ed., 2001: Anatomical Distribution of Rods and Cones]

[csc2.ncsu.edu/faculty/healey/PP]

[Adelson's checker shadow illusion]


Technical Challenges

Visualization technology (SW, HW) not fast enough, ...

– hardware limitations:

• not enough memory

• computer too slow

• not enough pixels

– software limitations:

• not optimized (not parallel, no GPU, ...)

• too complex (super-linear complexity, ...) // many algorithms are O(n log n), O(n^2), or more
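The effect of super-linear complexity on large inputs can be made concrete with a quick operation count. This is a rough sketch: the rate of 10^9 simple operations per second is an assumed round number, and the function names `op_counts` and `seconds_at` are illustrative.

```python
import math


def op_counts(n):
    """Idealized operation counts for three complexity classes at input size n."""
    return {"O(n)": n, "O(n log n)": n * math.log2(n), "O(n^2)": n**2}


def seconds_at(ops, ops_per_second=1e9):
    """Time for ops operations at an assumed (hypothetical) machine speed."""
    return ops / ops_per_second


for n in (10**3, 10**6, 10**9):
    row = ", ".join(
        f"{name}: {seconds_at(ops):.3g} s" for name, ops in op_counts(n).items()
    )
    print(f"n = {n:>13,}: {row}")
```

At this assumed speed, one million items take about 0.02 seconds with an O(n log n) algorithm but about 1000 seconds (roughly 17 minutes) with an O(n^2) one, which is far from interactive.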

[WILDER ultra-high-resolution wall: 540K x 15K = 8,100M pixels]

[algorithms research @ UiB.no]

Conceptual Challenges

Invalid visualization principles

– conceptual scaling problems (in addition to technical scaling problems)

• techniques (OK for medium-sized data)

break down for large or huge data

“Parallel Coordinates”: “Graph Visualization”:

[xkcd.com]


Upscaling Visualization

Meeting big data visualization challenges

– technological optimization

• optimized software

• parallelization, etc.

– adapting visualization technology

• semi-transparency against overplotting

• edge bundling for large graphs

– frequency-based visualization

• binned plots

• clustered data

[>1M data points]
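The binning and semi-transparency ideas above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the implementation behind the slide's figure: the function names `bin_points` and `overplot_opacity`, the 64 x 64 grid, and the synthetic Gaussian data are all made up for the example.

```python
import random
from collections import Counter


def bin_points(points, x_range, y_range, nx, ny):
    """Count points per cell of an nx-by-ny grid; points outside the ranges are skipped."""
    (x0, x1), (y0, y1) = x_range, y_range
    counts = Counter()
    for x, y in points:
        if not (x0 <= x <= x1 and y0 <= y <= y1):
            continue
        # Map each coordinate to a cell index; the upper border falls into the last cell.
        i = min(int((x - x0) / (x1 - x0) * nx), nx - 1)
        j = min(int((y - y0) / (y1 - y0) * ny), ny - 1)
        counts[(i, j)] += 1
    return counts


def overplot_opacity(k, alpha):
    """Resulting opacity when k marks of opacity alpha are drawn on top of each other."""
    return 1.0 - (1.0 - alpha) ** k


random.seed(0)
# Synthetic stand-in data; increase the count to probe larger inputs.
points = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(100_000)]
counts = bin_points(points, (-4, 4), (-4, 4), 64, 64)
densest_cell, densest_count = counts.most_common(1)[0]
print(len(counts), "non-empty cells; densest cell holds", densest_count, "points")
print("opacity of 50 overlapping marks at alpha=0.05:", round(overplot_opacity(50, 0.05), 3))
```

The binned counts are bounded by the grid size rather than the number of records, so the drawing cost no longer grows with the data. The opacity formula shows the limit of semi-transparency: even at alpha = 0.05, fifty overlapping marks are already about 92% opaque, so dense regions saturate quickly.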


Visualization Research & Big Data

First attempts, including

– VisReduce: … (Im et al.); 150M records, >100 dims.

– Visualizing Big SPH Sim. (Reichl et al.); 10 billion points

– Typograph: … (Endert et al.); all of Wikipedia

– Egocentric Storylines (Muelder et al.); >10k nodes



Big Data Visualization Beyond Upscaling

More radical approaches

– alternative representations (multi-scale? like the log-log plot?)

– hybrid approaches, integrating

computational analysis & vis.

– hierarchical exploration/analysis

– task-based abstraction

[«Body Size and Metabolic Rate» by M. Kleiber,

Physiological Reviews, 1947]

[Kehrer, Filzmoser, Hauser: Brushing Moments

in Interactive Visual Analysis; CGF 2010]
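The appeal of a log-log representation is that data spanning many orders of magnitude, and following a power law, collapses onto a straight line. The sketch below fits such a line by least squares; the data are synthetic, and the constants (coefficient 70, exponent 0.75) are chosen only to mimic the kind of body-size relation shown in Kleiber's plot, not taken from the talk. The function name `fit_power_law` is illustrative.

```python
import math


def fit_power_law(xs, ys):
    """Least-squares fit of y = c * x**k via a straight line in log-log space."""
    lx = [math.log10(x) for x in xs]
    ly = [math.log10(y) for y in ys]
    n = len(lx)
    mean_x, mean_y = sum(lx) / n, sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    k = sxy / sxx                      # slope of the line = exponent
    c = 10 ** (mean_y - k * mean_x)    # intercept of the line = log10 of the coefficient
    return c, k


# Synthetic "body mass" values spanning more than five orders of magnitude.
masses = [0.02, 0.3, 4, 70, 600, 4000]
rates = [70 * m**0.75 for m in masses]
c, k = fit_power_law(masses, rates)
print("coefficient:", round(c, 3), "exponent:", round(k, 3))
```

On linear axes the smallest values in such a dataset would be indistinguishable from zero; on log-log axes every order of magnitude gets equal room, which is one sense in which the representation is multi-scale.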

Conclusions

Big Data Visualization

– great chance to think outside the box (alternative representations, multi-scale approaches, ...)

– great chance to cooperate with complementary approaches

(statistics, databases, machine learning, ...)

– great chance to understand fundamental limitations

(discrepancy big data ↔ human perception/cognition, ...)

– great chance to engage with the new field of data science

(cohort studies, ensemble datasets, ...)

– great chance to think many vs. really many

(multi- vs. high-dimensional data, ...)

– no big judgement without big data visualization


Acknowledgements (again)

VRVis

Great students & collaborators

Bergen VisGroup

see www.ii.UiB.no/vis/about/jobs.html