1
Helwig Hauser, University of Bergen
The Future of Big Data Visualization
... also attributed to others,
including the Danish poet Piet Hein:
"It is difficult to make predictions, especially about the future" (Danish: "det er svært at spå – især om fremtiden")
[wikiquote.org/wiki/Niels_Bohr]
2
Plan
Some context
– visualization, vis. in Bergen
– typical data visualization
Big data
– a definition
– a note on relevance
Big data visualization & its future
– big data visualization challenges
– upscaling visualization
– new approaches
3
Visualization – from data / models / … to insight
[Figure: data, models, etc. → visualization; application areas: earth sciences, medicine, biology, engineering]
Visualization & Vis.-Research at UiB
Visualization
– how to get insight into heterogeneous,
high-dimensional, time-dependent,
multi-modal, and/or large data (models)?
– CS research,
targeting interactive visual means, usually,
to aid the exploration, analysis, and presentation of data, etc.
Research group at UiB’s Dept. of Informatics (one of six) [1]
– profs. Helwig Hauser (>20 years in vis.) & Stefan Bruckner (>10)
– group established 2007 at UiB, currently 15 members (Dec. 2016),
12 graduated visualization PhDs so far (since 2009)
– internationally recognized in visualization
– collaborative research with medicine, biology, geosciences, etc.
[1] www.ii.UiB.no/vis
4
(Typical) Data Visualization (selected examples)
[D3js.org]
Tabular Data Visualization
Tableau (was Polaris), see Tableau.com
also Visplore by
5
Graph Visualization
[van den Elzen, Holten, Blaas, van Wijk: Reducing Snapshots to Points: ... (IEEE VAST 2015)]
See also: Synerscope demo, youtu.be/SDb19tK0Mg0
Scientific Data Visualization
[Kohlmann, Bruckner, Kanitsar, Gröller: LiveSync: ... (IEEE TVCG 2007)]
6
Big Data
[Blogs.Gartner.com/doug-laney/big-datas-10-biggest-vision-and-strategy-questions]
C++ is like teenage sex.
– it is on everyone's mind all the time.
– everyone talks about it all the time.
– everyone thinks everyone else is doing it.
– almost no one is really doing it.
– the few who are doing it are:
▪ doing it poorly,
▪ sure it will be better next time,
▪ not practicing it safely.
Graffiti in a toilet at the Faculty of comp.sci., Technion, IIT, Haifa, Israel; Nov. 8, 1993.
[panix.com/~clp/humor/computers/programming/OOP_jokes.html]
[WhatsThePont.com/2014/04/06/google-flu-big-data-and-the-woozle-effect]
7
Big Data
What is “Big Data”?
– well, lots of data, right? … we will come back to this …
– a new(?) buzzword! … but a relevant one!
Examples
– big data from large sensor networks (Internet of Things, …)
– big data from healthcare (National Health Database, …)
Broadly used definition
– 3V-def.: “Big data” is high-volume, -velocity & -variety
information assets that demand cost-effective, innovative
forms of information processing for enhanced insight and
decision making. [Doug Laney, 2001 / Gartner]
Big Data, V#1: Volume
Big Data (usually) refers to a lot of data!
“Big data” refers to datasets
whose size is beyond the ability of typical database
software tools to capture, store, manage, and analyze.
[McKinsey Global Institute 2011]
Available data grows exponentially
– Exabytes of data available world-wide
• 1 EB = 1000 PB = 1 million TB = 1 billion GB
• hundreds of EB transferred via the Internet, annually
• EB of new information stored, annually
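The prefix arithmetic above can be checked in a few lines. This is a minimal sketch that assumes decimal (SI) prefixes, i.e. a factor of 1000 per step, as on the slide; binary prefixes (EiB, PiB, …) would use factors of 1024 instead. The helper name `to_units` is purely illustrative.

```python
# Decimal (SI) prefixes, as used on the slide: each step is a factor of 1000.
GB = 10**9   # bytes
TB = 10**12
PB = 10**15
EB = 10**18

def to_units(n_bytes):
    """Express a byte count in GB, TB, PB and EB (decimal prefixes)."""
    return {"GB": n_bytes / GB, "TB": n_bytes / TB,
            "PB": n_bytes / PB, "EB": n_bytes / EB}

print(to_units(EB))
# One exabyte corresponds to one million 1 TB disks.
print(EB // TB, "disks of 1 TB per EB")
```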
8
Big Data, V#2: Variety
Big Data beyond numbers
– text, images & sound, relational data, … (unstructured data)
– 30 billion pieces of information on Facebook per month!
400 million tweets per day,
4 billion hours of video watched on YouTube per month,
>400 million wearable, wireless health monitors
– Daniel Keim, 2007: 100 million FedEx transactions per day,
150 million VISA credit card transactions per day,
300 million long-distance calls in AT&T’s network per day,
50 billion e-mails worldwide per day, 600 billion IP packets
per day in the DE-CIX backbone
Dark Data: available but unused data
[MultiVis.net]
Big Data, V#3: Velocity
Real-time Big Data / Streaming Data Analysis, but also
– rapidly changing data
– data at different speeds and uneven rates (bursts)
Big Data – a moving target!
– lots of generated information cannot be stored!
• 90% of health care data is discarded (videos, etc.)
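When data arrive faster than they can be stored, one option is to keep only running summaries. A minimal sketch, assuming a numeric stream: Welford's online algorithm (a standard technique, not one named on the slide) updates mean and variance per incoming value with constant memory, so each value can be discarded right after it is seen. `sensor_stream` is a hypothetical placeholder for a real feed.

```python
import math

class RunningStats:
    """One-pass mean/variance (Welford's algorithm): O(1) memory per stream."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the current mean

    def push(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        # Sample variance; undefined (NaN) for fewer than two values.
        return self._m2 / (self.n - 1) if self.n > 1 else float("nan")

    @property
    def std(self):
        return math.sqrt(self.variance)


def sensor_stream(n):
    """Hypothetical stand-in for a live sensor feed (cycles through 0..9)."""
    for i in range(n):
        yield float(i % 10)

stats = RunningStats()
for value in sensor_stream(100_000):
    stats.push(value)   # each value is used once, then discarded
print(stats.n, stats.mean, stats.std)
```

The trade-off is that only the chosen summaries survive; questions that need the raw values (e.g. a later re-binning) cannot be answered afterwards.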
9
Big Data, V#4(?): Veracity [IBM, …]
Uncertain / low-quality data
– >$3 trillion loss to the US economy
due to bad data quality
– high degree of uncertainty
D. Laney blogs:
– Batman on Big Data:
Even more Vs: [K. Normandeau]
– validity: the right data for the right decisions?
– volatility: when valid, storing for how long, etc.?
10
Big Data in Practice
Big data is
– generated, aggregated, analyzed, and consumed
– sensed, collected (networks), stored (cloud), and analyzed
(machine learning, visualization, …)
– process-mediated (“nicer” data),
machine-generated (Internet of Things),
human-sourced (from messages to videos)
[EMA 2013, Operationalizing the Buzz]
Big Data – Challenges & Opportunities
Selected Challenges
– shortage of Big Data talent (up to 200,000 needed in the US,
plus 1.5 million «data-savvy» managers)
– contextualization of Big Data – Big Data needs to be
complemented by Big Judgment [Harvard Business Review]
– prediction difficult without theory
Selected Opportunities
– annually $300 billion to the US health care system, incl. cost savings up to 8%
– annually $250 billion to the European public sector adm.
– job opportunity (analysts, managers, et al.)!
[McKinsey GI, 2011]
11
Big Data Visualization
A theme along with Big Data
– often information graphics,
or even information design
– tools (many not really suited for big data)
Also (new) a topic
in Vis.-Research
[inspire.blufra.me/big-data-visualization-review-of-the-20-best-tools]
12
Big Data Vis.-Challenges
Big data visualization ≠ straightforward
– significant practical challenges, incl.
• technical limitations (memory, bandwidth, ...)
• distributed out-of-core solutions (cloud, ...)
• managing big data architectures (Hadoop, ...)
• data wrangling (tremendous challenge!)
– lots of actual visualization challenges, incl.
• technical challenges (too big, too fast, ...)
• conceptual challenges (invalid principles, ...)
• user-side challenges (perceptual, cognitive, ...)
User-side Challenges
Physiological, perceptual, and cognitive “limitations”
– the power of our eyes (if we are so lucky):
• about 120M rods (luminance)
and 6–7M cones (color),
unevenly distributed
• about 10–20 Hz (up to 30–90 Hz)
– visual perception
• pre-attentive processing
• optical illusions
(Gestalt laws)
– cognition
• memory
• literacy
[Purves et al.; Neuroscience, 2nd ed., 2001: Anatomical Distribution of Rods and Cones]
[csc2.ncsu.edu/faculty/healey/PP]
[Adelson's checker shadow illusion]
13
Technical Challenges
Visualization technology (SW, HW) not fast enough, ...
– hardware limitations:
• not enough memory
• computers too slow
• not enough pixels
– software limitations:
• not optimized (not parallel, no GPU, ...)
• too complex (super-linear complexity, ...): many algorithms are O(n log n), O(n²), or worse
[WILDER ultra-high-resolution wall: 540K x 15K = 8,100M pixels]
[algorithms research @ UiB.no]
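Why super-linear complexity matters can be shown with a back-of-the-envelope estimate. This sketch assumes, purely for illustration, about one billion simple operations per second and ignores constant factors; `OPS_PER_SECOND` and `estimated_seconds` are hypothetical names.

```python
import math

OPS_PER_SECOND = 1e9  # assumption: roughly one billion simple operations per second

COMPLEXITIES = {
    "O(n)": lambda n: n,
    "O(n log n)": lambda n: n * math.log2(n),
    "O(n^2)": lambda n: n ** 2,
}

def estimated_seconds(n):
    """Rough runtime per complexity class for input size n (constants ignored)."""
    return {name: f(n) / OPS_PER_SECOND for name, f in COMPLEXITIES.items()}

for n in (10**3, 10**6, 10**9):
    times = estimated_seconds(n)
    print(f"n = {n:>13,}: " + ", ".join(f"{k}: {v:.3g} s" for k, v in times.items()))
```

Under these assumptions a quadratic algorithm needs about 1000 s at n = 10⁶ and about 10⁹ s (roughly three decades) at n = 10⁹, while O(n log n) stays around half a minute at n = 10⁹.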
Conceptual Challenges
Invalid visualization principles
– conceptual scaling problems (in addition to technical scaling problems)
• techniques (OK for medium-sized data)
break down for large or huge data
[Figures: “Parallel Coordinates”, “Graph Visualization”]
[xkcd.com]
14
Upscaling Visualization
Meeting big data visualization challenges
– technological optimization
• optimized software
• parallelization, etc.
– adapting visualization technology
• semi-transparency against overplotting
• edge bundling for large graphs
– frequency-based visualization
• binned plots
• clustered data
[>1M data points]
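Two of these adaptations can be illustrated in a few lines. A minimal sketch in plain Python, assuming equally sized rectangular bins: `bin_2d` turns individual points into per-cell counts (a binned plot), and `stacked_opacity` shows, with standard "over" compositing of identical marks, why semi-transparency alone saturates in dense regions. Both names are illustrative, not functions from a particular tool.

```python
import random
from collections import Counter

def bin_2d(points, x_range, y_range, nx, ny):
    """Count points per cell of an nx-by-ny grid (a 2D histogram).

    A binned plot then draws one cell per bin (e.g. colour = count) instead of
    one mark per point, so drawing cost depends on nx*ny, not on len(points).
    """
    (x0, x1), (y0, y1) = x_range, y_range
    counts = Counter()
    for x, y in points:
        if x0 <= x < x1 and y0 <= y < y1:   # points outside the ranges are ignored
            # min() guards against float rounding pushing an index to nx or ny
            i = min(int((x - x0) / (x1 - x0) * nx), nx - 1)
            j = min(int((y - y0) / (y1 - y0) * ny), ny - 1)
            counts[(i, j)] += 1
    return counts

def stacked_opacity(alpha, k):
    """Opacity after k overlapping marks, each drawn with opacity alpha."""
    return 1.0 - (1.0 - alpha) ** k

random.seed(0)
pts = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(100_000)]
grid = bin_2d(pts, (-4, 4), (-4, 4), 64, 64)
print(len(pts), "points ->", len(grid), "non-empty cells")
print(stacked_opacity(0.01, 100), stacked_opacity(0.01, 1000))
```

With α = 0.01, a hundred overlapping marks already reach about 63% opacity, so regions with hundreds or thousands of points per pixel become indistinguishable; binning avoids this by mapping counts to colour explicitly.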
15
Visualization Research & Big Data
First attempts, including
– VisReduce: … (Im et al.): 150M records, >100 dims.
– Visualizing Big SPH Sim. (Reichl et al.): 10 billion points
– Typograph: … (Endert et al.): all of Wikipedia
– Egocentric Storylines (Muelder et al.): >10k nodes
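A recurring pattern behind such systems is to aggregate close to the data and send only small summaries to the display. The following is a minimal MapReduce-style sketch, assuming numeric records split into chunks: each chunk is binned independently (map), partial histograms are added (reduce), and intermediate results are available for progressive display. It illustrates the general idea only and is not VisReduce's actual implementation; all names and data are hypothetical.

```python
from collections import Counter
from functools import reduce

def map_chunk(chunk, bin_width):
    """Map step: aggregate one chunk (e.g. one file or node) into a histogram."""
    return Counter(int(v // bin_width) for v in chunk)

def merge(h1, h2):
    """Reduce step: histograms of disjoint chunks combine by adding counts."""
    return h1 + h2

def progressive_histogram(chunks, bin_width):
    """Yield the running result after each chunk, so a view can update early."""
    total = Counter()
    for chunk in chunks:
        total = merge(total, map_chunk(chunk, bin_width))
        yield total

# Hypothetical data set split into four chunks, as if stored on four nodes.
chunks = [range(i * 250, (i + 1) * 250) for i in range(4)]
for partial in progressive_histogram(chunks, bin_width=100):
    print(sum(partial.values()), "records aggregated so far")

final = reduce(merge, (map_chunk(c, 100) for c in chunks), Counter())
```

Because merging is associative and commutative, the chunks can be processed in any order or in parallel, and only the small histograms, not the raw records, need to travel to the visualization client.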
16
Big Data Visualization Beyond Upscaling
More radical approaches
– alternative representations (multi-scale? like the log-log plot?)
– hybrid approaches, integrating
computational analysis & vis.
– hierarchical exploration/analysis
– task-based abstraction
[«Body Size and Metabolic Rate» by M. Kleiber,
Physiological Reviews, 1947]
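Kleiber's observation that metabolic rate scales roughly with body mass to the power 3/4 is a classic case where a log-log plot turns a power law into a straight line. The sketch below fits y = a·xᵇ by least squares on the logarithms; the data are synthetic and noise-free (not Kleiber's measurements), and the constant 3.4 is arbitrary.

```python
import math

def fit_power_law(xs, ys):
    """Fit y = a * x**b by least squares on (log10 x, log10 y).

    On a log-log plot a power law is a straight line with slope b, which lets
    one view show data spanning many orders of magnitude.
    """
    lx = [math.log10(x) for x in xs]
    ly = [math.log10(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    b = sxy / sxx
    a = 10 ** (my - b * mx)
    return a, b

# Synthetic example: masses from 0.01 kg (10 g) to 10,000 kg (10 t),
# "metabolic rate" generated with exponent 0.75 and an arbitrary constant.
masses = [10 ** (k / 2) for k in range(-4, 9)]
rates = [3.4 * m ** 0.75 for m in masses]
a, b = fit_power_law(masses, rates)
print(f"a = {a:.2f}, b = {b:.2f}")
```

On real, noisy data the fitted slope would only approximate the exponent; the point here is that six orders of magnitude in mass collapse onto one readable straight line.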
[Kehrer, Filzmoser, Hauser: Brushing Moments
in Interactive Visual Analysis; CGF 2010]
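One way to picture such a hybrid of computational analysis and interaction, loosely following the brushing-moments idea: statistical moments are computed per data item (here, per location's time series), and the user then selects items in moment space. The sketch below is only an illustration under simple assumptions (population moments, a Python predicate standing in for an interactive brush); it is not the method of the cited paper, and the data and names are hypothetical.

```python
import math

def moments(values):
    """Mean, standard deviation, skewness and excess kurtosis (population form)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    std = math.sqrt(m2)
    if std == 0:
        # Constant series: no spread, shape moments are undefined.
        return mean, 0.0, float("nan"), float("nan")
    return mean, std, m3 / std ** 3, m4 / m2 ** 2 - 3.0

def brush(series_by_id, predicate):
    """Select the ids whose moment tuple satisfies a (brushing) predicate."""
    return [sid for sid, vals in series_by_id.items() if predicate(moments(vals))]

# Hypothetical per-location time series (e.g. one per grid cell of a simulation).
series = {
    "cell_a": [1, 2, 3, 4, 5, 6],
    "cell_b": [1, 1, 1, 1, 1, 10],
    "cell_c": [5, 5, 5, 5, 5, 5],
}
# "Brush" the locations whose behaviour over time is strongly right-skewed.
print(brush(series, lambda m: m[2] > 1.0))
```

In an interactive setting the moments would be shown in linked views (e.g. a scatterplot of skewness vs. kurtosis), and the selection would be highlighted in the spatial view.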
Conclusions
Big Data Visualization
– great chance to think outside the box (alternative representations, multi-scale approaches, ...)
– great chance to cooperate with complementary approaches
(statistics, databases, machine learning, ...)
– great chance to understand fundamental limitations
(discrepancy big data ↔ human perception/cognition, ...)
– great chance to engage with the new field of data science
(cohort studies, ensemble datasets, ...)
– great chance to think many vs. really many
(multi- vs. high-dimensional data, ...)
– no big judgement without big data visualization
17
Acknowledgements (again)
VRVis
Great students & collaborators
Bergen VisGroup
see www.ii.UiB.no/vis/about/jobs.html