WT-4065, Superconductor: GPU Web Programming for Big Data Visualization, by Leo Meyerovich and...

Post on 25-May-2015

Category:

Technology

DESCRIPTION

Presentation WT-4065, Superconductor: GPU Web Programming for Big Data Visualization, by Leo Meyerovich and Matthew Torok at the AMD Developer Summit (APU13) Nov. 11-13, 2013.

TRANSCRIPT

A Browser Framework for Visualizing Big Data

Leo Meyerovich, Matt Torok, Ras Bodik
@LMeyerov
UC Berkeley / Graphistry

SUPERCONDUCTOR  

Why Big Data Visualization?

Yes No

Analysis Result: No

Histogram of Voter Turnout by Town

[Figure: histogram with # Towns on the y-axis and voter turnout (0% to 100%) on the x-axis. Annotation: most towns had a 40% voter turnout.]

Who's ballot stuffing?

Tree Map Demo

Ex: Time Series in IBM’s IT Monitor

GE Demo

Browser Engine ~= Chart Engine!

parse → selectors → layout → render

DSLs

Exploit Parallelism in Each One

selectors, layout, render

Deploy Today via Parallel JavaScript

Browser: HTML (data), CSS (styling), JS (script) → Parser → Selectors → Layout → Renderer → Pixels, with a JavaScript VM for scripts

Superconductor: data, styling, widgets (data viz) → superconductor.js → Compiler → Parser.js (web page), Selectors.CL, Layout.CL, Renderer.GL (GPU)

Data stays on GPU!

DSL 1: Data via JSON

JavaScript, Ruby, Python, Java, …

Easy… until 1-10s data loading

Parsing Demo

DSL 2: Designers Selectors

span b { width: 83% }
div .dog { float: left }
p, span b { font-size: 7px }

Example document tree (nodes, top level first): <div>; <p>, <span>; <img class="dog">, <b>; <b>; <i>, <b>

Problem: O(sels * tree log tree)

1K-100K HTML nodes × 1-10K selectors

Good News: Embarrassing Parallelism!
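A minimal sketch of why selector matching is embarrassingly parallel, assuming simplified selectors that are only descendant chains of tag names or `.class` tests. The node shape, the helper names (`makeNode`, `matchesPart`, `matches`, `matchAll`), and the parent links chosen for the example tree are hypothetical, not Superconductor's API. Each (node, selector) test writes only its own result cell, so the loops could be spread over GPU threads; here they run sequentially.

```javascript
// Hypothetical node shape: { tag, cls, parent }.
function makeNode(tag, cls, parent) {
  return { tag: tag, cls: cls || null, parent: parent || null };
}

// Test one simple selector part ("b" or ".dog") against one node.
function matchesPart(node, part) {
  return part[0] === "." ? node.cls === part.slice(1) : node.tag === part;
}

// Test a descendant chain such as ["div", ".dog"]: the last part must match
// the node itself, earlier parts must match ancestors in order.
function matches(node, chain) {
  var i = chain.length - 1;
  if (!matchesPart(node, chain[i])) return false;
  i--;
  for (var a = node.parent; a !== null && i >= 0; a = a.parent) {
    if (matchesPart(a, chain[i])) i--;
  }
  return i < 0;
}

// Every (node, selector) test reads shared data and writes only its own cell,
// so each test is independent of all the others.
function matchAll(nodes, selectors) {
  return nodes.map(function (node) {
    return selectors.map(function (chain) { return matches(node, chain); });
  });
}

// Assumed tree: div > (p > img.dog), (span > b).
var div = makeNode("div");
var p = makeNode("p", null, div);
var span = makeNode("span", null, div);
var img = makeNode("img", "dog", p);
var b = makeNode("b", null, span);
var nodes = [div, p, span, img, b];
var selectors = [["span", "b"], ["div", ".dog"], ["p"]];
var table = matchAll(nodes, selectors); // table[node][selector] is true or false
```

The result is a nodes × selectors table of booleans, which is the same shape of work as the "1K-100K nodes × 1-10K selectors" product on the earlier slide.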

Selector Engine Implementation

selectors.css → compiler.js → selectors.webcl

Dynamic Animation! Edit style at runtime, then recompile.

DSL 3: Layout

•  CSS: parallelizable layout
•  JS: flexible compute
•  FTL: parallelizable compute in declarative layout

Step 1/2: Schema of Visualization

Tree class hierarchy

Node attributes

[Figure: tree whose nodes carry the attributes x, y, w, h]

Step 2/2: Schema Attribute Constraints

[Figure: example tree (Root, HBox, Leaf nodes) annotated with x, y, w, h; legend distinguishes inputs (e.g. 10px, 5px) from vars]

HBox → left=IBox right=IBox
  w := left.w + right.w
  …

1. Local
2. Single-assignment

[Kastens 1980, Saraiva 2003] [WWW 2010, PPOPP 2013]
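The two properties named above can be illustrated with a small checker. This is a sketch under assumptions: the rule encoding, the names `isLocal` and `isSingleAssignment`, and every rule other than `w := left.w + right.w` are hypothetical, and "local" is read here as "refers only to the node itself or one of its direct children".

```javascript
// Hypothetical encoding of one production's rules:
// HBox -> left=IBox right=IBox, with w := left.w + right.w, ...
var hbox = {
  children: ["left", "right"],
  rules: [
    { target: "w", reads: ["left.w", "right.w"] },   // from the slide
    { target: "h", reads: ["left.h", "right.h"] },   // assumed
    { target: "left.x", reads: ["x"] },              // assumed
    { target: "right.x", reads: ["x", "left.w"] }    // assumed
  ]
};

// Local: every reference names the node itself ("w") or a direct child ("left.w").
function isLocal(production) {
  function ok(ref) {
    var parts = ref.split(".");
    return parts.length === 1 ||
      (parts.length === 2 && production.children.indexOf(parts[0]) >= 0);
  }
  return production.rules.every(function (r) {
    return ok(r.target) && r.reads.every(ok);
  });
}

// Single-assignment: no attribute is the target of two rules.
function isSingleAssignment(production) {
  var seen = Object.create(null);
  return production.rules.every(function (r) {
    if (seen[r.target]) return false;
    seen[r.target] = true;
    return true;
  });
}
```

Rules that pass both checks can be ordered purely by their data dependencies, which is what lets a compiler, rather than the chart author, pick the traversal order.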

Compiler Output: Layout as Tree Traversals

[Figure: tree nodes labeled with the attributes each traversal computes (w,h, then x,y, …); logical spawns and logical joins mark the parallel regions]

1. Works for all data sets
2. Compiler automatically parallelizes!

Parallelism in each traversal! [WWW 2010]

Mozilla, Microsoft
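As an illustration of layout expressed as tree traversals, here is a hand-written sketch of the kind of schedule a compiler might emit for an HBox: one bottom-up pass for sizes, then one top-down pass for positions. The node shape, the function names, and the rule that an HBox's height is the maximum of its children's heights are assumptions for the example, not the compiler's actual output.

```javascript
// Hypothetical tree: leaves carry input w and h; HBox nodes lay children left to right.
function leaf(w, h) { return { kind: "Leaf", w: w, h: h, children: [] }; }
function hbox(children) { return { kind: "HBox", children: children }; }

// Traversal 1 (bottom-up): a node's size depends only on its children's sizes,
// so sibling subtrees can be sized independently of each other.
function sizePass(node) {
  node.children.forEach(sizePass);
  if (node.kind === "HBox") {
    node.w = node.children.reduce(function (s, c) { return s + c.w; }, 0);
    node.h = node.children.reduce(function (m, c) { return Math.max(m, c.h); }, 0);
  }
}

// Traversal 2 (top-down): a child's x is its parent's x plus the widths of
// earlier siblings, all of which are known once traversal 1 has finished.
function positionPass(node, x, y) {
  node.x = x;
  node.y = y;
  var offset = x; // running sum of earlier sibling widths
  node.children.forEach(function (c) {
    positionPass(c, offset, y);
    offset += c.w;
  });
}

var root = hbox([hbox([leaf(10, 5), leaf(10, 5)]), leaf(5, 8)]);
sizePass(root);
positionPass(root, 0, 0);
```

Neither pass mentions a particular data set: the same two traversals work for any tree built from these node kinds.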

DSL 4: Rendering as a Layout Extension

HBox → left=IBox right=IBox
  @render @Rectangle(x,y,w,h,color)
  …
  w := left.w + right.w
  …

Traversals: Flattened & Level-Synchronous

•  Tree split into levels (level 1 … level n); parallel for loop over each level (level synchronous)
•  Nodes in arrays
•  Array per attribute (w, h, x, y)

Compiler automates code + data transformations. [Blelloch 93]
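A minimal sketch of a flattened, level-synchronous traversal, assuming a breadth-first node order, one typed array per attribute, and contiguous children. The array names (`firstChild`, `childCount`, `levels`) and the function `widthPass` are hypothetical. The inner loop over one level is the part that would become a parallel for; here it runs sequentially.

```javascript
// Hypothetical flattened tree in breadth-first order, one typed array per attribute.
// Node i's children occupy indices firstChild[i] .. firstChild[i] + childCount[i] - 1.
// Tree: node 0 has children 1 and 2; node 1 has children 3 and 4; nodes 2, 3, 4 are leaves.
var firstChild = new Int32Array([1, 3, 0, 0, 0]);
var childCount = new Int32Array([2, 2, 0, 0, 0]);
var w = new Float32Array([0, 0, 5, 10, 10]); // leaves hold input widths
var levels = [[0, 1], [1, 3], [3, 5]];       // [start, end) index range of each level

// Bottom-up pass: visit levels from deepest to shallowest. Within one level,
// each iteration writes only w[i] and reads only the already-finished level below.
function widthPass(levels, firstChild, childCount, w) {
  for (var l = levels.length - 1; l >= 0; l--) {
    for (var i = levels[l][0]; i < levels[l][1]; i++) {
      if (childCount[i] === 0) continue; // leaf: keep its input width
      var sum = 0;
      for (var c = 0; c < childCount[i]; c++) sum += w[firstChild[i] + c];
      w[i] = sum;
    }
  }
  return w;
}

widthPass(levels, firstChild, childCount, w);
```

Storing each attribute in its own array means neighboring threads in a level touch neighboring memory, which is the data transformation the slide says the compiler automates.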

Problem: Dynamic Memory Allocation on GPU?

[Figure: tree nodes issuing render calls: circ(…), square(…), rect(…); …, line(…); …, rect(…); …, oval(…)]

function circ(x, y, r) {
  var buffer = new Array(r * 10); // dynamic allocation
  for (var i = 0; i < r * 10; i++) buffer[i] = Math.cos(i);
  return buffer;
}

Buffer contents: 1.0 0.8 0.5 0.2 0 0.2

Dynamic Allocation as SIMD Traversals

Sizing pass: allocCirc(…) → 4, allocRect(…) → 6, allocLine(…) → 6, allocRect(…) → 7

Fill pass: fillCirc(…), fillRect(…), fillLine(…), fillRect(…)

1. Prefix sum for needed space
2. Allocate buffers
3. Fill vertex buffers in parallel
4. Hand OpenGL the buffer pointer

[Figure: vertex buffer split into per-node segments, e.g. 1.0 0.8 0.5 0.2 0 0.2]
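The first three steps can be sketched as follows, using the sizes 4, 6, 6, 7 from the slide. The function names `exclusiveScan` and `fill` are hypothetical, the scan is written sequentially (a GPU version would use a parallel scan), and the fill writes a placeholder value instead of real vertex data.

```javascript
// Exclusive prefix sum: offsets[i] is where node i's output starts.
function exclusiveScan(sizes) {
  var offsets = new Int32Array(sizes.length);
  var total = 0;
  for (var i = 0; i < sizes.length; i++) {
    offsets[i] = total;
    total += sizes[i];
  }
  return { offsets: offsets, total: total };
}

// Sizing pass: each node reports how many slots it needs (the alloc* calls).
var sizes = [4, 6, 6, 7];

// Steps 1 and 2: one scan, then a single allocation shared by all nodes.
var plan = exclusiveScan(sizes);
var buffer = new Float32Array(plan.total);

// Step 3: each node fills only its own slice, so the fills cannot conflict.
// The value written (the node index) is a placeholder for real vertex data.
function fill(nodeIndex) {
  var start = plan.offsets[nodeIndex];
  for (var k = 0; k < sizes[nodeIndex]; k++) buffer[start + k] = nodeIndex;
}
for (var n = 0; n < sizes.length; n++) fill(n);
```

In a browser, step 4 would hand `buffer` to the renderer, for example through WebGL's `gl.bufferData`; that call is left out so the sketch runs without a GL context.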

CPU vs. GPU for Election Treemap: 5 traversals over 100K nodes

[Figure: bar chart of time (ms, log scale from 1 to 10,000) for layout (4 passes), rendering pass, and TOTAL; series: Naïve JS (Chrome 26) and GPU (Safari + WebCL 11/3); reference line at 24fps]

WebCL: 31X
WebCL: 5X
COMBINED: 54X!

DSLs for Big Data Visualization, Today.

Superconductor

•  Explore data with interactive visualization

•  Script charts like web pages: DSLs!

•  Hardware accelerate each DSL

•  We use WebCL: GPGPU, keeps data on GPU, dynamic compilation

Find us!

sc-lang.com
Leo: @LMeyerov / LMeyerov@gmail.com
Matt: mtorok@berkeley.edu

Extra  

Parsing Demo

Optimizing JSON Parsing

Parallel parsing easy! … when you fix the format

Server:
1. raw.json: 23MB
2. partition: raw1.json (1.9MB), …, raw12.json
3. compress + zip: csr1.zip (0.2MB), …, csr12.zip

Browser: parallel parse of the partitions, combined into one big JavaScript object

Each worker:
1. native JSON parse # csr.json
2. decompress # obj.json
3. 0-copy return: typed arrays!
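A rough sketch of the per-worker parse and the merge on the main thread, assuming (hypothetically) that each partition is a JSON array of numbers. The names `parseChunk` and `mergeChunks` are illustrative, the decompress step is omitted, and the functions are called directly so the sketch runs outside a browser.

```javascript
// Worker-side step: native JSON parse, then pack the values into a typed array.
function parseChunk(text) {
  return Float32Array.from(JSON.parse(text));
}

// Main-thread step: concatenate the per-partition arrays in partition order.
function mergeChunks(chunks) {
  var total = chunks.reduce(function (n, c) { return n + c.length; }, 0);
  var out = new Float32Array(total);
  var offset = 0;
  chunks.forEach(function (c) {
    out.set(c, offset);
    offset += c.length;
  });
  return out;
}

// Stand-ins for the partition files; real partitions would be fetched from the server.
var partitions = ["[0.5, 0.25]", "[0.75]", "[1, 0]"];
var merged = mergeChunks(partitions.map(parseChunk));
```

In a browser, `parseChunk` would run inside a Web Worker, and the worker could return the array's underlying `ArrayBuffer` through the transfer list of `postMessage`, which moves it to the main thread without copying. The merge shown here does copy; it is only one way to combine the results.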
