khronos webcl · 2. create a context to associate opencl devices 3. create programs for execution...

28
© Copyright Khronos Group 2012 | Page 1 Khronos WebCL Enabling OpenCL Acceleration of Web Applications Tasneem Brutch [email protected] Chair WebCL Working Group Samsung Electronics

Upload: others

Post on 02-Oct-2020

10 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Khronos WebCL · 2. Create a context to associate OpenCL devices 3. Create programs for execution on one or more associated devices 4. From the programs, select kernels to execute

© Copyright Khronos Group 2012 | Page 1

Khronos WebCL Enabling OpenCL Acceleration of

Web Applications Tasneem Brutch

[email protected] Chair WebCL Working Group

Samsung Electronics

Page 2: Khronos WebCL · 2. Create a context to associate OpenCL devices 3. Create programs for execution on one or more associated devices 4. From the programs, select kernels to execute

© Copyright Khronos Group 2012 | Page 2

WebCL Overview • WebCL (Web Computing Language) is a JavaScript binding to OpenCL

• High performance:

- Enables acceleration of compute-intensive web applications, through high

performance parallel processing

• Platform independent:

- Portable and efficient access to heterogeneous multicore devices

- Provides single and coherent standard across desktop and mobile devices

• Currently being standardized by Khronos

Page 3: Khronos WebCL · 2. Create a context to associate OpenCL devices 3. Create programs for execution on one or more associated devices 4. From the programs, select kernels to execute

© Copyright Khronos Group 2012 | Page 3

WebCL Standardization • Standardize portable and efficient access to heterogeneous multicore

devices from web content

• Define ECMAScript API for interaction with OpenCL

• Designed for security and robustness

• Use of OpenCL Embedded Profile

• Interoperability between WebCL and WebGL

• Initialization simplified for portability

Page 4: Khronos WebCL · 2. Create a context to associate OpenCL devices 3. Create programs for execution on one or more associated devices 4. From the programs, select kernels to execute

© Copyright Khronos Group 2012 | Page 4

Contents • Motivation and Goals

• Standardization

- Khronos WebCL Working Group

- WebCL robustness and security

- Standardization status

• WebCL

- OpenCL Introduction

- Programming WebCL

- Robustness and Security

- WebCL prototype implementations

• Demo

• Conclusion

- Summary and Resources

Page 5: Khronos WebCL · 2. Create a context to associate OpenCL devices 3. Create programs for execution on one or more associated devices 4. From the programs, select kernels to execute

© Copyright Khronos Group 2012 | Page 5

Motivation • New mobile apps with high compute demands

- Big Data Visualization

- Augmented Reality

- Video Processing

- Computational Photography

- 3D Games

• Popularity of web-centric platforms

Page 6: Khronos WebCL · 2. Create a context to associate OpenCL devices 3. Create programs for execution on one or more associated devices 4. From the programs, select kernels to execute

© Copyright Khronos Group 2012 | Page 6

OpenCL Features Overview • C-based cross-platform programming interface

• Kernels: subset of ISO C99 with language extensions

• Run-time or build-time compilation of kernels

• Well-defined numerical accuracy

- (IEEE 754 rounding with specified maximum error)

• Rich set of built-in functions: cross, dot, sin, cos, pow, log …

Page 7: Khronos WebCL · 2. Create a context to associate OpenCL devices 3. Create programs for execution on one or more associated devices 4. From the programs, select kernels to execute

© Copyright Khronos Group 2012 | Page 7

OpenCL Platform Model • A host is connected to one or more OpenCL devices

• OpenCL device:

- A collection of one or more compute units - (~ cores)

- A compute unit is composed of one or more processing elements - (~ threads)

- Processing element code execution: - As SIMD or SPMD

Page 8: Khronos WebCL · 2. Create a context to associate OpenCL devices 3. Create programs for execution on one or more associated devices 4. From the programs, select kernels to execute

© Copyright Khronos Group 2012 | Page 8

OpenCL Execution Model • Kernel:

- Basic unit of executable code, similar to a C function

- Data-parallel or task-parallel

• Program:

- Collection of kernels and functions called by kernels

- Analogous to a dynamic library (run-time linking)

• Command Queue:

- Applications queue kernels and data transfers

- Performed in-order or out-of-order

• Work-item:

- An execution of a kernel by a processing element (~ thread)

• Work-group:

- Collection of related work-items executing on a single compute unit (~ core)

Page 9: Khronos WebCL · 2. Create a context to associate OpenCL devices 3. Create programs for execution on one or more associated devices 4. From the programs, select kernels to execute

© Copyright Khronos Group 2012 | Page 9

Work-group Example

# work-items = # pixels

# work-groups = # tiles

work-group size = tile width * tile height

Page 10: Khronos WebCL · 2. Create a context to associate OpenCL devices 3. Create programs for execution on one or more associated devices 4. From the programs, select kernels to execute

© Copyright Khronos Group 2012 | Page 10

OpenCL Memory Model • Memory types:

- Private memory (blue)

- Per work-item

- Local memory (green)

- At least 32KB split into blocks, each available

to any work-item in a given work-group

- Global/Constant memory (orange)

- Not synchronized

- Host memory (red)

- On the CPU

• Memory management is explicit:

- Application must move data from

host global local and back

Page 11: Khronos WebCL · 2. Create a context to associate OpenCL devices 3. Create programs for execution on one or more associated devices 4. From the programs, select kernels to execute

© Copyright Khronos Group 2012 | Page 11

OpenCL Kernels • OpenCL kernel

- Defined on an N-dimensional computation domain

- Execute a kernel at each point of the computation domain

kernel void vectorMult(

global const float* a,

global const float* b,

global float* c)

{

int id = get_global_id(0);

c[id] = a[id] * b[id];

}

void vectorMult(

const float* a,

const float* b,

float* c,

const unsigned int count)

{

for(int i=0; i<count; i++)

c[i] = a[i] * b[i];

}

Traditional Loop Data Parallel OpenCL

Page 12: Khronos WebCL · 2. Create a context to associate OpenCL devices 3. Create programs for execution on one or more associated devices 4. From the programs, select kernels to execute

© Copyright Khronos Group 2012 | Page 12

Executing OpenCL Programs 1. Query host for OpenCL devices

2. Create a context to associate OpenCL

devices

3. Create programs for execution on one or

more associated devices

4. From the programs, select kernels to

execute

5. Create memory objects accessible from

the host and/or the device

6. Copy memory data to the device as

needed

7. Provide kernels to the command queue for

execution

8. Copy results from the device to the host

12

Context

Programs Kernels Memory Objects

Command Queue

Send for execution

Programs

Kernel0

Kernel1

Kernel2

Images

Buffers In order & out of order

Create data & arguments Compile

Page 13: Khronos WebCL · 2. Create a context to associate OpenCL devices 3. Create programs for execution on one or more associated devices 4. From the programs, select kernels to execute

© Copyright Khronos Group 2012 | Page 13

Programming WebCL • Initialization

• Creating kernel

• Running kernel

• WebCL image object creation

- From Uint8Array

- From <img>, <canvas>, or <video>

- From WebGL vertex buffer

- From WebGL texture

• WebGL vertex animation

• WebGL texture animation

13

Page 14: Khronos WebCL · 2. Create a context to associate OpenCL devices 3. Create programs for execution on one or more associated devices 4. From the programs, select kernels to execute

© Copyright Khronos Group 2012 | Page 14

WebCL Initialization

<script>

var platforms = WebCL.getPlatforms();

var devices = platforms[0].getDevices(WebCL.DEVICE_TYPE_GPU);

var context = WebCL.createContext({ WebCLDevice: devices[0] } );

var queue = context.createCommandQueue();

</script>

14

Page 15: Khronos WebCL · 2. Create a context to associate OpenCL devices 3. Create programs for execution on one or more associated devices 4. From the programs, select kernels to execute

© Copyright Khronos Group 2012 | Page 15

Creating WebCL Kernel <script id="squareProgram" type="x-kernel">

__kernel square(

__global float* input,

__global float* output,

const unsigned int count)

{

int i = get_global_id(0);

if(i < count)

output[i] = input[i] * input[i];

}

</script>

<script>

var programSource = getProgramSource("squareProgram"); //JavaScript function using DOM

APIs

var program = context.createProgram(programSource);

program.build();

var kernel = program.createKernel("square");

</script>

OpenCL C

(based on C99)

Page 16: Khronos WebCL · 2. Create a context to associate OpenCL devices 3. Create programs for execution on one or more associated devices 4. From the programs, select kernels to execute

© Copyright Khronos Group 2012 | Page 16

Running WebCL Kernels <script>

var inputBuf = context.createBuffer(WebCL.MEM_READ_ONLY, Float32Array.BYTES_PER_ELEMENT * count);

var outputBuf = context.createBuffer(WebCL.MEM_WRITE_ONLY, Float32Array.BYTES_PER_ELEMENT * count);

var data = new Float32Array(count);

// populate data …

queue.enqueueWriteBuffer(inputBuf, data, true); //Last arg indicates blocking API

kernel.setKernelArg(0, inputBuf);

kernel.setKernelArg(1, outputBuf);

kernel.setKernelArg(2, count, WebCL.KERNEL_ARG_INT);

var workGroupSize = kernel.getWorkGroupInfo(devices[0], WebCL.KERNEL_WORK_GROUP_SIZE);

queue.enqueueNDRangeKernel(kernel, [count], [workGroupSize]);

queue.finish(); //This API blocks

queue.enqueueReadBuffer(outputBuf, data, true); //Last arg indicates blocking API

</script>

Page 17: Khronos WebCL · 2. Create a context to associate OpenCL devices 3. Create programs for execution on one or more associated devices 4. From the programs, select kernels to execute

© Copyright Khronos Group 2012 | Page 17

WebCL Image Object Creation

From Uint8Array ()

<script>

var bpp = 4; // bytes per pixel

var pixels = new Uint8Array(width * height * bpp);

var pitch = width * bpp;

var clImage = context.createImage(WebCL.MEM_READ_ONLY,

{channelOrder:WebCL.RGBA, channelType:WebCL.UNORM_INT8,

size:[width, height], pitch:pitch } );

</script>

From <img> or <canvas> or <video>

<script>

var canvas = document.getElementById("aCanvas");

var clImage = context.createImage(WebCL.MEM_READ_ONLY, canvas);

// format, size from element

</script>

Page 18: Khronos WebCL · 2. Create a context to associate OpenCL devices 3. Create programs for execution on one or more associated devices 4. From the programs, select kernels to execute

© Copyright Khronos Group 2012 | Page 18

WebCL: Vertex Animation • Vertex Buffer Initialization: <script>

var points = new Float32Array(NPOINTS * 3);

var glVertexBuffer = gl.createBuffer();

gl.bindBuffer(gl.ARRAY_BUFFER, glVertexBuffer);

gl.bufferData(gl.ARRAY_BUFFER, points, gl.DYNAMIC_DRAW);

var clVertexBuffer = context.createFromGLBuffer

(WebCL.MEM_READ_WRITE, glVertexBuffer);

kernel.setKernelArg(0, NPOINTS, WebCL.KERNEL_ARG_INT);

kernel.setKernelArg(1, clVertexBuffer);

</script>

• Vertex Buffer Update and Draw: <script>

function DrawLoop() {

queue.enqueueAcquireGLObjects([clVertexBuffer]);

queue.enqueueNDRangeKernel(kernel, [NPOINTS], [workGroupSize]);

queue.enqueueReleaseGLObjects([clVertexBuffer]);

gl.bindBuffer(gl.ARRAY_BUFFER, glVertexBuffer);

gl.clear(gl.COLOR_BUFFER_BIT);

gl.drawArrays(gl.POINTS, 0, NPOINTS);

gl.flush();

}

</script>

Web CL

Web GL

Web GL

Web CL

Page 19: Khronos WebCL · 2. Create a context to associate OpenCL devices 3. Create programs for execution on one or more associated devices 4. From the programs, select kernels to execute

© Copyright Khronos Group 2012 | Page 19

WebCL: Texture Animation • Texture Initialization: <script>

var glTexture = gl.createTexture();

gl.bindTexture(gl.TEXTURE_2D, glTexture);

gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, gl.RGBA, gl.UNSIGNED_BYTE, image);

var clTexture = context.createFromGLTexture2D(WebCL.MEM_READ_WRITE, gl.TEXTURE_2D, glTexture);

kernel.setKernelArg(0, NWIDTH, WebCL.KERNEL_ARG_INT);

kernel.setKernelArg(1, NHEIGHT, WebCL.KERNEL_ARG_INT);

kernel.setKernelArg(2, clTexture);

</script>

• Texture Update and Draw: <script>

function DrawLoop() {

queue.enqueueAcquireGLObjects([clTexture]);

queue.enqueueNDRangeKernel(kernel, [NWIDTH*NHEIGHT], [workGroupSize]);

queue.enqueueReleaseGLObjects([clTexture]);

gl.clear(gl.COLOR_BUFFER_BIT);

gl.activeTexture(gl.TEXTURE0);

gl.bindTexture(gl.TEXTURE_2D, glTexture);

gl.flush();

}

</script>

Page 20: Khronos WebCL · 2. Create a context to associate OpenCL devices 3. Create programs for execution on one or more associated devices 4. From the programs, select kernels to execute

© Copyright Khronos Group 2012 | Page 20

WebCL Prototype Implementations • Prototype implementations:

- Nokia’s prototype: uses Firefox, open sourced May 2011 (LGPL)

- Samsung’s prototype: uses WebKit, open sourced June 2011 (BSD)

- Motorola’s prototype: uses node.js, open sourced April 2012 (BSD)

• Functionality:

- WebCL contexts

- Platform queries (clGetPlatformIDs, clGetDeviceIDs, etc.)

- Creation of OpenCL Context, Queue, and Buffer objects

- Kernel building

- Querying workgroup size (clGetKernelWorkGroupInfo)

- Reading, writing, and copying OpenCL buffers

- Setting kernel arguments

- Scheduling a kernel for execution

- CL-GL interoperability - clCreateFromGLBuffer

- clEnqueueAcquireGLObjects

- clEnqueueReleaseGLObjects

- Synchronization

- Error handling

Page 21: Khronos WebCL · 2. Create a context to associate OpenCL devices 3. Create programs for execution on one or more associated devices 4. From the programs, select kernels to execute

© Copyright Khronos Group 2012 | Page 21

Robustness and Security • Security threats addressed:

- Information leakage from out of bounds memory access

- Denial of service from long running kernels

• Security requirements:

- Prevent out of range memory accesses (WiP - WebCL Kernel Validator) - Prevent out of bounds memory accesses

- Prevent out of bounds (OOB) memory writes

- OOB memory reads should not return any uninitialized value

- Memory initialization to prevent information leakage - Initialization of private and local memory objects

- Prevent potential Denial of Service from long running kernels - Ability to terminate contexts with long running kernels

- Prevent security vulnerabilities from undefined behavior - Undefined OpenCL behavior to be defined by WebCL

- Ensure strong conformance testing

- Prevent cross-origin information leakage - Prevent Cross Origin Resource Sharing to restrict media use by non-originating web site

Page 22: Khronos WebCL · 2. Create a context to associate OpenCL devices 3. Create programs for execution on one or more associated devices 4. From the programs, select kernels to execute

© Copyright Khronos Group 2012 | Page 22

WebCL: Current Status • WebCL API definition and Conformance tests:

- Working draft released through [email protected]: Apr. 2012. WebCL

specification release: 1H, 2013

- Conformance framework and test suite for testing WebCL implementation

(contributed by Samsung) – on going

- Conformance tests for OpenCL 1.2 security and robustness extensions (contributed

by Samsung)

• Ratification of OpenCL 1.2 robustness/security extensions proposed by

WebCL WG

• Definition of WebCL kernel validator – in progress

• Strong interest in WebCL API:

- Participation by browser vendors, OpenCL hardware/driver vendors, application

developers

- Non-Khronos members engaged through public mailing list

Page 23: Khronos WebCL · 2. Create a context to associate OpenCL devices 3. Create programs for execution on one or more associated devices 4. From the programs, select kernels to execute

© Copyright Khronos Group 2012 | Page 23

WebCL Demo: Large Dataset Visualization

23

Page 24: Khronos WebCL · 2. Create a context to associate OpenCL devices 3. Create programs for execution on one or more associated devices 4. From the programs, select kernels to execute

© Copyright Khronos Group 2012 | Page 24

Latent Parallelism in Web Domain Specific Languages (DSLs)

Multicore [HotPar ‘09]

Multicore [WWW ‘10]

GPU Multicore [HotPar ‘12, PPOPP

’13]

GPU Multicore New!

Layout Solver

Page 25: Khronos WebCL · 2. Create a context to associate OpenCL devices 3. Create programs for execution on one or more associated devices 4. From the programs, select kernels to execute

© Copyright Khronos Group 2012 | Page 25

Latent Parallelism in Web Domain Specific Languages (DSLs)

Web Workers

Web Workers

WebCL

WebGL

Layout Solver

Page 26: Khronos WebCL · 2. Create a context to associate OpenCL devices 3. Create programs for execution on one or more associated devices 4. From the programs, select kernels to execute

© Copyright Khronos Group 2012 | Page 26

Synthesis DSL for Writing Layout Widgets

Specification of

Layout Language

Parallel Schedule

Synthesizer

Widget Specification

ex: treemap

Page 27: Khronos WebCL · 2. Create a context to associate OpenCL devices 3. Create programs for execution on one or more associated devices 4. From the programs, select kernels to execute

© Copyright Khronos Group 2012 | Page 27

Large Dataset Visualization • Created a GPU-accelerated visualization language for real-time interactive

animations of over 100,000 nodes

• Use case: Real-time, interactive exploration of voting activities and trends,

using a data set of approximately 100,000 polling stations

• Approach: A GPU back-end was used to generate level-synchronous,

breadth-first tree traversal

• Results:

- Automated OpenCL parallelization on three data sets: - 10,000 nodes: 27.6 fps

- 100,000 nodes: 27.6 fps

- 1,000,000 nodes: 4.5 fps

- JavaScript could run maximum of 500 nodes, at 27 fps.

“Parallel Schedule Synthesis for Attribute Grammers”, ACM Principles and Practice of Parallel Programming, L. Meyerovich, M.

Torok, E. Atkinson, R. Bodik, Feb. 2013, Shenzen, China.

Page 28: Khronos WebCL · 2. Create a context to associate OpenCL devices 3. Create programs for execution on one or more associated devices 4. From the programs, select kernels to execute

© Copyright Khronos Group 2012 | Page 28

Summary and Resources • Summary:

- WebCL is a JavaScript binding to OpenCL, allowing web applications to leverage

heterogeneous computing resources

- WebCL enables significant acceleration of compute-intensive web applications

• WebCL Resources:

- Samsung’s open source WebCL prototype implementation available at:

http://code.google.com/p/webcl

- Khronos member WebCL distribution list: [email protected]

- Public WebCL distribution list: [email protected]

- WebCL public working draft: https://cvs.khronos.org/svn/repos/registry/trunk/public/webcl/spec/latest/index.html

- Samsung WebCL prototype: http://code.google.com/p/webcl

- Nokia prototype: http://webcl.nokiaresearch.com/