afds 2012 phil rogers keynote: the programmer’s guide to a universe of possibility

45
THE PROGRAMMER’S GUIDE TO A UNIVERSE OF POSSIBILITY Phil Rogers AMD Corporate Fellow Heterogeneous System Architecture

Upload: hsa-foundation

Post on 19-Jun-2015

33.654 views

Category:

Technology


2 download

DESCRIPTION

Phil Roger goes deeper into what HSA is, and some of the area it can address since his first presentation on HSA in 2011. He also announces the HSA Foundation and it founding members

TRANSCRIPT

Page 1: AFDS 2012 Phil Rogers Keynote: THE PROGRAMMER’S GUIDE TO A UNIVERSE OF POSSIBILITY

THE PROGRAMMER’S GUIDE TO A

UNIVERSE OF POSSIBILITY

Phil Rogers

AMD

Corporate Fellow

Heterogeneous System Architecture

Page 2: AFDS 2012 Phil Rogers Keynote: THE PROGRAMMER’S GUIDE TO A UNIVERSE OF POSSIBILITY

2 | The Programmer’s Guide to a Universe of Possibility | June 12, 2012

Most parallel code runs on CPUs designed for scalar workloads

Page 3: AFDS 2012 Phil Rogers Keynote: THE PROGRAMMER’S GUIDE TO A UNIVERSE OF POSSIBILITY

3 | The Programmer’s Guide to a Universe of Possibility | June 12, 2012

WHAT DID WE HEAR FROM TOM MALLOY THIS MORNING?

Page 4: AFDS 2012 Phil Rogers Keynote: THE PROGRAMMER’S GUIDE TO A UNIVERSE OF POSSIBILITY

4 | The Programmer’s Guide to a Universe of Possibility | June 12, 2012

CHANGING THE THINKING

Typically platform builders create innovative new

hardware and offer an API for software to access it

That tired thinking has only ever had niche success!

Page 5: AFDS 2012 Phil Rogers Keynote: THE PROGRAMMER’S GUIDE TO A UNIVERSE OF POSSIBILITY

5 | The Programmer’s Guide to a Universe of Possibility | June 12, 2012

HETEROGENEOUS SYSTEM ARCHITECTURE ROADMAP

Page 6: AFDS 2012 Phil Rogers Keynote: THE PROGRAMMER’S GUIDE TO A UNIVERSE OF POSSIBILITY

6 | The Programmer’s Guide to a Universe of Possibility | June 12, 2012

HETEROGENEOUS SYSTEM ARCHITECTURE Brings All the Processors in a System into Unified Coherent Memory

POWER EFFICIENT

EASY TO PROGRAM

FUTURE LOOKING

ESTABLISHED TECHNOLOGY FOUNDATION

OPEN STANDARD

INDUSTRY SUPPORT

Page 7: AFDS 2012 Phil Rogers Keynote: THE PROGRAMMER’S GUIDE TO A UNIVERSE OF POSSIBILITY

7 | The Programmer’s Guide to a Universe of Possibility | June 12, 2012

SOLUTION

PROBLEM

THE HSA OPPORTUNITY ON MODERN APPLICATIONS

Developer

Return (Differentiation in

performance,

reduced power,

features,

time to market)

Developer Investment (Effort, time, new skills)

Good user experiences

Historically, developers program CPUs

HSA + Libraries = productivity & performance with low power

Wide range of differentiated experiences

~4M apps

~10+M* CPU

coders

PROBLEM

Significant niche value

GPU/HW blocks hard to program

Not all workloads accelerate

~200 apps

~100K GPU

coders

Few 100Ks HSA apps

Few M HSA

coders

*IDC

Page 8: AFDS 2012 Phil Rogers Keynote: THE PROGRAMMER’S GUIDE TO A UNIVERSE OF POSSIBILITY

8 | The Programmer’s Guide to a Universe of Possibility | June 12, 2012

APPLICATION AREAS WITH ABUNDANT PARALLEL WORKLOADS

Biometric Recognition

Secure, fast, accurate: face, voice, fingerprints

Beyond HD Experiences

Streaming media, new codecs, 3D, transcode, audio

Augmented Reality

Superimpose graphics, audio, and other digital information as a virtual overlay

AV Content Management

Searching, indexing and tagging of video & audio. multimedia data mining

Natural UI & Gestures

Touch, gesture, and voice

Content Everywhere

Content from any source to any display seamlessly

Page 9: AFDS 2012 Phil Rogers Keynote: THE PROGRAMMER’S GUIDE TO A UNIVERSE OF POSSIBILITY

9 | The Programmer’s Guide to a Universe of Possibility | June 12, 2012

HAAR Face Detection

CORNERSTONE TECHNOLOGY

FOR COMPUTERVISION

Page 10: AFDS 2012 Phil Rogers Keynote: THE PROGRAMMER’S GUIDE TO A UNIVERSE OF POSSIBILITY

11 | The Programmer’s Guide to a Universe of Possibility | June 12, 2012

LOOKING FOR FACES IN ALL THE RIGHT PLACES

Page 11: AFDS 2012 Phil Rogers Keynote: THE PROGRAMMER’S GUIDE TO A UNIVERSE OF POSSIBILITY

12 | The Programmer’s Guide to a Universe of Possibility | June 12, 2012

LOOKING FOR FACES IN ALL THE RIGHT PLACES

Quick HD Calculations

Search square = 21 x 21

Pixels = 1920 x 1080 = 2,073,600

Search squares = 1900 x 1060 = ~2 Million

Page 12: AFDS 2012 Phil Rogers Keynote: THE PROGRAMMER’S GUIDE TO A UNIVERSE OF POSSIBILITY

13 | The Programmer’s Guide to a Universe of Possibility | June 12, 2012

LOOKING FOR DIFFERENT SIZE FACES – BY SCALING THE VIDEO FRAME

Page 13: AFDS 2012 Phil Rogers Keynote: THE PROGRAMMER’S GUIDE TO A UNIVERSE OF POSSIBILITY

14 | The Programmer’s Guide to a Universe of Possibility | June 12, 2012

LOOKING FOR DIFFERENT SIZE FACES – BY SCALING THE VIDEO FRAME

More HD Calculations

70% scaling in H and V

Total Pixels = 4.07 Million

Search squares = 3.8 Million

Page 14: AFDS 2012 Phil Rogers Keynote: THE PROGRAMMER’S GUIDE TO A UNIVERSE OF POSSIBILITY

15 | The Programmer’s Guide to a Universe of Possibility | June 12, 2012

Feature l

Feature m

Feature p

Feature r

Feature q

HAAR CASCADE STAGES

Feature k

Stage N

Stage N+1

Face still possible? Yes

No

REJECT FRAME

Page 15: AFDS 2012 Phil Rogers Keynote: THE PROGRAMMER’S GUIDE TO A UNIVERSE OF POSSIBILITY

16 | The Programmer’s Guide to a Universe of Possibility | June 12, 2012

22 CASCADE STAGES, EARLY OUT BETWEEN EACH

STAGE 22 STAGE 21 STAGE 2 STAGE 1

NO FACE

FACE CONFIRMED

Final HD Calculations

Search squares = 3.8 million

Average features per square = 124

Calculations per feature = 100

Calculations per frame = 47 GCalcs

Calculation Rate

30 frames/sec = 1.4TCalcs/second

60 frames/sec = 2.8TCalcs/second

…and this only gets front-facing faces

Page 16: AFDS 2012 Phil Rogers Keynote: THE PROGRAMMER’S GUIDE TO A UNIVERSE OF POSSIBILITY

17 | The Programmer’s Guide to a Universe of Possibility | June 12, 2012

CASCADE DEPTH ANALYSIS

0

5

10

15

20

25Cascade Depth

20-25

15-20

10-15

5-10

0-5

Page 17: AFDS 2012 Phil Rogers Keynote: THE PROGRAMMER’S GUIDE TO A UNIVERSE OF POSSIBILITY

18 | The Programmer’s Guide to a Universe of Possibility | June 12, 2012

UNBALANCING DUE TO EXITS IN EARLIER CASCADE STAGES

Live

Dead

When running on the GPU, we run each search rectangle on a separate work item

Early out algorithms, like HAAR, exhibit divergence between work items

Some work items exit early

Their neighbors continue

SIMD packing suffers as a result

Page 18: AFDS 2012 Phil Rogers Keynote: THE PROGRAMMER’S GUIDE TO A UNIVERSE OF POSSIBILITY

19 | The Programmer’s Guide to a Universe of Possibility | June 12, 2012

0

10

20

30

40

50

60

70

80

90

100

1 2 3 4 5 6 7 8 9-22

Tim

e (

ms)

Cascade Stage

“Trinity” A10-4600M (6CU@497Mhz, 4 cores@2700Mhz)

GPU

CPU

PROCESSING TIME/STAGE

AMD A10 4600M APU with Radeon™ HD Graphics; CPU: 4 cores @ 2.3 MHz (turbo 3.2 GHz); GPU: AMD Radeon HD 7660G,

6 compute units, 685MHz; 4GB RAM; Windows 7 (64-bit); OpenCL™ 1.1 (873.1)

Page 19: AFDS 2012 Phil Rogers Keynote: THE PROGRAMMER’S GUIDE TO A UNIVERSE OF POSSIBILITY

20 | The Programmer’s Guide to a Universe of Possibility | June 12, 2012

0

2

4

6

8

10

12

0 1 2 3 4 5 6 7 8 22

Imag

es/S

ec

Number of Cascade Stages on GPU

“Trinity” A10-4600M (6CU@497Mhz, 4 cores@2700Mhz)

CPU

HSA

GPU

PERFORMANCE CPU-VS-GPU

AMD A10 4600M APU with Radeon™ HD Graphics; CPU: 4 cores @ 2.3 MHz (turbo 3.2 GHz); GPU: AMD Radeon HD 7660G,

6 compute units, 685MHz; 4GB RAM; Windows 7 (64-bit); OpenCL™ 1.1 (873.1)

Page 20: AFDS 2012 Phil Rogers Keynote: THE PROGRAMMER’S GUIDE TO A UNIVERSE OF POSSIBILITY

21 | The Programmer’s Guide to a Universe of Possibility | June 12, 2012

HAAR SOLUTION – RUN DIFFERENT CASCADES ON GPU AND CPU

By seamlessly sharing data between CPU and GPU,

HSA allows the right processor to handle its appropriate workload

+2.5x

-2.5x

INCREASED

PERFORMANCE DECREASED ENERGY

PER FRAME

Page 21: AFDS 2012 Phil Rogers Keynote: THE PROGRAMMER’S GUIDE TO A UNIVERSE OF POSSIBILITY

22 | The Programmer’s Guide to a Universe of Possibility | June 12, 2012

ACCELERATING MEMCACHED

CLOUD SERVER WORKLOAD

Page 22: AFDS 2012 Phil Rogers Keynote: THE PROGRAMMER’S GUIDE TO A UNIVERSE OF POSSIBILITY

23 | The Programmer’s Guide to a Universe of Possibility | June 12, 2012

DATACENTER WORKLOAD

Generally used for short-term storage and caching, handling requests

that would otherwise require database or file system accesses

Used by Facebook, YouTube, Twitter, Wikipedia, Flickr, and others

Effectively a large distributed hash table

Responds to store and get requests received over the network

Conceptually:

store(key, object)

object = get(key)

Page 23: AFDS 2012 Phil Rogers Keynote: THE PROGRAMMER’S GUIDE TO A UNIVERSE OF POSSIBILITY

24 | The Programmer’s Guide to a Universe of Possibility | June 12, 2012

100%

80%

60%

40%

20%

0 0

1

2

3

4

Key Look Up Performance Execution Breakdown

Data Transfer Execution

OFFLOADING MEMCACHED KEY LOOKUP TO THE GPU

T. H. Hetherington, T. G. Rogers, L. Hsu, M. O’Connor, and T. M. Aamodt, “Characterizing and Evaluating a Key-Value Store Application on Heterogeneous CPU-GPU Systems,”

Proceedings of the 2012 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS 2012), April 2012.

http://ieeexplore.ieee.org/xpl/articleDetails.jsp?tp=&arnumber=6189209

Multithreaded CPU Radeon HD 5870 “Trinity” A10-5800K Zacate E-350

Page 24: AFDS 2012 Phil Rogers Keynote: THE PROGRAMMER’S GUIDE TO A UNIVERSE OF POSSIBILITY

25 | The Programmer’s Guide to a Universe of Possibility | June 12, 2012

ACCELERATING JAVA

GOING BEYOND NATIVE LANGUAGES

Page 25: AFDS 2012 Phil Rogers Keynote: THE PROGRAMMER’S GUIDE TO A UNIVERSE OF POSSIBILITY

26 | The Programmer’s Guide to a Universe of Possibility | June 12, 2012

JAVA ENABLEMENT BY APARAPI

Developer creates Java™ source Source compiled to class files (bytecode)

using standard compiler (javac)

Classes packaged and deployed using established Java™ tool chain

Aparapi = Runtime capable of converting Java™ bytecode to OpenCL™

For execution on any OpenCL™ 1.1+ capable device

OR execute via a thread pool if OpenCL™ is not available

Page 26: AFDS 2012 Phil Rogers Keynote: THE PROGRAMMER’S GUIDE TO A UNIVERSE OF POSSIBILITY

27 | The Programmer’s Guide to a Universe of Possibility | June 12, 2012

JAVA AND APARAPI HSA ENABLEMENT ROADMAP

HSAIL

HSA-Enabled JVM

Application

HSA GPU HSA CPU

HSA Finalizer

CPU ISA GPU ISA

HSA Runtime

LLVM Optimizer

HSAIL

IR

JVM

Application

Aparapi

HSA GPU HSA CPU

HSA Finalizer

CPU ISA GPU ISA CPU ISA GPU ISA

JVM

Application

Aparapi

GPU CPU

OpenCL™

HSAIL

JVM

Application

Aparapi

HSA GPU HSA CPU

HSA Finalizer

CPU ISA GPU ISA

Page 27: AFDS 2012 Phil Rogers Keynote: THE PROGRAMMER’S GUIDE TO A UNIVERSE OF POSSIBILITY

28 | The Programmer’s Guide to a Universe of Possibility | June 12, 2012

HSA SOFTWARE STACKS

Page 28: AFDS 2012 Phil Rogers Keynote: THE PROGRAMMER’S GUIDE TO A UNIVERSE OF POSSIBILITY

29 | The Programmer’s Guide to a Universe of Possibility | June 12, 2012

INTRODUCING HSA BOLT – PARALLEL PRIMITIVES LIBRARY FOR HSA

Easily leverage the inherent power efficiency of GPU computing

Common routines such as scan, sort, reduce, transform

More advanced routines like heterogeneous pipelines

Bolt library works with OpenCL or C++ AMP

Enjoy the unique advantages of the HSA platform

Move the computation not the data

Finally a single source code base for the CPU and GPU!

Developers can focus on core algorithms See Ben Sander’s session tomorrow

for a deep dive on HSA Bolt!

Page 29: AFDS 2012 Phil Rogers Keynote: THE PROGRAMMER’S GUIDE TO A UNIVERSE OF POSSIBILITY

30 | The Programmer’s Guide to a Universe of Possibility | June 12, 2012

HSA SOLUTION STACK

CPU(s) GPU(s) Other

Accelerators

HSA Finalizer

Legacy Drivers

Application

Domain Specific Libs (Bolt, OpenCV™, … many others)

HSA Runtime

Application SW

Drivers

Differentiated HW

DirectX Runtime

Other Runtime

HSAIL

GPU ISA

OpenCL™ Runtime

HSA Software

Knl Driver

Ctl

Page 30: AFDS 2012 Phil Rogers Keynote: THE PROGRAMMER’S GUIDE TO A UNIVERSE OF POSSIBILITY

31 | The Programmer’s Guide to a Universe of Possibility | June 12, 2012

AMD’S OPEN SOURCE COMMITMENT TO HSA

Component Name AMD Specific Rationale

HSA Bolt Library No Enable understanding and debug

OpenCL HSAIL Code Generator No Enable research

LLVM Contributions No Industry and academic collaboration

HSA Assembler No Enable understanding and debug

HSA Runtime No Standardize on a single runtime

HSA Finalizer Yes Enable research and debug

HSA Kernel Driver Yes For inclusion in linux distros

We will open source our linux execution and compilation stack

Jump start the ecosystem

Allow a single shared implementation where appropriate

Enable university research in all areas

Page 31: AFDS 2012 Phil Rogers Keynote: THE PROGRAMMER’S GUIDE TO A UNIVERSE OF POSSIBILITY

32 | The Programmer’s Guide to a Universe of Possibility | June 12, 2012

EASE OF PROGRAMMING

CODE COMPLEXITY VS. PERFORMANCE

Page 32: AFDS 2012 Phil Rogers Keynote: THE PROGRAMMER’S GUIDE TO A UNIVERSE OF POSSIBILITY

33 | The Programmer’s Guide to a Universe of Possibility | June 12, 2012

0

50

100

150

200

250

300

350

LO

C

LINES-OF-CODE AND PERFORMANCE FOR DIFFERENT PROGRAMMING MODELS

Copy-back Algorithm Launch Copy Compile Init Performance

Serial CPU TBB Intrinsics+TBB OpenCL™-C OpenCL™ -C++ C++ AMP HSA Bolt

Pe

rform

an

ce

35.00

30.00

25.00

20.00

15.00

10.00

5.00

0 Copy-back

Algorithm

Launch

Copy

Compile

Init.

Copy-back

Algorithm

Launch

Copy

Compile

Copy-back

Algorithm

Launch

Algorithm

Launch

Algorithm

Launch

Algorithm

Launch

Algorithm

Launch

(Exemplary ISV “Hessian” Kernel)

AMD A10-5800K APU with Radeon™ HD Graphics – CPU: 4 cores, 3800MHz (4200MHz Turbo); GPU: AMD Radeon HD 7660D, 6 compute units, 800MHz; 4GB RAM.

Software – Windows 7 Professional SP1 (64-bit OS); AMD OpenCL™ 1.2 AMD-APP (937.2); Microsoft Visual Studio 11 Beta

Page 33: AFDS 2012 Phil Rogers Keynote: THE PROGRAMMER’S GUIDE TO A UNIVERSE OF POSSIBILITY

34 | The Programmer’s Guide to a Universe of Possibility | June 12, 2012

THE HSA FUTURE

Highly productive programmers

+ Scalable performance

+ Power efficiency

= AMAZING USER EXPERIENCES

Page 34: AFDS 2012 Phil Rogers Keynote: THE PROGRAMMER’S GUIDE TO A UNIVERSE OF POSSIBILITY
Page 35: AFDS 2012 Phil Rogers Keynote: THE PROGRAMMER’S GUIDE TO A UNIVERSE OF POSSIBILITY

ANNOUNCING…

THE HSA FOUNDATION PHILIP ROGERS, PRESIDENT

Page 36: AFDS 2012 Phil Rogers Keynote: THE PROGRAMMER’S GUIDE TO A UNIVERSE OF POSSIBILITY

THE HSA FOUNDATION: ACTIVITIES

Nonprofit, open standardization body for HSA platforms that will own the

development and evangelization of the architecture going forward

Make heterogeneous programming easy and a first-class pervasive

complement to CPU computing

Continue to increase the power efficiency of HSA, keeping it the platform of

choice from smartphones to the cloud

Bring to market strong development solutions (tools, libraries, OS runtimes)

to drive innovative advanced content and applications

Foster growth of heterogeneous computing talent through HSA developer

training and academic programs to drive both learning and innovation

© Copyright 2012 HSA Foundation. All Rights Reserved. 37

Page 37: AFDS 2012 Phil Rogers Keynote: THE PROGRAMMER’S GUIDE TO A UNIVERSE OF POSSIBILITY

AMD’S CONTRIBUTION TO DATE

HSA draft specifications

HSA Programmer Reference Manual

HSA Hardware System Architecture Specification

HSA Software System Architecture Specification

Open source execution stack and compiler technology

HSA Bolt library – standard template library

Initial funding for incorporation

© Copyright 2012 HSA Foundation. All Rights Reserved. 38

Page 38: AFDS 2012 Phil Rogers Keynote: THE PROGRAMMER’S GUIDE TO A UNIVERSE OF POSSIBILITY

FOUNDATION CATEGORIES OF MEMBERS

© Copyright 2012 HSA Foundation. All Rights Reserved. 39

Founder

Promoter

Supporter

Contributor

Academic

Associate

Page 39: AFDS 2012 Phil Rogers Keynote: THE PROGRAMMER’S GUIDE TO A UNIVERSE OF POSSIBILITY

HSA FOUNDATION INITIAL FOUNDERS

represented here by ,

ARM Fellow and VP of Technology, Media Processing

Page 40: AFDS 2012 Phil Rogers Keynote: THE PROGRAMMER’S GUIDE TO A UNIVERSE OF POSSIBILITY

BRINGING VISUAL

COMPUTING TO LIFE JEM DAVIES

ARM Fellow , VP of Technology

Media Processing Division, ARM

Page 41: AFDS 2012 Phil Rogers Keynote: THE PROGRAMMER’S GUIDE TO A UNIVERSE OF POSSIBILITY

42 | The Programmer’s Guide to a Universe of Possibility | June 12, 2012

ARM COMMITTED TO HETEROGENEOUS COMPUTING

Page 42: AFDS 2012 Phil Rogers Keynote: THE PROGRAMMER’S GUIDE TO A UNIVERSE OF POSSIBILITY

HSA FOUNDATION INITIAL FOUNDERS

represented here by ,

ARM Fellow and VP of Technology, Media Processing

represented here by ,

President, Imagination Technologies USA

represented here by ,

President, MediaTek USA, Inc.

represented here by ,

Director, Linux Development Center

represented here by ,

CVP, Heterogeneous Applications and Developer Solutions

Page 43: AFDS 2012 Phil Rogers Keynote: THE PROGRAMMER’S GUIDE TO A UNIVERSE OF POSSIBILITY

THE HSA FOUNDATION

© Copyright 2012 HSA Foundation. All Rights Reserved. 44

www.hsafoundation.com

Page 44: AFDS 2012 Phil Rogers Keynote: THE PROGRAMMER’S GUIDE TO A UNIVERSE OF POSSIBILITY
Page 45: AFDS 2012 Phil Rogers Keynote: THE PROGRAMMER’S GUIDE TO A UNIVERSE OF POSSIBILITY

46 | The Programmer’s Guide to a Universe of Possibility | June 12, 2012

Disclaimer & Attribution The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions

and typographical errors.

The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited

to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product

differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. There is no

obligation to update or otherwise correct or revise this information. However, we reserve the right to revise this information and to

make changes from time to time to the content hereof without obligation to notify any person of such revisions or changes.

NO REPRESENTATIONS OR WARRANTIES ARE MADE WITH RESPECT TO THE CONTENTS HEREOF AND NO

RESPONSIBILITY IS ASSUMED FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS

INFORMATION.

ALL IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE ARE EXPRESSLY

DISCLAIMED. IN NO EVENT WILL ANY LIABILITY TO ANY PERSON BE INCURRED FOR ANY DIRECT, INDIRECT, SPECIAL

OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF

EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

AMD, the AMD arrow logo, the HSA logo, and combinations thereof are trademarks of Advanced Micro Devices, Inc. OpenCL™ is

a trademark of Apple Corp. which is licensed to the Khronos Organization. All other names used in this presentation are for

informational purposes only and may be trademarks of their respective owners.

© 2012 Advanced Micro Devices, Inc.