qualcomm hexagon dsp: an architecture optimized for mobile multimedia and communications€¦ ·...

23
1 Qualcomm Technologies, Inc. All Rights Reserved Qualcomm Technologies, Inc. All Rights Reserved Qualcomm Hexagon DSP: An architecture optimized for mobile multimedia and communications Lucian Codrescu Sr. Director, Technology Qualcomm Technologies, Inc.

Upload: hoangnhu

Post on 06-May-2018

221 views

Category:

Documents


3 download

TRANSCRIPT

Page 1: Qualcomm Hexagon DSP: An architecture optimized for mobile multimedia and communications€¦ ·  · 2013-08-26optimized for mobile multimedia and communications Lucian Codrescu

1

Qualcomm Technologies, Inc. All Rights Reserved Qualcomm Technologies, Inc. All Rights Reserved

Qualcomm Hexagon DSP: An architecture optimized for mobile multimedia and communications

Lucian Codrescu Sr. Director, Technology Qualcomm Technologies, Inc.

Page 2: Qualcomm Hexagon DSP: An architecture optimized for mobile multimedia and communications€¦ ·  · 2013-08-26optimized for mobile multimedia and communications Lucian Codrescu

2

Qualcomm Technologies, Inc. All Rights Reserved

Camera

Display

JPEG

Video

Other

• aDSP: Real-time media & sensor processing

Hexagon™ DSP processors in Snapdragon products

Multimedia Fabric System Fabric

Krait

CPU Adreno

GPU

Krait

CPU

Krait

CPU

Krait

CPU

2MB L2

Misc.

Connectivity

Modem

Snapdragon 800

Fabric & Memory Controller

LPDDR3 LPDDR3

Hexagon

aDSP

Hexagon

mDSP

• mDSP: Dedicated modem processing

Audio

Sensors

Page 3: Qualcomm Hexagon DSP: An architecture optimized for mobile multimedia and communications€¦ ·  · 2013-08-26optimized for mobile multimedia and communications Lucian Codrescu

3

Qualcomm Technologies, Inc. All Rights Reserved

Expansion of Hexagon DSP use cases beyond audio

Image Enhancement Camera, Still, Video HexagonV4 based products

Video HexagonV5 based products

Sensors HexagonV5 based products

Computer Vision & Augmented Reality HexagonV4 based products

HexagonV2/V3

Voice Audio

Hexagon DSP is evolving for use beyond voice and audio to

computer vision, video and imaging features

Page 4: Qualcomm Hexagon DSP: An architecture optimized for mobile multimedia and communications€¦ ·  · 2013-08-26optimized for mobile multimedia and communications Lucian Codrescu

4

Qualcomm Technologies, Inc. All Rights Reserved

The Hexagon DSP evolution

Generational improvements in performance and power efficiency driven by both architecture and implementation

V4L 28nm

Apr 2011

V3M 45nm

June 2009

V2 65nm

Dec 2007

V3C 45nm Aug

2009

V3L 45nm Nov

2009

V4M 28nm

Dec 2010

V4C 28nm

Dec 2010

V5A 28nm

Dec 2012

V1 65nm

Oct 2006

Time

V5H 28nm

Dec 2012

Page 5: Qualcomm Hexagon DSP: An architecture optimized for mobile multimedia and communications€¦ ·  · 2013-08-26optimized for mobile multimedia and communications Lucian Codrescu

5

Qualcomm Technologies, Inc. All Rights Reserved

Requirements

• Require fixed real-time performance level (fps, Mbit/sec, etc.)

• Extremely aggressive power & area targets

Key characteristics of modem & multimedia applications

Characteristics

• Mix of signal processing & control code

− For modem, Qualcomm does not use a split CPU/DSP architecture. All processing is done on Hexagon DSP

− Multimedia apps have significant control in the RTOS & frameworks

• Heavy L2$ misses

− Multimedia is data intensive

− Modem is code intensive

Page 6: Qualcomm Hexagon DSP: An architecture optimized for mobile multimedia and communications€¦ ·  · 2013-08-26optimized for mobile multimedia and communications Lucian Codrescu

6

Qualcomm Technologies, Inc. All Rights Reserved

Hexagon DSP blends features targeted to modem & multimedia

Hexagon

DSP

VLIW • Need multi-issue to

meet performance

• Low complexity for

Area & Power

Innovate in ISA to

maximize IPC • More work/VLIW packet

reduces energy/instruction

• Keep the pipelines full for

MIPS/mm2

• Target both Signal

Processing & Control code

Multi-Threading • To reduce L2$ miss

penalty without the need

for a large L2

• Increases

instructions/VLIW packet

because compiler doesn’t

need to schedule latency

Page 7: Qualcomm Hexagon DSP: An architecture optimized for mobile multimedia and communications€¦ ·  · 2013-08-26optimized for mobile multimedia and communications Lucian Codrescu

7

Qualcomm Technologies, Inc. All Rights Reserved

Instruction Unit

VLIW: Area & power efficient multi-issue

Data Unit

(Load/

Store/

ALU)

Data Unit

(Load/

Store/

ALU)

Execution

Unit

(64-bit

Vector)

Execution

Unit

(64-bit

Vector)

Data Cache

L2

Cache

/ TCM

Instruction

Cache

• Dual 64-bit

load/store

units

• Also 32-bit

ALU

Variable sized

instruction packets

(1 to 4 instructions

per Packet)

• Dual 64-bit execution units

• Standard 8/16/32/64bit data

types

• SIMD vectorized MPY / ALU

/ SHIFT, Permute, BitOps

• Up to 8 16b MAC/cycle

• 2 SP FMA/cycle

Register File Register File

Register File/Thread

• Unified 32x32bit

General Register

File is best for

compiler.

• No separate Address

or Accum Regs

• Per-Thread

Device DDR

Memory

Page 8: Qualcomm Hexagon DSP: An architecture optimized for mobile multimedia and communications€¦ ·  · 2013-08-26optimized for mobile multimedia and communications Lucian Codrescu

8

Qualcomm Technologies, Inc. All Rights Reserved

Maximizing the signal processing code work/packet

Example from inner loop of FFT: Executing 29 “simple RISC ops” in 1 cycle

Rs

Add

I R

Rt

*

32

<<0-1

*

32

<<0-1

Rd

I R

Add

I R

*

32

<<0-1

*

32

<<0-1

I R

Rs

Rt

-0x80000x8000

Sat_32 Sat_32

High 16bitsHigh 16bits

I R

+ + + +

{ R17:16 = MEMD(R0++M1)

MEMD(R6++M1) = R25:24

R20 = CMPY(R20, R8):<<1:rnd:sat

R11:10 = VADDH(R11:10, R13:12)

}:endloop0

Complex multiply with

round and saturation

Vector 4x16-bit Add

64-bit Load and

Zero-overhead loops • Dec count

• Compare

• Jump top

64-bit Store with post-update addressing

Page 9: Qualcomm Hexagon DSP: An architecture optimized for mobile multimedia and communications€¦ ·  · 2013-08-26optimized for mobile multimedia and communications Lucian Codrescu

9

Qualcomm Technologies, Inc. All Rights Reserved

Maximizing the control code work/packet

Hexagon DSP ISA improves control code efficiency over traditional VLIW

Example C code

void example(int *ptr, int val) {

if (ptr!=0) {

*ptr = *ptr + val + 2;

}}

p0 = cmp.eq(r0,#0)

{

if (!p0) r2=memw(r0)

if (p0) jumpr:nt r31

}

r2 = add(r2,#2)

r1 = add(r1,r2)

{

memw(r0) = r1

jumpr r31

}

Instr/Packet = 7 instr/5 packets = 1.4

Tradional VLIW

Assembly Code

③ ④

{

p0 = cmp.eq (r0,#0)

if (!p0.new) r2=memw(r0)

if (p0.new) jumpr:nt r31

}

r2 = add(r2,#2)

r1 = add(r1,r2)

{

memw(r0) = r1

jumpr r31

}

Hexagon DSP:

Dot-New Predication

② ③

{

p0 = cmp.eq(r0,#0)

if (!p0.new) r2=memw(r0)

if (p0.new) jumpr:nt r31

}

r1 = add(r1,add(r2,#2))

{

memw(r0) = r1

jumpr r31

}

Hexagon DSP:

Compound ALU

{

p0 = cmp.eq(r0,#0)

if (!p0.new) r2=memw(r0)

if (p0.new) jumpr:nt r31

}

{

r1 = add(r1,add(r2,#2))

memw(r0) = r1.new

jumpr r31

}

Hexagon DSP:

New-Value Store

Instr/Packet =

7 instr/2packets = 3.5

Page 10: Qualcomm Hexagon DSP: An architecture optimized for mobile multimedia and communications€¦ ·  · 2013-08-26optimized for mobile multimedia and communications Lucian Codrescu

10

Qualcomm Technologies, Inc. All Rights Reserved

High avg. instructions/packet for targeted use cases

Compound instructions count as 2

Audio Computer

Vision Video Imaging Control

0

0.5

1

1.5

2

2.5

3

3.5

4

4.5

5

Av

era

ge

In

str

uc

tio

ns /

VL

IW P

ac

ke

t

Source: Qualcomm internal measurements

Page 11: Qualcomm Hexagon DSP: An architecture optimized for mobile multimedia and communications€¦ ·  · 2013-08-26optimized for mobile multimedia and communications Lucian Codrescu

11

Qualcomm Technologies, Inc. All Rights Reserved

• Hexagon V5 includes three hardware threads

• Architected to look like a multi-core with communication through shared memory

Programmer’s view of Hexagon DSP HW multi-threading

Thread 0

D

U

X

U

Shared Data Cache

L2

Cache /

TCM

Register File

D

U

X

U

Thread 1

D

U

X

U

Register File

D

U

X

U

Thread 2

D

U

X

U

Register File

D

U

X

U

Shared Instruction Cache

Page 12: Qualcomm Hexagon DSP: An architecture optimized for mobile multimedia and communications€¦ ·  · 2013-08-26optimized for mobile multimedia and communications Lucian Codrescu

12

Qualcomm Technologies, Inc. All Rights Reserved

• Number of threads match execution pipe depth (three threads three execute stages)

• All instructions complete before next packet dispatch

• Compiler schedules for zero-latency which helps to increase instructions/VLIW packet

Hexagon DSP V1-V4: Interleaved multi-threading

Simple round-robin thread scheduling

Thread 0 Dispatch

T0: { Ld Ld Add Cmp }

Thread 1 Dispatch

T1: { St Ld Mpy Add }

T0: { Ld Ld Add Cmp }

Thread 2 Dispatch

T2: { Ld Add Jump }

T1: { St Ld Mpy Add }

T0: { Ld Ld Add Cmp }

Page 13: Qualcomm Hexagon DSP: An architecture optimized for mobile multimedia and communications€¦ ·  · 2013-08-26optimized for mobile multimedia and communications Lucian Codrescu

13

Qualcomm Technologies, Inc. All Rights Reserved

• Remove a thread from IMT rotation

− On L2 cache misses

− When in wait-for-interrupt or off mode

• Additional forwarding to support 2-cycle packets

• VLIW packets with dependencies between long latency instructions will stall

− But many VLIW packets with simple instructions can complete in 2 processor clocks

Hexagon DSP V5: Dynamic HW multi-threading

Recover some performance when threads idle or stalled

0

0.5

1

1.5

2

2.5

3

3.5

4

4.5

5

Dhrystone DMIPS/MHz

IMT DMT

0

1

2

3

4

5

6

7

8

Coremarks/MHz

IMT DMT

Source: Qualcomm internal measurements

Page 14: Qualcomm Hexagon DSP: An architecture optimized for mobile multimedia and communications€¦ ·  · 2013-08-26optimized for mobile multimedia and communications Lucian Codrescu

14

Qualcomm Technologies, Inc. All Rights Reserved

0

0.5

1

1.5

2

2.5

3

3.5

4

4.5

5

Avera

ge I

nstr

ucti

on

s /

Cycle

IPC_DMT

IPC_IMT

Hexagon DSP instructions per cycle

Single-Threaded Apps

Multi-Threaded Apps

Source: Qualcomm internal measurements

Page 15: Qualcomm Hexagon DSP: An architecture optimized for mobile multimedia and communications€¦ ·  · 2013-08-26optimized for mobile multimedia and communications Lucian Codrescu

15

Qualcomm Technologies, Inc. All Rights Reserved

Clock Rate (MHz) 430-520 100-267 300-800

DSP Performance (BDTImark2000) 4730-5720 1810-4840 5430-14520*

* - Projected best case score for 3-threads

Source: BDTI - For more detailed information see www.BDTI.com. All scores ©2013 BDTI

0

2

4

6

8

10

12

14

16

18

20

Mobile Competitor Qualcomm Hexagon V5(1 thread)

Qualcomm Hexagon V5(3 threads)

DS

P P

erf

orm

an

ce p

er

MH

z

BDTIm

ark2000™/MHz

Hexagon DSP V5: Efficient Architecture

Highly efficient mobile application processor — designed for more performance per MHz

Page 16: Qualcomm Hexagon DSP: An architecture optimized for mobile multimedia and communications€¦ ·  · 2013-08-26optimized for mobile multimedia and communications Lucian Codrescu

16

Qualcomm Technologies, Inc. All Rights Reserved

Hexagon DSP Power Benefits

Page 17: Qualcomm Hexagon DSP: An architecture optimized for mobile multimedia and communications€¦ ·  · 2013-08-26optimized for mobile multimedia and communications Lucian Codrescu

17

Qualcomm Technologies, Inc. All Rights Reserved

MP3 playback power for competitive smartphones

Competitor A Qualcomm /Hexagon-based

Competitor B Competitor C Competitor D Competitor E Competitor F Competitor G

• Power measured at the battery for various phones

• Includes everything: DSP, CPU, memory, analog components, etc

Power

Lo

wer

is b

ett

er

Source: Qualcomm internal measurements

Page 18: Qualcomm Hexagon DSP: An architecture optimized for mobile multimedia and communications€¦ ·  · 2013-08-26optimized for mobile multimedia and communications Lucian Codrescu

18

Qualcomm Technologies, Inc. All Rights Reserved

Augmented Reality Java Application

Call Feature Detect

ARM Only FastCV Library

Feature Detect

Function

App CPU

FastCV Call Router

Hexagon (QDSP6) FastCV Library

Feature Detect

Function

App DSPVeNum

ARM/VeNum FastCV Library

Feature Detect

Function

32% Less Power* 7% Less Time 52% Less CPU

Augmented Reality Java App finding objects in

image using FastCV Feature Detect

Comparison of Feature Detect run on:

• App CPU (ARM/Neon)

• App DSP (Hexagon)

Computer vision offload – ARM/neon to Hexagon DSP

Detection Time (%)

Total Device Power (%)

CPU Utilization (%)

Source: Qualcomm internal measurements. * Power measured at the device battery

Page 19: Qualcomm Hexagon DSP: An architecture optimized for mobile multimedia and communications€¦ ·  · 2013-08-26optimized for mobile multimedia and communications Lucian Codrescu

19

Qualcomm Technologies, Inc. All Rights Reserved

• Excellent near-linear power scalability (as threads go idle, power used by the thread is nearly eliminated)

• Achieved through optimized clock tree design & clock gating

Hexagon DSP power for different thread utilizations

0%

10%

20%

30%

40%

50%

60%

70%

80%

90%

100%

Dhrystone Power, IMT Mode

Actual

Ideal

0%

10%

20%

30%

40%

50%

60%

70%

80%

90%

100%

FIR Power, IMT Mode

Actual

Ideal

Source: Qualcomm internal measurements

Page 20: Qualcomm Hexagon DSP: An architecture optimized for mobile multimedia and communications€¦ ·  · 2013-08-26optimized for mobile multimedia and communications Lucian Codrescu

20

Qualcomm Technologies, Inc. All Rights Reserved

Hexagon DSP Software Development

Page 22: Qualcomm Hexagon DSP: An architecture optimized for mobile multimedia and communications€¦ ·  · 2013-08-26optimized for mobile multimedia and communications Lucian Codrescu

22

Qualcomm Technologies, Inc. All Rights Reserved

Announcing the Hexagon DSP SDK See the Hexagon DSP SDK in action at Uplinq2013 (www.uplinq.com)

Visit http://developer.qualcomm.com for more information.

Page 23: Qualcomm Hexagon DSP: An architecture optimized for mobile multimedia and communications€¦ ·  · 2013-08-26optimized for mobile multimedia and communications Lucian Codrescu

23

Qualcomm Technologies, Inc. All Rights Reserved

For more information on Qualcomm, visit us at: www.qualcomm.com & www.qualcomm.com/blog

©2013 Qualcomm Technologies, Inc. Qualcomm and Hexagon are trademarks of QUALCOMM Incorporated, registered in the United States and other countries. All QUALCOMM Incorporated trademarks are used with permission. Other product and brand names may be trademarks or registered trademarks of their respective owners. Hexagon is a product of Qualcomm Technologies, Inc.

Thank you

Follow us on: