Ultra Low Power Multimedia Processor for Mobile Application

ESSCIRC 2002, Firenze, September 26th

Authors: M. Mancuso, D. Alfonso, A. Artieri, A. Capra, F. Pappalardo, F. Rovati and R. Zafalon

STMicroelectronics

s1-1


Why design low-power circuits/systems?

• Practical reasons:
  - Reducing the power requirements of high-throughput portable applications.
• Financial reasons:
  - Reducing packaging costs and achieving energy savings.
• Technological reasons:
  - Enabling the realization of high-density chips (heat poses serious limitations to circuit complexity and functionality).


Why low-power (II)

• Driving forces:
  - Advent of deep sub-micron technologies.
  - Increasing market share of mobile applications.
  - Limitations of battery technology.


Power Dissipation per Logic Function

[Figure: transistors/cm2 including SRAM (log scale, 100000 to 1000000000) and power scaling in % (100% at 5 V; 0 to 120) versus year, 1986 to 2006. Series: Moore's law, Power Scaling.]

• Power per logic function scales down much more slowly than integration density grows.


Maximum Power Trend (Source: ITRS 2001)

[Figure: high-performance desktop maximum power [W] (70 to 180) and portable maximum power [W] (1.0 to 4.0) versus year, 1999 to 2006. Series: Desktop, Portable.]

• High-performance desktop vs. portable units
• Power will be limited more by system-level cooling and test constraints than by packaging


… Don’t forget Battery Technology

• Battery maximum power and capacity increase:
  - 10-15% per year
• Chip power requirements increase much faster:
  - 35-40% per year (ITRS 2001)

Consequence:

• A widening gap between
  - battery technology capability, and
  - chip power demand
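The size of this gap can be estimated with simple compounding. The sketch below assumes that both quantities grow at a constant yearly rate and are normalized to 1 in the starting year; the function name `demand_to_supply_ratio` is illustrative and does not come from the slides.

```python
def demand_to_supply_ratio(years, battery_growth, chip_growth):
    """Chip power demand divided by battery capability after `years`.

    Both quantities are normalized to 1.0 at year 0 and are assumed to
    compound at constant yearly rates (an assumption made for illustration).
    """
    return ((1.0 + chip_growth) / (1.0 + battery_growth)) ** years


# Narrowest gap quoted on the slide: battery +15 %/year, chip +35 %/year.
narrow = demand_to_supply_ratio(5, 0.15, 0.35)
# Widest gap quoted on the slide: battery +10 %/year, chip +40 %/year.
wide = demand_to_supply_ratio(5, 0.10, 0.40)
print(f"after 5 years: {narrow:.2f}x to {wide:.2f}x")
```

Under these assumptions the ratio of chip demand to battery capability grows by roughly 2.2x to 3.3x over five years.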


What’s Next?

• CMOS technology evolution has provided a straightforward path to reducing basic power consumption without remarkable design effort
  - Moore’s law still keeps going (at least until 2010)
  - The technology "brute force" approach has been easier, faster and more affordable for designers
• Power is really dictating the limit to super-integration
  - High-performance uP: dissipation will exceed the package limit by 25X in 15 years (Source: ITRS Roadmap, Update 2001)


The Algorithmic Driving Force

Shannon asks for more than Moore can deliver...

[Figure: log-scale chart (1 to 10000000) versus year, 1980 to 2020. Series: Algorithmic Complexity (Shannon’s Law), Processor Performance (Moore’s Law), Battery Capacity. Annotations: 1G, 2G, 3G.]


Power Density is Close to ... a Nuclear Reactor

• Need to design for high performance AND low power at all levels: from circuits to micro-architecture and software

Courtesy of Fred Pollack, Intel. Keynote speech, MICRO-32

P4 @ 1.4GHz, 75W


Low Power System Methodology should span the whole range

• Embedded processor architecture and micro-architecture
• RT-OS run-time power management & dynamic voltage scaling
• Network-centric power management
  - Power is primarily determined by communication
  - Battery management is key to extending battery life
• Memory hierarchy optimization / SW compilers
• Lossless code/data compression for memory access energy saving


Opportunities for Power Reduction

Abstraction levels and their power-reduction techniques:

• System Level: algorithms, HW/SW trade-offs, avoiding waste during SW compilation
• Behavioral Level: scheduling, allocation, resource sharing & retiming
• RT Level: clock gating, operand isolation, precomputation & FSM encoding, voltage scaling
• Gate / Logic Level: technology mapping, low-power library, rewiring, phase assignment & de-glitching, minimizing TR x CAPA
• Physical Level: optimizing circuit/layout (buffering & transistor sizing, clock tree)
• Device Level: multiple/optimum Vt, triple well, SOI

Power-saving potential, largest to smallest: 10X, 40-90%, 30-50%, 20-30%, 10-20%, 5-10%

Runtime requirements, longest to shortest: Week, Days, Day, Hours, Hour, Minutes


Market Trends for New Mobile Applications

[Figure: market split (0% to 100%) per year, 2002 to 2006. Segments: Voice (2G); Voice & Data (Camera, MP3 Player) (2.5G); Mobile Multimedia (Streaming, Videoconference, ...) (3G). Annotations: "Data traffic exceeds voice traffic"; "Mobile internet, global convergence".]

Sources: Handsets: IDC (Q1/02) / PDA: Gartner Dataquest (Q3/01)


NEW MOBILE MULTIMEDIA SERVICES REQUIRE CONVERGENCE

• Low Power
• Real Time Communication
• Interoperability
• Real Time Audio/Video
• CMOS Sensor & Imaging
• SW Middleware
• Low Cost
• Graphics
• Internet Access
• OS
• Storage


Application Environment Definition

The application is real-time video capture and encoding.

[Diagram labels: Multimedia Processor (shown twice), Imager]


System Level Power Reduction: Examples

[Block diagram labels:]

• Bus Encoding
• CMOS Sensor
• Scaling
• Display Process.: display-dependent processing ...
• Audio Interface: DDX, IntelliMic ...
• Embedded Mem
• Image Input Pipeline
• Communication Periph.: Bluetooth, IrDA, USB, GPS ...
• Ext. Mem IF: Flash, SDRAM
• Application Specific Flash Periph.
• Smartcard
• Security
• uP: STD OS support, Middleware, JAVA, API for MM, Streaming ...
• MEDIA ACCELERATION: Motion Estim, Frame Compress


Image Acquisition System


Bus Encoding

• Main features:
  - Use of the correlation between data to improve the coding
  - Bus Inverter (BI) encoder
  - Narrow bus width to amplify the effects of the BI encoder
• Results:
  - Up to 63% less switching activity (compared with a normal 10-bit bus)

[Diagram labels: Sensor (Bayer pattern data, RGRG... / BGBG...), Encoder, Decoder, Image Processing Unit]
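A minimal sketch of bus-invert coding may help to show why an extra inverter line reduces switching activity. It is a generic, simplified form of the technique and not the ST encoder: the inversion decision looks only at the data lines, the bus is assumed to start at zero, and all function names are illustrative.

```python
def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def bus_invert_encode(words, width):
    """Return (bus_value, invert_flag) pairs for a sequence of data words.

    Simplification: the decision compares only the data lines and ignores
    the state of the invert line itself.
    """
    mask = (1 << width) - 1
    prev = 0  # assumed initial state of the bus lines
    encoded = []
    for word in words:
        word &= mask
        # Send the complement when more than half of the lines would toggle.
        invert = hamming(prev, word) > width // 2
        bus_value = (~word & mask) if invert else word
        encoded.append((bus_value, int(invert)))
        prev = bus_value
    return encoded


def bus_invert_decode(encoded, width):
    """Undo the inversion wherever the invert flag is set."""
    mask = (1 << width) - 1
    return [(~value & mask) if flag else value for value, flag in encoded]


def count_transitions(values, start=0):
    """Total number of line toggles when `values` are sent one after another."""
    total, prev = 0, start
    for value in values:
        total += hamming(prev, value)
        prev = value
    return total


WIDTH = 10
words = [0b0000000000, 0b1111111111, 0b0000000000]
encoded = bus_invert_encode(words, WIDTH)
# Pack the invert flag as an extra line so that its toggles are counted too.
packed = [(flag << WIDTH) | value for value, flag in encoded]
print(count_transitions(words), count_transitions(packed))
```

For this worst-case pattern the plain bus toggles 20 times, while the encoded bus (data lines plus inverter line) toggles twice. Real image data gives smaller gains, which is why the slides combine the encoder with data correlation and a narrower bus.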


Bus Encoding: Most Relevant Features

• Tracing communication overheads between the architectural modules: switching activity minimization

$$P_{Bus} = \sum_i \tfrac{1}{2}\,\alpha_i\,C_i\,V^2 f_{clk}$$

• Low-power encoding/decoding with no speed degradation
• Dynamic software profiling during execution
• Evaluation metrics to characterize data streams
• Identification of the best encoding for the target application
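The bus power expression can be evaluated directly. The sketch below reads the symbols in the usual way, which the slide does not spell out: alpha_i is the switching activity of line i, C_i its capacitance, V the supply voltage and f_clk the clock frequency. The bus parameters are hypothetical values chosen only for illustration.

```python
def bus_power(alphas, caps, vdd, f_clk):
    """Dynamic bus power in watts: sum over lines of 1/2 * alpha * C * V^2 * f."""
    if len(alphas) != len(caps):
        raise ValueError("one activity and one capacitance per bus line")
    return sum(0.5 * a * c * vdd ** 2 * f_clk for a, c in zip(alphas, caps))


# Hypothetical 10-line bus: 50 % activity, 5 pF per line, 1.8 V, 20 MHz.
baseline = bus_power([0.5] * 10, [5e-12] * 10, 1.8, 20e6)
# Same bus with 63 % less switching activity on every line.
encoded = bus_power([0.5 * (1 - 0.63)] * 10, [5e-12] * 10, 1.8, 20e6)
print(f"{baseline * 1e3:.3f} mW -> {encoded * 1e3:.3f} mW")
```

Because the expression is linear in the switching activity, any reduction in activity translates into the same relative reduction in dynamic bus power under this model. The model ignores the power spent in the encoder and decoder themselves.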


Bus Encoding Technique Overview

• Redundant Encoding:
  - T0 Code
  - Bus-Invert Code
  - Memory-Less Adaptive Partial Businverter Code
  - Memory-Adaptive Partial Businverter Code
  - T0-Xor-OffSet Code
• Irredundant Encoding:
  - T0-Xor Code
  - OffSet Code
  - OffSet-Xor Code


Bus Encoding: Software Execution Profiling

• Power Tracer Tool:
  - Trace the transition activity of system-level buses during the execution of benchmark programs.
  - Analyze bus traces in terms of evaluation metrics.
  - Implement bus encoding techniques.
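As an illustration of the kind of metric such a profiling step can extract, the sketch below computes per-line switching activity and the mean Hamming distance from a recorded bus trace. It is not the Power Tracer tool itself; the function names and the tiny trace are hypothetical.

```python
def line_activity(trace, width):
    """Fraction of transfers in which each bus line toggles (bit 0 first)."""
    toggles = [0] * width
    for prev, cur in zip(trace, trace[1:]):
        diff = prev ^ cur
        for i in range(width):
            toggles[i] += (diff >> i) & 1
    transfers = len(trace) - 1
    if transfers <= 0:
        return [0.0] * width
    return [t / transfers for t in toggles]


def mean_hamming(trace):
    """Average number of lines that toggle per transfer."""
    transfers = len(trace) - 1
    if transfers <= 0:
        return 0.0
    total = sum(bin(p ^ c).count("1") for p, c in zip(trace, trace[1:]))
    return total / transfers


# Hypothetical 2-bit trace: 00 -> 01 -> 11 -> 10.
trace = [0b00, 0b01, 0b11, 0b10]
print(line_activity(trace, 2), mean_hamming(trace))
```

The per-line activities are the alpha values that enter a bus power estimate, so comparing them before and after a candidate encoding gives a quick first ranking of encodings for a given benchmark.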


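Bus Encoding: Data Reordering

[Figure: classical approach versus proposed approach, comparing the Hamming distance between successive transfers. Panel labels: Neighborhood, Levels, Decoder. Caption: local levels distribution at low frequencies.]

Numbering shown for the sensor (10 bits per pixel):

16 15 14 13 12 11 10  9
 8  7  6  5  4  3  2  1

Numbering shown for the bus (5 bits per pixel + inverter line):

24 20 23 19 22 18 21 17
32 28 31 27 30 26 29 25
 8  4  7  3  6  2  5  1
16 12 15 11 14 10 13  9

[Equation residue: decoder output out(t) as a function of the bus value b(t) and the inverter line.]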


Bus Encoding: Results

SWITCHING ACTIVITY REDUCTION

BUFFER SIZE    BUS 10 BITS    BUS 5 BITS
raster mode    7.70%          16.95%
2 pixels       21.16%         31.63%
4 pixels       27.40%         40.12%
8 pixels       30.05%         44.72%
1 line         33.30%         63.73%


SCALING: Where? (1/2)

The quality of stills requires sensors with higher resolution than video. Consequently, the sensor and the IGP work at maximum resolution (VGA, for example) even if the video has a lower resolution (QCIF, for example). Scaling algorithms can therefore play a key role in power consumption.

[Diagram: VGA (for example) at 15 frames/sec → Col. Processing → VGA at 15 frames/sec → Scaling → QCIF at 15 frames/sec]


SCALING: Where? (2/2)

Un-optimized version (Image Generation Unit: Sensor → Image Generation Pipeline → Scaling)

Stage output                 Format               Bit depth    Band
Sensor                       VGA Bayer            10 bits      5.6 MB/sec
Image Generation Pipeline    VGA RGB              8 bits       13.2 MB/sec
Scaling                      QCIF RGB             8 bits       1.1 MB/sec

Optimized version (Image Generation Unit: Sensor → Scaling → Image Generation Pipeline)

Stage output                 Format               Bit depth    Band
Sensor                       VGA 15 fps Bayer     10 bits      5.6 MB/sec
Scaling                      QCIF 15 fps Bayer    10 bits      0.45 MB/sec
Image Generation Pipeline    QCIF 15 fps RGB      8 bits       1.1 MB/sec

In the optimized version, the IGP performs fewer operations per second and the band is reduced.
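The band figures on this slide can be approximately reproduced from frame size, frame rate and bits per pixel. The sketch below rests on assumptions not stated on the slide: VGA is 640x480 and QCIF is 176x144 (the standard sizes), RGB at 8 bits means 8 bits for each of three channels, Bayer data carries one 10-bit sample per pixel, and "MB" means 2^20 bytes.

```python
VGA = (640, 480)
QCIF = (176, 144)
MIB = 1024 * 1024  # assumption: the slide's "MB" is 2**20 bytes


def band_mb_per_s(resolution, fps, bits_per_pixel):
    """Raw data rate of an uncompressed pixel stream, in MB/s."""
    width, height = resolution
    return width * height * fps * bits_per_pixel / 8 / MIB


FPS = 15
stages = {
    "VGA Bayer, 10 bits": band_mb_per_s(VGA, FPS, 10),
    "VGA RGB, 3 x 8 bits": band_mb_per_s(VGA, FPS, 24),
    "QCIF Bayer, 10 bits": band_mb_per_s(QCIF, FPS, 10),
    "QCIF RGB, 3 x 8 bits": band_mb_per_s(QCIF, FPS, 24),
}
for name, band in stages.items():
    print(f"{name}: {band:.2f} MB/s")
```

Under these assumptions the four streams come out at about 5.49, 13.18, 0.45 and 1.09 MB/s, close to the 5.6, 13.2, 0.45 and 1.1 MB/sec shown on the slide. Moving the scaler ahead of the pipeline removes the 13.2 MB/sec VGA RGB stream altogether.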


SCALING: Results

Scaling a VGA Image with Bi-cubic Alg.

Scaling a VGA Image with Polyphase Alg.

Scaling a Bayer Pattern VGA with ST proprietary Algorithm


Motion Estimation

Motion Estimation plays a critical role in video encoding.

[Pie charts: share of MIPS and share of memory bandwidth (Mem. BW) taken by Motion Estimation versus Other.]

The ST solution (SLIMPEG) offers the following advantages:

• Low power
• Picture quality & true motion
• Low bandwidth
• Low complexity
• Search window independence

[Diagram: the SLIMPEG motion vector field building process, connected to the following functions:]

• Motion estimation vectors
• Motion compensated noise reduction
• Scene cut detection
• 3:2 pulldown detection
• Concealment motion vectors
• Interlaced / progressive detection
• Adaptive search windows for unconstrained search


Motion Estimation: Complexity

• Algorithms compared: SLIMPEG, Three Step, Densely Centered Uniform P-Search [*], Fast Search (already included in the standard, based on a heuristic search) and Full Search.
• Figures are the number of matchings per QCIF frame; values take border effects into account.
• SLIMPEG shows the lowest, constant complexity: 99% gain vs. Full Search.

Sequence        SLIMPEG    3 step hierarch.    Fast Search    D.C.U.P-S    Full Search
Foreman         1,247      2,707               4,998          16,927       78,231
Coastguard      1,247      2,692               4,356          16,927       78,231
Miss America    1,247      2,655               2,665          16,927       78,231

[*] B. Furht, J. Greenberg, R. Westwater, "Motion Estimation Algorithms for Video Compression", Kluwer Academic Publishers, 1997


Motion Estimation: Stable Complexity

Low and fixed number of operations per macroblock

[Figure: matchings per macroblock (log scale, 1.E+00 to 1.E+05) vs. search window size (0 to 128). Series: Full search, Logarithmic, Proposed.]
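The growth of the reference curves with window size can be sketched with textbook counting formulas. The block below is a generic model and not the data behind the chart: it assumes a search range of +/-p pixels, ignores border effects, and models the logarithmic search as an N-step search with eight new candidates per step. The proposed method is not modelled, since the slides only state that its cost is low and fixed.

```python
def full_search_matchings(p):
    """Candidate positions in a +/-p pixel window: (2p + 1)^2."""
    return (2 * p + 1) ** 2


def logarithmic_search_matchings(p):
    """Centre point plus 8 neighbours per step, step size halving each time.

    The number of steps is ceil(log2(p + 1)), which for integers p >= 1
    equals p.bit_length().
    """
    return 1 + 8 * p.bit_length()


for p in (7, 15, 31, 63):
    print(p, full_search_matchings(p), logarithmic_search_matchings(p))
```

Full search grows quadratically with the range while the step-based search grows only logarithmically. A method with a fixed number of matchings per macroblock stays flat regardless of window size.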


Motion Estimation: Quality Achieved

[Figure: Y PSNR [dB] difference (-2.00 to 2.00) per test sequence: carphone, children, foreman, monitor, missa, mother, news, renata, silent, teeny, Average. Series: Gain over Full Search, Gain over PMVFAST. Conditions: QCIF, 64 kbit/s, 15 fps.]


Motion Estimation: Quality Achieved (2)

Comparison between Full Search and the proposed method against scene changes

[Figure: Y PSNR [dB] (30 to 36) over the sequence (horizontal axis 1 to 251). Series: Full Search, Proposed.]


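Frame Buffer Compression

[Block diagram: predictive Coder C and Decoder D built from a quantizer Q, an inverse quantizer Q^-1 and a predictor P. Signals: input x_n, prediction x̂_n, prediction error e_n, quantized prediction error ê_n.]

• Fixed compression ratio of 50%
• No mismatch between encoder and decoder
• Quality drop well masked by MPEG quantization noise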
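A minimal predictive (DPCM-style) codec illustrates how a fixed 50% ratio and a mismatch-free decoder can be obtained. It is a generic sketch and not the ST scheme: the previous-sample predictor, the uniform quantizer, the step size and the 8-bit to 4-bit mapping are all assumptions chosen for illustration.

```python
STEP = 16              # hypothetical quantizer step
Q_MIN, Q_MAX = -8, 7   # 4-bit signed code range: 8-bit samples become 4-bit codes
START = 128            # initial prediction shared by coder and decoder


def _reconstruct(pred, code):
    """Decoder-side update, also used inside the coder to avoid mismatch."""
    return max(0, min(255, pred + code * STEP))


def encode(samples):
    """Quantize the prediction error of each 8-bit sample to a 4-bit code."""
    codes, pred = [], START
    for x in samples:
        error = x - pred                       # prediction error e_n
        code = (error + STEP // 2) // STEP     # round to the nearest level
        code = max(Q_MIN, min(Q_MAX, code))    # clamp to the 4-bit range
        codes.append(code)
        pred = _reconstruct(pred, code)        # track what the decoder will see
    return codes


def decode(codes):
    """Rebuild the samples from the 4-bit codes."""
    out, pred = [], START
    for code in codes:
        pred = _reconstruct(pred, code)
        out.append(pred)
    return out


samples = [128, 130, 140, 135, 120, 110]
codes = encode(samples)
print(codes, decode(codes))
```

The coder predicts from its own reconstructed values rather than from the original samples, so coder and decoder stay in step and the reconstruction error does not accumulate. Every sample costs exactly 4 bits, which gives the fixed ratio.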


50% Bandwidth Saving

[Figure: mean bandwidth [MB/s] (0.0 to 1.2) per test sequence: carphone, children, foreman, monitor, missa, mother, news, renata, silent, teeny. Legend entry recovered: Full Search Bandwidth.]

Average bandwidth of the proposed solution versus the Full Search, without and with memory compression


Minimal Quality Loss

[Images: without compression, with compression, difference]


VIDEO CODECS: An Optimized Implementation

• Mix of host SW, FW and HW
• Low operating frequency:
  - QCIF codec, 15 Hz: 3 MHz
  - VGA encode, 30 Hz: 40 MHz
• Ultra low power (0.13 um, ULL, 0.9 V):
  - QCIF 15 Hz decode: <1 mW
  - QCIF 15 Hz encode: 3 mW
  - VGA 30 Hz encode: 40 mW

[Block diagram labels: Host, HW Acc., AHB (shown twice), System Memory, Sensor, IT]


Required MIPS on ARM

• Codec drivers run on the ARM CPU.
• QCIF 15 Hz video decoder requirements:
  - Video IT routine: 0.06 MIPS
  - Video decode driver: 0.13 MIPS
  - Video display driver: 0.03 MIPS
• Only 0.2 MIPS required for video codec control (0.07% of the ARM CPU)
• Note:
  - 1 VAX MIPS is equivalent to 1757 Dhrystone/s; ARM9 @ 264 MHz: 290 MIPS
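The CPU share quoted on this slide follows from a short calculation. The sketch below uses only the figures given on the slide; the dictionary keys and the function name `cpu_share_percent` are illustrative.

```python
decoder_load_mips = {
    "video IT routine": 0.06,
    "video decode driver": 0.13,
    "video display driver": 0.03,
}
ARM9_MIPS = 290.0  # ARM9 @ 264 MHz, as quoted on the slide


def cpu_share_percent(load_mips, cpu_mips=ARM9_MIPS):
    """Load expressed as a percentage of the available CPU MIPS."""
    return 100.0 * load_mips / cpu_mips


total = sum(decoder_load_mips.values())
print(f"total: {total:.2f} MIPS, share: {cpu_share_percent(total):.3f} %")
print(f"rounded 0.2 MIPS share: {cpu_share_percent(0.2):.3f} %")
```

The three entries sum to 0.22 MIPS, which the slide rounds to 0.2 MIPS; 0.2 out of 290 MIPS is about 0.07% of the CPU.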


Conclusions

• System-level power reduction examples have been presented in the context of mobile multimedia.

• ST bus encoding of Bayer data, combined with the optimized scaling, achieves a saving of more than 93% in data throughput.

• The ST motion estimator achieves remarkable savings in computation workload (99% less than Full Search), internal and external memory size, silicon area, and bandwidth requirement on the system bus. The quality of the results is comparable to the common Full Search approach.