Adaptive System on a Chip (aSoC) for Low-Power Signal Processing Andrew Laffely, Jian Liang, Prashant Jain, Ning Weng, Wayne Burleson, Russell Tessier Department of Electrical and Computer Engineering University of Massachusetts, Amherst {alaffely, jliang, pjain, nweng, burleson, tessier} @ecs.umass.edu This material is based upon work supported by the National Science Foundation under Grant No. 9988238. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Post on 21-Dec-2015


Page 1: Adaptive System on a Chip (aSoC) for Low-Power Signal Processing Andrew Laffely, Jian Liang, Prashant Jain, Ning Weng, Wayne Burleson, Russell Tessier

Adaptive System on a Chip (aSoC) for Low-Power Signal Processing

Andrew Laffely, Jian Liang, Prashant Jain, Ning Weng, Wayne Burleson, Russell Tessier

Department of Electrical and Computer Engineering
University of Massachusetts, Amherst

{alaffely, jliang, pjain, nweng, burleson, tessier}@ecs.umass.edu

This material is based upon work supported by the National Science Foundation under Grant No. 9988238. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Page 2

Overview

• Motivation
• Video Processing
• Architecture
• Dynamic Power Management
• Core, Interconnect, and Clock

Page 3

Problem

• Wireless video processing requires
  • High throughput
  • Low power
  • Flexibility

Page 4

System on a Chip Solutions

• Take advantage of parallelism
  • Possible improved performance
• Allow use and reuse of existing integrated components
• If
  • The application can be partitioned
  • The appropriate architecture is used

Page 5

Proposed Architecture: aSoC

• High throughput
  • Heterogeneous processor elements
    • Use the right tool for the job
  • Fast and predictable interconnect
• Flexible
  • Runtime reconfiguration of cores and interconnect
• Power consumption
  • Implement power saving features in both cores and interconnect
  • Use reconfiguration to dynamically control power consumption

Page 6

aSoC: adaptive System on a Chip

• Tiled SoC architecture

[Figure: tiled array of cores: DCT, VLE, Memory, Viterbi, FIR, Encrypt, Control, Motion Estimation and Compensation]

Page 7

aSoC: adaptive System on a Chip

• Tiled SoC architecture
• Supports the use of independently developed heterogeneous cores
  • Pick and place cores which best perform the given application
    • Increase performance
    • Save power
  • Cores may be any number of tiles in size

[Figure: tiled array of cores: DCT, VLE, Memory, Viterbi, FIR, Encrypt, Control, Motion Estimation and Compensation]

Page 8

aSoC: adaptive System on a Chip

• Tiled SoC architecture
• Supports the use of independently developed heterogeneous cores
• Connected with an interconnect mesh
  • Restricted to near neighbor communications
    • Creates pipeline
    • Decreases cycle time

[Figure: tiled array of cores: DCT, VLE, Memory, Viterbi, FIR, Encrypt, Control, Motion Estimation and Compensation]

Page 9

aSoC: adaptive System on a Chip

• Tiled SoC architecture
• Supports the use of independently developed heterogeneous cores
• Connected with a fixed interconnect mesh
• Using a communication interface (CI) to manage data
  • Network port (Coreport) for each core
  • Each CI uses a memory and FSM to repetitively process a predefined schedule of communications
  • Crossbar

[Figure: tiled array of cores: DCT, VLE, Memory, Viterbi, FIR, Encrypt, Control, Motion Estimation and Compensation]

Page 10

Stream Control

• Instruction memory
  • Holds the predetermined schedule of communications
• PC
  • Selects and synchronizes the communications
• Decoder
  • Sets crossbar
• Controller
  • Sets PC
  • Interprets incoming configuration commands
• Crossbar
  • Any input to any set of outputs

[Figure: communication interface with North, South, East, West, and Core inputs and outputs connected through a crossbar, controlled by a Decoder/Controller, PC, and Instruction Memory, with a Local Config input]
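
The stream control loop on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the aSoC hardware design: the class name `CommunicationInterface`, the dictionary encoding of a crossbar setting, and the lowercase port names are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class CommunicationInterface:
    """Hypothetical software model of one tile's communication interface."""

    # Instruction memory: one crossbar setting per schedule cycle.
    # Each setting maps an input port to the set of output ports it drives.
    schedule: list
    pc: int = 0  # program counter selecting the current schedule entry

    def step(self, inputs):
        """Apply the current crossbar setting to `inputs`, then advance the PC."""
        setting = self.schedule[self.pc]        # decoder reads the instruction
        outputs = {}
        for src, dests in setting.items():      # crossbar: any input ...
            if src in inputs:
                for dst in dests:               # ... to any set of outputs
                    outputs[dst] = inputs[src]
        # The schedule is processed repetitively, so the PC wraps around.
        self.pc = (self.pc + 1) % len(self.schedule)
        return outputs


ci = CommunicationInterface(schedule=[
    {"core": {"east"}},          # cycle 0: core data leaves to the east
    {"west": {"east", "core"}},  # cycle 1: west input fans out to two outputs
])
first = ci.step({"core": 7})
second = ci.step({"west": 9})
```

After the two steps the PC is back at entry 0, which mirrors the repeating schedule described in the bullets. Configuration commands that change the PC are left out of this sketch.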

Page 11

Example: Communication

• A given application requires periodic communications from Core A to Core C
• aSoC uses a prescheduled communication STREAM
  • Core A places the data in a dedicated STREAM between the two tiles
  • Core C pulls the data from that STREAM
• The tile to tile communication uses 3 cycles

[Figure: Core A, Core B, and Core C in a row, with streams A-D]

Page 12

Example: Stream

[Figure: tiles A, B, and C. Step 1: Core to East]

Page 13

Example: Stream

[Figure: tiles A, B, and C with streams A-D. Step 2: West to East]

Page 14

Example: Stream

[Figure: tiles A, B, and C. Step 3: West to Core]

Page 15

Example: Stream

[Figure: tiles A, B, and C with streams A-D. Schedule: 1: Core to East; 2: West to East; 3: West to Core; then loop back]
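
The three-step stream above can be traced with a small simulation. This is a sketch under stated assumptions: the function `run_stream`, the tuple encoding of each schedule entry, and the rule that an east output is latched into the right neighbour's west input on the next cycle are illustrative choices, not details taken from the slides. Only the first tile's core is modeled as a data source.

```python
def run_stream(schedules, source_data, cycles):
    """Simulate a row of tiles that share one repeating schedule length.

    schedules[t][slot] is None or a (source_port, dest_port) pair for tile t.
    Returns a list of (cycle_completed, tile_index, value) core deliveries.
    """
    n = len(schedules)
    west_in = [None] * n          # value waiting at each tile's west input
    delivered = []
    data = iter(source_data)
    for cycle in range(cycles):
        slot = cycle % len(schedules[0])   # PC position in the schedule
        next_west = [None] * n
        for t in range(n):
            route = schedules[t][slot]
            if route is None:
                continue
            src, dst = route
            value = next(data, None) if src == "core" else west_in[t]
            if value is None:
                continue
            if dst == "east" and t + 1 < n:
                next_west[t + 1] = value   # arrives at the neighbour next cycle
            elif dst == "core":
                delivered.append((cycle + 1, t, value))
        west_in = next_west
    return delivered


schedules = [
    [("core", "east"), None, None],   # tile A: step 1, Core to East
    [None, ("west", "east"), None],   # tile B: step 2, West to East
    [None, None, ("west", "core")],   # tile C: step 3, West to Core
]
arrivals = run_stream(schedules, ["x0", "x1"], cycles=6)
```

Each word reaches Core C three cycles after Core A sends it, and the schedule loops back so a new word can be sent every three cycles.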

Page 16

Static Scheduled Communications

• Creates system scalability by “eliminating” network congestion
• Many interconnect segments managed with time division multiplexing
  • Lots of bandwidth
• Improves SoC performance by up to a factor of 8

[Figure: tiled array of cores: DCT, VLE, Memory, Viterbi, FIR, Encrypt, Control, Motion Estimation and Compensation]

Page 17

Power Consumption?

• Provide reconfiguration methods for cores and CI

• Develop programmable clocking systems at each tile

Page 18

Power Aware Core

• Custom motion estimation core
  • Choose search method
    • Full search: 960-600 mW (bit width and pel sub-sampling)
    • Spiral search: 76 mW
    • Three step search: 25 mW

Data taken with Synopsys™ Power Compiler at the RTL level
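
One reason the search methods differ so much in cost is the number of candidate positions each one evaluates. The sketch below compares an exhaustive search with a textbook three step search. It is not the custom core's implementation: the ±7 pel search window, the function names, and the toy quadratic cost function are assumptions chosen for illustration, and the counts say nothing about the measured power figures above.

```python
def full_search_positions(p):
    """All candidate offsets in a +/-p window, as exhaustive search visits them."""
    return [(dx, dy) for dy in range(-p, p + 1) for dx in range(-p, p + 1)]


def three_step_search(cost, step=4):
    """Textbook three step search; returns (best_offset, evaluations)."""
    center = (0, 0)
    best_cost = cost(center)
    evaluations = 1
    while step >= 1:
        best = center
        for dy in (-step, 0, step):
            for dx in (-step, 0, step):
                if dx == 0 and dy == 0:
                    continue                 # center already evaluated
                cand = (center[0] + dx, center[1] + dy)
                c = cost(cand)
                evaluations += 1
                if c < best_cost:
                    best_cost, best = c, cand
        center = best                        # recenter on the best match
        step //= 2                           # halve the step: 4, 2, 1
    return center, evaluations


# Toy cost with its minimum at offset (3, -2); a real core would compute a
# block-matching error such as a sum of absolute differences instead.
toy_cost = lambda o: (o[0] - 3) ** 2 + (o[1] + 2) ** 2

exhaustive = len(full_search_positions(7))
best, evaluated = three_step_search(toy_cost)
```

Under these assumptions the exhaustive search visits 225 positions while the three step search evaluates 25.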

Page 19

aSoC Support

• Multiple streams in and out through dedicated coreports
  • Easy to manage on both sides of the port
• Schedule configuration streams in with the data
  • Stream A: Input Frame
  • Stream B: Configuration (Choose search mode and size)
  • Stream C: Motion Vectors

[Figure: Motion Estimation Core with coreports in1, in2, out1, and out2, carrying Stream A, Stream B, and Stream C]

Page 20

Reconfigurable Interconnect

• P-frame
• I-frame

[Figure: P-frame datapath with Input Frame, ME, MC, subtract (-) and add (+) nodes, and DCT; I-frame datapath with Input Frame and DCT]

Page 21

aSoC Support

• Lumped ME, MC and Summation into one double core

[Figure: Motion Estimation & Compensation double core next to the DCT core]

Page 22

aSoC Support: P-Frame

[Figure: Motion Estimation & Compensation core and DCT core, with Input Frame (Stream A) and Difference Frame (Stream B)]

Page 23

aSoC Support: Schedule Change

[Figure: Motion Estimation & Compensation core and DCT core, with Input Frame (Stream A), Difference Frame (Stream B), and Configuration Streams (C & D)]

Page 24

aSoC Support: Schedule Change

[Figure: Motion Estimation & Compensation core and DCT core, with Input Frame (Stream A), Difference Frame (Stream B), and Configuration (Stream C); PC shown against Schedule 1 and Schedule 2]

Page 25

aSoC Support: Schedule Change

[Figure: Motion Estimation & Compensation core and DCT core, with Input Frame (Stream A), Difference Frame (Stream B), and Configuration (Stream C); PC shown against Schedule 1 and Schedule 2]

Page 26

aSoC Support: Schedule Change

[Figure: Motion Estimation & Compensation core and DCT core, with Input Frame (Stream A) and Configuration (Stream D); PC shown against Schedule 1 and Schedule 2]

Page 27

aSoC Support: Schedule Change

[Figure: Motion Estimation & Compensation core and DCT core, with Input Frame (Stream A’) and Configuration (Stream D); PC shown against Schedule 1 and Schedule 2]

Page 28

aSoC Support: I-Frame

[Figure: DCT core receiving Input Frame (Stream A’); Motion Estimation & Compensation core OFF]
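
The schedule change shown across the last few slides can be sketched as a controller that keeps two schedules in one instruction memory and moves the PC when a configuration command arrives. This is an illustrative model only: the class `TileController`, its method names, and the string encoding of each schedule entry are assumptions, not the aSoC instruction format.

```python
class TileController:
    """Hypothetical model of a tile controller holding several schedules."""

    def __init__(self, schedules):
        # Flatten the named schedules into one instruction memory and
        # remember the [start, end) range that each schedule occupies.
        self.memory = []
        self.bounds = {}
        for name, entries in schedules.items():
            start = len(self.memory)
            self.memory.extend(entries)
            self.bounds[name] = (start, len(self.memory))
        self.active = next(iter(schedules))      # first schedule runs at reset
        self.pc = self.bounds[self.active][0]

    def configure(self, name):
        """Configuration command: jump the PC to the start of another schedule."""
        self.active = name
        self.pc = self.bounds[name][0]

    def step(self):
        """Return the current instruction and loop within the active schedule."""
        instr = self.memory[self.pc]
        start, end = self.bounds[self.active]
        self.pc = self.pc + 1 if self.pc + 1 < end else start
        return instr


ctrl = TileController({
    "p_frame": ["input -> ME/MC (Stream A)", "difference -> DCT (Stream B)"],
    "i_frame": ["input -> DCT (Stream A')"],
})
p_trace = [ctrl.step() for _ in range(3)]   # loops inside the P-frame schedule
ctrl.configure("i_frame")                   # command delivered by a config stream
i_trace = [ctrl.step() for _ in range(2)]   # now only the I-frame route runs
```

In this sketch the motion estimation and compensation core simply receives no traffic once the I-frame schedule is active; turning the core off, as the I-frame slide shows, would be a separate configuration step.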

Page 29

Operating Frequency?

• Interconnect synchronized
  • H-tree clock distribution
• Core frequencies depend on critical path
  • Tile provides clock reference
  • Coreport provides asynchronous boundary
• Dynamic core configuration requires dynamic clock configuration
  • aSoC clock reference provides multiples of interconnect clock (… 4x, 2x, 1x, 0.5x, 0.25x, …)
  • Configured through the tile controller
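
Choosing a core clock from the available multiples can be sketched as picking the slowest multiple that still meets the core's required frequency. The function name, the power-of-two exponent range, and the 10 MHz interconnect clock used in the example are hypothetical values for illustration; the slides only state that multiples such as 4x, 2x, 1x, 0.5x, and 0.25x are available.

```python
def pick_clock_multiple(required_mhz, interconnect_mhz, min_exp=-4, max_exp=5):
    """Return the smallest power-of-two multiple of the interconnect clock
    that is at least `required_mhz`.

    The exponent range (1/16x to 32x by default) is an assumed limit.
    """
    for exp in range(min_exp, max_exp + 1):
        multiple = 2.0 ** exp
        if multiple * interconnect_mhz >= required_mhz:
            return multiple
    raise ValueError("no available clock multiple is fast enough")


# Hypothetical numbers: a 10 MHz interconnect clock and two core requirements.
fast_core = pick_clock_multiple(required_mhz=25.0, interconnect_mhz=10.0)
slow_core = pick_clock_multiple(required_mhz=3.0, interconnect_mhz=10.0)
```

With these inputs the faster core would run at 4x (40 MHz) and the slower one at 0.5x (5 MHz), so each core runs no faster than its current mode needs.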

Page 30

Mixed vs. Fixed Core Frequencies

• Cores not designed with clock gating
• Core power from Synopsys RTL simulation
• Interconnect from SPICE
• Assumes 10 cycle schedule, 4 pixels/word

                         Optimal Independent Frequencies    Fixed Worst Case 105 MHz
Core: Mode               Frequency (MHz)    Power (mW)      Power (mW)
ME: Full Search          105                973             973
ME: Spiral               9.9                76              659
ME: Three Step Search    2.75               25              580
DCT                      9.6                54              349
Interconnect             6.34               0.14            0.81
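
The relative saving implied by the table can be computed directly. The short script below copies the power values from the table and reports the percentage reduction of independent clocking against the fixed 105 MHz case; the dictionary and function names are assumptions for the example.

```python
# (power at optimal independent frequency, power at fixed 105 MHz), in mW,
# copied from the table on this slide.
POWER_MW = {
    "ME: Full Search": (973, 973),
    "ME: Spiral": (76, 659),
    "ME: Three Step Search": (25, 580),
    "DCT": (54, 349),
    "Interconnect": (0.14, 0.81),
}


def reduction_percent(optimal_mw, fixed_mw):
    """Percentage by which the optimal-frequency power is below the fixed case."""
    return 100.0 * (1.0 - optimal_mw / fixed_mw)


savings = {name: reduction_percent(o, f) for name, (o, f) in POWER_MW.items()}
for name, pct in savings.items():
    print(f"{name:24s} {pct:5.1f}% lower power")
```

Full search shows no saving because its optimal frequency is the worst case frequency; every other row drops by more than 80%.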

Page 31

Current Density and Clocking

• Red: fixed worst case clocking
  • Short spikes of high current
• Green: optimal independent clocking
  • Slow and low
• Optimal clocking eliminates current spikes (improved battery life)

[Figure: current versus time between Process Start and Deadline for ME: Full Search, ME: Spiral, ME: Three Step Search, and DCT]

Page 32

Configuration Overhead

• Configuration adds up to 2 streams per tile
  • Only 2 required for data
• Total BW = 5 x T x N
  • 5 streams/(cycle, tile)
  • T tiles
  • N cycles in schedule
• Single tile can support up to 50 different streams in 10 cycle schedule

[Figure: DCT core with Input Frame (Stream B), Transform Frame (Stream D), and Configuration Streams]
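
The bandwidth formula on this slide is simple enough to check numerically. In the sketch below the function name and the 3 x 3 array size are illustrative, and the final ratio assumes that each configuration stream occupies exactly one slot per pass through the schedule, which the slides do not state.

```python
STREAMS_PER_CYCLE_PER_TILE = 5  # from the slide: 5 streams/(cycle, tile)


def total_stream_slots(tiles, schedule_cycles):
    """Total BW = 5 x T x N, counted in stream slots per schedule pass."""
    return STREAMS_PER_CYCLE_PER_TILE * tiles * schedule_cycles


single_tile = total_stream_slots(tiles=1, schedule_cycles=10)
small_array = total_stream_slots(tiles=9, schedule_cycles=10)  # hypothetical 3 x 3 array

# Assumption: 2 configuration streams, each using one slot per schedule pass.
config_share = 2 / single_tile
```

For one tile and a 10 cycle schedule this reproduces the 50 stream slots quoted on the slide; under the one-slot assumption, two configuration streams would take 4% of them.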

Page 33

Configuration Power Overhead

• Configuration streams used infrequently
  • Once/Macro block or Once/Frame
• Architecture disables unused streams
  • Data valid bit already used for flow control
• Only 4-9% of interconnect power is due to configuration streams

Page 34

Conclusion

• aSoC supports dynamic power management with reconfiguration
  • Cores
  • Interconnect
  • Clocks
• Low configuration overhead in both
  • Communication bandwidth
  • Power

Page 35

Future Work

• Add reconfigurable voltage supplies at each tile
• Finish test chip
• Import larger applications

Page 36

Questions

Page 37

aSoC: adaptive System on a Chip

[Figure: tiled array of cores (DCT, VLE, Memory, Viterbi, FIR, Encrypt, Control, Motion Estimation and Compensation) with labels Cores, Interconnect, Interface, and Tile]

Page 38

Example: Stream

[Figure: tiles A, B, and C with streams A-D]

Page 39

Partitioning

• Automated partitioning is a nontrivial problem
• For small signal processing systems, user-defined partitioning may be possible
• Key: Perfectly partitioning the system may not be possible
  • How can the SoC mitigate the penalty?