A Reconfigurable Signal Processing IC with embedded FPGA and Multi-Port Flash Memory
M. Borgatti, L. Calì, G. De Sandre, B. Forêt, D. Iezzi, F. Lertora, G. Muzzi, M. Pasotti, M. Poles, P.L. Rolandi
STMicroelectronics - Central R&D - Italy


Post on 05-Dec-2014


Page 1: A Reconfigurable Signal Processing IC with embedded FPGA and

A Reconfigurable Signal Processing IC with embedded FPGA and Multi-Port Flash Memory

M. Borgatti, L. Calì, G. De Sandre, B. Forêt, D. Iezzi, F. Lertora, G. Muzzi, M. Pasotti, M. Poles, P.L. Rolandi

STMicroelectronics - Central R&D - Italy

Page 2: A Reconfigurable Signal Processing IC with embedded FPGA and

Outline of Presentation

• Project motivation and background
• System architecture
  – Reconfigurable core
  – Memory subsystem
• System performance
  – Application example: embedded face recognition system
• Energy efficiency, measurements
• SoC integration and design flow
  – System-to-RTL and RTL-to-layout
• Summary

Page 3: A Reconfigurable Signal Processing IC with embedded FPGA and

Project motivation and background

• Conflicting industry trends
  – Economics of system integration
    • Ever more complex SoCs
    • More integration
    • Cost effectiveness and performance (per unit)
  – Increasing design complexity and risks
  – Increasing NREs
  – Shorter time-to-market and product life
• Strong need for:
  – Faster project turnaround
  – Lower risk
• Usage of re-configurable silicon fabrics

Page 4: A Reconfigurable Signal Processing IC with embedded FPGA and

Project motivation and background

• Pragmatic approach proposed:
  – Reconfigurable architecture
  – Joins a statically extensible processor with e-FPGA
  – Tight connection to Flash memory subsystem
  – Open architecture with flexible programmable I/O
• Programmable platform approach
  – Simple model for programmers

Page 5: A Reconfigurable Signal Processing IC with embedded FPGA and

Programmable Platform Approach

[Diagram. Labels: System Applications Family; System Application; Silicon process + Enabling technologies; Platform Compilation; Config. Proc + e-FPGA; Application Compilation; Programmable platform]

Page 6: A Reconfigurable Signal Processing IC with embedded FPGA and

System Architecture

[Block diagram. Labels: Extensible MPU with 8KB I$ and 8KB D$; Inst. Ext I/F; e-FPGA; General Purpose I/O Lines; bus bridge; I2C BUS; I2C Master; M/S AHB I/F; INTs; DMA & FPGA Prog. I/F; Buffer I/F; GP I/O; 64 bit APB BUS; 1kB Buffer; AHB/APB Bridge; 64 bit AHB BUS; I/O registers; 48 kB SRAM; Flash Mem with FP, CP and DP ports; Instr. Ext.]

Page 7: A Reconfigurable Signal Processing IC with embedded FPGA and

e-FPGA Purposes

• Processor ISA extensions
  – Simplest programmer’s model
  – Specific interface to the MPU datapath
  – Impact on processor performance
  – Impact on processor energy efficiency
  – Efficiency limited by instruction stream decoding
• Bus-mapped co-processor
  – Maximum benefits in speed/power
• Flexible I/O

Page 8: A Reconfigurable Signal Processing IC with embedded FPGA and

e-FPGA – Microprocessor interface

[Block diagram. Labels: Instruction extension; Other FPGA Purposes; Clock Ctrl; Pipe; Control; Decode; Register File; Instruction; Result; Microprocessor clock; e-FPGA Clock]

Page 9: A Reconfigurable Signal Processing IC with embedded FPGA and

Flash Memory Architecture

[Block diagram. Labels: four 2Mb Flash modules (#0 to #3), each with a 128-bit interface; 128-bit Memory Sub-System Crossbar; Data Port (DP, 64-bit); Code Port (CP, 64-bit); FPGA Port (FP, 32-bit); 8-bit PP I/F; PMA; DFT; Power Block]

Page 10: A Reconfigurable Signal Processing IC with embedded FPGA and

Flash Memory Subsystem

• Modular approach
  – Customizable array of N independent 2Mb modules
• 3 content-specific ports (CP, DP, FP)
• HW support for filesystem implementation (DP)
  – Defrag
  – Compression
  – Virtual erase
• 2Mb Module features:
  – 128b I/O
  – 40ns access time (400MB/s peak throughput)
  – Power management and arbitration
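The 400MB/s peak figure follows directly from the 128-bit module I/O and the 40ns access time. A minimal sketch of that arithmetic, assuming one full 128-bit word is delivered per access and 1 MB = 10^6 bytes (the function name is illustrative, not from the slides):

```python
# Figures quoted on the slide for one 2Mb Flash module.
WORD_BITS = 128       # module I/O width in bits
ACCESS_TIME_NS = 40   # access time in nanoseconds


def peak_read_throughput_mb_s(word_bits: int, access_ns: int) -> float:
    """Peak read throughput in MB/s, assuming one word per access.

    bytes per nanosecond equals GB/s, so multiplying by 1000 gives MB/s.
    """
    bytes_per_access = word_bits // 8
    return bytes_per_access * 1000 / access_ns


module_peak = peak_read_throughput_mb_s(WORD_BITS, ACCESS_TIME_NS)
print(f"Peak read throughput per module: {module_peak:.0f} MB/s")
```

Under the same assumption, a wider word or a shorter access time scales the peak figure linearly.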

Page 11: A Reconfigurable Signal Processing IC with embedded FPGA and

System Memory Hierarchy

[Block diagram. Labels: 32-bit uP Register File; 64-bit AHB Bus; AHB Bridge; DMA; 512-B Buffer; 32-bit FPGA PI/F; 64-bit CP I/F; 64-bit DP I/F; 2 x 64- + 1 x 32-bit Memory Port I/Fs; 64-bit Port CP; 64-bit Port DP; 32-bit Port FP; 6x4 128-bit Crossbar; 4 x Flash Memory Controller Logic; 4 x 16384 x 128-bit Memory Module]

• AHB Peak Throughput:
  – 800MB/s
• e-FPGA
  – 400MB/s
  – (50MB/s sustained)
• Total Aggregate Peak
  – 1.2GB/s
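The peak figures on this slide are consistent with the port widths if each port moves one word per clock cycle. A small sketch of that check, assuming a 100MHz clock (the frequency quoted elsewhere in the deck for performance and power figures) and one transfer per cycle; both are assumptions, not statements from this slide:

```python
CLOCK_MHZ = 100  # assumed clock frequency


def port_peak_mb_s(width_bits: int, clock_mhz: int) -> int:
    """Peak throughput in MB/s for a port moving one word per cycle."""
    return (width_bits // 8) * clock_mhz


ahb_peak = port_peak_mb_s(64, CLOCK_MHZ)    # 64-bit AHB bus
fpga_peak = port_peak_mb_s(32, CLOCK_MHZ)   # 32-bit FPGA port
aggregate_peak = ahb_peak + fpga_peak

print(f"AHB: {ahb_peak} MB/s, e-FPGA: {fpga_peak} MB/s, "
      f"aggregate: {aggregate_peak / 1000} GB/s")
```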

Page 12: A Reconfigurable Signal Processing IC with embedded FPGA and

Application Example: Face Recognition

• Target application:
  – Recognize a face out of twenty
  – Low-resolution images from CMOS cameras
• Potential applications:
  – Low cost smart toys
  – Advanced human-machine interfaces
  – Color CMOS camera processors
• Image preprocessing: Bayer filter
• Face location: based on Hough transform
• Face recognition: Line-Based
• Recognition rates over 90 %
• Scale-invariant
• Tolerant to changes in illumination intensity

Page 13: A Reconfigurable Signal Processing IC with embedded FPGA and

Processor Extension (I)

[Datapath diagram. Labels: Processor Load Unit; 64-bit register; 4-segm. (x2); operators −, ×, +; constants ‘8’ and ‘16’; Result]

• 8-issue, 8-bit L2 distance
• Complexity:
  – 23 8-bit OPS
  – 6 64-bit OPS
• 1GOPS peak throughput
  – Distance computation
• 10k equiv. ASIC gates
• Mapped to e-FPGA
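As a software reference for what an 8-lane, 8-bit L2 distance instruction computes, here is a minimal sketch. It is not the authors' datapath: it assumes the squared distance is returned without a final square root, that the eight 8-bit lanes are packed into one 64-bit word with lane 0 in the least significant byte, and the function name is hypothetical. The slide's count of 23 8-bit operations is at least consistent with 8 subtractions, 8 multiplications and 7 additions, although the slide does not give that breakdown.

```python
def l2_distance_sq_packed(a: int, b: int) -> int:
    """Sum of squared differences over eight unsigned 8-bit lanes.

    `a` and `b` are 64-bit words; lane i occupies bits 8*i .. 8*i+7.
    """
    total = 0
    for lane in range(8):
        x = (a >> (8 * lane)) & 0xFF   # extract lane from first operand
        y = (b >> (8 * lane)) & 0xFF   # extract lane from second operand
        diff = x - y                   # per-lane subtraction
        total += diff * diff           # per-lane multiply, then accumulate
    return total


# Usage: pack two 8-pixel vectors into 64-bit words and compare them.
vec_a = int.from_bytes(bytes([1, 2, 3, 4, 5, 6, 7, 8]), "little")
vec_b = int.from_bytes(bytes(8), "little")
print(l2_distance_sq_packed(vec_a, vec_b))  # prints 204 (1 + 4 + ... + 64)
```

In hardware the eight lanes would be evaluated in parallel; the loop here only models the result.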

Page 14: A Reconfigurable Signal Processing IC with embedded FPGA and

Processor Extension (II)

[Datapath diagram. Labels: Number; Remaind.; root; shifts >>1, <<1, <<2, >>2, >>30; operators +, −, +1, +2; comparator >; Result]

• Fixed-point square root kernel
• Complexity:
  – 12 32-bit OPS
• 2k equiv. ASIC gates
• Mapped to e-FPGA
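The slide does not spell out the square root algorithm. For orientation, a common way to build such a kernel from shifts, adds and compares is the digit-by-digit (shift-and-subtract) integer square root. The sketch below is that textbook method for 32-bit unsigned inputs, offered as an assumption about the general technique rather than as the authors' implementation; the function name is hypothetical.

```python
import math


def isqrt32(number: int) -> int:
    """Floor of the square root of a 32-bit unsigned integer.

    Textbook shift-and-subtract method: one result bit per iteration,
    using only shifts, additions, subtractions and comparisons.
    """
    if not 0 <= number < (1 << 32):
        raise ValueError("expected a 32-bit unsigned value")
    remainder = number
    root = 0
    bit = 1 << 30                      # highest power of four in 32 bits
    while bit > remainder:             # skip leading zero digit pairs
        bit >>= 2
    while bit:
        if remainder >= root + bit:
            remainder -= root + bit    # accept this result bit
            root = (root >> 1) + bit
        else:
            root >>= 1                 # reject this result bit
        bit >>= 2
    return root


# Usage: compare against the standard library for a few values.
for value in (0, 1, 24, 25, 26, 1 << 31, (1 << 32) - 1):
    assert isqrt32(value) == math.isqrt(value)
print(isqrt32(25))  # prints 5
```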

Page 15: A Reconfigurable Signal Processing IC with embedded FPGA and

Performance: Processing Time @ 100 MHz

Algorithm Stage                       RISC w/ basic DSP   RISC w/ basic DSP + uP Ext.   Speed-Up
Bayer Filter                          58 msec             24.7 msec                     x 2.3
Edge Detection                        4.5 msec            2.5 msec                      x 1.8
Face Detection                        1.5 sec             382 msec                      x 4
Face Recognition (20-face database)   9.15 sec            860 msec                      x 10.6
Totals                                10.7 sec            1.26 sec                      x 8.5
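The speed-up column is the ratio of the two processing times for each stage. A short sketch that recomputes it from the times quoted on this slide (values in milliseconds; the dictionary layout and names are illustrative):

```python
# (baseline ms, with uP extension ms, speed-up quoted on the slide)
STAGES = {
    "Bayer Filter": (58.0, 24.7, 2.3),
    "Edge Detection": (4.5, 2.5, 1.8),
    "Face Detection": (1500.0, 382.0, 4.0),
    "Face Recognition": (9150.0, 860.0, 10.6),
    "Totals": (10700.0, 1260.0, 8.5),
}


def speed_up(baseline_ms: float, extended_ms: float) -> float:
    """Ratio of baseline time to time with the processor extension."""
    return baseline_ms / extended_ms


computed = {name: speed_up(base, ext) for name, (base, ext, _) in STAGES.items()}
for name, value in computed.items():
    print(f"{name}: x {value:.2f} (slide: x {STAGES[name][2]})")
```

The recomputed ratios agree with the quoted figures to within rounding; face recognition dominates the total time, which is why the overall speed-up is close to that stage's.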

Page 16: A Reconfigurable Signal Processing IC with embedded FPGA and

Energy Efficiency vs. Flexibility

[Chart: Energy Efficiency (MOPS/mW, logarithmic axis from 0.1 to 1000) versus Flexibility (Coverage). Regions: Dedicated HW; ASIPs, DSPs; Embedded Processors. Annotations: Energy-Flexibility Gap; FPGA-mapped CoProcessors; uP + FPGA Instructions. From: Zhang et al., ISSCC 2000]

Page 17: A Reconfigurable Signal Processing IC with embedded FPGA and

Performance: Energy Efficiency

Algorithm Stage                       Speed-Up   Energy Gain   Energy x Delay Gain
Bayer Filter                          x 2.3      x 1.4         x 3.2
Edge Detection                        x 1.8      x 0.95        x 1.7
Face Detection                        x 4        x 2.9         x 11.6
Face Recognition (20-face database)   x 10.6     x 9           x 95.4
Totals                                x 8.5      x 6.7         x 57
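Because the energy-delay product is energy multiplied by execution time, its gain is the product of the speed-up and the energy gain. A brief sketch that checks the quoted column against that relation (names are illustrative; values are taken from this slide):

```python
# (speed-up, energy gain, energy x delay gain quoted on the slide)
GAINS = {
    "Bayer Filter": (2.3, 1.4, 3.2),
    "Edge Detection": (1.8, 0.95, 1.7),
    "Face Detection": (4.0, 2.9, 11.6),
    "Face Recognition": (10.6, 9.0, 95.4),
    "Totals": (8.5, 6.7, 57.0),
}


def energy_delay_gain(speed_up: float, energy_gain: float) -> float:
    """Gain in energy x delay, given the gains in delay and in energy."""
    return speed_up * energy_gain


edp = {name: energy_delay_gain(s, e) for name, (s, e, _) in GAINS.items()}
for name, value in edp.items():
    print(f"{name}: x {value:.2f} (slide: x {GAINS[name][2]})")
```

An energy gain below 1, as for edge detection, means that stage consumes slightly more energy with the extension even though it runs faster.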

Page 18: A Reconfigurable Signal Processing IC with embedded FPGA and

SoC Integration

[Design-flow diagram. Labels: C; Functional model (untimed); Partitioning / I/F Synthesis / Refinement; Libraries; HW/SW; SW Apps; uP ISS; HW (RTL): uP, AHB/APB Bus, Peripherals; Soft Hardware (eFPGA); VHDL (e-FPGA); Inst. Ext. Verilog; eFPGA mapping; eFPGA HARD MACRO; Cycle Accurate Simulation; Performance Analysis]

Page 19: A Reconfigurable Signal Processing IC with embedded FPGA and

[Design-flow diagram. Labels: Inst. Ext.; Synthesis; Mapping (P&R); Bit-stream; FPGA Timing DB; CPU core, IPs; Interface RTL code; Flash; RAM; Coproc. I/O I/F; eFPGA core; Synthesis; Floorplanning / P&R; Static Timing Analysis, Dynamic Verification; Static Timing Analysis (SoC + eFPGA); Con.; Netlist + Timing Database; Silicon fab]

Page 20: A Reconfigurable Signal Processing IC with embedded FPGA and

Chip Layout

Process             0.18um CMOS 2P/6M, Embedded Flash
Flash Memory (x4)   256kB x 9 sectors
                    128-bit word
                    1MB/s write throughput
                    400MB/s read throughput
SRAM Memory         Main: 48kB (64-bit)
                    I$: 8kB (64-bit)
                    D$: 8kB (64-bit)
                    Buffers: 4x256B
Chip size           8.4 x 8.4 mm² (e-FPGA size: 8.2 mm²)
I/O                 24 inputs + 24 outputs (tristate) + 8 bidirs
Supply              2.7-3.6V (external), 1.8V (core)

[Die photo and floorplan. Labels: 48 KB SRAM; BUFFER; Embedded FPGA; TAGS; 8+8 KB I$ + D$; 32b uP + AHB & APB + 250k GATES; 1MB FLASH Memory; DFT; Flash Ports; Buffers]

Page 21: A Reconfigurable Signal Processing IC with embedded FPGA and

Chip Performances and Power Consumption

Processor maximum speed: 125MHz (WCMIL)

Reconfiguration speed: 500us @ 100MHz clock

Chip average power consumption: 300mW @ 100MHz, 1.8V
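To put the reconfiguration time in processor terms, it can be converted to clock cycles. A one-function sketch using only the two figures quoted on this slide (the function name is illustrative):

```python
def reconfiguration_cycles(time_us: int, clock_mhz: int) -> int:
    """Clock cycles elapsed during reconfiguration.

    Microseconds multiplied by MHz gives cycles directly.
    """
    return time_us * clock_mhz


cycles = reconfiguration_cycles(500, 100)
print(f"Reconfiguration takes {cycles} cycles at 100MHz")  # prints 50000 cycles
```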

Page 22: A Reconfigurable Signal Processing IC with embedded FPGA and

Summary

• e-FPGAs allow architectural tradeoffs for reconfigurable embedded systems:
  – Processor ISA extensions
  – Bus-mapped co-processor
  – Flexible I/O
• Modular, content-specific, multiport e-Flash
• Performance figures:
  – Up to 10x speedup
  – Up to 9x energy reduction
  – Dynamic reconfiguration in 500 us
• Specific design-flow for system and RTL

Page 23: A Reconfigurable Signal Processing IC with embedded FPGA and

Acknowledgements:

The authors thank:

all the colleagues of NVM-DP Dept.,
A. Maurelli, F. Piazza and L. Fumagalli.