embedded processing using fpgas -...

65
Embedded Processing Using FPGAs

Upload: trandan

Post on 23-Apr-2018

228 views

Category:

Documents


1 download

TRANSCRIPT

Embedded

Processing Using

FPGAs

www.xilinx.com2 - Embedded Processing using FPGAs

Agenda

• Why FPGA Platform Based Embedded

Processing

• Embedded Use Models And Their FPGA Based

Solutions

• Architecture/Topology Choices

• A Reoccurring Question: Hardware Or Software

• Reconfigurable Hardware

• Tool Flows For FPGA Based Embedded Systems

www.xilinx.com3 - Embedded Processing using FPGAs

Agenda

• Why FPGA Platform Based Embedded

Processing

• Embedded Use Models And Their FPGA Based

Solutions

• Architecture/Topology Choices

• A Reoccurring Question: Hardware Or Software

• Reconfigurable Hardware

• Tool Flows For FPGA Based Embedded Systems

www.xilinx.com4 - Embedded Processing using FPGAs

Why Use Processors

In the First Place

• Microcontrollers (µC) and Microprocessors (µP)

Provide a Higher Level of Design Abstraction

– Most µC functions can be implemented using VHDL

or Verilog - downsides are parallelism & complexity

– Using C/C++ abstraction & serial execution make

certain functions much easier to implement in a µC

• Discrete µCs are Inexpensive and Widely Used

– µCs have years of momentum and software

designers have vast experience using them

www.xilinx.com5 - Embedded Processing using FPGAs

Why Embedded Design using

FPGAs

In Addition To The Universal Drive Towards Smaller Cheaper Faster With Less Power….

Difficult to Find the Required Mix of Peripherals in Off theShelf (OTS) Microcontroller Solutions

• Selecting a Single Processor Core with Long Term Solution Viability is Difficult at Best

• Without Direct Ownership of the Processing Solution, Obsolescence is Always a Concern

• Many Microprocessor Based Solutions Provide Limited On-Chip Peripheral Support

1

2

3

4

www.xilinx.com6 - Embedded Processing using FPGAs

Rarely the Ideal Mix

• Difficult to Find the Required Mix of Peripherals in

Off the Shelf (OTS) Microcontroller Solutions

– Even with variants, compromises must still be made

– Must decide between inventory impact of multiple

configurations or device cost of comprehensive IP

2@ UART

TIMER

I2C

SPI

GPIO

FLASH

DDR SDRAM

System

Requirements

?CPU Core

CPU Core

FLASHFLASH

RAMRAM

GPIOGPIO

UARTUART

UARTUART

SPISPI

TimerTimer

Microcontroller #1

Lacks I2C & Includes

RAM vs DDR SDRAM

CPU Core

CPU Core

FLASHFLASH

DDRDDR

UARTUART

CANCAN

SPISPI

TimerTimer

GPIOGPIO

I2CI2C

Microcontroller #2

Lacks a Second UART &

Includes Unnecessary IP

1

www.xilinx.com7 - Embedded Processing using FPGAs

Changing Requirements

• Selecting a Single Processor Core with Long

Term Solution Viability is Difficult at Best

– Future proofing system requirements is difficult at best

– Changing processor cores to accommodate new

requirements consumes valuable design resources

150 MHz

2@ UART

TIMER

I2C

SPI

GPIO

FLASH

DDR SDRAM

10/100 Ethernet

Future System

Requirements

?100 MHz Core

100 MHz Core

FLASHFLASH

DDRDDR

UARTUART

UARTUART

SPISPI

TimerTimer

GPIOGPIO

I2CI2C

Microcontroller

Completely meets current

system requirements

100 MHz

2@ UART

TIMER

I2C

SPI

GPIO

FLASH

DDR SDRAM

Today’s System

Requirements

2

www.xilinx.com8 - Embedded Processing using FPGAs

Here Today, Gone Tomorrow

• Without Direct Ownership of the Processing

Solution, Obsolescence is Always a Concern

– A single core is used to create a family of µCs

– The sheer number of µC configurations can lead to

obsolescence of lower volume configurations/variants

3

CPU Core

CPU Core

FLASHFLASH

RAMRAM

GPIOGPIO

CANCAN

UARTUART

SPISPI

TimerTimer

Microcontroller Variant #1 - High Volume Automotive

CPU Core

CPU Core

FLASHFLASH

RAMRAM

GPIOGPIO

SPISPI

UARTUART

SPISPI

TimerTimer

Microcontroller Variant #2 - Moderate Volume Consumer

CPU Core

CPU Core

FLASHFLASH

RAMRAM

GPIOGPIO

I2CI2C

UARTUART

SPISPI

TimerTimer

Microcontroller Variant #3 - Lower Volume Niche

?

www.xilinx.com9 - Embedded Processing using FPGAs

Chipset Solutions

• Many Microprocessor Based Solutions Provide

Limited On-Chip Peripheral Support

– Manufacturers create I/O devices to extend the

functionality of the base processor core

– An external, pre-defined interface between a µP and

its support device limits overall system performance

4

Microprocessor

Microprocessor

MPU Interface

MPU Interface

ParallelParallel

PWMPWM

SerialSerial

I/O Expansion Device

Microprocessor

Chip-Set

Pre-defined interface

limits performance

www.xilinx.com10 - Embedded Processing using FPGAs

FPGA Embedded Design

• FPGAs Allow For The Implementation Of An Ideal

Mix Of Peripherals And System Infrastructure

• New System Requirements Can Be Supported

Without Changing Core Design

• Longevity Of FPGAs Approaches The Longest

Available Microcontrollers In The Market

• FPGAs Are Used To Augment µp Functionality -

Absorbing The Core Is The Next Natural Step

1

2

3

4

www.xilinx.com11 - Embedded Processing using FPGAs

Agenda

• Why FPGA Platform Based Embedded

Processing

• Embedded Use Models And Their FPGA Based

Solutions

• Architecture/Topology Choices

• A Reoccurring Question: Hardware Or Software

• Reconfigurable Hardware

• Tool Flows For FPGA Based Embedded Systems

www.xilinx.com12 - Embedded Processing using FPGAs

Processor Use Models

• Highest Integration,

Extensive Peripherals,

RTOS & Bus Structures

• Networking & Wireless

• High Performance

• Medium Cost, Some

Peripherals, Possible

RTOS & Bus Structures

• Control & Instrumentation

• Moderate Performance

• Lowest Cost, No

Peripherals, No RTOS

& No Bus Structures

• VGA & LCD Controllers

• Low/High Performance

1 2 3

State Machine Microcontroller Custom Embedded

Range of Use Models

www.xilinx.com13 - Embedded Processing using FPGAs

Xilinx Processor Solutions

1 2 3

State Machine Microcontroller Custom Embedded

Range of Solutions

www.xilinx.com14 - Embedded Processing using FPGAs

PicoBlaze 8-Bit Microcontroller

Reference Design• Free PicoBlaze macro for implementing

complex state machines and micro sequencers

• Supports Spartan-3, Virtex-4, Virtex-II/Pro FPGAs and CoolRunner-II CPLDs

• Minimal logic size

– Only 96 Spartan-3 slices, 5% of XC3S200 device

• High performance

– Up to 44 MIPS in Spartan-3 and 100 MIPS in Virtex-4

• 100% embedded capability

– Everything in FPGA – no external components required

• http://www.xilinx.com/picoblaze

www.xilinx.com15 - Embedded Processing using FPGAs

Instruction

Decode( includes

inst buffer )

Program

Counter Execution Unit( Add, Sub, Shift,

Logical, Multiply )

Register File( 32 X 32b )

I-Cache( 2KB to 64KB )

MicroBlaze

123 DMIPS @ 150 MHz in

Virtex-II Pro (0.82 DMIPs/MHz)

OnOn--Chip Peripheral Bus Chip Peripheral Bus -- OPBOPB

• 3-Stage Parallel Pipeline

• Optional Instruction and Data Caches – 2KB to 64KB Each

• Optional Multiply Unit

• 32 x 32-bit GPRs

• 100 MHz+ On-Chip Peripheral Bus

– 32-bit Semi-Duplex Bus

• Dedicated Fast Simplex Links - FSL

– 32-bit, Point-to-Point Links between GPRs in the Register File and User Logic

– 8 Input & 8 Output

– Blocking & Non-Blocking Transfers – Predefined Insts

D-Cache( 2KB to 64KB )

Firm IP Core

Remaining FPGA Fabric

Relatively Placed Macro (RPM)

• Local Memory Bus Interfaces - LMB

• Provide Access to Inst/Data Memory Without Bus AccessIn

struction-Side Bus Interface

Local Memory Bus -LMB

Instruction-Side Bus Interface

Local Memory Bus -LMB

Data-Side Bus Interface

Local Memory Bus -LMB

Data-Side Bus Interface

Local Memory Bus -LMB

FSLFSL

• Fast access “Cache Link” interface

Cache Link

www.xilinx.com16 - Embedded Processing using FPGAs

Ext DCR

FCM

PowerPCPowerPC

405 Core405 Core

ISOCM

ISPLB

DSPLB

DSOCM

APU

DCRDCR DCR

Control Control

Reset, Reset,

DebugDebug

PHYClient

PHY

ISOCM ISOCM

ControlControl

APUAPU

ControlControl

DSOCM DSOCM

ControlControl

Host

Statistics

Statistics

EMAC

EMAC

DCR

I/F

Client

Auxiliary

Processor

Unit

Auxiliary

Processor

Unit

Device

Configuration

Register

Interface

Device

Configuration

Register

Interface

Processor Local

Bus Interface

Processor Local

Bus Interface

10/100/1000

EMAC

Core

10/100/1000

EMAC

Core

10/100/1000

EMAC

Core

10/100/1000

EMAC

Core

PowerPC Processor Block

FPGA Logic

Processor Block

Instruction Side

On Chip Memory

Interface

Instruction Side

On Chip Memory

Interface

Data Side

On Chip Memory

Interface

Data Side

On Chip Memory

Interface

www.xilinx.com17 - Embedded Processing using FPGAs

Integrating a PowerPC 405

• IP-Immersion™– Advanced Technology Metal

“Headroom” Enables True IP Immersion

– IP-Immersion Tiles Provide Direct

Connectivity to Virtex-II Pro Fabric

• ActiveInterconnect™– Segmented Routing Enables High

Bandwidth Interface

– General Purpose Routing Passes

Over Hard IP (Hex & Long)

Metal 5

Metal 6

Metal 7

Metal 8

Metal 9

Silicon Substrate

Metal 1

Metal 2

Metal 3

Metal 4

Poly

Metal 1

Metal 2

Metal 3

Metal 4

Poly

Metal 1

Metal 2

Metal 3

Metal 4

Poly

Embedded

PowerPC 405

IP-Immersion

Tile

IP-Immersion

Tile

Replaces only 1,024 LUTs and 8 Block RAMs

Metal 5

Metal 6

Metal 7

Metal 8

Metal 9

2

Metal 1

Metal 2

Metal 3

Metal 4

1

1 2

www.xilinx.com18 - Embedded Processing using FPGAs

Virtex-II Pro Fabric

Code stored

in block RAM

Simple processor/fabric interface

uses minimal FPGA resources

PowerPC 405

ISOCM

ISOCM

DSOCM

DSOCM

JTAGJTAGResetReset

BRAMBRAMBRAMBRAM

JTAG

sys_clk

sys_rst

gpio_in

gpio_out

Easy to use 5-port HDL module

UltraController

www.xilinx.com19 - Embedded Processing using FPGAs

FPGA FabricCode Loaded

and stored

in Cache

Simple processor/fabric interface

uses minimal FPGA resources

PowerPC 405

ISOCM

ISOCM

DSOCM

DSOCM

J

T

A

G

J

T

A

G

R

e

s

e

t

R

e

s

e

t

JTAG

sys_clk

sys_rst

gpio_in

gpio_out

Operates at maximum

CPU frequency

Easy to use HDL module

UltraController II

sys_rst_out

Interrupt

FPGA Fabric

www.xilinx.com20 - Embedded Processing using FPGAs

Virtex 5 Processor Block

• 5 x 2 Crossbar Connection (128-bit) including Arbiters– Configure through bit-stream or DCR

– Up to1:1 freq w/ processor

• 1 PLB master interface to FPGA fabric

• 2 PLB slave interfaces from FPGA fabric

• PLB Master/Slave ports– operate at diff freq

• 4 Thirty-two bit DMA Channels

• Memory Ctlr Interface to connect

soft logic based Memory controller

• Auxiliary Processing Unit Controller– 128 bit interface, pipelining supported

• DCR Controller Module

• CPM and Control Module

FCM

Interface

PLB-S0 I/F

PLB-S1 I/F

PLB-M I/F

Memory Ctlr I/F

CPM/

Control

APU

Control

440Core

IPLB

DRPLB

DWPLB

Processor Block

DCR

DCR Master

Interface

DMA

LocalLinkDMA

LocalLink

DMA

LocalLinkDMA

LocalLink

DCR Slave

Interface

Internal

Registers

www.xilinx.com21 - Embedded Processing using FPGAs

The System Infrastructure

3-State

Control

3-State

Control Bi-directional BusBi-directional Bus

SlaveSlave SlaveSlave

MasterMaster MasterMaster

Central ArbitrationCentral Arbitration

SlaveSlave SlaveSlave

MasterMaster MasterMaster MasterMaster

ArbArb

SlaveSlave

ArbArb

SlaveSlave

MasterMaster

Bus infrastructure ranges from logic efficient to maximum bandwidth

Bi-directional BusSingle channel of communication,

least amount of logic required and

moderate routing requirement

Centrally ArbitratedFull duplex communication,

moderate amount of logic required

and minimal routing requirement

Slave-side ArbitrationN-channels of communication,

most expensive logic requirement

and extensive routing required

www.xilinx.com22 - Embedded Processing using FPGAs

System Peripherals - IP

ArbiterArbiter Primary BusPrimary Bus BridgeBridge Secondary BusSecondary Bus ArbiterArbiter

SPI

EEPROM

SPI

EEPROM UARTUARTDDR

SDRAM

DDR

SDRAMWatchdog

Timer

Watchdog

Timer IntcIntc10/100

Ethernet

10/100

Ethernet

Infrastructure

I/O Based Peripherals

Memory Interfaces

High Level Functions

Slower peripherals such as I/O

functions are typically placed

on secondary bus structures

High speed memory and system

interfaces are typically placed on

primary bus structures

www.xilinx.com23 - Embedded Processing using FPGAs

PowerPC 405

ISOCM

ISOCM

DSOCM

DSOCM

GPIOGPIO UARTUART10/100

Ethernet

10/100

Ethernet

User PeripheralUser Peripheral

IPIF

(master or

slave)

IPIF

(master or

slave)

User LogicUser Logic

BRAMBRAM

PCI 32/33PCI 32/3310/100/1G

Ethernet

10/100/1G

EthernetMemory

Controller

Memory

Controller

DDR SDRAMDDR SDRAM

SDRAMSDRAM

MemoryMemory

ZBT SSRAMZBT SSRAM

High Levels of Integration

Full System Customization & High Performance

Data Control Register Bus - DCR

Instruction Data

ArbiterProcessor Local Bus - PLB Secondary PLBArbiter Bridge

JTAG

Block

JTAG

Block

System

Reset

System

Reset

www.xilinx.com24 - Embedded Processing using FPGAs

FPGA Fabric

JTAG

Block

JTAG

Block

System

Reset

System

Reset

Dual Port

Block RAM

Dual Port

Block RAMInstruction-Side

Local Memory Bus

Data-Side Local

Memory Bus

Ethernet

or CAN

Ethernet

or CAN SPISPIUART 7UART 7UART1UART1Memory

Controller

Memory

Controller

ZBT SSRAMZBT SSRAM

SDRAMSDRAMDDR SDRAMDDR SDRAM

User

Peripheral

User

Peripheral

IPIFIPIF

Complete Customization

MicroBlaze

Inst LMB

Inst LMB

Data LMB

Data LMB

ArbiterProcessor Local Bus - PLB

www.xilinx.com25 - Embedded Processing using FPGAs

Gigabit System Reference

Design (GSRD)• Designed for maximum TCP/IP

performance with the PowerPC 405

• GSRD building blocks– LocalLink Gigabit Ethernet MAC

– Communications DMA Controller

– Multi Port Memory Controller

• High Performance TCP transmit – Over 900Mbps

(with Treck TCP/IP stack)

• Applications– High performance bridging

– TCP/IP and user data interfaces

– IP over GigE Ethernet Switching

– High Speed Remote Monitoring

– Remote Image/Video Capture

www.xilinx.com26 - Embedded Processing using FPGAs

Multiported Memory Controller

www.xilinx.com27 - Embedded Processing using FPGAs

MicroBlaze or PowerPC

Multi Port

Memory

Controller

Local Memory

GPIOGPIOCAN/MOST

PCI

Custom

Coprocessors

Ethernet MAC

Interrupt

Controller

Timer/PWM

I2C/SPI

UART

Generic Peripheral

ControllerGPIO

DMA

Custom I/O

Peripherals

Virtex or Spartan FPGA

JTAG

Debug

Today You Have Multiple Topology Choices

PLBv46

Point-to-Point

Connections for

higher performance

Shared Bus for

smaller area

Trace

GPIOGPIOUSB 2.0

www.xilinx.com28 - Embedded Processing using FPGAs

Multicore Topologies

PPC405

Receive FIFO

XPS_INTC

Interrupt

PLBv46

_Data_IRQ_A

XPS_INTC

XPS_Mailbox Core

PLBv46

DPLB2

IPLB2

MPMC

(External Memory)

LL

PLB

PLB

PLB

XCL

XCL

DPLB

IPLB

MicroBlaze

Interrupt

DXCL

IXCLIPLB

DPLB

IPLB

Send FIFO

_Data_IRQ_B

XPS_Mutex Core

Mutex array

XPS_LLTEMAC

Free

Other LL

Peripheral

LL

Data_IRQ_A Data_IRQ_B

XPS BRAM

Private boot memory

XPS BRAM

Private boot memory

XPS BRAM

Shared external memory

Shared BRAM memory

PLB_v46 BRIDGE

Processor Subsystem 2

Other XPS

Peripherals

UART, GPIO, TIMER

Processor Subsystem 1

Other XPS

Peripherals

UART, GPIO, TIMER

Memory Map. SW

responsibility for non-

conflicting resource partition

Pass small sized

messages between a

sender and a receiver

Use synchronous-polling or

assync interrupts

You can still use shared

memory techniques

with BRAM Configure number of

mutex registers to test

and set

CPUs with shared

resources need to sync.

Partitioning clocking

domains

Treatment of CPU and

system reset

www.xilinx.com29 - Embedded Processing using FPGAs

Virtex 5 Processor Block

• 5 x 2 Crossbar Connection (128-bit) including Arbiters– Configure through bit-stream or DCR

– Up to1:1 freq w/ processor

• 1 PLB master interface to FPGA fabric

• 2 PLB slave interfaces from FPGA fabric

• PLB Master/Slave ports– operate at diff freq

• 4 Thirty-two bit DMA Channels

• Memory Ctlr Interface to connect

soft logic based Memory controller

• Auxiliary Processing Unit Controller– 128 bit interface, pipelining supported

• DCR Controller Module

• CPM and Control Module

FCM

Interface

PLB-S0 I/F

PLB-S1 I/F

PLB-M I/F

Memory Ctlr I/F

CPM/

Control

APU

Control

440Core

IPLB

DRPLB

DWPLB

Processor Block

DCR

DCR Master

Interface

DMA

LocalLinkDMA

LocalLink

DMA

LocalLinkDMA

LocalLink

DCR Slave

Interface

Internal

Registers

www.xilinx.com30 - Embedded Processing using FPGAs

FPGA logicFPGA logic FPGA logicFPGA logic

FPGA logicFPGA logic

Significant reduction in amount of FPGA logic required to builSignificant reduction in amount of FPGA logic required to build a system d a system

using integrated functionality of Virtexusing integrated functionality of Virtex--5 Processor Block 5 Processor Block

www.xilinx.com31 - Embedded Processing using FPGAs

Agenda

• Why FPGA Platform Based Embedded

Processing

• Embedded Use Models And Their FPGA Based

Solutions

• Architecture/Topology Choices

• The Reoccurring Question: Hardware Or Software

• Reconfigurable Hardware

• Tool Flows For FPGA Based Embedded Systems

www.xilinx.com33 - Embedded Processing using FPGAs

Two Ways to meet

Performance Requirements

Brut Force

µCµC

5 GHz

FPGAFPGA

CPU HW

Accele

rator

5 MHz

Intelligent

APU or FSL

www.xilinx.com34 - Embedded Processing using FPGAs

Co-Processor Interfacing

FPGA, ASIC

or ASSP

FPGA, ASIC

or ASSPµCµCConventional

approach

Xilinx

Solutions

Bus

Interface

Co-Processor /

Peripheral

FSL

Co-Processor /

Peripheral

APU

• Fast

• Deterministic

• Easy

implementation

www.xilinx.com35 - Embedded Processing using FPGAs

LogicLogicLogicAPU

Controller

APUAPU

ControllerController

Bus

Interface

Logic

Bus Bus

InterfaceInterface

LogicLogic

LogicLogicLogic

Conventional vs. APU

Implementations

ConventionalConventional

Approach Approach

Virtex-II PRO

Overhead required to connect soft logicOverhead required to connect soft logic

APU centric APU centric

approachapproach

Virtex-4

Benefits• Less overhead logic

• Faster transfer

• Efficient software implementation

BUSBUSBUSCo-Processor

Interface

CoCo--ProcessorProcessor

InterfaceInterfacePowerPCPowerPCPowerPC

PowerPCPowerPCPowerPC

www.xilinx.com36 - Embedded Processing using FPGAs

Bus Cycle Improvement via APU

Reduce number of bus cycles by factor of 10 by using APUReduce number of bus cycles by factor of 10 by using APUReduce number of bus cycles by factor of 10 by using APU

With APUWith APU

WithoutWithout

APUAPU

ExecutionExecution

ExecutionExecution Res

Read ResultRead Result Read StatusRead StatusWrite Operand 2Write Operand 2Write Operand 1Write Operand 1

WrWr

www.xilinx.com

Up to 13x Performance BoostPPC 405 Single Precision FPU

4x0.37 us1.66 usPID Loop

4x5.42 ms19.48 ms1024pt FFT

9x27.93 ms 248.38 msVideo Editing Algorithm

13x63.94 ms846.09 msFIR Filter

Single Precision

FPU

Software Emulation of

Floating Pt *

Speed Up

Performance

Algorithm

(*C-Code is compiled using IBM Performance Libs delivered with EDK 8.2i)

www.xilinx.com38 - Embedded Processing using FPGAs

Floating Point Accelerated by APU

PLB + FPU 21 MFlopsPLB + FPU PLB + FPU 21 MFlops21 MFlops

APU + FPU 45 MFlops (est)APU + FPUAPU + FPU 45 MFlops (est)45 MFlops (est)

OCM + FPU 30 MFlopsOCM + FPUOCM + FPU 30 MFlops30 MFlops

APU improves algorithm execution 20X over software emulationAPU improves algorithm execution 20X over software emulationAPU improves algorithm execution 20X over software emulation

• Algorithm description - FPU w/ FIR filter

• PowerPC operating frequency – 400 MHz

• APU with direct interface to fabric- XtremeDSP blocks

SWSW 2MFlops2MFlops

www.xilinx.com39 - Embedded Processing using FPGAs

Accelerated MPEG De-Compression

• Leverages Virtex-4 Integrated Features– PowerPC, APU, XtremeDSP Slice

– Software Emulation – 13000 IDCTs/sec

• HW acceleration over software– Lower latency and high bandwidth

• Efficient HW/SW design partitioning– Optimized implementation

• Significant performance increase– 20x improvement over software

– Utilizing APU and ExtremeDSP

Soft

Auxiliary

Processor

Soft Soft

AuxiliaryAuxiliary

ProcessorProcessor

APU

Controller

APUAPU

ControllerControllerPowerPCPowerPCPowerPC

FPGA FabricFPGA Fabric

Hard Processor BlockHard Processor Block

OCMOCM

PLBPLB

APU

I/F

FPGAInterface

ExtermeDSPExtermeDSP

ExtremeDSPExtremeDSP

ExtremeDSPExtremeDSP

Example: Video application - IDCT MPEG de-compression

www.xilinx.com40 - Embedded Processing using FPGAs

Tailor the System to Achieve

Performance

1x 2x 8x 10x

100MHz MicroBlazeTM,

pure software

= 146 seconds

100MHz MicroBlazeTM,

pure software

= 146 seconds

50x 100x20x

100MHz MicroBlazeTM

+FSL +DCT + IMDCT + LL MAC

= 7 seconds

100MHz MicroBlazeTM

+FSL +DCT + IMDCT + LL MAC

= 7 seconds

21X

1X

100MHz MicroBlazeTM

+FSL+ LL MAC

= 9 seconds

100MHz MicroBlazeTM

+FSL+ LL MAC

= 9 seconds

16X

Custom Hardware Logic

Take MP3 Decoding with Custom Hardware Logic

Performance Improvement

IMDCT DCT LL MAC

Note: MicroBlazeTM v4.00 core, ML40x board, 100MHz system clock, EDK8.1

www.xilinx.com41 - Embedded Processing using FPGAs

An Example:

MicroBlazeTM FPU in Motor Control

MicroBlaze

w/FPU

BRAM

Fault

Feedback

Pow

er Stage

Motor

Contrl.

Quad

Decoder

HALL

DecoderIsolation

UART

Memory

Interface

Ethernet

IIC

CAN

SPI

GPIO

XC3S1500

Extra cycles needed

for specialized peripherals

1XOver 5,300MicroBlazeTM

without FPU

Over 15XOver 82,000MicroBlazeTM

with FPU

ImprovementFunction

Calls/SecondConfiguration

• Design implications:

– You will get more done for a given clock frequency

– Reducing the clock frequency (if possible) reduces power dissipation

Benefit:

The customer now have the extra

processor bandwidth for use on other

things.

Note: Used MicroBlazeTM v4.00 core on Spartan 3S1500.

www.xilinx.com42 - Embedded Processing using FPGAs

Auxiliary Processing Solutions

405 Core

APU

ControlAuxiliary

Processor

Auxiliary

Processor

Fabric

Control

Module

Interface

PLB

OCM

Hard Processor Block

FPGA Fabric

APU I/F

Floating Point

Unit

FSL

-V4

C to HDL

Impulse

Technologies Celoxica Solution

Poseidon Design

Systems

Application

Specific

General

Solution

www.xilinx.com43 - Embedded Processing using FPGAs

C to HDL Tools

• Create Code

• Profile Code

• Identify Critical Code Segments

• Translate Critical Code to HDL

• Build HW Accelerators from HDL

• Replace Critical Code with Calls

to HW Accelerators

ProfileCode

HDL

Code

Accelerator Call

www.xilinx.com46 - Embedded Processing using FPGAs

Impulse C Programming Model• System of Independent sequential execution units

– Called Processes

– Can be executed in HW or SW and run concurrently

• Processes communicate with each other – Via self-synchronizing FIFO buffers called streams, or

– Via shared memories

Sequential execution only Producer

Function

SW FPGA FabricFaster

Execution

Stream

Concurrent Software and Hardware execution

PROCESS

Consumer

Function

Critical

FunctionPROCESS

PROCESS

Stream

Co Developer™

Producer

Function

SW Only

Consumer

Function

Function

Function

Function

Critical

Function

www.xilinx.com47 - Embedded Processing using FPGAs

Impulse C-HDL Tool Flow

www.xilinx.com48 - Embedded Processing using FPGAs

Agenda

• Why FPGA Platform Based Embedded

Processing

• Embedded Use Models And Their FPGA Based

Solutions

• Architecture/Topology Choices

• A Reoccurring Question: Hardware Or Software

• Reconfigurable Hardware

• Tool Flows For FPGA Based Embedded Systems

www.xilinx.com49 - Embedded Processing using FPGAs

What is Partial Reconfiguration?

• Dynamically modify blocks of logic in an FPGA by

downloading partial bit files while unmodified logic

continues to operate

• Logic is LUTs, FFs, BRAMs, DSP blocks, MGTs...

FPGA

Reconfig

Block

www.xilinx.com50 - Embedded Processing using FPGAs

PR Applications Today• JTRS Software Defined Radio

– SWaP and multi-channel

concurrency

• Microwave Communications

– Flexibility and extended functionality

• Cryptography

– Physical isolation in one FPGA

www.xilinx.com51 - Embedded Processing using FPGAs

Core C

onnect

D /A Interface (14 bits -- 300 MHz)

A/D Interface (12 bits -- 200 MHz)

TX NCO RX NCO

Interpolation filter Decimation filter

Waveform combiner

Ethernet interface

DAC ADC

Ethernet Switch

RJ45 – 100BT

(Data)

RJ45 – 100BT

(C tlr)

RJ45 – 100BT

(Voice)

WAVEFORM A

WAVEFORM B

SDR Development Platform

www.xilinx.com52 - Embedded Processing using FPGAs

PR Application: Dynamic Network

Interface Protocol Switching

FPGA

10 GigE tx/rx

OC48 tx/rx

Fibre tx/rx

Custom tx/rx

Switch

uP

10 GigE tx/rx

OC48 tx/rx

Fibre tx/rx

Custom tx/rx

Config Memory Storage

www.xilinx.com54 - Embedded Processing using FPGAs

Partial Reconfiguration Applications

• Reduce FPGA size by multiplexing FPGA hardware resources

CPU

Wavefo

rm A

Wavefo

rm B

Wavefo

rm C

Memory

Reconfiguration PortInternal configuration access port

External Memorywith full system configuration

and PR bit streams

Wavefo

rm D

www.xilinx.com55 - Embedded Processing using FPGAs

Agenda

• Why FPGA Platform Based Embedded

Processing

• Embedded Use Models And Their FPGA Based

Solutions

• Architecture/Topology Choices

• A Reoccurring Question: Hardware Or Software

• Reconfigurable Hardware

• Tool Flows For FPGA Based Embedded Systems

www.xilinx.com56 - Embedded Processing using FPGAs

Necessary Tools

• A Full Complement of Tools are Required to

Design an Embedded Processor System

– Processor System Generation

– Hardware Implementation & Software Compilation

– Hardware & Software Debug

• HW Implementation & SW Compilation are the

two Main Flows that Must be Addressed

– The Embedded Flows Should Mirror Traditional Flows

www.xilinx.com57 - Embedded Processing using FPGAs

System Generation

CoreCore

clock

reset

interrupts

ArbiterArbiter Primary BusPrimary Bus BridgeBridge Secondary BusSecondary Bus ArbiterArbiter

FLASHFLASH TimerTimer UARTUART

SRAMSRAM

?

Tools are needed to create the system

infrastructure, connect peripherals and

configure the peripherals as desired

Once the system is defined, the hardware can

be implemented and software compiled

Databits = 8

Baudrate = 9600

Parity = Off

www.xilinx.com58 - Embedded Processing using FPGAs

www.xilinx.com61 - Embedded Processing using FPGAs

Co-processor Creation

Auto-generated HW & SW Templates

HW instantiated

SW Drivers created

Test application code

www.xilinx.com62 - Embedded Processing using FPGAs

Hardware Implementation

Simulation/Synthesis

Build & Map

Place & Route

Standard FPGA

HW Development Flow

VHDL or Verilog

Configuration File

SynthesisSynthesis

Build & MapBuild & Map

Place & RoutePlace & Route

Source Code

Constraints

Memory Map for

Internal Code Storage

System Netlist

Merged &

Mapped Design

Configuration File

www.xilinx.com63 - Embedded Processing using FPGAs

Eclipse Based SDK for the Software Savvy Engineer

Powerful Software Development• Eclipse open source Software IDE

allows an industry standard method for

seamless tool integration

• Consists of:

• Perspectives

• Views

• Editors

• Environment window displays one or

more perspectives

• A perspective contains views (like

the Navigator) and editors

www.xilinx.com64 - Embedded Processing using FPGAs

• Device drivers are provided for each hardware device

• Device drivers are written in ANSI C and are designed to be portable across processors

• Device drivers allow the user to select the desired functionality to minimize the required memory

• Device drivers in all layers have common characteristics and API

Board Support

Package (BSP)

Boot Code

Initialization Code

Ethernet 10/100

Device Driver

UART 16550

Device Driver

IIC Master & Slave

Device Driver

ATM Utopia Device

Driver

Peripheral n, n+1…

Device Drivers

RTOS Adaptation Layer

Software IP

www.xilinx.com65 - Embedded Processing using FPGAs

XIntc_mEnableIntr ( … );

...

Board Support Package

ArbiterArbiter Primary BusPrimary Bus BridgeBridge Secondary BusSecondary Bus ArbiterArbiter

SPI

EEPROM

SPI

EEPROMDDR

SDRAM

DDR

SDRAM UARTUART TimerTimer IntcIntc10/100

Ethernet

10/100

Ethernet

XEmac_mWriteReg ( … );

XEmac_SendFrame ( … );

...

xemac.h

XUartLite_SendByte ( … );

XUartLite_GetStats ( … );

...

XTmrCtr_mWriteReg ( … );

XTmrCtr_IsExpired ( … );

...

XSpi_mSendByte ( … );

XSpi_GetStatusReg ( … );

...

xtmrctr.h

xspi.h

uartlite.h

libxil.a

XEmac_mWriteReg ( … );

XEmac_SendFrame ( … );

...

XSpi_mSendByte ( … );

XSpi_GetStatusReg ( … );

...

XUartLite_SendByte ( … );

XUartLite_GetStats ( … );

...

XTmrCtr_mWriteReg ( … );

XTmrCtr_IsExpired ( … );

XIntc_mEnableIntr ( … );

...

Board Support Packages (BSPs) are

collections of parameterized drivers

for a specific processor system

www.xilinx.com66 - Embedded Processing using FPGAs

Software Compilation

C/C++ Cross Compiler

Assembler

Standard Embedded

SW Development Flow

C Code

Linker

Machine Code

CompilerCompiler

AssemblerAssembler

LinkerLinker

Source Code

Source Code

Mapfile and

Linker Script

Assembly Code

Relocatable

Object Code

Machine Code

www.xilinx.com67 - Embedded Processing using FPGAs

Data2MEM

Download Combined

Image to FPGA

Compiled ELF Compiled BIT

RTOS, Board Support Package

Embedded

Developers Kit

System Design Flow

Instantiate the

‘System Netlist’

and Implement

the FPGA

?

HDL Entry

Simulation/Synthesis

Implementation

Download Bitstream

Into FPGA

Chipscope

Standard FPGA

HW Development Flow

VHDL or Verilog

System NetlistInclude the BSP

and Compile the

Software Image

?

Code Entry

C/C++ Cross Compiler

Linker

Load Software

Into FLASH

Debugger

Standard Embedded

SW Development Flow

C Code

Board Support

Package

12 3 Compiled BITCompiled ELF

www.xilinx.com68 - Embedded Processing using FPGAs

HW & SW Debug

CoreCore

clock

reset

interrupts

ArbiterArbiter Primary BusPrimary Bus BridgeBridge Secondary BusSecondary Bus ArbiterArbiter

FLASHFLASH TimerTimerSRAMSRAM UARTUART Integrated

Logic Analyzer

Integrated

Logic Analyzer

Integrated

Bus Analyzer

Integrated

Bus Analyzer

Debug Port

JTAG Port

SW debug typically

consists of JTAG based

runtime control of the

target processor core

Internal & external logic analysis is

used to determine where problems reside

External Connection

www.xilinx.com69 - Embedded Processing using FPGAs

“Platform Debug”Integrated HW/SW Debuggers

• Cross Trigger HW and

SW Debuggers to Find

and Fix Bugs Faster!

• Enable better

insight into the HW /

SW code dynamics

www.xilinx.com72 - Embedded Processing using FPGAs

Key Messages

• FPGA Based Embedded Systems Solve Classic Integration

Issues (Size, Cost, Power)

• They Also Creating Novel Solutions Not Easily Realized Using

Other Technologies

• FPGA Platforms Enable Flexible and Dynamic Systems

• Intelligent Tools Accelerate Development And Enable You To

Optimize The Balance Of Feature Set, Performance, Area & Cost

Of Your Design

www.xilinx.com73 - Embedded Processing using FPGAs

Q&A