Embedded Processing using FPGAs
www.xilinx.com2 - Embedded Processing using FPGAs
Agenda
• Why FPGA Platform Based Embedded
Processing
• Embedded Use Models And Their FPGA Based
Solutions
• Architecture/Topology Choices
• A Reoccurring Question: Hardware Or Software
• Reconfigurable Hardware
• Tool Flows For FPGA Based Embedded Systems
www.xilinx.com4 - Embedded Processing using FPGAs
Why Use Processors
In the First Place
• Microcontrollers (µC) and Microprocessors (µP)
Provide a Higher Level of Design Abstraction
– Most µC functions can be implemented using VHDL
or Verilog - downsides are parallelism & complexity
– Using C/C++ abstraction & serial execution make
certain functions much easier to implement in a µC
• Discrete µCs are Inexpensive and Widely Used
– µCs have years of momentum and software
designers have vast experience using them
www.xilinx.com5 - Embedded Processing using FPGAs
Why Embedded Design using
FPGAs
In Addition To The Universal Drive Towards Smaller, Cheaper, Faster, With Less Power…
1. Difficult to Find the Required Mix of Peripherals in Off the Shelf (OTS) Microcontroller Solutions
2. Selecting a Single Processor Core with Long Term Solution Viability is Difficult at Best
3. Without Direct Ownership of the Processing Solution, Obsolescence is Always a Concern
4. Many Microprocessor Based Solutions Provide Limited On-Chip Peripheral Support
www.xilinx.com6 - Embedded Processing using FPGAs
Rarely the Ideal Mix
• Difficult to Find the Required Mix of Peripherals in
Off the Shelf (OTS) Microcontroller Solutions
– Even with variants, compromises must still be made
– Must decide between inventory impact of multiple
configurations or device cost of comprehensive IP
System Requirements: 2 x UART, TIMER, I2C, SPI, GPIO, FLASH, DDR SDRAM
Microcontroller #1 (CPU Core, FLASH, RAM, GPIO, 2 x UART, SPI, Timer): Lacks I2C & Includes RAM vs DDR SDRAM
Microcontroller #2 (CPU Core, FLASH, DDR, UART, CAN, SPI, Timer, GPIO, I2C): Lacks a Second UART & Includes Unnecessary IP
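The comparison on this slide can be written down as a small check. The following is a minimal sketch, not part of the original deck: the peripheral lists are transcribed from the slide, while the names REQUIRED, MCU_1, MCU_2 and compare are made up for illustration.

```python
from collections import Counter

# Peripheral lists transcribed from the slide; counts matter (two UARTs are required).
REQUIRED = Counter({"UART": 2, "TIMER": 1, "I2C": 1, "SPI": 1,
                    "GPIO": 1, "FLASH": 1, "DDR SDRAM": 1})

# Hypothetical names for the two off-the-shelf parts shown on the slide.
MCU_1 = Counter({"FLASH": 1, "RAM": 1, "GPIO": 1, "UART": 2, "SPI": 1, "TIMER": 1})
MCU_2 = Counter({"FLASH": 1, "DDR SDRAM": 1, "UART": 1, "CAN": 1,
                 "SPI": 1, "TIMER": 1, "GPIO": 1, "I2C": 1})


def compare(required, offered):
    """Return (missing, unused) peripherals as plain dicts."""
    missing = required - offered  # Counter subtraction keeps only positive counts
    unused = offered - required
    return dict(missing), dict(unused)


if __name__ == "__main__":
    for name, mcu in (("Microcontroller #1", MCU_1), ("Microcontroller #2", MCU_2)):
        missing, unused = compare(REQUIRED, mcu)
        print(name, "missing:", missing, "unused:", unused)
```

Neither part comes back with an empty "missing" set, which is the compromise the slide describes.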
www.xilinx.com7 - Embedded Processing using FPGAs
Changing Requirements
• Selecting a Single Processor Core with Long
Term Solution Viability is Difficult at Best
– Future proofing system requirements is difficult at best
– Changing processor cores to accommodate new
requirements consumes valuable design resources
Today’s System Requirements: 100 MHz, 2 x UART, TIMER, I2C, SPI, GPIO, FLASH, DDR SDRAM
Microcontroller (100 MHz Core, FLASH, DDR, 2 x UART, SPI, Timer, GPIO, I2C): Completely meets current system requirements
Future System Requirements (?): 150 MHz, 2 x UART, TIMER, I2C, SPI, GPIO, FLASH, DDR SDRAM, 10/100 Ethernet
www.xilinx.com8 - Embedded Processing using FPGAs
Here Today, Gone Tomorrow
• Without Direct Ownership of the Processing
Solution, Obsolescence is Always a Concern
– A single core is used to create a family of µCs
– The sheer number of µC configurations can lead to
obsolescence of lower volume configurations/variants
Microcontroller Variant #1 - High Volume Automotive: CPU Core, FLASH, RAM, GPIO, CAN, UART, SPI, Timer
Microcontroller Variant #2 - Moderate Volume Consumer: CPU Core, FLASH, RAM, GPIO, 2 x SPI, UART, Timer
Microcontroller Variant #3 - Lower Volume Niche (?): CPU Core, FLASH, RAM, GPIO, I2C, UART, SPI, Timer
www.xilinx.com9 - Embedded Processing using FPGAs
Chipset Solutions
• Many Microprocessor Based Solutions Provide
Limited On-Chip Peripheral Support
– Manufacturers create I/O devices to extend the
functionality of the base processor core
– An external, pre-defined interface between a µP and
its support device limits overall system performance
Microprocessor Chip-Set: Microprocessor and I/O Expansion Device (Parallel, PWM, Serial), each with an MPU Interface
Pre-defined interface limits performance
www.xilinx.com10 - Embedded Processing using FPGAs
FPGA Embedded Design
1. FPGAs Allow For The Implementation Of An Ideal Mix Of Peripherals And System Infrastructure
2. New System Requirements Can Be Supported Without Changing Core Design
3. Longevity Of FPGAs Approaches The Longest Available Microcontrollers In The Market
4. FPGAs Are Used To Augment µP Functionality - Absorbing The Core Is The Next Natural Step
www.xilinx.com11 - Embedded Processing using FPGAs
Agenda
• Why FPGA Platform Based Embedded
Processing
• Embedded Use Models And Their FPGA Based
Solutions
• Architecture/Topology Choices
• A Reoccurring Question: Hardware Or Software
• Reconfigurable Hardware
• Tool Flows For FPGA Based Embedded Systems
www.xilinx.com12 - Embedded Processing using FPGAs
Processor Use Models
Range of Use Models
1. State Machine
• Lowest Cost, No Peripherals, No RTOS & No Bus Structures
• VGA & LCD Controllers
• Low/High Performance
2. Microcontroller
• Medium Cost, Some Peripherals, Possible RTOS & Bus Structures
• Control & Instrumentation
• Moderate Performance
3. Custom Embedded
• Highest Integration, Extensive Peripherals, RTOS & Bus Structures
• Networking & Wireless
• High Performance
www.xilinx.com13 - Embedded Processing using FPGAs
Xilinx Processor Solutions
Range of Solutions: 1 State Machine, 2 Microcontroller, 3 Custom Embedded
www.xilinx.com14 - Embedded Processing using FPGAs
PicoBlaze 8-Bit Microcontroller Reference Design
• Free PicoBlaze macro for implementing complex state machines and micro sequencers
• Supports Spartan-3, Virtex-4, Virtex-II/Pro FPGAs and CoolRunner-II CPLDs
• Minimal logic size
– Only 96 Spartan-3 slices, 5% of XC3S200 device
• High performance
– Up to 44 MIPS in Spartan-3 and 100 MIPS in Virtex-4
• 100% embedded capability
– Everything in FPGA – no external components required
• http://www.xilinx.com/picoblaze
www.xilinx.com15 - Embedded Processing using FPGAs
MicroBlaze
Firm IP Core, Relatively Placed Macro (RPM)
123 DMIPS @ 150 MHz in Virtex-II Pro (0.82 DMIPS/MHz)
• 3-Stage Parallel Pipeline
• Optional Instruction and Data Caches – 2KB to 64KB Each
• Optional Multiply Unit
• 32 x 32-bit GPRs
• 100 MHz+ On-Chip Peripheral Bus - OPB
– 32-bit Semi-Duplex Bus
• Dedicated Fast Simplex Links - FSL
– 32-bit, Point-to-Point Links between GPRs in the Register File and User Logic
– 8 Input & 8 Output
– Blocking & Non-Blocking Transfers – Predefined Insts
• Local Memory Bus Interfaces - LMB
– Provide Access to Inst/Data Memory Without Bus Access
• Fast access “Cache Link” interface
Diagram: Program Counter; Instruction Decode (includes inst buffer); Execution Unit (Add, Sub, Shift, Logical, Multiply); Register File (32 x 32b); I-Cache (2KB to 64KB); D-Cache (2KB to 64KB); Instruction-Side and Data-Side Bus Interfaces to the Local Memory Bus - LMB and the On-Chip Peripheral Bus - OPB; FSL; Cache Link; Remaining FPGA Fabric
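The difference between blocking and non-blocking FSL transfers can be illustrated with a toy FIFO model. This is a sketch under stated assumptions, not the MicroBlaze instruction set: the class SimplexLink, the method names nput and nget, and the FIFO depth of 16 are all hypothetical; only the 32-bit width and the one-directional, point-to-point nature come from the slide.

```python
from collections import deque


class SimplexLink:
    """Toy model of a one-directional 32-bit FIFO link (depth is an assumption)."""

    def __init__(self, depth=16):
        self.depth = depth
        self.fifo = deque()

    def nput(self, word):
        """Non-blocking write: return False and leave the FIFO untouched if it is full."""
        if len(self.fifo) >= self.depth:
            return False
        self.fifo.append(word & 0xFFFFFFFF)  # keep only the low 32 bits
        return True

    def nget(self):
        """Non-blocking read: return (ok, word); ok is False if the FIFO is empty."""
        if not self.fifo:
            return False, None
        return True, self.fifo.popleft()


if __name__ == "__main__":
    link = SimplexLink(depth=2)
    print(link.nput(0x11), link.nput(0x22), link.nput(0x33))  # third write is refused
    print(link.nget())
```

A blocking transfer corresponds to the caller stalling until nput or nget succeeds; in this single-threaded model that would be a retry loop around the non-blocking call.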
www.xilinx.com16 - Embedded Processing using FPGAs
PowerPC Processor Block
Diagram: Processor Block containing the PowerPC 405 Core with ISPLB and DSPLB (Processor Local Bus Interface), ISOCM (Instruction Side On Chip Memory Interface) with ISOCM Control, DSOCM (Data Side On Chip Memory Interface) with DSOCM Control, APU (Auxiliary Processor Unit) with APU Control, DCR (Device Configuration Register Interface) and Ext DCR, and Control (Reset, Debug); two 10/100/1000 EMAC Cores with Host, Statistics, DCR I/F, Client and PHY connections; FCM in the FPGA Logic
www.xilinx.com17 - Embedded Processing using FPGAs
Integrating a PowerPC 405
• IP-Immersion™
– Advanced Technology Metal “Headroom” Enables True IP Immersion
– IP-Immersion Tiles Provide Direct Connectivity to Virtex-II Pro Fabric
• ActiveInterconnect™
– Segmented Routing Enables High Bandwidth Interface
– General Purpose Routing Passes Over Hard IP (Hex & Long)
• Replaces only 1,024 LUTs and 8 Block RAMs
Figure: cross-section of the metal stack over the Silicon Substrate. The Embedded PowerPC 405 and the IP-Immersion Tiles are shown in Poly and Metal 1 to Metal 4 (callout 1); Metal 5 to Metal 9 are shown above them (callout 2)
www.xilinx.com18 - Embedded Processing using FPGAs
UltraController
• Code stored in block RAM
• Simple processor/fabric interface uses minimal FPGA resources
• Easy to use 5-port HDL module: JTAG, sys_clk, sys_rst, gpio_in, gpio_out
Diagram: PowerPC 405 with ISOCM and DSOCM connected to BRAM, plus JTAG and Reset, inside the Virtex-II Pro Fabric
www.xilinx.com19 - Embedded Processing using FPGAs
UltraController II
• Code Loaded and stored in Cache
• Simple processor/fabric interface uses minimal FPGA resources
• Operates at maximum CPU frequency
• Easy to use HDL module; signals shown: JTAG, sys_clk, sys_rst, sys_rst_out, gpio_in, gpio_out, Interrupt
Diagram: PowerPC 405 with ISOCM, DSOCM, JTAG and Reset inside the FPGA Fabric
www.xilinx.com20 - Embedded Processing using FPGAs
Virtex 5 Processor Block
• 5 x 2 Crossbar Connection (128-bit) including Arbiters
– Configure through bit-stream or DCR
– Up to 1:1 freq w/ processor
• 1 PLB master interface to FPGA fabric
• 2 PLB slave interfaces from FPGA fabric
• PLB Master/Slave ports
– operate at diff freq
• 4 Thirty-two bit DMA Channels
• Memory Ctlr Interface to connect soft logic based Memory controller
• Auxiliary Processing Unit Controller
– 128 bit interface, pipelining supported
• DCR Controller Module
• CPM and Control Module
Diagram: Processor Block containing the 440 Core (IPLB, DRPLB, DWPLB), APU Control with FCM Interface, CPM/Control, DCR with DCR Master Interface, DCR Slave Interface and Internal Registers, four DMA LocalLink channels, and the PLB-S0, PLB-S1, PLB-M and Memory Ctlr interfaces
www.xilinx.com21 - Embedded Processing using FPGAs
The System Infrastructure
Bus infrastructure ranges from logic efficient to maximum bandwidth
• Bi-directional Bus: Single channel of communication, least amount of logic required and moderate routing requirement
• Centrally Arbitrated: Full duplex communication, moderate amount of logic required and minimal routing requirement
• Slave-side Arbitration: N-channels of communication, most expensive logic requirement and extensive routing required
Diagram: Masters and Slaves on a Bi-directional Bus with 3-State Control; Masters and Slaves around Central Arbitration; Masters connected to Slaves that each have their own Arb
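To make the "centrally arbitrated" option concrete, here is a minimal sketch of one arbiter deciding which master gets the shared bus each cycle. The slide does not state an arbitration policy; round-robin is an assumption chosen for illustration, and the function names are hypothetical.

```python
def round_robin_arbiter(requests, last_granted, num_masters):
    """Return the index of the master to grant, or None if nobody requests.

    requests: set of master indices currently requesting the bus.
    last_granted: index granted most recently (-1 before the first grant).
    """
    for offset in range(1, num_masters + 1):
        candidate = (last_granted + offset) % num_masters
        if candidate in requests:
            return candidate
    return None


def simulate(request_trace, num_masters):
    """Run the arbiter over a list of per-cycle request sets and collect the grants."""
    grants = []
    last = -1
    for requests in request_trace:
        grant = round_robin_arbiter(requests, last, num_masters)
        grants.append(grant)
        if grant is not None:
            last = grant  # priority rotates past the master that was just served
    return grants


if __name__ == "__main__":
    trace = [{0, 1, 2}, {0, 1, 2}, {0, 1, 2}, set(), {1}]
    print(simulate(trace, num_masters=3))
```

With slave-side arbitration, each slave would own an arbiter like this one, which is where the extra logic and routing on the slide come from.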
www.xilinx.com22 - Embedded Processing using FPGAs
System Peripherals - IP
• High speed memory and system interfaces are typically placed on primary bus structures
• Slower peripherals such as I/O functions are typically placed on secondary bus structures
Diagram: Arbiter, Primary Bus, Bridge, Secondary Bus, Arbiter; blocks shown: DDR SDRAM, 10/100 Ethernet, SPI EEPROM, UART, Watchdog Timer, Intc
Legend: Infrastructure, I/O Based Peripherals, Memory Interfaces, High Level Functions
www.xilinx.com23 - Embedded Processing using FPGAs
High Levels of Integration
Full System Customization & High Performance
Diagram: PowerPC 405 with ISOCM and DSOCM to BRAM, JTAG Block and System Reset; Instruction and Data connections to the Processor Local Bus - PLB with Arbiter; Bridge to a Secondary PLB with Arbiter; Data Control Register Bus - DCR; blocks shown: GPIO, UART, 10/100 Ethernet, PCI 32/33, 10/100/1G Ethernet, Memory Controller (DDR SDRAM, SDRAM, ZBT SSRAM), User Peripheral (User Logic with IPIF, master or slave)
www.xilinx.com24 - Embedded Processing using FPGAs
Complete Customization
Diagram: MicroBlaze in the FPGA Fabric with JTAG Block and System Reset; Inst LMB and Data LMB (Instruction-Side and Data-Side Local Memory Bus) to Dual Port Block RAM; Processor Local Bus - PLB with Arbiter; blocks shown: Ethernet or CAN, SPI, UART 1, UART 7, Memory Controller (ZBT SSRAM, SDRAM, DDR SDRAM), User Peripheral with IPIF
www.xilinx.com25 - Embedded Processing using FPGAs
Gigabit System Reference Design (GSRD)
• Designed for maximum TCP/IP performance with the PowerPC 405
• GSRD building blocks
– LocalLink Gigabit Ethernet MAC
– Communications DMA Controller
– Multi Port Memory Controller
• High Performance TCP transmit
– Over 900Mbps (with Treck TCP/IP stack)
• Applications
– High performance bridging
– TCP/IP and user data interfaces
– IP over GigE Ethernet Switching
– High Speed Remote Monitoring
– Remote Image/Video Capture
www.xilinx.com27 - Embedded Processing using FPGAs
Today You Have Multiple Topology Choices
• Point-to-Point Connections for higher performance
• Shared Bus for smaller area
Diagram: Virtex or Spartan FPGA containing MicroBlaze or PowerPC with Local Memory, JTAG Debug and Trace; PLBv46; Multi Port Memory Controller; DMA; Custom Coprocessors; Ethernet MAC; USB 2.0; PCI; CAN/MOST; Interrupt Controller; Timer/PWM; I2C/SPI; UART; GPIO; Generic Peripheral Controller; Custom I/O Peripherals
www.xilinx.com28 - Embedded Processing using FPGAs
Multicore Topologies
• Memory Map: SW responsibility for non-conflicting resource partition
• XPS_Mailbox Core: Pass small sized messages between a sender and a receiver; use synchronous polling or async interrupts
• Shared BRAM memory: You can still use shared memory techniques with BRAM
• XPS_Mutex Core: CPUs with shared resources need to sync; configure number of mutex registers to test and set
• Partitioning clocking domains
• Treatment of CPU and system reset
Diagram: Processor Subsystem 1 (PPC405 with IPLB and DPLB, XPS_INTC, XPS BRAM private boot memory, PLBv46, other XPS peripherals: UART, GPIO, TIMER) and Processor Subsystem 2 (MicroBlaze with IPLB, DPLB, IXCL and DXCL, XPS_INTC, XPS BRAM private boot memory, PLBv46, other XPS peripherals: UART, GPIO, TIMER); PLB_v46 BRIDGE; XPS_Mailbox Core (Send FIFO, Receive FIFO, Data_IRQ_A, Data_IRQ_B); XPS_Mutex Core (Mutex array); XPS BRAM (shared BRAM memory); MPMC (shared external memory) with PLB, XCL and LL ports; XPS_LLTEMAC and other LL peripheral
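The "test and set" behaviour of a mutex register can be sketched as follows. This is a toy software model, not the XPS_Mutex register map: the class name, the convention that 0 means "free", and the CPU id values are assumptions made for illustration.

```python
class MutexRegister:
    """Toy test-and-set lock register: 0 means free, otherwise the owner's CPU id."""

    FREE = 0

    def __init__(self):
        self.owner = self.FREE

    def try_lock(self, cpu_id):
        """Claim the register if it is free; report whether cpu_id now owns it."""
        if cpu_id == self.FREE:
            raise ValueError("cpu_id 0 is reserved for 'free'")
        if self.owner == self.FREE:
            self.owner = cpu_id
        return self.owner == cpu_id

    def unlock(self, cpu_id):
        """Release the register; only the current owner is allowed to do so."""
        if self.owner != cpu_id:
            return False
        self.owner = self.FREE
        return True


if __name__ == "__main__":
    uart_lock = MutexRegister()  # one register per shared resource
    print(uart_lock.try_lock(1))  # CPU 1 takes the lock
    print(uart_lock.try_lock(2))  # CPU 2 is refused and must retry later
    print(uart_lock.unlock(1))    # CPU 1 releases it
```

In hardware the test and the set happen in one register access, which is what keeps two CPUs from both seeing the resource as free.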
www.xilinx.com30 - Embedded Processing using FPGAs
Significant reduction in amount of FPGA logic required to build a system using integrated functionality of Virtex-5 Processor Block
www.xilinx.com31 - Embedded Processing using FPGAs
Agenda
• Why FPGA Platform Based Embedded
Processing
• Embedded Use Models And Their FPGA Based
Solutions
• Architecture/Topology Choices
• The Reoccurring Question: Hardware Or Software
• Reconfigurable Hardware
• Tool Flows For FPGA Based Embedded Systems
www.xilinx.com33 - Embedded Processing using FPGAs
Two Ways to Meet Performance Requirements
• Brute Force: µC at 5 GHz
• Intelligent: FPGA with CPU plus HW Accelerator (APU or FSL) at 5 MHz
www.xilinx.com34 - Embedded Processing using FPGAs
Co-Processor Interfacing
• Conventional approach: µC connected through a Bus Interface to a Co-Processor / Peripheral in an FPGA, ASIC or ASSP
• Xilinx Solutions: Co-Processor / Peripheral connected through FSL or APU
– Fast
– Deterministic
– Easy implementation
www.xilinx.com35 - Embedded Processing using FPGAs
Conventional vs. APU Implementations
• Conventional Approach (Virtex-II Pro): PowerPC, BUS, Bus Interface Logic, Logic
– Overhead required to connect soft logic
• APU centric approach (Virtex-4): PowerPC, APU Controller, Co-Processor Interface, Logic
Benefits
• Less overhead logic
• Faster transfer
• Efficient software implementation
www.xilinx.com36 - Embedded Processing using FPGAs
Bus Cycle Improvement via APU
Reduce number of bus cycles by factor of 10 by using APU
• Without APU: Write Operand 1, Write Operand 2, Execution, Read Status, Read Result
• With APU: Wr, Execution, Res
www.xilinx.com
Up to 13x Performance Boost: PPC 405 Single Precision FPU
Algorithm | Performance: Software Emulation of Floating Pt * | Performance: Single Precision FPU | Speed Up
FIR Filter | 846.09 ms | 63.94 ms | 13x
Video Editing Algorithm | 248.38 ms | 27.93 ms | 9x
1024pt FFT | 19.48 ms | 5.42 ms | 4x
PID Loop | 1.66 us | 0.37 us | 4x
(* C-Code is compiled using IBM Performance Libs delivered with EDK 8.2i)
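The Speed Up figures are the ratio of the two timings for each algorithm. A short sketch that recomputes them from the numbers on this slide (the variable and function names are illustrative only):

```python
# (algorithm, software-emulation time, FPU time), both in seconds, from the slide.
MEASUREMENTS = [
    ("FIR Filter", 846.09e-3, 63.94e-3),
    ("Video Editing Algorithm", 248.38e-3, 27.93e-3),
    ("1024pt FFT", 19.48e-3, 5.42e-3),
    ("PID Loop", 1.66e-6, 0.37e-6),
]


def speedup(t_software, t_fpu):
    """Speed-up is the software time divided by the accelerated time."""
    return t_software / t_fpu


if __name__ == "__main__":
    for name, t_sw, t_fpu in MEASUREMENTS:
        print(f"{name}: {speedup(t_sw, t_fpu):.2f}x")
```

Rounded to whole numbers, the ratios come out as 13, 9, 4 and 4, matching the Speed Up values quoted on the slide.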
www.xilinx.com38 - Embedded Processing using FPGAs
Floating Point Accelerated by APU
• SW: 2 MFlops
• PLB + FPU: 21 MFlops
• OCM + FPU: 30 MFlops
• APU + FPU: 45 MFlops (est)
APU improves algorithm execution 20X over software emulation
• Algorithm description - FPU w/ FIR filter
• PowerPC operating frequency – 400 MHz
• APU with direct interface to fabric - XtremeDSP blocks
www.xilinx.com39 - Embedded Processing using FPGAs
Accelerated MPEG De-Compression
Example: Video application - IDCT MPEG de-compression
• Leverages Virtex-4 Integrated Features
– PowerPC, APU, XtremeDSP Slice
– Software Emulation – 13000 IDCTs/sec
• HW acceleration over software
– Lower latency and high bandwidth
• Efficient HW/SW design partitioning
– Optimized implementation
• Significant performance increase
– 20x improvement over software
– Utilizing APU and XtremeDSP
Diagram: Hard Processor Block (PowerPC with OCM, PLB and APU Controller) connected over the APU I/F (FPGA Interface) to a Soft Auxiliary Processor and XtremeDSP slices in the FPGA Fabric
www.xilinx.com40 - Embedded Processing using FPGAs
Tailor the System to Achieve Performance
Take MP3 Decoding with Custom Hardware Logic
Configuration | Time | Performance Improvement
100MHz MicroBlaze™, pure software | 146 seconds | 1X
100MHz MicroBlaze™ +FSL + LL MAC | 9 seconds | 16X
100MHz MicroBlaze™ +FSL +DCT + IMDCT + LL MAC | 7 seconds | 21X
Custom Hardware Logic: IMDCT, DCT, LL MAC
Note: MicroBlaze™ v4.00 core, ML40x board, 100MHz system clock, EDK8.1
www.xilinx.com41 - Embedded Processing using FPGAs
An Example: MicroBlaze™ FPU in Motor Control
Diagram: XC3S1500 containing MicroBlaze w/FPU, BRAM, Memory Interface, UART, Ethernet, IIC, CAN, SPI and GPIO, plus specialized peripherals (Motor Contrl., Quad Decoder, HALL Decoder, Fault Feedback) connected through Isolation to the Power Stage; extra cycles needed for specialized peripherals
Configuration | Function Calls/Second | Improvement
MicroBlaze™ without FPU | Over 5,300 | 1X
MicroBlaze™ with FPU | Over 82,000 | Over 15X
• Design implications:
– You will get more done for a given clock frequency
– Reducing the clock frequency (if possible) reduces power dissipation
Benefit: The customer now has extra processor bandwidth for use on other things.
Note: Used MicroBlaze™ v4.00 core on Spartan 3S1500.
www.xilinx.com42 - Embedded Processing using FPGAs
Auxiliary Processing Solutions
Diagram: Hard Processor Block (405 Core with PLB, OCM and APU Control) connected over the APU I/F to an Auxiliary Processor (Fabric Control Module Interface) in the FPGA Fabric
Auxiliary processor options, ranging from Application Specific to General Solution:
• Floating Point Unit
• FSL -V4
• C to HDL: Impulse Technologies, Celoxica, Poseidon Design Systems
www.xilinx.com43 - Embedded Processing using FPGAs
C to HDL Tools
• Create Code
• Profile Code
• Identify Critical Code Segments
• Translate Critical Code to HDL
• Build HW Accelerators from HDL
• Replace Critical Code with Calls
to HW Accelerators
Figure: Code, Profile, HDL Code, Accelerator Call
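The "Profile Code" and "Identify Critical Code Segments" steps can be sketched in a few lines. This is an illustrative stand-in, not the flow of any C to HDL tool: the three stage functions and their workloads are made up, and timing with time.perf_counter is used in place of a real profiler.

```python
import time


def produce(n):
    return list(range(n))


def transform(values):
    # Deliberately heavy stage: stands in for the compute kernel of an application.
    return [sum(i * i for i in range(200)) + v for v in values]


def consume(values):
    return sum(values)


def profile_stages(n=2000):
    """Time each stage separately and return (result, timings in seconds)."""
    timings = {}

    start = time.perf_counter()
    data = produce(n)
    timings["produce"] = time.perf_counter() - start

    start = time.perf_counter()
    data = transform(data)
    timings["transform"] = time.perf_counter() - start

    start = time.perf_counter()
    result = consume(data)
    timings["consume"] = time.perf_counter() - start

    return result, timings


def critical_segment(timings):
    """The critical segment is the stage with the largest share of run time."""
    return max(timings, key=timings.get)


if __name__ == "__main__":
    _, timings = profile_stages()
    print("candidate for a HW accelerator:", critical_segment(timings))
```

The stage reported here is the one that would be translated to HDL, with its software body replaced by a call into the accelerator.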
www.xilinx.com46 - Embedded Processing using FPGAs
Impulse C Programming Model
• System of Independent sequential execution units
– Called Processes
– Can be executed in HW or SW and run concurrently
• Processes communicate with each other
– Via self-synchronizing FIFO buffers called streams, or
– Via shared memories
Figure (CoDeveloper™): in the SW Only case, the Producer Function, Critical Function and Consumer Function run with sequential execution only. With concurrent software and hardware execution, the Producer Function and Consumer Function are SW PROCESSes and the Critical Function is a PROCESS in the FPGA Fabric (faster execution), connected by Streams
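The process-and-stream model maps naturally onto threads connected by bounded queues. The sketch below is an analogy in Python, not Impulse C code: queue.Queue stands in for a self-synchronizing stream, the END sentinel and the stream depth of 4 are assumptions, and the squaring step is a placeholder for the critical function.

```python
import queue
import threading

END = object()  # sentinel marking the end of a stream (an assumption of this sketch)


def producer(out_stream, samples):
    for sample in samples:
        out_stream.put(sample)  # blocks while the stream is full
    out_stream.put(END)


def critical_function(in_stream, out_stream):
    while True:
        item = in_stream.get()  # blocks while the stream is empty
        if item is END:
            out_stream.put(END)
            return
        out_stream.put(item * item)  # placeholder for the accelerated computation


def consumer(in_stream, results):
    while True:
        item = in_stream.get()
        if item is END:
            return
        results.append(item)


def run_pipeline(samples, depth=4):
    """Run the three processes concurrently, connected by two bounded streams."""
    stream_a = queue.Queue(maxsize=depth)
    stream_b = queue.Queue(maxsize=depth)
    results = []
    workers = [
        threading.Thread(target=producer, args=(stream_a, samples)),
        threading.Thread(target=critical_function, args=(stream_a, stream_b)),
        threading.Thread(target=consumer, args=(stream_b, results)),
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return results


if __name__ == "__main__":
    print(run_pipeline([1, 2, 3, 4, 5, 6]))
```

Because each process only touches its streams, the middle one can be moved into hardware without changing the producer or the consumer, which is the point of the programming model.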
www.xilinx.com48 - Embedded Processing using FPGAs
Agenda
• Why FPGA Platform Based Embedded
Processing
• Embedded Use Models And Their FPGA Based
Solutions
• Architecture/Topology Choices
• A Reoccurring Question: Hardware Or Software
• Reconfigurable Hardware
• Tool Flows For FPGA Based Embedded Systems
www.xilinx.com49 - Embedded Processing using FPGAs
What is Partial Reconfiguration?
• Dynamically modify blocks of logic in an FPGA by
downloading partial bit files while unmodified logic
continues to operate
• Logic is LUTs, FFs, BRAMs, DSP blocks, MGTs...
FPGA
Reconfig
Block
www.xilinx.com50 - Embedded Processing using FPGAs
PR Applications Today
• JTRS Software Defined Radio
– SWaP and multi-channel concurrency
• Microwave Communications
– Flexibility and extended functionality
• Cryptography
– Physical isolation in one FPGA
www.xilinx.com51 - Embedded Processing using FPGAs
SDR Development Platform
Diagram: WAVEFORM A and WAVEFORM B; CoreConnect; TX NCO and Interpolation filter; RX NCO and Decimation filter; Waveform combiner; D/A Interface (14 bits -- 300 MHz) and DAC; A/D Interface (12 bits -- 200 MHz) and ADC; Ethernet interface and Ethernet Switch with three RJ45 – 100BT ports (Data, Ctlr, Voice)
www.xilinx.com52 - Embedded Processing using FPGAs
PR Application: Dynamic Network
Interface Protocol Switching
FPGA
10 GigE tx/rx
OC48 tx/rx
Fibre tx/rx
Custom tx/rx
Switch
uP
10 GigE tx/rx
OC48 tx/rx
Fibre tx/rx
Custom tx/rx
Config Memory Storage
www.xilinx.com54 - Embedded Processing using FPGAs
Partial Reconfiguration Applications
• Reduce FPGA size by multiplexing FPGA hardware resources
Diagram: FPGA with CPU, Memory and a Reconfiguration Port (internal configuration access port); Waveform A, Waveform B, Waveform C and Waveform D; External Memory with full system configuration and PR bit streams
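The size reduction can be estimated with simple arithmetic: without partial reconfiguration every waveform needs its own logic, while with it the reconfigurable region only has to fit the largest one. The slice counts below are hypothetical and exist only to show the calculation; the sketch also ignores any overhead for the reconfiguration controller.

```python
def area_without_pr(static_area, module_areas):
    """All modules are resident at once, so their areas add up."""
    return static_area + sum(module_areas.values())


def area_with_pr(static_area, module_areas):
    """Modules time-share one region, sized for the largest of them."""
    return static_area + max(module_areas.values())


if __name__ == "__main__":
    # Hypothetical slice counts, not taken from the presentation.
    static = 4000
    waveforms = {"A": 3000, "B": 2500, "C": 3500, "D": 2000}

    full = area_without_pr(static, waveforms)
    shared = area_with_pr(static, waveforms)
    print(f"without PR: {full} slices, with PR: {shared} slices, "
          f"saving {100 * (full - shared) / full:.0f}%")
```

The trade-off is that only one waveform is available in the region at a time, and switching costs the time needed to load a partial bit stream.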
www.xilinx.com55 - Embedded Processing using FPGAs
Agenda
• Why FPGA Platform Based Embedded
Processing
• Embedded Use Models And Their FPGA Based
Solutions
• Architecture/Topology Choices
• A Reoccurring Question: Hardware Or Software
• Reconfigurable Hardware
• Tool Flows For FPGA Based Embedded Systems
www.xilinx.com56 - Embedded Processing using FPGAs
Necessary Tools
• A Full Complement of Tools are Required to
Design an Embedded Processor System
– Processor System Generation
– Hardware Implementation & Software Compilation
– Hardware & Software Debug
• HW Implementation & SW Compilation are the
two Main Flows that Must be Addressed
– The Embedded Flows Should Mirror Traditional Flows
www.xilinx.com57 - Embedded Processing using FPGAs
System Generation
Tools are needed to create the system infrastructure, connect peripherals and configure the peripherals as desired
Once the system is defined, the hardware can be implemented and software compiled
Diagram: Core (clock, reset, interrupts); Arbiter, Primary Bus, Bridge, Secondary Bus, Arbiter; FLASH, SRAM, Timer, UART
Example peripheral configuration: Databits = 8, Baudrate = 9600, Parity = Off
www.xilinx.com61 - Embedded Processing using FPGAs
Co-processor Creation
Auto-generated HW & SW Templates
HW instantiated
SW Drivers created
Test application code
www.xilinx.com62 - Embedded Processing using FPGAs
Hardware Implementation
Standard FPGA HW Development Flow: VHDL or Verilog → Simulation/Synthesis → Build & Map → Place & Route → Configuration File
Embedded flow: Source Code → Synthesis → System Netlist → Build & Map → Merged & Mapped Design → Place & Route → Configuration File
Additional inputs shown: Constraints, Memory Map for Internal Code Storage
www.xilinx.com63 - Embedded Processing using FPGAs
Powerful Software Development
Eclipse Based SDK for the Software Savvy Engineer
• Eclipse open source Software IDE allows an industry standard method for seamless tool integration
• Consists of:
– Perspectives
– Views
– Editors
• Environment window displays one or more perspectives
• A perspective contains views (like the Navigator) and editors
www.xilinx.com64 - Embedded Processing using FPGAs
• Device drivers are provided for each hardware device
• Device drivers are written in ANSI C and are designed to be portable across processors
• Device drivers allow the user to select the desired functionality to minimize the required memory
• Device drivers in all layers have common characteristics and API
Board Support
Package (BSP)
Boot Code
Initialization Code
Ethernet 10/100
Device Driver
UART 16550
Device Driver
IIC Master & Slave
Device Driver
ATM Utopia Device
Driver
Peripheral n, n+1…
Device Drivers
RTOS Adaptation Layer
Software IP
www.xilinx.com65 - Embedded Processing using FPGAs
Board Support Package
Board Support Packages (BSPs) are collections of parameterized drivers for a specific processor system
Diagram: Arbiter, Primary Bus, Bridge, Secondary Bus, Arbiter; DDR SDRAM, 10/100 Ethernet, SPI EEPROM, UART, Timer, Intc
Driver calls collected in libxil.a:
xemac.h
XEmac_mWriteReg ( … );
XEmac_SendFrame ( … );
...
uartlite.h
XUartLite_SendByte ( … );
XUartLite_GetStats ( … );
...
xtmrctr.h
XTmrCtr_mWriteReg ( … );
XTmrCtr_IsExpired ( … );
...
xspi.h
XSpi_mSendByte ( … );
XSpi_GetStatusReg ( … );
...
XIntc_mEnableIntr ( … );
...
www.xilinx.com66 - Embedded Processing using FPGAs
Software Compilation
Standard Embedded SW Development Flow: C Code → C/C++ Cross Compiler → Assembler → Linker → Machine Code
Embedded flow: Source Code → Compiler → Assembly Code → Assembler → Relocatable Object Code → Linker → Machine Code
Additional inputs shown: Source Code (a second source input), Mapfile and Linker Script
www.xilinx.com67 - Embedded Processing using FPGAs
System Design Flow (Embedded Developers Kit)
Standard FPGA HW Development Flow: VHDL or Verilog → HDL Entry → Simulation/Synthesis → Implementation → Download Bitstream Into FPGA → Chipscope
Standard Embedded SW Development Flow: C Code → Code Entry → C/C++ Cross Compiler → Linker → Load Software Into FLASH → Debugger
1. Instantiate the ‘System Netlist’ and Implement the FPGA (Compiled BIT)
2. Include the BSP (RTOS, Board Support Package) and Compile the Software Image (Compiled ELF)
3. Data2MEM: combine Compiled ELF and Compiled BIT, then Download Combined Image to FPGA
www.xilinx.com68 - Embedded Processing using FPGAs
HW & SW Debug
SW debug typically consists of JTAG based runtime control of the target processor core
Internal & external logic analysis is used to determine where problems reside
Diagram: Core (clock, reset, interrupts) with Debug Port and JTAG Port (External Connection); Arbiter, Primary Bus, Bridge, Secondary Bus, Arbiter; FLASH, SRAM, Timer, UART; Integrated Logic Analyzer; Integrated Bus Analyzer
www.xilinx.com69 - Embedded Processing using FPGAs
“Platform Debug”: Integrated HW/SW Debuggers
• Cross Trigger HW and SW Debuggers to Find and Fix Bugs Faster!
• Enable better insight into the HW / SW code dynamics
www.xilinx.com72 - Embedded Processing using FPGAs
Key Messages
• FPGA Based Embedded Systems Solve Classic Integration
Issues (Size, Cost, Power)
• They Also Create Novel Solutions Not Easily Realized Using
Other Technologies
• FPGA Platforms Enable Flexible and Dynamic Systems
• Intelligent Tools Accelerate Development And Enable You To
Optimize The Balance Of Feature Set, Performance, Area & Cost
Of Your Design