Post on 24-Dec-2015

System Functionality Verification using FPGA

경종민 (kyung@ee.kaist.ac.kr)

Contents
• Section I
  – Introduction to reconfigurable computing
  – FPGA logic/routing architecture
• Section II
  – Core-embedded FPGA
  – ALTERA/XILINX/TRISCEND/SiDSA
• Section III
  – Multiple-FPGA architecture
  – Emulation/simulation acceleration using FPGAs

Introduction
• Design execution methodology
  – Hardware
    • Very fast and efficient
    • No alteration after fabrication
    • Redesign and refabrication are expensive.
  – Software-programmed processors
    • A set of instructions determines a specific operation.
    • Functionality can be easily changed.
    • Performance is far below that of an ASIC.

Reconfigurable Computing
• Fills the gap between hardware and software
  – An FPGA is an array of computational elements and the routing wires among them.
  – The configuration is determined by programmable configuration bits.
• Development
  – 1963: The concept of “restructurable computing” appeared.
  – 1980s: FPGA technology was developed as a hybrid device between PALs and MPGAs (Mask-Programmable Gate Arrays) by Xilinx, Altera, Lucent, QuickLogic, and others.
  – SRAM-programmable FPGAs: high density
  – 1999 to now: Core-embedded FPGAs incorporate both a programmable processor and an FPGA.

Logic Block
• LUT-based logic block
  – Efficient logic block architecture adopted in many commercial FPGAs
  – Composed of a LUT, a DFF (latch), and a mux
[Figure: LUT-based logic block. Labels: inputs I1, I2, I3, I4; 4-LUT; carry logic with Cin and Cout; DFF; Out]

Logic Block
• 4-LUT
  – Any function of 4 input variables can be implemented.
• FF
  – Used for pipelining and registers
  – Can be configured as a latch
  – Clock signals come from global signals routed on special resources (global nets).
• Carry logic
  – Speeds up carry-based arithmetic functions
  – Bypasses the general routing resources and connects directly to the neighboring CLB
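
The 4-LUT point above can be illustrated with a minimal sketch: a 4-input LUT is modeled as a 16-bit truth table indexed by its inputs. The names `make_lut4` and `eval_lut4` are hypothetical, and the bit ordering (I1 as the least significant index bit) is an assumption, not a vendor convention.

```python
def make_lut4(func):
    """Build the 16-bit configuration word for a function of four 1-bit inputs."""
    bits = 0
    for index in range(16):
        # Unpack the table index into the four inputs (I1 is bit 0, an assumption).
        i1, i2, i3, i4 = ((index >> k) & 1 for k in range(4))
        if func(i1, i2, i3, i4):
            bits |= 1 << index
    return bits


def eval_lut4(bits, i1, i2, i3, i4):
    """Look up the output bit selected by the four inputs."""
    index = i1 | (i2 << 1) | (i3 << 2) | (i4 << 3)
    return (bits >> index) & 1


# Two different functions need only different configuration words.
xor4 = make_lut4(lambda a, b, c, d: a ^ b ^ c ^ d)
and4 = make_lut4(lambda a, b, c, d: a & b & c & d)
```

Because the table has one entry per input combination, any of the 2^16 possible 4-input functions maps to exactly one configuration word.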

Interconnection Architecture
• Island-style FPGA routing architecture
  – The routing architecture of most FPGAs
  – A sea of routing resources connects the rows and columns of logic blocks.
  – Connection blocks: programmable multiplexers that select which signals in the given routing channel are connected to the logic block’s terminals
  – Switch boxes: connections between horizontal and vertical routing resources

Interconnection Architecture
[Figure: island-style routing architecture]

Interconnection Architecture
• Routing resources with various lengths
  – Local interconnections: routing between logic blocks (e.g., a dedicated carry chain)
  – Medium-length lines: routing wires that span the width of several logic blocks
  – Long lines: routing wires that run the whole chip height or width
  – Global lines: routing wires that cover the entire area of the chip
    • High-speed, low-skew connections to all logic blocks
    • Usually used for clocks and resets

Two Routing Architectures
• Segmented routing architecture
  – Short wires carry local communication traffic.
  – Long wires are frequently used to travel long distances without passing through many switches.
  – Research questions
    • How many wires should each channel contain?
    • How many types of long wires would be efficient?
    • What proportion of the total routing resources should each wire type take?
  – Companies: Xilinx, Lucent, Vantis
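
The benefit of long wires can be made concrete with a small sketch. It assumes a simplified model that is not from the slides: a connection is built from wire segments laid end to end, with exactly one programmable switch between consecutive segments. The function names are hypothetical.

```python
import math


def switches_crossed(distance, segment_length):
    """Switches on a route of `distance` logic blocks built from equal segments."""
    segments = math.ceil(distance / segment_length)
    return max(segments - 1, 0)


def greedy_route(distance, segment_lengths):
    """Cover `distance` using the longest available segments first.

    Assumes length 1 is among `segment_lengths`, so the route always closes.
    """
    used = []
    remaining = distance
    for length in sorted(segment_lengths, reverse=True):
        while remaining >= length:
            used.append(length)
            remaining -= length
    return used


short_only = switches_crossed(12, 1)   # 12 unit wires
with_long = switches_crossed(12, 4)    # 3 wires of length 4
mixed = greedy_route(14, [1, 2, 6])    # mix of wire types
```

Under this model, a 12-block route crosses 11 switches with unit-length wires but only 2 with length-4 wires, which is the trade-off behind the research questions listed above.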

Two Routing Architectures
• Hierarchical routing architecture
  – Cluster-based routing architecture
    • Routing within a cluster is at the local level, connecting only within that cluster.
    • Longer wires connect different clusters together.
  – Each routing level contains several clusters.
  – Background
    • Most connections between logic blocks are local, with only a limited amount of communication traversing long distances.
  – A good placement algorithm is required.
  – Company: ALTERA
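
A minimal sketch of why placement matters in a hierarchical architecture. It assumes, for illustration only, that logic blocks are numbered consecutively and that every level groups the same number of clusters. `routing_level` is a hypothetical helper that returns the lowest level whose cluster contains both blocks.

```python
def routing_level(block_a, block_b, cluster_size):
    """Lowest hierarchy level at which both blocks fall in the same cluster.

    Level 0 means the same block, level 1 the same first-level cluster, and so on.
    """
    level = 0
    while block_a != block_b:
        block_a //= cluster_size
        block_b //= cluster_size
        level += 1
    return level


same_cluster = routing_level(0, 3, 4)      # both in the first cluster of four
across_boundary = routing_level(3, 4, 4)   # neighbors in numbering, different clusters
```

Blocks 3 and 4 are adjacent in numbering yet need a level-2 connection, so a placer that keeps communicating blocks inside one cluster avoids the longer wires.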

Two Routing Architectures
[Figure: segmented routing and hierarchical routing side by side. Labels: logic blocks, connection switches, cluster]

Heterogeneous Architecture
• Multiplier embedding
  – Multiplier implementation in an FPGA is usually inefficient.
  – Custom, configurable hardware for multiplication with various operand widths and a choice of signed/unsigned operation can be embedded using a reconfigurable array of FABs (special full adder blocks).
  – (Haynes, Field-Programmable Custom Computing Machines, 1998)
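
To show how a multiplier decomposes into an array of full adders, here is a minimal unsigned sketch. It is a generic textbook array multiplier, not the FAB design from the cited paper, and the function names are hypothetical. Signed operation is left out.

```python
def full_adder(a, b, carry_in):
    """One-bit full adder: returns (sum, carry_out)."""
    total = a + b + carry_in
    return total & 1, total >> 1


def array_multiply(x, y, width):
    """Unsigned multiply of two `width`-bit operands using rows of full adders."""
    acc = [0] * (2 * width)  # product bits, least significant first
    for row in range(width):
        y_bit = (y >> row) & 1
        carry = 0
        for col in range(width):
            # Partial-product bit: AND of one bit from each operand.
            partial = ((x >> col) & 1) & y_bit
            acc[row + col], carry = full_adder(acc[row + col], partial, carry)
        acc[row + width] = carry  # this position is still untouched at this point
    return sum(bit << i for i, bit in enumerate(acc))


product = array_multiply(13, 11, 4)
```

The array needs `width * width` full adders, which hints at why building it from general-purpose LUTs costs more area than a dedicated block.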

Heterogeneous Architecture
• Embedded memory blocks
  – Use of available LUTs as RAM structures (Xilinx XC4000, Virtex FPGAs)
  – Dedicated memory blocks within the array (Xilinx Virtex and Altera FPGAs)
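
The idea of using a LUT as RAM can be sketched as follows: the 16 configuration bits of a 4-input LUT become a 16x1 memory when they can be written at run time, and several LUTs side by side form a wider word. The class names are hypothetical and the model ignores clocking and write-enable timing.

```python
class LutRam16x1:
    """A 4-input LUT whose 16 configuration bits are writable at run time."""

    def __init__(self):
        self.bits = 0

    def _check(self, address):
        if not 0 <= address < 16:
            raise ValueError("address must be in 0..15")

    def write(self, address, bit):
        self._check(address)
        if bit:
            self.bits |= 1 << address
        else:
            self.bits &= ~(1 << address)

    def read(self, address):
        self._check(address)
        return (self.bits >> address) & 1


class LutRam16xN:
    """N single-bit LUT RAMs sharing one address to store N-bit words."""

    def __init__(self, width):
        self.columns = [LutRam16x1() for _ in range(width)]

    def write(self, address, value):
        for i, lut in enumerate(self.columns):
            lut.write(address, (value >> i) & 1)

    def read(self, address):
        return sum(lut.read(address) << i for i, lut in enumerate(self.columns))


ram = LutRam16xN(8)
ram.write(3, 0xA5)
```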

Xilinx Virtex Architecture
Block SelectRAM is embedded among the logic blocks as a column.

Heterogeneous Architecture
• Processor embedding
  – In late 2000, several commercial FPGA companies announced plans to include entire microprocessors.
  – Altera
    • ARM9-based Excalibur device
  – Xilinx
    • PowerPC-based Virtex-II device
  – Triscend
    • 8051/ARM-based SoC integration platform

SoC Verification through FPGAs

Core-Embedded FPGA

Core-Embedded FPGAs
• ALTERA
  – Excalibur™
    • ARM-embedded FPGA
  – Stratix™
    • Currently without an ARM core. The next version of Excalibur is under development.
• XILINX
  – Virtex-II Pro™
    • FPGA with an embedded IBM PowerPC core
• Triscend
  – A7
    • ARM-embedded FPGA
  – E5
    • 8051-embedded FPGA

ALTERA’s Excalibur
• ARM9 core integrated with an FPGA
  – “SOPC (System On Programmable Chip)”
  – C/C++ compiler/debugger integrated in the FPGA compiler
• Interface between processor and FPGA
  – AMBA (Advanced Microcontroller Bus Architecture)
  – A widely used internal bus architecture for SoCs
  – The ARM processor and the FPGA block are connected through the AMBA bus.

ALTERA’s Excalibur
[Figure: Excalibur block diagram with three clock domains: Clock Domain 1 (AHB1, up to 180 MHz), Clock Domain 2 (AHB2, up to 90 MHz), Clock Domain 3 (PLD, up to 100 MHz)]

ALTERA’s Excalibur
• AHB1
  – Bridge to AHB2
  – Interrupt controller, watchdog timer
  – Single-port and dual-port SRAM
  – The embedded processor is the only bus master on AHB1.

ALTERA’s Excalibur
• AHB2
  – The PLD transfers data to and from memories, the UART, or the PLD slave.
  – Dedicated interfaces between the stripe (processor and peripherals) and the PLD

XILINX’s Virtex-II Pro
• PowerPC core integrated with an FPGA
  – “Platform FPGA architecture”
  – Up to four PPC cores can be integrated.
• Interface between processor and FPGA
  – CoreConnect bus
    • PLB (Processor Local Bus)
    • DCR (Device Control Register) bus
  – OCM (On-Chip Memory) interface
    • Dedicated interface between the block RAM and the OCM signals of the PPC core

Virtex-II Pro Block Diagram
[Figure: Virtex-II Pro block diagram with the following callouts]
• PowerPC core. This block diagram contains two PPC cores.
• Block RAM and multiplier blocks
• Configurable logic block array

PPC Core Block
[Figure: PPC 405 core block. Labels: PPC 405 Core, OCM controller, Control, Block RAM, DCR bus]
• The OCM controller is a dedicated interface between the PPC and block RAM.
• Block RAM can be configured as instruction-side block RAM (ISBRAM) or data-side block RAM (DSBRAM).
• The fixed latency of memory access guarantees higher-speed execution.
• Block RAM can be configured as dual-port RAM (data communication between the PPC and the FPGA).
• PLB master interface ports are at the boundary of the PPC core.

Triscend’s E5/A7
• E5/A7
  – “CSoC (Configurable System-on-Chip)”
  – The E5 contains an 8051 core, a CSL (Configurable System Logic) matrix, and peripheral interfaces (JTAG, DMA, timer, FIFO).
  – The A7 contains an ARM core instead of the 8051.
• CSI (Configurable System Interconnect)
  – Bus developed by Triscend
  – Pipelined bus architecture for performance optimization

Triscend E5/A7
• The bus architecture allows the bus to be expanded throughout the whole chip while preserving high performance.
  – The internal system bus is extended throughout the user-configurable system logic.
• Objectives
  – Inclusion of any processor is possible.
  – High performance is assured regardless of the CSL size.

Triscend’s A7 Architecture
• CSI bus
  – Configurable System Interconnect
  – Masters of CSI
    • ARM
    • JTAG (configuration)
    • DMA0, DMA1, DMA2, DMA3
  – Sideband signals
    • A small number of dedicated signals for the UART and timer

Triscend’s CSL Matrix
• Vertical/horizontal breakers
  1. Vertical: address decoder part of CSI
  2. Horizontal: data read/write port of CSI
• Selector
  1. Decodes the address
  2. Registers are arranged in a vertical column of CSL cells.
  3. Pre-programmed at initialization

Triscend’s System Architecture
[Figure: system block diagram. Labels: CPU, DMA, JTAG, Bus FIFO/Arbiter for multiple masters, CSL, RAM, ROM, Memory Interface]
• A bus master requires a grant signal from the arbiter.
• The CPU runs boot code initially. The boot code configures the CSL and stores program/data.
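
The grant mechanism can be sketched with a simple arbiter model. The slides do not state the arbitration policy, so the fixed-priority scheme and the priority order below are assumptions made only for illustration. The master names follow the CSI master list given earlier in the deck.

```python
# Hypothetical priority order; the actual CSI policy is not given in the slides.
PRIORITY = ["JTAG", "CPU", "DMA0", "DMA1", "DMA2", "DMA3"]


def arbitrate(requests, priority=PRIORITY):
    """Return the one master granted the bus this cycle, or None if idle."""
    for master in priority:
        if master in requests:
            return master
    return None


def run_cycles(request_trace):
    """Apply the arbiter to a list of per-cycle request sets."""
    return [arbitrate(requests) for requests in request_trace]


grants = run_cycles([{"DMA1", "CPU"}, {"DMA1"}, set(), {"DMA3", "DMA0"}])
```

Only one master receives a grant per cycle, so a master that loses arbitration simply keeps its request asserted until it wins.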

CSI Bus Architecture
[Figure: CSI bus architecture. Labels: Master, Arbiter, Bus FIFO, Master Write (address/data/control), Slave Write (address/data/control), Master Read (data/control), Slave Read (data/control), Selectors and pipe registers, Dedicated Slave, CSL]

Pipelined Write Transaction
[Figure: the CSI bus diagram annotated with Time Slot T, Time Slot T+1, and Time Slot T+2 for a write transaction]

Pipelined Read Transaction
[Figure: the CSI bus diagram annotated with Time Slot T, Time Slot T+1, Time Slot T+2, and Time Slot T+3 for a read transaction]

Pipeline in View of Bus Logic
[Figure: pipeline stages across time slots T, T+1, T+2, T+3. Labels: master, arbiter, Address/Data, Configure Selector, Decode, Read from CSL, Bus FIFO, Data from CSL to Master]
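
A minimal sketch of why pipelining keeps throughput high even though one read spans four time slots. The mapping of one stage per slot (address issue, selector decode, CSL read, data return) is an assumption based on the figure labels, and `simulate_reads` is a hypothetical helper.

```python
# Assumed stage-per-slot mapping for a read, from slot T to slot T+3.
STAGES = ["address", "decode", "read_csl", "return_data"]


def simulate_reads(num_reads):
    """Issue one read per time slot and record which stage each read occupies.

    Returns a dict mapping time slot -> list of (read_id, stage).
    """
    timeline = {}
    for read_id in range(num_reads):
        for offset, stage in enumerate(STAGES):
            timeline.setdefault(read_id + offset, []).append((read_id, stage))
    return timeline


timeline = simulate_reads(4)
total_slots = len(timeline)
```

Four back-to-back reads finish in 7 slots rather than 16, because once the pipeline is full one read completes in every slot.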

Wait State
• Why is it generated?
  – 1. A handshake operation inside the logic implemented in the CSL
  – 2. The CSL logic is too slow to respond in one cycle.
• Sequence of generation
  – 1. The “Address Selector” in the CSL generates a wait state if the system tries to access the Selector’s address.
  – 2. If more than one wait state is required, the CSL function inserts additional wait states.

Wait State Insertion
[Figure: pipeline stages across time slots T, T+1, T+2, T+3, with an OR gate and the Waitnow signal. Labels: master, arbiter, Address/Data, Configure Selector, Decode, Read from CSL, Bus FIFO, Data from CSL to Master]
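
The effect of wait states on a single read can be sketched as below. The model is an assumption: the per-bank wait flags are OR'ed into one `waitnow` signal, the CSL read stage repeats while that signal is asserted, and the other three stages take one slot each. Function names are hypothetical.

```python
def waitnow(bank_waits):
    """OR of all per-bank wait flags."""
    return any(bank_waits)


def read_latency(wait_schedule):
    """Time slots needed for one read.

    `wait_schedule[i]` holds the per-bank wait flags seen during the i-th
    slot spent in the CSL read stage; a missing entry means no waits.
    """
    slots = 2  # address and decode slots (assumed one slot each)
    cycle = 0
    while True:
        slots += 1  # one slot spent in the CSL read stage
        bank_waits = wait_schedule[cycle] if cycle < len(wait_schedule) else []
        if not waitnow(bank_waits):
            break
        cycle += 1
    return slots + 1  # final slot returns the data to the master


no_wait = read_latency([])
two_waits = read_latency([[False, True], [True, False]])
```

With no waits the read takes the four slots T to T+3. Each slot in which any bank asserts its wait flag adds one more slot.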

CSL Physical Structure
• Bus pipeline registers sit at each bank boundary, so the time slots for user logic are independent of the signal transport time between banks.
• The write/read bus is distributed throughout the CSL and buffered and piped into the banks, as shown by the red arrows.
• The wait signals generated by each bank are propagated to the pipeline registers in all other banks.
[Figure: grid of banks. Labels: 16x8 RAM, System Logic, 8K RAM, Bank, Logic tile, Wait Dist., Logic Cell]

Structure Bank/Bus/Selector
[Figure: a bank drawn as a grid of tiles with a row of selectors]
• A horizontal data line writes data to a CSL cell. The read data is OR’ed onto the horizontal read data line.
• 4 wires per tile
• Selectors are configured initially for column selection and wait generation.
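
The OR'ed read line can be sketched as follows: each selector is pre-programmed with an address, only the matching column drives its register value, and every other column drives zero, so OR-ing all columns yields the selected value. The `Selector` class, the addresses, and the register values are hypothetical.

```python
class Selector:
    """Address decoder for one column, programmed once at initialization."""

    def __init__(self, address):
        self.address = address

    def selected(self, address):
        return address == self.address


def read_bus(address, columns):
    """OR together what every column drives onto the horizontal read line.

    `columns` is a list of (Selector, register_value) pairs. Unselected
    columns contribute 0, so the OR returns the selected register.
    """
    value = 0
    for selector, register in columns:
        value |= register if selector.selected(address) else 0
    return value


columns = [(Selector(0x10), 0xA5), (Selector(0x11), 0x3C)]
hit = read_bus(0x11, columns)
miss = read_bus(0x20, columns)
```

An OR-based bus needs no tri-state drivers, at the cost of requiring that selector addresses do not overlap.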

E5 Physical Implementation
• 8051 CPU core
• 0.35 µm process, 40 MHz CSL operation
[Figure: E5 chip layout. Labels: 8051 CPU core and RAM/ROM, CSL matrix]

SiDSA’s FIPSOC
• Integration of CABs (Configurable Analog Blocks)
  – 8051 microcontroller
  – FPGA
  – Configurable analog cells optimized for data acquisition applications
• Dynamic reconfiguration
  – Two configuration bits for each CLB
  – The user can download extra configuration data while the cells are in operation.
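
One way to picture dynamic reconfiguration is a cell that holds two stored configurations, one active and one that can be loaded in the background. Reading the "two configuration bits" as two such contexts is an interpretation, not something the slide spells out. The class below and its 2-input truth-table encoding are hypothetical.

```python
class DualContextCell:
    """Logic cell with an active configuration and a background (shadow) one."""

    def __init__(self, config):
        self.contexts = [config, None]
        self.active = 0

    def evaluate(self, a, b):
        """Evaluate the active 2-input truth table (a is index bit 0)."""
        index = a | (b << 1)
        return (self.contexts[self.active] >> index) & 1

    def load_shadow(self, config):
        """Download a new configuration without disturbing the active one."""
        self.contexts[1 - self.active] = config

    def swap(self):
        """Make the shadow configuration active."""
        if self.contexts[1 - self.active] is None:
            raise RuntimeError("no shadow configuration loaded")
        self.active = 1 - self.active


AND2, XOR2 = 0b1000, 0b0110
cell = DualContextCell(AND2)
before = cell.evaluate(1, 0)
cell.load_shadow(XOR2)
during = cell.evaluate(1, 0)   # still AND while the new data is downloaded
cell.swap()
after = cell.evaluate(1, 0)
```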

Analog Subsystem
• Configurable Analog Blocks (CABs)
  – Differential amplification
  – Comparison
  – Data conversion (ADC, DAC)
• Digital part
  – The digital part that configures the CABs is controlled by the µP or the programmable logic.

Comparison
• Xilinx
  – Uses the CoreConnect bus to connect the processor and FPGA.
  – Multiple processor cores can be used simultaneously.
• ALTERA
  – Uses the AMBA bus to connect the processor and FPGA.
• Triscend
  – The processor can read/write any register inside the CSL matrix. (The CSL matrix can be considered a functional block of the processor.)
  – Intensive pipelining is adopted to maintain or increase throughput, since the latency caused by distributing the bus throughout the CSL matrix could otherwise be excessive.