Implementing Complex Algorithms in FPGAs


Upload: florence-alexander

Post on 13-Mar-2016


TRANSCRIPT

Page 1: Implementing Complex Algorithms in FPGAs

Nov02

Implementing Complex Algorithms in FPGAs

Workshop

Dr Steve Chappell Director Apps Engineering

Page 2: Implementing Complex Algorithms in FPGAs

Workshop Materials

> For the Labs: Course Workbook, Tutorials and Application Notes; DK integrated help system

> On your Workstations: DK, PDK

> Target Platforms: RC100, RC1000

Page 3: Implementing Complex Algorithms in FPGAs

Contents

> Introductions: About Celoxica

> The Basics: Opportunities with a HW Coprocessor; Target Boards; Design Flows – DK and Handel-C in brief

> Handel-C Language (Lab #1)

> Tool Connectivity (Lab #2)

> Platform Developers Kit: Platform Abstraction; Codesign (Labs #3, #4)

> Appendices: Technology, Applications, CUP

Page 4: Implementing Complex Algorithms in FPGAs

About Celoxica

> System EDA company
  Design Tools, FPGA Boards, Consultancy and Services
  Incorporated on the 25th September 2000 (formerly ESL)
  Market leader in complete solutions for software-compiled system design
  Core technology is DK, incorporating the Handel-C programming language
  Senior management with a wealth of EDA and electronics industry experience

> Industry leading partners:

> Strong Links with Research & Development
  Technology and expertise based upon decades of research into the state-of-the-art at The University of Oxford
  Chief Science Officer Ian Page, visiting Professor at the Imperial College of Science, Technology & Medicine, London
  Established and active University Program (700 institutions world-wide)

> Investors
  Premier league investors including Intel

Page 5: Implementing Complex Algorithms in FPGAs

Supporting Argonne

> Augmented Cluster supplied by Linux Networks
  Incorporating Tarari CPP cards and Software drivers
  Celoxica Development Kit for FPGA content

> Ensuring successful deployment and evaluation
  Cluster support by Linux Networks
  Augmented Application and CPP card support by Celoxica

Page 6: Implementing Complex Algorithms in FPGAs


The Basics

Opportunities and Challenges

Essence of an FPGA

Design Flows

Page 7: Implementing Complex Algorithms in FPGAs

Opportunities with a HW Co-processor

> Algorithm Acceleration
  Exploit the parallelism in algorithms to increase performance with implementation in custom (parallel) hardware

> Algorithm Offload
  Exploit the coprocessor to free CPU resource
  e.g., in an SSL proxy, the CPU can always handle more TCP traffic if algorithms such as RSA and 3DES are moved to a coprocessor

> For PCI-based coprocessor cards, candidate algorithms include ones where CPU execution time far exceeds data transfer time over PCI
  Full analysis needs to consider:
    Time required to perform the algorithm in the Co-processor
    System application performance improvement – Amdahl’s Law
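The offload criterion and the Amdahl's Law bound named on this slide can be sketched numerically. This is a minimal Python illustration and not part of the workshop material; the function names and all example figures are assumptions chosen for the sketch.

```python
def amdahl_speedup(accelerated_fraction, factor):
    """Overall speedup when `accelerated_fraction` of run time is sped up by `factor`."""
    return 1.0 / ((1.0 - accelerated_fraction) + accelerated_fraction / factor)


def offload_pays_off(cpu_time_s, coproc_time_s, bytes_moved, bus_bytes_per_s):
    """True when coprocessor time plus bus transfer time beats the CPU alone."""
    transfer_time_s = bytes_moved / bus_bytes_per_s
    return coproc_time_s + transfer_time_s < cpu_time_s


# Hypothetical workload: 90% of run time is in the kernel, hardware runs it 10x faster.
print(round(amdahl_speedup(0.9, 10.0), 2))

# Hypothetical transfer: 1 MB over a 100 MB/s bus adds 10 ms to 2 ms of compute.
print(offload_pays_off(cpu_time_s=0.050, coproc_time_s=0.002,
                       bytes_moved=1_000_000, bus_bytes_per_s=100_000_000))
```

Even a very fast kernel gives a bounded system speedup, because the part of the application that stays on the CPU sets the floor on total run time.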

Page 8: Implementing Complex Algorithms in FPGAs

Opportunities with FPGAs

> FPGA architecture

> What it means for applications
  “Soft” Hardware
  Reconfigurability/Programmability
  Integer processors (FP is “resource expensive”)
  Wide data paths
  Parallel Computation

> Challenges to deployment in enterprise computing
  Development complexity
  IP deployment and integration
  Design Framework and methods
  Data Bandwidth to/from coprocessor
  Choosing the right applications

Page 9: Implementing Complex Algorithms in FPGAs

Essence of an FPGA

Field Programmable Gate Array: CLBs + IOBs + Interconnect Matrix

Figure. FPGA building blocks. Labels: CLB, Block RAM, Multipliers, Processor, Soft Cores, SRAM, Application.

Page 10: Implementing Complex Algorithms in FPGAs


Target Boards

RC100, RC1000, RC2000 Tarari CPP

Page 11: Implementing Complex Algorithms in FPGAs


RC-100

> Xilinx Spartan2-200 FPGA

> 2MB ZBT SRAM, in 2 36-bit banks. 8MB Flash RAM

> 50 pin expansion header, PS/2 mouse/keyboard, parallel port

> Video input decoder, VGA output DAC

> Two 7-segment LED displays

> 80MHz maximum clock

Page 12: Implementing Complex Algorithms in FPGAs


RC-1000

> PCI card, DMA transfers > 110 MB/sec sustained

> Xilinx Virtex-2000 FPGA

> 8MB SRAM, in 4 32-bit banks

> 2 PMC slots

> 50 auxiliary I/O pins

> Programmable clock

Page 13: Implementing Complex Algorithms in FPGAs

RC-1000

Page 14: Implementing Complex Algorithms in FPGAs


RC-2000

> Virtex II 2V3000-4, 2V6000-4 and 2V6000-6 FPGAs

> 64bit 66MHz PCI bus

> 6 banks of ZBT SRAM offering a total of either 12Mb or 24Mb

> Front-panel I/O up to 146 lines, dependent on options

> 64 I/O lines via PMC connector

> 16Mb Flash for configuration storage

> 2 Programmable clocks

> Options include: 16Mb additional ZBT SRAM in 2 banks; 128Mb DDR RAM

Page 15: Implementing Complex Algorithms in FPGAs

RC-2000

Page 16: Implementing Complex Algorithms in FPGAs

CPP – Basic Board Architecture

> Two CPEs – Content Processing Engines
  Virtex-II 1000 FPGA
  Eight LEDs
  2x 1MB SRAM
  Connection to CPC

> CPC – Content Processing Controller
  256MB DDR SDRAM
  PCI Bus to Host

Figure. Basic CPP architecture. Labels: CPC, PCI bus, 256MB DDR SDRAM, two CPEs, four 1MB SRAM blocks, LEDs.

Page 17: Implementing Complex Algorithms in FPGAs


Design Flows

DK and Handel-C

Page 18: Implementing Complex Algorithms in FPGAs

Designing acceleration IP

> Traditional Options – HDL based design
  Purchase FPGA (HW) development tools
  Hire/use HW engineers
  Pay 3rd Party development fees

> The Alternative – “Software Compiled System Design”
  Use Celoxica Content Processing Development Kit
    Development framework with Example Acceleration IP
    Comprehensive Hardware-Software Co-simulation environment
    Tool and Language Connectivity
  Enable SW engineers and/or increase HW engineer productivity

Page 19: Implementing Complex Algorithms in FPGAs

Why a Software Language Based Approach for System Design?

> Some problems are better expressed as a software algorithm

> Software Reference designs can be utilized

> Designs are often specified by a C/C++ executable

> Simplifies and delays hardware-software partitioning

> Software development techniques can be used

> Brings hardware and software teams closer together

> New Possibilities …

Page 20: Implementing Complex Algorithms in FPGAs

RC100

> RC100 prototyping board
  $10 FPGA
  Commodity memory chips
  Video Input and Output


Page 22: Implementing Complex Algorithms in FPGAs

CPDK for developing acceleration IP

> The Content Processing Development Kit includes Celoxica “DK” and supporting libraries

> Consisting of “Software Compiled System Design” environment
  Simple design flow with integrated Simulation and direct implementation
  Similar SW/HW design methods simplify design exploration and optimal allocation of functionality between SW and HW
  Verification and Debug using a Symbolic Debugger
  Connectivity and co-simulation with SW and HDL cores
  APIs to hide complexity

> Enabling your software and hardware developers
  To rapidly develop acceleration IP

Page 23: Implementing Complex Algorithms in FPGAs

Celoxica DK1 – Rapid Design

> Handel-C direct to FPGA, Minimum Tool Chain

> Easy-to-learn language – ISO-C (ANSI-C)

> Design of hardware and software in parallel with co-simulation

Figure. Design Flow. Labels: Handel-C, Simulate, Compile, Netlist, FPGA Vendor’s Tools (Place & Route), Configure, Final Hardware.

Page 24: Implementing Complex Algorithms in FPGAs


Supported FPGA/PLD Devices

Page 25: Implementing Complex Algorithms in FPGAs

Development Flow

Minimal Tool Chain

Similar Languages

Standardised APIs

Platform Abstraction

External IP (optional)

Figure. Development flow. Stage labels: Specification, Algorithm Definition, Partition, Develop, HLL Co-Verification, Compile, Implementation. HW side: Handel-C, DK, EDIF, BSP, LIBS, CPP. SW side: C, SW Tool, OBJ, BSP, OS, LIBS, Host CPU. External IP inputs: EDIF, HDL, LIB, C.

Page 26: Implementing Complex Algorithms in FPGAs

APIs Enable Rapid Co-verification

> “Virtual Platform” for Co-simulation and Co-design

> Cycle-accurate HLL simulator for Acceleration IP modelling

> Extendable Co-Sim to: C/C++, HDL, System-C, ISS

Figure. Virtual Platform. Labels: Specification, HW (Handel-C), SW (C), DK, Nexus, BSP (x2), HDL-Simulator, SW and/or ISS, Implementation.

Page 27: Implementing Complex Algorithms in FPGAs

DK User Interface

Figure. DK user interface screenshot. Callouts: File view, Symbol view, Syntax highlighting, Break-points, Multithreaded Debug, Watch variables, Simulate, Build, Clock Cycles, Info.

Page 28: Implementing Complex Algorithms in FPGAs

Handel-C in Brief

> Handel-C is based on ANSI C

> Well-defined semantics similar to OCCAM/CSP

> Additions:
  support for parallelism
  channels for communications between parallel processes
  operators for detailed control of hardware
  constructs for RAM, ROM, interfacing, etc.

Page 29: Implementing Complex Algorithms in FPGAs


HW-SW Co-Design

Page 30: Implementing Complex Algorithms in FPGAs


Handel-C Language

Page 31: Implementing Complex Algorithms in FPGAs

Core Language Features

> Standard C (if, while, switch etc) including
  Functions
  Structures
  Pointers

> par {…} construct for parallelism

> Simple model of timing
  each assignment is one clock cycle

> Arbitrary widths on variables

> Enhanced bit manipulation operators

> Sharing/Copying expressions

> Support for hardware constructs
  Multiple clock domains, RAM, ROM, external interfaces

Page 32: Implementing Complex Algorithms in FPGAs

Handel-C describes Hardware!

> No side effects in expressions
  i.e. statements like a = b*c++; are not supported

> No floating point
  Floating point not directly supported by Handel-C. Library support provided for fixed and floating point arithmetic

> No run-time recursion
  Due to the absence of any kind of ‘call stack’ in hardware.

> Limited standard library (i.e. no printf, fopen etc.)
  However, DK1.1 allows direct calls to external functions written in C/C++, and these could incorporate file I/O, user interaction, recursion, etc.

Page 33: Implementing Complex Algorithms in FPGAs

Variables

> Handel-C has one basic type - integer

> May be signed or unsigned

> Can be any width, not limited to 8, 16, 32 etc.

Variables are mapped to hardware registers.

void main(void)
{
    unsigned 6 a;
    a = 45;
}

a = 1 0 1 1 0 1 = 0x2d   (MSB first, LSB last)
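The 6-bit example can be checked with a few lines of Python. This is only a sketch of the arithmetic: the helper names are hypothetical, and Python integers stand in for fixed-width hardware registers.

```python
def min_unsigned_width(value):
    """Smallest number of bits that can hold a non-negative integer."""
    return max(1, value.bit_length())


def as_bits(value, width):
    """Binary string, MSB first, zero-padded to `width` bits."""
    if value < 0 or value >= (1 << width):
        raise ValueError("value does not fit in the given width")
    return format(value, "0{}b".format(width))


print(min_unsigned_width(45), as_bits(45, 6), hex(45))
```

The value 45 needs exactly six bits, which is why `unsigned 6` is the narrowest register that can hold it.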

Page 34: Implementing Complex Algorithms in FPGAs


Bit Manipulation Operators

> Extra operators have been added to allow more ‘hardware like’ bit manipulation:

<< Shift Left b = a<<2;

>> Shift Right b = a>>1;

<- Take least significant bits b = a<-5;

\\ Drop least significant bits b = a\\5;

@ Concatenate bits b = a@c;

[ ] Bit Selection b = a[4:1];

Page 35: Implementing Complex Algorithms in FPGAs

Example Bit Manipulation

[MSB:LSB] - bit selection (range of bits)

a = 1 0 1 1 0 1 = 0x2d

b = a[4:1]

b = 0 1 1 0 = 0x6

Page 36: Implementing Complex Algorithms in FPGAs

Bit Manipulation 2

> Other bit manipulation examples:

signed int 4 a;
signed b,c,d;

a = 0b1100;

b = a<<1;    // b = 0b1000
b = a>>1;    // b = 0b1110
c = a[2:1];  // c = 0b10
c = a<-2;    // c = 0b00
c = a\\2;    // c = 0b11
d = a @ a;   // d = 0b11001100
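The results in the comments above can be reproduced with a small software model. This is a Python sketch, not Handel-C: values are treated as raw bit patterns, and the helper names are hypothetical stand-ins for the operators shown on the slides.

```python
def mask(width):
    """Bit mask with the low `width` bits set."""
    return (1 << width) - 1


def take_lsbs(value, n):                  # models  a <- n
    return value & mask(n)


def drop_lsbs(value, n):                  # models  a \\ n
    return value >> n


def concat(high, low, low_width):         # models  high @ low
    return (high << low_width) | (low & mask(low_width))


def select(value, msb, lsb):              # models  a[msb:lsb]
    return (value >> lsb) & mask(msb - lsb + 1)


def shift_left(value, n, width):          # models  a << n at a fixed width
    return (value << n) & mask(width)


def arith_shift_right(value, n, width):   # models  a >> n on a signed operand
    signed = value - (1 << width) if value >> (width - 1) else value
    return (signed >> n) & mask(width)


a = 0b1100
print(bin(shift_left(a, 1, 4)), bin(arith_shift_right(a, 1, 4)),
      bin(select(a, 2, 1)), bin(take_lsbs(a, 2)), bin(drop_lsbs(a, 2)),
      bin(concat(a, a, 4)))
```

The right shift replicates the sign bit because `a` is declared signed, which is why `0b1100 >> 1` gives `0b1110` rather than `0b0110`.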

Page 37: Implementing Complex Algorithms in FPGAs

Timing model

> Assignments and delay statements take 1 clock cycle

> Combinatorial Expressions computed between clock edges
  Most complex expression determines clock period
  Example: takes 1+n cycles (n is number of iterations)

index = 0;                      // 1 Cycle
while (index < length)
{
    if (table[index] == key)
        found = index;          // 1 Cycle
    else
        index = index + 1;      // 1 Cycle
}
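The 1+n figure follows from charging one cycle to each assignment and nothing to the comparisons. A Python sketch of that accounting is below; the function name is hypothetical, and the sketch stops on a match, which is an assumption not shown in the slide's fragment.

```python
def search_cycles(table, key):
    """Return (found_index, cycles), charging one cycle per assignment."""
    cycles = 1                      # index = 0
    index = 0
    found = None
    while index < len(table):       # the test itself costs no cycles
        cycles += 1                 # exactly one assignment runs per iteration
        if table[index] == key:
            found = index
            break                   # assumption: stop once the key is found
        index += 1
    return found, cycles


print(search_cycles([3, 5, 7], 7))
```

Because the condition is combinational logic, a more complex test lengthens the clock period rather than adding cycles.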

Page 38: Implementing Complex Algorithms in FPGAs

Parallelism

> Handel-C blocks are by default sequential

> par{…} executes statements in parallel

> par block completes when all statements complete
  Time for block is time for longest statement
  Can nest sequential blocks in par blocks

Sequential Block

// 3 Clock Cycles
{
    a=1;
    b=2;
    c=3;
}

Parallel Block

// 1 Clock Cycle
par{
    a=1;
    b=2;
    c=3;
}
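The timing rule for nested blocks (a sequential block takes the sum of its parts, a par block takes its longest branch) can be written as a short recursive function. This is a Python sketch; the tuple encoding of blocks is a hypothetical representation chosen for the example.

```python
def block_cycles(node):
    """Clock cycles for a tiny statement tree.

    A node is either the string "assign" (1 cycle) or a tuple
    ("seq", [children]) or ("par", [children]).
    """
    if node == "assign":
        return 1
    kind, children = node
    child_cycles = [block_cycles(child) for child in children]
    if kind == "seq":
        return sum(child_cycles)
    if kind == "par":
        return max(child_cycles, default=0)
    raise ValueError("unknown block kind: {}".format(kind))


sequential = ("seq", ["assign", "assign", "assign"])
parallel = ("par", ["assign", "assign", "assign"])
nested = ("par", [("seq", ["assign", "assign"]), "assign"])
print(block_cycles(sequential), block_cycles(parallel), block_cycles(nested))
```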

Page 39: Implementing Complex Algorithms in FPGAs

More Parallelism

> Example – array initialisation

> Sequential version takes 20 clock cycles
  for() loop has 1 cycle overhead for increment

> Parallel version takes 1 clock cycle
  Replicated par() builds hardware to execute all 10 iterations in a single cycle
  Allows trade-off between hardware size and performance

Sequential code

for(i=0;i<10;i++)
{
    array[i]=0;
}

Parallel code

par(i=0;i<10;i++)
{
    array[i]=0;
}

Page 40: Implementing Complex Algorithms in FPGAs

Channels

> Allow communication and synchronisation between two parallel branches
  Semantics based on CSP: unbuffered (synchronous) send and receive

> Declaration
  Specifies data type to be communicated

chan unsigned 6 c;

{
    …
    c!a+1; //write a+1 to c
    …
}

{
    …
    c?b; //read c to b
    …
}

Figure. The value a+1 passes through channel c into b.
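Unbuffered (synchronous) semantics mean that the sender cannot proceed until a receiver has taken the value. A software analogue is sketched below with Python threads; the `Channel` class and `rendezvous_demo` are hypothetical names, and threads only approximate what the hardware does within a clock cycle.

```python
import threading


class Channel:
    """Unbuffered channel: send() returns only after receive() has the value."""

    def __init__(self):
        self._send_lock = threading.Lock()          # one sender at a time
        self._item_ready = threading.Semaphore(0)
        self._item_taken = threading.Semaphore(0)
        self._item = None

    def send(self, value):
        with self._send_lock:
            self._item = value
            self._item_ready.release()
            self._item_taken.acquire()              # block until the reader has it

    def receive(self):
        self._item_ready.acquire()                  # block until a writer arrives
        value = self._item
        self._item_taken.release()
        return value


def rendezvous_demo(a=4):
    """Mirror the slide: one branch writes a+1 to c, the other reads c into b."""
    c = Channel()
    received = []
    writer = threading.Thread(target=lambda: c.send(a + 1))
    reader = threading.Thread(target=lambda: received.append(c.receive()))
    writer.start()
    reader.start()
    writer.join()
    reader.join()
    return received[0]


print(rendezvous_demo())
```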

Page 41: Implementing Complex Algorithms in FPGAs

Sharing Hardware for Expressions

> Functions provide a means of sharing hardware for expressions

> By default, compiler generates separate hardware for each expression
  Hardware is idle when control flow is elsewhere in the program

> Hardware function body is shared among call sites

{
    …
    x = x*a + b;
    y = y*c + d;
}

int mult_add(int z, int c1, int c2)
{
    return z*c1 + c2;
}

{
    …
    x = mult_add(x,a,b);
    y = mult_add(y,c,d);
}

Page 42: Implementing Complex Algorithms in FPGAs

Replicating Hardware for Expressions

> Inline Functions are expanded at the call site
  Provide for functional abstraction of complex hardware

inline complex mult_complex(complex x, complex y)
{
    complex z;
    par
    {
        z.re = x.re*y.re - x.im*y.im;
        z.im = x.re*y.im + x.im*y.re;
    }
    return z;
}

complex x1,y1,x2,y2,z1,z2;
…
par
{
    z1 = mult_complex(x1,y1);
    z2 = mult_complex(x2,y2);
}

Page 43: Implementing Complex Algorithms in FPGAs

Macro procedures

> macro proc is similar to an inline function, but is expanded at compile time.
  They also allow for arbitrary bit width calculations

> The following generates a reusable timer:

macro proc usleep(ms)
{
    #define TENTH_SEC CLOCK_RATE/10

    unsigned (log2ceil(TENTH_SEC)) Counter;
    Counter = TENTH_SEC * (0@ms);

    while (Counter) Counter--;
}
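The counter width in the timer comes from a compile-time calculation. The Python sketch below reproduces that arithmetic under two assumptions: that `log2ceil(n)` means the ceiling of log2(n), and that `CLOCK_RATE` is 80 MHz (the RC100 maximum clock quoted earlier; the slide does not give a value).

```python
def log2ceil(n):
    """Ceiling of log2(n) for n >= 1 (assumed meaning of the slide's log2ceil)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return (n - 1).bit_length()


CLOCK_RATE = 80_000_000             # assumption: 80 MHz clock
TENTH_SEC = CLOCK_RATE // 10        # clock cycles in a tenth of a second

counter_width = log2ceil(TENTH_SEC)
largest_count = (1 << counter_width) - 1

print(TENTH_SEC, counter_width, largest_count // TENTH_SEC)
```

Under these assumptions the counter is just wide enough for one tenth-of-a-second interval, so a design that needs longer delays would want to check the width of the product before relying on it.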

Page 44: Implementing Complex Algorithms in FPGAs

Signals

> A signal behaves like a wire - takes the value assigned to it but only for that clock cycle.
  The value can be read back during the same clock cycle.
  The signal can also be given a default value.

// Breaking up complex expressions
int 15 a, b;
signal <int> sig1;
static signal <int> sig2=0; //default value of 0
a = 7;
par
{
    sig1 = (a+34)*17;
    sig2 = (a<<2)+2;
    b = sig1 + sig2;
}
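The par block above completes in one cycle because both signals are wires that settle before the clock edge at which `b` is loaded. A Python sketch of that single cycle follows; the helper names are hypothetical, and all widths are assumed to be 15 bits to match `a` and `b`.

```python
def to_signed(value, width):
    """Wrap an integer into the two's-complement range of `width` bits."""
    value &= (1 << width) - 1
    return value - (1 << width) if value >> (width - 1) else value


def one_cycle(a, width=15):
    """Model the par block: both signals settle, then b latches their sum."""
    sig1 = to_signed((a + 34) * 17, width)   # wire, valid during this cycle only
    sig2 = to_signed((a << 2) + 2, width)    # wire, valid during this cycle only
    b = to_signed(sig1 + sig2, width)        # register, loaded at the clock edge
    return b


print(one_cycle(7))
```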

Page 45: Implementing Complex Algorithms in FPGAs

Interfaces - Introduction

> Interfaces allow Handel-C designs to connect to external hardware and logic.

> Three types of interfaces
  Buses – used for connecting to external pins
  Ports – used for creating connection points for external logic. e.g. Creating the ports for a VHDL entity
  User Defined – used for including external logic blocks inside a Handel-C design. e.g. Including an EDIF black box inside a design.

Page 46: Implementing Complex Algorithms in FPGAs

Interfaces – Buses

> Makes connections to pins on the FPGA.
  Bus types
    Output
    Input – direct, clocked and latched input
    Tri-state – direct, clocked and latched tri-state

interface bus_in(int 4) Address() with {data={P1,P2,P3,P4}};
x=Address.in;

Figure. Pins P1, P2, P3, P4 feed the Address bus, which is read into x.

Page 47: Implementing Complex Algorithms in FPGAs

Interfaces – Ports

> Allows connection points for external logic to be specified. e.g. Defining the ports for a ‘black box’ VHDL entity
  Port types: Input, Output

//Declare Ports
interface port_in(int 4 Input1) InputPort1();
interface port_in(int 4 Input2) InputPort2();
interface port_out() OutputPort(int 4 Output = OutReg);

Figure. Handel-C black box with ports Input1, Input2 and Output.

Page 48: Implementing Complex Algorithms in FPGAs

Interfaces – User Defined

> Allows external logic blocks to be used inside a Handel-C design. e.g. Using an EDIF core.

//Instantiate connections to core
interface pipe_mult(int 4 Result) Multiplier(int 4 A, int 4 B);

Figure. Handel-C Design connected to the EDIF Module pipe_mult.edf through A, B and Result.

Page 49: Implementing Complex Algorithms in FPGAs

Multiple Clock Domains - example

Domain1.c

chan unsigned 8 ComChan;

set clock = external "C1";

void main(void)
{
    unsigned 8 x;

    do
    {
        x++;
        ComChan ! x;
    } while(1);
}

Domain2.c

extern chan unsigned 8 ComChan;

set clock = external "C2";

void main(void)
{
    unsigned 8 y;

    do
    {
        ComChan ? y;
    } while(1);
}

Page 50: Implementing Complex Algorithms in FPGAs

Handel-C Summary

> Handel-C is based on ANSI C

> Well-defined semantics similar to OCCAM/CSP

> Additions:
  support for parallelism
  channels for communications between parallel processes
  operators for detailed control of hardware
  constructs for RAM, ROM, interfacing, etc.

Page 51: Implementing Complex Algorithms in FPGAs


Lab #1

Quick Start DK1, Handel-C and the RC100

Page 52: Implementing Complex Algorithms in FPGAs


Tool Connectivity

The Whole Y-Chart

Page 53: Implementing Complex Algorithms in FPGAs


Tool Connectivity

Page 54: Implementing Complex Algorithms in FPGAs


Black Boxes - Xilinx CoreGen

Page 55: Implementing Complex Algorithms in FPGAs

Co-Simulation with HDL

Page 56: Implementing Complex Algorithms in FPGAs


Co-Simulation with ISS

Page 57: Implementing Complex Algorithms in FPGAs

HW-SW Co-Simulation & Virtual Platforms

Page 58: Implementing Complex Algorithms in FPGAs

MatLab Simulink

Figure. Labels: Filter.hcc, Sfunc.cpp, dll.

Page 59: Implementing Complex Algorithms in FPGAs


Co-Simulation with System-C

Page 60: Implementing Complex Algorithms in FPGAs


Lab #2

Advanced Features

Page 61: Implementing Complex Algorithms in FPGAs


PDK – Platform Dev Kit

PDK, PAL and DSM

Page 62: Implementing Complex Algorithms in FPGAs

Introduction to PDK

> PDK – Platform Developer’s Kit

> Goal – to provide an integrated package of tools, support libraries and implementations to simplify application development and verification using DK1
  Insulate developer from hardware details
  Improve portability and maintainability
  Provide key pre-packaged value-adding functionality
  Allow simulation of the complete environment from modelling through to hardware implementation

> Benefits
  Reduce development time
  Allow development focus on application added value

Page 63: Implementing Complex Algorithms in FPGAs

Introduction to PDK

> PDK – Three major components

> DSM
  Integration between processors and FPGA/PLD

> PAL
  A consistent API for portable board-level Handel-C implementations

> PSL
  Provides board, hardware or development tool specific support for DK1 and Handel-C

Page 64: Implementing Complex Algorithms in FPGAs

Introduction to PDK

> Each PDK component provides four functional areas:

> Simulation
  Provides hardware independent simulation of DSM and PAL APIs and co-simulation with external tools and simulators

> Kit
  Provides key components and/or templates to allow development of new, platform specific, implementations

> Platform
  Platform specific implementations of DSM, PAL and PSL components

> Cores
  Implementations of added-value functionality, demos or examples

Page 65: Implementing Complex Algorithms in FPGAs

Platform Abstraction Layer (PAL)

Figure. Layer diagram. Labels: Handel-C Application; Platform Abstraction Layer Application Programming Interface; PAL-Core; Platform Support Library (PSL); Board; Peripheral 1; Peripheral 2.

Page 66: Implementing Complex Algorithms in FPGAs

DSM – Data Stream Manager

Figure. DSM block diagram. Labels: Processor, FPGA, Application (x2), Handel-C program (x2), Software DSM Library, Hardware DSM Library, Software Bus Controller, Hardware Bus Controller.

Page 67: Implementing Complex Algorithms in FPGAs


Labs #3 and #4

PDK:

PAL and DSM

Page 68: Implementing Complex Algorithms in FPGAs

Summary

> High performance gains with HW acceleration cards
  For appropriate algorithms

> Development kit enabling rapid design using a software-like development framework
  Celoxica DK and Handel-C

> Consultancy and Services

> For more: www.celoxica.com

Page 69: Implementing Complex Algorithms in FPGAs


Appendices

Technology Behind DK

Consultancy, Services, Projects

Case studies

University Programme

Page 70: Implementing Complex Algorithms in FPGAs


The Technology Behind DK

Page 71: Implementing Complex Algorithms in FPGAs


The Technology Behind DK

> Simple Hardware constructs

> Compilation Flow

> Optimisations

Page 72: Implementing Complex Algorithms in FPGAs

The Hardware Description

> Data Path
  Circuitry to Move/Manipulate/Store Data

> Control Path
  Circuitry to schedule operations

Page 73: Implementing Complex Algorithms in FPGAs

Control and Assignment

> Variables are mapped to hardware registers

> The control start signal forms the clock enable signal for the destination register of the assignment.

void main(void)
{
    …
    R=Exp;
    …
}

Figure. Implementation of Assignment. Labels: Exp, register R (D, Q, CLK, CE), Start, Finish.
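The role of Start as a clock enable can be modelled in a few lines: the register keeps its old value on every clock edge except the one where Start is high. This is a behavioural Python sketch with hypothetical names, not a description of the netlist DK actually emits.

```python
class EnabledRegister:
    """D-type register that only loads when its clock enable is high."""

    def __init__(self, initial=0):
        self.q = initial

    def clock_edge(self, d, enable):
        """Apply one rising clock edge and return the new Q."""
        if enable:
            self.q = d
        return self.q


def run_assignment(start_per_cycle, exp_value, initial=0):
    """Drive CE with a Start waveform; return Q after each clock edge."""
    reg = EnabledRegister(initial)
    return [reg.clock_edge(exp_value, start) for start in start_per_cycle]


# Start is high only in the third cycle, so R takes the value of Exp there.
print(run_assignment([0, 0, 1, 0], exp_value=9))
```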

Page 74: Implementing Complex Algorithms in FPGAs

The IF Construct

void main(void)
{
    …
    if (BE)
        S1;
    else
        S2;
    …
}

Figure. Control circuit for if. Labels: Start, BE, S1, S2, Finish.

Page 75: Implementing Complex Algorithms in FPGAs

Sequential Composition

void main(void)
{
    …
    S1;
    S2;
    S3;
    …
}

Figure. Control circuit for sequential composition. Labels: Start, S1, S2, S3, Finish.

Page 76: Implementing Complex Algorithms in FPGAs

Parallel Composition

void main(void)
{
    …
    par
    {
        S1;
        S2;
        S3;
    }
    …
}

Figure. Control circuit for parallel composition. Labels: Start, S1, S2, S3, three D/Q flip-flops, Finish.

Page 77: Implementing Complex Algorithms in FPGAs

Compilation Flow - Optimisations

> Generate AST from Source code
  Macro Expansion
  Width Inferencing
  Design Checking

> Compilation to High Level Netlist

> Expansion to technology specific netlist

Figure. Compilation flow after parsing. Stages: Abstract Syntax Tree, Compilation, High Level Netlist, High Level Optimisation (Technology Independent), Expansion, Gate Level Netlist, Low Level Optimisation (Technology Specific).

Page 78: Implementing Complex Algorithms in FPGAs

Re-Writing

> Logical equivalence
  (a) Constant 1 input to AND Gate removed
  (b) Gate removed with unused output
  (c) Block removed with unused output

Figure. Some re-writing optimizations.
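Rewrites (a) and (b) can be illustrated on a toy netlist. The Python sketch below is not DK's optimiser: the dictionary format, the function name and the requirement that gates appear in topological order are all assumptions made for the example.

```python
def rewrite(netlist, outputs):
    """Apply two rewrites to a netlist of {gate name: (op, [input names])}.

    1. Drop constant-1 inputs of AND gates; an AND left with one input
       becomes a plain wire to that input.
    2. Remove gates whose outputs cannot reach any listed output.
    Gates are assumed to be listed in topological order.
    """
    simplified = {}
    alias = {}                                   # gate name -> signal it reduces to
    for name, (op, inputs) in netlist.items():
        inputs = [alias.get(i, i) for i in inputs]
        if op == "AND":
            inputs = [i for i in inputs if i != "1"]
            if not inputs:                       # AND of constants only
                alias[name] = "1"
                continue
            if len(inputs) == 1:                 # AND(1, x) is just x
                alias[name] = inputs[0]
                continue
        simplified[name] = (op, inputs)

    live = set()
    stack = [alias.get(o, o) for o in outputs]
    while stack:
        signal = stack.pop()
        if signal in simplified and signal not in live:
            live.add(signal)
            stack.extend(simplified[signal][1])
    return {n: g for n, g in simplified.items() if n in live}, alias


example = {
    "g1": ("AND", ["1", "x"]),      # rewrite (a): reduces to x
    "g2": ("OR", ["g1", "y"]),
    "g3": ("XOR", ["x", "y"]),      # rewrite (b): output is never used
}
print(rewrite(example, outputs=["g2"]))
```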

Page 79: Implementing Complex Algorithms in FPGAs

Conditional Re-Writing

> Logical equivalence by testing for impossible Conditions
  Gates removed for circuit with output independent of y

Figure. Conditional re-writing.

Page 80: Implementing Complex Algorithms in FPGAs

Common Sub-Expression Elimination

> Test for common logic
  Duplicate AND gate removed

Figure. Common sub-expression elimination.
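Removing a duplicate AND gate amounts to noticing that two gates have the same operator and the same inputs, then redirecting users of the second to the first. A minimal Python sketch follows; the netlist format and function name are hypothetical, and gates are assumed to be listed in topological order.

```python
def eliminate_common_gates(netlist):
    """Merge gates that compute the same function of the same inputs.

    `netlist` maps gate name -> (op, [input names]) in topological order.
    Returns (reduced netlist, {removed gate: surviving gate}).
    """
    commutative = {"AND", "OR", "XOR"}
    seen = {}                # (op, inputs) -> first gate that computes it
    replaced = {}
    reduced = {}
    for name, (op, inputs) in netlist.items():
        inputs = [replaced.get(i, i) for i in inputs]
        key_inputs = tuple(sorted(inputs)) if op in commutative else tuple(inputs)
        key = (op, key_inputs)
        if key in seen:
            replaced[name] = seen[key]           # duplicate: reuse the earlier gate
        else:
            seen[key] = name
            reduced[name] = (op, inputs)
    return reduced, replaced


example = {
    "g1": ("AND", ["x", "y"]),
    "g2": ("AND", ["y", "x"]),      # same logic as g1
    "g3": ("OR", ["g1", "g2"]),
}
print(eliminate_common_gates(example))
```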

Page 81: Implementing Complex Algorithms in FPGAs


DK1 Optimisation Settings

Page 82: Implementing Complex Algorithms in FPGAs


Customer Highlights

Consultancy, Services, Projects

Case studies

Page 83: Implementing Complex Algorithms in FPGAs

Celoxica Expertise

> Technical Strengths
  Design Methodologies and Hardware Compiler technology
  FPGA board design and prototyping
  Image, Data processing and Multimedia
    Encryption
    Compression/Decompression
    Video Processing
  Telecommunications
    Routers/Switches
    Protocol stacks – IPv6, VoIP (H323, SIP), ATM
    Software defined radio – UMTS, 3G, DAB

> Business Consultancy
  Analysis, Marketing and Strategy
  Venture capital
  Services and Support

Page 84: Implementing Complex Algorithms in FPGAs

Marconi Celoxica Technology Demonstrator

> Internet Reconfigurable Hardware from Software
  FPGA based, no microprocessor or operating system
  Different applications from the same hardware
  Can be reconfigured over internet to new applications

> MMT 2000
  IP Phone
  MP3 player
  Games console
  Graphic display

Page 85: Implementing Complex Algorithms in FPGAs

High Speed Video Prototyping System

> Customer
  Requirement to shorten the evaluation time of video filter algorithms as candidates for use in DTVB

> Solution
  FPGA-based system comprising:
    Wealth of analogue and digital video I/O
    COTS boards and custom
    Development kit: DK and Video framework libraries (SW/HW)

> Outcome
  Real-time evaluation system rather than slow software models
  Algorithm Evaluation times reduced from 12 to 3-6 months
  Prototypes for ASIC process rather than software models

Figure. System block diagram. Labels: FPGA Host Card, DIME expansion site, Component Analogue Video Interface Module (x2), (HD) SDI Interface Module, SDI Input, SDI Output #1, SDI Output #2, Analogue Input #1, Analogue Output #1, Analogue Input #2, Analogue Output #2.

Page 86: Implementing Complex Algorithms in FPGAs

EuroSkyWay Multimedia Satellite: Ground Traffic Simulator

• Services: 512, 2048 kb/s, 8...32 Mb/s (provider)
• fixed and mobile users (aircraft, buses, vessels)
• service launch in 2004

EuroSkyWay PHY Board

• for system verification and end-to-end perf. testing
• generation of total network traffic (ATM, IP)
• full implementation of ESW protocols (layer 1/2/3)
• digital baseband transmission

Figure. Ground Traffic Simulator block diagram. Labels: GTS Controller, EVALAN, GTS Parameter Database, Access Manager (several), ECS, EPS, M&C, External traffic G/A, IP/ATM external traffic.

Figure. EuroSkyWay network diagram. Labels: SaT-A, SaT-B, SaT-C, SaT-B/C, PTN, GTW, PrT-A,-B, Service Provider Center, Network Operation Center, MCS, ISL, InSS, CLUSTER1 (1A, 1B), Collective Use, Individual Use, To/From Supported Networks. Link rates: 160 Kbps, 512 Kbps, 2048 Kbps, 512/2048 Kbps, 6.144 / 32.768 Mbps, 2 x 32.768 Mbps, 8 x 32.768 Mbps, n x 32.768 Mbps.

Page 87: Implementing Complex Algorithms in FPGAs

JPEG2000 MQ encoder implementation

                                      SCSD version    Traditional HDL implementation
Slices                                1,999           620
Device utilization                    18%             6%
Speed (MHz)                           115.5           76
Lines of code                         330             800
Design time (days)                    10 + 2          * 30+
Av cycles per code block (000’s)      108             67.5
Processing time (ms)                  0.939           0.888
Simulation time for Lena jpeg         5 minutes       XXX

Figure. Platform labels: IBM Power PC, Wind River SBC405 GP, Xilinx Virtex, Proteus FPGA daughter card.
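The processing times in the comparison are consistent with dividing the average cycles per code block by the clock speed. The Python sketch below redoes that arithmetic; the function name is hypothetical, and the relationship itself is an inference from the published figures rather than something the slide states.

```python
def processing_time_ms(cycles_per_block, clock_mhz):
    """Time for one code block: cycles divided by clock frequency, in ms."""
    return cycles_per_block / (clock_mhz * 1e6) * 1e3


scsd_ms = processing_time_ms(108_000, 115.5)   # SCSD version
hdl_ms = processing_time_ms(67_500, 76.0)      # traditional HDL version
print(round(scsd_ms, 3), round(hdl_ms, 3))
```

The HDL figure reproduces the quoted 0.888 ms. The SCSD figure comes out slightly below the quoted 0.939 ms, plausibly because the cycle count is rounded to the nearest thousand.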

Page 88: Implementing Complex Algorithms in FPGAs

Customer highlights

"The DK1 suite enables us to work at a high level, quickly optimise C code for hardware implementation, prototype using FPGAs and will ultimately provide the HDL output for our ASIC design." Shigeru Kawada, General Manager, NEC Electronics Singapore's technology centre

"I visited Celoxica's headquarters. While there, I re-implemented our existing VHDL solution using the DK1 suite in just one day. I was hooked." Jan Mennekens, chief technical officer, M-TEC WIRELESS.

A new joint development team to create powerful, flexible and scaleable application specific servers was announced today. Celoxica Ltd, Motorola and StrongBow Technologies are working together to create servers that embed applications, such as transaction processing for credit cards, directly in hardware. "Without Celoxica’s tools, this would not have been possible," Alan Prouse, CEO and founder of StrongBow Technologies.

"The real value of the Celoxica tools is the quick re-engineering capability and smooth transfer to a production platform. The DK1 methodology allows us to accomplish tasks in a time frame that conventional design methods cannot handle." Dennis Hazel, Director of Engineering, Foxboro

"Our evaluation of DK1 clearly demonstrated that the flow increases our engineering throughput, and allows us to make better use of our scarce hardware engineering resources. Using Celoxica's Handel-C to hardware flow, our software engineers can take a software solution through to hardware allowing the hardware designer to focus on system integration and optimisation." Andy Davey, senior engineer at Cogent Defence Systems

"Our original project plan was slated for 12-18 months using the traditional HDL design methodology. By adopting the Handel-C high level design language methodology, we were able to finish the project in 6 months with DK1 design suite and Xilinx ISE software targeting Xilinx VII-6000 FPGAs. We put in minimum development resource, but still met the design specification, timing far ahead of the schedule. Anyone can use the DK1 design suite to design efficient hardware." Gary Mallaley, Manager of Strategy Development at Northrop Grumman.

Page 89: Implementing Complex Algorithms in FPGAs

CUP: Celoxica University Programme

Recent Highlights

Page 90: Implementing Complex Algorithms in FPGAs


Introduction to CUP

> CUP has been active since the company was formed

> 700 universities worldwide registered with a multi-disciplinary user base

> Strategic relationship with XUP

> University specific products and services
  Heavily discounted

> Focused upon supporting innovative teaching and research

> Comprehensive Website: www.celoxica.com/programs/university/index

> Register Now!

Page 91: Implementing Complex Algorithms in FPGAs

Benefits to Universities

> Rapid Design Exploration
  Fit more interest into time dependent project work through rapid prototyping and productivity improvements
  Port prototyped C designs to Handel-C for implementation in FPGAs

> For Computer Science disciplines
  Familiar software environment
  Parallel programming environment
  Computer architecture exploration – build your own instruction sets
  Exploring hardware accelerated systems

> For EE disciplines
  Cycle accurate interactive simulation
  SW/HW co-design, system design and SOC
  Integration with HDLs

> Creates a bridge for increased collaboration between different disciplines

Page 92: Implementing Complex Algorithms in FPGAs

Update on University activity

> Research Articles:
  Customising Floating-Point Designs, Imperial College, Xilinx.
  Accelerating Radiosity Calculations using Reconfigurable Platforms, Altaf Abdul Gaffar and Wayne Luk, Imperial College
  A Hardware Implementation of a Genetic Programming System Using FPGAs and Handel-C, Peter Martin, University of Birmingham

> Teaching Programmes
  VDEC Japan now support DK1/Handel-C
  HARDWARE/SOFTWARE CO-DESIGN: A SHORT COURSE FOR UNBELIEVERS, A. Downton et al, University of Essex

Page 93: Implementing Complex Algorithms in FPGAs

IGOL Framework

What is it?

> COM based Framework for Development and Distribution of Hardware Acceleration
  Testing and debugging for development
  Runtime services and packaging for deployment

> Application Examples
  Premier, Photoshop, WinAmp, VirtualDub, DirectShow

Demonstrates

> Ease of Development and Deployment of Hardware Acceleration

> Separation of concerns
  Hardware developers only develop hardware
  Application developers only develop software

> Re-use of hardware and software components
  Simply updating and patching
  Automatic application support for new components