RESEARCH CENTRE FOR INTEGRATED MICROSYSTEMS – UNIVERSITY OF WINDSOR

Design Space Exploration Using Parameterized Cores

Ian D. L. Anderson
M.A.Sc. Candidate

March 31, 2006

Supervisor: Dr. M. Khalid

OUTLINE

• Introduction
• Designing Systems Using IP Cores
• Design Space Exploration (DSE)
• Genetic-based DSE Case Study
• Results

Introduction

Embedded Systems

• An embedded system: a device that utilizes computational hardware and application-specific software to carry out a specific task.

• Often hidden from the user of the device (i.e. “embedded” within a larger system)

Major Components of an Embedded System

• Digital Hardware:
  – Microprocessor (µP) or microcontroller (µC)
  – Application-specific hardware generally used for accelerating time-critical tasks

• Embedded software running on the µP or µC

[Figure: block diagram of an embedded system, containing the embedded CPU, the software running on the CPU, application-specific hardware, and memory & I/O]

The Challenges of Designing Embedded Systems

• Improvements in IC process technology enable more complex and intricate designs to be realized

• Therefore, designing from scratch is too expensive and time-consuming for many people

• Traditional or “co-design” methodology

[Figure: co-design flowchart with the stages System Specification; Hardware/Software Partitioning; Hardware Design, Hardware Synthesis, Placement & Routing; Software Development, Compiler, Assembler/Linker; HW/SW Interface Design; Integration & Testing; Final Embedded System]

Designing Systems Using IP Cores

Core-based Design

• It makes sense for many designers to use and re-use pre-designed and pre-tested hardware and software components

• These are generally known as “Intellectual Property (IP) Cores”

• Using IP cores reduces design time at the expense of some flexibility and an area/performance penalty

Three Classes of (Hardware) IP Cores

• Soft cores: components described in a hardware description language (HDL)

• Firm cores: gate-level netlists that are ready for technology mapping, placement and routing, etc.

• Hard cores: pre-placed and pre-routed circuits

[Figure: the three core classes by abstraction level. Soft core: HDL description (RT level). HDL synthesis leads to the firm core: logic primitives such as gates and FF’s (logic level). Tech. mapping, placement & routing, etc. lead to the hard core: circuit layout (circuit level). Abstraction and flexibility increase from hard core to soft core.]

Soft IP Cores

• Hardware components described in a hardware description language (HDL) such as VHDL, Verilog, etc.

• Some advantages of soft cores:
  • Higher level of abstraction – easier to understand
  • More flexible – designers can change the core by editing source code or selecting parameters (more on that later)
  • Platform independent – can be synthesized for any IC technology, incl. FPGAs, ASICs, etc.
  • More immune to obsolescence

Popular Examples of Soft IP Cores

• Altera Corp. – Nios and Nios II processors
  – Customizable embedded RISC microprocessors targeting certain Altera FPGAs

• Xilinx Inc. – MicroBlaze
  – Flexible 32-bit microprocessor for Xilinx FPGA families

• Tensilica Xtensa

• Open-source cores:
  – LEON2 and LEON3 by Gaisler Research
  – www.opencores.org

Parameterized Cores

• In order to increase core flexibility, many IP cores (esp. soft cores) are “parameterized”

• Certain aspects of the hardware’s architecture can be changed so that the core can be tailored to suit a specific application more closely
  • E.g. bit-widths, functional unit implementation, etc.

• “Parameters” are essentially variables with a finite set of possible values

• Assigning values to all parameters of a core produces one “configuration”

Classification of Core Parameters

• Static or dynamic parameters:
  • Static – must be set prior to chip fabrication (e.g. HDL generic statements)
  • Dynamic – can be set after chip fabrication, provided the chip has the proper facilities built in
    • Extreme example: FPGAs

• Two or more parameters can share interdependencies with each other:
  • Hard interdependency: requires simultaneous parameter selection for a valid configuration
  • Soft interdependency: value selection should be done simultaneously in order to create an optimal configuration
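As a minimal sketch of parameters, configurations and a hard interdependency, the Python below models an imaginary core. The parameter names, their values, and the rule that the parallel multiplier needs the 32-bit datapath are all hypothetical and chosen only for illustration; they do not describe any real core.

```python
import itertools

# Hypothetical parameters of an imaginary core (illustration only).
PARAMETERS = {
    "datapath_width": [16, 32],
    "num_registers": [8, 16, 32],
    "multiplier": ["none", "serial", "parallel"],
}

def all_configurations(parameters):
    """Yield every assignment of one value to each parameter."""
    names = list(parameters)
    for values in itertools.product(*(parameters[n] for n in names)):
        yield dict(zip(names, values))

def is_valid(config):
    """Hypothetical hard interdependency: the parallel multiplier is
    assumed to require the 32-bit datapath, so the two parameters
    cannot be chosen independently."""
    if config["multiplier"] == "parallel":
        return config["datapath_width"] == 32
    return True

configs = list(all_configurations(PARAMETERS))
valid = [c for c in configs if is_valid(c)]
print(len(configs), "configurations,", len(valid), "valid")
```

A soft interdependency would not change `is_valid`; it would only mean that the two parameters are best searched together rather than one at a time.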

Classification of Core Parameters (Cont’d)

• Classification by function:
  • Parameters affecting:
    1. The bit-width of parts of the core
       • Datapath width, width of address bus, etc.
    2. How many sub-components are instantiated
       • E.g. # of registers in register file
    3. The type or implementation of components being instantiated
       • E.g. multiplier implementation
    4. How components are connected together
    5. Some combination of 1, 2, 3 and 4

Design Space Exploration (DSE)

What is the Design Space?

• “Design space” – the set of all possible HW and SW configurations that will achieve the system’s required functionality

• Configurations are evaluated in terms of how well they meet “objectives”

• Design space often contains a large number of possibilities that are sub-optimal

• Therefore the design space should be “explored” to determine the best configuration for the job

The design space can be pictured as an n-dimensional space, where n is the number of objectives. For example, a 2-objective space:

[Figure: a design space plotted on the axes Objective 1 and Objective 2]

DSE and Multi-objective Optimization

• DSE is essentially a multi-objective optimization problem

• The designer must balance a set of competing objectives
  – e.g. minimizing chip area and power while maximizing performance

• Often, there is not one single “optimal” configuration, but rather a set called the “Pareto-optimal” set

[Figure: a 3-objective design space with the axes Objective 1, Objective 2 and Objective 3]

Pareto-Optimality

• When optimizing several objectives at once, a configuration is Pareto-optimal if you cannot improve on one objective without sacrificing another

• Example from geometry: maximize the areas of three non-overlapping circles, A, B and C, within the area of the triangle

[Figure: portrait of Vilfredo Pareto, and three triangles each containing circles A, B and C. Two arrangements are labelled Pareto-optimal; the third is labelled NOT Pareto-optimal (the area of C can be increased without reducing A or B).]

Pareto-Optimality (Cont’d)

• If you do not know the relative priority of each objective, then you are left with a set of “non-dominated” solutions

• No one solution is better than another, unless one knows which objectives have priority (e.g. it may be most important that circle A be larger)
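A minimal sketch of how a non-dominated set can be extracted, assuming every objective is to be minimized and using the common definition of dominance (no worse in every objective, strictly better in at least one). The (area, delay) pairs are made up for illustration and are not results from this study.

```python
def dominates(a, b):
    """True if objective vector a dominates b (all objectives minimized):
    a is no worse than b everywhere and strictly better somewhere."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def non_dominated(points):
    """Return the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Made-up (area, delay) pairs, for illustration only.
points = [(1500, 20.0), (2000, 15.0), (2500, 15.0), (3000, 12.0), (3200, 18.0)]
front = non_dominated(points)
print(front)  # [(1500, 20.0), (2000, 15.0), (3000, 12.0)]
```

Choosing one point from `front` still requires knowing which objective has priority, which is exactly the situation the bullets describe.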

The Pareto-optimal set lies on the lower boundary of the design space, known as the “Pareto-optimal front”.

[Figure: the design space plotted on the axes Objective 1 and Objective 2, with the Pareto-optimal front marked along its lower boundary]

DSE Using Parameterized Cores

• Many parameterized cores have multiple parameters and each parameter can have numerous possible values

• This can lead to potentially thousands, millions (or more) of different possible configurations

• Each parameter can affect the area, performance and power consumption of the core
  • Many configurations are sub-optimal

• The goal of DSE is to determine the set of combinations of parameter values that constitute the Pareto-optimal set of configurations

• The “best” configuration for an application can be chosen from that set

Automated Approaches to DSE

• Obviously, exhaustively searching the design space is tedious and a big waste of time when the number of parameters is large

• Therefore, a lot of research has focused on automating the process

• One of the most widely known and applied approaches involves using some form of a genetic or evolutionary-based algorithm

Genetic and Evolutionary Algorithms

• A class of optimization algorithms that have been applied to a wide array of problems

• Many variations, but they all have one thing in common: they take their inspiration from the field of biological sciences

• They attempt to emulate the biological process of natural selection

• They have been found to be good at solving multi-objective optimization problems

Genetic-based DSE Case Study

Objectives of Case Study

• Preliminary study with the following objective:
  • To investigate the feasibility of applying a genetic algorithm-based approach to a parameterized soft IP core with a sizeable design space in order to approximate its Pareto-optimal set of configurations

• Altera Nios soft-core processor was chosen as the test-case

• Ultimately this technique will be applied to other parameterized components in order to assist designers in deriving application-specific processing cores

• Nios is just a convenient test case

A Bit About the Altera Nios Processor

• Popular embedded RISC processor targeting Altera FPGAs

• Flexible; with the parameters shown in the table below

• With just the processor, there are a total of 15,696 possible configurations

Parameter | Possible Vals.
Datapath width | 16 or 32 bit
Instruction decoder | LE’s or ROM
Register file size | 128, 256 or 512
WVALID register | Read-only or writable
Instruction cache size | Off, 1, 2, 4, 8 or 16 kB
Data cache size | Off, 1, 2, 4, 8 or 16 kB
Integer multiplication | Software, MSTEP, MUL
Pipeline optimization | More stalls, Fewer stalls
Support RLC/RRC | Yes or No
Support interrupts/traps | Yes or No
Support OCI Module | Yes or No
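To give a feel for the size of this space, the sketch below writes the Nios parameter table from this slide as a Python dictionary and counts the combinations. The dictionary keys and the value spellings are my own labels, not Altera identifiers. If every parameter could be chosen independently, the count would be 41,472; the slide quotes 15,696, so presumably some combinations are ruled out by interdependencies that the table does not list. That explanation is an assumption, and the excluded combinations are not modelled here.

```python
import itertools
import math

# Values transcribed from the parameter table; key names are illustrative.
NIOS_PARAMETERS = {
    "datapath_width": [16, 32],
    "instruction_decoder": ["LE", "ROM"],
    "register_file_size": [128, 256, 512],
    "wvalid_register": ["read-only", "writable"],
    "icache_kb": [0, 1, 2, 4, 8, 16],   # 0 stands for "Off"
    "dcache_kb": [0, 1, 2, 4, 8, 16],   # 0 stands for "Off"
    "integer_multiplication": ["software", "MSTEP", "MUL"],
    "pipeline_optimization": ["more stalls", "fewer stalls"],
    "support_rlc_rrc": [False, True],
    "support_interrupts": [False, True],
    "support_oci": [False, True],
}

def unconstrained_count(parameters):
    """Number of configurations if all parameters were independent."""
    return math.prod(len(values) for values in parameters.values())

def configurations(parameters):
    """Lazily yield each unconstrained configuration as a dict."""
    names = list(parameters)
    for values in itertools.product(*parameters.values()):
        yield dict(zip(names, values))

first = next(configurations(NIOS_PARAMETERS))
print(unconstrained_count(NIOS_PARAMETERS))  # 41472
```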

The SEAMO Algorithm

• The Simple Evolutionary Algorithm for Multi-objective Optimization (SEAMO) by C. Valenzuela (2002) was chosen as the exploration algorithm

• It is population-based – it maintains a set or “population” of configurations rather than just a single solution

• As the algorithm progresses, it gradually “evolves” the population until it converges towards the Pareto-optimal set

How it works…

• Parameters of the core are represented as “genes” – discrete variables (p_i) with a finite set of possible values

• Configurations are represented as strings of n genes called “chromosomes”

[Figure: a “chromosome” drawn as a string of “genes” p1, p2, p3, …, pn]

How it works… (Cont’d)

• The “population” is made up of a set of N chromosomes

• Each chromosome has an “objective vector” which stores the values of each objective separately

• There can be any number of objectives

[Figure: the “population” drawn as N chromosomes (p1, p2, …, pn), numbered 1 to N, each paired with its “objectives” vector (o1, o2)]

The Algorithm - Initialization

• Create an initial population of N individuals randomly

• Evaluate the objective vectors for each chromosome

• Record the “best-so-far” values for each objective
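The initialization step can be sketched as follows. The gene value sets and the two objective formulas in `evaluate` are invented placeholders (they are not the study's area and delay equations), and both objectives are assumed to be minimized.

```python
import random

# Hypothetical gene value sets: GENE_VALUES[i] lists the legal values of p_i.
GENE_VALUES = [[16, 32], [128, 256, 512], [0, 1, 2, 4, 8, 16]]

def evaluate(chromosome):
    """Placeholder objective vector (o1, o2), both minimized.
    The formulas are made up purely for this example."""
    width, regs, cache_kb = chromosome
    area = width * 10 + regs + cache_kb * 50
    delay = 30.0 - width / 4 - cache_kb / 2
    return (area, delay)

def initialize(n, rng):
    """Random population of n chromosomes, their objective vectors,
    and the best (lowest) value seen so far for each objective."""
    population = [[rng.choice(vals) for vals in GENE_VALUES] for _ in range(n)]
    objectives = [evaluate(c) for c in population]
    num_objectives = len(objectives[0])
    best_so_far = [min(o[k] for o in objectives) for k in range(num_objectives)]
    return population, objectives, best_so_far

population, objectives, best_so_far = initialize(10, random.Random(1))
print(best_so_far)
```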

[Figure: the “population” of N chromosomes with their “objectives” vectors (o1, o2), plus a “best-so-far” record holding the best o1 and o2 values]

The Algorithm – Offspring Creation

• For each chromosome in the population:
  – Pair with another, randomly selected individual
  – Apply the “crossover” operator to produce an “offspring”
  – “Mutate” the offspring
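A minimal sketch of the two operators, assuming one-point crossover (genes before the random cut-point come from the first parent, the rest from the second) and a mutation that changes exactly one randomly chosen gene. The gene value sets are hypothetical, and which parent contributes which side of the cut is my assumption.

```python
import random

# Hypothetical legal values for each gene position.
GENE_VALUES = [[16, 32], [128, 256, 512], [0, 1, 2, 4, 8, 16],
               ["software", "MSTEP", "MUL"]]

def crossover(parent1, parent2, rng):
    """One-point crossover at a random cut-point (never at the ends)."""
    cut = rng.randrange(1, len(parent1))
    return parent1[:cut] + parent2[cut:]

def mutate(chromosome, rng):
    """Return a copy with one random gene changed to another legal value."""
    child = list(chromosome)
    i = rng.randrange(len(child))
    choices = [v for v in GENE_VALUES[i] if v != child[i]]
    child[i] = rng.choice(choices)
    return child

rng = random.Random(0)
p1 = [16, 128, 0, "software"]
p2 = [32, 512, 16, "MUL"]
offspring = mutate(crossover(p1, p2, rng), rng)
print(offspring)
```

Because every gene stays within its own value set, the offspring is always a syntactically legal configuration; any hard interdependencies would need a separate check.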

[Figure: crossover and mutation. Crossover: Parent 1 and Parent 2 are combined at a random cut-point to form the offspring. Mutation: a gene of the offspring is selected at random and changed to another possible value.]

The Algorithm – Replacement Strategy

• Parent chromosomes are replaced by their offspring based on three rules:
  1. Parents are replaced only by their own offspring
  2. Offspring only replace parents if they are superior (“elitist strategy”)
  3. Duplicates in the population are deleted

• The newly formed offspring is evaluated based on its objectives

• One of the two parents is replaced by the offspring if the offspring:
  • Improves on one of the “best-so-far” values
  • Dominates a parent (i.e. is superior in all objectives)

• If the offspring already exists in the population, then it is deleted
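The replacement rules can be sketched as a single function. This follows the bullets above rather than the original SEAMO paper, and it makes two simplifying assumptions that the slide does not specify: a dominated parent is preferred as the one to replace, and when the offspring only improves a best-so-far value it replaces the first parent. All objectives are assumed to be minimized.

```python
def dominates(a, b):
    """Slide's definition: a is superior to b in all objectives."""
    return all(x < y for x, y in zip(a, b))

def try_replace(population, objectives, best_so_far, i, j, child, child_obj):
    """Apply the replacement rules to the parents at indices i and j.
    Returns the replaced index, or None if the offspring is discarded."""
    if child in population:                       # rule 3: no duplicates
        return None
    target = None
    for parent in (i, j):                         # rules 1 and 2
        if dominates(child_obj, objectives[parent]):
            target = parent
            break
    improves_best = any(c < b for c, b in zip(child_obj, best_so_far))
    if target is None and improves_best:
        target = i                                # assumption: first parent
    if target is None:
        return None
    population[target] = child
    objectives[target] = child_obj
    for k, c in enumerate(child_obj):
        best_so_far[k] = min(best_so_far[k], c)
    return target

# Made-up two-gene chromosomes with (area, delay) objective vectors.
population = [[16, 128], [32, 512]]
objectives = [(2000, 20.0), (3000, 12.0)]
best_so_far = [2000, 12.0]

a = try_replace(population, objectives, best_so_far, 0, 1, [16, 256], (1900, 19.0))
b = try_replace(population, objectives, best_so_far, 0, 1, [32, 512], (3000, 12.0))
c = try_replace(population, objectives, best_so_far, 0, 1, [32, 128], (2500, 25.0))
print(a, b, c)  # 0 None None
```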

The Algorithm – Iteration

• After all individuals in the population have had a chance to produce offspring, one “generation” of the algorithm has passed

• The algorithm will pass through several generations before the population converges

• The population size, N, and the number of generations, G, constitute the parameters of the algorithm

• Also, the number of genes in the chromosome and the number of objectives can be changed to fit different problems
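The outer structure of the algorithm, with its two parameters N (population size) and G (number of generations), might look like the sketch below. It is an illustrative Python outline, not the implementation used in the study. The toy objectives, the gene range 0 to 3, and the simplified acceptance rule (an offspring replaces a parent only if it is superior in all objectives; the best-so-far rule is left out to keep the loop short) are all assumptions.

```python
import random

def dominates(a, b):
    """a is superior to b in all objectives (minimization)."""
    return all(x < y for x, y in zip(a, b))

def evaluate(chromosome):
    """Toy objectives, both minimized and in conflict with each other."""
    return (sum(chromosome), sum((3 - g) ** 2 for g in chromosome))

def make_offspring(p1, p2, rng):
    """One-point crossover followed by a single-gene mutation."""
    cut = rng.randrange(1, len(p1))
    child = p1[:cut] + p2[cut:]
    i = rng.randrange(len(child))
    child[i] = rng.choice([v for v in range(4) if v != child[i]])
    return child

def run(population, generations, rng):
    population = [list(c) for c in population]    # work on a copy
    objectives = [evaluate(c) for c in population]
    n = len(population)
    for _ in range(generations):                  # G generations
        for i in range(n):                        # each individual gets a turn
            j = rng.choice([k for k in range(n) if k != i])
            child = make_offspring(population[i], population[j], rng)
            if child in population:               # discard duplicates
                continue
            child_obj = evaluate(child)
            for parent in (i, j):
                if dominates(child_obj, objectives[parent]):
                    population[parent] = child
                    objectives[parent] = child_obj
                    break
    return population, objectives

rng = random.Random(42)
start = [[rng.randrange(4) for _ in range(6)] for _ in range(10)]   # N = 10
final, final_obj = run(start, generations=20, rng=rng)              # G = 20
```

Since a slot is only ever overwritten by an offspring that beats its current occupant in every objective, no slot's objective vector can get worse from one generation to the next.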

Evaluation of Configurations

• Each individual in the population needs to be evaluated in terms of its objectives

• In this case study, objectives are to:
  • Minimize equivalent LE usage on a Stratix FPGA
  • Minimize critical path delay

• 47 different Nios configurations were synthesized; area and delay data were collected from Quartus II reports

• Using these data, area and delay estimation equations were established using n-dimensional regression techniques
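The slides do not give the form of the estimation equations or the exact regression method, so the following is only a sketch of one plausible approach: an ordinary least-squares fit of a linear model, solved through the normal equations in plain Python. The two predictor variables and the data points are synthetic (generated from a known linear formula so the fit can be checked) and are not the study's measurements.

```python
def fit_linear(xs, ys):
    """Least-squares fit of y ~ b0 + b1*x1 + ... + bn*xn
    using the normal equations (A^T A) b = A^T y."""
    rows = [[1.0] + list(x) for x in xs]          # prepend intercept column
    m = len(rows[0])
    # Augmented matrix [A^T A | A^T y].
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(m)]
           + [sum(r[i] * y for r, y in zip(rows, ys))] for i in range(m)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(m):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][m] / aug[i][i] for i in range(m)]

def predict(coeffs, x):
    """Evaluate the fitted linear model at the point x."""
    return coeffs[0] + sum(c * v for c, v in zip(coeffs[1:], x))

# Synthetic samples: x = (datapath width, cache size in kB),
# y generated from 500 + 40*width + 60*cache (not real synthesis data).
xs = [(16, 0), (32, 0), (16, 4), (32, 8), (16, 16), (32, 2)]
ys = [500 + 40 * w + 60 * c for w, c in xs]

coeffs = fit_linear(xs, ys)
print([round(c, 3) for c in coeffs])
```

Once fitted, `predict` is cheap enough to call for every offspring, which is the point of replacing synthesis runs with estimation equations inside the search loop.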

Results (To be presented at CCECE06)

Implementation of the Algorithm

• Testing of objective functions for 20 random test cases:
  • Area estimation: within 7.22% of actual values (on average)
  • Delay estimation: within 7.58% of actual values (on average)

• Estimation equations were integrated into a C++ implementation of the SEAMO algorithm

• The algorithm was run for various population sizes to determine suitable values
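The slide does not say exactly how "within X% of actual values (on average)" was computed. One common reading is the mean absolute percentage error, sketched below under that assumption; the sample numbers are made up and are not the study's test cases.

```python
def mean_abs_pct_error(estimated, actual):
    """Average of |estimate - actual| / |actual|, expressed in percent."""
    errors = [abs(e - a) / abs(a) for e, a in zip(estimated, actual)]
    return 100.0 * sum(errors) / len(errors)

# Made-up LE counts for three hypothetical test cases.
actual = [2000.0, 2500.0, 4000.0]
estimated = [2100.0, 2375.0, 4000.0]

error = mean_abs_pct_error(estimated, actual)
print(f"{error:.2f}%")  # 3.33%
```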

Algorithm Convergence Characteristics

[Figure: Average LE Usage Vs. Generation. Average equivalent LE's (axis from 1500 to 4000) plotted against generation (1 to 49), with one curve per population size from 10 to 50 in steps of 5.]

[Figure: Average Delay Vs. Generation. Average delay in ns (axis from 10 to 22) plotted against generation (1 to 50), with one curve per population size from 10 to 50 in steps of 5.]

Results

[Figure: Area Versus Critical Path Delay for Initial and Evolved Population. Area in equivalent LE's (axis from 1000 to 7000) plotted against critical path delay in ns (axis from 10 to 30), showing the initial population and the population after 20 generations.]

Conclusions and Future Work

• The purpose of this study was to investigate the feasibility of using a genetic algorithm to design embedded systems
  • It is still a work in progress…

• Genetic algorithms may be useful in assisting designers to make good decisions when deriving application-specific components from parameterized cores

• Current work involves the development of a tool that will utilize a genetic approach to semi-automatically generate application-specific soft processors from parameterized components

References

• [1] Altera Corporation, “Nios 3.0 CPU datasheet”, October 2004, Version 2.2

• [2] Altera Corporation Website, www.altera.com, February 2006

• [3] Altera Corporation, “Nios embedded processor 16-bit programmer's reference manual”, January 2004, Version 3.1

• [4] Altera Corporation, “Nios embedded processor 32-bit programmer's reference manual”, January 2003, Version 3.1

• [5] Altera Corporation, “Avalon bus specification reference manual”, July 2003, Version 2.3

References (Cont’d)

• [6] C. L. Valenzuela, “A simple evolutionary algorithm for multi-objective optimization (SEAMO)”, Proceedings of the 2002 Congress on Evolutionary Computation, 2002, CEC '02, vol. 1, 12-17 May 2002, pp. 717-722

• [7] P. K. Jha and N. D. Dutt, “Rapid estimation for parameterized components in high-level synthesis”, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 1, issue 3, Sept. 1993, pp. 296-303

• [8] P. Yiannacouras, “The microarchitecture of FPGA-based soft processors”, Master's Thesis, University of Toronto, 2005, pp. 47-48