ICCAD’01: November, 2001

Instruction Generation for Hybrid Reconfigurable Systems

Ryan Kastner, Seda Ogrenci-Memik, Elaheh Bozorgzadeh and Majid Sarrafzadeh
{kastner,seda,elib,majid}@cs.ucla.edu

Embedded and Reconfigurable Systems Group
Computer Science Department
UCLA
Los Angeles, CA 90095



Outline

Introduction
  Programmability
  Hybrid Reconfigurable Systems
  Strategically Programmable System
Instruction Generation
  Uses in Hybrid Reconfigurable Systems
  Relation to Template Generation and Matching
Algorithm for Template Generation and Matching
Experiments
Conclusion

Programmability

Future systems need programmability at multiple levels of the computational hierarchy

Computational hierarchy:

                           Gate level                        Microarchitecture level      Architecture level
Programmability            Bit                               Byte                         Instruction (8 – 128 bits)
Basic unit of computation  Boolean operation (and, or, xor)  Arithmetic operation         Functional operation
Communication              Direct wire connections           Bundles of wires, registers  Bus, memory

[Figure: block diagrams for the three levels, with labels ADD, MUL, FU, Register, Register Bank, Memory and Control]

Hybrid Reconfigurable Systems have programmability at one or more levels

Tradeoffs

                             Architecture level                   Microarchitecture level           Gate level
Types of programmable units  Custom instructions, register banks  Datapath unit, control unit, RAM  CLBs, LUTs
Example platform             Tensilica, Improv                    Chameleon Systems                 Xilinx, Altera

[Figure: flexibility and configuration time across the three levels; configuration time ranges from hundreds of cycles to thousands of cycles]

Hybrid Reconfigurable Systems should find a happy medium

SPS - Strategically Programmable System

Embed (hard or soft) computational units, called Versatile Programmable Blocks (VPBs), into an FPGA-like fabric
Combine programmable units from the gate, microarchitecture and architecture levels
Balance flexibility and configuration time

[Figure: FPGA-like fabric with embedded VPB and Memory blocks]

Need an automated method of determining the functionality of VPBs

Overview of SPS

Set of applications specified in high-level code (C/C++, Fortran, MOC)
• Compile to low-level specification
• Determine VPB functionality

[Figure: SPS flow with blocks labeled SPS Compiler, SPS Architecture Generation, VPB Synthesis, SPS Module Placement, Routing Arch. and SPS Architecture]

VPB Instruction Generation

Given a set of applications, what computation should be implemented on VPBs?
Want complex, commonly occurring computation patterns
Look for computational patterns at the instruction level
  Basic operation is add, multiply, shift, etc.

[Figure: a set of applications mapped onto a fabric of VPB and RAM blocks]

Problem Definition

Determining VPB functionality requires regularity extraction
Regularity extraction: find common sub-structures (templates) in one graph or a collection of graphs
Each application can be specified by a collection of graphs (CDFGs)
Templates are implemented as VPBs
Two related sub-problems:
  Template Matching
  Template Generation

Template Matching – Formal Def’n

Problem 1: Given a directed, labeled graph G(N, A) and a library of templates, each of which is a directed, labeled graph Ti(V, E), find every subgraph of G that is isomorphic to any Ti.

[Figure: a directed labeled graph G with nodes labeled +, *, &, % and ||, next to a library of templates T1 through T6]
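To make Problem 1 concrete, here is a minimal brute-force sketch in Python. It is not the matching procedure from the talk. The graph encoding (a dict of node labels plus a set of directed edges), the function name `find_matches` and the example graph are all assumptions made for illustration. It also checks only that every template edge is present in G, which is a looser test than induced subgraph isomorphism.

```python
from itertools import permutations

def find_matches(g_labels, g_edges, t_labels, t_edges):
    """Return every mapping (template node -> graph node) that preserves
    node labels and maps each template edge onto an edge of the graph."""
    t_nodes = list(t_labels)
    matches = []
    # Try every ordered choice of distinct graph nodes (exponential; fine for tiny templates).
    for image in permutations(g_labels, len(t_nodes)):
        mapping = dict(zip(t_nodes, image))
        if any(t_labels[v] != g_labels[mapping[v]] for v in t_nodes):
            continue  # a node label does not agree
        if all((mapping[u], mapping[v]) in g_edges for u, v in t_edges):
            matches.append(mapping)
    return matches

# Hypothetical graph G: two multiplies, each feeding an add, and one add feeding the other.
g_labels = {1: "*", 2: "+", 3: "*", 4: "+"}
g_edges = {(1, 2), (3, 4), (2, 4)}

# Hypothetical template: a multiply feeding an add.
t_labels = {"a": "*", "b": "+"}
t_edges = {("a", "b")}

matches = find_matches(g_labels, g_edges, t_labels, t_edges)
print(matches)
```

A template with symmetries would be reported once per automorphism, so a practical matcher would deduplicate matches by their node sets.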

Template Matching – Formal Def’n

Problem 2: Given an infinite number of each of a set of templates Ω = {T1, …, Tk} and an overlapping set of subgraphs of the given graph G(N, E) which are isomorphic to some member of Ω, minimize k as well as Σ xi, where xi is the number of templates of type Ti used, such that the number of nodes left uncovered is the minimum.

[Figure: the directed labeled graph G covered by instances of the templates]
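The covering in Problem 2 can be approximated with a simple greedy pass, sketched below in Python. This is a heuristic chosen for illustration, not the method from the talk, and it does not guarantee an optimal cover. The function name `greedy_cover`, the match encoding (template name plus the set of graph nodes it covers) and the example data are assumptions.

```python
from collections import Counter

def greedy_cover(nodes, matches):
    """Pick non-overlapping matches, largest first.

    matches: list of (template_name, frozenset_of_graph_nodes).
    Returns the chosen matches, the count x_i per template, and the uncovered nodes."""
    covered, chosen = set(), []
    # Sort by size (descending), then by name and node ids to keep the result deterministic.
    ordered = sorted(matches, key=lambda m: (-len(m[1]), m[0], sorted(m[1])))
    for name, sub in ordered:
        if covered.isdisjoint(sub):  # each node may be covered by one template instance only
            chosen.append((name, sub))
            covered |= sub
    usage = Counter(name for name, _ in chosen)
    return chosen, usage, set(nodes) - covered

# Hypothetical overlapping matches on a five-node graph.
nodes = {1, 2, 3, 4, 5}
matches = [
    ("T1", frozenset({1, 2})),
    ("T1", frozenset({2, 3})),
    ("T2", frozenset({3, 4, 5})),
]
chosen, usage, uncovered = greedy_cover(nodes, matches)
print(dict(usage), uncovered)
```

Here k is the number of distinct template names in `usage` and the sum of its values is the total number of template instances used.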

Template Generation

Templates may not always be given as input
An automatic regularity extraction algorithm must develop its own templates
Generate a set of templates such that:
  Number of templates is minimized
  Covering of the graph is maximized

Related Work

Useful in a wide variety of CAD applications
  Data path regularity [Chowdhary98], [Callahan99]
  Scheduling [Ly95]
  System partitioning [Rao93]
  Low power design [Mehra96]
  Soft macros: CPR [Cadambi99] for the PipeRench architecture

An Algorithm for Simultaneous Template Generation and Matching

Formal Definition

  Given a labeled digraph G(V, E)
  # C is a set of edge types
  C ← ∅
  while (stop_conditions_not_met(G))
      C ← profile_graph(G)
      cluster_common_edges(G, C)

Informal Definition

1. Find the most common edge type
2. Contract common edges
3. Repeat until stopping condition met
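The loop above can be sketched in Python as follows. This is one possible reading under stated assumptions, not the authors' implementation: the graph is a dict of node labels plus a set of directed edges, a contracted node is given a composite label such as "(*->+)", overlapping candidate edges are resolved first come first served, and the parameters `min_count` and `max_templates` are hypothetical stopping thresholds. The function names follow the pseudocode, but their signatures are invented here.

```python
from collections import Counter

def profile_graph(labels, edges):
    """Count how often each edge type (source label, target label) occurs."""
    return Counter((labels[u], labels[v]) for u, v in edges)

def cluster_common_edges(labels, edges, edge_type):
    """Contract node-disjoint edges of the given type and return the new graph."""
    used, picked = set(), []
    for u, v in sorted(edges):
        if (labels[u], labels[v]) == edge_type and u not in used and v not in used:
            picked.append((u, v))
            used.update((u, v))
    merged = {v: u for u, v in picked}  # the target node folds into the source node

    def rep(n):
        return merged.get(n, n)

    new_labels = {n: lab for n, lab in labels.items() if n not in merged}
    for u, v in picked:
        new_labels[u] = f"({labels[u]}->{labels[v]})"  # composite label names the template
    # Redirect edges to the merged nodes and drop the contracted edges themselves.
    new_edges = {(rep(a), rep(b)) for a, b in edges if rep(a) != rep(b)}
    return new_labels, new_edges

def generate_templates(labels, edges, min_count=2, max_templates=10):
    templates = []
    while len(templates) < max_templates:
        profile = profile_graph(labels, edges)
        if not profile:
            break
        edge_type, count = max(profile.items(), key=lambda kv: (kv[1], kv[0]))
        if count < min_count:  # no frequently occurring edge type is left
            break
        templates.append(edge_type)
        labels, edges = cluster_common_edges(labels, edges, edge_type)
    return templates, labels, edges

# Hypothetical graph: three multiply-add pairs whose adds form a chain.
labels = {1: "*", 2: "+", 3: "*", 4: "+", 5: "*", 6: "+"}
edges = {(1, 2), (3, 4), (5, 6), (2, 4), (4, 6)}
templates, final_labels, final_edges = generate_templates(labels, edges)
print(templates)
```

Because a contracted node keeps its composite label, later iterations can grow larger templates out of earlier ones, which is how generation and matching happen in the same pass.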

Explanation of Algorithm

Edge contraction: merge adjacent nodes and maintain connectivity
Stopping conditions
  Reach a certain number of templates
  Graph sufficiently covered
  No frequently occurring edge type
Profile edges: find the most common edge types

[Figure: a graph of + and * nodes with the most common edge type marked, shown before and after the edge is contracted]
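The three stopping conditions can be written as a single predicate, sketched here in Python. The function name, its arguments and all three default thresholds are assumptions for illustration; the slides do not give concrete values for the template limit or the coverage target.

```python
def stop_conditions_met(num_templates, covered_nodes, total_nodes,
                        top_edge_count, total_edges,
                        max_templates=30, target_coverage=0.9,
                        min_edge_fraction=0.05):
    """Return True when any of the three stopping conditions holds.

    All thresholds are hypothetical defaults, not values from the talk."""
    # 1. Reached a certain number of templates.
    if num_templates >= max_templates:
        return True
    # 2. Graph sufficiently covered.
    if total_nodes and covered_nodes / total_nodes >= target_coverage:
        return True
    # 3. No frequently occurring edge type: the most common type is too rare.
    if total_edges == 0 or top_edge_count / total_edges < min_edge_fraction:
        return True
    return False

# Hypothetical state: 5 templates so far, half the nodes covered,
# and the most common edge type accounts for 10 of 40 edges.
print(stop_conditions_met(5, 50, 100, 10, 40))
```

Checking the conditions in this order is a design choice only; since any one of them ends the loop, the order does not change the outcome.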

Algorithm in Action

[Figure: a graph with nodes labeled *, >>, %, & and +, shown at iteration 2. The candidate edges are marked Edge 1, Edge 2, Edge 3 and Edge 4, and a conflict graph is drawn over them.]

Create conflict graph
Determine MIS
Contract edges 2 and 4
The contracted edges become templates
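The conflict graph step can be sketched in Python as below. Two candidate edges conflict when they share a node, because a node can be contracted into only one template instance. The slides do not say how the MIS is determined, so the greedy minimum-degree heuristic here is an assumption; it yields a maximal independent set, which is not guaranteed to be a maximum one. The function names and the example chain are hypothetical and are not the graph from the slide.

```python
from itertools import combinations

def conflict_graph(candidates):
    """candidates: dict edge_name -> (u, v). Edges conflict when they share a node."""
    conflicts = {name: set() for name in candidates}
    for a, b in combinations(candidates, 2):
        if set(candidates[a]) & set(candidates[b]):
            conflicts[a].add(b)
            conflicts[b].add(a)
    return conflicts

def greedy_independent_set(conflicts):
    """Repeatedly keep the vertex with the fewest remaining conflicts."""
    remaining = {n: set(adj) for n, adj in conflicts.items()}
    chosen = []
    while remaining:
        pick = min(remaining, key=lambda n: (len(remaining[n]), n))
        chosen.append(pick)
        removed = remaining[pick] | {pick}  # drop the pick and everything it conflicts with
        remaining = {n: adj - removed
                     for n, adj in remaining.items() if n not in removed}
    return sorted(chosen)

# Hypothetical chain of five multiplies a -> b -> c -> d -> e.
candidates = {"e1": ("a", "b"), "e2": ("b", "c"), "e3": ("c", "d"), "e4": ("d", "e")}
conflicts = conflict_graph(candidates)
to_contract = greedy_independent_set(conflicts)
print(to_contract)
```

The edges returned in `to_contract` are pairwise node-disjoint, so they can all be contracted in the same iteration.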

Algorithm Summary

Algorithm can be generalized and used in a variety of applications
Easily extended to hypergraphs
Input/output pin restrictions can easily be added
Performs template generation and matching simultaneously

We target the algorithm towards VPB generation in SPS

Experimental Setup

Set of applications specified in C (MediaBench files)
Compile to CDFGs
  SUIF & Machine-SUIF produce a control flow graph
  A dataflow graph generation pass produces a control dataflow graph
Perform template generation and matching
Gather statistics: graph coverage, number of templates

Experimental Setup - Benchmarks

Selected files from MediaBench

Benchmark  C File        Description
mpeg2      motion.c      Motion vector decoding
mpeg2      getblk.c      DCT block decoding
adpcm      adpcm.c       ADPCM to/from 16-bit PCM
epic       convolve.c    2D general image convolution
jpeg       jctrans.c     Transcoding compression
jpeg       jdmerge.c     Color conversion
rasta      fft.c         Fast Fourier Transform
rasta      noise_est.c   Noise estimation functions
gsm        gsm_decode.c  GSM decoding
gsm        gsm_encode.c  GSM encoding

Similarity Across Applications

                   MediaBench file name
Operation          motion   jdmerge  getblk   gsm_dec  jctrans
ADD                50.3%    84.6%    44.5%    29.6%    84.6%
MUL                36.3%    13.8%    24.0%    22.4%    13.8%

Template coverage  motion   jdmerge  getblk   gsm_dec  jctrans
MUL-MUL            0.0%     0.0%     1.3%     0.0%     0.0%
ADD-ADD            14.5%    9.1%     3.2%     3.6%     9.1%
ADD-MUL            0.0%     0.4%     0.6%     0.0%     0.4%
MUL-ADD            36.3%    13.0%    21.5%    22.4%    13.0%

Experimental Results

[Figure: % nodes covered (30% to 90%) versus number of templates (0 to 30), with one curve for "No restrictions" and one for "Simple"]

Techniques
  Simple: restrict templates to two operations
  No restrictions: unlimited number of operations
Stopping condition: most common edge occurs < x% (x = 5 to 25)

Summary

Systems need programmability at multiple levels of the computational hierarchy
Introduced SPS as a Hybrid Reconfigurable System
Developed an instruction generation algorithm to determine VPB functionality
Showed that common templates can be found across a similar set of applications
An efficient covering is possible using simple templates
Future work: create methods to uncover more complex templates