OPTIMAL FSMD PARTITIONING FOR LOW POWER

Nainesh Agarwal and Nikitas Dimopoulos, Electrical and Computer Engineering, University of Victoria

Upload: lance

Post on 17-Jan-2016


DESCRIPTION

OPTIMAL FSMD PARTITIONING FOR LOW POWER. Nainesh Agarwal and Nikitas Dimopoulos, Electrical and Computer Engineering, University of Victoria. Summary: power and energy; power gating; partitioning as a means to achieve optimal power gating; what next.

TRANSCRIPT

Page 1: OPTIMAL FSMD PARTITIONING FOR LOW POWER

OPTIMAL FSMD PARTITIONING FOR LOW POWER

Nainesh Agarwal and Nikitas Dimopoulos
Electrical and Computer Engineering
University of Victoria

Page 2: OPTIMAL FSMD PARTITIONING FOR LOW POWER

Summary

Power and energy
Power gating
Partitioning as a means to achieve optimal power gating
What next

Page 3: OPTIMAL FSMD PARTITIONING FOR LOW POWER

Computation Power and Energy

What is the minimum energy a computation can expend?

Are we there yet?

Page 4: OPTIMAL FSMD PARTITIONING FOR LOW POWER

Computation Power and Energy cont’d

Feynman gives a relation between free energy and computation rate for reversible computation
– $E = kT \log r$
– where $r$ is the computation rate.
This means that, in the limit, we may expend zero energy (when $r = 1$), but the computation will then take infinitely long.

Page 5: OPTIMAL FSMD PARTITIONING FOR LOW POWER

Computation Power and Energy cont’d

For irreversible computation, $E = kT\,b \log 2$
– where $b$ is the number of bits involved in the computation (entropy)

Page 6: OPTIMAL FSMD PARTITIONING FOR LOW POWER

Computation Power and Energy cont’d

In both cases, these quantities are exceptionally small.
– $k = 1.3806504 \times 10^{-23}$ J/K
At $T = 300$ K, $kT = 4.14 \times 10^{-21}$ J
A 50 W, 3 GHz processor consumes about $1.67 \times 10^{-8}$ J in one cycle
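The arithmetic behind these figures can be checked with a short script. This is a minimal sketch, assuming that "log" in the irreversible-computation bound means the natural logarithm (as in the usual statement of the Landauer limit); the function names are illustrative and not taken from the slides.

```python
import math

# Boltzmann constant as quoted on the slide, in J/K.
BOLTZMANN_J_PER_K = 1.3806504e-23


def thermal_energy(temperature_k):
    """Return kT in joules for a temperature given in kelvin."""
    return BOLTZMANN_J_PER_K * temperature_k


def irreversible_bound(bits, temperature_k):
    """Return kT * b * ln 2 (assumes 'log' means natural logarithm)."""
    return thermal_energy(temperature_k) * bits * math.log(2)


def energy_per_cycle(power_w, clock_hz):
    """Return the energy in joules consumed during one clock cycle."""
    return power_w / clock_hz


kt = thermal_energy(300.0)
cycle = energy_per_cycle(50.0, 3.0e9)
orders = math.log10(cycle / kt)

print(f"kT at 300 K:        {kt:.3g} J")
print(f"one bit erased:     {irreversible_bound(1, 300.0):.3g} J")
print(f"one 3 GHz cycle:    {cycle:.3g} J")
print(f"orders of magnitude between them: {orders:.1f}")
```

The gap between one processor cycle and $kT$ comes out at roughly twelve to thirteen orders of magnitude.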

Page 7: OPTIMAL FSMD PARTITIONING FOR LOW POWER

Computation Power and Energy cont’d

DSPstone benchmarks synthesized in 180 nm and 90 nm technologies

Page 8: OPTIMAL FSMD PARTITIONING FOR LOW POWER

DSPstone dynamic energy

[Figure: dynamic energy (J) versus simulation period (ns, 0 to 600) for the 180 nm, general-purpose 90 nm, and high-performance 90 nm designs]

Page 9: OPTIMAL FSMD PARTITIONING FOR LOW POWER

DSPstone total energy

[Figure: total energy (J) versus simulation period (ns, 0 to 600) for the 180 nm, general-purpose 90 nm, and high-performance 90 nm designs; annotated value: $3.86 \times 10^{-11}$ J]

Page 10: OPTIMAL FSMD PARTITIONING FOR LOW POWER

Computation Power and Energy cont’d

Computational energy is far above the theoretical minimum (by more than 10 orders of magnitude)
Technological drive reduces total energy (an order of magnitude per generation)
Leakage power has become an issue
Power gating may provide efficiencies to further scale the technology

Page 11: OPTIMAL FSMD PARTITIONING FOR LOW POWER

Partitioning

Controller and datapath are considered together
Problem is formulated as
– Integer Linear Programming
– Non-linear programming solved using simulated annealing

Page 12: OPTIMAL FSMD PARTITIONING FOR LOW POWER

Notation

$s_i$ represents a state of an FSMD.
$v_k$ represents a variable associated with one or more states.
A variable $v_k$ is considered shared between two states $s_i$ and $s_j$ if it is read and/or written in both states.
$T_{ij}$ is the total number of bits of all variables shared by states $s_i$ and $s_j$.
$E_{ij}$ is 1 if there is a transition between states $s_i$ and $s_j$; otherwise it is 0.
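The notation above can be made concrete with a small example. This is a minimal sketch on a hypothetical three-state FSMD: the state names, variables, and bit widths are invented for illustration, and the transition flag is assumed to ignore direction because the slide does not say otherwise.

```python
from itertools import combinations

# Hypothetical FSMD description (not from the slides): the variables
# read or written in each state, each variable's width in bits, and
# the directed transitions between states.
state_vars = {
    "s0": {"a", "b"},
    "s1": {"b", "c"},
    "s2": {"c"},
}
var_bits = {"a": 8, "b": 16, "c": 4}
transitions = {("s0", "s1"), ("s1", "s2"), ("s2", "s0")}


def shared_bits(state_vars, var_bits):
    """T[(si, sj)]: total bits of the variables accessed in both states."""
    table = {}
    for si, sj in combinations(sorted(state_vars), 2):
        common = state_vars[si] & state_vars[sj]
        table[(si, sj)] = sum(var_bits[v] for v in common)
    return table


def transition_flags(states, transitions):
    """E[(si, sj)]: 1 if a transition links the two states, else 0.

    Assumption: direction is ignored.
    """
    table = {}
    for si, sj in combinations(sorted(states), 2):
        linked = (si, sj) in transitions or (sj, si) in transitions
        table[(si, sj)] = 1 if linked else 0
    return table


T = shared_bits(state_vars, var_bits)
E = transition_flags(state_vars, transitions)
print(T)
print(E)
```

Storing only the pairs with i < j is a design choice for brevity; both quantities are symmetric under the assumptions stated.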

Page 13: OPTIMAL FSMD PARTITIONING FOR LOW POWER

ILP formulation

Minimizes the number of bits that are shared between the partitions and the number of times that control could transfer between the partitions
– $s_{ij}$ is 1 if both states $s_i$ and $s_j$ are in the same partition. Otherwise, it is 0.

Page 14: OPTIMAL FSMD PARTITIONING FOR LOW POWER

ILP formulation - complete

Page 15: OPTIMAL FSMD PARTITIONING FOR LOW POWER

Simulated Annealing formulation

$x_i$ is $-1$ if state $s_i$ is in the left partition, and it is $1$ if $s_i$ is in the right partition
These quantities count the number of variable bits and transition edges shared between the two partitions
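The counting described on this slide can be sketched in code. The original equations did not survive extraction, so the expression used here, in which $(1 - x_i x_j)/2$ equals 1 exactly when two states lie in different partitions, is an assumed standard encoding rather than the authors' formula. The matrices follow the $T_{ij}$ and $E_{ij}$ notation, and their values are hypothetical.

```python
def cut_counts(x, T, E):
    """Count shared variable bits and transition edges across the cut.

    x[i] is -1 (left partition) or +1 (right partition).
    T and E are symmetric square matrices indexed by state number.
    Assumption: (1 - x[i] * x[j]) // 2 flags a pair split by the cut.
    """
    bits = 0
    edges = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            crossing = (1 - x[i] * x[j]) // 2  # 1 if split, 0 if together
            bits += T[i][j] * crossing
            edges += E[i][j] * crossing
    return bits, edges


# Hypothetical three-state example.
T = [[0, 16, 0],
     [16, 0, 4],
     [0, 4, 0]]
E = [[0, 1, 1],
     [1, 0, 1],
     [1, 1, 0]]

bits, edges = cut_counts([-1, -1, 1], T, E)
print(bits, edges)
```

With states 0 and 1 on the left and state 2 on the right, only the pairs (0, 2) and (1, 2) cross the cut.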

Page 16: OPTIMAL FSMD PARTITIONING FOR LOW POWER

Simulated Annealing formulation

Simplification steps
Observe that $\sum_{i,j=1}^{S} T_{ij}$ is constant (the total number of variable bits)

Page 17: OPTIMAL FSMD PARTITIONING FOR LOW POWER

Simulated Annealing formulation

Minimizes both the shared bits and the transition edges.
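A simulated-annealing search over two-way partitions might look like the sketch below. It is not the authors' MATLAB implementation: the equal weighting of bits and edges, the geometric cooling schedule, and the guard that keeps both partitions non-empty are all assumptions made for illustration, and the four-state matrices are hypothetical.

```python
import math
import random


def cut_cost(x, T, E, w_bits=1.0, w_edges=1.0):
    """Weighted sum of shared bits and transition edges across the cut."""
    cost = 0.0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            if x[i] != x[j]:
                cost += w_bits * T[i][j] + w_edges * E[i][j]
    return cost


def anneal(T, E, steps=2000, t_start=10.0, t_end=0.01, seed=0):
    """Search for a low-cost assignment x[i] in {-1, +1} (needs >= 2 states)."""
    rng = random.Random(seed)
    n = len(T)
    x = [-1 if i % 2 == 0 else 1 for i in range(n)]  # arbitrary start
    cost = cut_cost(x, T, E)
    best_x, best_cost = x[:], cost
    for step in range(steps):
        # Geometric cooling from t_start down to t_end (assumed schedule).
        temp = t_start * (t_end / t_start) ** (step / (steps - 1))
        i = rng.randrange(n)
        x[i] = -x[i]  # propose moving one state to the other partition
        if len(set(x)) < 2:
            x[i] = -x[i]  # assumed guard: never empty a partition
            continue
        new_cost = cut_cost(x, T, E)
        delta = new_cost - cost
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            cost = new_cost
            if cost < best_cost:
                best_x, best_cost = x[:], cost
        else:
            x[i] = -x[i]  # reject the move
    return best_x, best_cost


# Hypothetical example: states 0-1 and 2-3 form two tightly coupled pairs.
T = [[0, 16, 0, 0],
     [16, 0, 1, 0],
     [0, 1, 0, 16],
     [0, 0, 16, 0]]
E = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]

best_x, best_cost = anneal(T, E)
print(best_x, best_cost)
```

Without some constraint on partition sizes, the cheapest cut is the trivial one that places every state in a single partition, which is why the sketch includes the non-empty guard.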

Page 18: OPTIMAL FSMD PARTITIONING FOR LOW POWER

Evaluation

Implemented four integer algorithms
– 8-bit counter
– 5/3 wavelet transform using lifting
– Multiplierless approximation to the eight-point Discrete Cosine Transform (DCT)
– Integer transform from the H.264 standard
Used CoDeL to implement the designs.
Trace data were obtained from simulations using Synopsys.
The ILP model was solved using the CPLEX solver included in the AIMMS modeling environment.
The simulated annealing used MATLAB.

Page 19: OPTIMAL FSMD PARTITIONING FOR LOW POWER

Evaluation cont’d

Power savings were estimated (no partitioned design implementation yet)
– The static power savings depend on the size of the sequential logic and the portion of time spent in each partition.
– The dynamic power savings depend on the number of bits that are not clocked while a partition is not powered, offset by the overhead of data communication when the active partition changes.
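One way to read the static-power bullet is as a first-order estimate. The slides do not give the formula used, so the sketch below is a hypothetical model: it assumes static power is proportional to the number of powered register bits, that exactly one of two partitions is powered at any time, and that power gating itself has no overhead.

```python
def static_savings_fraction(bits_left, bits_right, time_left):
    """Estimate the fraction of static power saved by gating the idle partition.

    Hypothetical first-order model (not from the slides):
    - static power is proportional to the register bits that are powered;
    - while one partition is active, the other is fully gated;
    - time_left is the fraction of time spent in the left partition.
    """
    if not 0.0 <= time_left <= 1.0:
        raise ValueError("time_left must be between 0 and 1")
    total_bits = bits_left + bits_right
    if total_bits <= 0:
        raise ValueError("the design must contain at least one bit")
    time_right = 1.0 - time_left
    # Bits gated off, weighted by how long each partition sits idle.
    gated = time_left * bits_right + time_right * bits_left
    return gated / total_bits


balanced = static_savings_fraction(64, 64, 0.5)
skewed = static_savings_fraction(100, 28, 0.9)
print(balanced, skewed)
```

Under this model, two equal-sized partitions save half of the static power whatever the time split, while a design that spends most of its time in its larger partition saves much less.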

Page 20: OPTIMAL FSMD PARTITIONING FOR LOW POWER

Evaluation (Static Power Savings)

Page 21: OPTIMAL FSMD PARTITIONING FOR LOW POWER

Evaluation (Dynamic Power Savings)

Page 22: OPTIMAL FSMD PARTITIONING FOR LOW POWER

Results (ILP)

Page 23: OPTIMAL FSMD PARTITIONING FOR LOW POWER

Results (Simulated Annealing)

Page 24: OPTIMAL FSMD PARTITIONING FOR LOW POWER

Discussion

Results show that partitioning the controller and datapath could potentially save up to 50% of static power.

Some circuits could not be partitioned (the DWT includes one tight loop in which it spends more than 90% of the time).

For the circuits that could be partitioned, simulated annealing and ILP give identical results.

Simulated annealing is much faster.

Page 25: OPTIMAL FSMD PARTITIONING FOR LOW POWER

Future

Extend methodology to more than 2 partitions

Implement the partitioned FSMD machines and confirm the realized power savings

Lower energy!