Integrating Adaptive On-Chip Storage Structures for Reduced Dynamic Power

Post on 06-Jan-2016

1

Integrating Adaptive On-Chip Storage Structures for Reduced Dynamic Power

Steve Dropsho, Alper Buyuktosunoglu, Rajeev Balasubramonian, David H. Albonesi, Sandhya Dwarkadas, Greg Semeraro, Grigorios Magklis, and Michael Scott

ECE and CS Departments, University of Rochester

2

Why Adaptive Structures?

• General-purpose microprocessors are “one size fits all”

• But resource needs vary across (and within) applications

• Can save considerable energy by matching resources to the application

Objective: Less energy for the same performance by adapting storage structures to the application

3

Related Work

• Adaptable cache
– Balasubramonian et al., MICRO 2000
– Dhodapkar and Smith, ISCA 2002

• Adaptable issue logic
– Buyuktosunoglu et al., GLS VLSI 2001
– Folegnani and Gonzalez, ISCA 2000

4

Common Themes

• A single adaptive structure

• Use of global information for feedback

• Exploration-based (caches)

5

Related Work (cont)

• Adaptable IQ, LSQ, and ROB
– Ponomarev et al., MICRO 2001
– Three (3) adaptable structures
– Reconfigurations based on local state

6

Integrating Multiple Adaptive Structures

[Figure: processor block diagram. Labeled blocks: L1 Icache, Branch predict, FetchQ, Rename map, ROB, IIQ, FPQ, LSQ, IPREG, FPREG, Int FUs, FP FUs, L1 Dcache, L2 Unified Cache. Labeled regions: Integer, Floating Pt, Memory.]

7

Challenges

• Multiple (9) adaptive structures create a state-explosion problem

• Use of global information makes assigning cause and effect difficult

• Potential for additive performance effects among the structures
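A rough count makes the state-explosion point concrete. The sketch below assumes, purely for illustration, that each of the 9 structures offers 4 configurations; the slides do not give per-structure counts, and `joint_states` and `local_states` are hypothetical helper names.

```python
import math

# Hypothetical: 9 adaptive structures, each with 4 possible sizes.
CONFIGS_PER_STRUCTURE = [4] * 9

def joint_states(configs):
    """Combinations a controller must consider if all structures are tuned jointly."""
    return math.prod(configs)

def local_states(configs):
    """Configurations considered if each structure is tuned on its own."""
    return sum(configs)

print(joint_states(CONFIGS_PER_STRUCTURE))  # 4**9 = 262144
print(local_states(CONFIGS_PER_STRUCTURE))  # 9 * 4 = 36
```

Under these assumed numbers the joint space is several thousand times larger than the per-structure one, which is the motivation for deciding locally.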

8

Approach: Local Management

• Local information for configuration decisions

• Tight control over performance variance

9

Part I: The Caches

[Figure: same processor block diagram as on slide 6.]

10

The Accounting Cache

• Sequential accesses: A access (primary), then B access (secondary)

• Save energy on an A access hit

• Swap blocks on an A access miss

[Figure: a 4-way set shown under each A/B split, labeled A1 B3, A2 B2, A3 B1, and A4 B0, with a block swap between the A and B partitions.]
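The A-then-B access order and the swap on an A miss can be sketched for a single set as follows. This is a minimal model, not the authors' hardware: it assumes LRU replacement within each partition and that the block displaced from A moves into B, neither of which the slide states. `AccountingSet` is a hypothetical name.

```python
class AccountingSet:
    """One cache set split into an A (primary) and a B (secondary) partition."""

    def __init__(self, a_ways, b_ways):
        self.a_ways, self.b_ways = a_ways, b_ways
        self.a, self.b = [], []          # tags, most recently used first
        self.a_accesses = 0
        self.b_accesses = 0

    def access(self, tag):
        self.a_accesses += 1             # A is always probed first
        if tag in self.a:
            self.a.remove(tag)
            self.a.insert(0, tag)
            return "A hit"               # B is never touched: energy saved
        result = "miss"
        if self.b_ways > 0:
            self.b_accesses += 1         # second, sequential probe
            if tag in self.b:
                self.b.remove(tag)
                result = "B hit"
        # Swap: the requested block enters A, A's LRU block is demoted to B.
        self.a.insert(0, tag)
        if len(self.a) > self.a_ways:
            demoted = self.a.pop()
            if self.b_ways > 0:
                self.b.insert(0, demoted)
                if len(self.b) > self.b_ways:
                    self.b.pop()         # evicted from the set entirely
        return result


s = AccountingSet(a_ways=1, b_ways=3)    # the "A1 B3" split
print([s.access(t) for t in ["x", "x", "y", "x"]])
print(s.a_accesses, s.b_accesses)
```

With this policy the A partition always holds the most recently used blocks of the set, so a small A still catches accesses with strong temporal locality.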

11

Most-Recently-Used Statistics

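[Figure: MRU state transitions for one 4-way set (ways 1 to 4 holding lines A to D), together with the MRU state counters MRU[0] through MRU[3] and a Misses counter. After the example access sequence the counters read MRU[0] = 3, MRU[1] = 2, MRU[2] = 1, MRU[3] = 0, Misses = 0.]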
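The MRU state counters can be modeled by keeping the lines of a set in recency order and counting, for each access, the position at which the line was found. This is a sketch under the assumption of true LRU ordering within the set; `record_mru_stats` is a hypothetical helper, and the access sequence in the usage lines is one made-up sequence that happens to reproduce the counter values above, not necessarily the one drawn on the slide.

```python
def record_mru_stats(accesses, ways=4, initial=()):
    """Return (counters, misses) for a single set.

    counters[i] counts hits on the line that was at MRU position i
    (0 = most recently used) at the time of the access.
    """
    order = list(initial)            # index 0 = most recently used
    counters = [0] * ways
    misses = 0
    for tag in accesses:
        if tag in order:
            pos = order.index(tag)
            counters[pos] += 1
            order.pop(pos)
        else:
            misses += 1
            if len(order) == ways:
                order.pop()          # evict the least recently used line
        order.insert(0, tag)         # accessed line becomes most recent
    return counters, misses


counters, misses = record_mru_stats(
    ["A", "B", "B", "A", "A", "C"], initial=["A", "B", "C", "D"])
print(counters, misses)  # [3, 2, 1, 0] 0
```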

12

Configuration Evaluation

MRU state counters: MRU[0] (mru) = 3, MRU[1] = 2, MRU[2] = 1, MRU[3] (lru) = 0, Misses = 0

Configuration    Delay          Energy
A1 B3            6 DA + 3 DB    6 E1 + 3 E3
A2 B2            6 DA + 1 DB    7 E2
A3 B1            6 DA           6 E3
A4 B0 (BASE)     6 DA           6 E4
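The per-configuration delay and energy figures follow mechanically from the MRU counters: every access probes A, and any access not satisfied by the A ways also probes B. The sketch below reproduces that arithmetic. It assumes misses also probe B when a B partition exists and leaves out the miss penalty, which is the same for every configuration; `evaluate_configs` and its parameters are hypothetical names.

```python
def evaluate_configs(mru, misses, d_a, d_b, way_energy):
    """Return {a_ways: (delay, energy)} for every A/B split of one set.

    mru[i]        -- hits at MRU position i (0 = most recently used)
    d_a, d_b      -- delay of one A access and one B access
    way_energy[k] -- energy of one access that probes k ways
    """
    n = len(mru)
    total = sum(mru) + misses                 # every access probes A
    results = {}
    for a in range(1, n + 1):
        b = n - a
        # Accesses not satisfied by the a most recent ways go on to B.
        b_accesses = sum(mru[a:]) + misses if b > 0 else 0
        delay = total * d_a + b_accesses * d_b
        energy = total * way_energy[a]
        if b > 0:
            energy += b_accesses * way_energy[b]
        results[a] = (delay, energy)
    return results


# Counters from the slide; the delay and energy units are made up.
table = evaluate_configs([3, 2, 1, 0], 0, d_a=2, d_b=3,
                         way_energy={1: 1, 2: 2, 3: 3, 4: 4})
for a_ways, (delay, energy) in table.items():
    print(f"A{a_ways} B{4 - a_ways}: delay={delay} energy={energy}")
```

With a two-way A partition the 6 A accesses and the single B access each cost E2, which is where the 7 E2 entry comes from.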

13

Tolerance and the Bank Account

• Tolerance allows more delay than BASE
– DTOL = DBASE (1 + TOL)
– TOL = {0.015, 0.062, 0.25} (1/64, 1/16, 1/4)

• Bank account allows accumulation of unused tolerance

• Use account credits in later intervals
– Allows aggressive resizing
– Amortizes mistakes over many intervals
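One way to picture the bank account is as a running balance of delay. Each interval earns a budget of DBASE (1 + TOL); whatever is not spent is banked, and overspending is charged against the balance. The sketch below is an assumed selection policy (lowest-energy configuration whose estimated delay fits the budget plus the balance), not a description of the authors' exact rule; `BankAccount` and its methods are hypothetical names.

```python
class BankAccount:
    """Carries unused delay tolerance from one interval to the next."""

    def __init__(self, tol):
        self.tol = tol
        self.credit = 0.0

    def budget(self, d_base):
        # D_TOL = D_BASE * (1 + TOL), plus whatever has been saved up.
        return d_base * (1 + self.tol) + self.credit

    def pick(self, d_base, candidates):
        """candidates: list of (name, estimated_delay, estimated_energy)."""
        limit = self.budget(d_base)
        affordable = [c for c in candidates if c[1] <= limit]
        if affordable:
            return min(affordable, key=lambda c: c[2])   # cheapest in energy
        return min(candidates, key=lambda c: c[1])       # fall back to fastest

    def settle(self, d_base, d_actual):
        # Unused tolerance is deposited; overshoot is withdrawn.
        self.credit += d_base * (1 + self.tol) - d_actual


account = BankAccount(tol=1 / 16)
options = [("A1 B3", 72, 15), ("A2 B2", 68, 20), ("A4 B0", 64, 30)]
print(account.pick(64, options)[0])   # budget is 68 -> "A2 B2"
account.settle(64, 64)                # ran at BASE speed, banks 4 units
print(account.pick(64, options)[0])   # budget is 72 -> "A1 B3"
```

Letting the balance go negative after a bad interval is what spreads a mistake over later intervals instead of forbidding aggressive choices outright.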

14

Memory Hierarchy

[Figure: one possible configuration of the memory hierarchy: L1 I-Cache (A/B), L1 D-Cache (A, no B), L2 Unified Cache (A/B).]

15

Environment

• SimpleScalar simulator

• Microarchitecture is similar to Alpha 21264

• Benchmarks are a mix of SPEC95, SPEC2K, and Olden

• Energy models for buffers and caches from Buyuktosunoglu et al., GLS VLSI 2001 and Balasubramonian et al., MICRO 2000

16

Cache Results

17

Part II: Queues, Regs, and ROB

[Figure: same processor block diagram as on slide 6.]

18

Resizable Queues/Reg File

[Figure: a buffer divided into partitions P1 through PN, each m elements wide.]

• N partitions of m elements

19

Buffer Sizing With Limited Histogramming

[Figure: distribution of buffer size, x-axis from 0 to Full with the average ("ave") marked, in three panels: Grow buffer, Proper size, Precise shrink.]

• 8K cycle period

• Tolerances:
– 1.5% (1/64)
– 6.2% (1/16)
– 25.0% (1/4)
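The transcript does not spell out the limited-histogramming rule, so the following is only an illustrative occupancy-histogram policy in the same spirit: grow when the buffer sat full for more than the tolerated fraction of the period, otherwise shrink to the smallest size that would have been too small for at most that fraction. The function name, the histogram layout, and both decision rules are assumptions.

```python
def choose_partitions(hist, full_cycles, active, max_partitions, tol):
    """Pick the number of active partitions for the next period.

    hist[k]     -- cycles in which occupancy needed exactly k + 1 partitions
    full_cycles -- cycles in which the buffer was completely full
    active      -- partitions currently enabled (len(hist) == active)
    tol         -- tolerated fraction of the period, e.g. 1/16
    """
    total = sum(hist)
    allowed = tol * total
    if full_cycles > allowed:
        return min(active + 1, max_partitions)      # grow buffer
    for p in range(1, active + 1):
        cycles_needing_more = sum(hist[p:])
        if cycles_needing_more <= allowed:
            return p                                 # precise shrink (or stay)
    return active


# One 8K-cycle period with made-up occupancy counts.
period = [7000, 1000, 150, 42]                       # sums to 8192
print(choose_partitions(period, 0, 4, 8, tol=1 / 16))
print(choose_partitions(period, 0, 4, 8, tol=1 / 64))
```

A looser tolerance permits a smaller buffer for the same occupancy profile, which is the trade-off the three tolerance settings explore.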

20

Resizing the Register File

• Issue: Do not know when registers expire

• Solution: To make the register file smaller, move values out of the partition (P) to be turned off
– First, inhibit new assignments to P
– Next, use a software interrupt routine to move values via normal rename logic: mov r1 r1
– Register mappings are automatically updated
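The two steps above can be sketched with a toy rename map. Renaming the destination of `mov rX rX` allocates a fresh physical register, and because partition P has been removed from the free list, the value necessarily lands outside P. This is a simplified model with hypothetical names (`drain_partition`, `rename_map`, `free_list`); it ignores in-flight instructions and the point at which the old physical register is actually released.

```python
def drain_partition(rename_map, free_list, values, partition):
    """Move every live value out of `partition`; return the remaining free list.

    rename_map -- architectural register name -> physical register number
    free_list  -- physical registers currently unallocated
    values     -- physical register number -> value held
    partition  -- set of physical registers about to be turned off
    """
    # Step 1: inhibit new assignments to P.
    free = [p for p in free_list if p not in partition]
    # Step 2: emulate "mov rX rX" for each register still mapped into P.
    for arch, phys in sorted(rename_map.items()):
        if phys in partition:
            if not free:
                raise RuntimeError("no free register outside the partition")
            new_phys = free.pop(0)                # rename the destination
            values[new_phys] = values.pop(phys)   # the move copies the value
            rename_map[arch] = new_phys           # mapping updated by renaming
    return free


rename_map = {"r1": 6, "r2": 1, "r3": 7}
values = {6: 10, 1: 20, 7: 30}
left = drain_partition(rename_map, [2, 3, 4, 5], values, partition={4, 5, 6, 7})
print(rename_map, values, left)
```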

21

Floating Point App Results

22

Summary Results

23

Conclusion

• Simultaneous adaptation of all major regular structures
– Accounting cache
– Limited histogramming for buffers
– Adaptable register file

• Local control yet tolerable performance loss

• Future work
– Augment local control with global control for bounded performance loss
