Introduction Tagless Hit Instruction Cache (TH-IC) Experimental Evaluation Summary

Guaranteeing Hits to Improve the Efficiency of a Small Instruction Cache

Stephen Hines, David Whalley, and Gary Tyson

Department of Computer Science, Florida State University

December 5, 2007

L0/Filter Instruction Caches

Programs spend a great deal of time executing small loops

Small, direct-mapped → better energy per access

Low hit rate → worse execution times (4–8%)

Power/performance tradeoff in embedded systems

Prediction can reduce some of this cycle penalty ...

But guarantees will give us the best of both worlds

Instruction Fetch

Types of Instruction Fetch

Sequential fetches

Non-sequential (branching) fetches

Direct branches

Indirect branches

Intra-line Sequential Fetch

Traditional Line Buffers

Perform tag comparison early → disable L1-IC on hit

May lengthen cycle time for fetch

Tagless Hit Line Buffer (TH-LB)

Keep track of the previous cycle’s branch status/prediction

If not taken and intra-line (not the last instruction in the line), access the TH-LB; otherwise fetch from the L1-IC and write the line into the TH-LB

Eliminates the need for TH-LB tag comparison or storage

Hit and miss behavior is identical to a conventional line buffer (LB), and cycle time is the same as with the L1-IC alone
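The TH-LB rule above can be sketched as a single predicate on the previous cycle's fetch. This is a minimal illustration, not the authors' hardware description: the names `PrevFetch` and `use_line_buffer` and the four-instruction line size are assumptions made for the example.

```python
from dataclasses import dataclass

INSTS_PER_LINE = 4  # assumed line size in instructions (hypothetical)


@dataclass
class PrevFetch:
    """Status recorded for the instruction fetched in the previous cycle."""
    index_in_line: int  # position of that instruction within its line (0-based)
    taken: bool         # True if it was a branch that was taken or predicted taken


def use_line_buffer(prev: PrevFetch) -> bool:
    """Return True when the next fetch is guaranteed to come from the TH-LB.

    The next instruction is in the buffered line only if the previous fetch
    did not redirect control flow and was not the last instruction of its line.
    """
    is_last = prev.index_in_line == INSTS_PER_LINE - 1
    return (not prev.taken) and (not is_last)
```

When the predicate is false, the fetch goes to the L1-IC and the returned line is written into the TH-LB, so no tag needs to be stored or compared for the buffer itself.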

Guaranteeing Tagless Hits

A TH-LB makes guarantees about intra-line sequential fetch behavior

Extend this principle to handle other regular fetches

Tagless Hit Instruction Cache (TH-IC)

No tag comparisons for most accesses

Recognize/capture regularity inherent to fetch

Employ more useful metadata to handle links between instructions in the cache

Tagless Hits

Fetching Sequential Lines

Next Sequential bit (NS) associated with each TH-IC line

Sequential fetches that cross a line boundary examine NS to determine the next cycle’s fetch

NS is not set – fetch from L1-IC and set the NS bit

NS is set – guaranteed hit in TH-IC (no L1-IC access, no tag check)

When a line is replaced, we clear its NS bit as well as the previous line’s NS bit

Essentially we now have a multiple line buffer

Tagless Hits

Handling Direct Branches

Next Target bit (NT) associated with each TH-IC instruction

Direct branches that are predicted taken examine NT to determine the next cycle’s fetch

NT is not set – fetch from L1-IC and set the NT bit

NT is set – guaranteed hit in TH-IC (no L1-IC access, no tag check)

On line replacement, we will need to invalidate some of the NT bits contained in the TH-IC
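Taken together, the intra-line, NS, and NT rules amount to one small decision per fetch. The sketch below is only an illustration of that decision. The `FetchKind` enum and the function name are hypothetical, and indirect branches are left out because the slides do not say how they are handled.

```python
from enum import Enum


class FetchKind(Enum):
    INTRA_LINE = 1    # sequential fetch that stays within the current line
    CROSS_LINE = 2    # sequential fetch that crosses into the next line
    TAKEN_DIRECT = 3  # direct branch that is predicted taken


def guaranteed_hit(kind: FetchKind, ns_bit: bool, nt_bit: bool) -> bool:
    """Return True if the next fetch can skip the L1-IC access and the tag check.

    ns_bit is the NS bit of the line being left; nt_bit is the NT bit of the
    branch instruction fetched in the previous cycle.
    """
    if kind is FetchKind.INTRA_LINE:
        return True    # same line as the previous fetch
    if kind is FetchKind.CROSS_LINE:
        return ns_bit  # next sequential line known to be present
    return nt_bit      # branch target known to be present
```

When the function returns False, the fetch accesses the L1-IC and the corresponding NS or NT bit is set afterwards, so the next traversal of the same link becomes a guaranteed hit.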

Tagless Hits

[Figure: TH-IC Guaranteed Hit and False Miss Rates]

[Figure: TH-IC Line and State Information]

Example

Fetching a Loop

[Figure: TH-IC Line 1 and Line 2 holding insts 1–8, with the NS and NT bits updated after each fetch]

Fetched     Result       Metadata Set         L1/ITLB?
inst 1      miss         set line 0 NS bit    X
inst 5      miss         set inst 1 NT bit    X
insts 6,7   hits
inst 2      false miss   set inst 7 NT bit    X
insts 3,4   hits
inst 5      false miss   set line 1 NS bit    X
insts 6,7   hits
inst 2      hit
insts 3,4   hits
inst 5      hit
insts 6,7   hits
inst 8      hit
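The trace in the table can be reproduced with a small software model. This is a sketch under stated assumptions rather than the authors' simulator: lines hold four instructions (insts 1–4 in line 1, insts 5–8 in line 2), a hypothetical "inst 0" stands for the last instruction of line 0, no line is ever evicted, and any non-consecutive fetch is treated as a taken direct branch. All function and variable names are invented for the example.

```python
INSTS_PER_LINE = 4  # assumed: insts 1-4 form line 1, insts 5-8 form line 2


def line_of(inst: int) -> int:
    """Line number holding an instruction (inst 0 falls in line 0)."""
    return (inst - 1) // INSTS_PER_LINE + 1


def is_last_in_line(inst: int) -> bool:
    return inst % INSTS_PER_LINE == 0


def simulate(trace, start_prev=0):
    """Return (inst, result, metadata_set, l1_accessed) for each fetch."""
    resident = {line_of(start_prev)}  # lines currently held in the TH-IC
    ns, nt = set(), set()             # lines with NS set, insts with NT set
    rows, prev = [], start_prev
    for inst in trace:
        sequential = inst == prev + 1  # simplification: anything else is a taken branch
        if sequential and not is_last_in_line(prev):
            guaranteed, bit = True, None                      # intra-line fetch
        elif sequential:
            guaranteed, bit = line_of(prev) in ns, ("NS", line_of(prev))
        else:
            guaranteed, bit = prev in nt, ("NT", prev)
        if guaranteed:
            rows.append((inst, "hit", "", False))
        else:
            # L1-IC is accessed; the line may already be in the TH-IC (false miss).
            result = "false miss" if line_of(inst) in resident else "miss"
            resident.add(line_of(inst))
            if bit[0] == "NS":
                ns.add(bit[1])
                label = f"set line {bit[1]} NS bit"
            else:
                nt.add(bit[1])
                label = f"set inst {bit[1]} NT bit"
            rows.append((inst, result, label, True))
        prev = inst
    return rows


# Fetch order from the example: inst 1 jumps to inst 5, inst 7 loops back to
# inst 2 twice, then falls through to inst 8.
TRACE = [1, 5, 6, 7, 2, 3, 4, 5, 6, 7, 2, 3, 4, 5, 6, 7, 8]

if __name__ == "__main__":
    for inst, result, label, l1 in simulate(TRACE):
        print(f"inst {inst}: {result:<10} {label:<18} {'X' if l1 else ''}")
```

Only four of the seventeen fetches reach the L1-IC and ITLB in this model, and from the third loop iteration onward every fetch is a guaranteed hit.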

Metadata Management

Tag Comparisons in TH-IC

Guaranteed Hits

Tag comparison is completely unnecessary

ITLB access can also be skipped

Potential Misses

Access L1-IC and TH-IC

Determine if line is actually in TH-IC (false miss)

TH-IC is inclusive of L1-IC, so we only need to check whether the TH-IC line points to the L1-IC line to verify a false miss
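The false-miss check can be sketched as a comparison of a short line identifier instead of a full address tag. The example assumes that inclusion means every line held in the TH-IC is also resident in the L1-IC, and that each TH-IC line records where its copy sits in the L1-IC. The `(set index, way)` encoding and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical encoding of an L1-IC line's position: (set index, way).
L1Location = Tuple[int, int]


@dataclass
class ThIcLine:
    valid: bool = False
    l1_location: Optional[L1Location] = None  # where this line lives in the L1-IC


def is_false_miss(th_line: ThIcLine, l1_hit_location: Optional[L1Location]) -> bool:
    """Return True if a potential miss was in fact already present in the TH-IC.

    l1_hit_location is the position found by the normal L1-IC tag check, or
    None when the L1-IC itself misses.
    """
    if l1_hit_location is None:
        return False  # a real L1-IC miss cannot be a false miss
    return th_line.valid and th_line.l1_location == l1_hit_location
```

Because the L1-IC has already performed the full tag comparison on a potential miss, the TH-IC only needs enough bits to identify one L1-IC line, which is much narrower than a full tag.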

Metadata Management

[Figure: Reducing TH-IC Tag Size]

Metadata Management

Metadata Invalidation

TH-IC line replacements can corrupt NT and NS links

NS: Clear previous line’s NS (easy)

NT: Clear any NT that points to replaced line (harder)

Too much metadata – inefficient energy usage due to extra tracking bits

Too little metadata – overly aggressive invalidation kicks out links that are still valid (i.e. point to other lines)
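The two extremes of this trade-off can be illustrated with a pair of toy invalidation routines. They are not the policies evaluated in the talk. Both functions, and the idea of recording a target line for every set NT bit, are assumptions made only to show why precise invalidation costs storage and coarse invalidation costs valid links.

```python
from typing import Dict, Tuple

# (source line, slot within line) -> TH-IC line targeted by that set NT bit.
# Storing a target per NT bit is the "too much metadata" end of the spectrum.
NtLinks = Dict[Tuple[int, int], int]


def invalidate_precise(links: NtLinks, replaced: int) -> NtLinks:
    """Drop only links that start in, or point to, the replaced line."""
    return {
        src: target
        for src, target in links.items()
        if src[0] != replaced and target != replaced
    }


def invalidate_coarse(links: NtLinks, replaced: int) -> NtLinks:
    """With no tracking metadata, every NT bit must be cleared on a replacement."""
    return {}
```

For links `{(1, 0): 2, (2, 2): 1, (3, 1): 4}` and a replacement of line 2, the precise routine keeps the link from line 3 to line 4, while the coarse routine discards it even though it is still valid.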

Metadata Management

[Figure: Invalidation Policies]

Configuration

SimpleScalar PISA with Wattch extensions for power/energy modeling

StrongARM-derived configuration (in-order, 1-issue, . . . )

L0-IC (128B – 512B)

TH-IC (128B – 512B) x (TN, TT, TL, TI) + TH-LB

Slides only show 256B (16x4) configurations

VPO optimized MiBench benchmarks

Benchmarks run to completion

[Figure: Total Processor Energy]

Fetch Efficiency Across Different Application Domains

Fetch Statistics

                        MiBench Average       176.gcc (SPECInt2k)
                        L0_16x4   TL_16x4     L0_16x4   TL_16x4
Execution Cycles        106.05%   100.00%     104.10%   100.00%
Total Energy             75.17%    68.58%      83.81%    79.03%
Small Cache Hit Rate     87.63%    84.96%      77.86%    73.57%
Fetch Power              43.81%    35.47%      63.72%    56.07%
Energy-Delay Squared     84.57%    68.58%      90.82%    79.03%

TH-IC is beneficial even for applications with more diverse instruction and data access behavior like 176.gcc
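The Energy-Delay Squared row can be related to the two rows above it. The slides do not state the formula, so the sketch assumes the usual definition: normalized total energy multiplied by the square of normalized execution cycles. The function name is hypothetical.

```python
def energy_delay_squared(energy_pct: float, cycles_pct: float) -> float:
    """Normalized ED^2 in percent from normalized energy and cycles in percent."""
    return energy_pct * (cycles_pct / 100.0) ** 2


# (Total Energy %, Execution Cycles %) taken from the table above.
COLUMNS = {
    "MiBench L0_16x4": (75.17, 106.05),
    "MiBench TL_16x4": (68.58, 100.00),
    "176.gcc L0_16x4": (83.81, 104.10),
    "176.gcc TL_16x4": (79.03, 100.00),
}

if __name__ == "__main__":
    for name, (energy, cycles) in COLUMNS.items():
        print(f"{name}: {energy_delay_squared(energy, cycles):.2f}%")
```

Under this assumption the TH-IC columns equal their Total Energy values because execution cycles are 100%, the 176.gcc L0 column comes out at about 90.82%, and the MiBench L0 column at about 84.54%. The small gap to the reported 84.57% may come from averaging per-benchmark values rather than combining the averages.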

Related Work

L0/Filter Cache Prediction

Cannot eliminate performance penalty completely

Still using full size tags along with prediction metadata, and there are still tag checks for hits

Way Memoization

Utilizes NT/NS concept to avoid 64-way tag comparisons in L1-IC for many fetches

Large amount of metadata required, and expensive invalidation offsets much of the energy benefit

Conclusions

TH-IC simultaneously eliminates the performance penalty of small caches and further reduces fetch energy

Eliminates the majority of tag checks and ITLB accesses

Reduces the size of the tag/ID check on a miss due to inclusion

Make guarantees, not predictions, for regularly behaved pipeline features like instruction fetch

Suitable for high-performance computing due to energy reductions and lack of performance penalty

Ease of integration with nearly any processor design

Questions?

Backup Slides: Graphs

[Figure: Execution Time]

[Figure: Average Fetch Power]

[Figure: Energy-Delay²]