Cache Physical Implementation Panayiotis Charalambous Xi Research Group

Upload: augustus-riley

Post on 20-Jan-2016


Page 1: Cache Physical Implementation

Panayiotis Charalambous
Xi Research Group

Page 2: Contents

Cache Logical View
Physical View
Case Study – Power 4 L2 Cache

Page 3: Logical Cache Structure

n-way associative cache: n elements per set, $2^m$ sets

Address (32 bits): Tag ($32 - m - k$ bits) | Index ($m$ bits) | Offset ($k$ bits)

[Figure: the index selects a set; each stored tag is compared (=) with the address tag, and the comparator outputs are ORed to give Hit and to select Data.]
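The address split on this slide can be sketched in Python as follows. The function name `split_address` and the example values of m and k are illustrative assumptions, not taken from the slides.

```python
def split_address(addr, m, k, width=32):
    """Split an address into (tag, index, offset).

    m is the number of index bits (2**m sets) and k the number of
    offset bits; the tag is the remaining width - m - k upper bits.
    """
    if not 0 <= addr < (1 << width):
        raise ValueError("address does not fit in the given width")
    offset = addr & ((1 << k) - 1)          # lowest k bits
    index = (addr >> k) & ((1 << m) - 1)    # next m bits
    tag = addr >> (k + m)                   # everything above
    return tag, index, offset


# Hypothetical geometry: 2**7 = 128 sets, 2**5 = 32-byte blocks.
tag, index, offset = split_address(0x12345678, m=7, k=5)
print(hex(tag), index, offset)  # 0x12345 51 24
```

Reassembling `(tag << (m + k)) | (index << k) | offset` gives back the original address, which is a quick sanity check for any chosen m and k.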

Page 4: Cache Structure

Page 5: Cache Access

Steps
1. Decode address
2. Enable the word line
3. Raise the bit lines to high
4. Get the tag value from the tag array
5. Check for tag match
6. Select data output
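A purely functional sketch of these steps, in Python, may help to see how they fit together. It is not a circuit model: precharging the bit lines (step 3) has no counterpart here, and the names `decode` and `access` as well as the tiny 4-set, 2-way arrays are assumptions made for illustration.

```python
def decode(index, num_rows):
    """Step 1: decode the index into a one-hot word-line vector."""
    return [1 if row == index else 0 for row in range(num_rows)]


def access(tag_array, data_array, tag, index):
    """Return (hit, data) for one access.

    tag_array[row][way] and data_array[row][way] stand in for the
    SRAM tag and data arrays.
    """
    word_lines = decode(index, len(tag_array))            # step 1
    row = word_lines.index(1)                             # step 2: enabled word line
    stored_tags = tag_array[row]                          # step 4: read the tags
    matches = [stored == tag for stored in stored_tags]   # step 5: one compare per way
    if any(matches):
        way = matches.index(True)
        return True, data_array[row][way]                 # step 6: select data output
    return False, None


# Hypothetical contents; None marks an empty way.
tags = [[None, None], [0x1A, 0x2B], [None, 0x3C], [None, None]]
data = [[None, None], ["blk A", "blk B"], [None, "blk C"], [None, None]]
print(access(tags, data, tag=0x2B, index=1))  # (True, 'blk B')
print(access(tags, data, tag=0x2B, index=2))  # (False, None)
```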

Page 6: Conventional Cache Organization

Memory Cell

Page 7: Memory Cell

Bit lines: bit and bit´

Read: Set bit and bit´ high. If the value in the cell is 1, then bit´ is discharged. If the value is 0, then bit is discharged.

Write: Set bit´ to 0. This forces 1 in the latch.
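The read and write behaviour can be mimicked with a small behavioural model in Python. The class name `SramCell` is hypothetical, and the write-0 case (driving bit low instead of bit´) is an assumption added by symmetry; the slide only describes writing a 1.

```python
class SramCell:
    """Behavioural model of one memory cell with bit lines bit and bit'."""

    def __init__(self, value=0):
        self.value = value

    def read(self):
        """Set both bit lines high, then let the cell discharge one of them."""
        bit, bit_bar = 1, 1
        if self.value == 1:
            bit_bar = 0      # stored 1 discharges bit'
        else:
            bit = 0          # stored 0 discharges bit
        return bit, bit_bar

    def write(self, bit, bit_bar):
        """Drive exactly one bit line low to force a value into the latch."""
        if (bit, bit_bar) == (1, 0):
            self.value = 1   # bit' low forces 1, as on the slide
        elif (bit, bit_bar) == (0, 1):
            self.value = 0   # assumption: bit low forces 0
        else:
            raise ValueError("exactly one bit line must be driven low")


cell = SramCell()
cell.write(1, 0)
print(cell.read())  # (1, 0)
```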

Page 8: Decoder with Driver

Page 9: Various Components

Comparator: XOR logic

Multiplexer hierarchy for the offset: first get the block (from the output driver), then the word, then the byte

Output driver (inputs I0, I1, …, I7)
  At most one input bit is high
  If the inputs are 0, the output is high-impedance
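As a rough software analogue of the comparator and the multiplexer hierarchy, consider the Python sketch below. The helper names and the 4-byte word size are assumptions for illustration; the slide does not give a word size.

```python
def tags_match(stored_tag, address_tag):
    """XOR comparator: the tags are equal when no bit position differs."""
    return (stored_tag ^ address_tag) == 0


def select_byte(block, offset, word_size=4):
    """Two-level selection: pick the word within the block, then the byte."""
    word_index, byte_index = divmod(offset, word_size)
    word = block[word_index * word_size:(word_index + 1) * word_size]
    return word[byte_index]


block = bytes(range(32))           # hypothetical 32-byte block
print(tags_match(0b1011, 0b1011))  # True
print(tags_match(0b1011, 0b1001))  # False
print(select_byte(block, 13))      # 13
```

In hardware the XOR outputs of all tag bits feed a wide NOR (or equivalent) to produce the single match signal; the `== 0` test plays that role here.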

Page 10: Banking

Idea: Support multiple cache accesses

Solutions:
  Use multiporting on the bit cells (the cost is high)
  Divide the cache into independent banks
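To illustrate why independent banks allow several accesses at once, here is a minimal Python sketch. It assumes each bank has a single port and therefore serves at most one access per cycle, and it uses a simple greedy placement; both are simplifying assumptions, and `schedule` is a hypothetical name.

```python
def schedule(bank_requests):
    """Group requests into cycles so that no bank is used twice in a cycle.

    bank_requests[i] is the bank targeted by request i. Each returned
    cycle is a dict mapping bank -> request id.
    """
    cycles = []
    for req_id, bank in enumerate(bank_requests):
        for cycle in cycles:
            if bank not in cycle:       # bank is free in this cycle
                cycle[bank] = req_id
                break
        else:                           # conflict in every existing cycle
            cycles.append({bank: req_id})
    return cycles


# Requests 0 and 2 both target bank 0, so they cannot share a cycle.
print(schedule([0, 1, 0, 3]))  # [{0: 0, 1: 1, 3: 3}, {0: 2}]
```

With a single unbanked, single-ported array the same four requests would need four cycles under this model.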

Page 11: Cache Search

Steps:
1. Find Bank (bank index)
2. Find Set in Bank (index)
3. Check if data is valid and in the cache (tag match)
4. If all ok return data (block and byte offset), else check lower level memory
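The search steps above can be sketched in Python as follows. The data layout (`banks[bank][set]` as a list of ways, each a dict with the keys `valid`, `tag` and `block`) and the `lower_level` callback are hypothetical choices for this example; refilling the cache after a miss is deliberately left out.

```python
def search(banks, bank_index, set_index, tag):
    """Return the cached block on a hit, or None on a miss."""
    ways = banks[bank_index][set_index]            # steps 1 and 2
    for way in ways:
        if way["valid"] and way["tag"] == tag:     # step 3: valid and tag match
            return way["block"]                    # step 4: hit
    return None


def read_byte(banks, lower_level, bank_index, set_index, tag, offset):
    """Return one byte, falling back to the lower level on a miss."""
    block = search(banks, bank_index, set_index, tag)
    if block is None:
        block = lower_level(bank_index, set_index, tag)   # step 4: miss path
    return block[offset]


# Hypothetical cache: 2 banks x 2 sets x 1 way, 4-byte blocks.
empty = {"valid": False, "tag": 0, "block": b""}
banks = [
    [[dict(empty)], [{"valid": True, "tag": 7, "block": b"\x10\x11\x12\x13"}]],
    [[dict(empty)], [dict(empty)]],
]
memory = lambda bank, set_, tag: b"\xa0\xa1\xa2\xa3"   # stand-in lower level

print(read_byte(banks, memory, 0, 1, tag=7, offset=2))  # 18 (0x12, a hit)
print(read_byte(banks, memory, 1, 0, tag=7, offset=2))  # 162 (0xa2, a miss)
```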

Page 12: Case Study - Power 4

Dual-core 64-bit processors

32 KB L1 D-cache (per processor)
  2-way associative
  128-byte line

64 KB L1 I-cache (per processor)
  Direct mapped
  128-byte line (4 sectors x 32 B)

~1.5 MB L2 cache
  8-way set associative
  128-byte line

Page 13: Power4 Floorplan

Page 14: Power4 L2 Logical View

Cache split into 3 parts, 0.5 MB each

Controlled by 4 coherency processors

One 64 B store queue per processor

Page 15: Power4 L2U

~512 KB
8 banks
128 B block size
8-way associative

[Figure labels: word lines, bit lines, decoders, address bus]

Page 16: Power4

L2 cache block size $C = 512\ \text{KB} = 2^{19}\ \text{B}$
Block size $= 128\ \text{B} = 2^{7}\ \text{B}$
8-way associative
8 banks per cache block

Therefore:
  Set size is $2^{3} \times 2^{7}\ \text{B} = 2^{10}\ \text{B}$
  Sets in cache are $2^{19} / 2^{10} = 2^{9}$ sets
  Sets per bank are $2^{9} / 2^{3} = 2^{6}$ sets

Address (64-bit): tag | index (9 bits) | offset (7 bits), where the index consists of the bank index (3 bits) and the set index (6 bits)
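The arithmetic on this slide can be reproduced with a few lines of Python. The variable names are illustrative, and the tag width is derived under the assumption that all 64 address bits take part in the lookup, which the slide does not state.

```python
def log2_exact(n):
    """Return log2(n) for a power of two, and fail loudly otherwise."""
    if n <= 0 or n & (n - 1):
        raise ValueError(f"{n} is not a power of two")
    return n.bit_length() - 1


cache_size = 512 * 1024     # one L2 cache block, in bytes (2**19)
block_size = 128            # bytes (2**7)
ways = 8
banks = 8
address_bits = 64

set_size = ways * block_size          # 2**10 bytes
sets = cache_size // set_size         # 2**9 sets
sets_per_bank = sets // banks         # 2**6 sets

offset_bits = log2_exact(block_size)
set_bits = log2_exact(sets_per_bank)
bank_bits = log2_exact(banks)
index_bits = bank_bits + set_bits
# Assumption: the tag is whatever remains of the 64-bit address.
tag_bits = address_bits - index_bits - offset_bits

print(set_size, sets, sets_per_bank)                            # 1024 512 64
print(offset_bits, set_bits, bank_bits, index_bits, tag_bits)   # 7 6 3 9 48
```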

Page 17: Power4: CACTI Results

cacti 524288 128 8 0.8um 8

---------- CACTI version 3.2 ----------

Cache Parameters:
  Number of Subbanks: 8
  Total Cache Size: 524288
  Size in bytes of Subbank: 65536
  Number of sets: 64
  Associativity: 8
  Block Size (bytes): 128
  Read/Write Ports: 1
  Read Ports: 0
  Write Ports: 0
  Technology Size: 0.80um
  Vdd: 4.5V

Access Time (ns): 12.3473
Cycle Time (wave pipelined) (ns): 4.97337
Total Power all Banks (nJ): 418.337
Total Power Without Routing (nJ): 198.563
Total Routing Power (nJ): 219.774
Maximum Bank Power (nJ): 63.5175

Best Ndwl (L1): 16
Best Ndbl (L1): 1
Best Nspd (L1): 1
Best Ntwl (L1): 1
Best Ntbl (L1): 1
Best Ntspd (L1): 1
Nor inputs (data): 2
Nor inputs (tag): 2

cacti 524288 128 8 0.8um 16

---------- CACTI version 3.2 ----------

Cache Parameters:
  Number of Subbanks: 16
  Total Cache Size: 524288
  Size in bytes of Subbank: 32768
  Number of sets: 32
  Associativity: 8
  Block Size (bytes): 128
  Read/Write Ports: 1
  Read Ports: 0
  Write Ports: 0
  Technology Size: 0.80um
  Vdd: 4.5V

Access Time (ns): 12.434
Cycle Time (wave pipelined) (ns): 4.85483
Total Power all Banks (nJ): 793.381
Total Power Without Routing (nJ): 341.424
Total Routing Power (nJ): 451.957
Maximum Bank Power (nJ): 63.1382

Best Ndwl (L1): 16
Best Ndbl (L1): 1
Best Nspd (L1): 1
Best Ntwl (L1): 1
Best Ntbl (L1): 1
Best Ntspd (L1): 1
Nor inputs (data): 2
Nor inputs (tag): 2
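One way to read the two runs side by side is the short Python sketch below. It only re-derives the reported subbank geometry from the command-line arguments and compares figures copied from the output above; the dictionary layout and names are assumptions for this example, and nothing here calls CACTI itself.

```python
total_size, block_size, assoc = 524288, 128, 8

# Values copied from the two CACTI 3.2 runs (keyed by number of subbanks).
runs = {
    8:  {"access_ns": 12.3473, "cycle_ns": 4.97337,
         "total_nj": 418.337, "routing_nj": 219.774},
    16: {"access_ns": 12.434, "cycle_ns": 4.85483,
         "total_nj": 793.381, "routing_nj": 451.957},
}


def subbank_geometry(subbanks):
    """Return (bytes per subbank, sets per subbank)."""
    size = total_size // subbanks
    return size, size // (block_size * assoc)


for subbanks, r in runs.items():
    size, sets = subbank_geometry(subbanks)
    routing_share = r["routing_nj"] / r["total_nj"]
    print(f"{subbanks:2d} subbanks: {size} B, {sets} sets, "
          f"routing share {routing_share:.1%}")

ratio = runs[16]["total_nj"] / runs[8]["total_nj"]
print(f"total (16 vs 8 subbanks): x{ratio:.2f}")
```

In these two runs the access and cycle times are nearly unchanged, while the reported total for all banks almost doubles, with routing accounting for more than half of it in both cases.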

Page 18: CACTI

Data Array
  Ndwl: Word line split factor
  Ndbl: Bit line split factor
  Nspd: Number of sets mapped to a single word line (sectors)

Tag Array
  Ntwl: Word line split factor
  Ntbl: Bit line split factor
  Ntspd: Number of sets mapped to a single word line (sectors)

Increasing Ndbl, Nspd, Ntbl or Ntspd requires more sense amplifiers

Increasing Ndwl or Ntwl increases the number of word line drivers

Page 19: Thank You