intel i7 cache


Upload: husainibaharin

Post on 14-Apr-2018


7/27/2019 Intel i7 Cache

http://slidepdf.com/reader/full/intel-i7-cache 1/3

Intel i7 Cache

Readers who follow processor architecture will notice a brand-new cache structure on the Core i7. All Intel Core i7 processors feature L1, L2, and shared L3 caches, whereas the earlier Intel Core 2 Duo and Core 2 Quad processors had only L1 and L2 caches. The breakdown is as follows: each core has a 64 KB L1 cache (32 KB instruction, 32 KB data), the L2 cache totals 1 MB, and an impressive 8 MB of L3 cache is shared across all the cores. That means that all Intel Core i7 processors have over 9 MB of cache memory right there on the 45 nm processor.
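The "over 9 MB" figure can be checked with a little arithmetic. The sketch below simply adds up the sizes quoted above; the core count of four is an assumption based on the quad-core parts discussed in this document, and the names are illustrative only.

```python
# Cache sizes quoted in the text, in KB.
# CORES = 4 is an assumption (the quad-core parts this document discusses).
CORES = 4
L1_PER_CORE_KB = 32 + 32      # 32 KB instruction + 32 KB data per core
L2_TOTAL_KB = 1024            # 1 MB in total across the cores
L3_SHARED_KB = 8 * 1024       # 8 MB shared by all cores


def total_cache_kb(cores=CORES):
    """Sum every cache level on the processor, in KB."""
    return cores * L1_PER_CORE_KB + L2_TOTAL_KB + L3_SHARED_KB


total_mb = total_cache_kb() / 1024
print(f"Total on-die cache: {total_mb:.2f} MB")
```

Under these assumptions the total comes to 9.25 MB, which matches the "over 9 MB" claim.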

The memory hierarchy of Conroe (the code name for many Intel processors sold as Core 2 Duo, Xeon, Pentium Dual-Core and Celeron) was extremely simple, and Intel was able to concentrate on the performance of the shared L2 cache, which was the best solution for an architecture aimed mostly at dual-core implementations. With Nehalem, however, the engineers started from scratch and came to the same conclusion as their competitors: a shared L2 cache is not suited to a native quad-core architecture. The cores would too frequently evict data needed by another core, and providing all four cores with sufficient bandwidth while keeping latency low would have caused too many problems with internal buses and arbitration. To solve the problem, the engineers provided each core with a Level 2 cache of its own. Since it is dedicated to a single core and relatively small (256 KB), they were able to endow it with very high performance; latency, in particular, has reportedly improved significantly over Penryn, from 15 cycles to approximately 10 cycles.

Then comes an enormous Level 3 cache (8 MB) for managing communications between cores. While at first glance Nehalem’s cache hierarchy is reminiscent of Barcelona’s, the operation of the Level 3 cache is very different from AMD’s: it is inclusive of all lower levels of the cache hierarchy. That means that if a core tries to access a data item and it is not present in the Level 3 cache, there is no need to look in the other cores’ private caches, because the data item won’t be there either. Conversely, if the data are present, four bits associated with each line of the cache (one bit per core) show whether or not the data are potentially present (potentially, but not with certainty) in the lower-level cache of another core, and which one.
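The lookup rule described above can be sketched as a toy model. This is only an illustration of the idea, not Intel's implementation: the class and method names are hypothetical, capacity and eviction are left out, and the four-core count is an assumption.

```python
from dataclasses import dataclass

CORES = 4  # assumption: quad-core part, one valid bit per core


@dataclass
class L3Line:
    tag: int
    core_valid: int = 0  # bit i set => core i *may* hold the line


class InclusiveL3:
    """Toy inclusive L3: every line in a private cache is also present here."""

    def __init__(self):
        self.lines = {}  # tag -> L3Line; capacity and eviction are omitted

    def fill(self, tag, core):
        """Record that `core` loaded `tag` into its private caches."""
        line = self.lines.setdefault(tag, L3Line(tag))
        line.core_valid |= 1 << core

    def cores_to_snoop(self, tag, requester):
        """Return the other cores whose private caches must be checked.

        None means an L3 miss: because of inclusion, no private cache can
        hold the line, so nobody needs to be snooped.
        """
        line = self.lines.get(tag)
        if line is None:
            return None
        # A set bit is only a hint: the core may have dropped the line since.
        return [c for c in range(CORES)
                if c != requester and line.core_valid & (1 << c)]


l3 = InclusiveL3()
l3.fill(0x1A, core=0)
l3.fill(0x1A, core=2)
print(l3.cores_to_snoop(0x1A, requester=1))  # [0, 2]
print(l3.cores_to_snoop(0x2B, requester=1))  # None
```

The bit mask filters snoops down to the cores that might hold the line, which is what limits traffic between cores.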

This technique is effective for ensuring the coherency of the private caches because it limits the need for exchanges between cores. It has the disadvantage of wasting part of the cache memory on data that is already in other cache levels. That is somewhat mitigated, however, by the fact that the L1 and L2 caches are small compared to the L3 cache: all the data in the L1 and L2 caches takes up a maximum of 1.25 MB out of the 8 MB available. As on Barcelona, the Level 3 cache doesn’t operate at the same frequency as the rest of the chip. Consequently, access latency at this level is variable, but it should be in the neighbourhood of 40 cycles.
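The 1.25 MB figure follows from the per-core cache sizes. The sketch below works out the worst-case share of the L3 that inclusion can tie up; the four-core count is an assumption, and the function name is illustrative.

```python
CORES = 4                 # assumption: quad-core part
L1_KB, L2_KB = 64, 256    # per-core sizes quoted in the text
L3_KB = 8 * 1024          # shared L3


def inclusion_overhead(cores=CORES):
    """Worst-case KB of L3 duplicating private-cache data, and its fraction."""
    duplicated_kb = cores * (L1_KB + L2_KB)
    return duplicated_kb, duplicated_kb / L3_KB


dup_kb, frac = inclusion_overhead()
print(f"{dup_kb / 1024:.2f} MB duplicated, {frac:.1%} of the L3")
```

With these numbers, at most 1.25 MB, a little under one sixth of the L3, can be occupied by copies of data held in the private caches.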

The only real disappointment with Nehalem’s new cache hierarchy is its L1 cache. The bandwidth of the instruction cache hasn’t been increased: it is still 16 bytes per cycle, compared to 32 on Barcelona. This could create a bottleneck in a server-oriented architecture, since 64-bit instructions are larger than 32-bit ones, and all the more so because Nehalem has one more decoder than Barcelona, which puts that much more pressure on the cache. As for the data cache, its latency has increased to four cycles, compared to three on Conroe, a trade-off that facilitates higher clock frequencies. To end on a positive note, though, the engineers at Intel have increased the number of Level 1 data cache misses that the architecture can process in parallel.
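The instruction-fetch concern can be made concrete with a rough calculation. The fetch bandwidths below come from the text; the decoder counts (four for Nehalem, three for Barcelona) are an assumption added here, since the text only says Nehalem has one more. The model is deliberately simplistic and ignores prefetch buffers and other front-end details.

```python
# Fetch bandwidth is from the text; decoder counts are an assumption
# (the text only states that Nehalem has one more decoder than Barcelona).
CHIPS = {
    "Nehalem":   {"fetch_bytes_per_cycle": 16, "decoders": 4},
    "Barcelona": {"fetch_bytes_per_cycle": 32, "decoders": 3},
}


def max_avg_instr_bytes(chip):
    """Largest average instruction length that still feeds every decoder."""
    spec = CHIPS[chip]
    return spec["fetch_bytes_per_cycle"] / spec["decoders"]


for name in CHIPS:
    print(f"{name}: {max_avg_instr_bytes(name):.2f} bytes per instruction")
```

Under these assumptions, Nehalem's decoders can only all stay busy if instructions average 4 bytes or less, while Barcelona has room for more than 10 bytes per instruction, which is why longer 64-bit encodings are a bigger worry on Nehalem.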

Figure: Core i7 cache structure


INTERNATIONAL ISLAMIC UNIVERSITY MALAYSIA

ECE 3244 COMPUTER ORGANIZATION AND ARCHITECTURE

ASSIGNMENT 4

CACHE

Matric No: 1021513 Name : MUHAMMAD HUSAINI BIN BAHARIN

SECTION 1