william stallings computer organization and architecture
DESCRIPTION
William Stallings Computer Organization and Architecture. Chapter 4 Internal Memory. The four-level memory hierarchy. ♦ Computer memory is organized into a hierarchy. - PowerPoint PPT PresentationTRANSCRIPT
William Stallings William Stallings Computer Organization Computer Organization and Architectureand Architecture
Chapter 4Chapter 4Internal MemoryInternal Memory
♦Computer memory is organized into a hierarchy.
♦Decreasing cost/bit, increasing capacity, slower access time, and decreasing frequency of access of the memory by the processor
♦The cache automatically retains a copy of some of the recently used words from the DRAM.
The four-level memory hierarchyThe four-level memory hierarchy
Memory Hierarchy
Registers In CPU
Internal or Main memory May include one or more levels of cache “RAM”
External memory Backing store
4.1 COMPUTER MEMORY SYSTEM OVERVIEW
Characteristics of Memory SystemsCharacteristics of Memory Systems LocationLocation CapacityCapacity Unit of transferUnit of transfer Access methodAccess method PerformancePerformance Physical typePhysical type Physical characteristicsPhysical characteristics OrganisationOrganisation
LocationLocation
The term location refers to whether memory is internal or external to the computer.
CPU The processor requires its own local memory , in
the form of registersregisters.
Internal Main memory, cacheMain memory, cache
External Peripheral storage devices, such as disk and tape
CapacityCapacity
Internal memory capacity typically expressed in terms of bytesbytes(1byte=8bits)or wordswords.
External memory capacity expressed in bytes. WordWord
The natural unit of organisation Word length usually 8, 16 and 32 bits The size of the wordThe size of the word is typically equal to the
number of bits used to represent a number and to the instruction length. Unfortunately, there are many exceptions.
Unit of TransferUnit of Transfer
Internal Usually governed by data bus widthdata bus width
External Usually a blockblock which is much larger than a word
Addressable unit Smallest location which can be uniquely addressed At the word level or byte level In any case,
22AA=N, A is the length in bits of an address =N, A is the length in bits of an address
N is the number of addressable N is the number of addressable unitsunits
Access Methods (1)Access Methods (1) SequentialSequential access access
Start at the beginning and read through in order Access time depends on location of data and previous
locationvariablevariable
e.g. tapee.g. tape
DirectDirect access access Individual blocks have unique address Access is by jumping to vicinity plus sequential search Access time depends on location and previous location
variablevariable
e.g. diske.g. disk
Access Methods (2)Access Methods (2)
RandomRandom Individual addresses identify locations exactly Access time is independent of location or previous
access and is constantconstant e.g. RAMe.g. RAM
AssociativeAssociative Data is located by a comparison with contentscontents of a
portion of the store Access time is independent of location or previous
access and is constant e.g. cachee.g. cache
PerformancePerformance Parameters Parameters
Access timeAccess time For random-access memoryFor random-access memory
the time it takes to perform a read or write operation.the time it takes to perform a read or write operation. Time between presenting the address Time between presenting the address to the memory to the memory and and
getting the valid datagetting the valid data For non-random-access memoryFor non-random-access memory
The time it takes to position the read-write mechanism at the The time it takes to position the read-write mechanism at the desired location.desired location.
Memory Cycle timeMemory Cycle time Cycle time is accessaccess time plus time plus additional timeadditional time Time may be required for the memory to “recover” before
next access
Transfer RateTransfer Rate Rate at which data can be moved
Physical TypesPhysical Types
SemiconductorSemiconductor RAM
MagneticMagnetic Disk & Tape
OpticalOptical CD (Compact Disk) & DVD (Digital Video Disk)
Others Bubble Hologram
Physical CharacteristicsPhysical Characteristics
DecayDecay VolatilityVolatility
In a volatile memory, information decays naturally or is lost when electrical power is switched off.
In a nonvolatile memory, no electrical power is needed to retain information, e.g. magnetic-surface memory.
ErasableErasable Power consumptionPower consumption
OrganisationOrganisation
Organisation means pphysical hysical arrangement of bits into wordsarrangement of bits into words
Obvious arrangement not always used
Memory Hierarchy
Registers In CPU
Internal or Main memory May include one or more levels
of cache “RAM”
External memory Backing store
The Bottom Line
The design constraints on a computer’s memory:
How much? Capacity
How fast? Time is money
How expensive?
A trade-ff among the three key characteristics of memory: cost, capacity, and access time.
Hierarchy List
RegistersL1 CacheL2 CacheMain memoryDisk cacheDiskOpticalTape
Hierarchy List
Across this spectrum of technologies: Faster access time, greater cost per
bit Greater capacity, smaller cost per
bit Greater capacity, slower access
time
From top to down: Decreasing cost per bit Increasing capacity Increasing access time Decreasing frequency of access of
the memory by the processor
So you want fast?
It is possible to build a computer which uses only static RAM (see later)
This would be very fastThis would need no cache
How can you cache cache?
This would cost a very large amount
Locality of Reference
During the course of the execution of a program, memory references tend to clustercluster e.g. loops and subroutines
Main memory is usually extended with a higher-speed, smaller cachecache. It is a device for stagingstaging the movement of data between main memory and processor registers to improve performance.
External memoryExternal memory, called Secondary or auxiliary Secondary or auxiliary memorymemory are used to store program and data files and visible to the programmer only in terms of files and records.
4.2 Semiconductor Main Memory
Table 4.2 Semiconductor Memory TypesTable 4.2 Semiconductor Memory Types
Types of Random-Access Semiconductor Memory
RAM RAM Misnamed as all semiconductor memory is
random accessrandom access, because all of the types listed in the table are random access.
Read/WriteRead/Write VolatileVolatile
A RAM must be provided with a constant power supply.
Temporary storage Static or dynamic
Dynamic RAM (DRAM)
Bits stored as chargecharge in capacitorsCharges leakNeed refreshing even when poweredSimpler constructionSmaller per bitLess expensiveNeed refresh circuitsrefresh circuitsSlower Main memoryMain memory
Static RAM (SRAM)
Bits stored as on/off switchesswitchesNo charges to leakNo refreshing needed when poweredMore complex constructionLarger per bitMore expensiveDoes not need refresh circuitsFaster CacheCache
Read Only Memory (ROMROM)
Permanent storageApplications
Microprogramming (see later) Library subroutines Systems programs (BIOS) Function tables
Types of ROMWritten during manufacture
Very expensive for small runs
Programmable (once) PROM Needs special equipment to program
Read “mostly” Erasable Programmable (EPROM)
Erased by UV
Electrically Erasable (EEPROM)Takes much longer to write than read
Flash memoryIt is intermediate between EPROM and EEPROM in both cost and functionality.Erase whole memory electrically or erase blocks of memory
OrganisationOrganisation in detail
Memory cell The basic element of a semiconductor memory Two stable states being written into to set the state, or being read to sense the state
Chip LogicChip Logic One extreme organization : the physical arrangement of
cells in the array is the same as the logical arrangement. The array is organized into W words of B bits each.The array is organized into W words of B bits each. e.g. A 16Mbit chip can be organised as 1M 16-bit words
One-bit-per-chipOne-bit-per-chip in which data is read/written one bit at a time A bit per chip system has 16 lots of 1Mbit chip with bit 1 of each
word in chip 1 and so on
Typical organization of a 16-Mbit DRAM A 16Mbit chip can be organised as a 2048 x 2048
x 4bit array Reduces number of address pins
Multiplex row address and column address11 pins to address (211=2048)An additional 11 address lines select one of 2048
columns of 4bits per column. Four data lines are for the input and output of 4 bits to and from a data buffer. On write, the bit driver of each bit line is activated for a 1 or 0 according to the value of the corresponding data line. On read, the value of each bit line selects which row of cells is used for reading or writing.
Adding one more pin devoted to addressing doubles the number of rows and columns, and so the size of the chip memory grows by a factor 4.
Chip LogicChip Logic
Typical 16 Mb DRAM (4M x 4)
Refreshing
Refresh circuit included on chipDisable chipCount through rowsrowsRead & Write backTakes timeSlows down apparent performance
Chip Packaging
EPROM packageEPROM package , which is a one-word-per-chip, 8-Mbit chip organized as 1M×8
•The address of the word being accessed . For 1M words, a total of 20 pins (220=1M) are needed.
•D0~D7
•The power supply to the chip (VCC)
•A ground pin (Vss)
•A chip enable (CE) pin: the CE pin is used to indicate whether or not the address is valid for this chip.
•A program voltage (Vpp)
DRAM packageDRAM package, 16-Mbit chip organized as 4M×4
RAM chip can be updated, the data pins are input/output different from ROM chip
•Write Enable pin (WE)
•Output Enable pin (OE)
•Row Address Select (RAS)
•Column Address Select (CAS)
Module Organisation
·If a RAM chip contain only 1bit per word1bit per word, clearly a a number of chips equal to the number of chips equal to the number of bits per wordsnumber of bits per words are needed.
e.g. How a memory module e.g. How a memory module consisting of 256K 8-bit consisting of 256K 8-bit words could be organized?words could be organized?
256K=218, an 18-bit address needed;
The address is presented to
8 256K×1-bit chips, each of which provides the input/output of 1 bit. Figure 4.6 256kbyte memory Organization
Module Organisation (2)
Figure 4.7 1-Mbyte Memory Organization
(1M×8bit/256K×8bit)=4=22
As show in figure 4.7, 1M word by 8bits per word is organized as four columns of chips, each column containing 256K words arranged as in Figure 4.6.
1M=220
For 1M word, 20 address lines are needed.The 18 least significant bits are routed to all 32 modules.The high-order 2 bits are input to a group select logic module that sends a chip enable chip enable signalsignal to one of the four columns of modules.
Error Correction
Hard Failure Permanent defect
Soft Error Random, non-destructive No permanent damage to memory
Detected using Hamming error correcting Hamming error correcting codecode
Error Correcting Code Function
•A function f, is performed on the data to produce a code.
•When the previously stored word is read out, the code is used to detect and possible correct errors.
•A new set of K code bits is generated from the M data bits and compared with the fetched code bits.
Figure 4.9 Hamming Error-Correcting CodeFigure 4.9 Hamming Error-Correcting Code
Even Parity bits
Figure 4.9 uses Venn diagramsVenn diagrams to illustrate the use of Hamming code on 4-bit words (M=4). With three intersection circles, there are seven compartments. We assign the 4 data bits to the inner compartments. The remaining compartments are filled with parity parity bitsbits. Each parity bit is chosen so that the total number of 1s in its circle is eveneven.
The comparison logic receives as input two k-bit values. A bit-by-bit comparison is done by taking the exclusive-or exclusive-or of the two inputs. The results is called the syndrome wordsyndrome word.. The syndrome word is therefore K bits wide and has a range
between 0 and 2K-1. The value 0 indicates that no error was The value 0 indicates that no error was detected. Leaving 2detected. Leaving 2KK-1 values to indicate, if there is an error, -1 values to indicate, if there is an error, which bit was in error which bit was in error (the numerical value of the syndrome (the numerical value of the syndrome indicates the position of the data bit in error).indicates the position of the data bit in error).
An error could occur on any of the M data bitsM data bits or K check bits K check bits so, 22KK-1-1≥≥M+KM+K
(This equation gives the number of bits needed to correct a single bit error in a word containing M data bits.)
Figure 4.8 Error-Correcting Code
Figure 4.10 Layout of Data bits and Check bits
Those bit positions whose position number are powers of 2powers of 2 are designated as check bits.
Each check bit operates on every data bit position whose position number contains a 1 in the corresponding column position.
Bit position n is checked by those bits Ci such that ∑i=n.
C8 C4 C2 C1C8 C4 C2 C1
The check bits are calculated as follows, where the symbol designates the exclusive-or operation:
Assume that the 8-bit input words is 0011100100111001, with data bit M1 in the right-most position. The calculations are as follows:
Suppose the data bit 3 sustains an error and is changed from 0 to 1.
When the new check bits are compared with the old check bits, the syndrome word is formed:
The result is 0110, indicating that bit position 6, which contains data bit 3, in error.
Figure 4.11 Check Bit Degeneration
a single-error-correction (SEC) codea single-error-correction (SEC) code
More commonly, semiconductor memory is equipped with a a single-error-correcting double-error-detecting (SEC-DED) codesingle-error-correcting double-error-detecting (SEC-DED) code. An error-correction code enhances the reliability of the memory at the cost of added complexity.
Table 4.3 Increase in Word Length with Error CorrectionTable 4.3 Increase in Word Length with Error Correction
11
Figure 4.12 Hamming SEC-DEC Code
The sequence show that if two errors occurtwo errors occur (Figure 4.12 c), the checking procedure goes astray (d) and worsens the problem by creating a third error (e). To overcome the problem, an eighth bit is added that is set so that the total number of 1s in the diagram is even.
4.3 CASHE MEMORY
Small amount of fast memorySits between normal main memory and
CPUMay be located on CPU chip or module
Cache operation - overview Figure 4.14 Cache/Main-Memory Structure (P118)Figure 4.14 Cache/Main-Memory Structure (P118) Cache includes tags to identify which block of main memory is in each
cache slot. The tag is usually a portion of the main memory address.
TagTag BlockBlockLine Line NumberNumber
00
11
22
C-1C-1
Block length (k words)Block length (k words)(a) Cache(a) Cache
MemoryMemory
addressaddress 0
1
2
3
2n-1
Block (K words)
Block
Word LengthWord Length(b) Main Memory(b) Main Memory
Figure 4.15 Cache Read Operation (P119)Figure 4.15 Cache Read Operation (P119)• CPU requests contents of memory location• Check cache for this data• If present, get from cache (fast)• If not present, read required block from main memory to cache• Then deliver from cache to CPU
Typical Cache Organization
Figure 4.16 Typical Cache Organization
•In this organization, the cache connects to the processor via data, control, and address linesdata, control, and address lines.
•The data and address lines attach to data and address data and address buffersbuffers, which attach to a system bussystem bus from which main memory is reached.
•When a cache hita cache hit occurs, the data and address buffers are disabled and communication is only between processor and cache, with no system bus no system bus traffictraffic
•When a cache misscache miss occurs, the desired address is loaded onto the system bus and the data are returned through a data buffer a data buffer to both the cache and main to both the cache and main memory.memory.
Elements of Cache Design
SizeSize Mapping FunctionMapping Function
Direct Associative Set Associative
Replacement AlgorithmReplacement Algorithm Least recently used (LRU) First in first out (FIFO) Least frequently used (LFU) Random
Write PolicyWrite Policy Write through Write back Write once
Block SizeBlock Size Number of CachesNumber of Caches
Single or two level Unified or split
Cache Cache SizeSize
A trade-off between cost per bit and access time
Cost More cache is expensive
Speed More cache is faster (up to a point) Checking cache for data takes time
“Optimum” cache sizes Suggested : between 1K and 512K words.
Mapping FunctionMapping Function
Three techniques direct, associative, and set associativedirect, associative, and set associative
Elements of the example Cache of 64kByte Cache block of 4 bytes
Data is transferred between memory and the cache in blocksblocks of 4 bytes each.
i.e. cache is 16k (214) lines of 4 bytes
16MBytes main memory24 bit address (224=16M)Main memory (4M blocks of 4 bytes each)
Direct Mapping
Each block of main memory maps to only Each block of main memory maps to only one cache lineone cache line i.e. if a block is in cache, it must be in one specific
place
Address is in two partsLeast Significant ww bits identify unique word or
byte within a block of main memory.Most Significant ss bits specify one memory blockThe MSBs are split into a cache line field r r and a
tag of s-rs-r (most significant)The line field of rr identifies one of the m=2r lines
of the cache
Direct Mapping Cache Line Table
Cache lineCache line Main Main Memory blocks Memory blocks assignedassigned
0 0, m, 2m, … 2s-m
1 1, m+1, 2m+1…2s-m+1
m-1 m-1, 2m-1, 3m-1… 2s-1
The mapping is expressed as: i= j modulo mi= j modulo mwhere i =cache line number j = main memory block number m = number of lines in the cache
Every row has the same cache Every row has the same cache line number; Every column has line number; Every column has
the same tag number.the same tag number.
No two blocks in the same line have the same Tag fieldNo two blocks in the same line have the same Tag field!!
Direct Mapping Cache Organization
The r-bit r-bit line number is used as an index into the cache to access a particular lineline.
If the (s-r) bit(s-r) bit tag number matches the tag numbertag number currently stored in that line, then the w-bitw-bit word number is used to select one of the 2w bytes in that line.
Otherwise, the s s bits tag-plus-line field is used to fetch a block from main memory.
Direct MappingAddress Structure
Tag s-rs-r Line or Slot rr Word ww
8 14 2
24 bit address w =2 bit word identifier (4 byte block) s=22 bit block identifier
8 bit tag (=22-14) 14 bit slot or line
No two blocks in the same line have the same Tag fieldNo two blocks in the same line have the same Tag field Check contents of cache by finding line and checking Tagfinding line and checking Tag
Direct Mapping Example
Main Memory AddressMain Memory Address
• The cache is organized as 16K=214 lines of 4 bytes each.
• The main memory consists of 16Mbytes, organized as 4M blocks of 4 bytes each.
• i= j modulo m
i = cache line number
j = main memory block number
m = number of lines in the cache
• Note that no two blocks that map into the same line numberline number have the same tag numbertag number.
Direct Mapping pros & cons
Advantages Simple Inexpensive
Disadvantages Fixed location for given block
If a program accesses 2 blocks that map to the same line repeatedly, cache misses are very high
Associative Mapping
A main memory block can load into any A main memory block can load into any line of cacheline of cache
Memory address is interpreted as a a tag and tag and a a wordword field. field.
Tag Tag uniquely identifies block of memoryEvery line’s tag is examined for a matchDisadvantages of associative mapping
Cache searching gets expensive Complex circuitry required to examine the tags of
all caches in parallelin parallel.
Fully Associative Cache Organization
Tag 22 bitWord2 bit
Associative MappingAddress Structure
22 bit tag stored with each 32 bit (4B) block of dataCompare tag field with tag entry in cache to check
for hitLeast significant 2 bits of address identify which 16-
bit word is required from 32 bit data blocke.g.
Address Tag Data Cache line
16339C 058CE7 FEDCBA98 0001
Associative Mapping Example
Main Memory Address
Set Associative Mapping
Cache is divided into a number of setssetsEach set contains a number of lineslinesA given block maps to any line in a any line in a
given setgiven set e.g. Block B can be in any line of set i
e.g. 2 lines per set 2 way associative mapping2 way associative mapping A given block can be in one of 2 lines in only
one set
Set Associative Mapping
In this case , the cache is divided into vv sets, each of which consists of k k lines.
The relationships are m = v × km = v × k
i = j modulo vi = j modulo vwhere
i=cache set number j=main memory block number
m=number of lines in the cache
This is referred to as k-way set associative k-way set associative mappingmapping.
Two Way Set Associative Cache Organization
The dd set bits set bits specify one of v=2 v=2dd setssets. The s s bits of the tag and set fields specify one of the 2one of the 2ss blocks blocks of main memory.
With K-wayK-way set associative mapping, the tagthe tag in a memory address is much smaller and is only compared to the k tags compared to the k tags within a single setwithin a single set.
Set Associative MappingExample
13 bit set numberBlock number in main memory is modulo
213 000000, 00A000, 00B000, 00C000 … map
to same set
Set Associative MappingAddress Structure
Use set field to determine cache setcache set to look in Tag+Set field Tag+Set field specifies one of the blocks in the
main memory.Compare tag fieldtag field to see if we have a hite.g
Address Tag Data Set number
1FF 7FFC 1FF 24682468 1FFF
Tag 9 bit Set 13 bitWord2 bit
Two Way Set Associative Mapping Example
e.gAddress Tag Data Set number1FF 7FFC 1FF 24682468 1FFF
02C 0004 02C 11235813 0001
Main Memory AddressMain Memory Address
Replacement Algorithms (1)Direct mappingDirect mapping
When a new block is brought into the cache, one of the existing blocks must be replaced.
Direct mappingDirect mappingNo choiceEach block only maps to one lineReplace that line
Replacement Algorithms (2)Associative & Set AssociativeAssociative & Set Associative
Hardware implemented algorithmHardware implemented algorithm (speed)Least Recently used (LRU)
Replace that block in the set which has been in the cache longest with no reference to it. (hit ratio + time)
e.g. in 2 way set associativeWhich of the 2 block is LRU?
First in first out (FIFO) replace block in the set that has been in cache longest. (time)
Least frequently used replace block in the set which has had fewest hits. (hit ratio)
Random
Write Policy
Must not overwrite a cache block unless main memory is up to date
Problems to contend with More than one device may have access to main
memory.Data inconsistent between memory and cache
Multiple CPUs may have individual cachesData inconsistent among caches
Write PolicyWrite Policy Write through Write back Write once
Write throughWrite through
All writes go to main memory as well as cache
Any other processor-cache can monitor main memory traffic to keep local (to CPU) cache updated.
Disadvantages Lots of traffic Slows down writes
Write backWrite back
Updates initially made in cache only Update bitUpdate bit for cache slot is set when update
occurs If block in cache is to be replaced, write If block in cache is to be replaced, write
to main memory only if update bit is setto main memory only if update bit is setOther caches get out of syncI/O must access main memory through cache
Because portions of main memory are invalid
Approaches to cache coherency
Bus watching with write through Each cache controller monitors the address lines to detect
write operations to memory by other bus masters. This strategy depends on the use of a write-through
policy by all cache controller. Hardware transparency
Additional hardware is used to ensure that all the updates to main memory via cache are reflected in all caches.
Noncachable memory Only a portion of main memory is shared by more than
one processor. In such a system, all accesses to shared memory are
cache misses, because the shared memory is never copied to the cache.
The noncachable memory can be identified using chip-select logic or high-access bits.
Line Size
The principle of locality Data in the vicinity of a referenced word is likely
to be referenced in the near future.
The relationship between block size and hit The relationship between block size and hit ratioratio is complex, depending on the locality characteristics of a particular program, and no definitive optimum value has been found.
A size of from two to eight wordstwo to eight words seems reasonably close to optimum.
Number of caches
A single cacheMultiple caches
The number of levels of caches The use of unified versus split caches
Split cachesSplit caches: one dedicated to instructions and one dedicated to data
• Key advantage of split caches: eliminate contention for cache between the instruction processor and the execution unit.
Unified cacheUnified cache: a single cache used to store references to both data and instructions
For a given cache size, a unified cache has a higher hit rate than split caches because it balances the load between instruction and data fetches automatically.
Number of caches
The on-chip cache: cache and processor on the same chip When the requested instruction or data is found in the on-chip
cache, the bus access is eliminated. Because of the short data paths internal to the processor, on-chip cache accesses will complete appreciably faster than would even zero-wait state bus cycles.
Advantages Reduce the processor’s external bus activity Speed up execution times Increase overall system performance
A two-level cache The internal cache designated as level 1 (L1) The external cache designated as level 2 (L2)
4.4 Pentium Cache
Foreground readingFind out detail of Pentium II cache
systemsNOT just from Stallings!
4.5 Newer RAM Technology (1)
Basic DRAM same since first RAM chips ConstraintsConstraints of the traditional DRAM chip: its internal architecture and its interface to the
processor’s memory bus.Enhanced DRAM
Contains small SRAM as well SRAM holds last line read A comparator stores the 11-bit value of the most
recent row address selection.Cache DRAM (CDRAM)
Larger SRAM component Use as cache or serial buffer
Newer RAM Technology (2)
Synchronous DRAM (SDRAM) Access is synchronized with an external clock
unlike DRAM asynchronous. Address is presented to RAM Since SDRAM moves data in time with system
clock, CPU knows when data will be ready CPU does not have to wait, it can do something
else Burst mode allows SDRAM to set up stream of
data and fire it out in block
Internal logic of the SDRAM
• In burst mode, a series of data bits can be clocked out rapidly after the first bit has been accessed.
Burst modeBurst mode is useful when all the bits to be accessed are in sequence and in the same row of the array as the initial access
•A dual-bank internal A dual-bank internal architecturearchitecture that improves opportunities for on-chip parallelism.
• The mode registerThe mode register and associated control logiccontrol logic provide a mechanism to customize the SDRAM to suit specific system needs.
Newer RAM Technology (3)
Foreground readingCheck out any other RAM you can findSee Web site:
The RAM Guide
Exercises Exercises
P143 4.4, 4.6, 4.7, 4.8 P145 4.20
DeadlineDeadline