william stallings computer organization and architecture

William Stallings William Stallings Computer Organization Computer Organization and Architectureand Architecture

Chapter 4Chapter 4Internal MemoryInternal Memory

♦Computer memory is organized into a hierarchy.

♦Decreasing cost/bit, increasing capacity, slower access time, and decreasing frequency of access of the memory by the processor

♦The cache automatically retains a copy of some of the recently used words from the DRAM.

The four-level memory hierarchyThe four-level memory hierarchy

Memory Hierarchy

Registers In CPU

Internal or Main memory May include one or more levels of cache “RAM”

External memory Backing store

4.1 COMPUTER MEMORY SYSTEM OVERVIEW

Characteristics of Memory SystemsCharacteristics of Memory Systems LocationLocation CapacityCapacity Unit of transferUnit of transfer Access methodAccess method PerformancePerformance Physical typePhysical type Physical characteristicsPhysical characteristics OrganisationOrganisation

LocationLocation

The term location refers to whether memory is internal or external to the computer.

CPU The processor requires its own local memory , in

the form of registersregisters.

Internal Main memory, cacheMain memory, cache

External Peripheral storage devices, such as disk and tape

CapacityCapacity

Internal memory capacity typically expressed in terms of bytesbytes(1byte=8bits)or wordswords.

External memory capacity expressed in bytes. WordWord

The natural unit of organisation Word length usually 8, 16 and 32 bits The size of the wordThe size of the word is typically equal to the

number of bits used to represent a number and to the instruction length. Unfortunately, there are many exceptions.

Unit of TransferUnit of Transfer

Internal Usually governed by data bus widthdata bus width

External Usually a blockblock which is much larger than a word

Addressable unit Smallest location which can be uniquely addressed At the word level or byte level In any case,

22AA=N, A is the length in bits of an address =N, A is the length in bits of an address

N is the number of addressable N is the number of addressable unitsunits

Access Methods (1)Access Methods (1) SequentialSequential access access

Start at the beginning and read through in order Access time depends on location of data and previous

locationvariablevariable

e.g. tapee.g. tape

DirectDirect access access Individual blocks have unique address Access is by jumping to vicinity plus sequential search Access time depends on location and previous location

variablevariable

e.g. diske.g. disk

Access Methods (2)Access Methods (2)

RandomRandom Individual addresses identify locations exactly Access time is independent of location or previous

access and is constantconstant e.g. RAMe.g. RAM

AssociativeAssociative Data is located by a comparison with contentscontents of a

portion of the store Access time is independent of location or previous

access and is constant e.g. cachee.g. cache

PerformancePerformance Parameters Parameters

Access timeAccess time For random-access memoryFor random-access memory

the time it takes to perform a read or write operation.the time it takes to perform a read or write operation. Time between presenting the address Time between presenting the address to the memory to the memory and and

getting the valid datagetting the valid data For non-random-access memoryFor non-random-access memory

The time it takes to position the read-write mechanism at the The time it takes to position the read-write mechanism at the desired location.desired location.

Memory Cycle timeMemory Cycle time Cycle time is accessaccess time plus time plus additional timeadditional time Time may be required for the memory to “recover” before

next access

Transfer RateTransfer Rate Rate at which data can be moved

Physical TypesPhysical Types

SemiconductorSemiconductor RAM

MagneticMagnetic Disk & Tape

OpticalOptical CD (Compact Disk) & DVD (Digital Video Disk)

Others Bubble Hologram

Physical CharacteristicsPhysical Characteristics

DecayDecay VolatilityVolatility

In a volatile memory, information decays naturally or is lost when electrical power is switched off.

In a nonvolatile memory, no electrical power is needed to retain information, e.g. magnetic-surface memory.

ErasableErasable Power consumptionPower consumption

OrganisationOrganisation

Organisation means pphysical hysical arrangement of bits into wordsarrangement of bits into words

Obvious arrangement not always used

Memory Hierarchy

Registers In CPU

Internal or Main memory May include one or more levels

of cache “RAM”

External memory Backing store

The Bottom Line

The design constraints on a computer’s memory:

How much? Capacity

How fast? Time is money

How expensive?

A trade-ff among the three key characteristics of memory: cost, capacity, and access time.

Hierarchy List

RegistersL1 CacheL2 CacheMain memoryDisk cacheDiskOpticalTape

Hierarchy List

Across this spectrum of technologies: Faster access time, greater cost per

bit Greater capacity, smaller cost per

bit Greater capacity, slower access

time

From top to down: Decreasing cost per bit Increasing capacity Increasing access time Decreasing frequency of access of

the memory by the processor

So you want fast?

It is possible to build a computer which uses only static RAM (see later)

This would be very fastThis would need no cache

How can you cache cache?

This would cost a very large amount

Locality of Reference

During the course of the execution of a program, memory references tend to clustercluster e.g. loops and subroutines

Main memory is usually extended with a higher-speed, smaller cachecache. It is a device for stagingstaging the movement of data between main memory and processor registers to improve performance.

External memoryExternal memory, called Secondary or auxiliary Secondary or auxiliary memorymemory are used to store program and data files and visible to the programmer only in terms of files and records.

4.2 Semiconductor Main Memory

Table 4.2 Semiconductor Memory TypesTable 4.2 Semiconductor Memory Types

Types of Random-Access Semiconductor Memory

RAM RAM Misnamed as all semiconductor memory is

random accessrandom access, because all of the types listed in the table are random access.

Read/WriteRead/Write VolatileVolatile

A RAM must be provided with a constant power supply.

Temporary storage Static or dynamic

Dynamic RAM (DRAM)

Bits stored as chargecharge in capacitorsCharges leakNeed refreshing even when poweredSimpler constructionSmaller per bitLess expensiveNeed refresh circuitsrefresh circuitsSlower Main memoryMain memory

Static RAM (SRAM)

Bits stored as on/off switchesswitchesNo charges to leakNo refreshing needed when poweredMore complex constructionLarger per bitMore expensiveDoes not need refresh circuitsFaster CacheCache

Read Only Memory (ROMROM)

Permanent storageApplications

Microprogramming (see later) Library subroutines Systems programs (BIOS) Function tables

Types of ROMWritten during manufacture

Very expensive for small runs

Programmable (once) PROM Needs special equipment to program

Read “mostly” Erasable Programmable (EPROM)

Erased by UV

Electrically Erasable (EEPROM)Takes much longer to write than read

Flash memoryIt is intermediate between EPROM and EEPROM in both cost and functionality.Erase whole memory electrically or erase blocks of memory

OrganisationOrganisation in detail

Memory cell The basic element of a semiconductor memory Two stable states being written into to set the state, or being read to sense the state

Chip LogicChip Logic One extreme organization : the physical arrangement of

cells in the array is the same as the logical arrangement. The array is organized into W words of B bits each.The array is organized into W words of B bits each. e.g. A 16Mbit chip can be organised as 1M 16-bit words

One-bit-per-chipOne-bit-per-chip in which data is read/written one bit at a time A bit per chip system has 16 lots of 1Mbit chip with bit 1 of each

word in chip 1 and so on

Typical organization of a 16-Mbit DRAM A 16Mbit chip can be organised as a 2048 x 2048

x 4bit array Reduces number of address pins

Multiplex row address and column address11 pins to address (211=2048)An additional 11 address lines select one of 2048

columns of 4bits per column. Four data lines are for the input and output of 4 bits to and from a data buffer. On write, the bit driver of each bit line is activated for a 1 or 0 according to the value of the corresponding data line. On read, the value of each bit line selects which row of cells is used for reading or writing.

Adding one more pin devoted to addressing doubles the number of rows and columns, and so the size of the chip memory grows by a factor 4.

Chip LogicChip Logic

Typical 16 Mb DRAM (4M x 4)

Refreshing

Refresh circuit included on chipDisable chipCount through rowsrowsRead & Write backTakes timeSlows down apparent performance

Chip Packaging

EPROM packageEPROM package , which is a one-word-per-chip, 8-Mbit chip organized as 1M×8

•The address of the word being accessed . For 1M words, a total of 20 pins (220=1M) are needed.

•D0~D7

•The power supply to the chip (VCC)

•A ground pin (Vss)

•A chip enable (CE) pin: the CE pin is used to indicate whether or not the address is valid for this chip.

•A program voltage (Vpp)

DRAM packageDRAM package, 16-Mbit chip organized as 4M×4

RAM chip can be updated, the data pins are input/output different from ROM chip

•Write Enable pin (WE)

•Output Enable pin (OE)

•Row Address Select (RAS)

•Column Address Select (CAS)

Module Organisation

·If a RAM chip contain only 1bit per word1bit per word, clearly a a number of chips equal to the number of chips equal to the number of bits per wordsnumber of bits per words are needed.

e.g. How a memory module e.g. How a memory module consisting of 256K 8-bit consisting of 256K 8-bit words could be organized?words could be organized?

256K=218, an 18-bit address needed;

The address is presented to

8 256K×1-bit chips, each of which provides the input/output of 1 bit. Figure 4.6 256kbyte memory Organization

Module Organisation (2)

Figure 4.7 1-Mbyte Memory Organization

(1M×8bit/256K×8bit)=4=22

As show in figure 4.7, 1M word by 8bits per word is organized as four columns of chips, each column containing 256K words arranged as in Figure 4.6.

1M=220

For 1M word, 20 address lines are needed.The 18 least significant bits are routed to all 32 modules.The high-order 2 bits are input to a group select logic module that sends a chip enable chip enable signalsignal to one of the four columns of modules.

Error Correction

Hard Failure Permanent defect

Soft Error Random, non-destructive No permanent damage to memory

Detected using Hamming error correcting Hamming error correcting codecode

Error Correcting Code Function

•A function f, is performed on the data to produce a code.

•When the previously stored word is read out, the code is used to detect and possible correct errors.

•A new set of K code bits is generated from the M data bits and compared with the fetched code bits.

Figure 4.9 Hamming Error-Correcting CodeFigure 4.9 Hamming Error-Correcting Code

Even Parity bits

Figure 4.9 uses Venn diagramsVenn diagrams to illustrate the use of Hamming code on 4-bit words (M=4). With three intersection circles, there are seven compartments. We assign the 4 data bits to the inner compartments. The remaining compartments are filled with parity parity bitsbits. Each parity bit is chosen so that the total number of 1s in its circle is eveneven.

The comparison logic receives as input two k-bit values. A bit-by-bit comparison is done by taking the exclusive-or exclusive-or of the two inputs. The results is called the syndrome wordsyndrome word.. The syndrome word is therefore K bits wide and has a range

between 0 and 2K-1. The value 0 indicates that no error was The value 0 indicates that no error was detected. Leaving 2detected. Leaving 2KK-1 values to indicate, if there is an error, -1 values to indicate, if there is an error, which bit was in error which bit was in error (the numerical value of the syndrome (the numerical value of the syndrome indicates the position of the data bit in error).indicates the position of the data bit in error).

An error could occur on any of the M data bitsM data bits or K check bits K check bits so, 22KK-1-1≥≥M+KM+K

(This equation gives the number of bits needed to correct a single bit error in a word containing M data bits.)

Figure 4.8 Error-Correcting Code

Figure 4.10 Layout of Data bits and Check bits

Those bit positions whose position number are powers of 2powers of 2 are designated as check bits.

Each check bit operates on every data bit position whose position number contains a 1 in the corresponding column position.

Bit position n is checked by those bits Ci such that ∑i=n.

C8 C4 C2 C1C8 C4 C2 C1

The check bits are calculated as follows, where the symbol designates the exclusive-or operation:

Assume that the 8-bit input words is 0011100100111001, with data bit M1 in the right-most position. The calculations are as follows:

Suppose the data bit 3 sustains an error and is changed from 0 to 1.

When the new check bits are compared with the old check bits, the syndrome word is formed:

The result is 0110, indicating that bit position 6, which contains data bit 3, in error.

Figure 4.11 Check Bit Degeneration

a single-error-correction (SEC) codea single-error-correction (SEC) code

More commonly, semiconductor memory is equipped with a a single-error-correcting double-error-detecting (SEC-DED) codesingle-error-correcting double-error-detecting (SEC-DED) code. An error-correction code enhances the reliability of the memory at the cost of added complexity.

Table 4.3 Increase in Word Length with Error CorrectionTable 4.3 Increase in Word Length with Error Correction

11

Figure 4.12 Hamming SEC-DEC Code

The sequence show that if two errors occurtwo errors occur (Figure 4.12 c), the checking procedure goes astray (d) and worsens the problem by creating a third error (e). To overcome the problem, an eighth bit is added that is set so that the total number of 1s in the diagram is even.

4.3 CASHE MEMORY

Small amount of fast memorySits between normal main memory and

CPUMay be located on CPU chip or module

Cache operation - overview Figure 4.14 Cache/Main-Memory Structure (P118)Figure 4.14 Cache/Main-Memory Structure (P118) Cache includes tags to identify which block of main memory is in each

cache slot. The tag is usually a portion of the main memory address.

TagTag BlockBlockLine Line NumberNumber

00

11

22

C-1C-1

Block length (k words)Block length (k words)(a) Cache(a) Cache

MemoryMemory

addressaddress 0

1

2

3

2n-1

Block (K words)

Block

Word LengthWord Length(b) Main Memory(b) Main Memory

Figure 4.15 Cache Read Operation (P119)Figure 4.15 Cache Read Operation (P119)• CPU requests contents of memory location• Check cache for this data• If present, get from cache (fast)• If not present, read required block from main memory to cache• Then deliver from cache to CPU

Typical Cache Organization

Figure 4.16 Typical Cache Organization

•In this organization, the cache connects to the processor via data, control, and address linesdata, control, and address lines.

•The data and address lines attach to data and address data and address buffersbuffers, which attach to a system bussystem bus from which main memory is reached.

•When a cache hita cache hit occurs, the data and address buffers are disabled and communication is only between processor and cache, with no system bus no system bus traffictraffic

•When a cache misscache miss occurs, the desired address is loaded onto the system bus and the data are returned through a data buffer a data buffer to both the cache and main to both the cache and main memory.memory.

Elements of Cache Design

SizeSize Mapping FunctionMapping Function

Direct Associative Set Associative

Replacement AlgorithmReplacement Algorithm Least recently used (LRU) First in first out (FIFO) Least frequently used (LFU) Random

Write PolicyWrite Policy Write through Write back Write once

Block SizeBlock Size Number of CachesNumber of Caches

Single or two level Unified or split

Cache Cache SizeSize

A trade-off between cost per bit and access time

Cost More cache is expensive

Speed More cache is faster (up to a point) Checking cache for data takes time

“Optimum” cache sizes Suggested : between 1K and 512K words.

Mapping FunctionMapping Function

Three techniques direct, associative, and set associativedirect, associative, and set associative

Elements of the example Cache of 64kByte Cache block of 4 bytes

Data is transferred between memory and the cache in blocksblocks of 4 bytes each.

i.e. cache is 16k (214) lines of 4 bytes

16MBytes main memory24 bit address (224=16M)Main memory (4M blocks of 4 bytes each)

Direct Mapping

Each block of main memory maps to only Each block of main memory maps to only one cache lineone cache line i.e. if a block is in cache, it must be in one specific

place

Address is in two partsLeast Significant ww bits identify unique word or

byte within a block of main memory.Most Significant ss bits specify one memory blockThe MSBs are split into a cache line field r r and a

tag of s-rs-r (most significant)The line field of rr identifies one of the m=2r lines

of the cache

Direct Mapping Cache Line Table

Cache lineCache line Main Main Memory blocks Memory blocks assignedassigned

0 0, m, 2m, … 2s-m

1 1, m+1, 2m+1…2s-m+1

m-1 m-1, 2m-1, 3m-1… 2s-1

The mapping is expressed as: i= j modulo mi= j modulo mwhere i =cache line number j = main memory block number m = number of lines in the cache

Every row has the same cache Every row has the same cache line number; Every column has line number; Every column has

the same tag number.the same tag number.

No two blocks in the same line have the same Tag fieldNo two blocks in the same line have the same Tag field!!

Direct Mapping Cache Organization

The r-bit r-bit line number is used as an index into the cache to access a particular lineline.

If the (s-r) bit(s-r) bit tag number matches the tag numbertag number currently stored in that line, then the w-bitw-bit word number is used to select one of the 2w bytes in that line.

Otherwise, the s s bits tag-plus-line field is used to fetch a block from main memory.

Direct MappingAddress Structure

Tag s-rs-r Line or Slot rr Word ww

8 14 2

24 bit address w =2 bit word identifier (4 byte block) s=22 bit block identifier

8 bit tag (=22-14) 14 bit slot or line

No two blocks in the same line have the same Tag fieldNo two blocks in the same line have the same Tag field Check contents of cache by finding line and checking Tagfinding line and checking Tag

Direct Mapping Example

Main Memory AddressMain Memory Address

• The cache is organized as 16K=214 lines of 4 bytes each.

• The main memory consists of 16Mbytes, organized as 4M blocks of 4 bytes each.

• i= j modulo m

i = cache line number

j = main memory block number

m = number of lines in the cache

• Note that no two blocks that map into the same line numberline number have the same tag numbertag number.

Direct Mapping pros & cons

Advantages Simple Inexpensive

Disadvantages Fixed location for given block

If a program accesses 2 blocks that map to the same line repeatedly, cache misses are very high

Associative Mapping

A main memory block can load into any A main memory block can load into any line of cacheline of cache

Memory address is interpreted as a a tag and tag and a a wordword field. field.

Tag Tag uniquely identifies block of memoryEvery line’s tag is examined for a matchDisadvantages of associative mapping

Cache searching gets expensive Complex circuitry required to examine the tags of

all caches in parallelin parallel.

Fully Associative Cache Organization

Tag 22 bitWord2 bit

Associative MappingAddress Structure

22 bit tag stored with each 32 bit (4B) block of dataCompare tag field with tag entry in cache to check

for hitLeast significant 2 bits of address identify which 16-

bit word is required from 32 bit data blocke.g.

Address Tag Data Cache line

16339C 058CE7 FEDCBA98 0001

Associative Mapping Example

Main Memory Address

Set Associative Mapping

Cache is divided into a number of setssetsEach set contains a number of lineslinesA given block maps to any line in a any line in a

given setgiven set e.g. Block B can be in any line of set i

e.g. 2 lines per set 2 way associative mapping2 way associative mapping A given block can be in one of 2 lines in only

one set

Set Associative Mapping

In this case , the cache is divided into vv sets, each of which consists of k k lines.

The relationships are m = v × km = v × k

i = j modulo vi = j modulo vwhere

i=cache set number j=main memory block number

m=number of lines in the cache

This is referred to as k-way set associative k-way set associative mappingmapping.

Two Way Set Associative Cache Organization

The dd set bits set bits specify one of v=2 v=2dd setssets. The s s bits of the tag and set fields specify one of the 2one of the 2ss blocks blocks of main memory.

With K-wayK-way set associative mapping, the tagthe tag in a memory address is much smaller and is only compared to the k tags compared to the k tags within a single setwithin a single set.

Set Associative MappingExample

13 bit set numberBlock number in main memory is modulo

213 000000, 00A000, 00B000, 00C000 … map

to same set

Set Associative MappingAddress Structure

Use set field to determine cache setcache set to look in Tag+Set field Tag+Set field specifies one of the blocks in the

main memory.Compare tag fieldtag field to see if we have a hite.g

Address Tag Data Set number

1FF 7FFC 1FF 24682468 1FFF

Tag 9 bit Set 13 bitWord2 bit

Two Way Set Associative Mapping Example

e.gAddress Tag Data Set number1FF 7FFC 1FF 24682468 1FFF

02C 0004 02C 11235813 0001

Main Memory AddressMain Memory Address

Replacement Algorithms (1)Direct mappingDirect mapping

When a new block is brought into the cache, one of the existing blocks must be replaced.

Direct mappingDirect mappingNo choiceEach block only maps to one lineReplace that line

Replacement Algorithms (2)Associative & Set AssociativeAssociative & Set Associative

Hardware implemented algorithmHardware implemented algorithm (speed)Least Recently used (LRU)

Replace that block in the set which has been in the cache longest with no reference to it. (hit ratio + time)

e.g. in 2 way set associativeWhich of the 2 block is LRU?

First in first out (FIFO) replace block in the set that has been in cache longest. (time)

Least frequently used replace block in the set which has had fewest hits. (hit ratio)

Random

Write Policy

Must not overwrite a cache block unless main memory is up to date

Problems to contend with More than one device may have access to main

memory.Data inconsistent between memory and cache

Multiple CPUs may have individual cachesData inconsistent among caches

Write PolicyWrite Policy Write through Write back Write once

Write throughWrite through

All writes go to main memory as well as cache

Any other processor-cache can monitor main memory traffic to keep local (to CPU) cache updated.

Disadvantages Lots of traffic Slows down writes

Write backWrite back

Updates initially made in cache only Update bitUpdate bit for cache slot is set when update

occurs If block in cache is to be replaced, write If block in cache is to be replaced, write

to main memory only if update bit is setto main memory only if update bit is setOther caches get out of syncI/O must access main memory through cache

Because portions of main memory are invalid

Approaches to cache coherency

Bus watching with write through Each cache controller monitors the address lines to detect

write operations to memory by other bus masters. This strategy depends on the use of a write-through

policy by all cache controller. Hardware transparency

Additional hardware is used to ensure that all the updates to main memory via cache are reflected in all caches.

Noncachable memory Only a portion of main memory is shared by more than

one processor. In such a system, all accesses to shared memory are

cache misses, because the shared memory is never copied to the cache.

The noncachable memory can be identified using chip-select logic or high-access bits.

Line Size

The principle of locality Data in the vicinity of a referenced word is likely

to be referenced in the near future.

The relationship between block size and hit The relationship between block size and hit ratioratio is complex, depending on the locality characteristics of a particular program, and no definitive optimum value has been found.

A size of from two to eight wordstwo to eight words seems reasonably close to optimum.

Number of caches

A single cacheMultiple caches

The number of levels of caches The use of unified versus split caches

Split cachesSplit caches: one dedicated to instructions and one dedicated to data

• Key advantage of split caches: eliminate contention for cache between the instruction processor and the execution unit.

Unified cacheUnified cache: a single cache used to store references to both data and instructions

For a given cache size, a unified cache has a higher hit rate than split caches because it balances the load between instruction and data fetches automatically.

Number of caches

The on-chip cache: cache and processor on the same chip When the requested instruction or data is found in the on-chip

cache, the bus access is eliminated. Because of the short data paths internal to the processor, on-chip cache accesses will complete appreciably faster than would even zero-wait state bus cycles.

Advantages Reduce the processor’s external bus activity Speed up execution times Increase overall system performance

A two-level cache The internal cache designated as level 1 (L1) The external cache designated as level 2 (L2)

4.4 Pentium Cache

Foreground readingFind out detail of Pentium II cache

systemsNOT just from Stallings!

4.5 Newer RAM Technology (1)

Basic DRAM same since first RAM chips ConstraintsConstraints of the traditional DRAM chip: its internal architecture and its interface to the

processor’s memory bus.Enhanced DRAM

Contains small SRAM as well SRAM holds last line read A comparator stores the 11-bit value of the most

recent row address selection.Cache DRAM (CDRAM)

Larger SRAM component Use as cache or serial buffer

Newer RAM Technology (2)

Synchronous DRAM (SDRAM) Access is synchronized with an external clock

unlike DRAM asynchronous. Address is presented to RAM Since SDRAM moves data in time with system

clock, CPU knows when data will be ready CPU does not have to wait, it can do something

else Burst mode allows SDRAM to set up stream of

data and fire it out in block

Internal logic of the SDRAM

• In burst mode, a series of data bits can be clocked out rapidly after the first bit has been accessed.

Burst modeBurst mode is useful when all the bits to be accessed are in sequence and in the same row of the array as the initial access

•A dual-bank internal A dual-bank internal architecturearchitecture that improves opportunities for on-chip parallelism.

• The mode registerThe mode register and associated control logiccontrol logic provide a mechanism to customize the SDRAM to suit specific system needs.

Newer RAM Technology (3)

Foreground readingCheck out any other RAM you can findSee Web site:

The RAM Guide

Exercises Exercises

P143 4.4, 4.6, 4.7, 4.8 P145 4.20

DeadlineDeadline

william stallings computer organization and architecture

Documents