Chapter 13: Direct Memory Access
• “…DMA…provides direct access to the memory while the microprocessor is temporarily disabled.”
• Typical uses of DMA
  – Video displays for refreshing the screen
  – Hard disk reads and writes
  – High-speed memory-to-memory transfers
• Timing behavior
  – Shown in Fig. 13-1
  – HOLD, HLDA
• Microprocessor suspends execution of its program and places its address, data, and control buses into high-impedance (Z) states
Basic DMA Definitions
• DMA normally occurs between an I/O device and memory without the use of the CPU
• DMA read
  – Transfers data from the memory to the I/O device
• DMA write
  – Transfers data from an I/O device to memory
• The DMA controller (DMAC) controls both the memory and the I/O device simultaneously
[Figure: block diagram. The I/O device sends a DMA request to the DMAC and receives a DMA grant. The DMAC sends a bus request (Hold) to the CPU and receives a bus grant (HoldA). CPU, DMAC, I/O, and memory share the ADDR and DATA buses and the MWTC, MRDC, IOWC, and IORC control signals.]
1. CPU sends information for data transfer to DMAC chip (initialization)
2. DMA request from I/O (via DREQs, e.g., 4 channels in 8237)
3. Bus request from DMAC (HRQ, hold request in 8237)
4. Bus grant from CPU (setting all the bus outputs of processor to Z)
5. DMA grant from DMAC (via DACKs)
6. Data transfer
   – If DMA read: MRDC & IOWC signals are controlled
   – If DMA write: MWTC & IORC signals are controlled
7. DMAC sends INTR to CPU to signal completion of the DMA
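The seven steps can be traced in a few lines of Python. This is only a sketch of the ordering described on the slide, not a timing-accurate model: the function name `dma_transfer`, the trace strings, and the use of a `bytearray` as memory are all illustrative assumptions.

```python
def dma_transfer(memory, start, count, kind, device_data=()):
    """Trace one DMA transfer in the order given by the seven steps.

    kind='read'  moves memory -> I/O device (MRDC and IOWC strobed together).
    kind='write' moves I/O device -> memory (IORC and MWTC strobed together).
    Returns (trace, bytes_delivered_to_device).
    """
    if kind not in ("read", "write"):
        raise ValueError("kind must be 'read' or 'write'")
    trace = [
        "1 CPU programs DMAC",
        "2 I/O asserts DREQ",
        "3 DMAC asserts HRQ (HOLD)",
        "4 CPU asserts HLDA and floats its buses",
        "5 DMAC asserts DACK",
    ]
    to_device = []
    for i in range(count):
        addr = start + i
        if kind == "read":
            to_device.append(memory[addr])      # memory read, I/O write
            trace.append(f"6 MRDC+IOWC at {addr:#06x}")
        else:
            memory[addr] = device_data[i]       # I/O read, memory write
            trace.append(f"6 IORC+MWTC at {addr:#06x}")
    trace.append("7 INTR to CPU")
    return trace, to_device


if __name__ == "__main__":
    ram = bytearray(range(16))
    steps, data = dma_transfer(ram, 4, 3, "read")
    print("\n".join(steps))
    print(data)
```

Note that the CPU appears only in steps 1, 4, and 7; the data bytes in step 6 never pass through it.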
DMA Operation Initiation
• CPU sends information about the required data transfer operation to the DMAC chip
  – Source device/address, destination device/address, data block size, type of data transfer (demand, single, block), etc.
  – Uses OUT assembly instructions to send the DMAC chip this information
• DMAC chip requests a DMA from the CPU by asserting the HOLD line (via its HRQ)
• CPU acknowledges the request by asserting HLDA
• Request priority in the microprocessor
  – Reset > Hold > Interrupt
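As a rough illustration of the initialization step, the sketch below records the sequence of port writes a driver might issue with OUT to set up one 8237 channel. The register offsets and mode-register bit fields are written from memory of the 8237A datasheet and should be treated as assumptions to verify; `out`, `port_writes`, and `program_channel` are hypothetical names, and `out` only logs the write rather than touching hardware.

```python
# Register offsets and bit fields as recalled from the 8237A datasheet
# (assumption: check them against the datasheet before relying on them).
MASK_REG, MODE_REG, CLEAR_FF_REG = 0x0A, 0x0B, 0x0C
MODE_BITS = {"demand": 0b00, "single": 0b01, "block": 0b10}
TRANSFER_BITS = {"write": 0b01, "read": 0b10}  # write: I/O -> memory

port_writes = []  # (port offset, byte) pairs, in issue order


def out(port, value):
    """Stand-in for the x86 OUT instruction: log the write only."""
    port_writes.append((port, value & 0xFF))


def program_channel(channel, address, length, mode, transfer):
    """Queue the writes that describe one transfer to the DMAC."""
    if not 0 <= channel <= 3:
        raise ValueError("the 8237 has channels 0 to 3")
    if not 1 <= length <= 0x10000:
        raise ValueError("length must fit the 16-bit count register")
    out(MASK_REG, 0x04 | channel)          # mask the channel while programming
    out(CLEAR_FF_REG, 0x00)                # reset the low/high byte flip-flop
    out(MODE_REG, (MODE_BITS[mode] << 6) | (TRANSFER_BITS[transfer] << 2) | channel)
    addr_port, count_port = 2 * channel, 2 * channel + 1
    out(addr_port, address & 0xFF)         # start address, low byte then high
    out(addr_port, (address >> 8) & 0xFF)
    count = length - 1                     # count register holds N - 1
    out(count_port, count & 0xFF)
    out(count_port, (count >> 8) & 0xFF)
    out(MASK_REG, channel)                 # unmask: channel may now see DREQ


if __name__ == "__main__":
    program_channel(1, 0x1234, 0x100, "single", "write")
    for port, value in port_writes:
        print(f"OUT {port:#04x}, {value:#04x}")
```

Masking the channel first keeps a stray DREQ from starting a transfer while the address and count are only half written.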
Three Types of DMA Mode
• Demand mode
  – Transfers data until DREQ becomes inactive
• Single mode
  – Releases HOLD after each byte of data is transferred
  – If DREQ is still active, the DMAC requests another DMA transfer from the microprocessor
• Block mode
  – Automatically transfers the number of bytes indicated by the count register for the channel
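The difference between the three modes shows up in how many bytes move during each HOLD/HLDA period. The sketch below uses a simplified model that is an assumption of this example, not a description of the 8237: DREQ is sampled once before each byte, and `bursts` is a hypothetical helper.

```python
def bursts(mode, count, dreq_samples):
    """Return the number of bytes moved in each HOLD/HLDA period.

    dreq_samples: DREQ level sampled before each byte (simplified model).
    count: value of the channel's byte count.
    """
    if mode not in ("demand", "single", "block"):
        raise ValueError("mode must be 'demand', 'single', or 'block'")
    samples = iter(dreq_samples)
    if mode == "block":
        # One request is enough: the whole count goes in a single burst.
        return [count] if next(samples, False) else []
    result, current, remaining = [], 0, count
    for active in samples:
        if remaining == 0:
            break
        if active:
            remaining -= 1
            if mode == "single":
                result.append(1)        # HOLD is released after every byte
            else:
                current += 1            # demand: keep the bus while DREQ is active
        elif current:
            result.append(current)      # DREQ dropped: the demand burst ends
            current = 0
    if current:
        result.append(current)
    return result


if __name__ == "__main__":
    pattern = [True, True, False, True, True, True]
    for mode in ("demand", "single", "block"):
        print(mode, bursts(mode, 4, pattern))
```

Single mode gives the CPU the bus back most often, while block mode holds it longest.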
Advanced Topics
• Lecture
  – Cache (5/31)
  – DRAM (already covered in Chapter 10)
  – Flash memory-based storage (6/14)
• Practice
  – Introduction to RTL design in Verilog (6/2, LG105)
  – Two practices (6/7 and 6/9, LG114)
    • Note: the two practices are run in the same manner as the normal practices, in 1st and 2nd sessions (3:20pm~4:00pm and 4:00pm~4:40pm)
Processor-DRAM Gap (latency)
[Figure: performance (log scale, 1 to 1000) versus time (1980 to 2000). CPU performance (“Moore’s Law”) improves 60%/year, DRAM improves 7%/year, and the processor-memory performance gap grows 50%/year.]
• A four-issue 2GHz superscalar accessing 100ns DRAM could execute 800 instructions during the time for one memory access!
[Source: K. Asanovic, 2008]
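Both numbers on this slide follow from short arithmetic, sketched below. The function names are illustrative; the only inputs are the figures quoted above (four-issue, 2GHz, 100ns, 60%/year, 7%/year).

```python
def instructions_per_memory_access(issue_width, clock_ghz, dram_latency_ns):
    """Instruction slots lost while waiting for one DRAM access.

    GHz times ns gives clock cycles, so integer inputs keep the result exact.
    """
    cycles = clock_ghz * dram_latency_ns
    return issue_width * cycles


def gap_growth_per_year(cpu_growth, dram_growth):
    """Yearly growth of the CPU/DRAM performance ratio."""
    return (1 + cpu_growth) / (1 + dram_growth) - 1


if __name__ == "__main__":
    print(instructions_per_memory_access(4, 2, 100))
    print(f"{gap_growth_per_year(0.60, 0.07):.1%}")
```

The ratio 1.60 / 1.07 is a little under 1.5, which is where the "grows 50% / year" label comes from.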
What is a cache?
• Small, fast storage used to improve average access time to slow memory.
• Exploits spatial and temporal locality
• In computer architecture, almost everything is a cache!
  – Registers: a cache on variables
  – First-level cache: a cache on second-level cache
  – Second-level cache: a cache on memory
  – Memory: a cache on disk (virtual memory)
  – TLB: a cache on page table
  – Branch prediction: a cache on prediction information?
[Figure: memory hierarchy from Proc/Regs through L1-Cache, L2-Cache, and Memory down to Disk, Tape, etc.; levels get bigger toward the bottom and faster toward the top.]
[Source: J. Kubiatowicz, 2000]
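"Average access time" is commonly quantified with the textbook average memory access time (AMAT) formula, hit time plus miss rate times miss penalty. The slide does not state this formula, so the sketch below is an added illustration, and the 1-cycle hit time and 5% miss rate are made-up example values.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time, in the same unit as hit_time."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return hit_time + miss_rate * miss_penalty


if __name__ == "__main__":
    # Assumed values: 1-cycle cache hit, 5% miss rate,
    # 200-cycle penalty (a 100ns DRAM access at 2GHz).
    print(amat(1, 0.05, 200))
```

With these assumed values the average is about 11 cycles, compared with 200 cycles if every access went to DRAM.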
Typical Memory Reference Patterns
[Figure: address versus time for three access streams. Instruction fetches: n loop iterations, subroutine call, subroutine return. Stack accesses: argument access. Data accesses: vector access, scalar accesses. Regions of the plot are annotated with temporal locality, spatial locality, or both.]
[Source: K. Asanovic, 2008]
Memory Reference Patterns
[Figure: memory address (one dot per access) versus time.]
Donald J. Hatfield, Jeanette Gerald: Program Restructuring for Virtual Memory. IBM Systems Journal 10(3): 168-192 (1971)
[Source: K. Asanovic, 2008]
A Typical Memory Hierarchy c.2008
[Figure: CPU with register file (RF), L1 instruction cache and L1 data cache, unified L2 cache, and four memory banks.]
• Multiported register file (part of CPU)
• Split instruction & data primary caches (on-chip SRAM)
• Large unified secondary cache (on-chip SRAM)
• Multiple interleaved memory banks (off-chip DRAM)
[Source: K. Asanovic, 2008]
Itanium-2 On-Chip Caches (Intel/HP, 2002)
• Level 1: 16KB, 4-way s.a., 64B line, quad-port (2 load + 2 store), single-cycle latency
• Level 2: 256KB, 4-way s.a., 128B line, quad-port (4 load or 4 store), five-cycle latency
• Level 3: 3MB, 12-way s.a., 128B line, single 32B port, twelve-cycle latency
• L3 and L2 caches occupy more than 2/3 of total area!
[Source: K. Asanovic, 2008]
Workstation Memory System (Apple PowerMac G5, 2003)
• Dual 2GHz processors, each has:
  – 64KB I-cache, direct mapped
  – 32KB D-cache, 2-way
  – 512KB L2 unified cache, 8-way
  – All 128B lines
[Figure: the processors connect to the North Bridge Chip over a 1GHz, 2x32-bit bus (16GB/s). The North Bridge Chip connects to the devices listed below.]
• Up to 8GB DDR SDRAM, 400MHz, 128-bit bus, 6.4GB/s
• AGP Graphics Card, 533MHz, 32-bit bus, 2.1GB/s
• PCI-X Expansion, 133MHz, 64-bit bus, 1GB/s
[Source: K. Asanovic, 2008]
Cache Policies
• Inclusion
• Placement
• Replacement
Inclusion Policy
• Inclusive multilevel cache:
  – Inner cache holds copies of data in outer cache
  – External access need only check outer cache
  – Most common case
• Exclusive multilevel caches:
  – Inner cache may hold data not in outer cache
  – Swap lines between inner/outer caches on miss
  – Used in AMD Athlon with 64KB primary and 256KB secondary cache
• Why choose one type or the other?
  – Cache size matters.
  – In general, if L2 size >> L1 size, the inclusive policy is chosen, since duplicating the L1 contents costs little L2 capacity
[Source: K. Asanovic, 2008]
Types of Cache Miss
• “Three Cs”
• 1st C: Compulsory Misses
  – Happen when warming up the cache
• 2nd C: Conflict Misses
  – E.g., two addresses are mapped to the same cache line
  – Solution: increase associativity
• 3rd C: Capacity Misses
  – E.g., sequential access of 40KB data via 32KB data cache
[Source: Garcia, 2008]
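The 40KB-through-32KB example can be reproduced with a small simulation. The sketch below assumes a fully associative LRU cache (so no conflict misses can occur), a 64B line size, and one access per line; none of these parameters come from the slide, and `scan_misses` is a hypothetical helper.

```python
from collections import OrderedDict


def scan_misses(cache_bytes, data_bytes, line_bytes=64, passes=2):
    """Sequentially scan data_bytes several times through an LRU cache.

    Returns (compulsory_misses, capacity_misses, hits), counted per line.
    """
    num_lines = cache_bytes // line_bytes
    cache = OrderedDict()   # block number -> None, least recently used first
    seen = set()            # blocks that have been referenced at least once
    compulsory = capacity = hits = 0
    for _ in range(passes):
        for addr in range(0, data_bytes, line_bytes):
            block = addr // line_bytes
            if block in cache:
                cache.move_to_end(block)
                hits += 1
                continue
            if block in seen:
                capacity += 1       # was cached once, but got evicted
            else:
                compulsory += 1     # first ever reference (warm-up)
                seen.add(block)
            if len(cache) >= num_lines:
                cache.popitem(last=False)   # evict the least recently used
            cache[block] = None
    return compulsory, capacity, hits


if __name__ == "__main__":
    print(scan_misses(32 * 1024, 40 * 1024))
    print(scan_misses(32 * 1024, 16 * 1024))
```

With LRU, the 40KB scan evicts each line shortly before it is needed again, so the second pass misses on every line even though only 8KB does not fit.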
Placement Policy
[Figure: memory with block numbers 0 to 31, and an 8-block cache (block numbers 0 to 7, set numbers 0 to 3).]
Block 12 can be placed:
• Fully associative: anywhere
• (2-way) set associative: anywhere in set 0 (12 mod 4)
• Direct mapped: only into block 4 (12 mod 8)
• Restricted placement (set associative, direct mapped) can cause a conflict miss!
[Source: K. Asanovic, 2008]
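The three placements of block 12 are the same modulo computation with a different number of ways. In the sketch below, `candidate_slots` is a hypothetical helper, and numbering the cache blocks of set s as s*ways through s*ways + ways - 1 is an assumption chosen to match the figure (set 0 holds cache blocks 0 and 1).

```python
def candidate_slots(block, num_cache_blocks, ways):
    """Return (set_number, cache blocks where `block` may be placed)."""
    if num_cache_blocks % ways:
        raise ValueError("ways must divide the number of cache blocks")
    num_sets = num_cache_blocks // ways
    set_number = block % num_sets
    first = set_number * ways
    return set_number, list(range(first, first + ways))


if __name__ == "__main__":
    for name, ways in (("direct mapped", 1), ("2-way", 2), ("fully associative", 8)):
        print(name, candidate_slots(12, 8, ways))
```

Direct mapped is the 1-way case and fully associative is the case where the whole cache is a single set.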
Direct-Mapped Cache
[Figure: the address is split into a t-bit tag, a k-bit index, and a b-bit block offset. The index selects one of 2^k lines, each holding a valid bit (V), a tag, and a data block. The stored tag is compared (=) with the address tag to produce HIT, and the block offset selects the data word or byte.]
[Source: K. Asanovic, 2008]
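The tag/index/offset split and the single tag comparison can be sketched as follows. The field widths k and b follow the figure; the class `DirectMappedCache`, its allocate-on-miss behaviour, and the example sizes are assumptions of this sketch, and data storage is omitted.

```python
def split_address(addr, k, b):
    """Split an address into (tag, index, block offset)."""
    offset = addr & ((1 << b) - 1)
    index = (addr >> b) & ((1 << k) - 1)
    tag = addr >> (b + k)
    return tag, index, offset


class DirectMappedCache:
    """Tag store only: 2**k lines, each with a valid bit and a tag."""

    def __init__(self, k, b):
        self.k, self.b = k, b
        self.valid = [False] * (1 << k)
        self.tags = [0] * (1 << k)

    def access(self, addr):
        """Return True on a hit; on a miss, fill the line and return False."""
        tag, index, _ = split_address(addr, self.k, self.b)
        hit = self.valid[index] and self.tags[index] == tag
        if not hit:
            self.valid[index] = True
            self.tags[index] = tag
        return hit


if __name__ == "__main__":
    cache = DirectMappedCache(k=3, b=6)     # 8 lines of 64 bytes
    for addr in (0, 0, 512, 0):
        print(addr, cache.access(addr))
```

Addresses 0 and 512 share index 0 in this configuration, so they keep evicting each other: a conflict miss.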
2-Way Set-Associative Cache
[Figure: the address is split into a t-bit tag, a k-bit index, and a b-bit block offset. The index selects one set; each of the two ways holds a valid bit (V), a tag, and a data block. Both stored tags are compared (=) with the address tag in parallel; a match in either way produces HIT and selects the data word or byte.]
[Source: K. Asanovic, 2008]
4-Way Set Associative Cache Circuit
[Figure: circuit diagram driven by the tag and index fields of the address.]
• Mux is time consuming!
[Source: Garcia, 2008]
Fully Associative Cache
[Figure: the address is split into a t-bit tag and a b-bit block offset, with no index. Every entry holds a valid bit (V), a tag, and a data block. All stored tags are compared (=) with the address tag in parallel to produce HIT, and the block offset selects the data word or byte.]
[Source: K. Asanovic, 2008]
Fully Associative Cache
• Benefit of Fully Assoc Cache
  – No Conflict Misses (since data can go anywhere)
• Drawbacks of Fully Assoc Cache
  – Need hardware comparator for every single entry
    • If we have 64KB of data in cache with 4B entries, we need 16K comparators and a 16K-input MUX
    • Infeasible for large caches
  – However, used for small (e.g., 128-entry) caches, e.g., TLB
[Source: Garcia, 2008]
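The 16K figure is the entry count, since a fully associative lookup compares the address tag against every entry at once. A minimal sketch of that arithmetic, with `comparators_needed` as a hypothetical helper:

```python
def comparators_needed(cache_bytes, entry_bytes, ways=None):
    """Tag comparators used by one lookup.

    ways=None models a fully associative cache (one comparator per entry);
    otherwise only the `ways` entries of the selected set are compared.
    """
    entries = cache_bytes // entry_bytes
    return entries if ways is None else ways


if __name__ == "__main__":
    print(comparators_needed(64 * 1024, 4))          # fully associative
    print(comparators_needed(64 * 1024, 4, ways=4))  # 4-way set associative
```

A set-associative cache of the same size needs only as many comparators as it has ways, which is why full associativity is reserved for small structures.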
Replacement Policy
In an associative cache, which block from a set should be evicted when the set becomes full?
• Random
  – Used in highly (fully) associative caches, e.g., TLB
• Least Recently Used (LRU)
  – LRU cache state must be updated on every access
  – True implementation only feasible for small sets (2-way)
  – Pseudo-LRU binary tree often used for 4-8 way
• First In, First Out (FIFO) a.k.a. Round-Robin
  – Used in highly associative caches
• Other options, e.g., recent frequently used, etc.
This is a second-order effect. Why?
  – Replacement only happens on misses
[Source: K. Asanovic, 2008]
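The bookkeeping difference between LRU and FIFO is visible in a software model of a single set. This is a sketch only: real hardware keeps a few state bits per set rather than an ordered container, and the function names and the trace are illustrative.

```python
from collections import OrderedDict, deque


def misses_lru(trace, ways):
    """Count misses for one set under true LRU replacement."""
    lines = OrderedDict()   # tag -> None, least recently used first
    misses = 0
    for tag in trace:
        if tag in lines:
            lines.move_to_end(tag)          # state changes on every access
        else:
            misses += 1
            if len(lines) == ways:
                lines.popitem(last=False)   # evict the least recently used
            lines[tag] = None
    return misses


def misses_fifo(trace, ways):
    """Count misses for one set under FIFO (round-robin) replacement."""
    lines = deque()         # oldest fill first
    misses = 0
    for tag in trace:
        if tag not in lines:                # hits do not change FIFO state
            misses += 1
            if len(lines) == ways:
                lines.popleft()             # evict the oldest fill
            lines.append(tag)
    return misses


if __name__ == "__main__":
    trace = ["A", "B", "A", "C", "A", "B"]
    print(misses_lru(trace, 2), misses_fifo(trace, 2))
```

On this trace LRU keeps the frequently reused tag A resident, while FIFO evicts it because it was filled first.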
3Cs Absolute Miss Rate (SPEC92)
[Figure: miss rate per type (0 to 0.14) versus cache size (1 KB to 128 KB). Conflict misses are shown for 1-way, 2-way, 4-way, and 8-way associativity, together with capacity and compulsory misses.]
• Compulsory misses are vanishingly small
[Source: J. Kubiatowicz, 2000]
2:1 Cache Rule
[Figure: miss rate per type (0 to 0.14) versus cache size (1 KB to 128 KB). Conflict misses are shown for 1-way, 2-way, 4-way, and 8-way associativity, together with capacity and compulsory misses.]
• Miss rate of a 1-way associative cache of size X = miss rate of a 2-way associative cache of size X/2
[Source: J. Kubiatowicz, 2000]
√2 Rule
• If the workload is large, the cache miss rate is observed to decrease as a power law of the cache size
• If the cache size is doubled, the miss rate drops by a factor of √2
[Source: A. Hartstein, 2006]
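Read as a power law with exponent 0.5, the rule lets a measured miss rate be scaled to another cache size. The sketch below assumes that the factor on the slide is the square root of 2; the function name and the 8% starting miss rate are made-up illustrations, not measured data.

```python
def scaled_miss_rate(base_miss_rate, base_size, new_size, exponent=0.5):
    """Estimate the miss rate at new_size from one known (size, miss rate) point.

    Assumes miss rate is proportional to size ** -exponent; exponent 0.5
    gives a drop by a factor of sqrt(2) per doubling of the cache size.
    """
    if base_size <= 0 or new_size <= 0:
        raise ValueError("cache sizes must be positive")
    return base_miss_rate * (new_size / base_size) ** -exponent


if __name__ == "__main__":
    for size_kb in (32, 64, 128):
        print(size_kb, round(scaled_miss_rate(0.08, 32, size_kb), 4))
```

Under this model it takes a fourfold increase in cache size to halve the miss rate.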