hermann härtig, marcus völp hardware - tu...

48
08/12/11 Real-Time Systems Hardware Hermann Härtig, Marcus Völp

Upload: vuongbao

Post on 09-Apr-2018

219 views

Category:

Documents


4 download

TRANSCRIPT

Page 1: Hermann Härtig, Marcus Völp Hardware - TU Dresdenos.inf.tu-dresden.de/Studium/RTS/WS2011/06-Hardware.pdf · WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp

08/12/11

Real-Time Systems

Hardware

Hermann Härtig, Marcus Völp

Page 2: Hermann Härtig, Marcus Völp Hardware - TU Dresdenos.inf.tu-dresden.de/Studium/RTS/WS2011/06-Hardware.pdf · WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp

WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp 2

Outline

● Hardware: Source of Unpredictability● Caches● Pipeline● Interrupt Latency● System Management Mode

● Special Purpose Hardware for “Embedded” Real-Time Systems

● Use unpredictable Hardware more predictably

● Real-Time Communication / Buses in separate Lecture

Page 3: Hermann Härtig, Marcus Völp Hardware - TU Dresdenos.inf.tu-dresden.de/Studium/RTS/WS2011/06-Hardware.pdf · WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp

WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp 3

Sources of Unpredictability

● Memory Subsystem● TLB● Caches● Store Buffers

0100301003

store R1, 0x01003025

025025

0100301003 302302TLB:

302302 025025

PT WalkerPT Walkermiss

hit

Virtual address:

Physical address:

30203020

3020 2 5

L1 Cache:

Level 2 - CacheLevel 2 - Cache

Main MemoryMain Memory

0x10030250x1003025 ^R1^R1

Memory ControllerMemory Controller

Page 4: Hermann Härtig, Marcus Völp Hardware - TU Dresdenos.inf.tu-dresden.de/Studium/RTS/WS2011/06-Hardware.pdf · WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp

WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp 4

Sources of Unpredictability

● Processor Pipeline (ARM 11)

DEDEIFIFIFIF ISSISS ALUALU

MACMAC MACMAC MACMAC

ADDADD DCDC DCDC

WBWB

WBWB

WBWB

Page 5: Hermann Härtig, Marcus Völp Hardware - TU Dresdenos.inf.tu-dresden.de/Studium/RTS/WS2011/06-Hardware.pdf · WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp

WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp 5

Sources of Unpredictability

● Interrupt Latency

DeviceDevice

Interrupt ControllerInterrupt Controller

CPUCPU

incoming signal

interrupts enabled?

await instruction boundary

kernel entry

select handler code (index into interrupt vector table)

handle_irq: ... return from interrupt

Page 6: Hermann Härtig, Marcus Völp Hardware - TU Dresdenos.inf.tu-dresden.de/Studium/RTS/WS2011/06-Hardware.pdf · WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp

WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp 6

Interrupt Response Time – RTLinux

Interrupt response time:Time from occurrence of interrupt to first instruction of RT Task

No parallel load (idle): 13 µs

High parallel load: 68 µs (Benchmark, Cache-Flodder)

Measured on Intel P4 1.6 GHz

0.0001

0.001

0.01

0.1

1

0 5 10 15 20 25 30 35 40 45 50

Frequ

ency

IRQ occurence rate (µs)

log.fl.josephina.rtlinux: a

0.0001

0.001

0.01

0.1

1

0 5 10 15 20 25 30 35 40 45 50

Frequ

ency

IRQ occurence rate (µs)

log.t3.josephina.rtlinux: lat_proc_sh_dynamic

Page 7: Hermann Härtig, Marcus Völp Hardware - TU Dresdenos.inf.tu-dresden.de/Studium/RTS/WS2011/06-Hardware.pdf · WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp

WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp 7

Interrupt Response Time – L4RTL (+AS switch)

Interrupt response time:Time from occurrence of interrupt to first instruction of RT Task

No parallel load (idle): 43 µs

High parallel load: 85 µs (Benchmark, Cache-Flooder)

Measured on Intel P4 1.6 GHz

0.0001

0.001

0.01

0.1

1

0 5 10 15 20 25 30 35 40 45 50

Frequ

ency

IRQ occurence rate (µs)

log.fl.josephina.fiasco: a

0.0001

0.001

0.01

0.1

1

0 5 10 15 20 25 30 35 40 45 50

Frequ

ency

IRQ occurence rate (µs)

log.t3.josephina.fiasco: lat_proc_sh_dynamic

Page 8: Hermann Härtig, Marcus Völp Hardware - TU Dresdenos.inf.tu-dresden.de/Studium/RTS/WS2011/06-Hardware.pdf · WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp

WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp 8

System Management Mode (SMM)

● PC platforms

● “sits” underneath operating system

● Invoked using non-maskable interrupt

● Used for platform specifcs, correction of design errors,

thermal management

● Can be switched off, but better do not try

● Adds unpredictable delays

Page 9: Hermann Härtig, Marcus Völp Hardware - TU Dresdenos.inf.tu-dresden.de/Studium/RTS/WS2011/06-Hardware.pdf · WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp

WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp 9

Outline

● Hardware is Source of Unpredictability● Special Purpose Hardware for “Embedded” Real-Time

Systems● Low Latency Interrupt Mode● Peripheral Event Controller● Capture – Compare Units● Scratchpad Memories● Real-Time Clocks

● How to use unpredictable Hardware more predictably

Page 10: Hermann Härtig, Marcus Völp Hardware - TU Dresdenos.inf.tu-dresden.de/Studium/RTS/WS2011/06-Hardware.pdf · WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp

WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp 10

Low Latency Interrupts ...

… are considered of primary importance ● Sampling of Data in a High Rate (Sensors, Video, ...)● Fast Response Times (Break Control, …)● Part of context switch time .....

Page 11: Hermann Härtig, Marcus Völp Hardware - TU Dresdenos.inf.tu-dresden.de/Studium/RTS/WS2011/06-Hardware.pdf · WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp

WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp 11

Low Latency Interrupts ...

… are considered of primary importance ● Sampling of Data in a High Rate (Sensors, Video, ...)● Fast Response Times (Break Control, …)● Part of context switch time .....

Page 12: Hermann Härtig, Marcus Völp Hardware - TU Dresdenos.inf.tu-dresden.de/Studium/RTS/WS2011/06-Hardware.pdf · WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp

WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp 12

Low-Latency Interrupts● Two Interrupt modi: ARM IRQ + FIQ

● FIQ interrupts IRQ handler● 5 registers immediately available for FIQ handler

no need to save register content

● ARM Low Interrupt Latency Confguration● Minimize worst case interrupt latency, recommendations:

– disable Hit-under-Miss in Cache– abandon pending restartable memory OPs

restart memory OP on return from interrupt– do not use multi-word load / store instructions– avoid accesses to slow memory

(device memory, strong ordering of memory accesses)

● Worst Case FIQ Latency (PL190 VIC) : – Interrupt synchronization 3 cycles– Worst case execution time of current instruction 7 cycles– Entry to frst instruction 2 cycles

SUM: 12 cycles

Page 13: Hermann Härtig, Marcus Völp Hardware - TU Dresdenos.inf.tu-dresden.de/Studium/RTS/WS2011/06-Hardware.pdf · WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp

WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp 13

Low Latency Interrupts (2)

Peripheral Event Controller (PEC)● Memory ↔ io-register move ● triggered by signal

● Programmers Interface:– counter decrement on signal,

trigger CPU interrupt on “counter==0”– byte / word select select width of transfer (byte or word)– source / dest increase update source / destination address– source / dest address

● Semantics expressed as Signal-Handler:

signal () { if (counter--) *dest++ = *src; => minimal response time (~3 cycles) else => PEC can execute at external clock rate trigger_interrupt();

}

Example: SAB 80C166 (Siemens, 1990)

Page 14: Hermann Härtig, Marcus Völp Hardware - TU Dresdenos.inf.tu-dresden.de/Studium/RTS/WS2011/06-Hardware.pdf · WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp

WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp 14

Timestamps / Event triggers

● Precise timestamps / trigger times with interrupt-based sampling is difficult

● system load influences jitter in interrupt response time● atomic sections in OS delay interrupts● jitter in signal delivery (from signal source to CPU interrupt pin)

– buses, interrupt controller

Page 15: Hermann Härtig, Marcus Völp Hardware - TU Dresdenos.inf.tu-dresden.de/Studium/RTS/WS2011/06-Hardware.pdf · WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp

WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp 15

Capture + Compare Unit

● Problem● precise timestamp of event

(engine sensor triggered at cylinder position = t)● trigger events at precise points in time

(fre the engine at t + x)

● interrupt handlers' jitter too high to meet the points in time

Capture Compareevent interrupt

ClockClock

Capture RegisterCapture Register Compare RegisterCompare Register

ClockClock

handler

== event

Page 16: Hermann Härtig, Marcus Völp Hardware - TU Dresdenos.inf.tu-dresden.de/Studium/RTS/WS2011/06-Hardware.pdf · WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp

WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp 16

Combination PEC, Capture-Compare

● Compare + PEC: highly efficient triggering of events

● Capture + PEC: highly efficient sampling

● no SW intervention

Page 17: Hermann Härtig, Marcus Völp Hardware - TU Dresdenos.inf.tu-dresden.de/Studium/RTS/WS2011/06-Hardware.pdf · WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp

WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp 17

System on a Chip / Chip Multiprocessor

● Processor Chip contains entire system● CPU Cores / Peripheral Components / Memory available

as masks for weaver● Confgure system as required

● Examples:● Mobile Phones today:

General Purpose Core (e.g., ARM) + Baseband DSP Processor

● Game Console (Cell): General Purpose (PPC) + special purpose Graphics

● Sensor + 8-bit Controller + CAN bus

Page 18: Hermann Härtig, Marcus Völp Hardware - TU Dresdenos.inf.tu-dresden.de/Studium/RTS/WS2011/06-Hardware.pdf · WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp

WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp 18

Scratch-pad Memory vs. Caches

● Synonyms: [Scratchpad / On-Chip / Tightly-Coupled] Memory

● Cache: ● transparent addressing scheme● cache controller flls / replaces cachelines in parallel to

CPU activity● difficult to predict worst-case execution time

L1

L2

Main Memory

Page 19: Hermann Härtig, Marcus Völp Hardware - TU Dresdenos.inf.tu-dresden.de/Studium/RTS/WS2011/06-Hardware.pdf · WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp

WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp 19

Scratch-pad Memory vs. Caches (2)

● Synonyms: [Scratchpad / On-Chip / Tightly-Coupled] Memory

● Cache: ● transparent addressing scheme● cache controller flls / replaces cachelines in parallel to

CPU activity● difficult to predict worst-case execution time

L1

L2

Main Memory

Page 20: Hermann Härtig, Marcus Völp Hardware - TU Dresdenos.inf.tu-dresden.de/Studium/RTS/WS2011/06-Hardware.pdf · WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp

WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp 20

Scratch-pad Memory vs. Caches (3)

● Synonyms: [Scratchpad / On-Chip / Tightly-Coupled] Memory

● Cache: ● transparent addressing scheme● cache controller flls / replaces cachelines in parallel to

CPU activity● difficult to predict worst-case execution time

L1

L2

Main Memory

Page 21: Hermann Härtig, Marcus Völp Hardware - TU Dresdenos.inf.tu-dresden.de/Studium/RTS/WS2011/06-Hardware.pdf · WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp

WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp 21

Scratch-pad Memory vs. Caches (4)

● Synonyms: [Scratchpad / On-Chip / Tightly-Coupled] Memory

● Cache: ● transparent addressing scheme● cache controller flls / replaces cachelines in parallel to

CPU activity● difficult to predict worst-case execution time

L1

L2

Main Memory

CPUCPU

1 cycle ~10 cycles

~50 cycles

Page 22: Hermann Härtig, Marcus Völp Hardware - TU Dresdenos.inf.tu-dresden.de/Studium/RTS/WS2011/06-Hardware.pdf · WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp

WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp 22

Scratch-pad Memory vs. Caches (5)

● Synonyms: [Scratchpad / On-Chip / Tightly-Coupled] Memory

● Cache: ● transparent addressing scheme● cache controller flls / replaces cachelines in parallel to

CPU activity● difficult to predict worst-case execution time

L1

L2

Main Memory

CPUCPU

1 cycle ~10 cycles

~50 cycles

Clean: ~ 60 cyclesDirty: ~ 170 cycles

(Old Pentium 3)

Page 23: Hermann Härtig, Marcus Völp Hardware - TU Dresdenos.inf.tu-dresden.de/Studium/RTS/WS2011/06-Hardware.pdf · WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp

WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp 23

Scratch-pad Memory vs. Caches (6)

Set

= =

Tag Data

Way 1 Way 2

Cache lock:

● Lock complete cache way● Allocate to unlocked cache ways only

Cache line

Terminology:* set* tag* way* cache line

Address example needed ........ for explanation

Page 24: Hermann Härtig, Marcus Völp Hardware - TU Dresdenos.inf.tu-dresden.de/Studium/RTS/WS2011/06-Hardware.pdf · WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp

WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp 24

Scratch-pad Memory vs. Caches (7)

● Scratchpad Memory:● memory is addressed directly

(device mapped to physical memory space)● 2 cycles latency / currently ~ 64 KB (more ~4 MB to

come)● must explicitly manage data allocation● Compiler-based approaches:

– static: determine code + data hot spots + allocate scratchpad memory for hot spots

– dynamic: copy data to / from scratchpad memory

Page 25: Hermann Härtig, Marcus Völp Hardware - TU Dresdenos.inf.tu-dresden.de/Studium/RTS/WS2011/06-Hardware.pdf · WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp

WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp 25

Scratch-pad Memory vs. Caches (8)

● ARM Tightly Coupled Memory:

RAMRAM TCMTCM

TCMTCM

RAMRAM

● overlay normal RAM with TCM

● simplifed cache logic for TCM memory

● keeps track whether RAM or TCM holds current value

● on miss: copies data from underlying RAM to TCM

● Which are the differences ??

Page 26: Hermann Härtig, Marcus Völp Hardware - TU Dresdenos.inf.tu-dresden.de/Studium/RTS/WS2011/06-Hardware.pdf · WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp

WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp 26

Real Time Clocks

● Multiple Clocks in SoC● Core Cycle Count 200 MHz fne grained

+ fast to access

– accuracy ; clock skew ; subject to power management (dvfs)● Audio Codec● PCI 33 MHz/ 66 MHz● Timer● RTC 32.768 kHz ; high precision ;

small clock skew

Page 27: Hermann Härtig, Marcus Völp Hardware - TU Dresdenos.inf.tu-dresden.de/Studium/RTS/WS2011/06-Hardware.pdf · WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp

WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp 27

Outline

● Hardware is Source of Unpredictability● Special Purpose Hardware for “Embedded” Real-Time

Systems● Use unpredictable Hardware more predictable

● Cache Partitioning● Real-Time Communication / Buses in separate Lecture

Page 28: Hermann Härtig, Marcus Völp Hardware - TU Dresdenos.inf.tu-dresden.de/Studium/RTS/WS2011/06-Hardware.pdf · WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp

WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp 28

(OS-Controlled) Cache Partitioning

● Goal● partition cache to minimize interference between

RT/RT or RT/NRT applications

● General method:● address regions map to cache sets

● Control allocation to address regions

● Approaches● Compiler/linker

● OS controlled– transparent to application

– no need to rely on cooperating applications

isolates malicious, erroneous applications

Page 29: Hermann Härtig, Marcus Völp Hardware - TU Dresdenos.inf.tu-dresden.de/Studium/RTS/WS2011/06-Hardware.pdf · WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp

WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp 29

RT and Non-RT

Page 30: Hermann Härtig, Marcus Völp Hardware - TU Dresdenos.inf.tu-dresden.de/Studium/RTS/WS2011/06-Hardware.pdf · WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp

WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp 30

Cache Partitioning

Page 31: Hermann Härtig, Marcus Völp Hardware - TU Dresdenos.inf.tu-dresden.de/Studium/RTS/WS2011/06-Hardware.pdf · WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp

WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp 31

Cache Partitioning

v : vpn

r : rpn

virtual address

(vpn: virtual page number)

real address

MMUL1

Cache

L2 Cache

Memory

Page 32: Hermann Härtig, Marcus Völp Hardware - TU Dresdenos.inf.tu-dresden.de/Studium/RTS/WS2011/06-Hardware.pdf · WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp

WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp 32

rpn

L2 Cache

rpn

tag co

nten

t

… cachelines

{ index

=

{

{

Page 33: Hermann Härtig, Marcus Völp Hardware - TU Dresdenos.inf.tu-dresden.de/Studium/RTS/WS2011/06-Hardware.pdf · WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp

WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp 33

rpn

tag co

nten

t

{ index

={

{

tag co

nten

t

=

2 way associativity

Page 34: Hermann Härtig, Marcus Völp Hardware - TU Dresdenos.inf.tu-dresden.de/Studium/RTS/WS2011/06-Hardware.pdf · WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp

WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp 34

1L Cache

2L Cache

Main

Memory

ppn off

000000

010000

020000

030000

040000

050000

060000

070000

100000

110000

120000

130000

140000

150000

160000

170000

001000

011000

021000

031000

041000

051000

061000

071000

101000

111000

121000

131000

141000

151000

161000

171000

002000

012000

022000

032000

042000

052000

062000

072000

102000

112000

122000

132000

142000

152000

162000

172000

003000

013000

023000

033000

043000

053000

063000

073000

103000

113000

123000

133000

143000

153000

163000

173000

004000

014000

024000

034000

044000

054000

064000

074000

104000

114000

124000

134000

144000

154000

164000

174000

005000

015000

025000

035000

045000

055000

065000

075000

105000

115000

125000

135000

145000

155000

165000

175000

006000

016000

026000

036000

046000

056000

066000

076000

106000

116000

126000

136000

146000

156000

166000

176000

007000

017000

027000

037000

047000

057000

067000

077000

107000

117000

127000

137000

147000

157000

167000

177000

Page 35: Hermann Härtig, Marcus Völp Hardware - TU Dresdenos.inf.tu-dresden.de/Studium/RTS/WS2011/06-Hardware.pdf · WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp

WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp 35

1L Cache

2L Cache

Main

Memory

ppn off

000000

010000

020000

030000

040000

050000

060000

070000

100000

110000

120000

130000

140000

150000

160000

170000

001000

011000

021000

031000

041000

051000

061000

071000

101000

111000

121000

131000

141000

151000

161000

171000

002000

012000

022000

032000

042000

052000

062000

072000

102000

112000

122000

132000

142000

152000

162000

172000

003000

013000

023000

033000

043000

053000

063000

073000

103000

113000

123000

133000

143000

153000

163000

173000

004000

014000

024000

034000

044000

054000

064000

074000

104000

114000

124000

134000

144000

154000

164000

174000

005000

015000

025000

035000

045000

055000

065000

075000

105000

115000

125000

135000

145000

155000

165000

175000

006000

016000

026000

036000

046000

056000

066000

076000

106000

116000

126000

136000

146000

156000

166000

176000

007000

017000

027000

037000

047000

057000

067000

077000

107000

117000

127000

137000

147000

157000

167000

177000

Page 36: Hermann Härtig, Marcus Völp Hardware - TU Dresdenos.inf.tu-dresden.de/Studium/RTS/WS2011/06-Hardware.pdf · WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp

WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp 36

1L Cache

2L Cache

Main

Memory

ppn off

000000

010000

020000

030000

040000

050000

060000

070000

100000

110000

120000

130000

140000

150000

160000

170000

001000

011000

021000

031000

041000

051000

061000

071000

101000

111000

121000

131000

141000

151000

161000

171000

002000

012000

022000

032000

042000

052000

062000

072000

102000

112000

122000

132000

142000

152000

162000

172000

003000

013000

023000

033000

043000

053000

063000

073000

103000

113000

123000

133000

143000

153000

163000

173000

004000

014000

024000

034000

044000

054000

064000

074000

104000

114000

124000

134000

144000

154000

164000

174000

005000

015000

025000

035000

045000

055000

065000

075000

105000

115000

125000

135000

145000

155000

165000

175000

006000

016000

026000

036000

046000

056000

066000

076000

106000

116000

126000

136000

146000

156000

166000

176000

007000

017000

027000

037000

047000

057000

067000

077000

107000

117000

127000

137000

147000

157000

167000

177000

Page 37: Hermann Härtig, Marcus Völp Hardware - TU Dresdenos.inf.tu-dresden.de/Studium/RTS/WS2011/06-Hardware.pdf · WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp

WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp 37

v : vpn

r : rpn

virtual address

real address

MMU

L2 Cache

L1 Cache

Memoryr´: rpnhardware address

Page 38: Hermann Härtig, Marcus Völp Hardware - TU Dresdenos.inf.tu-dresden.de/Studium/RTS/WS2011/06-Hardware.pdf · WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp

WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp 38

OS Controlled Cache Partitioning (2)

Cache Coloring

16 bit address (binary representation): 1001 0011 0010 0100

0x93

offset into 32 byte cachelineidxtag

+0 +32 +64 +96 +128 +160 +192 +224

Cach

e S

ize

Farben besser erklären

Page 39: Hermann Härtig, Marcus Völp Hardware - TU Dresdenos.inf.tu-dresden.de/Studium/RTS/WS2011/06-Hardware.pdf · WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp

WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp 39

OS-Controlled Cache Partitioning

Address translation and caches

0 10000 10000000 0000 0011 0000 000000 0000 0011 0000 00 10 1000 01110 1000 011

1000 0110 10001000 0110 1000

0x30a0x30aphysical frame number

864864

8648640x400030x40003virtual address:

physical address:

virtual page number

MMUMMU

0000 0000 0011 0000 00100000 0000 0011 0000 0010offset

tag idx ofs

1010

1010

Page 40: Hermann Härtig, Marcus Völp Hardware - TU Dresdenos.inf.tu-dresden.de/Studium/RTS/WS2011/06-Hardware.pdf · WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp

WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp 40

OS Controlled Cache Partitioning (4)

Cache Coloring

Address translation and caches

0 10000 10000000 0000 0011 0000 000000 0000 0011 0000 00 10 1000 01110 1000 011

1000 0110 10001000 0110 1000

0x30a0x30aphysical frame number

864864

8648640x400030x40003virtual address:

physical address:

virtual page number

MMUMMU

0000 0000 0011 0000 00100000 0000 0011 0000 0010offset

tag idx ofs L1 Cache:64KB, 4 Way, 32byte CL2 bits:

● subject to address translation● evaluated to determine cache set => assign different colors to different RT + non RT Apps. => allocate in the OS memory frames of respective color

1010

1010

Page 41: Hermann Härtig, Marcus Völp Hardware - TU Dresdenos.inf.tu-dresden.de/Studium/RTS/WS2011/06-Hardware.pdf · WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp

WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp 41

OS-Controlled Cache Partitioning

Scenario● high priority real-time task (color = 00)● runs in frequent short intervals

● other tasks in the backgound (e.g., cache flodder)

Example: Filter

● Worst case:● without cache partitioning:

background tasks may evict all cachelines background tasks may load conflicting cachelines, which need

writeback

Flush FlushFlush

Page 42: Hermann Härtig, Marcus Völp Hardware - TU Dresdenos.inf.tu-dresden.de/Studium/RTS/WS2011/06-Hardware.pdf · WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp

WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp 42

OS-Controlled Cache Partitioning

1 0 0 %

1 5 0 %

2 0 0 %

2 5 0 %

3 0 0 %

3 5 0 %

4 0 0 %

4 8 1 6 3 2 6 4 1 2 8 2 5 6 5 1 2 1 0 2 4 2 0 4 8 4 0 9 6

u n p a r t i t i o n e d

s n g l w r p a r t

p a r t i n t i o n e d

Filter

PentiumPro, 133 MHz

(sngl wr = single writer – only RT app may write to pages of color x ; others may read)

Page 43: Hermann Härtig, Marcus Völp Hardware - TU Dresdenos.inf.tu-dresden.de/Studium/RTS/WS2011/06-Hardware.pdf · WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp

WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp 43

OS-Controlled Cache Partitioning

Example 2: low priority task; frequently interrupted e.g., matrix multiplication

64 x 64 Matrix Multiplication

643 x 5.51 cycles: 10.9 ms

Flush Flush Flush Flush

timeslice

Page 44: Hermann Härtig, Marcus Völp Hardware - TU Dresdenos.inf.tu-dresden.de/Studium/RTS/WS2011/06-Hardware.pdf · WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp

WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp 44

1 0 0 %

2 0 0 %

3 0 0 %

4 0 0 %

5 0 0 %

6 0 0 %

7 0 0 %

8 0 0 %

9 0 0 %

1 0 0 0 %

1 1 0 0 %

1 2 0 0 %

1 3 0 0 %

1 4 0 0 %

1 5 0 0 %

1 6 0 0 %

1 7 0 0 %

5 0 1 0 0 1 5 0 2 0 0 2 5 0 3 0 0 3 5 0 4 0 0 4 5 0

u n p a r t i t i o n e d

s n g l w r p a r t

p a r t i t i o n e d

1 0 0 %

1 2 0 %

1 4 0 %

1 6 0 %

1 8 0 %

2 0 0 %

2 2 0 %

2 4 0 %

2 6 0 %

3 0 0 4 0 0 5 0 0 6 4 0 7 6 8 8 9 6 1 0 2 4

OS-Controlled Cache Partitioning

64 x 64 Matrix Multiplication

Pentium, 133 MHz

[µs]

[µs]

timeslice

Page 45: Hermann Härtig, Marcus Völp Hardware - TU Dresdenos.inf.tu-dresden.de/Studium/RTS/WS2011/06-Hardware.pdf · WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp

WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp 45

100%

105%

110%

115%

120%

125%

130%

135%

300 400 500 640 768 896 1024

1 0 0 %

1 5 0 %

2 0 0 %

2 5 0 %

3 0 0 %

3 5 0 %

4 0 0 %

4 5 0 %

5 0 0 %

5 0 1 0 0 1 5 0 2 0 0 2 5 0 3 0 0 3 5 0 4 0 0 4 5 0

u n p a r t i t i o n e d

p a r t i t i o n e d

OS-Controlled Cache Partitioning

64 x 64 Matrix Multiplication

Pentium, 133 MHz

write through

[µs]timeslice [µs]

[µs]

Page 46: Hermann Härtig, Marcus Völp Hardware - TU Dresdenos.inf.tu-dresden.de/Studium/RTS/WS2011/06-Hardware.pdf · WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp

WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp 46

OS-Controlled Cache Partitioning

Experiment 3:Combination: Matrix Multiplication + Filter

0

5 0

1 0 0

1 5 0

2 0 0

2 5 0

3 0 0

2 0 k H z 1 0 k H z 5 k H z 3 . 3 k H z 2 k H z

u n p a r t i t i o n e d

p a r t i o t i o n e d

Matrix Multi- plication64 x 64

MM + Filter

1 0 0 %

1 2 0 %

1 4 0 %

1 6 0 %

1 8 0 %

2 0 0 %

2 2 0 %

2 4 0 %

2 6 0 %

2 8 0 %

2 0 k H z 1 0 k H z 5 k H z 3 . 3 k H z 2 k H z

[ms]

Page 47: Hermann Härtig, Marcus Völp Hardware - TU Dresdenos.inf.tu-dresden.de/Studium/RTS/WS2011/06-Hardware.pdf · WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp

WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp 47

OS-Controlled Cache Partitioning

Caveat: application transparency

some applications require knowledge / control of physical addressese.g., drivers need physical addresses for direct memory address transfers (DMA)

● modify driver● use recent hardware (e.g., Intel VT-d) with address translation

for DMA

Caveat:

caches (especially in SMP) have become much more

complicated

Page 48: Hermann Härtig, Marcus Völp Hardware - TU Dresdenos.inf.tu-dresden.de/Studium/RTS/WS2011/06-Hardware.pdf · WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp

WS 2011/12 Real-Time Systems, Hardware / Hermann Härtig / Marcus Völp 48

Conclusions

● High performance CPUs are rarely used in embedded systems

● HW means to increase predictability

● SW tweaks to use unpredictable HW in a more predictable way

● References:● Liedtke, Härtig, Hohmuth (RTAS '97):

Operating system control cache predictability for real-time applications

● ARM Real View Emulation Board User Guide

● ARM 1176jzfs Manual

● SAB 80C166 Handbook