Lecture 27 Multiprocessor Scheduling

Upload: randall-neal

Post on 18-Jan-2018


• Last lecture: VMM
  • Two old problems: CPU virtualization and memory virtualization
  • I/O virtualization
• Today
  • Issues related to multi-core: scheduling and scalability

The cache coherence problem
• Since we have multiple private caches: how to keep the data consistent across caches?
• Each core should perceive the memory as a monolithic array, shared by all the cores

The cache coherence problem
[Figure: multi-core chip with four cores, each with one or more levels of private cache, above a shared main memory. The caches of Core 1 and Core 2 each hold x=15213; main memory holds x=15213.]

The cache coherence problem
[Figure: four-core chip, assuming write-back caches. Core 1's cache holds x=21660; Core 2's cache still holds x=15213; main memory still holds x=15213.]

The cache coherence problem
[Figure: four-core chip, starting state again. The caches of Core 1 and Core 2 each hold x=15213; main memory holds x=15213.]

The cache coherence problem
[Figure: four-core chip, assuming write-through caches. Core 1's cache holds x=21660; Core 2's cache still holds x=15213; main memory holds x=21660.]
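
The write-back and write-through scenarios above can be reproduced with a toy model. This is only a sketch: the `Cache` class and the `run` helper are hypothetical names, main memory is a plain dict, and eviction is not modelled. It suggests that, without a coherence protocol, Core 2 keeps reading the stale value in both cases; the write policy only changes what main memory holds.

```python
class Cache:
    """A private per-core cache with no coherence protocol (toy model)."""

    def __init__(self, memory, write_through):
        self.memory = memory                # shared dict standing in for main memory
        self.write_through = write_through
        self.lines = {}                     # address -> locally cached value

    def read(self, addr):
        # Miss: fetch from main memory. Hit: return the local copy, even if stale.
        if addr not in self.lines:
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value
        if self.write_through:
            self.memory[addr] = value       # write-through: memory updated at once
        # write-back: memory would be updated on eviction (not modelled here)


def run(write_through):
    """Both cores cache x, then Core 1 writes. Returns (core1 x, core2 x, memory x)."""
    memory = {"x": 15213}
    core1 = Cache(memory, write_through)
    core2 = Cache(memory, write_through)
    core1.read("x")
    core2.read("x")
    core1.write("x", 21660)
    return core1.read("x"), core2.read("x"), memory["x"]


print(run(write_through=False))  # (21660, 15213, 15213)
print(run(write_through=True))   # (21660, 15213, 21660)
```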

Solutions for cache coherence
• There exist many solution algorithms, coherence protocols, etc.
• A simple solution: invalidation protocol with bus snooping

Inter-core bus
[Figure: multi-core chip with four cores, each with one or more levels of cache, and main memory; an inter-core bus connects the cores.]

Invalidation protocol with snooping
• Invalidation: if a core writes to a data item, all other copies of this data item in other caches are invalidated
• Snooping: all cores continuously “snoop” (monitor) the bus connecting the cores
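
The two rules above can be sketched in a few lines. This is a simplified illustration, not a real hardware protocol: `Bus` and `SnoopingCache` are hypothetical names, caches are assumed write-through, and a line is either present (valid) or absent (invalid), with no further states.

```python
class Bus:
    """Shared inter-core bus: every attached cache sees every message."""

    def __init__(self):
        self.caches = []

    def broadcast_invalidate(self, sender, addr):
        for cache in self.caches:
            if cache is not sender:
                cache.snoop_invalidate(addr)


class SnoopingCache:
    def __init__(self, bus, memory):
        self.bus = bus
        self.memory = memory
        self.lines = {}                  # address -> valid cached value
        bus.caches.append(self)          # start snooping the bus

    def read(self, addr):
        if addr not in self.lines:       # miss: never loaded, or invalidated
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        self.bus.broadcast_invalidate(self, addr)  # other copies become invalid
        self.lines[addr] = value
        self.memory[addr] = value        # write-through (assumption)

    def snoop_invalidate(self, addr):
        self.lines.pop(addr, None)       # drop our copy if we hold one


memory = {"x": 15213}
bus = Bus()
core1 = SnoopingCache(bus, memory)
core2 = SnoopingCache(bus, memory)
core1.read("x")
core2.read("x")
core1.write("x", 21660)
invalidated = "x" not in core2.lines     # Core 2's copy was dropped
print(invalidated, core2.read("x"))      # True 21660
```

After the write, Core 2's next read misses and fetches the new value from memory, so it can no longer observe the stale 15213.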

The cache coherence problem
[Figure: four-core chip, starting state. The caches of Core 1 and Core 2 each hold x=15213; main memory holds x=15213.]

The cache coherence problem
[Figure: four-core chip, assuming write-through caches. Core 1's cache holds x=21660 and Core 1 sends an invalidation request; Core 2's copy (x=15213) is marked INVALIDATED; main memory holds x=21660.]

The cache coherence problem
[Figure: four-core chip, assuming write-through caches. The caches of Core 1 and Core 2 both hold x=21660; main memory holds x=21660.]

Alternative to invalidate protocol: update protocol
[Figure: four-core chip, assuming write-through caches. Core 1's cache holds x=21660 and Core 1 broadcasts the updated value; Core 2's cache still holds x=15213; main memory holds x=21660.]

Alternative to invalidate protocol: update protocol
[Figure: four-core chip, assuming write-through caches. Core 1 broadcasts the updated value; the caches of Core 1 and Core 2 both hold x=21660; main memory holds x=21660.]

Invalidation vs update
• Multiple writes to the same location
  • invalidation: a request is sent only the first time
  • update: must broadcast each write (which includes the new variable value)
• Invalidation generally performs better: it generates less bus traffic
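
The traffic difference can be made concrete with a small counting model. This is a sketch under strong assumptions: one core writes repeatedly to a single location, the other cores hold a copy at the start and never re-read it between writes, and every invalidation or update costs exactly one bus message. The function name `bus_messages` is hypothetical.

```python
def bus_messages(protocol, writes, sharers):
    """Count bus messages when one core writes `writes` times to a location
    that `sharers` other cores have cached and do not re-read (toy model)."""
    messages = 0
    other_copies_valid = sharers > 0
    for _ in range(writes):
        if protocol == "invalidate":
            if other_copies_valid:
                messages += 1              # one invalidation request
                other_copies_valid = False # later writes find nothing to invalidate
        elif protocol == "update":
            if sharers > 0:
                messages += 1              # every write broadcasts the new value
        else:
            raise ValueError(f"unknown protocol: {protocol}")
    return messages


print(bus_messages("invalidate", writes=100, sharers=1))  # 1
print(bus_messages("update", writes=100, sharers=1))      # 100
```

If the other cores did re-read the location between writes, invalidation would pay an extra miss each time, so the gap depends on the access pattern.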

Programmers still need to worry about concurrency
• Mutex
• Condition variables
• Lock-free data structures
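
Cache coherence keeps the copies of a location consistent, but it does not make a read-modify-write sequence atomic. As a minimal sketch of the first item in the list, the example below protects a shared counter with a mutex using Python's standard `threading.Lock`; the names `counter` and `worker` are illustrative only.

```python
import threading

counter = 0
lock = threading.Lock()


def worker(n):
    """Increment the shared counter n times, one critical section per increment."""
    global counter
    for _ in range(n):
        with lock:          # only one thread at a time may update the counter
            counter += 1


threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000
```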

Single-Queue Multiprocessor Scheduling
• reuse the basic framework for single processor scheduling
• put all jobs that need to be scheduled into a single queue
• pick the best two jobs to run, if there are two CPUs
• Advantage: simple
• Disadvantage: does not scale
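
A single shared queue can be sketched as follows. The assumptions are mine: jobs never finish, the policy is plain round-robin with one time slice per job, and the function name `sqms_schedule` is hypothetical. On real hardware the shared queue would also need a lock, which is one reason this design does not scale.

```python
from collections import deque


def sqms_schedule(jobs, num_cpus, time_slices):
    """Round-robin from one shared queue.
    Returns, for each time slice, the list of jobs running on CPU 0, 1, ..."""
    queue = deque(jobs)
    timeline = []
    for _ in range(time_slices):
        # Each CPU takes the next job from the front of the single queue.
        running = [queue.popleft() for _ in range(min(num_cpus, len(queue)))]
        timeline.append(running)
        queue.extend(running)   # jobs that just ran go to the back of the queue
    return timeline


timeline = sqms_schedule(["A", "B", "C", "D", "E"], num_cpus=4, time_slices=5)
for t, running in enumerate(timeline):
    print(t, running)
# CPU 0 runs A, E, D, C, B in successive slices: every job keeps changing CPUs.
```

With five jobs and four CPUs, no job stays on the same CPU from one slice to the next, which is the cache-affinity problem of the single-queue design.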

SQMS and Cache Affinity

Cache Affinity
• Thread migration is costly
  • Need to restart the execution pipeline
  • Cached data is invalidated
• OS scheduler tries to avoid migration as much as possible: it tends to keep a thread on the same core

SQMS and Cache Affinity.

Multi-Queue Multiprocessor Scheduling
• Scalable
• Cache affinity
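
For comparison with the single-queue design, here is a sketch with one queue per CPU. The placement policy (jobs dealt to queues round-robin on arrival) and the name `mqms_schedule` are assumptions for illustration; jobs never finish and never migrate in this model.

```python
from collections import deque


def mqms_schedule(jobs, num_cpus, time_slices):
    """One queue per CPU; each CPU round-robins over its own queue only.
    Returns, for each time slice, the job on each CPU (None if the CPU is idle)."""
    queues = [deque() for _ in range(num_cpus)]
    for i, job in enumerate(jobs):
        queues[i % num_cpus].append(job)    # assumed placement policy
    timeline = []
    for _ in range(time_slices):
        running = []
        for q in queues:                    # no queue is shared between CPUs
            if q:
                job = q.popleft()
                running.append(job)
                q.append(job)
            else:
                running.append(None)
        timeline.append(running)
    return timeline


timeline = mqms_schedule(["A", "B", "C", "D"], num_cpus=2, time_slices=4)
for t, running in enumerate(timeline):
    print(t, running)
# CPU 0 only ever runs A and C; CPU 1 only ever runs B and D.
```

Each CPU touches only its own queue, so no lock is shared across CPUs, and each job stays where its cached data is.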

Load Imbalance

• Migration

Work Stealing
• A (source) queue that is low on jobs will occasionally peek at another (target) queue
• If the target queue is (notably) more full than the source queue, the source will “steal” one or more jobs from the target to help balance load
• Cannot look around at other queues too often: frequent checks add overhead and hurt scalability
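
One possible reading of the stealing rule is sketched below. The threshold for "notably more full", the choice to steal from the tail of the target queue, and the name `maybe_steal` are all assumptions; how often a CPU calls such a function is a separate tuning decision.

```python
from collections import deque


def maybe_steal(source, target, threshold=2):
    """If `target` holds at least `threshold` more jobs than `source`,
    move jobs from target to source until the gap is at most 1.
    Returns the number of jobs stolen."""
    stolen = 0
    if len(target) - len(source) < threshold:
        return stolen                   # not notably more full: leave it alone
    while len(target) - len(source) > 1:
        source.append(target.pop())     # steal from the tail of the target queue
        stolen += 1
    return stolen


q0 = deque(["A"])
q1 = deque(["B", "C", "D", "E"])
stolen = maybe_steal(q0, q1)
print(stolen, list(q0), list(q1))  # 1 ['A', 'E'] ['B', 'C', 'D']
```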

Linux Multiprocessor Schedulers
• Both approaches can be successful
• O(1) scheduler
• Completely Fair Scheduler (CFS)
• BF Scheduler (BFS), uses a single queue

An Analysis of Linux Scalability to Many Cores
• This paper asks whether traditional kernel designs can be used and implemented in a way that allows applications to scale

Amdahl's Law
• N: the number of threads of execution
• B: the fraction of the algorithm that is strictly serial
• the theoretical speedup: S(N) = 1 / (B + (1 - B) / N)
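
To get a feel for the numbers, the sketch below evaluates Amdahl's law in its usual form, with N threads and serial fraction B. The function name `amdahl_speedup` and the sample value B = 0.1 are chosen here for illustration only.

```python
def amdahl_speedup(n, b):
    """Theoretical speedup with n threads when a fraction b of the work is serial:
    S(n) = 1 / (b + (1 - b) / n)."""
    if n < 1 or not 0.0 <= b <= 1.0:
        raise ValueError("need n >= 1 and 0 <= b <= 1")
    return 1.0 / (b + (1.0 - b) / n)


for n in (1, 2, 8, 48):
    print(n, round(amdahl_speedup(n, 0.1), 2))
# 1 1.0
# 2 1.82
# 8 4.71
# 48 8.42
```

With 10% serial work the speedup can never exceed 1 / B = 10, no matter how many cores are added.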

Scalability Issues
• Global lock used for a shared data structure
  • longer lock wait time
• Shared memory location
  • overhead caused by the cache coherency algorithms
• Tasks compete for limited-size shared hardware cache
  • increased cache miss rates
• Tasks compete for shared hardware resources (interconnects, DRAM interfaces)
  • more time wasted waiting
• Too few available tasks
  • less efficiency

How to avoid/fix
• These issues can often be avoided (or limited) using popular parallel programming techniques
  • Lock-free algorithms
  • Per-core data structures
  • Fine-grained locking
  • Cache-alignment
  • Sloppy Counters
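
As one illustration of per-core data combined with a sloppy counter, the sketch below keeps a local count per core and folds it into a global count only when it reaches a threshold. This is one common formulation of an approximate counter and may differ in detail from the variant in the paper discussed in this lecture; the class name, method names and the threshold of 64 are assumptions.

```python
import threading


class SloppyCounter:
    """Per-core local counts, flushed to a global count every `threshold` increments."""

    def __init__(self, num_cores, threshold):
        self.threshold = threshold
        self.global_count = 0
        self.global_lock = threading.Lock()
        self.local_counts = [0] * num_cores
        self.local_locks = [threading.Lock() for _ in range(num_cores)]

    def increment(self, core):
        with self.local_locks[core]:              # usually uncontended: per-core lock
            self.local_counts[core] += 1
            if self.local_counts[core] >= self.threshold:
                with self.global_lock:            # rare: once per `threshold` increments
                    self.global_count += self.local_counts[core]
                self.local_counts[core] = 0

    def approximate_value(self):
        with self.global_lock:
            return self.global_count              # may lag behind the true total

    def exact_value(self):
        # Slow path: visits every per-core count. Not an atomic snapshot
        # while increments are still in flight.
        total = self.approximate_value()
        for core, lock in enumerate(self.local_locks):
            with lock:
                total += self.local_counts[core]
        return total


counter = SloppyCounter(num_cores=4, threshold=64)


def worker(core):
    for _ in range(1000):
        counter.increment(core)


threads = [threading.Thread(target=worker, args=(core,)) for core in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter.approximate_value())  # 3840
print(counter.exact_value())        # 4000
```

The global lock is taken 15 times per core here instead of 1000 times, at the price of a global value that can be off by up to `threshold - 1` per core.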

Current bottlenecks

• https://www.usenix.org/conference/osdi10/analysis-linux-scalability-many-cores