Tornado: Maximizing Locality and Concurrency in a Shared Memory Multiprocessor Operating System

By: Ben Gamsa, Orran Krieger, Jonathan Appavoo, Michael Stumm

Presented by: Holly Grimes

Locality

- In uniprocessors
  - Spatial locality: access neighboring memory locations
  - Temporal locality: access memory locations that were accessed recently
- In multiprocessors, locality involves each processor using data from its own cache

Why is Locality Important?

- Modern multiprocessors exhibit
  - Higher memory latency
  - Large write-sharing costs
  - Large cache lines (false sharing)
  - Larger system sizes
  - Large secondary caches
  - NUMA effects

Why is Locality Important?

- Sharing the counter requires moving it back and forth between the CPU caches
- Solution?
  - Split the counter into an array of integers
  - Each CPU gets its own counter
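
The split-counter idea can be sketched as follows. This is an illustration only: Python threads stand in for CPUs, and the names `SplitCounter` and `run_workers` are hypothetical, not taken from the talk. It shows the structure (one slot per worker, sum on read), not the cache behavior itself.

```python
import threading


class SplitCounter:
    """One slot per worker; each worker only ever writes its own slot."""

    def __init__(self, num_slots):
        self.slots = [0] * num_slots

    def increment(self, slot_id):
        # No lock needed: slot_id is owned by exactly one worker.
        self.slots[slot_id] += 1

    def total(self):
        # Reading the global value means visiting every slot.
        return sum(self.slots)


def run_workers(counter, num_workers, increments):
    """Start one thread per slot, let each bump its own slot, return the sum."""
    def work(slot_id):
        for _ in range(increments):
            counter.increment(slot_id)

    threads = [threading.Thread(target=work, args=(i,)) for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.total()
```

The trade-off is that updates become local while reading the total now touches every slot, which suits counters that are written far more often than they are read.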

Achieving Locality

- Using an array of counters seems to solve our problem…
- But what happens if both array elements map to the same cache line?
  - False sharing
- Solution: pad each array element
  - Different elements map to different cache lines
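
The effect of padding on layout can be checked with a small sketch. It assumes a 64-byte cache line and an array that starts on a line boundary; both are assumptions, and `PaddedCounter` and `lines_touched` are hypothetical names used only here.

```python
import ctypes

CACHE_LINE = 64  # assumed cache line size in bytes; real hardware varies


class PaddedCounter(ctypes.Structure):
    # Pad each counter out to one full (assumed) cache line.
    _fields_ = [
        ("value", ctypes.c_int64),
        ("pad", ctypes.c_char * (CACHE_LINE - ctypes.sizeof(ctypes.c_int64))),
    ]


def lines_touched(elem_size, count):
    """Cache line index of each element, assuming the array starts on a line boundary."""
    return [(i * elem_size) // CACHE_LINE for i in range(count)]


unpadded_lines = lines_touched(ctypes.sizeof(ctypes.c_int64), 4)  # [0, 0, 0, 0]
padded_lines = lines_touched(ctypes.sizeof(PaddedCounter), 4)     # [0, 1, 2, 3]
```

Four unpadded 8-byte counters all land on line 0, so every write by one CPU invalidates the others' copies. With padding, each counter sits on its own line, at the cost of 56 wasted bytes per element.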

Comparing Counter Implementations

NUMA Effects

Achieving Locality in an OS

- We’ve seen several ways to achieve locality in the implementation of a counter
- Now we extend these concepts to see how locality can be achieved in an OS
- Tornado’s approach
  - Build a new OS from the ground up
  - Make locality the primary design goal

Tornado’s Approach

[Figure: Locality and Independence shown for both the User Application and the OS Implementation. Illustration by Philip Howard]

Tornado’s Locality-Maximizing Features

- Object-oriented design
- Clustered objects
- A new locking strategy
- Protected procedure calls

Object-oriented Design

Object-oriented Structure

- Each OS resource is represented as a separate object
- All locks and data structures are internal to the objects
  - This localizes and encapsulates the locks and data
- This structure allows different resources to be managed
  - without accessing shared data structures
  - without acquiring shared locks
- Simplifies OS implementation
  - Modular design
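
A minimal sketch of this structure, assuming a hypothetical `Region` resource (the class and its methods are illustrative, not Tornado's interface): each object carries its own lock, so operations on different objects never contend.

```python
import threading


class Region:
    """Hypothetical OS resource that owns its data and the lock guarding it."""

    def __init__(self, name):
        self.name = name
        self._lock = threading.Lock()   # internal: never exposed to callers
        self._mapped_pages = set()

    def map_page(self, page):
        with self._lock:                # only this Region is serialized
            self._mapped_pages.add(page)

    def page_count(self):
        with self._lock:
            return len(self._mapped_pages)


heap = Region("heap")
stack = Region("stack")
heap.map_page(0x1000)
heap.map_page(0x2000)
stack.map_page(0x9000)   # takes stack's lock only; heap is untouched
```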

Example: Memory Management Objects

[Figure: memory management object diagram showing Process, HAT, Region, FCM, COR, and DRAM objects. Illustration by Philip Howard]

- HAT: Hardware Address Translation
- FCM: File Cache Manager
- COR: Cached Object Representative
- DRAM: Memory manager

Object-oriented Design

- For each resource, this design provides one object to be shared by all CPUs
  - Comparable to having one copy of the counter shared among all CPUs
- To maximize locality, something more needs to be done

Clustered Objects

Clustered Objects

- Clustered objects are composed of a set of representative objects (reps)
  - There can be one rep for the system, one rep per processor, or one rep for a cluster of processors
- Clients access a clustered object using a common reference to the object
  - Each call to an object using this reference is automatically directed to the appropriate rep
- Clients do not need to know anything about the location or organization of the reps to use a clustered object
- Impact on locality is similar to the effect of padded arrays on the counter
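
The three degrees of clustering can be sketched with a counter whose reps are chosen by a `cluster_size` parameter. The class `ClusteredCounter` and the rule `cpu // cluster_size` are assumptions made for illustration; the slides do not say how Tornado assigns processors to reps.

```python
class ClusteredCounter:
    """Hypothetical clustered object with one rep per cluster of processors."""

    def __init__(self, num_cpus, cluster_size):
        # cluster_size == 1        -> one rep per processor
        # cluster_size == num_cpus -> a single rep for the whole system
        self.cluster_size = cluster_size
        num_reps = (num_cpus + cluster_size - 1) // cluster_size
        self.reps = [0] * num_reps

    def _rep_for(self, cpu):
        return cpu // self.cluster_size

    def increment(self, cpu):
        # The caller names only the object and its own CPU; rep choice is hidden.
        self.reps[self._rep_for(cpu)] += 1

    def value(self):
        return sum(self.reps)


per_cpu = ClusteredCounter(num_cpus=8, cluster_size=1)
per_cluster = ClusteredCounter(num_cpus=8, cluster_size=4)
system_wide = ClusteredCounter(num_cpus=8, cluster_size=8)
for cpu in range(8):
    per_cluster.increment(cpu)
```

Callers use the same `increment` call in all three configurations, which is the point of the common reference: the degree of clustering can change without touching client code.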

Keeping Clustered Objects Consistent

- When a clustered object has multiple reps, there must be a way of keeping the reps consistent
- Coordination between reps can happen via
  - Shared memory
  - Protected procedure calls

Clustered Object Implementation

- Each processor has a translation table
  - The table is located at the same virtual address on every processor
  - For each clustered object, the table contains a pointer to the rep that serves the given processor
- A clustered object reference is just a pointer into this table
- Reps for a clustered object are created dynamically when they are first accessed
  - Dynamic rep creation is dealt with by the global miss handler
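
The lookup path described above might look roughly like this. It is a sketch under stated assumptions: dictionaries stand in for the per-processor tables, an integer stands in for the table pointer, and `TranslationTables`, `register`, `lookup`, and `Rep` are hypothetical names.

```python
class Rep:
    """Hypothetical rep holding per-processor state."""

    def __init__(self, cpu):
        self.cpu = cpu
        self.count = 0


class TranslationTables:
    """Hypothetical per-processor tables indexed by clustered-object reference."""

    def __init__(self, num_cpus):
        # One table per processor; a reference is a key valid in all of them.
        self.tables = [dict() for _ in range(num_cpus)]
        self.factories = {}   # reference -> function that builds a rep
        self.misses = 0

    def register(self, make_rep):
        ref = len(self.factories)
        self.factories[ref] = make_rep
        return ref

    def lookup(self, cpu, ref):
        table = self.tables[cpu]
        if ref not in table:
            table[ref] = self._miss_handler(cpu, ref)
        return table[ref]

    def _miss_handler(self, cpu, ref):
        # Stand-in for the global miss handler: build the rep on first access.
        self.misses += 1
        return self.factories[ref](cpu)


tables = TranslationTables(num_cpus=4)
ref = tables.register(Rep)
tables.lookup(0, ref).count += 1   # miss: rep for CPU 0 is created
tables.lookup(0, ref).count += 1   # hit: same rep is reused
tables.lookup(2, ref).count += 1   # miss: rep for CPU 2 is created
```

Creating reps lazily means processors that never touch an object pay nothing for it; only CPUs 0 and 2 end up with a rep here.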

A New Locking Strategy

Synchronization

- Two kinds of synchronization issues must be dealt with
  - Using locks to protect data structures
  - Ensuring the existence of needed data structures

Locks

- Tornado uses spin-then-block locks to minimize the overhead of the lock/unlock instructions
- Tornado limits lock contention by
  - Encapsulating the locks in objects to limit the scope of locks
  - Using clustered objects to provide multiple copies of a lock
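
The general spin-then-block pattern can be sketched as below. This is not Tornado's lock: it layers a bounded retry loop over Python's `threading.Lock`, and `spin_limit` and the `blocked` statistic are assumptions added for illustration.

```python
import threading


class SpinThenBlockLock:
    """Try the lock a bounded number of times, then fall back to blocking."""

    def __init__(self, spin_limit=100):   # spin_limit is an arbitrary assumption
        self._lock = threading.Lock()
        self._spin_limit = spin_limit
        self.blocked = 0                   # how often the slow path was taken

    def acquire(self):
        for _ in range(self._spin_limit):
            if self._lock.acquire(blocking=False):   # fast path: no sleeping
                return
        self.blocked += 1
        self._lock.acquire()                         # slow path: block

    def release(self):
        self._lock.release()
```

Spinning pays off when the holder releases quickly, because the waiter avoids a sleep and wakeup; blocking bounds the CPU time wasted when the lock is held for a long time.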

Existence Guarantees

- Traditional approach: use locks to protect object references
  - Prevents races where one process destroys an object that another process is using
- Tornado’s approach: semi-automatic garbage collection
  - Garbage collection destroys a clustered object only when there are no more references to the object
  - When clients use an existing reference to access a clustered object, they have a guarantee that the referenced object still exists
- References can be safely accessed without the use of (global) locks
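
The guarantee itself (an object is reclaimed only once nothing can still be using it) can be illustrated with a deliberately simplified sketch. The slides do not describe the collector's mechanism, so the per-object in-flight counter below is an assumption, and `Collectable`, `enter`, `leave`, and `destroy` are hypothetical names. Note that the guard lock is local to the object, not global.

```python
import threading


class Collectable:
    """Hypothetical object whose teardown waits for in-flight calls to finish."""

    def __init__(self):
        self._active = 0                 # calls currently inside the object
        self._doomed = False             # destroy() has been requested
        self.destroyed = False
        self._guard = threading.Lock()   # local to this object, not global

    def enter(self):
        with self._guard:
            if self._doomed:
                return False             # no new calls once destruction is requested
            self._active += 1
            return True

    def leave(self):
        with self._guard:
            self._active -= 1
            self._maybe_reclaim()

    def destroy(self):
        with self._guard:
            self._doomed = True
            self._maybe_reclaim()

    def _maybe_reclaim(self):
        # Caller holds _guard. Reclaim only once no call can still see the object.
        if self._doomed and self._active == 0:
            self.destroyed = True
```

A call that entered before `destroy()` keeps the object alive until it leaves, so the caller never has to take a lock on the reference itself.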

Protected Procedure Calls

Interprocess Communication

- Tornado uses Protected Procedure Calls (PPCs) to bring locality and concurrency to interprocess communication
- A PPC is a call from a client object to a server object
  - Acts like a clustered object call that passes between the protection domains of the client and server processes
- Advantages of PPC
  - Client requests are always serviced on their local processor
  - Clients and servers share the CPU in a manner similar to handoff scheduling
  - The server has one thread of control for each client request

Performance

Conclusions

- Tornado’s increased locality and concurrency have produced a scalable OS design for multiprocessors
- This locality was provided by several key system features
  - An object-oriented design
  - Clustered objects
  - A new locking strategy
  - Protected procedure calls