Tornado: Maximizing Locality and Concurrency in a Shared Memory Multiprocessor (SMMP) OS
Post on 21-Dec-2015
Contents
Types of Locality
Locality: A closer look
Requirements for locality
Design Basics of Tornado
Test Results
Conclusion
Types of Locality*
Temporal locality: “The concept that a resource that is referenced at one point in time will be referenced again sometime in the near future.”
Spatial locality: “The concept that the likelihood of referencing a resource is higher if a resource near it has been referenced.”
Sequential locality: “The concept that memory is accessed sequentially.”
*Source: Wikipedia
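As a small illustration of spatial and sequential locality, the sketch below sums a matrix stored in row-major order in two ways. The function names and the matrix layout are assumptions made for this example; they do not come from the slides. Both functions compute the same sum, but the row-wise loop touches consecutive addresses, while the column-wise loop jumps a whole row ahead on every step, which typically makes poorer use of each cache line.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: sums a rows x cols matrix stored in row-major order,
// walking memory sequentially (good spatial and sequential locality).
long sum_row_wise(const std::vector<int>& m, std::size_t rows, std::size_t cols) {
    assert(m.size() == rows * cols);
    long total = 0;
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            total += m[r * cols + c];  // consecutive addresses
    return total;
}

// Same sum, but each step strides `cols` elements ahead, so neighbouring
// accesses usually land in different cache lines (poor spatial locality).
long sum_column_wise(const std::vector<int>& m, std::size_t rows, std::size_t cols) {
    assert(m.size() == rows * cols);
    long total = 0;
    for (std::size_t c = 0; c < cols; ++c)
        for (std::size_t r = 0; r < rows; ++r)
            total += m[r * cols + c];  // stride of `cols` elements
    return total;
}
```

The difference only shows up in timing on matrices larger than the cache; the results themselves are identical.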
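Locality: A closer look, Read only case
    bool x = true;
    while (x) {
        // Do some work, reading but not writing x
        …
    }
[Figure: Processor #1 and Processor #2, each with a local cache, share one memory. x lives in memory and is copied into the cache of each processor that reads it.]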
Notes: Once x is in both caches there are no further accesses on the bus, because every access is a read that is satisfied in the local cache and no invalidations are sent.
Locality: A closer look, Read/Write case
Code run by each of the two processors:
    bool x = true;
    while (x) {
        x = false;
        // Do other work
        …
    }
[Figure: Processor #1 and Processor #2, each with a local cache holding a copy of x, connected to memory, which also holds x.]
1. Cache miss
2. Read request
3. Data
4. Write
5. Invalidate block containing x
Notes:
x becomes a bottleneck: the valid copy keeps jumping from one cache to the other.
Every write access causes an invalidation.
Almost every read causes a read miss and a bus read.
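The difference between the read-only and read/write cases can be made concrete with a toy model. The sketch below is an illustration under stated assumptions, not the protocol of any particular machine: `ToyCoherence`, `BusStats`, and `simulate` are hypothetical names, the model tracks only whether each cache holds a valid copy of a line, and a write to an uncached line is assumed to fetch the line first. It counts the read misses and invalidations that two alternating processors generate.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy write-invalidate model (an illustrative assumption; real protocols
// such as MESI track more states and also account for write-backs).
struct BusStats {
    int read_misses = 0;    // misses that needed a bus read
    int invalidations = 0;  // remote copies invalidated by writes
};

class ToyCoherence {
public:
    ToyCoherence(std::size_t processors, std::size_t lines)
        : valid_(processors, std::vector<bool>(lines, false)) {}

    void read(std::size_t cpu, std::size_t line) {
        assert(cpu < valid_.size() && line < valid_[cpu].size());
        if (!valid_[cpu][line]) {  // 1. cache miss
            ++stats_.read_misses;  // 2. read request, 3. data
            valid_[cpu][line] = true;
        }
    }

    void write(std::size_t cpu, std::size_t line) {
        read(cpu, line);  // fetch the line first if it is not cached
        for (std::size_t other = 0; other < valid_.size(); ++other) {
            if (other != cpu && valid_[other][line]) {
                valid_[other][line] = false;  // 5. invalidate the remote copy
                ++stats_.invalidations;
            }
        }
    }

    BusStats stats() const { return stats_; }

private:
    std::vector<std::vector<bool>> valid_;  // valid_[cpu][line]
    BusStats stats_;
};

// Two processors alternate; each round a processor reads x and,
// if `writes` is set, also writes it.
BusStats simulate(int rounds, bool writes) {
    ToyCoherence sim(2, 1);  // 2 processors, 1 cache line holding x
    for (int i = 0; i < rounds; ++i) {
        for (std::size_t cpu = 0; cpu < 2; ++cpu) {
            sim.read(cpu, 0);
            if (writes) sim.write(cpu, 0);
        }
    }
    return sim.stats();
}
```

In this model, `simulate(3, false)` reports 2 read misses and 0 invalidations (only the initial loads), while `simulate(3, true)` reports 6 read misses and 5 invalidations: the line bounces between the caches on nearly every access.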
Locality: A closer look, Effect of Cache Line Length
One processor runs:
    bool x = true;
    while (x) {
        x = false;
        // Do other work
        …
    }
The other processor runs:
    bool y = true;
    while (y) {
        y = false;
        // Do other work
        …
    }
[Figure: x and y are stored at addresses 0x0 and 0x4 in memory; each processor's cache holds the line containing x,y.]
Notes: x and y have different addresses but fall into the same cache line (block)!
Notes: Reads on their own do not cause any problem.
Notes: Remember that invalidations are per cache line (block), not per word! A write to either variable invalidates the block containing both x and y in the other cache, so we get pretty much the same behavior as the read/write case on a single variable.
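One common way to keep two independently written variables out of the same block is to align each to its own cache line. The sketch below is a minimal illustration, assuming a 64-byte line (a common size, but hardware-specific); `kCacheLine`, `Packed`, `Padded`, `line_of`, and `same_line` are hypothetical names introduced for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed cache line size; 64 bytes is common but depends on the hardware.
constexpr std::size_t kCacheLine = 64;

// x and y packed next to each other: they will normally share one line.
struct Packed {
    bool x;
    bool y;
};

// x and y each start on their own cache line, at the cost of extra space.
struct Padded {
    alignas(kCacheLine) bool x;
    alignas(kCacheLine) bool y;
};

static_assert(sizeof(Padded) == 2 * kCacheLine, "one full line per variable");

// Index of the (assumed) cache line that contains address p.
inline std::uintptr_t line_of(const void* p) {
    assert(p != nullptr);
    return reinterpret_cast<std::uintptr_t>(p) / kCacheLine;
}

// True if both addresses fall into the same (assumed) cache line.
inline bool same_line(const void* a, const void* b) {
    return line_of(a) == line_of(b);
}
```

With `Padded`, a write to x no longer invalidates the line holding y. The trade-off is memory: two one-byte flags now occupy 128 bytes.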
Requirements for Locality
Spatial and temporal locality Minimizing read/write and write
sharing Minimize false sharing Minimize the distance between the
accessing processor and the target memory module.
Design Basics for Tornado
Individual resources are individual objects
Clustered objects
Protected procedure calls (PPC)
Semi-automatic garbage collection
Clustered Objects
Appears as a single object from the outside but is internally split into reps (representatives)
Each rep handles requests from one or more processors
Lots of advantages to this design
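The idea of one logical object backed by several reps can be sketched with a counter. This is a minimal, single-threaded illustration; `ClusteredCounter` and its methods are hypothetical names, and the one-rep-per-processor mapping is an assumption (the slides allow a rep to serve several processors). Callers see one object, while each increment touches only the rep of the calling processor.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical clustered counter: one logical object, one rep per processor.
class ClusteredCounter {
public:
    explicit ClusteredCounter(std::size_t processors) : reps_(processors) {}

    // Called on behalf of processor `cpu`; touches only that processor's rep,
    // so the common path involves no sharing between processors.
    void increment(std::size_t cpu) {
        assert(cpu < reps_.size());
        ++reps_[cpu].count;
    }

    // Global view: combines all reps. Assumed to be rare, so the cost of
    // visiting every rep is acceptable.
    long value() const {
        long total = 0;
        for (const Rep& rep : reps_) total += rep.count;
        return total;
    }

private:
    // A real implementation would also pad each rep to a cache line
    // to avoid false sharing between neighbouring reps.
    struct Rep {
        long count = 0;
    };
    std::vector<Rep> reps_;
};
```

For example, after `increment(0)`, `increment(0)`, and `increment(3)` on a four-processor counter, `value()` returns 3 even though no rep was ever written by more than one processor.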
Clustered Objects (cont.)
Per-processor translation tables
Partitioned global translation table
Default “miss” handlers
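How these three pieces might fit together can be sketched as follows. This is a simplified, single-threaded model built on assumptions: `TranslationTables`, `Rep`, `register_object`, and `lookup` are hypothetical names, the global table is partitioned by `id % processors`, and the default miss handler simply creates a new rep for the calling processor. The slides do not specify any of these details.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical rep: remembers which processor and object it serves.
struct Rep {
    std::size_t cpu;
    std::size_t object_id;
};

class TranslationTables {
public:
    explicit TranslationTables(std::size_t processors)
        : local_(processors), global_(processors) {
        assert(processors > 0);
    }

    // Records the object in its partition of the global table.
    void register_object(std::size_t id) {
        global_[id % global_.size()].insert(id);
    }

    // Fast path: hit in the per-processor table. Slow path: the default
    // miss handler installs a rep for this processor on first use.
    Rep* lookup(std::size_t cpu, std::size_t id) {
        assert(cpu < local_.size());
        auto& table = local_[cpu];
        auto it = table.find(id);
        if (it == table.end()) {
            it = table.emplace(id, default_miss_handler(cpu, id)).first;
        }
        return it->second.get();
    }

    std::size_t misses() const { return misses_; }

private:
    std::unique_ptr<Rep> default_miss_handler(std::size_t cpu, std::size_t id) {
        ++misses_;
        const auto& partition = global_[id % global_.size()];
        assert(partition.count(id) == 1 && "object must be registered");
        (void)partition;  // silences the unused warning when NDEBUG is set
        return std::unique_ptr<Rep>(new Rep{cpu, id});
    }

    std::vector<std::unordered_map<std::size_t, std::unique_ptr<Rep>>> local_;
    std::vector<std::unordered_set<std::size_t>> global_;
    std::size_t misses_ = 0;
};
```

Only the first access from each processor takes the miss path; later lookups from the same processor return the already installed rep without touching the global table.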
Protected Procedure Calls
Microkernel: relies on servers to carry out part of the OS's job
As many server threads as there are clients
A request is handled on the same processor where it was issued
*Image source: Wikipedia
Garbage Collection
Semi-automatic
Distinguishes between temporary and persistent references to objects
Eliminates the need for two locks to guarantee existence, and eliminates locking altogether for read-only data
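The distinction between the two kinds of references can be sketched as a reclamation check. This is a loose, single-threaded illustration under assumptions, not Tornado's actual mechanism: `Reclaimer` and its methods are hypothetical names, persistent references are modelled as a plain count, and temporary references are modelled as per-processor counts of operations in progress. The object is treated as safe to destroy only when no persistent references remain and every in-progress operation has finished.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical bookkeeping for one object's lifetime.
class Reclaimer {
public:
    explicit Reclaimer(std::size_t processors) : active_ops_(processors, 0) {}

    // Persistent references: stored in shared data structures.
    void add_persistent() { ++persistent_; }
    void drop_persistent() {
        assert(persistent_ > 0);
        --persistent_;
    }

    // Temporary references: held only for the duration of one operation.
    // Each processor updates only its own counter.
    void begin_op(std::size_t cpu) {
        assert(cpu < active_ops_.size());
        ++active_ops_[cpu];
    }
    void end_op(std::size_t cpu) {
        assert(cpu < active_ops_.size() && active_ops_[cpu] > 0);
        --active_ops_[cpu];
    }

    // Safe only when nothing can still reach or be using the object.
    bool safe_to_destroy() const {
        if (persistent_ != 0) return false;
        for (int ops : active_ops_) {
            if (ops != 0) return false;
        }
        return true;
    }

private:
    int persistent_ = 0;
    std::vector<int> active_ops_;  // operations in progress, per processor
};
```

Because each operation only bumps a counter local to its own processor, readers in this model never take a lock to keep the object alive; the destroyer waits instead.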