![Page 1: Tornado: Maximizing Locality and Concurrency in a Shared Memory Multiprocessor Operating System](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815fe8550346895dceecb4/html5/thumbnails/1.jpg)
Tornado: Maximizing Locality and Concurrency in a Shared Memory Multiprocessor Operating System
Ben Gamsa, Orran Krieger, Jonathan Appavoo, Michael Stumm
![Page 2: Tornado: Maximizing Locality and Concurrency in a Shared Memory Multiprocessor Operating System](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815fe8550346895dceecb4/html5/thumbnails/2.jpg)
Locality
• What do they mean by locality?
  – locality of reference?
  – temporal locality?
  – spatial locality?
![Page 3: Tornado: Maximizing Locality and Concurrency in a Shared Memory Multiprocessor Operating System](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815fe8550346895dceecb4/html5/thumbnails/3.jpg)
Temporal Locality
• Recently accessed data and instructions are likely to be accessed in the near future
![Page 4: Tornado: Maximizing Locality and Concurrency in a Shared Memory Multiprocessor Operating System](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815fe8550346895dceecb4/html5/thumbnails/4.jpg)
Spatial Locality
• Data and instructions close to recently accessed data and instructions are likely to be accessed in the near future
![Page 5: Tornado: Maximizing Locality and Concurrency in a Shared Memory Multiprocessor Operating System](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815fe8550346895dceecb4/html5/thumbnails/5.jpg)
Locality of Reference
• If we have good locality of reference, is that a good thing for multiprocessors?
![Page 6: Tornado: Maximizing Locality and Concurrency in a Shared Memory Multiprocessor Operating System](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815fe8550346895dceecb4/html5/thumbnails/6.jpg)
Locality in Multiprocessors
• Good performance depends on data being local to a CPU
  – Each CPU uses data from its own cache
    • cache hit rate is high
    • each CPU has good locality of reference
  – Once data is brought into cache it stays there
    • cache contents not invalidated by other CPUs
    • different CPUs have different locality of reference
![Page 7: Tornado: Maximizing Locality and Concurrency in a Shared Memory Multiprocessor Operating System](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815fe8550346895dceecb4/html5/thumbnails/7.jpg)
Example: Shared Counter
[Diagram: two CPUs, each with its own cache, and a counter in shared memory]
![Page 8: Tornado: Maximizing Locality and Concurrency in a Shared Memory Multiprocessor Operating System](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815fe8550346895dceecb4/html5/thumbnails/8.jpg)
Example: Shared Counter
[Diagram: memory and two CPUs; value shown: 0]
![Page 9: Tornado: Maximizing Locality and Concurrency in a Shared Memory Multiprocessor Operating System](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815fe8550346895dceecb4/html5/thumbnails/9.jpg)
Example: Shared Counter
[Diagram: memory and two CPUs; values shown: 0, 0]
![Page 10: Tornado: Maximizing Locality and Concurrency in a Shared Memory Multiprocessor Operating System](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815fe8550346895dceecb4/html5/thumbnails/10.jpg)
Example: Shared Counter
[Diagram: memory and two CPUs; values shown: 1, 1]
![Page 11: Tornado: Maximizing Locality and Concurrency in a Shared Memory Multiprocessor Operating System](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815fe8550346895dceecb4/html5/thumbnails/11.jpg)
Example: Shared Counter
[Diagram: memory and two CPUs; values shown: 1, 1, 1; label: Read: OK]
![Page 12: Tornado: Maximizing Locality and Concurrency in a Shared Memory Multiprocessor Operating System](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815fe8550346895dceecb4/html5/thumbnails/12.jpg)
Example: Shared Counter
[Diagram: memory and two CPUs; values shown: 2, 2; label: Invalidate]
![Page 13: Tornado: Maximizing Locality and Concurrency in a Shared Memory Multiprocessor Operating System](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815fe8550346895dceecb4/html5/thumbnails/13.jpg)
Performance
![Page 14: Tornado: Maximizing Locality and Concurrency in a Shared Memory Multiprocessor Operating System](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815fe8550346895dceecb4/html5/thumbnails/14.jpg)
Problems
• Counter bounces between CPU caches
  – cache miss rate is high
• Why not give each CPU its own piece of the counter to increment?
  – take advantage of commutativity of addition
  – counter updates can be local
  – reads require all counters
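The per-CPU idea in the last bullet can be sketched in C++ roughly as follows. The names `ArrayCounter` and `kNumCpus`, and passing the CPU index as an argument, are assumptions made for illustration; they do not come from the slides.

```cpp
#include <array>
#include <atomic>
#include <cstddef>

// Hypothetical fixed CPU count for this sketch.
constexpr std::size_t kNumCpus = 4;

class ArrayCounter {
public:
    ArrayCounter() {
        for (auto& s : slots_) s.store(0, std::memory_order_relaxed);
    }

    // An update touches only the calling CPU's own slot.
    void increment(std::size_t cpu) {
        slots_[cpu].fetch_add(1, std::memory_order_relaxed);
    }

    // A read has to visit every slot. Addition is commutative,
    // so the order in which slots are summed does not matter.
    long read() const {
        long total = 0;
        for (const auto& s : slots_) total += s.load(std::memory_order_relaxed);
        return total;
    }

private:
    // Slots are packed next to each other in memory (no padding).
    std::array<std::atomic<long>, kNumCpus> slots_;
};
```

Updates are cheap and local in this design, while reads pay for visiting every slot, so it suits counters that are written far more often than they are read.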
![Page 15: Tornado: Maximizing Locality and Concurrency in a Shared Memory Multiprocessor Operating System](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815fe8550346895dceecb4/html5/thumbnails/15.jpg)
Array-based Counter
[Diagram: memory and two CPUs; array slots: 0, 0]
![Page 16: Tornado: Maximizing Locality and Concurrency in a Shared Memory Multiprocessor Operating System](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815fe8550346895dceecb4/html5/thumbnails/16.jpg)
Array-based Counter
[Diagram: memory and two CPUs; CPU value shown: 1; array slots: 1, 0]
![Page 17: Tornado: Maximizing Locality and Concurrency in a Shared Memory Multiprocessor Operating System](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815fe8550346895dceecb4/html5/thumbnails/17.jpg)
Array-based Counter
[Diagram: memory and two CPUs; CPU values shown: 1, 1; array slots: 1, 1]
![Page 18: Tornado: Maximizing Locality and Concurrency in a Shared Memory Multiprocessor Operating System](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815fe8550346895dceecb4/html5/thumbnails/18.jpg)
Array-based Counter
[Diagram: memory and three CPUs; two CPUs show 1 each; array slots: 1, 1; the third CPU shows 2; labels: Read Counter, Add All Counters (1 + 1)]
![Page 19: Tornado: Maximizing Locality and Concurrency in a Shared Memory Multiprocessor Operating System](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815fe8550346895dceecb4/html5/thumbnails/19.jpg)
Performance
Performs no better than ‘shared counter’!
![Page 20: Tornado: Maximizing Locality and Concurrency in a Shared Memory Multiprocessor Operating System](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815fe8550346895dceecb4/html5/thumbnails/20.jpg)
Problem: False Sharing
• Caches operate at the granularity of cache lines
  – if two pieces of the counter are in the same cache line they cannot be cached (for writing) on more than one CPU at a time
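As a rough illustration of why adjacent slots collide, the sketch below checks whether two addresses fall in the same cache line. The 64-byte line size is an assumption (it varies by hardware), and `same_cache_line` and `PackedSlots` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed cache line size; the real value depends on the hardware.
constexpr std::size_t kAssumedLineSize = 64;

// True when both addresses map to the same cache line.
inline bool same_cache_line(const void* a, const void* b) {
    auto line_a = reinterpret_cast<std::uintptr_t>(a) / kAssumedLineSize;
    auto line_b = reinterpret_cast<std::uintptr_t>(b) / kAssumedLineSize;
    return line_a == line_b;
}

// Two per-CPU slots packed side by side, as in the unpadded array.
// The struct starts on a line boundary, so both slots share one line.
struct alignas(kAssumedLineSize) PackedSlots {
    long slot[2];
};
```

With this layout, a write to `slot[0]` by one CPU invalidates the line that also holds `slot[1]` in the other CPU's cache, even though the two CPUs never touch the same variable.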
![Page 21: Tornado: Maximizing Locality and Concurrency in a Shared Memory Multiprocessor Operating System](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815fe8550346895dceecb4/html5/thumbnails/21.jpg)
False Sharing
[Diagram: memory and two CPUs; cache line shown: 0,0]
![Page 22: Tornado: Maximizing Locality and Concurrency in a Shared Memory Multiprocessor Operating System](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815fe8550346895dceecb4/html5/thumbnails/22.jpg)
False Sharing
[Diagram: memory and two CPUs; cache line 0,0 shown twice]
![Page 23: Tornado: Maximizing Locality and Concurrency in a Shared Memory Multiprocessor Operating System](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815fe8550346895dceecb4/html5/thumbnails/23.jpg)
False Sharing
[Diagram: memory and two CPUs; cache line 0,0 shown three times; label: Sharing]
![Page 24: Tornado: Maximizing Locality and Concurrency in a Shared Memory Multiprocessor Operating System](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815fe8550346895dceecb4/html5/thumbnails/24.jpg)
False Sharing
[Diagram: memory and two CPUs; cache line 1,0 shown twice; label: Invalidate]
![Page 25: Tornado: Maximizing Locality and Concurrency in a Shared Memory Multiprocessor Operating System](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815fe8550346895dceecb4/html5/thumbnails/25.jpg)
False Sharing
[Diagram: memory and two CPUs; cache line 1,0 shown three times; label: Sharing]
![Page 26: Tornado: Maximizing Locality and Concurrency in a Shared Memory Multiprocessor Operating System](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815fe8550346895dceecb4/html5/thumbnails/26.jpg)
False Sharing
[Diagram: memory and two CPUs; cache line 1,1 shown twice; label: Invalidate]
![Page 27: Tornado: Maximizing Locality and Concurrency in a Shared Memory Multiprocessor Operating System](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815fe8550346895dceecb4/html5/thumbnails/27.jpg)
Solution?
• Spread the counter components out in memory: pad the array
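One common way to pad the array in C++ is to align each slot to a full cache line. This is a minimal sketch, assuming a 64-byte line; `PaddedSlot`, `PaddedCounter`, `kLineSize` and `kCpuCount` are illustrative names, not taken from the slides.

```cpp
#include <array>
#include <atomic>
#include <cstddef>

constexpr std::size_t kLineSize = 64;  // assumed cache line size
constexpr std::size_t kCpuCount = 4;   // hypothetical CPU count

// alignas rounds the size of each slot up to a whole cache line,
// so neighbouring slots never share a line.
struct alignas(kLineSize) PaddedSlot {
    std::atomic<long> value{0};
};

static_assert(sizeof(PaddedSlot) % kLineSize == 0,
              "each slot must occupy whole cache lines");

class PaddedCounter {
public:
    // Each CPU increments only its own line-sized slot.
    void increment(std::size_t cpu) {
        slots_[cpu].value.fetch_add(1, std::memory_order_relaxed);
    }

    // Reads still sum every slot.
    long read() const {
        long total = 0;
        for (const auto& s : slots_) total += s.value.load(std::memory_order_relaxed);
        return total;
    }

private:
    std::array<PaddedSlot, kCpuCount> slots_;
};
```

The cost is memory: each counter slot now occupies a whole line rather than a few bytes.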
![Page 28: Tornado: Maximizing Locality and Concurrency in a Shared Memory Multiprocessor Operating System](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815fe8550346895dceecb4/html5/thumbnails/28.jpg)
Padded Array
[Diagram: memory and two CPUs; padded array slots: 0, 0]
![Page 29: Tornado: Maximizing Locality and Concurrency in a Shared Memory Multiprocessor Operating System](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815fe8550346895dceecb4/html5/thumbnails/29.jpg)
Padded Array
[Diagram: memory and two CPUs; CPU values shown: 1, 1; padded array slots: 1, 1; label: Updates independent of each other]
![Page 30: Tornado: Maximizing Locality and Concurrency in a Shared Memory Multiprocessor Operating System](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815fe8550346895dceecb4/html5/thumbnails/30.jpg)
Performance
Works better
![Page 31: Tornado: Maximizing Locality and Concurrency in a Shared Memory Multiprocessor Operating System](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815fe8550346895dceecb4/html5/thumbnails/31.jpg)
Locality in OS
• Serious performance impact
• Difficult to retrofit
• Tornado
  – Ground up design
  – Object Oriented approach (natural locality)
![Page 32: Tornado: Maximizing Locality and Concurrency in a Shared Memory Multiprocessor Operating System](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815fe8550346895dceecb4/html5/thumbnails/32.jpg)
Tornado
• Object oriented approach
• Clustered objects
• Protected procedure call
• Semi-automatic garbage collection
  – Simplifies locking protocols
![Page 33: Tornado: Maximizing Locality and Concurrency in a Shared Memory Multiprocessor Operating System](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815fe8550346895dceecb4/html5/thumbnails/33.jpg)
Object Oriented Structure
• Each resource is represented by an object
• Requests to virtual resources handled independently
  – No shared data structure access
  – No shared locks
![Page 34: Tornado: Maximizing Locality and Concurrency in a Shared Memory Multiprocessor Operating System](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815fe8550346895dceecb4/html5/thumbnails/34.jpg)
Why Object Oriented?
[Diagram: Process Table with entries Process 1, Process 2, …]
![Page 35: Tornado: Maximizing Locality and Concurrency in a Shared Memory Multiprocessor Operating System](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815fe8550346895dceecb4/html5/thumbnails/35.jpg)
Why Object Oriented?
Coarse-grain locking:
[Diagram: Process Table with entries Process 1, Process 2, …; labels: Process 1, Lock]
![Page 36: Tornado: Maximizing Locality and Concurrency in a Shared Memory Multiprocessor Operating System](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815fe8550346895dceecb4/html5/thumbnails/36.jpg)
Why Object Oriented?
Coarse-grain locking:
[Diagram: Process Table with entries Process 1, Process 2, …; labels: Process 1, Lock, Process 2]
![Page 37: Tornado: Maximizing Locality and Concurrency in a Shared Memory Multiprocessor Operating System](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815fe8550346895dceecb4/html5/thumbnails/37.jpg)
Object Oriented Approach
Class ProcessTableEntry {
  data
  lock
  code
}
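The class outline above might look like this in C++. It is only a sketch: the `pid` and `cpu_ticks` fields and the method names are hypothetical stand-ins for whatever data and code a real process table entry would hold.

```cpp
#include <mutex>

// Each entry bundles its own data, its own lock, and the code that uses them.
class ProcessTableEntry {
public:
    explicit ProcessTableEntry(int pid) : pid_(pid) {}

    // Locks only this entry; work on other entries proceeds in parallel.
    void add_cpu_time(long ticks) {
        std::lock_guard<std::mutex> guard(lock_);
        cpu_ticks_ += ticks;
    }

    long cpu_time() const {
        std::lock_guard<std::mutex> guard(lock_);
        return cpu_ticks_;
    }

    int pid() const { return pid_; }

private:
    const int pid_;            // hypothetical field
    mutable std::mutex lock_;  // per-instance lock, not a table-wide one
    long cpu_ticks_ = 0;       // hypothetical field
};
```

Because the lock lives inside the instance, two processors working on different entries never contend for the same lock.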
![Page 38: Tornado: Maximizing Locality and Concurrency in a Shared Memory Multiprocessor Operating System](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815fe8550346895dceecb4/html5/thumbnails/38.jpg)
Object Oriented Approach
Fine-grain, instance locking:
[Diagram: Process Table with entries Process 1, Process 2, …; labels: Process 1 with its own Lock, Process 2 with its own Lock]
![Page 39: Tornado: Maximizing Locality and Concurrency in a Shared Memory Multiprocessor Operating System](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815fe8550346895dceecb4/html5/thumbnails/39.jpg)
Clustered Objects
• Problem: how to improve locality for widely shared objects?
• A single logical object can be composed of multiple local representatives
  – the reps coordinate with each other to manage the object’s state
  – they share the object’s reference
![Page 40: Tornado: Maximizing Locality and Concurrency in a Shared Memory Multiprocessor Operating System](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815fe8550346895dceecb4/html5/thumbnails/40.jpg)
Clustered Objects
![Page 41: Tornado: Maximizing Locality and Concurrency in a Shared Memory Multiprocessor Operating System](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815fe8550346895dceecb4/html5/thumbnails/41.jpg)
Clustered Object References
![Page 42: Tornado: Maximizing Locality and Concurrency in a Shared Memory Multiprocessor Operating System](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815fe8550346895dceecb4/html5/thumbnails/42.jpg)
Clustered Objects : Implementation
• A translation table per processor
  – Located at same virtual address
  – Pointer to rep
• Clustered object reference is just a pointer into the table
  – created on demand when first accessed
  – global miss handling object
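The lookup path described above can be approximated in user-level C++ as follows. This is a simplified sketch, not Tornado's implementation: the per-processor tables are modelled as an array indexed by CPU number rather than by mapping one virtual address to different memory on each processor, and a per-object callback stands in for the miss handler. All names (`TranslationTables`, `Rep`, `CounterRep`, `kProcs`, `kMaxObjects`) are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <functional>
#include <memory>
#include <stdexcept>
#include <utility>
#include <vector>

constexpr std::size_t kProcs = 2;       // hypothetical processor count
constexpr std::size_t kMaxObjects = 8;  // hypothetical table size

struct Rep { virtual ~Rep() = default; };
struct CounterRep : Rep { long value = 0; };

class TranslationTables {
public:
    using MissHandler = std::function<std::unique_ptr<Rep>(std::size_t cpu)>;

    // Returns the object's reference: one index that is valid in every table.
    // Assumed to be called before processors start looking the object up.
    std::size_t register_object(MissHandler on_miss) {
        if (handlers_.size() >= kMaxObjects) throw std::length_error("table full");
        handlers_.push_back(std::move(on_miss));
        return handlers_.size() - 1;
    }

    // Same reference on every CPU, but each CPU reaches its own rep.
    Rep& lookup(std::size_t cpu, std::size_t ref) {
        auto& slot = tables_.at(cpu).at(ref);
        if (!slot) slot = handlers_.at(ref)(cpu);  // miss: create the rep on first access
        return *slot;
    }

private:
    std::array<std::array<std::unique_ptr<Rep>, kMaxObjects>, kProcs> tables_;
    std::vector<MissHandler> handlers_;
};
```

Each table is written only by its own processor in this model, which is why `lookup` takes no lock.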
![Page 43: Tornado: Maximizing Locality and Concurrency in a Shared Memory Multiprocessor Operating System](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815fe8550346895dceecb4/html5/thumbnails/43.jpg)
Clustered Objects
• Degree of clustering
• Management of state
  – partitioning
  – distribution
  – replication (how to maintain consistency?)
• Coordination between reps?
  – Shared memory
  – Remote PPCs
![Page 44: Tornado: Maximizing Locality and Concurrency in a Shared Memory Multiprocessor Operating System](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815fe8550346895dceecb4/html5/thumbnails/44.jpg)
Counter: Clustered Object
[Diagram: Counter clustered object; two CPUs; reps labeled rep 1 and rep 1; one Object Reference]
![Page 45: Tornado: Maximizing Locality and Concurrency in a Shared Memory Multiprocessor Operating System](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815fe8550346895dceecb4/html5/thumbnails/45.jpg)
Counter: Clustered Object
[Diagram: Counter clustered object; two CPUs showing 1 and 1; reps labeled rep 1 and rep 1; one Object Reference]
![Page 46: Tornado: Maximizing Locality and Concurrency in a Shared Memory Multiprocessor Operating System](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815fe8550346895dceecb4/html5/thumbnails/46.jpg)
Counter: Clustered Object
[Diagram: Counter clustered object; two CPUs showing 2 and 1; reps labeled rep 2 and rep 1; one Object Reference; label: Update independent of each other]
![Page 47: Tornado: Maximizing Locality and Concurrency in a Shared Memory Multiprocessor Operating System](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815fe8550346895dceecb4/html5/thumbnails/47.jpg)
Counter: Clustered Object
[Diagram: Counter clustered object; two CPUs showing 1 and 1; reps labeled rep 1 and rep 1; one Object Reference]
![Page 48: Tornado: Maximizing Locality and Concurrency in a Shared Memory Multiprocessor Operating System](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815fe8550346895dceecb4/html5/thumbnails/48.jpg)
Counter: Clustered Object
[Diagram: Counter clustered object; two CPUs showing 1 and 1; reps labeled rep 1 and rep 1; one Object Reference; label: Read Counter]
![Page 49: Tornado: Maximizing Locality and Concurrency in a Shared Memory Multiprocessor Operating System](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815fe8550346895dceecb4/html5/thumbnails/49.jpg)
Counter: Clustered Object
[Diagram: Counter clustered object; two CPUs showing 1 and 1; reps labeled rep 1 and rep 1; one Object Reference; label: Add All Counters (1 + 1)]
![Page 50: Tornado: Maximizing Locality and Concurrency in a Shared Memory Multiprocessor Operating System](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815fe8550346895dceecb4/html5/thumbnails/50.jpg)
Synchronization
• Two distinct locking issues
  – Locking
    • mutually exclusive access to objects
  – Existence guarantees
    • making sure an object is not freed while still in use
![Page 51: Tornado: Maximizing Locality and Concurrency in a Shared Memory Multiprocessor Operating System](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815fe8550346895dceecb4/html5/thumbnails/51.jpg)
Locking in Tornado
• Encapsulate locking within individual objects
• Uses clustered objects to limit contention
• Uses spin-then-block locks
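A spin-then-block lock spins briefly in the hope that the holder releases soon, and puts the thread to sleep only if that fails. The sketch below shows the general technique with standard C++ primitives. It is not Tornado's lock; the class name and the spin limit of 100 are arbitrary assumptions.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>

class SpinThenBlockLock {
public:
    void lock() {
        // Phase 1: spin for a bounded number of attempts.
        for (int i = 0; i < kSpinLimit; ++i) {
            if (try_lock()) return;
        }
        // Phase 2: block until an unlock() wakes us and we win the flag.
        std::unique_lock<std::mutex> guard(m_);
        cv_.wait(guard, [this] { return try_lock(); });
    }

    bool try_lock() {
        bool expected = false;
        return locked_.compare_exchange_strong(expected, true,
                                               std::memory_order_acquire);
    }

    void unlock() {
        {
            // Clearing the flag under m_ means a waiter cannot check the flag
            // and then go to sleep after missing this release.
            std::lock_guard<std::mutex> guard(m_);
            locked_.store(false, std::memory_order_release);
        }
        cv_.notify_one();
    }

private:
    static constexpr int kSpinLimit = 100;  // arbitrary assumption
    std::atomic<bool> locked_{false};
    std::mutex m_;
    std::condition_variable cv_;
};
```

Spinning avoids the cost of a context switch when critical sections are short, while blocking keeps a waiting thread from wasting a processor when they are long.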
![Page 52: Tornado: Maximizing Locality and Concurrency in a Shared Memory Multiprocessor Operating System](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815fe8550346895dceecb4/html5/thumbnails/52.jpg)
Existence Guarantees: the problem
• Use a lock to protect all references to an object?
  – eliminates races where one thread is accessing the object and another is deallocating it
  – results in complex global hierarchy of locks
• Tornado - semi-automatic garbage collection
  – Clustered object reference can be used any time
  – Eliminates the need for locks
![Page 53: Tornado: Maximizing Locality and Concurrency in a Shared Memory Multiprocessor Operating System](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815fe8550346895dceecb4/html5/thumbnails/53.jpg)
Existence Guarantees in Tornado
• Semi-automatic garbage collection:
  – programmer decides what to free, system decides when to free it
  – guarantees that object references can be used safely
  – eliminates the need for reference locks
![Page 54: Tornado: Maximizing Locality and Concurrency in a Shared Memory Multiprocessor Operating System](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815fe8550346895dceecb4/html5/thumbnails/54.jpg)
How does it work?
• Programmer removes all persistent references
  – Normal cleanup done manually
• System tracks all temporary references
  – Event driven kernel
  – Maintain an activity counter for each processor
  – Delete object only when activity counter is zero
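The scheme in these bullets can be illustrated with a simplified C++ sketch. It assumes that `retire` and `try_reclaim` are called from a single reclaiming thread, and that an object is retired only after its persistent references are gone, so no new operation can reach it. `DeferredReclaimer` and `kProcessors` are hypothetical names, and the real system is more involved than this.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

constexpr std::size_t kProcessors = 2;  // hypothetical processor count

class DeferredReclaimer {
public:
    DeferredReclaimer() {
        for (auto& a : active_) a.store(0);
    }

    // Kernel operations bracket their work; these are the temporary references.
    void begin_operation(std::size_t cpu) { active_[cpu].fetch_add(1); }
    void end_operation(std::size_t cpu)   { active_[cpu].fetch_sub(1); }

    // The programmer decides WHAT to free, after removing persistent references.
    void retire(std::function<void()> destroy) {
        pending_.push_back(std::move(destroy));
    }

    // The system decides WHEN: only once every activity counter reads zero.
    // Returns how many retired objects were destroyed.
    std::size_t try_reclaim() {
        for (const auto& a : active_) {
            if (a.load() != 0) return 0;  // an operation may still hold a reference
        }
        std::size_t freed = pending_.size();
        for (auto& destroy : pending_) destroy();
        pending_.clear();
        return freed;
    }

private:
    std::array<std::atomic<int>, kProcessors> active_;
    std::vector<std::function<void()>> pending_;  // single reclaimer thread assumed
};
```

Operations never take a lock to keep an object alive in this model; they only bump a counter on their own processor.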
![Page 55: Tornado: Maximizing Locality and Concurrency in a Shared Memory Multiprocessor Operating System](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815fe8550346895dceecb4/html5/thumbnails/55.jpg)
Performance Scalability
![Page 56: Tornado: Maximizing Locality and Concurrency in a Shared Memory Multiprocessor Operating System](https://reader036.vdocuments.site/reader036/viewer/2022062305/56815fe8550346895dceecb4/html5/thumbnails/56.jpg)
Conclusion
• Object-oriented approach and clustered objects exploit locality to improve concurrency
• OO design has some overhead, but it is low compared to the performance advantages
• Tornado scales extremely well and achieves high performance on shared-memory multiprocessors