sly and the roarvm: exploring the manycore future of programming
DESCRIPTION
The manycore future has several challenges ahead of us that suggest that fundamental assumptions of contemporary programming approaches do not apply anymore when scalability is required.Sly is a language prototype designed to experiment with the inherently nondeterministic properties of parallel systems. It is designed to enable programmers to embrace nondeterminism instead of guiding them to fight it. Nature shows that complex system can be built from independent entities that achieve a common goal without global synchronization/communication. Sly is design to enable the prototyping of algorithms that show such emerging behavior. It will be introduced in the first part of the talk.The second part of the talk will focus on the underlying problems of building virtual machines for the manycore future, which allow to harness the available computing power. The RoarVM was design to experiment on the Tilera TILE64 manycore processor architecture which provides 64 cores and characteristics that are distinctly different from today's commodity multicore processors. Memory bandwidth, caches and communication are the biggest challenges on such architectures and this talk will give a brief overview over the design choices of the RoarVM which tackle the characteristics of the TILE64 architecture.Acknowledgement: Sly and the RoarVM were designed and implemented by David Ungar and Sam Adams at IBM Research.TRANSCRIPT
Sly and the RoarVM Exploring the Manycore Future of Programming
Renaissance Project
h=p://[email protected]/~smarr/renaissance/
Stefan Marr
So@ware Languages Lab, Vrije Universiteit Brussel ExaScience Lab, 2011-‐08-‐24
Agenda
• Context: Renaissance Project • Sly – Demo – Language Overview
• RoarVM – Intercore Messaging – Read-‐mostly heap – Pauseless GC
• Outlook 23/08/11 2
Renaissance Project
• CollaboraWon between – IBM Research, Portland State Univ, VUB
• Problem – Parallel systems are nondeterminisWc – CommunicaWon is expensive
• The vision – Embrace nondeterminism – Harness emergence – Confidence → Delay
23/08/11 3
Possible Real Life ApplicaWon
23/08/11 5
Image: h=p://blogs.technet.com/b/andrew/archive/2007/08/22/olap-‐cubes-‐and-‐mulWdimensional-‐analysis.aspx
Possible Real Life ApplicaWon
23/08/11 6
• ACK: ongoing work at IBM Research
• Online analyWcal processing (OLAP) – Business intelligence, reporWng, data mining
• Query mulWdimensional data sets – Includes dependent/calculated data dimensions – Sync. required to avoid use of stale data
• Without sync.: how to bound error? – RecalculaWng, refreshing results
Sly
• Ensembles: collecWons represenWng a whole, mulW-‐part enWWes – e.g. a flock of birds
• Adverbs: modifiers to operands – individualLY, serialLY, randomLY: n
• Gerunds: reducWon semanWcs – averagING, selectINGpredicate, ensemblING
23/08/11 7
Examples of Boids
• Every boid is its own thread, no synch.
computeCentroid ^ self flock boids averagINGserialLYposition
computeNeighbors ^ self flock boids selectINGisNear: self
matchVelocityWithNeighbors ^ self neighbors averagINGvelocity / 8.0 More details: h=p://[email protected]/~smarr/renaissance/sly3-‐overview/
23/08/11 8
The Manycore Problem
• Shared memory access very expensive
– Inter-‐core messaging – Purpose-‐specific networks
23/08/11 10
0 250 500 750 1000 1 word
10 words
DMA Load Mem. Msg: 3 hops Msg: 1 hop
(from [2])
Architecture Overview
• Core = Interpreter Instance = OS Process/Thread • ‘Single System Image’/shared memory VM
23/08/11 11
Message-‐Passing between Interpreters
• To avoid main-‐memory latency
• VM internal use only – GC
• preGCAcWon + ACK • doAllRootsHere + Response, …
– SafepoinWng • requestSafepointOnOtherCores • grantSafepoint, …
23/08/11 12
Memory System
• Object table, moving stop-‐the-‐world GC – Usually allocaWon on heap of local core
24/08/11 13
Object Heap
Memory System
• Object table, moving stop-‐the-‐world GC – Usually allocaWon on heap of local core
23/08/11 14
Object Heap
Global Safepoint
Read-‐Mostly Heap [2]
• TILE64 cache coherency VERY expensive – Read-‐mostly heap: incoherent, ‘user managed CC’
23/08/11 15
Cache
Cache
Cache
Read/Write
Read-‐mostly
Stop-‐the-‐World in Manycore Era?
23/08/11 16
Work
Arrival at safe point
Do garbage collect
Work
Run out of space
Pauseless GC
• Based on a paper by Click et al., 2005 [3] • No global synchronizaWon – Local checkpoints: 1 mutator and GC thread
• ConWnuously running GC threads • Load-‐Value-‐barrier + self-‐healing of stale refs. • Constant performance overhead – Without hardware support
• Easy to implement (2 MA students, 1 month)
23/08/11 18
Conclusion
• Sly: map/reduce-‐like programming model – Based on ensembles, adverbs and gerunds
• RoarVM – Research arWfact, stable, used daily – No opWmizaWon of sequenWal performance – Tested up to 59cores – Shows weak scalability
23/08/11 20
Outlook
• Sly – Ideas for non-‐linear syntax – Algorithms for nondeterminisWc programming
• RoarVM: focus on research – VMs are mulW-‐language plarorms – Support for concurrency models – DSLs for specific concurrent/parallel problems – Language guarantees in inter-‐model interacWon
23/08/11 21
References and Material [1] J. Pallas and D. Ungar. MulWprocessor Smalltalk: A Case Study of a MulWprocessor-‐based Programming Environment. In PLDI ’88, p. 268–277. ACM, 1988.
[2] D. Ungar and S. Adams. HosWng an Object Heap on Manycore Hardware: An ExploraWon. In DLS’09, p. 99-‐110. ACM, 2009.
[3] C. Click, G. Tene, and M. Wolf. The Pauseless GC Algorithm. In VEE ’05, p. 46-‐56. ACM, 2005.
[4] D. Ungar, D. Kimelman, and S. Adams. Inconsistency Robustness for Scalability in InteracWve Concurrent‑Update In-‐Memory MOLAP Cubes. In Inconsistency Robustness Symposium 2011.
[5] P. Inostroza Valdera. The Sly3 Programming Language: An IntroducWon and Analysis. h=p://[email protected]/~smarr/renaissance/sly3-‐overview/
[6] Renaissance Project: h=p://[email protected]/~smarr/renaissance/
23/08/11 22 Roundup and Conclusion