sly and the roarvm: exploring the manycore future of programming

22
Sly and the RoarVM Exploring the Manycore Future of Programming Renaissance Project h=p://[email protected]/~smarr/renaissance/ Stefan Marr So@ware Languages Lab, Vrije Universiteit Brussel ExaScience Lab, 20110824

Upload: stefan-marr

Post on 02-Jul-2015

6.797 views

Category:

Technology


4 download

DESCRIPTION

The manycore future has several challenges ahead of us that suggest that fundamental assumptions of contemporary programming approaches do not apply anymore when scalability is required.Sly is a language prototype designed to experiment with the inherently nondeterministic properties of parallel systems. It is designed to enable programmers to embrace nondeterminism instead of guiding them to fight it. Nature shows that complex system can be built from independent entities that achieve a common goal without global synchronization/communication. Sly is design to enable the prototyping of algorithms that show such emerging behavior. It will be introduced in the first part of the talk.The second part of the talk will focus on the underlying problems of building virtual machines for the manycore future, which allow to harness the available computing power. The RoarVM was design to experiment on the Tilera TILE64 manycore processor architecture which provides 64 cores and characteristics that are distinctly different from today's commodity multicore processors. Memory bandwidth, caches and communication are the biggest challenges on such architectures and this talk will give a brief overview over the design choices of the RoarVM which tackle the characteristics of the TILE64 architecture.Acknowledgement: Sly and the RoarVM were designed and implemented by David Ungar and Sam Adams at IBM Research.

TRANSCRIPT

Sly  and  the  RoarVM  Exploring  the  Manycore  Future  of  Programming  

 Renaissance  Project  

h=p://[email protected]/~smarr/renaissance/  

 Stefan  Marr  

So@ware  Languages  Lab,  Vrije  Universiteit  Brussel  ExaScience  Lab,  2011-­‐08-­‐24  

Agenda  

•  Context:  Renaissance  Project  •  Sly  – Demo  – Language  Overview  

•  RoarVM  –  Intercore  Messaging  – Read-­‐mostly  heap  – Pauseless  GC  

•  Outlook  23/08/11   2  

Renaissance  Project  

•  CollaboraWon  between  –  IBM  Research,  Portland  State  Univ,  VUB  

•  Problem  – Parallel  systems  are  nondeterminisWc  – CommunicaWon  is  expensive  

•  The  vision  – Embrace  nondeterminism  – Harness  emergence  – Confidence  →  Delay    

23/08/11   3  

Demo  

24/08/11   4  

Possible  Real  Life  ApplicaWon  

23/08/11   5  

Image:  h=p://blogs.technet.com/b/andrew/archive/2007/08/22/olap-­‐cubes-­‐and-­‐mulWdimensional-­‐analysis.aspx  

Possible  Real  Life  ApplicaWon  

23/08/11   6  

•  ACK:  ongoing  work  at  IBM  Research  

•  Online  analyWcal  processing  (OLAP)  –  Business  intelligence,  reporWng,  data  mining  

•  Query  mulWdimensional  data  sets  –  Includes  dependent/calculated  data  dimensions  –  Sync.  required  to  avoid  use  of  stale  data  

•  Without  sync.:  how  to  bound  error?  –  RecalculaWng,  refreshing  results  

Sly  

•  Ensembles:  collecWons  represenWng  a  whole,  mulW-­‐part  enWWes  – e.g.  a  flock  of  birds  

•  Adverbs:  modifiers  to  operands  –  individualLY,  serialLY,  randomLY:  n  

•  Gerunds:  reducWon  semanWcs  –  averagING,  selectINGpredicate,  ensemblING  

23/08/11   7  

Examples  of  Boids  

•  Every  boid  is  its  own  thread,  no  synch.    

computeCentroid          ^  self  flock  boids  averagINGserialLYposition    

computeNeighbors          ^  self  flock  boids  selectINGisNear:  self    

matchVelocityWithNeighbors          ^  self  neighbors  averagINGvelocity  /  8.0    More  details:  h=p://[email protected]/~smarr/renaissance/sly3-­‐overview/  

23/08/11   8  

ROARVM  Virtual  Machines  for  Manycore  Architectures  

23/08/11   9  

The  Manycore  Problem  

•  Shared  memory    access  very  expensive  

 –  Inter-­‐core  messaging  – Purpose-­‐specific  networks  

23/08/11   10  

0   250   500   750   1000  1  word  

10  words  

DMA   Load  Mem.  Msg:  3  hops   Msg:  1  hop  

(from  [2])  

Architecture  Overview  

•  Core  =  Interpreter  Instance  =  OS  Process/Thread  •  ‘Single  System  Image’/shared  memory  VM  

23/08/11   11  

Message-­‐Passing  between  Interpreters  

•  To  avoid  main-­‐memory  latency    

•  VM  internal  use  only  – GC  

•  preGCAcWon  +  ACK  •  doAllRootsHere    +  Response,  …  

– SafepoinWng  •  requestSafepointOnOtherCores  •  grantSafepoint,  …  

23/08/11   12  

Memory  System  

•  Object  table,  moving  stop-­‐the-­‐world  GC  – Usually  allocaWon  on  heap  of  local  core  

24/08/11   13  

Object  Heap  

Memory  System  

•  Object  table,  moving  stop-­‐the-­‐world  GC  – Usually  allocaWon  on  heap  of  local  core  

23/08/11   14  

Object  Heap  

Global  Safepoint  

Read-­‐Mostly  Heap  [2]  

•  TILE64  cache  coherency  VERY  expensive  – Read-­‐mostly  heap:  incoherent,  ‘user  managed  CC’    

23/08/11   15  

Cache      

Cache  

Cache  

Read/Write  

Read-­‐mostly  

Stop-­‐the-­‐World  in  Manycore  Era?  

23/08/11   16  

Work  

Arrival  at  safe  point  

Do  garbage  collect  

Work  

Run  out  of  space  

Stop-­‐the-­‐World  in  Manycore  Era?  

23/08/11   17  

…  …  

…  

…  

Pauseless  GC  

•  Based  on  a  paper  by  Click  et  al.,  2005  [3]  •  No  global  synchronizaWon  – Local  checkpoints:  1  mutator  and  GC  thread  

•  ConWnuously  running  GC  threads  •  Load-­‐Value-­‐barrier  +  self-­‐healing  of  stale  refs.  •  Constant  performance  overhead  – Without  hardware  support  

•  Easy  to  implement  (2  MA  students,  1  month)  

23/08/11   18  

OUTLOOK  

23/08/11   19  

Conclusion  

•  Sly:  map/reduce-­‐like  programming  model  – Based  on  ensembles,  adverbs  and  gerunds  

•  RoarVM  – Research  arWfact,  stable,  used  daily  – No  opWmizaWon  of  sequenWal  performance  – Tested  up  to  59cores  – Shows  weak  scalability  

23/08/11   20  

Outlook  

•  Sly  –  Ideas  for  non-­‐linear  syntax  – Algorithms  for  nondeterminisWc  programming  

•  RoarVM:  focus  on  research  – VMs  are  mulW-­‐language  plarorms  – Support  for  concurrency  models  – DSLs  for  specific  concurrent/parallel  problems  – Language  guarantees  in  inter-­‐model  interacWon  

23/08/11   21  

References  and  Material  [1]  J.  Pallas  and  D.  Ungar.  MulWprocessor  Smalltalk:  A  Case  Study  of  a  MulWprocessor-­‐based  Programming  Environment.  In  PLDI  ’88,  p.  268–277.  ACM,  1988.  

[2]  D.  Ungar  and  S.  Adams.  HosWng  an  Object  Heap  on  Manycore  Hardware:  An  ExploraWon.  In  DLS’09,  p.  99-­‐110.  ACM,  2009.  

[3]  C.  Click,  G.  Tene,  and  M.  Wolf.  The  Pauseless  GC  Algorithm.  In  VEE  ’05,  p.  46-­‐56.  ACM,  2005.  

[4]  D.  Ungar,  D.  Kimelman,  and  S.  Adams.  Inconsistency  Robustness  for  Scalability  in  InteracWve  Concurrent‑Update  In-­‐Memory  MOLAP  Cubes.  In  Inconsistency  Robustness  Symposium  2011.  

[5]  P.  Inostroza  Valdera.  The  Sly3  Programming  Language:  An  IntroducWon  and  Analysis.  h=p://[email protected]/~smarr/renaissance/sly3-­‐overview/  

[6]  Renaissance  Project:  h=p://[email protected]/~smarr/renaissance/    

23/08/11   22  Roundup  and  Conclusion