Sep/05/2001 PaCT 2001 1
Fusion of Concurrent Invocations of Exclusive Methods
Yoshihiro Oyama
(Japan Science and Technology Corporation, working at the University of Tsukuba)
Kenjiro Taura
Akinori Yonezawa
(University of Tokyo)
Sep/05/2001 PaCT 2001 2
Context of This Work
Languages: OO languages + concurrent extensions (e.g., Java); concurrent OO languages
Machines: shared-memory parallel computers (SMP, cc-NUMA MP)
Sep/05/2001 PaCT 2001 3
Exclusive Methods
Methods that are executed exclusively
Ex: synchronized methods in Java
Concurrent invocations are serialized and executed in turn
Sep/05/2001 PaCT 2001 4
Problem
Low scalability
Reason: serialization of concurrent invocations
Amdahl’s law: if 5% of the execution is sequential, speedup < 20
[Figure: execution time vs. number of processors]
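The bound on this slide follows directly from Amdahl's law. As a worked check, let $s$ be the sequential fraction and $p$ the number of processors (symbols introduced here, not on the slide; $p = 64$ is only an illustrative value):

```latex
S(p) = \frac{1}{s + \frac{1 - s}{p}} \;<\; \frac{1}{s},
\qquad s = 0.05 \;\Rightarrow\; S(p) < 20,
\qquad S(64) = \frac{1}{0.05 + 0.95/64} \approx 15.4
```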
Sep/05/2001 PaCT 2001 5
Goal of This Work
Fusion of concurrent invocations (method fusion)
  Ex: “plus(1)” + “plus(2)” = “plus(3)”
Fusing two invocations that are serialized dynamically
Reducing the number of executions of exclusive methods → eliminating bottlenecks
Sep/05/2001 PaCT 2001 6
Example
class Counter {
  int val;
  …
  sync void inc(int n) { val += n; }
}

inc(10); inc(20);
  can be emulated with
inc(30);

Observation, written as a fusion rule:
fusion inc(x) & inc(y) { inc(x + y); }
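The observation can be checked in ordinary C++. The sketch below is not the authors' language: `sync` is emulated with `std::mutex`, and `SyncCounter`, `fused_inc`, `value` and `executions` are hypothetical names added for illustration only. It applies the rule by hand to show that one fused call leaves the same state as two serialized calls while entering the exclusive method once.

```cpp
#include <cassert>
#include <mutex>

// Plain C++ stand-in for the slide's Counter (assumption: `sync` ~ std::mutex).
class SyncCounter {
public:
    void inc(int n) {
        std::lock_guard<std::mutex> guard(lock_);  // exclusive section
        val_ += n;
        ++executions_;  // counts how often the exclusive method ran
    }
    int value() {
        std::lock_guard<std::mutex> guard(lock_);
        return val_;
    }
    int executions() {
        std::lock_guard<std::mutex> guard(lock_);
        return executions_;
    }

private:
    std::mutex lock_;
    int val_ = 0;
    int executions_ = 0;
};

// Hand-applied form of: fusion inc(x) & inc(y) { inc(x + y); }
inline void fused_inc(SyncCounter& c, int x, int y) {
    c.inc(x + y);  // one exclusive execution instead of two
}
```

Calling `inc(10)` and `inc(20)` on one counter and `fused_inc(other, 10, 20)` on another should leave both at 30, with two and one exclusive executions respectively.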
Sep/05/2001 PaCT 2001 7
Outline of Presentation
Language design: API, sample programs, details of design, discussion
Implementation
Experimental results
Related work
Sep/05/2001 PaCT 2001 8
Target Language
C++ - Inheritance
+ Thread creation
+ Exclusive methods
+ Fusion rules
Sep/05/2001 PaCT 2001 9
API (1/2)
Syntax
  fusion p(x1,…, xn) & q(y1,…, yn) { S; }
Semantics
  Invocations of p and q may be replaced with S
  S is executed concurrently with other exclusive methods on the same object
Sep/05/2001 PaCT 2001 10
API (2/2)
Return value
  fusion p(x1,…, xn) & q(y1,…, yn) { … mreturn a and b; }
  a is returned to p’s caller, b to q’s caller
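To make the two-result return concrete, here is a minimal C++ sketch. The rule it mimics, `fusion fetch_add(x) & fetch_add(y) { old = fetch_add(x + y); mreturn old and old + x; }`, is a hypothetical example and does not appear in the slides; `FetchCounter` and `fused_fetch_add` are likewise invented names, and `std::pair` merely stands in for `mreturn a and b`.

```cpp
#include <cassert>
#include <mutex>
#include <utility>

// Hypothetical class: an exclusive method that returns the previous value.
class FetchCounter {
public:
    int fetch_add(int n) {
        std::lock_guard<std::mutex> guard(lock_);  // emulates `sync`
        int old = val_;
        val_ += n;
        return old;
    }

private:
    std::mutex lock_;
    int val_ = 0;
};

// Hand-applied version of the hypothetical rule described in the lead-in.
// first  -> result for the caller of fetch_add(x)
// second -> result for the caller of fetch_add(y)
inline std::pair<int, int> fused_fetch_add(FetchCounter& c, int x, int y) {
    int old = c.fetch_add(x + y);  // one exclusive execution instead of two
    return {old, old + x};         // what each caller would have seen serially
}
```

Each caller receives the value it would have observed had the two invocations run one after the other in the order x, then y.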
Sep/05/2001 PaCT 2001 11
Sample Program (1/2)
class Buffer {
  int len;
  double elems[…];
  ...
  sync void put(double f) { elems[len++] = f; }
  sync double get(void) { return elems[--len]; }
  fusion put(f) & get() { return f; }
}
“Bypassing” put and get
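The bypass can be illustrated in plain C++. This is a sketch under assumptions: `StackBuffer` uses `std::vector` in place of the elided array, locking is left out because the check is single-threaded, and `fused_put_get` is a hypothetical hand-applied version of the rule.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for the slide's Buffer (assumption: LIFO storage in a std::vector).
class StackBuffer {
public:
    void put(double f) { elems_.push_back(f); }
    double get() {
        double f = elems_.back();
        elems_.pop_back();
        return f;
    }
    std::size_t size() const { return elems_.size(); }

private:
    std::vector<double> elems_;
};

// Hand-applied form of: fusion put(f) & get() { return f; }
// The buffer is not touched at all; the value goes straight to get's caller.
inline double fused_put_get(StackBuffer&, double f) { return f; }
```

A `put(2.5)` followed by `get()` and a call to `fused_put_get(buffer, 2.5)` yield the same value and leave the buffer with the same number of elements.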
Sep/05/2001 PaCT 2001 12
Sample Program (2/2)
class Window {
…
fusion resize(x1, y1) & resize(x2, y2) {
resize(x1, y1);
}
fusion repaint() & repaint() {
repaint();
}
}
We can easily describe the specification for omitting GUI events
Sep/05/2001 PaCT 2001 13
Sample Program (2/2)
• Existing technique
  • Programmers describe the detailed implementation of event queues
• Our technique
  • Programmers describe only the specification of event queues
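For comparison, the kind of hand-written queue logic that the `repaint() & repaint()` rule is meant to replace might look like the sketch below. `EventKind` and `coalesce_repaints` are hypothetical names, and the code is only an illustration of the "existing technique", not taken from any real GUI toolkit.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical event type for illustration.
enum class EventKind { Repaint, Other };

// Collapses each run of adjacent Repaint events into a single Repaint,
// which is what the programmer must code by hand without fusion rules.
inline std::vector<EventKind> coalesce_repaints(const std::vector<EventKind>& pending) {
    std::vector<EventKind> out;
    for (EventKind e : pending) {
        bool duplicate = e == EventKind::Repaint && !out.empty() &&
                         out.back() == EventKind::Repaint;
        if (!duplicate) out.push_back(e);
    }
    return out;
}
```

With a fusion rule, this queue manipulation stays inside the runtime and the program text contains only the one-line specification.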
Sep/05/2001 PaCT 2001 14
What is Fused? What is not?
Fused: two dynamically serialized invocations
  A dynamic occurrence of the execution of two consecutive exclusive methods
Not fused: static fusion is not supported
  c->inc(10); c->inc(20);
Sep/05/2001 PaCT 2001 15
Discussion
How and where we intend method fusion to be used
This work from a wider view
Sep/05/2001 PaCT 2001 16
How and Where We Intend Method Fusion to Be Used
Fusion rules should be “performance hints”
  They should not change the behavior
  i.e., whether method fusion happened should not be observable
  → “transparent” fusion rules
Sep/05/2001 PaCT 2001 17
But Currently...
Our compiler accepts non-transparent fusion rules
  No check for transparency
  Fusion rules may introduce bugs
  fusion inc(x) & inc(y) { inc(x - y); }
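Why this rule is non-transparent can be seen with a small calculation, sketched here in C++. The three functions are hypothetical helpers that model the counter value after serial execution and after each fusion rule; they are not part of the authors' system.

```cpp
#include <cassert>

// Counter value after inc(x) and inc(y) run one after the other.
inline int serial_inc(int val, int x, int y) {
    val += x;
    val += y;
    return val;
}

// Counter value after the transparent rule: inc(x + y).
inline int fused_inc_sum(int val, int x, int y) { return val + (x + y); }

// Counter value after the non-transparent rule: inc(x - y).
inline int fused_inc_diff(int val, int x, int y) { return val + (x - y); }
```

Starting from 0 with x = 10 and y = 20, serial execution and the `x + y` rule both give 30, while the `x - y` rule gives -10, so whether fusion happened becomes observable.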
Sep/05/2001 PaCT 2001 18
This Work from a Wider View (1/2)
Found a new optimization for parallel languages
  Sequential languages:
    x = y-2; x += 3;  →  x = y+1;
  Parallel languages: control flow is not in the program text
    val += 1; val += 2;  →  val += 3;
  Overlooked in existing work
Sep/05/2001 PaCT 2001 19
This Work from a Wider View (2/2)
Proposed a useful and novel API for optimizing programs with the programmer’s help
  cf. parallel for, register, inline
Technical challenge we addressed: what kind of API is
  Easy to use?
  Useful for speedup?
Sep/05/2001 PaCT 2001 20
Implementation
A lock is added to each object
Exclusive methods are serialized with locks
  1. Acquire the lock
  2. Execute the exclusive method
  3. Release the lock
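The acquire, execute, release pattern can be sketched in C++ as follows. This is a generic spin lock built on `std::atomic<bool>`, chosen for brevity; it is an assumption for illustration and not the authors' lock. `SpinLock` and `LockedCounter` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Minimal spin lock: a single flag, no queue.
class SpinLock {
public:
    void acquire() {
        // exchange returns the previous value; loop until we flip false -> true
        while (locked_.exchange(true, std::memory_order_acquire)) {
            std::this_thread::yield();
        }
    }
    void release() { locked_.store(false, std::memory_order_release); }
    bool is_locked() const { return locked_.load(std::memory_order_relaxed); }

private:
    std::atomic<bool> locked_{false};
};

class LockedCounter {
public:
    void inc(int n) {
        lock_.acquire();  // 1. acquire the object's lock
        val_ += n;        // 2. execute the exclusive method body
        lock_.release();  // 3. release the lock
    }
    int value() {
        lock_.acquire();
        int v = val_;
        lock_.release();
        return v;
    }

private:
    SpinLock lock_;  // one lock per object
    int val_ = 0;
};
```

Every concurrent `inc` call passes through the same three steps, which is exactly the serialization that limits scalability.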
Sep/05/2001 PaCT 2001 21
Object Representation
[Figure: objects whose locks are in the states FREE, LOCKED, LOCKED]
Task = data structure storing information about an invocation
Lock = flag + queue
Sep/05/2001 PaCT 2001 22
Implementation of Method Fusion
[Figure sequence: six animation frames showing one object and threads X, Y, Z; only the labels survive]
  Frames 1-2: object LOCKED; invocations inc(x) and inc(y)
  Frame 3: object LOCKED; labels S and T appear next to threads X and Y
  Frame 4: object LOCKED; inc(x) and inc(y) are replaced by inc(x+y)
  Frame 5: object LOCKED; inc(x+y) is shown together with S
  Frame 6: object FREE; return values 1 and 2 are delivered
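One way a flag + queue lock could perform the fusion shown on these slides is sketched below in C++. It is a simplified illustration, not the authors' implementation: `FusingCounter` and all its members are hypothetical, the rule `inc(x) & inc(y) → inc(x + y)` is hard-coded, and, unlike the slides, queued callers do not wait for return values because `inc` returns nothing.

```cpp
#include <cassert>
#include <mutex>
#include <vector>

class FusingCounter {
public:
    // Marks the object LOCKED; returns false if it already was.
    bool try_acquire() {
        std::lock_guard<std::mutex> g(m_);
        if (locked_) return false;
        locked_ = true;
        return true;
    }

    // Runs at once if the object is FREE, otherwise leaves a task in the queue.
    void inc(int n) {
        {
            std::lock_guard<std::mutex> g(m_);
            if (locked_) {
                pending_.push_back(n);  // task = argument of the invocation
                return;
            }
            locked_ = true;
        }
        execute(n);
        release();
    }

    // Called by the lock holder: fuse queued tasks, run them, then go FREE.
    void release() {
        for (;;) {
            int fused = 0;
            {
                std::lock_guard<std::mutex> g(m_);
                if (pending_.empty()) {
                    locked_ = false;
                    return;
                }
                for (int n : pending_) fused += n;  // inc(x) & inc(y) -> inc(x + y)
                pending_.clear();
            }
            execute(fused);  // one exclusive execution for all queued tasks
        }
    }

    // Intended to be read only while no invocation is in flight.
    int value() const { return val_; }
    int executions() const { return executions_; }

private:
    void execute(int n) {
        val_ += n;
        ++executions_;
    }

    std::mutex m_;              // protects the flag and the queue only
    bool locked_ = false;       // the flag: FREE or LOCKED
    std::vector<int> pending_;  // the queue of waiting tasks
    int val_ = 0;
    int executions_ = 0;
};
```

If `inc(10)` and `inc(20)` arrive while the object is held, the holder's `release()` runs a single `inc(30)`, so the counter reaches 30 after one exclusive execution.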
Sep/05/2001 PaCT 2001 28
Experiment
We implemented a compiler for the language
Sun Ultra Enterprise 10000 (64 CPUs)
Applications:
  Counter: inc(x) & inc(y) → inc(x+y)
  FileWriter: write(str1) & write(str2) → write(str)
  FileReader: get(path1) & get(path2) → get(path1)
  ImageViewer: repaint() & repaint() → repaint()
Sep/05/2001 PaCT 2001 29
Experiment
Implemented exclusive methods in 4 ways
  Spin locks
  Mutex locks provided by the OS
  Our locks (without fusion rules)
  Our locks (with fusion rules)
Sep/05/2001 PaCT 2001 30
FileReader
[Figure: execution time (msec, axis 0 to 12000) vs. number of processors (axis 0 to 60); series: spin, mutex, no fusion, fusion]
Sep/05/2001 PaCT 2001 31
ImageViewer
[Figure: execution time (msec, axis 0 to 2000) vs. number of processors (axis 0 to 15); series: spin, mutex, no fusion, fusion]
Sep/05/2001 PaCT 2001 32
FileWriter
[Figure: execution time (msec, axis 0 to 6000) vs. number of processors (axis 0 to 60); series: spin, mutex, no fusion, fusion]
Sep/05/2001 PaCT 2001 33
Counter
[Figure: execution time (msec, axis 0 to 1200) vs. number of processors (axis 0 to 60); series: spin, mutex, no fusion, fusion]
Sep/05/2001 PaCT 2001 34
Related Work (1/2)
Concurrent execution of associative exclusive operations
  Reduction, Concurrent Aggregates (CA) [Chien 91], Adaptive Replication [Rinard et al. 99], Program fusion [Hu et al. 99]
  Partial results are stored in multiple data areas
    Ours: stored in method arguments
  Timing to summarize partial data is obvious
    Ours: obviousness not assumed; summarize immediately
Sep/05/2001 PaCT 2001 35
Related Work (2/2)
Network combining [Gottlieb et al. 83]
  Fusing machine instructions
  Ours: fusing method invocations
Composite functions [Bird 89], [Fisher et al. 94]
  (map f) ・ (map g) = map (f ・ g)
  Static fusion of statically detected consecutive ops
  Ours: dynamic fusion of dynamically detected consecutive ops
  Runtime overhead ⇔ applicability of fusion
Sep/05/2001 PaCT 2001 36
Future Work
Analysis of transparency
  fusion inc(x) & inc(y) { inc(x - y); }
  Using (v + x) + y ≠ v + (x - y)?
Inheritance
  What if a fused method is overridden?
Parallel fusions on a single object
Sep/05/2001 PaCT 2001 37
Concluding Remarks
First proposal of a new optimization for parallel languages
Various studies related to method fusion
  Language design and implementation scheme
  Many sample programs (see our paper)
  Experiments