![Page 1: 1 On-line Parallel Tomography Shava Smallen UCSD](https://reader035.vdocuments.site/reader035/viewer/2022062519/5697bfc51a28abf838ca69a3/html5/thumbnails/1.jpg)
1
On-line Parallel Tomography
Shava Smallen
UCSD
![Page 2: 1 On-line Parallel Tomography Shava Smallen UCSD](https://reader035.vdocuments.site/reader035/viewer/2022062519/5697bfc51a28abf838ca69a3/html5/thumbnails/2.jpg)
2
Talk Outline
I) Introduction to On-line Parallel Tomography
II) Tunable On-line Parallel Tomography
III) User-directed application-level scheduler
IV) Experiments
V) Conclusion
![Page 3: 1 On-line Parallel Tomography Shava Smallen UCSD](https://reader035.vdocuments.site/reader035/viewer/2022062519/5697bfc51a28abf838ca69a3/html5/thumbnails/3.jpg)
3
What is tomography?
• A method for reconstructing the interior of an object from its projections
• At the National Center for Microscopy and Imaging Research (NCMIR), tomography is applied to electron microscopy to study specimens at the cellular and subcellular level
![Page 4: 1 On-line Parallel Tomography Shava Smallen UCSD](https://reader035.vdocuments.site/reader035/viewer/2022062519/5697bfc51a28abf838ca69a3/html5/thumbnails/4.jpg)
4
Example
Tomogram of spiny dendrite (images courtesy of Steve Lamont)
![Page 5: 1 On-line Parallel Tomography Shava Smallen UCSD](https://reader035.vdocuments.site/reader035/viewer/2022062519/5697bfc51a28abf838ca69a3/html5/thumbnails/5.jpg)
5
Parallel Tomography at NCMIR
• Embarrassingly parallel
[Figure: specimen with X, Y, and Z axes; labels: projection, scanline, slice]
![Page 6: 1 On-line Parallel Tomography Shava Smallen UCSD](https://reader035.vdocuments.site/reader035/viewer/2022062519/5697bfc51a28abf838ca69a3/html5/thumbnails/6.jpg)
6
NCMIR Usage Scenarios
Off-line parallel tomography (off-line PT)
– Data resides somewhere on secondary storage
– Single, high quality tomogram
– Reduce turnaround time
– Previous work (HCW’00)
On-line parallel tomography (on-line PT)
– Data streamed from the electron microscope
• long makespan, configuration errors, etc.
– Iteratively computed tomogram
– Soft real-time execution
![Page 7: 1 On-line Parallel Tomography Shava Smallen UCSD](https://reader035.vdocuments.site/reader035/viewer/2022062519/5697bfc51a28abf838ca69a3/html5/thumbnails/7.jpg)
7
On-line PT
• Real-time feedback on quality of data acquisition
1) First projection acquired from microscope
2) Generate coarse tomogram
3) Iteratively refine tomogram using subsequent projections (refresh)
• Update each voxel value
• Size of tomogram is constant
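The refresh loop above can be sketched as follows. The slides do not name the reconstruction method, so this is only an illustration that assumes a plain additive backprojection of one scanline into one square slice with nearest-neighbor sampling. The function name `backproject_scanline`, its parameters, and the sample data are all hypothetical.

```python
import math

def backproject_scanline(slice_xz, scanline, angle_deg):
    """Add one scanline's contribution to every voxel of a square slice, in place.

    Assumption: simple additive backprojection, not necessarily the method
    used by the NCMIR code.
    """
    n = len(slice_xz)
    c = (n - 1) / 2.0  # rotation center of the slice
    cos_t = math.cos(math.radians(angle_deg))
    sin_t = math.sin(math.radians(angle_deg))
    for z in range(n):
        for x in range(n):
            # Detector position hit by the ray through voxel (x, z).
            t = (x - c) * cos_t + (z - c) * sin_t + c
            i = int(round(t))
            if 0 <= i < len(scanline):
                slice_xz[z][x] += scanline[i]
    return slice_xz

# The first projection gives a coarse slice; later projections refine it.
n = 3
tomogram_slice = [[0.0] * n for _ in range(n)]
projections = [(0.0, [0.0, 1.0, 0.0]), (90.0, [0.0, 1.0, 0.0])]  # made-up data
for angle, scanline in projections:
    backproject_scanline(tomogram_slice, scanline, angle)
```

Each refresh touches every voxel, while the slice keeps its dimensions, which matches the two bullets on the slide.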
![Page 8: 1 On-line Parallel Tomography Shava Smallen UCSD](https://reader035.vdocuments.site/reader035/viewer/2022062519/5697bfc51a28abf838ca69a3/html5/thumbnails/8.jpg)
8
NCMIR Target Platform
• Multi-user, heterogeneous resources
– NCMIR cluster
• SGI Indigo2, SGI Octane, SUN ULTRA, SUN Enterprise
• IRIX, Solaris
– Meteor cluster
• Pentium III dual proc
• Linux, PBS
– Blue Horizon
• AIX, LoadLeveler, Maui Scheduler
[Figure: resources connected by a network]
![Page 9: 1 On-line Parallel Tomography Shava Smallen UCSD](https://reader035.vdocuments.site/reader035/viewer/2022062519/5697bfc51a28abf838ca69a3/html5/thumbnails/9.jpg)
9
On-line PT Architecture
[Figure: architecture diagram with a preprocessor, five ptomo processes, and a writer; data labels: projection, scanlines, slices, tomogram]
![Page 10: 1 On-line Parallel Tomography Shava Smallen UCSD](https://reader035.vdocuments.site/reader035/viewer/2022062519/5697bfc51a28abf838ca69a3/html5/thumbnails/10.jpg)
10
On-line PT Design
1) Frame on-line parallel tomography as a tunable application
– Resource limitations / dynamic
– Availability of alternate configurations [Chang et al.]
• each configuration corresponds to different output quality and resource usage
2) Coupled with user-directed application-level scheduler (AppLeS)
– adaptive scheduler
– promote application performance
![Page 11: 1 On-line Parallel Tomography Shava Smallen UCSD](https://reader035.vdocuments.site/reader035/viewer/2022062519/5697bfc51a28abf838ca69a3/html5/thumbnails/11.jpg)
11
On-line PT Configuration
• Triple: (f, r, su)
• Reduction factor (f)
– Reduce resolution of data → reduce both computation and communication
• Projections per refresh (r)
– Reduce refinement frequency → reduce communication
• Service units (su)
– Increase cost of execution → increase computational power
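A minimal sketch of how the triple might be represented and how f and r shrink the workload. It assumes that the reduction factor divides each projection axis by f and that one refresh covers r projections. The class and function names, the 1024x1024 projection size, and the count of 61 projections are illustrative assumptions, not values from the slides.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Config:
    f: int   # reduction factor
    r: int   # projections per refresh
    su: int  # service units

def reduced_projection_size(width, height, cfg):
    """Projection dimensions after dividing each axis by f (assumed behavior)."""
    return width // cfg.f, height // cfg.f

def refreshes(num_projections, cfg):
    """Number of tomogram refreshes when one refresh covers r projections."""
    return math.ceil(num_projections / cfg.r)

cfg = Config(f=2, r=3, su=0)
size = reduced_projection_size(1024, 1024, cfg)  # hypothetical input size
count = refreshes(61, cfg)                       # hypothetical projection count
```

Under these assumptions a larger f cuts the pixels per projection by f squared, and a larger r cuts the number of refreshes sent to the writer.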
![Page 12: 1 On-line Parallel Tomography Shava Smallen UCSD](https://reader035.vdocuments.site/reader035/viewer/2022062519/5697bfc51a28abf838ca69a3/html5/thumbnails/12.jpg)
12
User Preferences
• Best configuration (f, r, su) = (1, 1, 0)
• Several possible configurations → user specifies bounds
– projections should be at least size 256x256
• 1 ≤ f ≤ 4 or 1 ≤ f ≤ 8
– user could tolerate up to a 10 minute wait
• 1 ≤ r ≤ 13
– reasonable upper bound
• 0 ≤ su ≤ (50 × acquisition period × c)
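The bounds above can be reproduced with a little arithmetic. The sketch below assumes projections of 1024 or 2048 pixels per side and a 45-second acquisition period. Neither value appears on the slide; they are chosen here only because they yield the stated limits on f and r. The function names are hypothetical.

```python
def f_upper_bound(projection_dim, min_dim=256):
    """Largest reduction factor that keeps each projection side >= min_dim."""
    return projection_dim // min_dim

def r_upper_bound(max_wait_s, acquisition_period_s):
    """Most projections that can be acquired within the tolerated wait."""
    return int(max_wait_s // acquisition_period_s)

f_max_1k = f_upper_bound(1024)        # assumed 1024x1024 projections
f_max_2k = f_upper_bound(2048)        # assumed 2048x2048 projections
r_max = r_upper_bound(10 * 60, 45)    # assumed 45-second acquisition period
```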
![Page 13: 1 On-line Parallel Tomography Shava Smallen UCSD](https://reader035.vdocuments.site/reader035/viewer/2022062519/5697bfc51a28abf838ca69a3/html5/thumbnails/13.jpg)
13
User-directed
• Feasible?
– Use dynamic load information
– if work allocation found
• Better?
– e.g.
1. (1, 6, 4) - best f
2. (2, 2, 8) - good su/r
3. (2, 1, 20) - best r
[Figure: plot with axes reduction factor, projections per refresh, and service units]
![Page 14: 1 On-line Parallel Tomography Shava Smallen UCSD](https://reader035.vdocuments.site/reader035/viewer/2022062519/5697bfc51a28abf838ca69a3/html5/thumbnails/14.jpg)
14
User-directed AppLeS
[Figure: flowchart with two lanes, User and User-directed AppLeS. Steps: generate request, process request, find work allocation, display triples, review triples, adjust request, execute on-line PT. Edge labels: feasible, infeasible, accepts one, rejects all]
![Page 15: 1 On-line Parallel Tomography Shava Smallen UCSD](https://reader035.vdocuments.site/reader035/viewer/2022062519/5697bfc51a28abf838ca69a3/html5/thumbnails/15.jpg)
15
Triple Search
• Search parameter space
– If triple satisfies constraints → feasible
• Constrained optimization problem based on soft real-time execution
– compute constraint
– transfer constraint
• Heuristics to reduce search space
– e.g. assume user will always choose (1,2,1) over (1,2,4)
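One way to read the pruning heuristic is as a dominance filter: a triple is dropped when another feasible triple is at least as good in every parameter. The sketch below assumes lower values are better for f, r, and su, which is consistent with (1,2,1) being preferred over (1,2,4). The function names and the candidate list are hypothetical, and the actual scheduler may prune differently.

```python
def dominates(a, b):
    """True if triple a is no worse than b in f, r, and su, and differs from b.

    Assumption: lower is better for all three parameters.
    """
    return all(x <= y for x, y in zip(a, b)) and a != b

def prune(triples):
    """Keep only triples that no other candidate dominates."""
    return [t for t in triples if not any(dominates(o, t) for o in triples)]

candidates = [(1, 2, 1), (1, 2, 4), (2, 1, 4)]  # made-up feasible triples
kept = prune(candidates)
```

Here (1, 2, 4) is removed because (1, 2, 1) offers the same f and r at lower cost, while (2, 1, 4) survives because it trades resolution for a faster refresh.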
![Page 16: 1 On-line Parallel Tomography Shava Smallen UCSD](https://reader035.vdocuments.site/reader035/viewer/2022062519/5697bfc51a28abf838ca69a3/html5/thumbnails/16.jpg)
16
Work Allocation
[Figure: diagram of the work allocation problem. Labels: work allocation, cost, user constraints, compute constraints, transfer constraints, cpu availability, processor availability, ptomo-to-writer bandwidth, subnet-to-writer bandwidth]
• Multiple mixed-integer programs → approximate solution
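As a much simpler stand-in for the mixed-integer programs, the sketch below splits slices across hosts in proportion to their available CPU fraction, using largest-remainder rounding so the counts sum exactly. It ignores the bandwidth, cost, and user constraints named on the slide, and the host names and availability values are made up.

```python
def allocate_slices(num_slices, cpu_availability):
    """Split slices across hosts in proportion to available CPU fraction.

    Illustrative only: not the scheduler's mixed-integer formulation.
    """
    total = sum(cpu_availability.values())
    if total <= 0:
        raise ValueError("no CPU capacity available")
    shares = {h: num_slices * a / total for h, a in cpu_availability.items()}
    alloc = {h: int(s) for h, s in shares.items()}
    leftover = num_slices - sum(alloc.values())
    # Hand the remaining slices to the hosts with the largest fractional parts.
    by_remainder = sorted(shares, key=lambda h: shares[h] - alloc[h], reverse=True)
    for h in by_remainder[:leftover]:
        alloc[h] += 1
    return alloc

alloc = allocate_slices(10, {"hostA": 1.0, "hostB": 0.5, "hostC": 0.5})
```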
![Page 17: 1 On-line Parallel Tomography Shava Smallen UCSD](https://reader035.vdocuments.site/reader035/viewer/2022062519/5697bfc51a28abf838ca69a3/html5/thumbnails/17.jpg)
17
Experiments
• Impact of dynamic information on scheduler performance
• Usefulness of tunability in Grid environments
• Scheduling latency
![Page 18: 1 On-line Parallel Tomography Shava Smallen UCSD](https://reader035.vdocuments.site/reader035/viewer/2022062519/5697bfc51a28abf838ca69a3/html5/thumbnails/18.jpg)
18
Dynamic Information
• We fix the triple and let schedulers determine work allocation

| | Infinite bandwidth | Dynamic bandwidth |
|---|---|---|
| Dedicated cpu | wwa | wwa+bw |
| Dynamic cpu | wwa+cpu | AppLeS |
![Page 19: 1 On-line Parallel Tomography Shava Smallen UCSD](https://reader035.vdocuments.site/reader035/viewer/2022062519/5697bfc51a28abf838ca69a3/html5/thumbnails/19.jpg)
19
Simulation
• Evaluate schedulers
– Repeatability
– Long makespan
– several resource environments
• Simgrid (Casanova [CCGrid’2001])
– API for evaluating scheduling algorithms
• tasks
• resources modeled using traces
– E.g. Parameter sweep applications [HCW’00]
• Simtomo
![Page 20: 1 On-line Parallel Tomography Shava Smallen UCSD](https://reader035.vdocuments.site/reader035/viewer/2022062519/5697bfc51a28abf838ca69a3/html5/thumbnails/20.jpg)
20
Performance Metric
• Relative refresh lateness
[Figure: diagram relating the expected refresh period, the actual refresh period, and the relative refresh lateness]
![Page 21: 1 On-line Parallel Tomography Shava Smallen UCSD](https://reader035.vdocuments.site/reader035/viewer/2022062519/5697bfc51a28abf838ca69a3/html5/thumbnails/21.jpg)
21
NCMIR experiments
• Traces (8 machines)
– 8 hour work day on March 8th, 2001 (8:00 am to 4:00 pm)
• Ran simulations throughout day at 10 minute intervals
![Page 22: 1 On-line Parallel Tomography Shava Smallen UCSD](https://reader035.vdocuments.site/reader035/viewer/2022062519/5697bfc51a28abf838ca69a3/html5/thumbnails/22.jpg)
22
Perfect Load Predictions
[Figure: mean relative refresh lateness (log scale, 10^0 to 10^4) versus hours since 3/8/2001 8:00 PST (0 to 8) for the wwa, wwa+cpu, wwa+bw, and AppLeS schedulers]
![Page 23: 1 On-line Parallel Tomography Shava Smallen UCSD](https://reader035.vdocuments.site/reader035/viewer/2022062519/5697bfc51a28abf838ca69a3/html5/thumbnails/23.jpg)
23
Imperfect Load Predictions
[Figure: mean relative refresh lateness (log scale, 10^0 to 10^4) versus hours since 3/8/2001 8:00 PST (0 to 8) for the wwa, wwa+cpu, wwa+bw, and AppLeS schedulers]
![Page 24: 1 On-line Parallel Tomography Shava Smallen UCSD](https://reader035.vdocuments.site/reader035/viewer/2022062519/5697bfc51a28abf838ca69a3/html5/thumbnails/24.jpg)
24
Synthetic Grids
• Bandwidth predictability
– Average prediction error
– pi ∈ {L, M, H}
– p1 p2 p3
• e.g. LMH
– 27 types
– 2510 Grids x 4 schedulers
– 10,040 simulations
[Figure: topology with cluster1, cluster2, cluster3, and a writer; links labeled p1, p2, and p3]
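The counts on this slide follow from simple enumeration: three links, each with one of three predictability levels, give 3^3 = 27 Grid types, and 2510 Grids run under 4 schedulers give 10,040 simulations. A short check, with variable names chosen here for illustration:

```python
from itertools import product

LEVELS = "LMH"  # low, medium, high predictability

# Every assignment of a level to (p1, p2, p3), written as a string such as "LMH".
grid_types = ["".join(levels) for levels in product(LEVELS, repeat=3)]

num_simulations = 2510 * 4  # Grids x schedulers, as stated on the slide
```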
![Page 25: 1 On-line Parallel Tomography Shava Smallen UCSD](https://reader035.vdocuments.site/reader035/viewer/2022062519/5697bfc51a28abf838ca69a3/html5/thumbnails/25.jpg)
25
Relative Scheduler Performance
[Figure: bar chart of the number of runs (0 to 3000) in which each scheduler (wwa, wwa+cpu, wwa+bw, AppLeS) ranked 1st, 2nd, 3rd, or 4th; annotated values: 705.89, 658.91, 127.10, 1.07]
![Page 26: 1 On-line Parallel Tomography Shava Smallen UCSD](https://reader035.vdocuments.site/reader035/viewer/2022062519/5697bfc51a28abf838ca69a3/html5/thumbnails/26.jpg)
26
Partial Ordering
• Performance vs. bandwidth predictability
• Grid predictability
– Partial orders using p1 p2 p3
– Comparable/Not Comparable
• e.g. HML is comparable to HLL
• e.g. HLM is not comparable to LHM
• HHH, HHM, HMM, HLM, MLM, LLM, LLL
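The comparability examples are consistent with a componentwise ordering in which L < M < H on each link and two Grid types are comparable only if one is no better than the other on every link. The sketch below assumes that reading; the helper names are hypothetical.

```python
RANK = {"L": 0, "M": 1, "H": 2}

def leq(a, b):
    """True if Grid type a is no higher than b on every link (assumed ordering)."""
    return all(RANK[x] <= RANK[y] for x, y in zip(a, b))

def comparable(a, b):
    """Two Grid types are comparable if one is componentwise <= the other."""
    return leq(a, b) or leq(b, a)

# The sequence listed on the slide, checked as a descending chain.
chain = ["HHH", "HHM", "HMM", "HLM", "MLM", "LLM", "LLL"]
is_chain = all(leq(lower, higher) for higher, lower in zip(chain, chain[1:]))
```

Under this ordering, `comparable("HML", "HLL")` holds and `comparable("HLM", "LHM")` does not, matching the two examples.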
![Page 27: 1 On-line Parallel Tomography Shava Smallen UCSD](https://reader035.vdocuments.site/reader035/viewer/2022062519/5697bfc51a28abf838ca69a3/html5/thumbnails/27.jpg)
27
Example Partial Order
[Figure: relative refresh lateness in seconds (log scale, 10^0 to 10^4) for Grid types HHH, HHM, HMM, HLM, MLM, LLM, and LLL, for the wwa, wwa+cpu, wwa+bw, and AppLeS schedulers]
![Page 28: 1 On-line Parallel Tomography Shava Smallen UCSD](https://reader035.vdocuments.site/reader035/viewer/2022062519/5697bfc51a28abf838ca69a3/html5/thumbnails/28.jpg)
28
Tunability Experiments
• How useful is tunability?
– variability
• Fixed topology
– categorized traces
• L, M, H
– v1 v2 v3 v4 v5
– 243 Grid types
[Figure: topology with cluster1, cluster2, a supercomputer, and a writer; labels v1 through v5]
![Page 29: 1 On-line Parallel Tomography Shava Smallen UCSD](https://reader035.vdocuments.site/reader035/viewer/2022062519/5697bfc51a28abf838ca69a3/html5/thumbnails/29.jpg)
29
Tunability Experiments
• Run over a 2 day period
– back-to-back
– assume single user model
• f, r, su
• Set of triples chosen
– T = {1,…,61}
[Figure: 3D plot with axes f, r, and su; one axis scaled by 10^4]
![Page 30: 1 On-line Parallel Tomography Shava Smallen UCSD](https://reader035.vdocuments.site/reader035/viewer/2022062519/5697bfc51a28abf838ca69a3/html5/thumbnails/30.jpg)
30
Tunability Results
• Count how many times a triple changed per 2-day simulation
• e.g.
– 12.9%
– 25.7%
[Figure: bar chart of fraction of changes (0 to 1) for the parameters f, r, and su]
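The counting step can be sketched as below, assuming that a "change" means a parameter differs between two consecutive back-to-back runs and that the fraction is taken over all consecutive pairs. The function name and the sample history are hypothetical, not data from the experiments.

```python
def change_fractions(triples):
    """Fraction of consecutive runs in which each parameter changed."""
    names = ("f", "r", "su")
    pairs = list(zip(triples, triples[1:]))
    if not pairs:
        return {name: 0.0 for name in names}
    return {
        name: sum(prev[i] != cur[i] for prev, cur in pairs) / len(pairs)
        for i, name in enumerate(names)
    }

# Made-up sequence of chosen triples (f, r, su) over successive runs.
history = [(1, 2, 0), (1, 2, 0), (1, 3, 0), (2, 3, 0), (2, 3, 0)]
fractions = change_fractions(history)
```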
![Page 31: 1 On-line Parallel Tomography Shava Smallen UCSD](https://reader035.vdocuments.site/reader035/viewer/2022062519/5697bfc51a28abf838ca69a3/html5/thumbnails/31.jpg)
31
Scheduling Latency
• Time to search for feasible triples
• e.g.
– 88% under 1 sec
– 63% under 1 sec
[Figure: histogram of number of experiments (0 to 7000) versus seconds (0 to 10)]
![Page 32: 1 On-line Parallel Tomography Shava Smallen UCSD](https://reader035.vdocuments.site/reader035/viewer/2022062519/5697bfc51a28abf838ca69a3/html5/thumbnails/32.jpg)
32
Conclusions and Future Work
• Grid-enabled version of on-line parallel tomography
– Tunable application
• Tunability is useful in Grid environments
– User-directed AppLeS
• Importance of bandwidth predictability
– e.g. rescheduling
• Scheduling latency is nominal
• Production use