A Statistical Scheduling Technique for a Computational Market Economy
Neal Sample, Stanford University
2 UCSC 2002
Research Interests
- Compositional Computing (GRID)
  - Reliability and Quality of Service
  - Value-based and model-based mediation
  - Languages: “Programming for the non-programmer expert”
- Database Research
  - Semistructured indexing and storage
  - Massive table/stream compression
  - Approximate algorithms for streaming data

Why We’re Here
[Figure: timeline marked 1970, 1990, 2010, contrasting Coding with Integration/Composition]
GRID: Commodity Computing
- Application classes: On Demand, High Throughput, Collaborative, Distributed Supercomputing, Data Intensive
- Examples shown: Large Hadron Collider (Data Intensive); computer-in-the-loop; FightAIDSAtHome, Nug30; chip design, cryptography; data exploration, education

Composition of Large Services
- Remote, autonomous
- Services are not free
  - Fee ($), execution time
- 2nd order dependencies
- “Open Service Model”
  - Principles: GRID, CHAIMS
  - Protocols: UDDI, IETF SLP
  - Runtime: Globus, CPAM

Grid Life is Tough
- Increased complexity throughout
- New tools and applications
  - Diverse resources such as computers, storage media, networks, sensors
- Programming
  - Control flow & data flow separation
  - Service mediation
- Infrastructure
  - Resource discovery, brokering, monitoring
  - Security/authorization
  - Payment mechanisms

Our GRID Contributions
- Programming models and tools
- System architecture
- Resource management
- Instrumentation and performance analysis
- Network protocols and infrastructure
- Service mediation

Other GRID Research Areas
- The nature of applications
- Algorithms and problem solving methods
- Security, payment/escrow, reputation
- End systems
- Programming models and tools
- System architecture
- Resource management
- Instrumentation and performance analysis
- Network protocols and infrastructure
- Service mediation

Roadmap
- Brief introduction to CLAM language
- Some related scheduling methods
- Surety-based scheduling
  - Sample program
  - Monitoring
  - Rescheduling
- Results
- A few future directions

CLAM Composition Language
- Decomposition of CALL-statement
  - Parallelism by asynchrony in sequential program
  - Reduction of complexity of invoke statements
  - Control of new GRID requirements (estimation, trading, brokering, etc.)
- Abstract out data flow
  - Mediation for data flow control and optimization
  - Extraction model mediation
- Purely compositional
  - No primitives for arithmetic
  - No primitives for input/output
- Targets the “non-programmer expert”

CLAM Primitives
- Pre-invocation:
  - SETUP: set up the connection to a service
  - SET-, GETPARAM: in a service
  - ESTIMATE: service cost estimation
- Invocation and result gathering:
  - INVOKE
  - EXAMINE: test progress of an invoked method
  - EXTRACT: extract results from an invoked method
- Termination:
  - TERMINATE: terminate a method invocation/connection to a service
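
The order in which the primitives are typically used can be illustrated with a small Python mock. This is only a sketch: `MockService`, its method names, and the fee and step counts are hypothetical stand-ins invented for illustration, not CLAM syntax or a CHAIMS API.

```python
class MockService:
    """Hypothetical in-process stand-in for a remote, autonomous service."""

    def __init__(self, fee, steps):
        self.fee = fee          # assumed flat fee in dollars
        self.steps = steps      # number of polls until the mock finishes
        self.done = 0
        self.params = {}
        self.connected = False

    def setup(self):                    # SETUP: open the connection
        self.connected = True

    def setparam(self, key, value):     # SETPARAM: set a parameter in the service
        self.params[key] = value

    def getparam(self, key):            # GETPARAM: read a parameter back
        return self.params.get(key)

    def estimate(self):                 # ESTIMATE: ask for a cost estimate
        return {"fee": self.fee, "steps": self.steps}

    def invoke(self):                   # INVOKE: start the method (asynchronous in spirit)
        self.done = 0

    def examine(self):                  # EXAMINE: advance the mock one step, report progress
        self.done = min(self.steps, self.done + 1)
        return self.done / self.steps

    def extract(self):                  # EXTRACT: fetch results once finished
        if self.done < self.steps:
            raise RuntimeError("results not ready")
        return sum(range(self.params.get("n", 0)))

    def terminate(self):                # TERMINATE: close the invocation/connection
        self.connected = False


def run(service, budget):
    """Walk one service through pre-invocation, invocation, and termination."""
    service.setup()
    service.setparam("n", 5)
    if service.estimate()["fee"] > budget:
        service.terminate()             # too expensive: give up before invoking
        return None
    service.invoke()
    while service.examine() < 1.0:      # poll progress instead of blocking
        pass
    result = service.extract()
    service.terminate()
    return result


result = run(MockService(fee=30, steps=3), budget=50)
```

Separating INVOKE from EXAMINE and EXTRACT is what lets a sequential program start several services and collect their results later.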

Resources + Scheduling
- Computational Model
  - Multithreading
  - Automatic parallelization
- Resource Management
  - Process creation
  - OS signal delivery
  - OS scheduling
[Figure: scope shown: end system]

Resources + Scheduling
- Computational Model
  - Synchronous communication
  - Distributed shared memory
- Resource Management
  - Parallel process creation
  - Gang scheduling
  - OS-level signal propagation
[Figure: scopes shown: end system, cluster]

Resources + Scheduling
- Computational Model
  - Client/server
  - Loosely synchronous: pipelines
  - IWIM
- Resource Management
  - Resource discovery
  - Signal distribution networks
[Figure: scopes shown: end system, cluster, intranet]

Resources + Scheduling
- Computational Model
  - Collaborative systems
  - Remote control
  - Data mining
- Resource Management
  - Brokers
  - Trading
  - Mobile code negotiation
[Figure: scopes shown: end system, cluster, intranet, Internet]

Scheduling Difficulties
- Adaptation: Repair and Reschedule
  - Schedules for t0 are only guesses
  - Estimates for multiple stages may become invalid
  - => Schedules must be revised during runtime
[Figure: TIME axis from t0 to tfinish with phases labeled schedule, work, hazard, reschedule, work, work]

Scheduling Difficulties
- Service Autonomy: No Resource Allocation
  - The scheduler does not handle resource allocation
  - Users observe resources without control
- Means:
  - Competing objectives have orthogonal scheduling techniques
  - Changing goals for tasks or users means vastly increased scheduling complexity

Some Related Work
Legend: R = Rescheduling, A = Autonomy of Services, M = Monitoring Execution, Q = QoS, probabilistic execution
- PERT: Q, A, M
- CPM: M, R, A
- ePERT (AT&T), Condor (Wisconsin): M, R, Q
- Mariposa (UCB): R, Q, A
- SBS (Stanford): R, Q, A, M

Sample Program
[Figure: sample program graph with services A, B, C, D]

Budgeting
- Time
  - Maximum allowable execution time
- Expense
  - Funding available to lease services
- Surety
  - Goal: schedule probability of success
  - Assessment technique

Program Schedule as a Template
- Instantiated at runtime
- Service provider selection, etc.
[Figure: template graph of services A, B, C, D, with several candidate instances shown for each service]

t0 Schedule Selection
- Guided by runtime “bids”
- Constrained by budget
[Figure: template graph of services A, B, C, D with candidate instances; example bids shown: 7±2h for $50, 6±1h for $40, 5±2h for $30, 3±1h for $30]

t0 Schedule Constraints
- Budget
  - Time: upper bound, e.g. 22h
  - Cost: upper bound, e.g. $250
  - Surety: lower bound, e.g. 90%
  - {Time, Cost, Surety} = {22, 250, 90}
- Steered by user preferences/weights
  - <Time, Cost, Surety> = <10, 1, 5>
- Selection
  - S1est [20, 150, 90] = (22-20)*10 + (250-150)*1 + (90-90)*5 = 120
  - S2est [22, 175, 95] = (22-22)*10 + (250-175)*1 + (95-90)*5 = 100
  - S3est [18, 190, 96] = (22-18)*10 + (250-190)*1 + (96-90)*5 = 130
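
The selection arithmetic above can be reproduced with a few lines of Python. This is a sketch only: the function name `score`, the tuple layout, and the decision to return `None` for a schedule that violates the budget are assumptions made for illustration; the numbers are the ones on the slide.

```python
def score(estimate, budget, weights):
    """Weighted slack of one candidate schedule against the budget.

    estimate, budget and weights are (time_h, cost_usd, surety_pct) tuples.
    Returns None when the candidate breaks any budget bound (an assumption;
    the slide only shows candidates that fit).
    """
    time_est, cost_est, surety_est = estimate
    time_max, cost_max, surety_min = budget
    w_time, w_cost, w_surety = weights
    if time_est > time_max or cost_est > cost_max or surety_est < surety_min:
        return None
    return ((time_max - time_est) * w_time
            + (cost_max - cost_est) * w_cost
            + (surety_est - surety_min) * w_surety)


budget = (22, 250, 90)     # {Time, Cost, Surety}
weights = (10, 1, 5)       # <Time, Cost, Surety>
candidates = {
    "S1": (20, 150, 90),
    "S2": (22, 175, 95),
    "S3": (18, 190, 96),
}

scores = {name: score(est, budget, weights) for name, est in candidates.items()}
feasible = {name: s for name, s in scores.items() if s is not None}
best = max(feasible, key=feasible.get)   # highest weighted slack wins
```

With these weights, S3 scores highest (130): its four hours of time slack outweigh its higher cost.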

Search Space
[Figure: candidate plans plotted by Expected Program Execution Time (x-axis) against Expected Program Cost (y-axis), both starting at 0; labels: budget time, budget cost, Budget, User Pref., Pareto, Plans]
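
Program Evaluation and Review Technique
Service times: most likely (m), optimistic (a) and pessimistic (b)
(1) expected duration (service): $e_i = \frac{2m_i + \frac{a_i + b_i}{2}}{3}$
(2) standard deviation: $\sigma_i = \frac{b_i - a_i}{6}$
(3) expected duration (program): $e_{program} = \sum_i e_i$ and $\sigma_{program} = \sqrt{\sum_i \sigma_i^2}$
(4) test value: $x = \frac{t - e_{program}}{\sigma_{program}}$
(5) expectation test: $prob(T \le t) = prob\left(\frac{T - e_{program}}{\sigma_{program}} \le x\right)$
(6) ~expectation test: $\frac{T - e_{program}}{\sigma_{program}} \sim N(0, 1)$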
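
A minimal Python sketch of the PERT calculation follows. It uses the standard three-point form (a + 4m + b)/6 for the expected duration and (b - a)/6 for the standard deviation, and it assumes the services run one after another so that means and variances add. The function names and the sample three-point estimates are hypothetical.

```python
import math


def pert_estimate(a, m, b):
    """Expected duration and standard deviation from optimistic (a),
    most likely (m) and pessimistic (b) service times."""
    expected = (a + 4 * m + b) / 6
    sigma = (b - a) / 6
    return expected, sigma


def program_surety(services, deadline):
    """Probability that a sequential chain of services finishes by `deadline`,
    treating the total duration as normally distributed."""
    estimates = [pert_estimate(a, m, b) for a, m, b in services]
    e_program = sum(e for e, _ in estimates)
    sigma_program = math.sqrt(sum(s ** 2 for _, s in estimates))
    x = (deadline - e_program) / sigma_program      # test value
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))   # standard normal CDF at x


# Hypothetical three-point estimates in hours: (a, m, b) per service.
services = [(4, 7, 10), (3, 5, 7)]
surety = program_surety(services, deadline=12)
```

A deadline equal to the expected program duration (12 hours here) gives a surety of 50%; later deadlines push it toward 100%.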

t0 Complete Schedule Properties
[Figure: probability density (y-axis 0 to 30) of Probable Program Completion Time (x-axis 13 to 23), annotated with the deadline, Bank = $100, and the user-specified surety]

Individual Service Properties
[Figure: probability density (0 to 1.2) of finish time (0 to 10) for services A, B, C; estimates shown: 7±2h, 6±1h, 5±2h]

t0 Combined Service Properties
[Figure: the individual service densities (finish time 0 to 10, probability density 0 to 1.2) combined into one density over probable finish time (14 to 23), annotated with Deadline (22h), Surety (90%), and Current Surety (99.6%)]
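
The 99.6% figure can be reproduced under two assumptions that the slides do not state explicitly: the three services (7±2h, 6±1h, 5±2h) run sequentially, and each ± margin spans two standard deviations. The sketch below combines the services as independent normal distributions; `combined_surety` and its parameter names are hypothetical.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def combined_surety(estimates, deadline, sigmas_per_margin=2.0):
    """Surety of a sequential chain of (mean_h, margin_h) service estimates.

    Assumes each margin equals `sigmas_per_margin` standard deviations and
    that service durations are independent, so means and variances add.
    """
    mean = sum(m for m, _ in estimates)
    variance = sum((margin / sigmas_per_margin) ** 2 for _, margin in estimates)
    return normal_cdf((deadline - mean) / math.sqrt(variance))


surety = combined_surety([(7, 2), (6, 1), (5, 2)], deadline=22)
surety_pct = round(surety * 100, 1)   # compare with the user's 90% minimum
```

The combined mean is 18h with a standard deviation of 1.5h, so the 22h deadline sits about 2.67 standard deviations out, well above the 90% requirement.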

Tracking Surety
[Figure: surety % (80 to 100) tracked over time alongside the probability density, with the user-specified surety marked at 90]

Runtime Hazards
- With control over resource allocation or without runtime hazards
  - Scheduling becomes much easier
- Runtime implies t0 schedule invalidation
- Sample hazards
  - Delays and slowdowns
  - Stoppages
  - Inaccurate estimations
  - Communication loss
  - Competitive displacement… OSM

Progressive Hazard
Definition + Detection
[Figure: surety % (80 to 100) vs execution time, marking serviceA start, serviceB start (serviceB slow), the minimum surety line at 90, and the hazard]

Catastrophic Hazard
Definition + Detection
[Figure: surety % (80 to 100) vs execution time, marking serviceA start, serviceB start (serviceB fails), the minimum surety line at 90, the hazard, and surety at 0%]

Pseudo-Hazard
Definition + Detection
[Figure: surety % (80 to 100) vs execution time, marking serviceA start, serviceB start (serviceB communication failure), the minimum surety line at 90, the pseudo-hazard, and surety at 0%]
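
Detection in all three cases comes down to watching the surety trace fall below the user's minimum. The sketch below is one way to express that; the labels, function names, and sample traces are hypothetical. It deliberately does not try to tell a catastrophic hazard from a pseudo-hazard, since from the monitor's side both appear as surety dropping to 0%.

```python
def classify(surety_pct, minimum_pct):
    """Label one surety observation relative to the minimum surety."""
    if surety_pct >= minimum_pct:
        return "ok"
    if surety_pct == 0:
        # A failed service and a lost connection look the same to an observer.
        return "catastrophic or pseudo"
    return "progressive"


def first_hazard(trace, minimum_pct=90.0):
    """Return (time, label) for the first observation below the minimum,
    or None if the trace never drops below it."""
    for time, surety_pct in trace:
        label = classify(surety_pct, minimum_pct)
        if label != "ok":
            return time, label
    return None


# Hypothetical (execution time, surety %) observations.
slow_service = [(0, 99.6), (10, 97.0), (20, 93.0), (30, 88.0)]
failed_service = [(0, 99.6), (10, 0.0)]

slow_result = first_hazard(slow_service)
failed_result = first_hazard(failed_service)
```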

Monitoring + Repair
- Observe, not control
- Complete set of repairs
  - Sufficient (not minimal)
- Simple cost model: early termination = linear cost recovery
- Greedy selection of single repair: O(s*r)
[Figure: program graph with services A, B, C, D]
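
A greedy single-repair choice over s services and r repair strategies can be sketched as one pass over the candidate list, which is where the O(s*r) bound comes from. Ranking candidates by surety gained per dollar is an assumption here (suggested by the “surety / repair cost” note on the Deeper Questions slide), and the candidate tuples, names, and figures are hypothetical.

```python
def linear_refund(fee, fraction_done):
    """Linear cost recovery: terminating early returns the unused share of the fee."""
    return fee * (1.0 - fraction_done)


def pick_repair(candidates, budget_left):
    """Greedy choice of one repair.

    candidates: (service, strategy, surety_gain_pct, cost_usd) tuples, one per
    service/strategy pair, so the loop runs at most s * r times.
    Returns (service, strategy), or None when nothing affordable helps.
    """
    best = None
    for service, strategy, gain, cost in candidates:
        if cost > budget_left or gain <= 0:
            continue                                  # unaffordable or useless
        ratio = gain / cost if cost > 0 else float("inf")
        if best is None or ratio > best[0]:
            best = (ratio, service, strategy)
    return None if best is None else (best[1], best[2])


candidates = [
    ("B", "replacement", 6.0, 30.0),
    ("B", "duplication", 9.0, 60.0),
    ("C", "pushdown", 4.0, 5.0),
]
choice = pick_repair(candidates, budget_left=50.0)
refund = linear_refund(fee=40.0, fraction_done=0.25)
```

Because only one repair is chosen per hazard, the result is sufficient rather than minimal or optimal; combining several repairs is left as an open question later in the talk.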

Schedule Repair
[Figure: surety % (80 to 100) vs execution time, marking thazard and trepair; program graph with services A, B, C, D]

Strategy 0: baseline (no repair)
- pro: no additional $ cost
- pro: ideal solution for partitioning hazards
- con: depends on self-recovery
[Figure: surety % (80 to 100) vs execution time, marking thazard and trepair; program graph with services A, B, C, D]

Strategy 1: service replacement
- pro: reduces $ lost
- con: lost investment of $ and time
- con: concedes recovery chance
[Figure: surety % (80 to 100) vs execution time, marking thazard and trepair; program graph with services A, B, C, D and replacement B’]

Strategy 2: service duplication
- pro: larger surety boost; leverages recovery chance
- con: large $ cost
[Figure: surety % (80 to 100) vs execution time, marking thazard and trepair; program graph with services A, B, C, D and duplicate B’]

Strategy 3: pushdown repair
- pro: cheap, no $ lost
- pro: no time lost
- con: cannot handle catastrophic hazards
- con: requires recovery chance
[Figure: surety % (80 to 100) vs execution time, marking thazard and trepair; program graph with services A, B, C, D, plus C’ and an x mark]

Experimental Results
- Rescheduling options
  - Baseline: no repairs
  - Single strategy repairs
    - Limits flexibility and effectiveness
  - Use all strategies
- Setup
  - 1000 random DAG schedules, 2-10 services
  - 1-3 hazards per execution
  - Fixed service availability
  - All schedules are repairable
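
Inputs of the shape described in the setup could be generated along the following lines. This is a sketch under stated assumptions, not the generator used in the experiments: the edge probability, the seed, and the function names are hypothetical, and acyclicity is obtained simply by only adding edges from a lower-numbered service to a higher-numbered one.

```python
import random


def random_dag(rng, min_services=2, max_services=10, edge_prob=0.3):
    """Random DAG over services 0..n-1; edges only go from i to j with i < j,
    so the graph cannot contain a cycle."""
    n = rng.randint(min_services, max_services)
    edges = [(i, j)
             for i in range(n)
             for j in range(i + 1, n)
             if rng.random() < edge_prob]
    return n, edges


def experiment_inputs(count=1000, seed=0):
    """Build `count` (services, edges, hazards) triples with 1-3 hazards each."""
    rng = random.Random(seed)           # seeded so runs are repeatable
    runs = []
    for _ in range(count):
        n, edges = random_dag(rng)
        hazards = rng.randint(1, 3)     # hazards injected per execution
        runs.append((n, edges, hazards))
    return runs


runs = experiment_inputs()
```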

“The Numbers”
What is the value of a close finish? ( late)
[Figure: bar chart of Schedules Finished On Time (y-axis 0 to 1000) by Repair Strategy (do nothing, push-down, replacement, duplication, all, IDEAL); series: On Time, On Time+stdev]

Why the Differences?
- Catastrophic hazard
  - Service provider failure
  - “do nothing”: no solution to hazard
- Pseudo-hazard
  - Communication failure, network partition
  - Looks exactly like catastrophic hazard
  - “do nothing”: the ideal solution
- Slowdown hazard
  - Not a complete failure, multiple solutions
  - “do nothing”: ideal or futile or acceptable

A Challenge
- Observations of progress are only secondary indicators of current work rate
[Figure: progress % (0 to 100) vs execution time (0 to 200), with two projected finish points and the actual finish time marked]
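
A small example shows why cumulative progress is only a secondary indicator. The sketch assumes the monitor projects the finish time by linear extrapolation from elapsed time and reported progress, which the slide does not specify; the observations are hypothetical.

```python
def projected_finish(elapsed, progress_pct):
    """Linear extrapolation: if progress_pct of the work took `elapsed`,
    assume the rest proceeds at the same average rate."""
    if progress_pct <= 0:
        return None                      # no basis for a projection yet
    return elapsed * 100.0 / progress_pct


# Hypothetical (execution time, progress %) observations: the service runs at
# 1% per time unit until t=40, then slows to 0.25% per time unit.
observations = [(20, 20), (40, 40), (60, 45), (80, 50)]
projections = [projected_finish(t, p) for t, p in observations]

# At the slowed rate, the remaining 50% after t=80 needs another 200 units.
finish_at_current_rate = 80 + (100 - 50) / 0.25
```

At t=80 the extrapolation from cumulative progress projects a finish at 160, while the current work rate implies 280: the average lags behind a slowdown, so the projection stays optimistic for some time after the hazard begins.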

Open Questions
- Simultaneous rescheduling
  - Use more than one strategy for a hazard
  - NP to find the optimal solution
  - NP here might not be that hard…
    - Approximations are acceptable
    - Small set
    - Strong constraints
    - NP is worst case, not average case? (e.g., DFBB search)
- Global impact of local schedule preferences
  - How do local preferences interact in/reshape the global market?

Open Questions
- Monitoring resolution adjustments
  - Networks are not free or zero latency
  - Account for the cost of monitoring
    - Frequent monitoring = more cost
    - Frequent monitoring = greater accuracy
  - Unstudied effect of delayed status information
- Accuracy of t0 service cost estimates
  - Model as a hazard with delayed detection
  - “1-way hazard”
  - Penalty adjustments

Deeper Questions
- User preferences only used in generating initial (t0) schedule
  - fixed least cost repair ( = surety / repair cost)
  - Best cost repair (success sensitive to preference?)
- Second order cost effects
  - $ left over in budget is purchasing power
  - What is the value of that purchasing power?
  - Sampling for cost estimates during runtime
  - surety = time + progress (+ budgetBalance/valuation)

Conclusions
- Novel statistical method for service scheduling
- Effective strategies for varied hazard mix
- Achieves per-user-defined Quality of Service
- Should translate well “out of the sandbox”
- Clear directions for continued research
More information
- http://www.db.stanford.edu/~nsample/
- http://www.db.stanford.edu/CHAIMS/
Steps in Scheduling
Estimation
Planning
Invocation
Monitoring
Completion
Rescheduling
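
CHAIMS Scheduler
[Figure: scheduler architecture. Components: Program Analyzer, Planner, Estimator/Bidder, Monitor, Dispatcher. Inputs: Input program, User Requirements (e.g., Budget). Flows labeled Requirements, Status, Costs/Times, Control. Interactions with services: observe, invoke, haggle]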

Simplified Cost Model
Completing the cost model
[Figure: timeline from start/run to finish against a target, with an on time window; cost terms marked +, including data transportation costs]

Full Cost Model
Completing the cost model
[Figure: timeline with client ready to start, reservation, hold fee, start/run, finish, and client ready for data, measured against a target with early, on time, and late outcomes; cost terms marked + or -, including data transportation costs]

The Eight Fallacies of Distributed Computing
-- Peter Deutsch
1. The network is reliable
2. Latency is zero
3. Bandwidth is infinite
4. The network is secure
5. Topology doesn't change
6. There is one administrator
7. Transport cost is zero
8. The network is homogeneous