April 18, 20231
University of Dortmund
On Grid Performance Evaluation using Synthetic Workloads
JSSPP 2006
Alexandru Iosup, Dick Epema (PDS Group, ST/EWI, TU Delft)
Carsten Franke, Alexander Papaspyrou, Lars Schley, Baiyi Song, and Ramin Yahyapour (UniDo)
Outline
• A Brief Introduction to Grid Computing
• On Grid Performance Evaluation
  • Experimental Environments
  • Performance Indicators
  • General Workload Modeling
  • Grid-Specific Workload Modeling
  • The GrenchMark Framework
• Future Work
• Conclusions
A Brief Introduction to Grid Computing
• Typical grid environment
  • Applications [!]
    • Unitary, composite
    • Data
  • Resources
    • Compute (Clusters)
    • Storage
    • (Dedicated) Network
  • Virtual Organizations, Projects
    • Groups, Users
• Grids vs. parallel production environments
  • Dynamic
  • Heterogeneous
  • Very large-scale (world)
  • No central administration
→ Most resource management problems are NP-hard
Experimental Environments: Real-World Testbeds
• Real-World Testbed
  • DAS, NorduGrid, Grid3/OSG, Grid’5000…
• Pros
  • True performance, also shows “it works!”
  • Infrastructure in place
• Cons
  • Time-intensive
  • Exclusive access (repeatability)
  • Controlled environment problem (limited scenarios)
  • Workload structure (little or no realistic data)
  • What to measure (new environment)
Experimental Environments: Simulated and Emulated Testbeds
• Simulated and Emulated Testbeds
  • GridSim, SimGrid, GangSim, MicroGrid …
  • Essentially a trade-off of precision vs. speed
• Pros
  • Exclusive access (repeatability)
  • Controlled environment (unlimited scenarios)
• Cons
  • Synthetic Grids: What to generate? How to generate? Clusters, Disks, Network, VOs, Groups, Users, Applications, etc.
  • Workload structure (little or no realistic data)
  • What to measure (new environment)
  • Validity of results (accuracy vs. time)
Grid Performance Evaluation: Current Practice
• Performance Indicators
  • Define my own metrics, or use U (utilization) and AWT/ART (average wait/response time), or both
• Workload Structure
  • Run my own workload, or use traces that are not validated by peer researchers; do not make comparisons!
  • Run benchmarks from typical parallel production environments
  • Mostly an “all users are created equal” assumption
Need a common performance evaluation framework for Grid
Grid Performance Evaluation: Current Issues
• Performance Indicators
  • What should be the metrics for the new environment?
• Workload Structure
  • Which general aspects could be important?
  • Which Grid-specific aspects need to be addressed?
Need a common performance evaluation framework for Grid
Performance Indicators
• Time-, Resource-, and System-Related Metrics
  • Traditional: utilization, A(W)RT, A(W)WT, A(W)SD
  • New: waste, fairness (or service quality reliability)
• Workload Completion and Failure Metrics
  “In Grids, functionality may be even more important than performance”
  • Workload Completion (WC)
  • Task and Enabled Task Completion (TC, ETC)
  • System Failure Factor (SFF)
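The traditional time-related metrics listed above could be computed from a job trace roughly as follows. This is a minimal sketch, not GrenchMark code. It assumes that A(W)RT, A(W)WT, and A(W)SD stand for average (weighted) response time, wait time, and slowdown, and that the weight is the job's resource consumption (processors × run time), which is one common choice. The `Job` record, the helper names, and the sample trace are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Job:
    submit: float  # submission time (s)
    start: float   # start time (s)
    end: float     # completion time (s)
    procs: int     # processors used

    @property
    def wait(self):
        return self.start - self.submit

    @property
    def run(self):
        return self.end - self.start

    @property
    def response(self):
        return self.end - self.submit

    @property
    def area(self):
        # Resource consumption, used here as the (assumed) weight.
        return self.procs * self.run


def average(jobs, value, weight=lambda j: 1.0):
    """Weighted mean of value(job); the default weight gives the plain mean."""
    total_weight = sum(weight(j) for j in jobs)
    return sum(weight(j) * value(j) for j in jobs) / total_weight


def utilization(jobs, total_procs):
    """Consumed processor-seconds over available processor-seconds."""
    span = max(j.end for j in jobs) - min(j.submit for j in jobs)
    return sum(j.area for j in jobs) / (total_procs * span)


# Hypothetical three-job trace on an 8-processor system.
jobs = [Job(0, 0, 100, 4), Job(10, 100, 150, 8), Job(20, 150, 160, 2)]

art = average(jobs, lambda j: j.response)                       # ART
awt = average(jobs, lambda j: j.wait)                           # AWT
asd = average(jobs, lambda j: j.response / j.run)               # ASD
awrt = average(jobs, lambda j: j.response, weight=lambda j: j.area)  # AWRT
util = utilization(jobs, total_procs=8)
```

Weighting by consumption keeps many short jobs from dominating the average, which is why the weighted and unweighted variants can rank the same schedule differently.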
General Aspects for Workload Modeling
• User/Group/VO model
  • Detailed modeling for top-5/10 users, then clustering (use squash area to group)
• Submission patterns
  • Yearly, monthly, weekly, daily
  • Do daily patterns exist? (Are Grids truly global?)
• Temporal patterns
  • Repeated submission (batches of jobs)
  • Job dependencies (composite applications common in Grid(?))
• Feedback
  • Empirical rules (don’t submit jobs when system busy). But, reactive submission tools, co-allocators, evolving applications, etc.
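A daily submission pattern, if one exists, can be imitated in a synthetic workload by drawing arrivals from a time-varying Poisson process. The sketch below uses the standard thinning technique. The cosine-shaped rate, its parameters (`base`, `amplitude`, `peak_hour`), and the function names are illustrative assumptions, not values taken from any grid trace.

```python
import math
import random


def daily_rate(t_hours, base=10.0, amplitude=0.8, peak_hour=14.0):
    """Hypothetical arrival rate (jobs/hour) with a 24-hour cycle."""
    phase = 2 * math.pi * (t_hours - peak_hour) / 24.0
    return base * (1.0 + amplitude * math.cos(phase))


def generate_arrivals(duration_hours, rate=daily_rate, rate_max=18.0, seed=42):
    """Sample arrival times by thinning a homogeneous Poisson process.

    rate_max must be an upper bound on rate(t); 18.0 = base * (1 + amplitude).
    """
    rng = random.Random(seed)
    t, arrivals = 0.0, []
    while True:
        t += rng.expovariate(rate_max)          # candidate from the bounding process
        if t >= duration_hours:
            return arrivals
        if rng.random() < rate(t) / rate_max:   # keep with probability rate(t)/rate_max
            arrivals.append(t)


# One week of synthetic submissions, then a per-hour-of-day histogram.
arrivals = generate_arrivals(24 * 7)
by_hour = [0] * 24
for t in arrivals:
    by_hour[int(t) % 24] += 1
```

Setting `amplitude` to 0 yields a flat rate, which corresponds to the "truly global" case where submissions from different time zones cancel out any daily cycle.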
Grid-Specific Workload Modeling: Computation Management
• Processor co-allocation
  • Fixed, non-fixed, semi-fixed jobs
• Job flexibility and composition
  • Moldable, evolvable, flexible, etc.
  • Batches, workflows, other dependencies
• Other aspects
  • Background load: define top jobs (by consumption), model the rest as background load
  • Project stage
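The background-load idea above (model the top consumers in detail, lump the rest together) can be sketched as a simple split of a trace. The 80% cut-off, the `cpu_seconds` field, and the function name are assumptions chosen for illustration; the slides do not prescribe a threshold.

```python
def split_top_jobs(jobs, share=0.8):
    """Return (top, background).

    `top` holds the largest jobs that together reach `share` of the total
    consumption; `background` holds everything else.
    """
    ranked = sorted(jobs, key=lambda j: j["cpu_seconds"], reverse=True)
    total = sum(j["cpu_seconds"] for j in ranked)
    top, covered = [], 0.0
    for job in ranked:
        if covered >= share * total:
            break
        top.append(job)
        covered += job["cpu_seconds"]
    return top, ranked[len(top):]


# Hypothetical trace: two large jobs dominate the consumption.
trace = [
    {"id": "j1", "cpu_seconds": 5000},
    {"id": "j2", "cpu_seconds": 3500},
    {"id": "j3", "cpu_seconds": 800},
    {"id": "j4", "cpu_seconds": 400},
    {"id": "j5", "cpu_seconds": 200},
    {"id": "j6", "cpu_seconds": 100},
]
top, background = split_top_jobs(trace)

# The background jobs are reduced to a single aggregate load figure.
background_load = sum(j["cpu_seconds"] for j in background)
```

Only the jobs in `top` would then need detailed per-job models; the remainder enters the workload as an aggregate load.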
Grid-Specific Workload Modeling: Data and Network Management
• Clearly Defined I/O Requirements
  • Files, streams, others
  • Data location and size
• Replicas
  • Replica location
• Other aspects
  • HDD occupancy
• Clearly Defined Network Requirements
  • Bandwidth, latency
  • Communication pattern
• Special Situations
  • Dedicated paths, other QoS
• Other aspects
  • Background load
Grid-Specific Workload Modeling: Locality/Origin Management
• Job issuer and execution site
  Not all VOs are created equal!
  • Two-level view: Which VO generates the next job? Within a VO, which user generates the next job?
  • Three-level view, Multi-level view (Project, VO, Group, User)
• (Usage) Service Level Agreements
  • Use my system 50% for 7 days, or 20% for 30 days
  • Dedicated paths, other QoS
• Other aspects
  • Background load pertaining to same (u)SLA
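The two-level view above maps naturally onto two nested weighted draws: first pick the VO, then pick the user within that VO. The sketch below illustrates this under assumed weights; the VO names, user names, and probabilities are hypothetical, and a real model would estimate them from traces.

```python
import random
from collections import Counter

# Hypothetical weights: share of jobs per VO, and per user within each VO.
VO_WEIGHTS = {"vo-physics": 0.6, "vo-bio": 0.3, "vo-cs": 0.1}
USER_WEIGHTS = {
    "vo-physics": {"alice": 0.7, "bob": 0.3},
    "vo-bio": {"carol": 0.5, "dave": 0.5},
    "vo-cs": {"erin": 1.0},
}


def weighted_pick(rng, weights):
    """Draw one key of `weights` with probability proportional to its value."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


def next_job_origin(rng):
    """Level 1: which VO submits next? Level 2: which user within that VO?"""
    vo = weighted_pick(rng, VO_WEIGHTS)
    user = weighted_pick(rng, USER_WEIGHTS[vo])
    return vo, user


rng = random.Random(7)
origins = [next_job_origin(rng) for _ in range(10_000)]
vo_counts = Counter(vo for vo, _ in origins)
```

A three-level or multi-level view would add further nested draws (for example Project, then VO, then Group, then User) in the same manner.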
Grid-Specific Workload Modeling: Failure Modeling
• Error level
  • Infrastructure
  • Middleware
  • Application
  • User
• Fault tolerance scheme for submitted jobs
  • Capture the system feedback in the model
• Other aspects
  • Cascading errors
Grid-Specific Workload Modeling: Economic Models
• Utility
  • Resource utility
  • Application utility
• Pricing policies
  • Time-dependent pricing: pay less during off-peak hours
  • Load-dependent pricing: pay less for unused resources
  • Package pricing: pay less for bundles of resources
  • Trust-building pricing: long-standing users pay less
• Other aspects
  • Available information
  • Penalty / user satisfaction
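The four pricing policies above can be combined into a single price function, as in the sketch below. Every number in it (the off-peak window, the discount factors, the 100 CPU-hour bundle, the one-year trust threshold) is a made-up assumption for illustration; the slides name the policies but give no concrete rates.

```python
def price(cpu_hours, hour_of_day, load, account_age_days, base_rate=1.0):
    """Price of a request under four illustrative discount policies.

    load is the current system load in [0, 1]; all factors are hypothetical.
    """
    rate = base_rate
    # Time-dependent pricing: discount outside an assumed 08:00-20:00 peak.
    if hour_of_day < 8 or hour_of_day >= 20:
        rate *= 0.7
    # Load-dependent pricing: an idle system costs half of a fully loaded one.
    rate *= 0.5 + 0.5 * load
    # Package pricing: discount for bundles of at least 100 CPU-hours.
    if cpu_hours >= 100:
        rate *= 0.9
    # Trust-building pricing: discount for accounts older than one year.
    if account_age_days >= 365:
        rate *= 0.95
    return cpu_hours * rate


peak = price(10, hour_of_day=12, load=1.0, account_age_days=0)
off_peak = price(10, hour_of_day=2, load=1.0, account_age_days=0)
idle = price(10, hour_of_day=12, load=0.0, account_age_days=0)
bundle = price(200, hour_of_day=12, load=1.0, account_age_days=0)
trusted = price(10, hour_of_day=12, load=1.0, account_age_days=400)
```

Multiplying the factors is only one way to combine the policies; a workload model could equally apply a single policy at a time to study its effect on user behavior in isolation.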
GrenchMark: a Framework for Analyzing, Testing, and Comparing Grids
• What’s in a name?
  grid benchmark → working towards a generic tool for the whole community: help standardize the testing procedures, but it is too early for benchmarks; we use synthetic grid workloads instead
• What’s it about?
  A systematic approach to analyzing, testing, and comparing grid settings, based on synthetic workloads
  • A set of metrics for analyzing grid settings
  • A set of representative grid applications
    • Both real and synthetic
  • Easy-to-use tools to create synthetic grid workloads
  • Flexible, extensible framework
GrenchMark Overview: Easy to Generate and Run Synthetic Workloads
… but More Complicated Than You Think
• Workload structure
  • User-defined and statistical models
  • Dynamic job arrivals
  • Burstiness and self-similarity
  • Feedback, background load
  • Machine usage assumptions
  • Users, VOs
• Metrics
  • A(W) Run/Wait/Resp. Time
  • Efficiency, MakeSpan
  • Failure rate [!]
• (Grid) notions
  • Co-allocation, interactive jobs, malleable, moldable, …
• Measurement methods
  • Long workloads
  • Saturated / non-saturated system
  • Start-up, production, and cool-down scenarios
  • Scaling workload to system
• Applications
  • Synthetic
  • Real
• Workload definition language
  • Base language layer
  • Extended language layer
• Other
  • Can use the same workload for both simulations and real environments
GrenchMark may become a vehicle for proving (performance indicators, workload modeling) research in dynamic, heterogeneous, very large-scale environments
GrenchMark: Iterative Research Roadmap
Simple functional system
A. Iosup, J. Maassen, R.V. van Nieuwpoort, D.H.J. Epema, Synthetic Grid Workloads with Ibis, KOALA, and GrenchMark, CoreGRID IW, Nov 2005.
GrenchMark: Iterative Research Roadmap
Open-GrenchMark
Community Effort
This work
Complex extensible system
A. Iosup, D.H.J. Epema, GrenchMark: A Framework for Analyzing, Testing, and Comparing Grids, IEEE CCGrid'06, May 2006.
Take home message
• Performance Evaluation of Grid Systems
  - need a common performance evaluation framework for grids
  - need real grid traces (scheduling, accounting, monitoring, etc.)
  - need more research on workload modeling and performance indicators
• Performance indicators
  - failure metrics as important as traditional performance metrics
• Workload modeling
  - generic workload modeling needs validation based on real grid traces
  - computation/data/network management
  - locality/origin management
  - failure modeling
  - economic models
• GrenchMark
  - generic tool for the whole community
  - generates diverse grid workloads
  - easy-to-use, flexible, portable, extensible, …