1
Hidra: History Based Dynamic Resource Allocation For Server Clusters
Jayanth Gummaraju (1) and Yoshio Turner (2)
(1) Stanford University, CA, USA
(2) Hewlett-Packard Labs., Palo Alto, CA, USA
ITA05, Wrexham, UK, September 2005
2
Why Dynamic Resource Allocation?
- High demand variation for an Internet service
  - Daily: peak load is ~10 times the average load during the day
  - Variation over longer time scales (days, weeks)
- Benefits of dynamic resource allocation
  - Reduce operating costs for a service
    - Energy
    - Software license fees
  - Support more services on a shared infrastructure
    - Shift resources between services on demand
- Practical: fast server re-purposing
  - Blade server management
  - Networked storage
  - Virtual machine cloning/migration
3
Problem
- Determine resource requirements for a service on-the-fly
- Challenges:
  - Frequent service updates
  - Frequent changes in client interest set
- Static a priori capacity planning won't work
4
Approach: Hidra
Hidra: History-based Dynamic Resource Allocation
- “Black-box approach”: continuously build and update a model of system behavior from externally visible performance attributes, without knowledge of internal operation (e.g., which resource is the bottleneck)
- Model updates: introduce freshness and confidence
- Extrapolation: determine resource requirements with only a partial model
5
Scope
- Large services requiring multiple servers
- Multi-tier: each tier = a cluster of servers. Assumptions:
  - Identical servers within a tier
  - Servers in different tiers can be different
- Allocation granularity = server (e.g., a blade in a blade server)
- Predictable client request rate
  - Reasonable if the rate varies smoothly or has only occasional discontinuities
- Service and server behavior can change over time
- Goal: find the minimum-cost resource allocation that meets a server response time requirement
  - Cost = sum of the cost of the servers allocated to each tier
  - Requirement on mean response time (may be generalized)
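The goal can be stated as a small integer optimization problem. The notation is ours, not the slides': tiers $i = 1,\dots,k$, per-server cost $c_i$, number of servers $N_i$ allocated to tier $i$, mean response time $T$ of the service, and response time requirement $T_{\max}$.

```latex
\begin{aligned}
\min_{N_1,\dots,N_k \in \mathbb{Z}_{>0}} \quad & \sum_{i=1}^{k} c_i N_i \\
\text{subject to} \quad & T(N_1,\dots,N_k) \le T_{\max}
\end{aligned}
```

For a single tier ($k = 1$) this reduces to finding the fewest servers $N$ whose mean response time meets $T_{\max}$.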
6
Outline
- Single-tier history-based resource allocation
  - Constructing and updating the history-based model (freshness and confidence)
  - Using the model to determine resource allocation (extrapolation)
- Multi-tier history-based resource allocation
- Summary
7
Single-Tier History-Based Model
- Model represents the average behavior of a server in a tier
- Consists of a collection of measured operating points (history) for the tier
  - Each history point: at least (request rate per server, mean response time)
- Model provides an estimate of a function F():
  - response time = F(request rate), an increasing function in the range of interest
[Figure: response time vs. per-server request rate]
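A minimal sketch of such a history-based model follows. It assumes history points are stored in a dictionary keyed by a discretized per-server request rate (the bucket width is our assumption) and that F is estimated between stored points by linear interpolation. The class `HistoryModel` and its methods are hypothetical names for illustration; the slides do not specify Hidra's data layout.

```python
from __future__ import annotations

import bisect


class HistoryModel:
    """Per-tier history: per-server request rate -> mean response time.

    Hypothetical sketch; Hidra's actual data layout is not given on the slides.
    """

    def __init__(self, bucket_width: float = 5.0) -> None:
        self.bucket_width = bucket_width      # assumed discretization of rates
        self.points: dict[float, float] = {}  # bucket center -> response time

    def bucket(self, rate: float) -> float:
        # Snap a measured per-server rate to the center of its bucket.
        return (int(rate // self.bucket_width) + 0.5) * self.bucket_width

    def record(self, rate: float, response_time: float) -> None:
        # Plain overwrite here; weighted (freshness/confidence) updates differ.
        self.points[self.bucket(rate)] = response_time

    def estimate(self, rate: float) -> float | None:
        """Estimate F(rate) by linear interpolation between stored points."""
        if not self.points:
            return None
        xs = sorted(self.points)
        i = bisect.bisect_left(xs, rate)
        if i < len(xs) and xs[i] == rate:
            return self.points[rate]
        if i == 0 or i == len(xs):
            return None  # outside the observed range: needs extrapolation
        x0, x1 = xs[i - 1], xs[i]
        y0, y1 = self.points[x0], self.points[x1]
        return y0 + (y1 - y0) * (rate - x0) / (x1 - x0)
```

Returning `None` outside the observed range makes explicit that the model only covers operating points it has actually seen; going beyond them is the job of extrapolation.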
8
Using the History-Based Model
- Goal: find the fewest servers needed to meet a requirement for maximum mean response time
- Extrapolate the model to find the largest feasible average request rate per server, $\lambda^{*}$
- Given R = the tier's applied load (requests per second)
- Resource allocation: $N = \lceil R / \lambda^{*} \rceil$ servers
[Figure: response time vs. per-server request rate, with the response time threshold marked]
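The allocation rule can be sketched as follows, assuming F is increasing over the search interval. Here the largest feasible per-server rate (called λ* in this sketch) is found by bisection; the bisection search, the function names `max_feasible_rate` and `servers_needed`, and the M/M/1 curve F(λ) = 1/(μ − λ) in the usage example are our illustrative choices, not taken from the slides.

```python
import math
from typing import Callable


def max_feasible_rate(F: Callable[[float], float], threshold: float,
                      lo: float, hi: float, iters: int = 60) -> float:
    """Largest per-server rate in [lo, hi] with F(rate) <= threshold.

    Assumes F is increasing on [lo, hi] and F(lo) <= threshold.
    """
    if F(hi) <= threshold:
        return hi
    for _ in range(iters):
        mid = (lo + hi) / 2
        if F(mid) <= threshold:
            lo = mid  # mid is feasible: search higher
        else:
            hi = mid  # mid violates the threshold: search lower
    return lo


def servers_needed(R: float, lam_star: float) -> int:
    # N = ceil(R / lambda*): whole servers, each loaded at most lambda*.
    return math.ceil(R / lam_star)


# Usage: M/M/1 server with mu = 100 req/s and a 50 ms mean response time target.
mu = 100.0


def F(lam: float) -> float:
    return 1.0 / (mu - lam)


lam_star = max_feasible_rate(F, threshold=0.05, lo=0.0, hi=mu - 1e-9)
print(round(lam_star, 6), servers_needed(1000.0, lam_star))  # 80.0 13
```

The ceiling matters because allocation granularity is a whole server: 1000 req/s at up to 80 req/s per server needs 12.5 servers, so 13 are allocated.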
9
Updating the Model
- The response time function can change over time due to:
  - Service content or implementation
  - Client interest set
  - Number of allocated servers (request distribution and non-linear performance scaling)
- Nevertheless, a history-based model is useful:
  - Gradual changes → recent history is a good approximation
  - Occasional large changes → recent history is relevant except in the moments immediately after a large change
- Periodically update the model based on current performance measurements
  - Balance responsiveness and accuracy: incorporate new measurements quickly enough to model current behavior, but not so aggressively that transient glitches pollute the model
10
History Update: Freshness and Confidence
- History point update as a weighted average of the stored value and the new measurement:
  $\text{new stored value} = \alpha \cdot \text{old stored value} + (1 - \alpha) \cdot \text{new measurement}$
  - Older history is less likely to represent current behavior
  - Recent history can be obsolete after a sudden shift in behavior
- Weighting factor $\alpha$ combines:
  - Freshness: a value that decreases with time since the last update
  - Confidence: a value that increases with repeated confirmation of consistent behavior for the history point
  - Combination: EWMA (captures freshness) with a decay rate that slows with increasing confidence
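One way to realize this update rule is sketched below. The specific forms are our assumptions, not Hidra's published formulas: freshness decays exponentially with time since the last update (time constant `tau`), confidence is a counter of consistent measurements (within relative tolerance `tol`, capped at `max_conf`), and the weight is α = freshness × c/(c + 1). The names `HistoryPoint`, `alpha`, and `update` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class HistoryPoint:
    value: float          # stored mean response time
    last_update: float    # time of last update (seconds)
    confidence: int = 0   # count of consistent confirmations (hypothetical)


def alpha(p: HistoryPoint, now: float, tau: float) -> float:
    freshness = math.exp(-(now - p.last_update) / tau)  # 1 when just updated
    trust = p.confidence / (p.confidence + 1)           # 0 with no confirmations
    return freshness * trust


def update(p: HistoryPoint, measurement: float, now: float,
           tau: float = 300.0, tol: float = 0.1, max_conf: int = 20) -> float:
    """Apply one weighted update to p in place and return the weight used."""
    a = alpha(p, now, tau)
    consistent = abs(measurement - p.value) <= tol * p.value
    # EWMA step: new stored value = a * old stored value + (1 - a) * measurement
    p.value = a * p.value + (1 - a) * measurement
    # Confidence grows with confirmation and shrinks on disagreement.
    if consistent:
        p.confidence = min(p.confidence + 1, max_conf)
    else:
        p.confidence = max(p.confidence - 1, 0)
    p.last_update = now
    return a
```

With `confidence=4` and no elapsed time, α = 0.8, so a transient measurement of 20 moves a stored value of 10 only to 12; after ten time constants without updates, the same measurement almost entirely replaces the stored value. Decrementing confidence (rather than resetting it) on a single inconsistent measurement is a deliberate choice in this sketch, so that one glitch does not erase a well-confirmed point.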
11
Extrapolation: Determining Resource Allocation
- The model has an incomplete view of the response time function
- To find the optimal per-server request rate $\lambda^{*}$, Hidra extrapolates/interpolates a unique pair of history points
  - Only use points that match the general shape of a typical response time curve (positive slope)
  - Favor points with a high weighting factor $\alpha$ (ignore a point if its $\alpha$ is very small)
- If only one point exists (the current operating point), adjust the allocation differently
- Limits on consecutive changes in resource allocation (fixed limit for decreases, growing limits for increases)
[Figure: response time vs. applied load, showing history points 1 to 9, points labeled X, Y, Z, and the response time threshold]
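The pair selection and change limits can be sketched as below. The slide does not detail how Hidra ranks candidate pairs, so this sketch assumes: discard points whose weight α is below `min_alpha`, keep only pairs with positive slope, pick the pair with the largest product of weights, and extrapolate linearly to the threshold. The `ChangeLimiter` class and its parameters `max_decrease` and `base_increase` are hypothetical; the increase limit doubling on each consecutive increase is our assumed growth rule.

```python
from itertools import combinations


def extrapolate_rate(points, threshold, min_alpha=0.05):
    """points: list of (rate, response_time, alpha). Returns lambda* or None."""
    usable = [p for p in points if p[2] >= min_alpha]
    best = None
    for (x0, y0, a0), (x1, y1, a1) in combinations(sorted(usable), 2):
        if x1 == x0 or y1 <= y0:
            continue  # keep only pairs consistent with an increasing curve
        score = a0 * a1  # favor fresh, well-confirmed points
        if best is None or score > best[0]:
            best = (score, x0, y0, x1, y1)
    if best is None:
        return None  # fewer than two usable points: use a different rule
    _, x0, y0, x1, y1 = best
    slope = (y1 - y0) / (x1 - x0)
    return x0 + (threshold - y0) / slope  # rate where the line hits threshold


class ChangeLimiter:
    """Caps step-to-step allocation changes (hypothetical parameters)."""

    def __init__(self, max_decrease=1, base_increase=1):
        self.max_decrease = max_decrease
        self.base_increase = base_increase
        self.increase_limit = base_increase

    def apply(self, current, target):
        if target > current:
            step = min(target - current, self.increase_limit)
            self.increase_limit *= 2  # limit grows over consecutive increases
            return current + step
        self.increase_limit = self.base_increase  # reset after a non-increase
        return current - min(current - target, self.max_decrease)
```

Starting from 10 servers with a target of 20, successive calls to `apply` step to 11, 13, and 17, while a later drop in target removes at most one server per step.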
12
Single-Tier Evaluation: Overview
- Approach: apply Hidra to allocate resources for a simulated cluster
  - Simulation allows easy control of cluster behavior and determination of the optimal allocation
  - Each server modeled as a simple M/M/1 queue with time-varying arrival rate $\lambda$ and service rate $\mu$
    - Provides a response time function that varies over time
    - More complex models are not needed for our purposes
- Effectiveness of freshness and confidence
- Effectiveness for clusters with non-linear cluster performance scaling
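The simulated cluster's response time function, and the optimal allocation the simulator can compute directly, can be sketched as follows. This assumes requests are split evenly across N identical M/M/1 servers, so each sees arrival rate R/N and the mean response time is 1/(μ − R/N) when R/N < μ. The function names are ours.

```python
import math


def mm1_response_time(R: float, N: int, mu: float) -> float:
    """Mean response time of N M/M/1 servers sharing load R evenly."""
    lam = R / N                      # per-server arrival rate
    if lam >= mu:
        return math.inf              # unstable queue: response time unbounded
    return 1.0 / (mu - lam)


def optimal_servers(R: float, mu: float, target: float,
                    n_max: int = 10_000) -> int:
    """Smallest N whose mean response time meets the target."""
    for N in range(1, n_max + 1):
        if mm1_response_time(R, N, mu) <= target:
            return N
    raise ValueError("target not reachable within n_max servers")
```

For example, `optimal_servers(1000.0, 100.0, 0.05)` returns 13: with 12 servers each sees about 83.3 req/s and a 60 ms mean response time, while 13 servers bring it to about 43 ms. Because the simulator knows μ at every instant, this optimum gives a baseline against which Hidra's allocation can be compared.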
13
Effectiveness of Freshness
- Increase $\mu$ steadily over time from 40 to 70 req/s
- No freshness (red) uses obsolete information
- Freshness (green) is close to the optimal (blue) allocation
14
Effectiveness of Confidence
- Set $\mu$ constant over time except for periodic transients
[Figure panels: Freshness only, no Confidence; Freshness and Confidence]
- Using confidence, Hidra is less susceptible to short-term transients because it preserves more commonly observed values
15
Non-Linear Cluster Scaling
- The response time function may be sensitive to the resource allocation. Examples:
  - Caching effect: memory in each additional server adds to the total effective content cache capacity if shared effectively → throughput scales faster than N
  - Communication effect: overhead of coordination between servers → throughput scales slower than N
- Evaluate using request rates from hp.com logs for a 24-hour period
  - Caching: assume the hit ratio increases linearly with N, causing the service rate $\mu$ to increase
  - Communication: increase the service time ($1/\mu$) linearly with N
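The two scaling effects can be modeled as a per-server service rate μ(N) that depends on the allocation. The linear forms follow the slide; every coefficient below (base hit ratio `h0`, hit-ratio slope `dh`, hit/miss service times, base service time `s0`, per-server overhead `ds`) is an illustrative assumption, and the function names are hypothetical.

```python
def mu_caching(N: int, h0: float = 0.5, dh: float = 0.02,
               t_hit: float = 0.002, t_miss: float = 0.02) -> float:
    """Caching: hit ratio grows linearly with N, so the service rate rises."""
    h = min(h0 + dh * N, 1.0)  # hit ratio, capped at 1
    mean_service_time = h * t_hit + (1 - h) * t_miss
    return 1.0 / mean_service_time


def mu_communication(N: int, s0: float = 0.01, ds: float = 0.0005) -> float:
    """Communication: service time 1/mu grows linearly with N."""
    return 1.0 / (s0 + ds * N)


for N in (1, 5, 10, 20):
    # Cluster capacity N * mu(N): grows faster than N with caching,
    # slower than N with communication overhead.
    print(N, round(N * mu_caching(N), 1), round(N * mu_communication(N), 1))
```

Plugging either μ(N) into the per-server M/M/1 response time makes the response time function itself depend on N, which is exactly the situation a fixed, allocation-independent model would mispredict.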
16
Caching Effect Results
[Figure panels: Service Rate, Resource Allocation, Response Time]
- Wide variation in the average behavior of a server
  - Each server is more effective as the allocation is increased
- Hidra adapts, achieving close to optimal allocation
17
Communication Effect Results
[Figure panels: Service Rate, Resource Allocation, Response Time]
- Opposite service rate behavior compared to caching
  - Each server is less effective as the allocation is increased
- Hidra handles this case as well
18
Multi-Tier Resource Allocation
- Multi-tier characteristics
  - A request to the first tier could trigger multiple secondary requests to other tiers
  - Average response time is the sum of the average response times of each tier
  - Cost of a resource could be different for different tiers
- Multi-tier resource allocation as an extension of the single-tier case
  - Response time for each tier computed using the single-tier algorithm
  - Dynamically vary the target response time for each tier to minimize the total cost of the resource allocation
  - Same client request rate used for all tiers
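The per-tier target search can be sketched as a grid search over how the end-to-end response time budget is split between two tiers. Assumptions for illustration: each tier behaves as evenly loaded M/M/1 servers with service rate `mu` (a stand-in for the single-tier history model), both tiers see the same client request rate R as stated on the slide, and the split is searched in fixed steps. The function names are hypothetical; Hidra's actual search procedure is not given here.

```python
from __future__ import annotations

import math


def tier_servers(R: float, mu: float, target: float) -> int | None:
    """Fewest servers so that 1 / (mu - R/N) <= target (M/M/1 stand-in)."""
    if target <= 1.0 / mu:
        return None                  # unreachable even with no queueing
    lam_star = mu - 1.0 / target     # largest feasible per-server rate
    return math.ceil(R / lam_star)


def split_targets(R, tiers, total_target, steps=100):
    """tiers: [(mu1, cost1), (mu2, cost2)].

    Returns (total_cost, (N1, N2), (T1, T2)) minimizing cost subject to
    T1 + T2 <= total_target, or None if no split is feasible.
    """
    (mu1, c1), (mu2, c2) = tiers
    best = None
    for k in range(1, steps):
        t1 = total_target * k / steps  # response time budget for tier 1
        t2 = total_target - t1         # remainder goes to tier 2
        n1, n2 = tier_servers(R, mu1, t1), tier_servers(R, mu2, t2)
        if n1 is None or n2 is None:
            continue
        cost = c1 * n1 + c2 * n2
        if best is None or cost < best[0]:
            best = (cost, (n1, n2), (t1, t2))
    return best
```

With two identical tiers (μ = 100 req/s, unit cost), R = 1000 req/s, and a 100 ms end-to-end target, the cheapest split found costs 26 servers; shifting budget toward a tier whose servers are expensive or slow is how the search trades cost between tiers.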
19
Two-Tier Results
[Figure panels: Caching (both tiers); Communication (both tiers); Caching (Tier 1) with Communication (Tier 2). Y-axis: total cost of allocated servers]
- Same effect in both tiers → optimal allocations are similar to the single-tier case
- Different effects in each tier → the optimal allocation has a cost intermediate between the two extremes
- Hidra adapts successfully to all these cases
20
Summary
- Presented Hidra for history-based resource allocation of server clusters
- Proposed the use of freshness and confidence to update the history-based model effectively
- Developed an extrapolation approach for finding the operating point with an incomplete model
- Extended the model to multi-tier systems
- Simulation-based results show the scheme is promising for both single-tier and multi-tier systems