1 hidra: history based dynamic resource allocation for server clusters jayanth gummaraju 1 and...

Hidra: History Based Dynamic Resource Allocation For Server Clusters

Jayanth Gummaraju (1) and Yoshio Turner (2)

(1) Stanford University, CA, USA
(2) Hewlett-Packard Labs., Palo Alto, CA, USA

ITA05, Wrexham, UK September 2005

Why Dynamic Resource Allocation

High demand variation for an Internet service
  Daily: peak load ~10 times average load during day
  Variation over longer time scales (days, weeks)
Benefits of Dynamic Resource Allocation
  Reduce operating costs for a service
    Energy
    Software license fees
  Support more services on a shared infrastructure
    Shift resources between services on-demand
Practical: fast server re-purposing
  Blade server management
  Networked storage
  Virtual machine cloning/migration

Problem

Determine resource requirements for a service on-the-fly
Challenges:
  Frequent service updates
  Frequent changes in client interest set
Static a priori capacity planning won’t work

Approach: Hidra

Hidra: History-based Dynamic Resource Allocation
“Black-box approach”: continuously build and update a model of system behavior from externally visible performance attributes, without knowledge of internal operation (e.g., what is the bottleneck resource)
  Model updates: introduce freshness and confidence
  Extrapolation: determine resource requirements with only a partial model

Scope

Large services requiring multiple servers
  Multi-tier: each tier = a cluster of servers. Assumptions:
    Identical servers within a tier
    Servers in different tiers can be different
  Allocation granularity = server (e.g., a blade in a blade server)
Predictable client request rate
  Reasonable if smoothly varying, or with occasional discontinuities
Service and server behavior can change over time
Goal: find minimum cost resource allocation that meets server response time requirement
  Cost = sum of cost of servers allocated to each tier
  Mean response time (may be generalized)

Outline

Single-tier history-based resource allocation
  Constructing and updating history-based model (freshness and confidence)
  Using the model to determine resource allocation (extrapolation)
Multi-tier history-based resource allocation
Summary

Single-Tier History-Based Model

Model represents the average behavior of a server in a tier
Consists of a collection of measured operating points (history) for the tier
  Each history point: at least (request rate per server, mean response time)
Model provides an estimate of function F(): response time = F(request rate) (increasing function in range of interest)
[Figure: horizontal axis is the per-server request rate]

Using the History-Based Model

Goal: find the fewest servers needed to meet a requirement for maximum mean response time
  Extrapolate model to find the largest feasible average request rate per server, λ
  Given R = tier’s applied load (requests per second)
  Resource allocation = N = R/λ servers
[Figure: response time threshold marked against per-server request rate]
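
The sizing rule on this slide can be sketched in a few lines of Python. This is a minimal illustration rather than Hidra's implementation: the function name `servers_needed` is hypothetical, and rounding up to a whole server (with a floor of one server) is an assumption that follows from the server-level allocation granularity.

```python
import math

def servers_needed(applied_load: float, max_rate_per_server: float) -> int:
    """Fewest whole servers such that each sees at most max_rate_per_server req/s.

    applied_load is R, the tier-wide request rate; max_rate_per_server is the
    largest feasible per-server request rate read off the model.
    Rounding up and the floor of one server are assumptions of this sketch.
    """
    if max_rate_per_server <= 0:
        raise ValueError("max_rate_per_server must be positive")
    return max(1, math.ceil(applied_load / max_rate_per_server))

# 900 req/s at up to 40 req/s per server: 900 / 40 = 22.5, rounded up.
print(servers_needed(900.0, 40.0))
```

Rounding up errs on the side of meeting the response time requirement at the cost of at most one extra server.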

Updating the Model

Response time function can change over time:
  Service content or implementation
  Client interest set
  Number of allocated servers (request distribution, and non-linear performance scaling)
Nevertheless, history-based model is useful
  Gradual changes → recent history is a good approximation
  Occasional large changes → recent history is relevant except in immediate moments after a large change
Periodically update model based on current performance measurements
  Balance responsiveness and accuracy: incorporate new measurements quickly to model current behavior, but not so aggressively that transient glitches pollute the model

History Update: Freshness and Confidence

History point update as weighted average of stored value and new measurement
  New stored value = α * old stored value + (1 – α) * new measurement
  Older history is less likely to represent current behavior
  Recent history can be obsolete after a sudden shift in behavior
Weighting factor α combines:
  Freshness: value which decreases with time since last update
  Confidence: value which increases with repeated confirmation of consistent behavior for the history point
  Combination: EWMA (captures freshness) with decay rate that slows with increasing confidence
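
One way to picture the update is the sketch below. It is an assumption-laden illustration, not the formula used by Hidra: the slide does not give the exact form of the weighting factor, so the exponential decay, the time constant `tau`, the consistency `tolerance`, and the names `HistoryPoint` and `update` are all hypothetical choices.

```python
import math
from dataclasses import dataclass

@dataclass
class HistoryPoint:
    response_time: float   # stored mean response time (seconds)
    last_update: float     # time of the last update (seconds)
    confidence: int = 0    # count of consecutive consistent measurements

def update(point: HistoryPoint, measurement: float, now: float,
           tau: float = 60.0, tolerance: float = 0.2) -> float:
    """Blend a new measurement into a stored history point; return the weight used.

    The weight on the old value decays with the time since the last update
    (freshness). The decay time constant is stretched by the confidence count,
    so well-confirmed points change more slowly.
    The exponential form, tau, and tolerance are assumptions of this sketch.
    """
    age = now - point.last_update
    weight_old = math.exp(-age / (tau * (1 + point.confidence)))
    consistent = abs(measurement - point.response_time) <= tolerance * point.response_time
    point.response_time = weight_old * point.response_time + (1 - weight_old) * measurement
    point.confidence = point.confidence + 1 if consistent else 0
    point.last_update = now
    return weight_old

# Three consistent measurements build confidence, then a transient spike arrives.
p = HistoryPoint(response_time=0.10, last_update=0.0)
for t in (60.0, 120.0, 180.0):
    update(p, 0.10, t)
update(p, 0.50, 240.0)
print(p.response_time, p.confidence)
```

In this sketch a point that has been confirmed several times moves less toward a one-off spike than a point with no confidence would, which is the behavior the slide attributes to combining freshness with confidence.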

Extrapolation: Determining Resource Allocation

Model has incomplete view of response time function
To find the optimal per-server request rate λ, Hidra extrapolates/interpolates a unique pair of history points
  Only use points that match general shape of typical response time curve (positive slope)
  Favor points with high weighting factor α (ignore if α is very small)
If only one point exists (current operating point), adjust allocation differently
Limits on consecutive changes in resource allocation (fixed limit for decreases, growing limits for increases)
[Figure: response time versus applied load, showing the threshold, points labeled 1 to 9, and points labeled X, Y, Z]
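
The pairwise step can be illustrated with a straight line through two history points. This is a sketch under stated assumptions: the slide does not say what functional form Hidra uses or how it selects the pair, so linear interpolation and the name `rate_at_threshold` are hypothetical.

```python
from typing import Optional, Tuple

Point = Tuple[float, float]  # (per-server request rate, mean response time)

def rate_at_threshold(p1: Point, p2: Point, threshold: float) -> Optional[float]:
    """Request rate at which the line through p1 and p2 reaches threshold.

    Returns None when the pair does not have a positive slope, mirroring the
    rule that only pairs matching the shape of a response time curve are used.
    A straight line is an assumption of this sketch.
    """
    (x1, y1), (x2, y2) = sorted((p1, p2))
    if x2 == x1 or y2 <= y1:
        return None
    slope = (y2 - y1) / (x2 - x1)
    return x1 + (threshold - y1) / slope

# Interpolation: the threshold lies between the two measured response times.
print(rate_at_threshold((20.0, 0.05), (30.0, 0.10), 0.08))
# Extrapolation: the threshold lies above both measured response times.
print(rate_at_threshold((20.0, 0.05), (30.0, 0.10), 0.15))
```

Because a real response time curve bends upward near saturation, a linear extrapolation beyond the measured points tends to be optimistic, which is one reason limits on consecutive allocation changes are useful.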

Single-Tier Evaluation: Overview

Approach: apply Hidra to allocate resources for a simulated cluster
  Simulation allows easy control of cluster behavior and determination of optimal allocation
  Each server modeled as simple M/M/1 queue with time-varying arrival rate λ and service rate μ
    Provides response time function that varies over time
    More complex models not needed for our purposes
Effectiveness of freshness and confidence
Effectiveness for clusters with non-linear cluster performance scaling
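
For an M/M/1 queue the mean response time has the standard closed form 1 / (service rate - arrival rate), which is what makes the optimal allocation easy to determine in simulation. The sketch below shows that calculation; the function names are hypothetical, and splitting the load evenly across servers is an assumption.

```python
import math

def mm1_response_time(arrival_rate: float, service_rate: float) -> float:
    """Mean time in system for an M/M/1 queue: 1 / (service_rate - arrival_rate)."""
    if arrival_rate >= service_rate:
        return math.inf  # the queue is unstable
    return 1.0 / (service_rate - arrival_rate)

def optimal_servers(total_load: float, service_rate: float,
                    max_response_time: float) -> int:
    """Smallest N whose per-server load total_load / N meets the target.

    Assumes the load is split evenly across identical servers.
    """
    max_rate = service_rate - 1.0 / max_response_time  # largest feasible arrival rate
    if max_rate <= 0:
        raise ValueError("target is unreachable even for an idle server")
    return max(1, math.ceil(total_load / max_rate))

n = optimal_servers(total_load=900.0, service_rate=50.0, max_response_time=0.1)
print(n, mm1_response_time(900.0 / n, 50.0))
```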

Effectiveness of Freshness

Increase service rate μ steadily over time from 40 to 70 req/s
  No freshness (red) uses obsolete information
  Freshness (green) close to optimal (blue) allocation

Effectiveness of Confidence

Set service rate μ constant over time except for periodic transients
[Figure panels: Freshness only, no Confidence; Freshness and Confidence]
Using Confidence, Hidra is less susceptible to short-term transients by preserving more commonly observed values

Non-Linear Cluster Scaling

Response time function may be sensitive to the resource allocation. Examples:
  Caching effect: memory in each additional server adds to total effective content cache capacity if shared effectively → throughput scales faster than N
  Communication effect: overhead of coordination between servers → throughput scales slower than N
Evaluate using request rates from hp.com logs for a 24-hour period
  Caching: assume hit ratio increases linearly with N, causing increase of service rate μ
  Communication: increase service time (1/μ) linearly with N
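
When the service time depends on N, the optimal allocation no longer follows from a single division, and a search over N is needed. The sketch below illustrates this with each server treated as an M/M/1 queue. All constants, the linear forms, and the names `min_servers`, `communication`, and `caching` are hypothetical; they are not the parameters used in the evaluation.

```python
from typing import Callable, Optional

def min_servers(total_load: float, service_time_fn: Callable[[int], float],
                max_response_time: float, n_max: int = 1000) -> Optional[int]:
    """Smallest N (up to n_max) that meets the response time target, else None.

    service_time_fn(n) gives the mean service time with n servers. Each server
    is treated as an M/M/1 queue with arrival rate total_load / n.
    """
    for n in range(1, n_max + 1):
        service_rate = 1.0 / service_time_fn(n)
        arrival_rate = total_load / n
        if arrival_rate < service_rate and \
                1.0 / (service_rate - arrival_rate) <= max_response_time:
            return n
    return None

def communication(n: int, base: float = 0.02, overhead: float = 0.0002) -> float:
    """Hypothetical communication effect: service time grows linearly with N."""
    return base + overhead * n

def caching(n: int, hit_time: float = 0.005, miss_time: float = 0.04,
            hit_per_server: float = 0.02) -> float:
    """Hypothetical caching effect: hit ratio grows linearly with N (capped at 1)."""
    hit_ratio = min(1.0, hit_per_server * n)
    return hit_ratio * hit_time + (1.0 - hit_ratio) * miss_time

print(min_servers(900.0, lambda n: 0.02, 0.1))   # no scaling effect
print(min_servers(900.0, communication, 0.1))    # each added server helps less
print(min_servers(900.0, caching, 0.1))          # each added server helps more
```

A linear scan is used because, with a communication effect, feasibility is not guaranteed to be monotone in N, so a binary search could skip the smallest feasible allocation.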

Caching Effect Results

[Figure panels: Service Rate; Resource Allocation; Response Time]
Wide variation in the average behavior of a server
  Each server is more effective as allocation is increased
Hidra adapts, achieving close to optimal allocation

Communication Effect Results

[Figure panels: Service Rate; Resource Allocation; Response Time]
Opposite service rate behavior compared to caching
  Each server is less effective as allocation is increased
Hidra handles this case also

Multi-Tier Resource Allocation

Multi-Tier characteristics
  A request to first tier could trigger multiple secondary requests to other tiers
  Average response time is sum of average response times of each tier
  Cost of resource could be different for different tiers
Multi-Tier resource allocation as an extension of the single-tier case
  Response time for each tier computed using single-tier algorithm
  Dynamically vary target response times for each tier to minimize total cost of the resource allocation
  Same client request rate used for all tiers
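
The idea of varying per-tier targets can be sketched as a search over ways to split the end-to-end response time budget between two tiers. This is an illustration only: the slide does not describe how Hidra searches, so the grid search, the M/M/1 stand-in for the single-tier algorithm, and the names `servers_for_target` and `split_budget` are assumptions.

```python
import math
from typing import List, Optional, Tuple

def servers_for_target(load: float, service_rate: float,
                       target: float) -> Optional[int]:
    """M/M/1 stand-in for the single-tier algorithm: fewest servers for target."""
    max_rate = service_rate - 1.0 / target
    if max_rate <= 0:
        return None  # target unreachable for this tier
    return max(1, math.ceil(load / max_rate))

def split_budget(load: float, tiers: List[Tuple[float, float]],
                 total_target: float, steps: int = 100):
    """Try splits of total_target between two tiers and return the cheapest.

    tiers holds (service_rate, cost_per_server) for tier 1 and tier 2.
    The same client request rate is applied to both tiers.
    Returns (cost, (n1, n2), (t1, t2)), or None if no split is feasible.
    """
    best = None
    for i in range(1, steps):
        t1 = total_target * i / steps
        t2 = total_target - t1
        n1 = servers_for_target(load, tiers[0][0], t1)
        n2 = servers_for_target(load, tiers[1][0], t2)
        if n1 is None or n2 is None:
            continue
        cost = n1 * tiers[0][1] + n2 * tiers[1][1]
        if best is None or cost < best[0]:
            best = (cost, (n1, n2), (t1, t2))
    return best

# Hypothetical tiers: tier 2 servers are faster but three times as expensive.
print(split_budget(900.0, [(50.0, 1.0), (120.0, 3.0)], total_target=0.2))
```

Giving the more expensive tier a looser target can lower the total cost compared with an even split, which is why the targets are varied rather than fixed.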

Two-Tier Results

[Figure: total cost of allocated servers. Panels: Caching (both tiers); Communication (both tiers); Caching (Tier 1), Communication (Tier 2)]
Same effect in both tiers → results similar to single-tier case are optimal
Different effects in each tier → optimal allocation has cost intermediate between the two extremes
Hidra adapts successfully to all these cases

Summary

Presented Hidra for history-based resource allocation of server clusters
Proposed use of freshness and confidence to update history-based model effectively
Developed extrapolation approach for finding operating point with incomplete model
Extended the model to multi-tier systems
Simulation-based results show scheme is promising for both single-tier and multi-tier systems
