Investigating the Effects of Overcommitting YARN Resources
Jason Lowe
Problem: Underutilized Cluster Resources
Optimize The Jobs!
● Internal Downsizer tool quantifies job waste
● Application framework limitations
● Optimally tuned container can still have opportunities
[Figure: Container Utilization over Time, with underutilized resources highlighted]
What about Static Overcommit?
● Configure YARN to use more memory than node provides
● Tried with some success
● Performs very poorly when node fully utilized
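Concretely, static overcommit amounts to advertising more NodeManager memory than the machine actually has. A minimal yarn-site.xml sketch for a node with 128 GB of physical RAM (the 1.25x value is illustrative, not a recommendation):

```xml
<!-- yarn-site.xml: statically advertise 160 GB on a 128 GB node (sketch) -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>163840</value>
</property>
```

Because this value is fixed, YARN keeps scheduling into the phantom memory even when the node is genuinely full, which is why it performs so poorly under full utilization.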
Overcommit Prototype Design Goals
● No changes to applications
● Minimize changes to YARN protocols
● Minimize changes to scheduler internals
● Overcommit on memory only
● Conservative growth
● Rapid correction
Overcommit Overview
[Diagram: overcommit flow between the ResourceManager, a NodeManager, and ApplicationMasters]
● NodeManager: utilization report in heartbeat
■ Unaware of overcommit amount
■ Self-preservation preemption
● ResourceManager
■ Adjusts internal node size
■ Assigns containers based on new size
● Container assignments sent to ApplicationMasters
● ApplicationMasters perform container launches on the NodeManager
ResourceManager Node Scaling
[Figure: Node Memory and Node Utilization over Time for three scenarios (No Overcommit, Reduced Overcommit, Full Overcommit), plotting Allocated Node Mem, Total Node Mem, and Original Node Mem]
ResourceManager Overcommit Tunables
Parameter               Description                                        Value
memory.max-factor       Maximum amount a node will be overcommitted        1.5
memory.low-water-mark   Maximum overcommit below this node utilization     0.6
memory.high-water-mark  No overcommit above this node utilization          0.8
memory.increment-mb     Maximum increment above node allocation            16384
increment-period-ms     Delay between overcommit increments if node        0
                        container state does not change

Parameters use the yarn.resourcemanager.scheduler.overcommit. prefix
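A minimal sketch of how these tunables might drive the per-node scaling decision. The method and class names here are assumptions for illustration, not the actual scheduler code:

```java
// Hypothetical sketch of the ResourceManager's per-node overcommit decision,
// using the tunable values from the table above.
public class OvercommitScaler {
    static final double MAX_FACTOR = 1.5;       // memory.max-factor
    static final double LOW_WATER_MARK = 0.6;   // memory.low-water-mark
    static final double HIGH_WATER_MARK = 0.8;  // memory.high-water-mark
    static final long INCREMENT_MB = 16384;     // memory.increment-mb

    // Returns the new advertised node size in MB, given the physical size,
    // the currently advertised size, and the node's measured utilization.
    static long nextNodeSizeMb(long physicalMb, long advertisedMb, double utilization) {
        long maxMb = (long) (physicalMb * MAX_FACTOR);
        if (utilization < LOW_WATER_MARK) {
            // Conservative growth: one increment at a time, capped at max-factor.
            return Math.min(advertisedMb + INCREMENT_MB, maxMb);
        } else if (utilization > HIGH_WATER_MARK) {
            // Rapid correction: fall back to the physical node size.
            return physicalMb;
        }
        return advertisedMb; // between the water marks: hold steady
    }

    public static void main(String[] args) {
        // 128 GB node, not yet overcommitted, lightly utilized: grows one increment
        System.out.println(nextNodeSizeMb(131072, 131072, 0.5)); // prints 147456
    }
}
```

The asymmetry (small increments up, an immediate snap back down) matches the stated design goals of conservative growth and rapid correction.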
NodeManager Self-Preservation Preemption
[Figure: node utilization scale from 0% to 100% with high and low water marks]
● Utilization above high mark triggers preemption
● Preempts enough to reach low mark utilization
● Does not preempt containers below original node size
● Containers preempted in group order
○ Tasks from preemptable queue
○ ApplicationMasters from preemptable queue
○ Tasks from non-preemptable queue
○ ApplicationMasters from non-preemptable queue
● Youngest containers preempted first within a group
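The group order and youngest-first rule can be sketched as a comparator. ContainerInfo and every name below are hypothetical illustrations, not actual NodeManager classes:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch of the preemption victim ordering described above.
public class PreemptionOrder {
    static class ContainerInfo {
        final String id;
        final boolean isAM;             // ApplicationMaster container?
        final boolean preemptableQueue; // launched from a preemptable queue?
        final long startTimeMs;         // larger start time = younger container
        ContainerInfo(String id, boolean isAM, boolean preemptableQueue, long startTimeMs) {
            this.id = id;
            this.isAM = isAM;
            this.preemptableQueue = preemptableQueue;
            this.startTimeMs = startTimeMs;
        }
    }

    // Group order: tasks then AMs from the preemptable queue,
    // then tasks then AMs from the non-preemptable queue.
    static int group(ContainerInfo c) {
        int base = c.preemptableQueue ? 0 : 2;
        return base + (c.isAM ? 1 : 0);
    }

    // Sort into preemption order: by group, then youngest (latest start) first.
    static List<ContainerInfo> victimOrder(List<ContainerInfo> containers) {
        List<ContainerInfo> sorted = new ArrayList<>(containers);
        sorted.sort(Comparator.comparingInt(PreemptionOrder::group)
                .thenComparing(Comparator.comparingLong(
                        (ContainerInfo c) -> c.startTimeMs).reversed()));
        return sorted;
    }
}
```

Preempting youngest-first minimizes lost work, since the oldest containers have the most accumulated progress.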
NodeManager Overcommit Tunables
Parameter               Description                              Value
memory.high-water-mark  Preemption when above this utilization   0.95
memory.low-water-mark   Target utilization after preemption      0.92

Parameters use the yarn.nodemanager.resource-monitor.overcommit. prefix
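A minimal sketch of the self-preservation trigger under these tunables; the method and names are assumptions for illustration, not NodeManager code:

```java
// Hypothetical sketch of the NodeManager's self-preservation check,
// using the tunable values from the table above.
public class SelfPreservation {
    static final double HIGH_WATER_MARK = 0.95; // preempt above this utilization
    static final double LOW_WATER_MARK = 0.92;  // preempt down to this utilization

    // Memory (MB) to reclaim so node utilization falls back to the low water
    // mark; returns 0 when utilization has not crossed the high water mark.
    static long mbToPreempt(long physicalMb, long usedMb) {
        double utilization = (double) usedMb / physicalMb;
        if (utilization <= HIGH_WATER_MARK) {
            return 0;
        }
        long targetMb = (long) (physicalMb * LOW_WATER_MARK);
        return usedMb - targetMb;
    }
}
```

The gap between the two marks provides hysteresis, so a node hovering near the trigger threshold does not preempt on every monitoring cycle.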
Results
Results: Capacity Gained vs. Work Lost
Lessons Learned
● Significant overcommit achievable on real workloads
● Far less preemption than expected
● Container reservations can drive overcommit growth
● Coordinated reducers can be a problem
● Cluster totals over time can be a bit confusing at first
Future Work
● YARN-5202
● Only grows the cluster as a whole, not individual queues
● Nodes can overcommit while others are relatively idle
● CPU overcommit
● Predict growth based on past behavior
● Relinquish nodes during quiet periods
● Integration with YARN-1011
YARN-1011
● Explicit GUARANTEED vs. OPPORTUNISTIC distinction
● Promotion of containers once resources are available
● SLA guarantees along with best-effort load
Acknowledgements
● Nathan Roberts for co-developing overcommit POC
● Inigo Goiri for nodemanager utilization collection and reporting
● Giovanni Matteo Fumarola for nodemanager AM container detection
● YARN-1011 contributors for helping to shape the long-term solution