Landing in the Right Nest: New Negotiation Features for Enterprise Environments
Jason Stowe
New Features for Negotiation
Experience in Enterprise Environments
What is an Enterprise Environment?
Any Organization Using Condor with Demanding Users
Organization = Groups of Demanding Users
- Purchased Compute Capacity
- Guaranteed Minimum Capacity
- Need As Many as Possible, As Soon as They Submit
- Vanilla/Java Universe
- Avoid Preemption
How do we ensure Resources land in the right Group’s Nest?
A valid definition of Enterprise Condor Users?
I started off as a Demanding User
Follow-up to earlier work
Condor Week 2005
Condor for Movies: 75+ Million Jobs
1000+ CPUs (Linux/OS X), 70+ TB Storage
(Project that added AccountingGroups)
Condor Week 2006
Web-based Management Tools, Consulting, and 24/7 Support
A Conversation with Miron
Bob Nordlund’s idea for Condor += Hooks
Configuration with Pipes
CONDOR_CONFIG = cat /opt/condor/condor_config |
(Condor 6.8)
Demanding Condor Uses for Banks/Insurance Companies => This year, new features
Negotiation Policies to Manage the Number of Resources for Groups and Users
What are the Requirements?
- Guaranteed Minimum Quota
- Fast Claiming of Quota
- Avoid Unnecessary Preemption
Three Common Ways
- “Fair Share” User Priority: PREEMPTION_REQUIREMENTS
- Machine RANK
- Accounting Groups: GROUP_QUOTA
Generally these are a progression
Story of a Pool
100 Machines
A = 100
Fair-Share, User Priority
It Works! More Users…
100 Machines
A = 50 B = 50
condor_userprio -setfactor A 2
condor_userprio -setfactor B 2
PREEMPTION_REQUIREMENTS = RemoteUserPrio > SubmittorPrio
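A minimal sketch of the fair-share test above, written in Python as a model rather than Condor code. It assumes Condor's convention that a lower user priority is better and that the effective priority is the real priority multiplied by the factor set with condor_userprio -setfactor; the helper names and the priority values are hypothetical.

```python
# Model of fair-share preemption (an illustration, not Condor source code).
# Assumption: lower priority values are better, and
# effective priority = real priority * priority factor.


def effective_prio(real_prio: float, factor: float) -> float:
    """Effective user priority, as condor_userprio would report it."""
    return real_prio * factor


def may_preempt(remote_user_prio: float, submittor_prio: float) -> bool:
    """Mirrors PREEMPTION_REQUIREMENTS = RemoteUserPrio > SubmittorPrio."""
    return remote_user_prio > submittor_prio


# Hypothetical numbers: A has been busy, so its real priority has grown.
a_prio = effective_prio(real_prio=50.0, factor=2.0)  # 100.0
b_prio = effective_prio(real_prio=0.5, factor=2.0)   # 1.0

if __name__ == "__main__":
    print(may_preempt(a_prio, b_prio))  # True: B's job may preempt A's
    print(may_preempt(b_prio, a_prio))  # False: A's job may not preempt B's
```

Because A and B were given the same factor, scaling both priorities leaves the comparison unchanged; the factor only matters when users are given different values.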
Works Well in Most Cases
Suppose A has all 100 machines, and B submits 100 jobs
User Priorities Are Cached at the Beginning of Negotiation
And Not Updated During the Cycle…
PREEMPTION_REQUIREMENTS = RemoteUserPrio > SubmittorPrio
Standard Universe = No Problem (Preemption doesn’t lose work)
Problem: Vanilla or Java Universe (Work is lost!)
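To see why the cached priorities hurt, here is a toy Python model of the scenario above. It assumes, as a simplification, that a user's priority equals the number of machines it is using right now (real Condor priorities only drift toward usage over time) and that a preemption happens whenever the test passes; the function name is hypothetical.

```python
# Toy model (assumption): a user's priority is simply the number of machines
# it is using now. Lower values are better, as in Condor.


def preempt_b_over_a(cached: bool, pool: int = 100, b_jobs: int = 100) -> int:
    """Count how many of A's running jobs B preempts in one negotiation cycle."""
    usage = {"A": pool, "B": 0}   # A starts out holding the whole pool
    snapshot = dict(usage)        # priorities as read at the start of the cycle
    preempted = 0
    for _ in range(b_jobs):       # B tries to place each of its idle jobs
        prio = snapshot if cached else usage
        # PREEMPTION_REQUIREMENTS = RemoteUserPrio > SubmittorPrio
        if usage["A"] > 0 and prio["A"] > prio["B"]:
            usage["A"] -= 1
            usage["B"] += 1
            preempted += 1
    return preempted


if __name__ == "__main__":
    print(preempt_b_over_a(cached=True))   # 100: every A job is evicted
    print(preempt_b_over_a(cached=False))  # 50: stops at an even split
```

With the snapshot, B keeps looking better than A and evicts all 100 of A's jobs in one cycle; with live values the test stops passing at 50/50. In the Vanilla or Java universe, each of those 50 extra preemptions discards work.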
Dampen these with NEGOTIATOR_MAX_TIME_PER_SUBMITTER
NEGOTIATOR_MAX_TIME_PER_PIESPIN
Slows matching rate, can lead to starvation
Time For RANK
On 50 Machines: RANK = Owner =?= "A"
On 50 Machines: RANK = Owner =?= "B"
Users get their “quota”
Tied to particular machines
50 Machines
A = 50 B = 50
50 Machines
Problem: Group A submits 100 jobs on Empty Pool
A = 50 B = 50
A A
50 Jobs Finish
A = 50 B = 50
A A
Empty Empty
Group B submits 100 jobs; empty machines get jobs
A Jobs on B Machines are preempted
A = 50 B = 50
A
B
B
B Jobs on A Machines are preempted.
A = 50 B = 50
A B
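The sequence above can be replayed with a small Python model. The assumptions are loud ones: machines 0-49 carry RANK = Owner =?= "A" and machines 50-99 carry RANK = Owner =?= "B"; the 50 jobs that finish are taken to be those on machines 25-49 and 75-99; a preempted Vanilla job goes back to its group's idle queue with its work lost; and the helper names are hypothetical.

```python
# Toy replay of the machine-RANK scenario (a model, not Condor code).
# Assumption: machines 0-49 prefer group A, machines 50-99 prefer group B.
OWNER = ["A"] * 50 + ["B"] * 50


def fill_empty(running, idle):
    """Start idle jobs on empty machines (no preemption involved)."""
    for i, job in enumerate(running):
        if job is None:
            for group in ("A", "B"):
                if idle[group] > 0:
                    running[i] = group
                    idle[group] -= 1
                    break


def rank_preempt(running, idle):
    """Let each machine's preferred owner evict other groups' jobs."""
    preemptions = 0
    changed = True
    while changed:
        changed = False
        for i, job in enumerate(running):
            owner = OWNER[i]
            if job is not None and job != owner and idle[owner] > 0:
                idle[job] += 1        # evicted job returns to its idle queue
                running[i] = owner
                idle[owner] -= 1
                preemptions += 1
                changed = True
    return preemptions


def replay():
    running = ["A"] * 100                 # A's 100 jobs filled the empty pool
    idle = {"A": 0, "B": 0}
    for i in list(range(25, 50)) + list(range(75, 100)):
        running[i] = None                 # 50 of A's jobs finish
    idle["B"] = 100                       # B submits 100 jobs
    fill_empty(running, idle)             # B lands on the empty machines
    preemptions = rank_preempt(running, idle)
    return preemptions, running.count("A"), running.count("B")


if __name__ == "__main__":
    print(replay())  # (50, 50, 50)
```

Right after B's jobs land on the empty machines, A and B already hold 50 machines each, so the 50 RANK preemptions that follow only move jobs between machines and throw their work away. That is the motivation for skipping preemption and using empty machines.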
Skip Preemption, Use Empty Machines?
A = 50 B = 50
A A
Empty Empty
A = 50 B = 50
A A
B B
Accounting Groups, GROUP_QUOTA
# New Machines = 200
GROUP_QUOTA_A = 50
GROUP_QUOTA_B = 50
GROUP_QUOTA_C = 50
GROUP_QUOTA_D = 50
GROUP_AUTOREGROUP = True
200 Machines
A = 50 B = 50 C = 50 D = 50
A and B have 100 machines each; how does C get resources?
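A rough Python model of how A and B can end up with 100 machines each. It assumes a simplified auto-regroup rule, not the negotiator's actual algorithm: each group is first matched up to its quota, then any free machines are dealt round-robin to groups that still have idle jobs. The demand figures and the function name are hypothetical.

```python
# Hypothetical allocation model for GROUP_QUOTA + GROUP_AUTOREGROUP.
# Assumption: quota pass first, then leftover machines go round-robin
# to groups whose demand is not yet met.


def allocate(quotas, demand, pool, autoregroup=True):
    """Return (machines per group, machines left free)."""
    alloc = {g: min(quotas[g], demand.get(g, 0)) for g in quotas}
    free = pool - sum(alloc.values())
    if autoregroup:
        while free > 0:
            hungry = [g for g in quotas if demand.get(g, 0) > alloc[g]]
            if not hungry:
                break
            for g in hungry:
                if free == 0:
                    break
                alloc[g] += 1
                free -= 1
    return alloc, free


if __name__ == "__main__":
    quotas = {"A": 50, "B": 50, "C": 50, "D": 50}
    alloc, free = allocate(quotas, {"A": 150, "B": 150}, pool=200)
    print(alloc, free)  # {'A': 100, 'B': 100, 'C': 0, 'D': 0} 0
    # C submits now, but no machine is free, so C gets nothing
    # unless someone's job is preempted.
```

Once A and B hold all 200 machines, C can only reach its quota through preemption, which brings back the PREEMPTION_REQUIREMENTS caching problem.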
PREEMPTION_REQUIREMENTS still has caching/preemption issues
We need access to up-to-date usage/quota information
PREEMPTION_REQUIREMENTS
A Conversation with Todd
SubmitterUserPrio, SubmitterUserResourcesInUse
(RemoteUser as well)
SubmitterGroupQuota, SubmitterGroupResourcesInUse
(RemoteGroup as well)
With Great Power Comes Great Responsibility
IMPORTANT: Turn Off Caching (May Slow Down Negotiation)
PREEMPTION_REQUIREMENTS_STABLE = False
PREEMPTION_RANK_STABLE = False
PREEMPTION_REQUIREMENTS = (SubmitterGroupResourcesInUse < SubmitterGroupQuota) && (RemoteGroupResourcesInUse > RemoteGroupQuota)
PREEMPTION_REQUIREMENTS_STABLE = False
RANK = 0
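A small Python model of the quota-aware PREEMPTION_REQUIREMENTS, evaluated either against live usage (as with PREEMPTION_REQUIREMENTS_STABLE = False) or against a snapshot taken at the start of the cycle, which stands in for caching. The victim-selection rule (evict from the group furthest over quota), the starting usage, and the function names are assumptions for illustration.

```python
# Model of the quota-aware preemption policy (not Condor source code).
QUOTA = {"A": 50, "B": 50, "C": 50, "D": 50}


def requirements(sub, remote, in_use):
    # (SubmitterGroupResourcesInUse < SubmitterGroupQuota) &&
    # (RemoteGroupResourcesInUse > RemoteGroupQuota)
    return in_use[sub] < QUOTA[sub] and in_use[remote] > QUOTA[remote]


def negotiate(submitter, jobs, in_use, stable):
    """Place `jobs` idle jobs for `submitter`, preempting where allowed."""
    in_use = dict(in_use)
    snapshot = dict(in_use)                  # values read at cycle start
    for _ in range(jobs):
        view = snapshot if stable else in_use
        victims = [g for g in in_use
                   if in_use[g] > 0 and requirements(submitter, g, view)]
        if not victims:
            break                            # submitter has reached its quota
        # Assumed PREEMPTION_RANK stand-in: hit the group furthest over quota.
        victim = max(victims, key=lambda g: view[g] - QUOTA[g])
        in_use[victim] -= 1
        in_use[submitter] += 1
    return in_use


if __name__ == "__main__":
    start = {"A": 100, "B": 100, "C": 0, "D": 0}
    print(negotiate("C", 100, start, stable=False))
    # {'A': 75, 'B': 75, 'C': 50, 'D': 0}
    print(negotiate("C", 100, start, stable=True))
    # {'A': 0, 'B': 100, 'C': 100, 'D': 0}
```

With live counts, C stops at its quota of 50 and A and B each give up 25 machines. With the stale snapshot, C keeps qualifying and drains A entirely, ending above its own quota while A falls below its guaranteed minimum.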
Now we have everything needed!
Demanding Groups of Users
- Getting Purchased Compute Capacity (Quota, Not Tied to Particular Machines)
- Getting Guaranteed Minimum Capacity (GROUP_QUOTA)
- Getting As Many as Possible (Auto-Regroup)
- Getting Resources As Soon as They Submit (Typically Within One Negotiation Cycle)
- Avoiding Preemption
A = 50 B = 50
A A
Empty Empty
A = 50 B = 50
A A
B B
condor_status?
It Works! (Patched 6.8 and 6.9+)
Code & Condor Community Process
Where do we go from here?
What did we learn?
Wisconsin Is Working on Making 6.9 Negotiation/Scheduling More Efficient
In the Future: Allow Us to Specify What We Account For per VM/Slot (KFLOPS)?
That’s just me…
Come to Tonight’s Reception
Participate in the Community
Talk with the Condor Team.
Talk with Other Users.
Help the community continue to work well for everyone.
Thank you. Questions?
http://www.cyclecomputing.com
jstowe @ cyclecomputing.com