Building a secure Condor ® pool in an open academic environment Bruce Beckles University of Cambridge Computing Service


Post on 28-Mar-2015


Page 1: Building a secure Condor® pool in an open academic environment

Bruce Beckles
University of Cambridge Computing Service

Page 2: Condor pool characteristics

• Large number (~1000) of similar/identical workstations
• Workstations centrally managed
• Primary purpose of workstations is not running Condor jobs
• Workstations are “public access” machines, i.e. available to all members of the institution

Page 3: Fundamental requirements

• Condor service in this environment must be:
  • Stable:
    • Must not make machines any less stable
  • Low impact:
    • Must be unnoticeable to ordinary users
  • Secure:
    • Must not significantly increase the attack surface

Page 4: Stability

• Only use the current Condor stable series, not the development series
• Extensive testing (months, 1000s of test jobs) on small pool of workstations
• Disable any features of Condor not required by users
• Support only limited subset of Condor functionality (only Vanilla and Java universes)

Page 5: Low impact

• Gather usage statistics of target workstations and only allow Condor to run at periods when they would normally be idle
• Will not run jobs if a user is logged in
  • Custom ClassAd attribute with number of users logged in
• Any user activity aggressively preempts Condor job
  • Issue under standard Linux 2.6 kernels: USB mouse and keyboard activity not detected
• Control Condor job’s environment and sterilise environment after job completion
  • Handles jobs using up all available disk space and not cleaning up after themselves, etc.
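The “custom ClassAd attribute with number of users logged in” could be produced by a small script whose output is merged into the machine's ClassAd. The Python sketch below shows one way this might look. The attribute name `LoggedInUsers` and the reliance on `who`-style text (user name in the first column, one session per line) are assumptions made for illustration, not details given in the talk.

```python
def count_logged_in_users(who_output: str) -> int:
    """Count distinct user names in `who`-style output (one session per line)."""
    users = set()
    for line in who_output.splitlines():
        fields = line.split()
        if fields:  # skip blank lines
            users.add(fields[0])  # first column is assumed to be the user name
    return len(users)


def classad_attribute(who_output: str, name: str = "LoggedInUsers") -> str:
    """Format the count as an 'Attribute = value' line.

    The attribute name is hypothetical; a pool would choose its own.
    """
    return f"{name} = {count_logged_in_users(who_output)}"
```

A machine policy could then refuse to start jobs whenever the attribute is non-zero. Counting distinct names rather than sessions is a design choice: a user with several terminals open still counts once.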

Page 6: Security

• What is our threat landscape?
  • What are we worried about?
• How does this specifically relate to Condor?
  • Specific security concerns…
  • …and how we addressed them

Page 7: Threat landscape

• Threats internal to the environment are at least as significant as external threats:
  • Largest body of users (students) are untrusted
  • No clear separation of use of machines by trusted and untrusted users
• Access (often wholly or largely unrestricted) to the public Internet is a core requirement:
  • Both for normal use of the machines and for Condor jobs
  • Firewalls are of little help

Page 8: Specific security concerns (1)

• Reliable identification of machines:
  • IP addresses useless as identifiers (IP “spoofing”)
  • So “strong” authentication required
• Do not significantly increase the attack surface of machines:
  • No daemons running as root that listen to the network:
    • Privilege separation (see following talk)
• Control access to the Condor pool:
  • Easiest at point of job submission
  • Restricted number of centralised submit nodes

Page 9: Specific security concerns (2)

• Controlling the job execution environment:
  • Inspect job prior to running on machine
  • Start job in a sterile environment
  • Sterilise environment after job has run
  • Job run under dedicated unprivileged user account
• Restrict access to the Condor commands:
  • Ideally develop separate front-end to Condor system
  • Currently just wrapper scripts for Condor commands
  • Can be circumvented (in some cases), so piloting service with relatively trusted users
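The “sterile environment” steps can be illustrated with a minimal Python sketch: give the job a fresh scratch directory and a fixed, minimal set of environment variables, then delete whatever it leaves behind. The function names, the chosen variables and the `/usr/bin:/bin` search path are all assumptions for illustration. Switching to the dedicated unprivileged account is omitted because it needs root privileges.

```python
import os
import shutil
import tempfile


def make_scratch_dir() -> str:
    """Create a fresh, empty scratch directory for one job."""
    return tempfile.mkdtemp(prefix="condor-job-")


def sterile_environment(scratch_dir: str) -> dict:
    """Return a minimal environment for a job, rather than inheriting ours."""
    return {
        "PATH": "/usr/bin:/bin",  # assumed minimal search path
        "HOME": scratch_dir,
        "TMPDIR": scratch_dir,
    }


def sterilise(scratch_dir: str) -> None:
    """Delete everything left inside scratch_dir, keeping the directory itself."""
    for entry in os.scandir(scratch_dir):
        if entry.is_dir(follow_symlinks=False):
            shutil.rmtree(entry.path)
        else:
            os.remove(entry.path)  # files and symlinks (links are not followed)
```

The dictionary returned by `sterile_environment` could be passed as the `env` argument of `subprocess.run` when launching the job, so that nothing from the parent's environment leaks in.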

Page 10: Strong authentication

• Currently only available under UNIX/Linux
• Kerberos or GSI
• GSI:
  • Flawed security paradigm (mandates daemons run as root, etc.)
  • Serious usability and scalability issues
• Kerberos:
  • KDCs provide separate audit trail
  • Plan to use Kerberos elsewhere in the University
  • Support for Kerberos under Windows and MacOS X is being added to Condor; support for GSI is not (functional GSI libraries not available)
  • Bug in Kerberos support in the stable series of Condor:
    • Backported patch from development series to fix
• Kerberos has proved surprisingly easy to deploy and administer in our setup

Page 11: Scalability / Performance

• condor_schedd (job queue management) doesn’t scale well:
  • “Monolithic” process: performs too many different tasks
  • Uses blocking connections in stable series
  • In our experience:
    • Performs very badly above 4,000 jobs
    • Falls over above 10,000 jobs
    • Cannot handle significant numbers of short-running (less than 5 minute) jobs
    • Job overhead is such that jobs need to be about 10 minutes long to be worth running under Condor
• Not much we can do about this:
  • Add more submit nodes as demand on our service rises
  • Educate our users to use service sensibly (e.g. “batch up” short running jobs)
  • Wrap / replace Condor commands to encourage sensible behaviour / mitigate some of these problems
  • Lobby Condor Team to re-design the condor_schedd daemon
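As an illustration of “batching up” short-running jobs, the Python sketch below groups tasks so that each submitted job is expected to run for about 10 minutes, the break-even figure quoted above. The greedy in-order grouping and the function name `batch_up` are assumptions; the talk does not say how batching was done.

```python
from typing import List


def batch_up(durations: List[float], target_seconds: float = 600.0) -> List[List[int]]:
    """Group task indices into batches whose estimated runtime reaches the target.

    durations[i] is the estimated runtime of task i in seconds. Tasks are
    taken in order; a batch is closed once its total reaches target_seconds.
    The final batch may fall short of the target.
    """
    batches: List[List[int]] = []
    current: List[int] = []
    total = 0.0
    for index, seconds in enumerate(durations):
        current.append(index)
        total += seconds
        if total >= target_seconds:
            batches.append(current)
            current, total = [], 0.0
    if current:  # leftover tasks that did not reach the target
        batches.append(current)
    return batches
```

For example, twelve 2-minute tasks become three jobs instead of twelve: two batches of five tasks and one of two. Each batch would then be submitted as a single job that runs its tasks one after another.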

Page 12: Partitioning the pool

• Require ability to only allow jobs from certain users to run on certain machines:
  • No sensible way provided to do this
  • Restriction via lists of users or machines in configuration files / ClassAd attributes is unwieldy and doesn’t scale
• Our method:
  • Machines configured to only accept jobs with particular ClassAd attribute
  • Set automatically by our wrapper scripts based on user’s identity
  • On execute nodes cross check user against independently maintained and distributed (via LDAP) ACL – this prevents users falsifying the ClassAd attributes
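The two-sided check in “Our method” can be sketched in Python as follows. The partition names, user names and function names are hypothetical, and a plain dictionary stands in for the ACL that the talk describes as distributed via LDAP. This is a sketch of the idea, not the wrapper scripts themselves.

```python
from typing import Dict, Set

# Hypothetical ACL: partition name -> users allowed to run jobs there.
# A dictionary stands in for the independently maintained, LDAP-distributed list.
ACL: Dict[str, Set[str]] = {
    "chemistry": {"alice", "bob"},
    "physics": {"carol"},
}


def partition_for_user(user: str, membership: Dict[str, str]) -> str:
    """Submit side: the wrapper picks the job's partition attribute from the user's identity."""
    return membership[user]


def execute_node_accepts(user: str, claimed_partition: str, acl: Dict[str, Set[str]]) -> bool:
    """Execute side: accept only if the ACL independently confirms the claim."""
    return user in acl.get(claimed_partition, set())
```

Because the execute node consults its own copy of the ACL, a user who bypasses the wrapper and sets the attribute by hand is still rejected: `execute_node_accepts("carol", "chemistry", ACL)` is false even though the job claims the chemistry partition.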

Page 13: Architectural overview

• Large number of centrally managed “public access” workstations running Linux
• Jobs only run when no users are logged in
• Centralised submit node(s)
• Wrappers around Condor commands
• Restricted (but still useful) subset of Condor’s functionality
• Machine identity strongly authenticated
• Improved Condor security model:
  • Privilege separation on execute nodes
  • Strict control of job environment

Page 14: Conclusion

• Although Condor not designed for a “hostile” environment, it can be used relatively securely in such environments (some caveats naturally)…
• …under Linux…
• …but a lot of development work is required to achieve this…
• …and it requires the supporting infrastructure of a stable, centrally managed workstation service.
• Improvements to Condor would make this significantly easier:
  • Design for a hostile environment. These days, most environments are.