Migratory Workloads Across Clouds with Nomad


Post on 21-Jan-2017


MIGRATORY WORKLOADS ACROSS CLOUDS WITH NOMAD

Phil Watts DevOps Artificer

PROBLEM STATEMENT

“FLEXING BETWEEN THE CLOUDS”

▸ Goals of Virtualization seem universally applicable

▸ !(Vendor Lock-in)

▸ Not all workloads are valued equally

=> IT Magic Anywhere

SUCCESS CRITERIA

WIN CONDITIONS

‣ Availability of compute resources is independent of the cloud provider

‣ Batch jobs can be allocated based on point-in-time cost metrics

‣ Work segregation based on compliance qualifications

TOOLCHAIN

MY CURRENT “FAVORITE” TOYS

Resources

Image Creation

Infrastructure Provisioning

Service Discovery

Scheduler

Driver

DEFINITIONS: RESOURCE CONTEXT

THE BANE OF TECHNICAL UNDERSTANDING (AKA WORDS):

▸ Region: The isolation boundary of a Nomad Cluster

▸ Datacenter: Low latency, high bandwidth, private network

▸ Resources: The available capacity provided by a node

Common / Comfortable Pattern

           Region         Datacenter
    AWS    Continental    AWS_Region
    GCE    Continental    GCE_Region
    Azure  Location       Location

Ideal Pattern

           Region         Datacenter
    AWS    Global         AWS_Region
    GCE    Global         GCE_Region
    Azure  Global         Sets of Locations

NOMAD ARCHITECTURE - SINGLE REGION VIEW

BDFL FOR WORKLOAD DECISIONS

‣ In Nomad, Datacenters can speak to Region-aware Servers

‣ Datacenters don’t need to be the same platform

‣ Default Region is “global”
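
As a rough illustration of how a node is placed into this model, the sketch below shows a client agent configuration that declares its Region and Datacenter. The datacenter name, data directory, and server address are made-up placeholders, not values from the talk.

```hcl
# Hypothetical Nomad client agent configuration (all names are placeholders).
region     = "global"         # the default Region
datacenter = "aws-us-east-1"  # assumed naming scheme: one Datacenter per cloud region
data_dir   = "/var/lib/nomad"

client {
  enabled = true

  # Address of a Region-aware Nomad Server (4647 is the default RPC port).
  servers = ["nomad-server.example.internal:4647"]
}
```

A node in another cloud would use the same `region` with a different `datacenter` value, for example `gce-us-central1`.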

ARCHITECTURE OF SOLUTION

▸ Nomad Clients potentially provide Resources for Jobs

▸ Communication between Datacenters may need to be secured

▸ Nodes run a Consul Agent and Nomad Client

▸ Nomad Servers “Bin Pack” tasks onto nodes

THREE PICTURES OF THE SAME THING

Single Region / Multi Datacenter (different Clouds)
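
From the job's point of view, the single Region / multi Datacenter layout might look like the minimal sketch below. The datacenter names and the Docker image are assumptions chosen for illustration; the talk does not name them.

```hcl
# Hypothetical job that may be placed in either cloud within one Region.
job "example" {
  region      = "global"
  datacenters = ["aws-us-east-1", "gce-us-central1"]  # placeholder names

  group "app" {
    task "app" {
      driver = "docker"  # assumed driver

      config {
        image = "redis:3.2"
      }
    }
  }
}
```

Listing both Datacenters lets the Nomad Servers pick a node in either cloud; listing only one pins the job to that cloud.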

DEFINITIONS: TASK CONTEXT

WORDS: THE SEQUEL

▸ Task: Desired state declaration of workload

▸ Constraints: Rules limiting where a job can run

▸ Evaluations: Queued requests to compare the desired and present state of work over the region

    ▸ Caused by a state change event

        ▸ Job Completion

        ▸ Node Addition/Subtraction

        ▸ Job Scheduled

▸ Allocations: Mapping of tasks to resources within constraints

JOB TYPES: SERVICE

KEEPING THE SITE UP

▸ Long running jobs that should always be available

▸ Scheduling decisions favor QoS

▸ Example: Ensuring a front end web service is always available
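
A service job for the front-end example could be sketched roughly as follows. The job name, datacenter names, image, and resource figures are all illustrative assumptions.

```hcl
# Hypothetical service job: three copies of a front-end web server.
job "frontend" {
  datacenters = ["aws-us-east-1", "gce-us-central1"]  # placeholder names
  type        = "service"  # long-running; Nomad keeps it at the desired count

  # Replace instances one at a time when the job definition changes.
  update {
    max_parallel = 1
  }

  group "web" {
    count = 3

    task "nginx" {
      driver = "docker"  # assumed driver

      config {
        image = "nginx:1.11"
      }

      resources {
        cpu    = 250  # MHz
        memory = 128  # MB
      }
    }
  }
}
```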

JOB TYPES: BATCH

WHAT TO DO WITH ALL THIS DATA?

▸ A set of work spanning a few minutes to a few days

▸ Based on the Berkeley Sparrow Two Choices model

▸ http://people.eecs.berkeley.edu/~keo/publications/sosp13-final17.pdf

▸ Probes a set of nodes which meet constraints and sends work to the "least loaded" nodes

▸ Example: Tasks to manipulate a queue of data when present
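
A batch job for the queue example might be declared along these lines. The names, image, worker count, and restart settings are assumptions made for the sketch.

```hcl
# Hypothetical batch job: workers that drain a queue and then exit.
job "drain-queue" {
  datacenters = ["aws-us-east-1", "gce-us-central1"]  # placeholder names
  type        = "batch"
  priority    = 30  # below the default of 50, so service work wins contention

  group "workers" {
    count = 5

    # Retry a failed worker twice within 10 minutes, then mark it failed.
    restart {
      attempts = 2
      interval = "10m"
      delay    = "15s"
      mode     = "fail"
    }

    task "worker" {
      driver = "docker"  # assumed driver

      config {
        image = "example/queue-worker:latest"  # placeholder image
      }
    }
  }
}
```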

JOB TYPES: SYSTEM

KEEPING THE LIGHTS ON

▸ A unique job type used to declare jobs which should run on every node which meets the job constraints

▸ Are re-evaluated whenever a node joins the cluster

▸ Example: distributing common tasks, which can benefit from rolling updates, job updates, service discovery
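
A system job could look roughly like the sketch below, which asks for one collector on every Linux node. The job name, image, and datacenter names are invented for illustration.

```hcl
# Hypothetical system job: one log collector per eligible node.
job "log-collector" {
  datacenters = ["aws-us-east-1", "gce-us-central1"]  # placeholder names
  type        = "system"

  # Only nodes that satisfy the constraint receive a copy.
  constraint {
    attribute = "${attr.kernel.name}"
    value     = "linux"
  }

  group "collector" {
    task "collector" {
      driver = "docker"  # assumed driver

      config {
        image = "example/log-collector:latest"  # placeholder image
      }
    }
  }
}
```

No `count` is given: the number of instances follows from the number of nodes that meet the constraint.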

NOMAD SCHEDULING INTERNALS

GETTING FROM WORK AND RESOURCES TO ACCOMPLISHMENTS

▸ Evaluations read the Job Specification and find constraints

▸ Evaluation Brokers maintain the pending queue, priority, and at-least-once delivery

▸ Schedulers submit an Allocation Plan, evaluated for feasibility, followed by priority

▸ Allocations set jobs against resources

LIKE TETRIS FOR WORKLOADS

▸ Tasks require resources

▸ Nodes have “dimensions” of resources

▸ Allocation fits Tasks inside Nodes

BIN PACKING
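
The resource "dimensions" a task asks for are declared in its `resources` stanza. The sketch below uses made-up names and figures to show the shape of such a request.

```hcl
# Hypothetical job whose task declares the resource dimensions it needs.
job "packed" {
  datacenters = ["aws-us-east-1"]  # placeholder name

  group "workers" {
    count = 8

    task "worker" {
      driver = "docker"  # assumed driver

      config {
        image = "example/worker:latest"  # placeholder image
      }

      resources {
        cpu    = 500   # MHz
        memory = 1024  # MB
      }
    }
  }
}
```

As a back-of-the-envelope check, a node advertising 4000 MHz and 8192 MB could hold at most 8 of these tasks (4000 / 500 = 8 and 8192 / 1024 = 8), ignoring any capacity reserved for the host itself.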

TASK GROUPS

PREVENTING TASK SEPARATION ANXIETY

▸ Task Groups allow multiple Tasks to require that they are scheduled on the same node

▸ Are created implicitly for single tasks in isolation

▸ Can be used to enforce compliance elements required to run together

▸ Example: Requiring log shipping co-processes
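
The log shipping example might be expressed as two tasks in one group, as in this sketch. The task names and images are assumptions.

```hcl
# Hypothetical task group: the web task and its log shipper always land on the same node.
job "sample_service" {
  datacenters = ["aws-us-east-1"]  # placeholder name

  group "webservice" {
    task "web" {
      driver = "docker"  # assumed driver

      config {
        image = "nginx:1.11"
      }
    }

    task "log-shipper" {
      driver = "docker"

      config {
        image = "example/log-shipper:latest"  # placeholder image
      }
    }
  }
}
```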

CONSTRAINTS

JUST BECAUSE YOU CAN, DOESN’T MEAN YOU SHOULD

▸ Job Constraints limit the resources available for a particular job group

▸ Constraints can map workloads directly to Customized Hardware such as AWS Placement Groups
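
One way to steer a workload toward specific hardware is to constrain on an attribute the node reports about itself. The sketch below assumes the node exposes its AWS instance type as `${attr.platform.aws.instance-type}`; the instance type, job name, and image are placeholders, and a placement group would more likely need an operator-defined metadata key.

```hcl
# Hypothetical constraint that pins a job to one AWS instance type.
job "gpu_work" {
  datacenters = ["aws-us-east-1"]  # placeholder name

  constraint {
    # Assumed attribute name; check what your nodes actually report.
    attribute = "${attr.platform.aws.instance-type}"
    value     = "p2.xlarge"
  }

  group "compute" {
    task "compute" {
      driver = "docker"  # assumed driver

      config {
        image = "example/gpu-worker:latest"  # placeholder image
      }
    }
  }
}
```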

CONSTRAINTS AND COMPLIANCE

SATISFYING COMPLIANCE REQUIREMENTS

▸ Constraints on datacenter can be used for Data Isolation inside National Boundaries.

    ▸ Healthcare workload that must stay within the EU

▸ Metadata attributes can allow for custom declarations.

    ▸ E.g. PCI DSS Compliance:

        ▸ Maintain network firewall

        ▸ Run Anti-Malware/Anti-Virus protection

        ▸ Monitor and log access

        ▸ Regularly test security systems and procedures.

    job "sample_service" {
      ...
      meta {
        pci_dss = "true"
      }
      group "webservice" {
        constraint {
          # ${meta.<key>} is read from the metadata of the candidate node
          attribute = "${meta.pci_dss}"
          value     = "true"
        }
      }
    }

Constraint Snippet
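
For a metadata constraint to match, the node has to advertise the key in question. A minimal sketch of the client side, assuming the operator has actually verified the node against the PCI DSS checklist, might be:

```hcl
# Hypothetical client agent configuration advertising a compliance flag.
client {
  enabled = true

  meta {
    pci_dss = "true"  # operator-declared; Nomad does not verify the claim
  }
}
```

The EU healthcare case needs no metadata at all: listing only EU Datacenters in the job's `datacenters` keeps the work inside those boundaries.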

CONSTRAINTS: SATISFYING SPECIAL NEEDS

DIFFERENT THINGS ARE DIFFERENT

▸ Not all platforms are created equal

▸ Platform attributes for specifying Cloud Platforms

    job "sample_service" {
      ...
      constraint {
        attribute = "${attr.platform}"
        value     = "aws"
      }
    }

▸ ${attr.platform} = aws may be relevant if you need floating-point (GPU) processing, which AWS offers and GCE doesn’t

RAW EXECS

CHEKHOV’S TASK DRIVER

▸ Unconstrained, un-isolated, disabled by default

▸ Runs as the user Nomad is running as

“IT SEEMS TO BE A DEEP INSTINCT IN HUMAN BEINGS FOR MAKING EVERYTHING COMPULSORY THAT ISN'T FORBIDDEN” ~Robert A. Heinlein

    client {
      options = {
        "driver.raw_exec.enable" = "1"
      }
    }
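
Once the driver is enabled on a client, a task can use it as in the sketch below. The job name, datacenter name, and command are illustrative assumptions.

```hcl
# Hypothetical job using the raw_exec driver: the command runs directly on the host,
# with no isolation, as the same user as the Nomad client.
job "host-maintenance" {
  datacenters = ["aws-us-east-1"]  # placeholder name
  type        = "batch"

  group "maintenance" {
    task "uptime" {
      driver = "raw_exec"

      config {
        command = "/usr/bin/uptime"
      }
    }
  }
}
```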

OPERATOR INTERACTION

RELIABLE MAGIC = OPERATIONS

    $ nomad run -address=$nomad_server jobfile.nomad

‣ Operators schedule jobs against a server

‣ Nomad figures out how/where/when to run tasks

‣ Complex solution through iteration

Phil Watts DevOps Artificer @ REĀN Cloud

@pwattstbd github.com/marsupermammal

www.reancloud.com

import "os"

func presentation() { os.Exit(0) }