
Post on 26-Sep-2020


Distributed Configuration Management with Nomad · 2019. 11. 5.

DISTRIBUTED CONFIGURATION WITH NOMAD

James Rasell: @jrasell


JAMES RASELL - WHO?

▸ Distributed Systems Engineer

▸ Background in Infra & Ops

▸ Generally automate the things other people don’t want to

▸ Creator of Sherpa, Levant and Nomad-Toast

▸ Is Butters Scotch?


NOMAD


NOMAD

QUICK OVERVIEW

▸ Nomad is an easy-to-use, flexible, and performant workload orchestrator

▸ Datacentre and Region aware and can scale above 10000 nodes per cluster

▸ It has native integration with Consul for service discovery and Vault for secret management

▸ Run a variety of workloads including Docker, Java, and QEMU via 3 types of scheduler (service, batch, and system)


ARCHITECTURE OVERVIEW


ARCHITECTURE OVERVIEW

CONSIDERATIONS

▸ Secure and segregated Vault/Consul cluster providing secrets and PKI management

▸ All servers/instances run Nomad client; even the Nomad servers

▸ Flexible workload placement using client meta and class parameters
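
The placement approach in the last bullet can be sketched with Nomad's `node_class` and `meta` client settings plus matching job constraints. The class name `control-plane`, the `role = "vault"` meta key, and the group name below are illustrative assumptions, not values from the talk.

```hcl
# --- Nomad agent configuration (client side) ---
# The class and meta values are hypothetical examples.
client {
  enabled    = true
  node_class = "control-plane"

  meta {
    role = "vault"
  }
}

# --- Job specification fragment (a separate file) ---
# Only place this group on clients whose class and meta match the values
# set in the agent configuration above. The operator defaults to "=".
group "example" {
  constraint {
    attribute = "${node.class}"
    value     = "control-plane"
  }

  constraint {
    attribute = "${meta.role}"
    value     = "vault"
  }
}
```

Class is a single value per client, while meta allows any number of key/value pairs, so the two are often combined: class for the broad pool, meta for finer-grained roles.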


BOOTSTRAPPING


BOOTSTRAPPING

TOOLS, METHOD, PROCESS

▸ Same process, tools, and methodology used for local dev as used in datacentre environments

▸ Utilised Bash, Terraform, and libvirt. Nothing else

▸ Locally the process was fully automated and would complete in under 8 minutes

▸ DC build was (slightly) more controlled


provisioner "remote-exec" {
  inline = [
    "sudo bash -x /var/tmp/infra/3_configure_control_plane_servers.sh",
  ]
}

variable "control_plane_node_nomad_client_tls_key" {
  description = "The TLS key for the control plane Nomad client."
  type        = "string"
}

control_plane_node_nomad_client_tls_key = "${module.vault_data.nomad_client_pki_private_key}"

provisioner "file" {
  content     = "${var.control_plane_node_nomad_client_tls_key}"
  destination = "/var/tmp/infra/nomad_client_key.pem"
}


BOOTSTRAPPING

PROCESS STEPS

▸ Dependencies meant base infrastructure needed to be built in a strictly controlled order

▸ Vault cluster in particular went through 5 stages of building

▸ VMs > Vault install > Vault init/unseal > Vault PKI > Vault Gotun > Nomad server install > Vault Nomad client install > workload pool Nomad client install
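
One way to enforce a strict build order in Terraform is to chain resources with `depends_on`. The following is a minimal sketch under stated assumptions: the `null_resource` names and the script file names are hypothetical, `local-exec` stands in for whatever provisioner the real build used, and the quoted `depends_on` references follow the Terraform 0.11-era syntax seen in the snippets above (newer versions take unquoted references).

```hcl
# Hypothetical stages: each one waits for the previous stage to finish.
resource "null_resource" "vault_install" {
  provisioner "local-exec" {
    command = "bash -x ./1_install_vault.sh" # hypothetical script name
  }
}

resource "null_resource" "vault_init_unseal" {
  depends_on = ["null_resource.vault_install"]

  provisioner "local-exec" {
    command = "bash -x ./2_init_unseal_vault.sh" # hypothetical script name
  }
}

resource "null_resource" "vault_pki" {
  depends_on = ["null_resource.vault_init_unseal"]

  provisioner "local-exec" {
    command = "bash -x ./3_configure_vault_pki.sh" # hypothetical script name
  }
}
```

Later stages (Nomad servers, then Nomad clients) would extend the same chain, so a failure at any stage stops everything that depends on it.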


POST BOOTSTRAP


POST BOOTSTRAP

FIRST STEPS AFTER BOOTSTRAP PROCESS

▸ Stop the gotun process running as a Vault proxy and remove binary (batch job)

▸ Start Fabio for Vault proxying, using traffic shaping to direct traffic to active node (system job)

▸ Start the Consul server job (service job) and then the Consul client job (system job)


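template {
  data = <<EOH
#!/bin/bash

set -e

systemctl stop gotun

# Kill any gotun processes that survived the systemctl stop.
# "|| true" stops "set -e" from aborting the script when pgrep finds nothing.
REMAINING_PS=$(pgrep gotun || true)
if [[ -n "${REMAINING_PS}" ]]; then
  echo "${REMAINING_PS}" | xargs kill
fi

rm -f /usr/local/bin/gotun
rm -f /etc/gotun/config.yaml
EOH

  destination = "local/stop-gotun.sh"
  change_mode = "noop"
  perms       = "777"
}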

consul kv put fabio/config/vault "route weight vault / weight 1.00 tags \"active\""

service_tags = "fabio-vault-urlprefix-/ proto=http"
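
For context, a template like the stop-gotun one above has to sit inside a batch job that executes the rendered script. A minimal sketch, assuming the `raw_exec` driver is enabled on the clients and a datacentre named `dc1`; the job, group, and task names are hypothetical, and the script body is abbreviated to a placeholder.

```hcl
job "stop-gotun" {
  datacenters = ["dc1"] # assumed datacentre name
  type        = "batch" # run once to completion

  group "stop-gotun" {
    task "stop-gotun" {
      # raw_exec runs without isolation, which a script calling systemctl needs.
      driver = "raw_exec"

      config {
        command = "/bin/bash"
        args    = ["${NOMAD_TASK_DIR}/stop-gotun.sh"]
      }

      template {
        data = <<EOH
#!/bin/bash
echo "placeholder: the real script body is the one shown above"
EOH

        destination = "local/stop-gotun.sh"
        change_mode = "noop"
        perms       = "755"
      }
    }
  }
}
```

Because the job type is `batch`, Nomad does not restart the task once the script exits successfully.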


TLS MANAGEMENT


TLS

SHORT TTL TLS AS DEFAULT

▸ Full TLS encryption used from day 1 on infrastructure applications and platform applications

▸ Longest TTL used for a TLS certificate was 720h; short-lived apps used the lowest possible TTLs

▸ TLS took a while to get stable and would likely have been impossible, or hugely time-consuming, as an afterthought


TLS

CERT-MANAGER APPLICATION

▸ Wrote a small application to perform TLS expiry and IP SAN difference checks

▸ Certificates can be automagically replaced if either check fails

▸ Application can run arbitrary commands after replacing a certificate to force TLS rotation

▸ Managed and maintained Nomad, Consul and Vault certificates (batch cron job)
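
A "batch cron job" in Nomad terms is a batch job with a `periodic` stanza. The sketch below shows only that scheduling wrapper; the binary path, schedule, and datacentre name are assumptions, since the slides do not give the application's name, flags, or interval.

```hcl
job "cert-manager" {
  datacenters = ["dc1"] # assumed datacentre name
  type        = "batch"

  periodic {
    cron             = "*/30 * * * *" # illustrative schedule: every 30 minutes
    prohibit_overlap = true           # skip a launch while the previous run is still going
  }

  group "cert-manager" {
    task "cert-manager" {
      driver = "raw_exec" # assumes raw_exec is enabled on the clients

      config {
        # Hypothetical binary path; the real application's location and
        # arguments are not given in the slides.
        command = "/usr/local/bin/cert-manager"
      }
    }
  }
}
```

With certificate TTLs as short as those described, the check interval needs to be comfortably shorter than the shortest TTL being managed.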


TLS

CERT-MANAGER APPLICATION

▸ Used interesting deployment logic to ensure it would run as a batch job on all clients of a class. System Batch, pretty please.

▸ Perform restart calls on applications that didn’t support SIGHUP reload

▸ Please Please Please always plan for TLS reload features on both the app and downstream connections
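
For workloads that Nomad itself runs, the reload-versus-restart choice can also be expressed on the `template` stanza. This is a sketch of that Nomad-native mechanism, not of the cert-manager application's own logic; the task name, binary path, and Vault policy name are hypothetical, and the PKI role path reuses the `pki/issue/rand` example from the bundle-splitting slides.

```hcl
task "app" {
  driver = "exec"

  config {
    command = "/usr/local/bin/app" # hypothetical binary
  }

  vault {
    policies = ["pki-app"] # hypothetical policy allowing pki/issue/rand
  }

  template {
    # Certificate and key are issued together and written to one combined
    # PEM file, so both always come from the same Vault response.
    data = <<EOH
{{ with secret "pki/issue/rand" "common_name=rand.common" "ttl=15m" }}
{{ .Data.certificate }}
{{ .Data.private_key }}
{{ end }}
EOH

    destination = "secrets/tls.pem"

    # If the app reloads TLS material on SIGHUP, signal it in place.
    change_mode   = "signal"
    change_signal = "SIGHUP"

    # If it cannot reload, fall back to a full restart instead:
    # change_mode = "restart"
  }
}
```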


TLS

BUNDLE SPLITTING

▸ Vault TLS bundles require splitting into component parts for use

▸ Used template stanzas and `hairyhenderson/gomplate` to perform splitting magic


template {
  data = <<EOH
{{ with secret "pki/issue/rand" "ttl=15m" "common_name=rand.common" "ip_sans=1.1.1.1" "format=pem" }}
{{ .Data | toJSON }}
{{ end }}
EOH

  destination = "local/bundle.json"
  change_mode = "restart"
}

template {
  left_delimiter  = "(("
  right_delimiter = "))"

  data = <<EOH
{{- printf "%s\n" (datasource "bundle").private_key -}}
EOH

  destination = "local/rand.pem.tmpl"
  perms       = "600"
  change_mode = "noop"
}


config {
  command = "gomplate"

  args = [
    "-d", "bundle=file://${NOMAD_TASK_DIR}/bundle.json?type=application/json",
    "-f", "local/ca.pem.tmpl", "-o", "local/ca.pem",
    "-f", "local/rand.pem.tmpl", "-o", "local/rand.pem",
    "-f", "local/rand-key.pem.tmpl", "-o", "local/rand-key.pem",
    "--", "${NOMAD_TASK_DIR}/app-run-command",
    "--tls-key-path=local/rand-key.pem",
  ]
}
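
The gomplate arguments reference a `local/ca.pem.tmpl` input that the slides do not show. A plausible counterpart, sketched here as an assumption, follows the same pattern as the private-key template and reads the `issuing_ca` field that Vault's PKI issue endpoint returns alongside `certificate` and `private_key`.

```hcl
template {
  # Nomad's own template engine is told to use (( )) as delimiters, so it
  # leaves the {{ }} expression below untouched for gomplate to evaluate.
  left_delimiter  = "(("
  right_delimiter = "))"

  data = <<EOH
{{- printf "%s\n" (datasource "bundle").issuing_ca -}}
EOH

  destination = "local/ca.pem.tmpl"
  change_mode = "noop"
}
```

The split therefore happens in two passes: Nomad renders the Vault response to `bundle.json` and writes the gomplate templates verbatim, then gomplate runs as the task command, extracts each field into its own PEM file, and starts the real application after `--`.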


COCKROACH DB


MOST PEOPLE GET REALLY EXCITED ABOUT RUNNING A DATABASE INSIDE OF A CLUSTER MANAGER LIKE NOMAD; THIS IS GOING TO MAKE YOU LOSE YOUR JOB. GUARANTEED.

Kelsey Hightower - HashiConf 2016


COCKROACH DB

CLUSTERING SETUP

▸ CRDB servers placed across hosts using constraints to ensure HA & redundancy (service job)

▸ Ephemeral disks used to lower the impact of allocation restarts or failures

▸ CRDB requires initialisation before the cluster is ready for use (batch job)

▸ Careful attention paid to job parameters and continual assessment


args = [
  "${NOMAD_TASK_DIR}/cockroach", "init",
  "--certs-dir=${NOMAD_TASK_DIR}",
  "--host=${meta.host}",
]
}

group "db-cluster" {
  ephemeral_disk {
    sticky = true
  }

  count = 3

  constraint {
    distinct_hosts = true
  }

  constraint {
    operator  = "="
    attribute = "${meta.role}"
    value     = "foobar"
  }
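
The "careful attention paid to job parameters" point most likely concerns stanzas such as `update`, `restart`, and `ephemeral_disk`. The slides do not give the values used, so every number below is an illustrative assumption, and the task definition is omitted for brevity.

```hcl
group "db-cluster" {
  count = 3

  # Roll out changes one allocation at a time and revert automatically if
  # the new version does not become healthy.
  update {
    max_parallel     = 1
    min_healthy_time = "30s"
    healthy_deadline = "5m"
    auto_revert      = true
  }

  # Keep data on the same node where possible, and copy it across when an
  # allocation has to move.
  ephemeral_disk {
    sticky  = true
    migrate = true
    size    = 1024 # MB, illustrative
  }

  # Retry a failed task a few times, then wait out the interval rather than
  # marking the allocation as failed.
  restart {
    attempts = 3
    interval = "10m"
    delay    = "30s"
    mode     = "delay"
  }
}
```

Rolling one node at a time matters for a three-node database, since losing a second node during an update would cost the cluster its quorum.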


COCKROACH DB

SCHEMA MANAGEMENT

▸ Table schema stored alongside application code

▸ A small custom application was used to apply schema changes and rollback if needed using ‘gobuffalo/packr’ (batch job)

▸ DB can be seeded with data for development and testing purposes (batch job)


BACKUPS


BACKUPS

BACKING UP DATA

▸ Cockroach DB tables and Consul backed up regularly to external storage using custom apps/wrappers (batch job)

▸ The backup applications had restore commands to allow for easy testing of backup data

▸ Remember: a backup isn’t a real backup until the restore is proved to work

▸ In a full DR situation, the platform could be fully restored in 15 minutes
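
The talk used custom backup applications that are not shown. As a stand-in, the sketch below uses the stock `consul snapshot save` command inside a periodic batch job; the binary path, schedule, and job names are assumptions, and the upload to external storage is left out.

```hcl
job "consul-backup" {
  datacenters = ["dc1"] # assumed datacentre name
  type        = "batch"

  periodic {
    cron             = "0 */6 * * *" # illustrative schedule: every six hours
    prohibit_overlap = true
  }

  group "backup" {
    task "snapshot" {
      driver = "raw_exec" # assumes raw_exec is enabled on the clients

      config {
        command = "/usr/local/bin/consul" # assumed binary location
        args    = ["snapshot", "save", "${NOMAD_TASK_DIR}/consul.snap"]
      }
    }
  }
}
```

The matching `consul snapshot restore` command is what makes the "prove the restore works" bullet testable: a second job can load the snapshot into a scratch cluster and check the result.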


MISC.


MISCELLANEOUS

INGRESS AND DISCOVERY

▸ Separate Fabio used to provide external access to services and UIs (system job)

▸ Internal application service discovery performed using a gRPC Consul resolver
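
Fabio builds its routing table from Consul service tags, so exposing a service externally is mostly a matter of tagging it. A minimal sketch of a task's `service` stanza, assuming Fabio's default `urlprefix-` tag prefix; the service name, port label, and paths are hypothetical.

```hcl
service {
  name = "web"  # hypothetical service name
  port = "http" # port label defined elsewhere in the task

  # Fabio routes requests for /web to healthy instances of this service.
  tags = ["urlprefix-/web"]

  # Fabio only routes to instances with a passing health check.
  check {
    type     = "http"
    path     = "/health" # hypothetical health endpoint
    interval = "10s"
    timeout  = "2s"
  }
}
```

The Vault-facing Fabio shown earlier uses a different tag prefix (`fabio-vault-urlprefix-`), which keeps the two Fabio deployments from picking up each other's routes.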


MISCELLANEOUS

PATH TO PRODUCTION

▸ The entire platform can be built on a local developer machine giving an exact replica of production

▸ Infra tooling and process can be developed and tested in the same manner as application code

▸ All deployments performed using TeamCity, Levant and Nomad-Toast (service job) for automation and observability


MISCELLANEOUS

MONITORING

▸ Consul health checking used extensively to alert to OpsGenie (service job)

▸ All logs shipped to Humio using Filebeat (system job)

▸ Consul, Nomad, Vault, and app metrics shipped to Circonus for analysis and alerting (service job)


FINAL THOUGHT


FINAL THOUGHT

KEY POINTS

▸ Nomad managed all tasks apart from initial minimal bootstrapping

▸ Rely on a single mechanism for running all tasks and workloads

▸ A number of small applications written to perform useful automation tasks
