
Post on 08-Aug-2020


Rebuilding your Cloud, Multiple Times a Day

10/07/2015

Vilmos Nebehaj

Sauce Labs

Hi!

https://github.com/ldx/

@dyingpixel

vilmos@saucelabs.com

Sauce Labs

HQ in San Francisco, 2nd office in Vancouver

~100 employees, almost 50% are engineers

Main product: Selenium and Appium testing in the cloud

Private cloud which runs tens of millions of jobs every month

>500 combinations of OSes, desktop browsers, mobile emulators/simulators and real mobile devices for testing browser, native and hybrid applications

We’re hiring

Immutable Infrastructure

"A large fraction of the flaws in software development are due to programmers not fully understanding all the possible states their code may execute in. In a multithreaded environment, the lack of understanding and the resulting problems are greatly amplified, almost to the point of panic if you are paying attention.

Programming in a functional style makes the state presented to your code explicit, which makes it much easier to reason about, and, in a completely pure system, makes thread race conditions impossible."

Without mutable variables, testing becomes trivial: if we're transforming certain input via a given side effect free function, we always get the same output (referential transparency).
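As a small illustration of the point about side-effect-free functions, here is a sketch in Python. The function names and the host-name data are invented for this example and are not from the talk.

```python
from typing import List


def normalize_pure(hosts: List[str]) -> List[str]:
    """Side-effect-free: builds a new list and never touches its input."""
    return sorted({h.strip().lower() for h in hosts})


def normalize_in_place(hosts: List[str]) -> None:
    """Mutating variant: the caller's list changes underneath it."""
    hosts[:] = sorted({h.strip().lower() for h in hosts})


if __name__ == "__main__":
    raw = ["Web-02 ", "web-01", "WEB-02"]
    # Same input, same output, however often we call it.
    assert normalize_pure(raw) == normalize_pure(raw) == ["web-01", "web-02"]
    assert raw == ["Web-02 ", "web-01", "WEB-02"]  # input is untouched
    normalize_in_place(raw)
    assert raw == ["web-01", "web-02"]  # state changed as a side effect
```

Testing the pure version only needs an input and an expected output; testing the mutating one also means checking what happened to every other holder of the same list.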

Note: this is just an abstraction, of course. If you drill deep enough, at the latest at the CPU instruction level, you find side effects (caches, the TLB, etc.). But as an abstraction, it is still pretty useful.

So what are the downsides?

In several cases, performance is not as good as with simply mutating a data structure in place.

What does it have to do with my infrastructure?

In Ops/DevOps, we have the exact same issue as in the application development space. A large fraction of the problems we face are due to the almost incomprehensible state space of the configuration on our servers.

Think about it: how many configuration files are there on the average server, and how many possible settings in each? What interactions and interference are possible between them?
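To get a feel for the size of that state space, here is a back-of-the-envelope sketch. The file names and setting counts are made-up numbers, and it assumes, purely for illustration, that every setting is an independent on/off switch.

```python
from math import prod
from typing import Dict


def state_space_size(settings_per_file: Dict[str, int],
                     choices_per_setting: int = 2) -> int:
    """Distinct configurations if every setting varies independently."""
    return prod(choices_per_setting ** n for n in settings_per_file.values())


if __name__ == "__main__":
    # Hypothetical server with just three small configuration files.
    server = {"sshd_config": 10, "sysctl.conf": 20, "nginx.conf": 15}
    print(state_space_size(server))  # 2 ** 45 = 35184372088832
```

Even this toy server has tens of trillions of possible configurations, which is why reasoning about a long-lived, hand-modified machine is so hard.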

NO MUTATING STATE

in your infrastructure?

The primary goal of treating your infrastructure as code:

"Enable the reconstruction of the business from nothing but a source code repository, an application data backup, and bare metal resources."

Model | Configuration | Enabling technology
pets | manual, minimal scripting | internet, IP, server hosting
cattle | automated | configuration management software, infrastructure as code
immutable infrastructure | automated, no modification, rebuilding for any change | virtualization, cloud services

Containers vs VMs

Containers

● Security concerns

● Lock you into a specific OS

● No (or minimal) performance penalty

● Lightweight

● Very fast startup times

Virtual Machines

● Fully isolated at the hardware level

● Another layer of security

● Different operating environment (kernel/OS)

● Performance overhead

● Slower boot times

Two repositories:

● sauce-ansible with our inventory, playbooks and roles

● vmbuilder with Packer templates for our VMs/containers

We use branch builds for pull requests in sauce-ansible.

A commit/merge into sauce-ansible master kicks off new image builds for all templates in vmbuilder.

Infrastructure as Code at Sauce


Packer builders

{
  "builders": [
    {
      "type": "virtualbox-iso",
      "guest_os_type": "Ubuntu_32",
      "iso_checksum": "1214cd22448338b60bb24f583dd8741a",
      "iso_url": "http://releases.ubuntu.com/14.04/...",
      ...
    },
    {
      "type": "qemu",
      "format": "qcow2",
      "iso_checksum": "1214cd22448338b60bb24f583dd8741a",
      "iso_url": "http://releases.ubuntu.com/14.04/...",
      ...
    }
  ],
  ...
}
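Both builder stanzas repeat the same ISO settings. One way to keep them in sync is to generate the template from shared values, sketched below. This is an illustration only, not how vmbuilder does it: make_template is a hypothetical helper, and the ISO URL is left truncated exactly as on the slide.

```python
import json
from typing import Any, Dict, List


def make_template(shared: Dict[str, Any],
                  per_builder: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Copy the settings shared by all builders into each builder stanza."""
    builders = [{**shared, **specific} for specific in per_builder]
    return {"builders": builders}


if __name__ == "__main__":
    shared = {
        "iso_checksum": "1214cd22448338b60bb24f583dd8741a",
        "iso_url": "http://releases.ubuntu.com/14.04/...",  # elided on the slide
    }
    template = make_template(shared, [
        {"type": "virtualbox-iso", "guest_os_type": "Ubuntu_32"},
        {"type": "qemu", "format": "qcow2"},
    ])
    print(json.dumps(template, indent=2))
```

Builder-specific keys win over shared ones when both are present, so a single builder can still override a default.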

Packer provisioners

{
  "provisioners": [
    {
      "type": "shell",
      "inline": ["sudo pip install ansible"]
    },
    {
      "type": "file",
      "source": "../ansible",
      "destination": "/tmp/ansible"
    },
    {
      "type": "shell",
      "inline": ["cd /tmp/ansible && ansible-playbook -c local -i inventory chef.yml"]
    }
  ],
  ...
}

Building an LXC image is as simple as:

rm -rf output-lxc
PACKER_CONFIG=/etc/packer.conf packer build -only=lxc ./packer.json

Building a QEMU image:

# Parse command line arguments.
# ...

# Remove output directory in case we get killed.
trap "rm -rf ${OUTDIR}; exit" SIGHUP SIGINT SIGTERM

# Remove any previous build.
rm -rf ${OUTDIR}

# Build image.
packer build -var basename=${NAME} -only=qemu ./packer.json

# Convert image to desired format.
# ...

# Jenkins has problems transferring large images when they are larger than
# 8GB. Split it up into smaller chunks.
# ...
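The splitting step is elided on the slide. As a rough idea of what such a step could look like, here is a sketch in Python; the function names, the ".partNNN" naming scheme and the chunk size are assumptions for this example, not the actual Sauce Labs script.

```python
import os
import shutil
from typing import List


def split_file(path: str, chunk_size: int, buf_size: int = 1 << 20) -> List[str]:
    """Split `path` into numbered parts of at most `chunk_size` bytes each."""
    parts: List[str] = []
    index = 0
    with open(path, "rb") as src:
        while True:
            part_path = f"{path}.part{index:03d}"  # hypothetical naming scheme
            written = 0
            with open(part_path, "wb") as dst:
                while written < chunk_size:
                    data = src.read(min(buf_size, chunk_size - written))
                    if not data:
                        break
                    dst.write(data)
                    written += len(data)
            if written == 0:
                os.remove(part_path)  # source exhausted, drop the empty part
                break
            parts.append(part_path)
            index += 1
    return parts


def join_files(parts: List[str], out_path: str) -> None:
    """Reassemble the parts, in order, into `out_path`."""
    with open(out_path, "wb") as dst:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, dst)
```

Calling split_file(image_path, 4 * 1024**3) would produce parts well under the 8GB limit mentioned in the comment; the receiving side reassembles them with join_files.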

Long image builds

● Relying on a new image being built for any change means you want to minimize image build times

● CI infrastructure you can scale out is key

● VM builds in particular can take a long time

● We split our longest running builds into multiple Jenkins jobs

○ Install base OS

○ Configure system and application(s) in the image

● Jenkins makes it easy to create build pipelines

Deploying images

● Central artifact store (Jenkins)

● Images have unique build numbers

● There are several hundred hypervisors in the Sauce Cloud

● We deploy images to hypervisors in smaller batches via ansible

● The control plane for the Cloud tells hypervisors which image to boot -> easy to roll back
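The batching and rollback ideas in the bullets above can be modeled in a few lines. This is a toy sketch only: ControlPlane, batches and roll_out are hypothetical names, the copy step is a callback standing in for the Ansible run, and the real control plane is not described in the talk.

```python
from typing import Callable, Iterator, List, Optional


def batches(hosts: List[str], size: int) -> Iterator[List[str]]:
    """Yield the hosts in consecutive groups of at most `size`."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    for start in range(0, len(hosts), size):
        yield hosts[start:start + size]


def roll_out(hosts: List[str], build: int, size: int,
             copy_image: Callable[[str, int], None]) -> None:
    """Copy image `build` to every hypervisor, one batch at a time."""
    for batch in batches(hosts, size):
        for host in batch:
            copy_image(host, build)


class ControlPlane:
    """Remembers which image build hypervisors are told to boot."""

    def __init__(self, build: int) -> None:
        self.current = build
        self.previous: Optional[int] = None

    def promote(self, build: int) -> None:
        self.previous, self.current = self.current, build

    def roll_back(self) -> None:
        if self.previous is None:
            raise RuntimeError("no earlier build to roll back to")
        self.current, self.previous = self.previous, None
```

Because images are immutable and carry unique build numbers, rolling back is just a matter of pointing the control plane at the earlier number; nothing on the hypervisors has to be undone.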

Tools recap

Jenkins

CI software, also our artifact (image) store

Ansible

Automation and configuration management

Packer

Building images from a common configuration for different backends

LXC

Linux kernel level containment library and tools

QEMU/KVM

Full virtualization solution for Linux with hardware acceleration

Runtime temporary storage

[Diagram: the VM reads unchanged blocks from the base image, and reads/writes changed blocks in a CoW image that is a snapshot of the base image.]


● Images are immutable, VMs are always started from this clean state

● Temporary storage is provided on the hypervisors via per-VM copy on write images, snapshotted from the immutable image

● For containers, we use aufs for CoW

● Assets created during tests are uploaded to S3

● When a job ends, the VM and its CoW image are destroyed
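The copy-on-write behaviour described above can be modeled with a toy block store. This is only a conceptual sketch; CowImage is a hypothetical class and has nothing to do with the qcow2 or aufs on-disk formats.

```python
from typing import Dict, Tuple


class CowImage:
    """Per-VM overlay on top of a shared, immutable base image."""

    def __init__(self, base: Tuple[bytes, ...]) -> None:
        self._base = base                     # shared, never modified
        self._changed: Dict[int, bytes] = {}  # blocks this VM has written

    def _check(self, block: int) -> None:
        if not 0 <= block < len(self._base):
            raise IndexError(f"block {block} out of range")

    def read(self, block: int) -> bytes:
        """Changed blocks come from the overlay, the rest from the base."""
        self._check(block)
        return self._changed.get(block, self._base[block])

    def write(self, block: int, data: bytes) -> None:
        """Writes only ever land in the overlay."""
        self._check(block)
        self._changed[block] = data


if __name__ == "__main__":
    base = (b"boot", b"root", b"data")
    vm1, vm2 = CowImage(base), CowImage(base)
    vm1.write(1, b"test")
    assert vm1.read(1) == b"test"  # vm1 sees its own change
    assert vm2.read(1) == b"root"  # vm2 still sees the clean image
    assert base[1] == b"root"      # base image is untouched
```

Destroying a VM's overlay discards everything it wrote, so the next VM always starts from the same clean state.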

Testing

[Diagram: Repo PR -> Inventory tests, Playbook tests, Role tests -> Join]


● For end-to-end testing, we have a main integration build

● Several thousand Selenium tests - we’re eating our own dogfood

● No continuous delivery for automatic image deployment into production

TL;DR

Containers are cool. VMs are also cool. Both of them have their use cases.

We use continuous integration for building immutable VM/container images for our cloud.

Images are built on Jenkins in a fully automated fashion. Long builds are split up into multiple jobs and chained together.

Testing is key. Our infra codebase (vmbuilder + sauce-ansible) is tested on every change. We test both our Ansible codebase, via unit and integration tests, and the image artifacts, via end-to-end tests.

Packer is a great tool; you create your image template once, and can use various builders to produce an output image for many different cloud providers and virtualization solutions.

Questions?
