sten spans schuberg philis @sspans (github, etc) · centos == redhat5 or you may have redhat7...

Post on 15-Jul-2020

TRANSCRIPT

CUSTOMER

WHO?

Sten Spans

Schuberg Philis

@sspans (github, etc)

TOPIC

Going from 100 to 10000 systems

Orchestrating a Zone

Not Google-scale

WHY?

New Zone

Rethink principles

Automate

Comments on CentOS 7/KVM

Conceptual or Technical?

WHAT?

SUDO MAKE CLOUD

Networking

Hypervisors

Storage

Orchestration

TOYS

Source: https://www.flickr.com/photos/rfc1036/406675831/

STAFF

GOAL

GOAL

CLOUDY

https://www.flickr.com/photos/versageek/493800514

MISTAKES

Artisanal / Pets

Network not Scalable / Redundant

Stretching Failure-domains

Other technical downsides

Lack of Automation

WHAT IS ARTISANAL?

People tracking MAC addresses

Tweaking settings for each system

Multiple sources of truth

Validation / Acceptance test

Naming - individual servers

NAMING?

Impacts automation

Impacts labeling

Impacts replacements

Go for location-based identities!
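
A minimal sketch of what a location-based identity could look like, assuming a host is identified by zone, pod, rack and rack unit. The naming scheme and all field names are hypothetical, not taken from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    """Physical position of a host; all fields are illustrative assumptions."""
    zone: str
    pod: int
    rack: int
    unit: int

    def hostname(self) -> str:
        # The name encodes where the machine sits, not what it is,
        # so a replacement in the same slot inherits the same identity.
        return f"{self.zone}-p{self.pod:02d}-r{self.rack:02d}-u{self.unit:02d}"


def parse_hostname(name: str) -> Location:
    """Inverse of Location.hostname(); raises ValueError on malformed names."""
    zone, pod, rack, unit = name.rsplit("-", 3)
    if not (pod.startswith("p") and rack.startswith("r") and unit.startswith("u")):
        raise ValueError(f"not a location-based name: {name!r}")
    return Location(zone, int(pod[1:]), int(rack[1:]), int(unit[1:]))
```

Because the name can be derived from the slot and parsed back into it, labels, automation and replacement hardware can all work from the same identity.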

NETWORKING?

Large layer2 domains

Sharing networks between zones

Manual configuration

Not redundant (enough)?

Or more failures due to redundancy?

FAILURE DOMAINS

Do you really want twin-datacenter?

Clustering is complicated…

Way more complicated failures…

Have you actually tested failures?

GOAL

Manage zone as one unit

Capture design / logic in config-management

Versioned Iterations

Think about naming

Think about how you identify hosts

Simplify…

GOAL

Stop managing individual servers (cattle)

Stop being Artisanal

Start scaling

Start Orchestrating

Think Terraform/CloudFormation/Heat
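
Tools in the Terraform/CloudFormation/Heat family work from a declared desired state and derive the actions needed to reach it. A minimal sketch of that idea, assuming resources are plain dictionaries keyed by name; the function and the resource shape are hypothetical, not any tool's actual API.

```python
from __future__ import annotations


def plan(desired: dict[str, dict], actual: dict[str, dict]) -> dict[str, list[str]]:
    """Compare desired and actual resources by name and list the actions needed."""
    create = sorted(desired.keys() - actual.keys())
    delete = sorted(actual.keys() - desired.keys())
    update = sorted(
        name
        for name in desired.keys() & actual.keys()
        if desired[name] != actual[name]
    )
    return {"create": create, "update": update, "delete": delete}
```

The point of the design is that nobody edits an individual server: the zone description changes, and the plan follows from the difference.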

BUILDING BLOCKS

Isolated Networking

Isolated Pods

Worry-free Storage

Optional: Dedicated SDN Clusters

Fully orchestrated zones

BOOTSTRAP NETWORK CORE

Core Switches

LoM switch

Hypervisors

SDN?

CORE SWITCHES

Linux based

Bootstrap via DHCP/HTTP

Chef/Ansible/Puppet supported!

Capture design in cookbooks/playbooks

Can run additional services

SDN

Cluster per (availability) Zone

Failure Domain

Features vs. Lock-in

Complicated? Expensive?

Accept tunnels between zones

Customers will accept trade-offs!

BOOTSTRAP A POD

TOR Switch Pair

LoM switch

Hypervisors

Storage

TOR SWITCHES

Linux Based

Bootstrap via DHCP/HTTP

Chef/Ansible/Puppet supported!

Capture design in cookbooks/playbooks

Can run DHCP/DNS per Pod

Move pod services into the Pod

LOM SWITCHES

Can bootstrap via ToR switch

Config via ToR

Manage iLOs via DHCP hooks

Would love a Linux box here too

HYPERVISORS

Linux Based

Automated Firmware Updates

Bootstrap via DHCP/HTTP

HTTP Bootstrap via Chef

TFTP Proxy on ToR

Location based DHCP (Option 82)
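
DHCP Option 82 (Relay Agent Information, RFC 3046) carries sub-options added by the relaying switch, commonly the Agent Circuit ID (code 1) and the Agent Remote ID (code 2). A minimal sketch of turning that payload into a location follows. The content of both IDs is vendor-specific, so the ASCII values and the `SLOTS` table here are assumptions for illustration only.

```python
from __future__ import annotations

CIRCUIT_ID = 1  # RFC 3046 sub-option, typically identifies the switch port
REMOTE_ID = 2   # RFC 3046 sub-option, typically identifies the relaying switch

# Hypothetical mapping from (remote-id, circuit-id) to a rack slot.
SLOTS = {(b"tor-a", b"eth7"): "p02-r07-u14"}


def parse_option82(payload: bytes) -> dict[int, bytes]:
    """Split the Option 82 payload into {sub-option code: raw value}."""
    subopts = {}
    i = 0
    while i < len(payload):
        if i + 2 > len(payload):
            raise ValueError("truncated sub-option header")
        code, length = payload[i], payload[i + 1]
        value = payload[i + 2 : i + 2 + length]
        if len(value) != length:
            raise ValueError("truncated sub-option value")
        subopts[code] = value
        i += 2 + length
    return subopts


def slot_for(payload: bytes) -> str | None:
    """Look up the rack slot for a request, or None if the port is unknown."""
    subopts = parse_option82(payload)
    key = (subopts.get(REMOTE_ID, b""), subopts.get(CIRCUIT_ID, b""))
    return SLOTS.get(key)
```

With a lookup like this the DHCP server can hand out an identity based on which switch port the request arrived on, so nobody has to track MAC addresses.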

HYPERVISOR HARDWARE

Machines are extremely scalable

Calculate cost per VM

Waiting for 25G Ethernet

Has anybody solved EFI PXE? Please?
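
The "cost per VM" calculation can be sketched as below, assuming hardware cost is written off linearly over its lifetime and each host carries a fixed number of VMs. The formula and parameter names are a simplification for illustration; the talk does not give a formula or figures.

```python
def cost_per_vm_month(
    hardware_cost: float,
    lifetime_months: int,
    monthly_opex: float,
    vms_per_host: int,
) -> float:
    """Monthly cost of one VM: (depreciation + running costs) / VM density."""
    if lifetime_months <= 0 or vms_per_host <= 0:
        raise ValueError("lifetime_months and vms_per_host must be positive")
    monthly_depreciation = hardware_cost / lifetime_months
    return (monthly_depreciation + monthly_opex) / vms_per_host
```

For example, `cost_per_vm_month(36000, 36, 200, 60)` uses made-up numbers: a 36000 host over 36 months with 200 per month in running costs and 60 VMs gives 20.0 per VM per month.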

PROVISIONING

Bootstrap via DHCP/HTTP

Nekopan - Golang webserver

Interfaces with Chef

(or Ansible/Puppet)
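
A rough sketch of the HTTP half of a DHCP/HTTP bootstrap: a small web server that returns per-host boot configuration. This is not Nekopan and not its API. The `/boot` path, the `host` query parameter, the response format and the `ROLES` table (standing in for a lookup against Chef, Ansible or Puppet) are all assumptions.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Hypothetical stand-in for a query against the config-management inventory.
ROLES = {"ams1-p02-r07-u14": "hypervisor"}


def render_boot_config(host: str) -> Optional[str]:
    """Return the boot configuration for a known host, or None if unknown."""
    role = ROLES.get(host)
    if role is None:
        return None
    return f"hostname={host}\nrole={role}\n"


class BootHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        url = urlparse(self.path)
        host = parse_qs(url.query).get("host", [""])[0]
        body = render_boot_config(host) if url.path == "/boot" else None
        if body is None:
            self.send_error(404, "unknown host or path")
            return
        data = body.encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def make_server(port: int = 8080) -> HTTPServer:
    """Build the server; call .serve_forever() on the result to run it."""
    return HTTPServer(("", port), BootHandler)
```

Running `make_server().serve_forever()` and requesting `/boot?host=ams1-p02-r07-u14` would return the two-line configuration for that host. Unknown hosts get a 404.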

STORAGE

Stable

NFS – For now…

API Driven

No fancy replication / clustering

DONE?

Let's add all of this to CloudStack…

CLOUDSTACK

SDN providers need work

cloudstack-setup-agent is … horrible

RouterVM/SystemVM

Small networking issues

And I bet there is more…

THE HORROR:

WHAT IS GOING ON?

All Ubuntu is the same…

Fedora == Red Hat 6

CentOS == Red Hat 5

Or you may have Red Hat 7

Really? WTF?
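
For contrast, a sketch of distribution detection that keeps the distribution ID and major version separate instead of folding distributions together. It assumes the host ships an os-release(5) file, as systemd-based releases such as CentOS 7 do. It is not how cloudstack-setup-agent works, and the function names are hypothetical.

```python
import shlex


def parse_os_release(text: str) -> dict:
    """Parse os-release(5) content into a dict of KEY -> unquoted value."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        parts = shlex.split(raw)  # strips shell-style quoting if present
        fields[key] = parts[0] if parts else ""
    return fields


def distro(text: str) -> tuple:
    """Return (ID, major version), e.g. ("centos", "7")."""
    fields = parse_os_release(text)
    major = fields.get("VERSION_ID", "").split(".")[0]
    return fields.get("ID", "unknown"), major
```

On a real host the input would come from reading `/etc/os-release`. Callers can then branch on the exact pair rather than assuming, say, that every CentOS behaves like Red Hat 5.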

RESULTS ON CENTOS 7

SELinux is disabled (revert broken)

Firewall changes don’t work for firewalld

Cgroup changes are not that cool really

Workarounds for old bugs result in breakage on newer systems

So I reinstalled the box

CENTOS 7 STATUS

SELinux seems to work

Labeled NFS is still bleeding edge

No need to mess with cgroups

Firewalld is pretty nice really

CloudStack should perhaps audit the config

But please don’t change it…
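
"Audit but don't change" could look like the sketch below: compare what is observed on the host with what is expected and report the differences, without touching anything. The keys and expected values are hypothetical. On a real host the observed values might come from commands such as `getenforce` and `firewall-cmd --state`.

```python
def audit(observed: dict, expected: dict) -> list:
    """Return human-readable findings; this function never modifies the host."""
    findings = []
    for key, want in sorted(expected.items()):
        got = observed.get(key, "<missing>")
        if got != want:
            findings.append(f"{key}: expected {want!r}, found {got!r}")
    return findings
```

An empty list means the host matches expectations. Anything else is a report for the operator (or config management) to act on.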

ROUTERVM

We run Ansible to hotfix/manage RouterVMs

But IP / kernel command line not available on KVM :(

qemu-guest-agent solves that and more…

Libvmi – not sure
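
As an illustration of what the guest agent can provide, the sketch below extracts IPv4 addresses from the reply to the `guest-network-get-interfaces` agent command. On a libvirt host the reply could be fetched with `virsh qemu-agent-command <domain> '{"execute":"guest-network-get-interfaces"}'`. The function name is hypothetical and the inventory integration (for example with Ansible) is left out.

```python
import json


def guest_ipv4_addresses(reply: str) -> dict:
    """Map interface name -> list of IPv4 addresses from the agent's JSON reply."""
    result = {}
    for iface in json.loads(reply)["return"]:
        result[iface["name"]] = [
            addr["ip-address"]
            for addr in iface.get("ip-addresses", [])
            if addr.get("ip-address-type") == "ipv4"
        ]
    return result
```

The resulting mapping gives a management tool an address to connect to without relying on the kernel command line.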
