
Upload: nicolas-brousse

Post on 15-Jul-2015


Page 1: Improving Operations Efficiency with Puppet

Improving Operations Efficiency with Puppet

April 17th, 2015

Nicolas Brousse | Sr. Director Of Operations Engineering
Julien Fabre | Site Reliability Engineer

Page 2: Improving Operations Efficiency with Puppet

Who are we?

TubeMogul
●  Enterprise software company for digital branding
●  Over 27 billion ads served in 2014
●  Over 30 billion ad auctions per day
●  Bids processed in less than 50 ms
●  Bids served in less than 80 ms (including network round trip)
●  5 PB of monthly video traffic served
●  1.1 EB of data stored

Operations Engineering
●  Ensure the smooth day-to-day operation of the platform infrastructure
●  Provide a cost-effective and cutting-edge infrastructure
●  Team composed of SREs, SEs and DBAs
●  Managing over 2,500 servers (virtual and physical)

Page 3: Improving Operations Efficiency with Puppet

Our Infrastructure

Public Cloud On Premises

Multiple locations with a mix of Public Cloud and On Premises

Page 4: Improving Operations Efficiency with Puppet

Technology Hoarders

●  Java (a lot!)
●  MySQL
●  Couchbase
●  Vertica
●  Kafka
●  Storm
●  Zookeeper, Exhibitor
●  Hadoop, HBase, Hive
●  Terracotta
●  ElasticSearch, Kibana
●  LogStash
●  PHP, Python, Ruby, Go...
●  Apache httpd
●  Nagios
●  Ganglia
●  Graphite
●  Memcached
●  Puppet
●  HAproxy
●  OpenStack
●  Git and Gerrit
●  Gor
●  ActiveMQ
●  OpenLDAP
●  Redis
●  Blackbox
●  Jenkins, Sonar
●  Tomcat
●  Jetty (embedded)
●  AWS DynamoDB, EC2, S3...

Page 5: Improving Operations Efficiency with Puppet

Five Years Of Puppet!

●  2008 - 2010: Used SVN, Bash scripts and custom templates.
●  2010: Managing about 250 instances. Started looking at Puppet.
●  2011: Started with Puppet 0.25, then upgraded to 2.7 by end of year on 400 servers with 2 contributors.
●  2012: 800 servers managed by Puppet. 4 contributors.
●  2013: 1,000 servers managed by Puppet. 6 contributors.
●  2014: 1,500 servers managed by Puppet. Workflow using Git, Gerrit and Jenkins. 9 contributors. Started migration to 3.7.
●  2015: 2,000 servers managed by Puppet. 13 contributors.

Page 6: Improving Operations Efficiency with Puppet

Puppet Stats

●  2,000 nodes
●  225 unique node definitions
●  1 puppetmaster
●  112 Puppet modules

Page 7: Improving Operations Efficiency with Puppet

Where and how do we use Puppet?

●  Virtual and physical server configuration: Master mode
●  Building AWS AMIs with Packer: Master mode
●  Local development environment with Vagrant: Master mode
●  OpenStack deployment: Masterless mode

Page 8: Improving Operations Efficiency with Puppet

Code Review?

Page 9: Improving Operations Efficiency with Puppet

A Powerful Gerrit Integration

●  Gerrit, an industry standard: Eclipse, Google, Chromium, OpenStack, Wikimedia, LibreOffice, Spotify, GlusterFS, etc.
●  Fine-grained permission rules
●  Plugged into LDAP
●  Code review per commit
●  Stream events
●  Uses GitBlit
●  Integrated with Jenkins and Jira
●  Managing about 600 Git repositories

Page 10: Improving Operations Efficiency with Puppet

Gerrit in Action

Page 11: Improving Operations Efficiency with Puppet

Continuous Delivery with Jenkins

●  1 job per module
●  1 job for the manifests and Hiera data
●  1 job for the Puppet fileserver
●  1 job to deploy

Global Jenkins stats for the past year
●  ~10,000 Puppet deployments
●  Over 8,500 production app deployments

Page 12: Improving Operations Efficiency with Puppet

Team Awareness: HipChat Integration with Hubot

Page 13: Improving Operations Efficiency with Puppet

Puppet Continuous Delivery Workflow: The Vision

Infrastructure As Code
●  Follow standard development lifecycle
●  Repeatable and consistent server provisioning

Continuous Delivery
●  Iterate quickly
●  Automated code review to improve code quality

Reliability
●  Improve Production Stability
●  Enforce Better Security Practices

Page 14: Improving Operations Efficiency with Puppet

The Workflow

Page 15: Improving Operations Efficiency with Puppet

The Workflow: Puppet code logic

Puppet environments
●  Dedicated node manifests (*.pp)
●  Modules deployed by branch with Git submodules

All the data in Hiera
●  Try to avoid the params.pp class
●  Store everything: module parameters, classes, keys, passwords, ...

Page 16: Improving Operations Efficiency with Puppet

Puppet Code Hierarchy

    /etc/puppet
    ├── puppet.conf, hiera.yaml, *.conf
    ├── hiera
    └── environments
        ├── dev
        │   ├── manifests
        │   │   ├── nodes/*.pp
        │   │   └── site.pp
        │   └── modules              (Git submodules, branch dev)
        │       ├── activemq
        │       ├── apache
        │       ├── apf
        │       ...
        │       └── zookeeper
        └── production
            ├── manifests
            │   ├── nodes/*.pp
            │   └── site.pp
            └── modules              (Git submodules, branch production)
                ├── activemq
                …
                └── zookeeper

Page 17: Improving Operations Efficiency with Puppet

Hiera Configuration

    $ cat /etc/puppet/hiera.yaml
    ---
    :backends:
      - eyaml
      - yaml
    :yaml:
      :datadir: /etc/puppet/hiera
    :eyaml:
      :datadir: /etc/puppet/hiera
      :extension: 'yaml'
      :pkcs7_private_key: /var/lib/puppet/hiera_keys/private_key.pkcs7.pem
      :pkcs7_public_key: /var/lib/puppet/hiera_keys/public_key.pkcs7.pem
    :hierarchy:
      - fqdn/%{::fqdn}
      - "%{::zone}/%{::vpc}/%{::hostgroup}"
      - "%{::zone}/%{::vpc}/all"
      - "%{::zone}/%{::hostgroup}"
      - "%{::zone}/all"
      - hostname/%{::hostname}
      - hostgroup/%{::hostgroup}
      - environment/%{::environment}
      - common
    :merge_behavior: deeper
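To make the lookup order concrete, here is a minimal Python sketch of how a hierarchy like the one above could be resolved for a single node. It is an illustration only, not Hiera's implementation: the facts and data in FACTS and DATA are hypothetical, levels that need a missing fact are simply skipped, and the deeper merge behavior is not modeled (only a first-match priority lookup).

```python
import re

# Hierarchy levels copied from the hiera.yaml above (highest priority first).
HIERARCHY = [
    "fqdn/%{::fqdn}",
    "%{::zone}/%{::vpc}/%{::hostgroup}",
    "%{::zone}/%{::vpc}/all",
    "%{::zone}/%{::hostgroup}",
    "%{::zone}/all",
    "hostname/%{::hostname}",
    "hostgroup/%{::hostgroup}",
    "environment/%{::environment}",
    "common",
]

# Hypothetical facts and data, for illustration only.
FACTS = {
    "fqdn": "web01.example.com",
    "hostname": "web01",
    "zone": "us-east-1",
    "vpc": "prod",
    "hostgroup": "web",
    "environment": "production",
}
DATA = {
    "us-east-1/all": {"ntp::server": "ntp.us-east-1.example.com"},
    "common": {"ntp::server": "pool.ntp.org", "motd": "welcome"},
}

_TOKEN = re.compile(r"%\{::(\w+)\}")


def resolve_paths(hierarchy, facts):
    """Interpolate facts into each level; skip levels that need a missing fact."""
    paths = []
    for level in hierarchy:
        needed = _TOKEN.findall(level)
        if all(facts.get(name) for name in needed):
            paths.append(_TOKEN.sub(lambda m: facts[m.group(1)], level))
    return paths


def lookup(key, hierarchy, facts, data):
    """Return the value from the first (highest-priority) source defining key."""
    for path in resolve_paths(hierarchy, facts):
        source = data.get(path, {})
        if key in source:
            return source[key]
    raise KeyError(key)


if __name__ == "__main__":
    for path in resolve_paths(HIERARCHY, FACTS):
        print(path)
    print(lookup("ntp::server", HIERARCHY, FACTS, DATA))
```

In this sketch the zone-level file overrides common for ntp::server, while motd falls through to common, which is the behavior the ordering of the hierarchy is meant to give.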

Page 18: Improving Operations Efficiency with Puppet

Encrypt Your Secrets

Hiera eyaml: github.com/TomPoulton/hiera-eyaml
●  Hiera backend
●  Easy to use
●  Powerful CLI: eyaml edit /etc/puppet/hiera/secrets.yaml

    $ cat secret.yaml
    ---
    ec2::access_key_id: ENC[PKCS7,MIIBiQYJKoZIhvcNAQcDoIIBejCCAXYCAQAxggEhMII IBHQIBADAFMAACAQEwDQYJKoZIhvcNAQEBBQAEggEAVIa28OwyaqI5N1TDCvVkBZz3YG+s+Hfzr0lqgcvRCIuJGpq28sQmmuBaQjWY38i86ZSFu0gM6saOHfG64OzVlurO7k/l0CKeL0JfXNaVM4TUqMaN9dSkL5e2vsmpLKrMASawmarqbLYwllTrTe32H4NWxU1e+qWLeUMr9ciBnA3W1Azm4RIo+3bsvgvMfdks....=]

Page 19: Improving Operations Efficiency with Puppet

Encrypt Files

Blackbox: github.com/StackExchange/blackbox
●  Use GPG to encrypt secret files
●  Easy to add/delete team members
●  No need to change your Puppet code!

    # modules/${module_name}/files/credentials.yaml.gpg
    file { '/etc/app/credentials.yaml':
      ensure => 'file',
      owner  => 'root',
      group  => 'root',
      mode   => '0644',
      # Double quotes are required for ${module_name} to be interpolated
      source => "puppet:///modules/${module_name}/credentials.yaml",
    }

Page 20: Improving Operations Efficiency with Puppet

The Workflow

Page 21: Improving Operations Efficiency with Puppet

The Workflow : bottlenecks

●  Only Ops team members can commit (SRE, SE)

●  Review and validation are done only by an SRE

●  Jenkins will verify the code but will not validate the commit

●  Static Puppet environments

●  Rely a lot on server hostnames

Page 22: Improving Operations Efficiency with Puppet

Puppet Workflow Reloaded!

Flexibility: R10K (github.com/adrienthebo/r10k)
●  Dynamic environments
●  No Git submodules anymore! :-)
●  Easy to reproduce any environment
●  Can use private and Forge Puppet modules
●  Can use branches and tags
●  Based on Puppetfile

Page 23: Improving Operations Efficiency with Puppet

R10K

    $ cat Puppetfile
    forge "https://forgeapi.puppetlabs.com"

    # Forge modules
    mod 'pdxcat/collectd'
    mod 'puppetlabs/rabbitmq'
    mod 'arioch/redis'
    mod 'maestrodev/wget'
    mod 'puppetlabs/apt'
    mod 'puppetlabs/stdlib'

    # Tubemogul modules
    mod "hosts",
      :git    => 'ssh://<gerrit_host>/puppet/modules/hosts',
      :branch => 'dev'
    mod "timezone",
      :git    => 'ssh://<gerrit_host>/puppet/modules/timezone',
      :branch => 'dev'
    ...

Page 24: Improving Operations Efficiency with Puppet

Puppet Workflow Reloaded!

Better code organization: Roles and Profiles
●  Represent the business logic: Roles
    o  Highest abstraction layer
    o  Use Profiles for implementation
●  Implement the applications: Profiles
    o  Remove potential code duplication
    o  Use modules and other Puppet resources

Page 25: Improving Operations Efficiency with Puppet

Roles/Profiles Pattern

    class role::logs {
      include profile::base
      include profile::logstash::server
      include profile::elasticsearch
    }

    # Named profile::logstash::server to match the include above and the Hiera keys below
    class profile::logstash::server {
      $version    = hiera('profile::logstash::server::version', '1.4.2')
      $es_host    = hiera('profile::logstash::server::es_host', 'es01')
      $redis_host = hiera('profile::logstash::server::redis_host', 'redis01')

      class { 'logstash':
        package_url  => "https://download.elasticsearch.org/logstash/.../logstash_${version}.deb",
        java_install => true,
      }

      logstash::configfile { 'input_redis':
        content => template('logstash/configfile/logstash.input_redis.conf.erb'),
        order   => 10,
      }

      logstash::configfile { 'output_es':
        content => template('logstash/configfile/logstash.output_es.conf.erb'),
        order   => 30,
      }
    }

Page 26: Improving Operations Efficiency with Puppet

Puppet Workflow Reloaded!

Do not rely on hostname: nodeless approach
●  Facts to guide Puppet
●  No node myawesomeserver { } anymore
●  Enforce a cluster vision
●  site.pp gives the configuration logic

    # /etc/puppet/manifests/site.pp
    node default {
      if $::ec2_tag_tm_role {
        notify { "Using role : ${::ec2_tag_tm_role}": }
        include "role::${::ec2_tag_tm_role}"
      } else {
        fail('No role found. Nothing to configure.')
      }
    }

Page 27: Improving Operations Efficiency with Puppet

AWS EC2 tags

●  Specify tags during the provisioning
●  Retrieve tags with AWS Ruby SDK and create facts

    $ facter -p | grep ec2_tag
    ec2_tag_cluster => rtb-bidder
    ec2_tag_nagios_host => mgmt01
    ec2_tag_name => bidder
    ec2_tag_pupenv => production
    ec2_tag_tm_role => rtb::bidder

●  New hierarchy

    :hierarchy:
      - "%{::zone}/%{::ec2_tag_vpc}/%{::ec2_tag_cluster}"
      - "%{::zone}/%{::ec2_tag_vpc}/all"
      - "%{::zone}/all"
      - vpc/%{::ec2_tag_vpc}/%{::ec2_tag_cluster}
      - vpc/%{::ec2_tag_vpc}/all
      - environment/%{::environment}
      - common
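The tag-to-fact step can be sketched as follows. The talk uses the AWS Ruby SDK inside a custom fact; the Python below is a swapped-in illustration of the same idea, not the authors' code. It assumes tags arrive as a list of Key/Value pairs (the shape AWS APIs return), and the function names tags_to_facts and role_class, the EXAMPLE_TAGS list and the exact tag keys are hypothetical. The values mirror the facter output above.

```python
import re

# Hypothetical tag list; values mirror the facter output shown above.
EXAMPLE_TAGS = [
    {"Key": "cluster", "Value": "rtb-bidder"},
    {"Key": "nagios_host", "Value": "mgmt01"},
    {"Key": "Name", "Value": "bidder"},
    {"Key": "pupenv", "Value": "production"},
    {"Key": "tm_role", "Value": "rtb::bidder"},
]


def tags_to_facts(tags):
    """Map each EC2 tag to an ec2_tag_<key> fact (key lowercased, non-word chars -> '_')."""
    facts = {}
    for tag in tags:
        name = re.sub(r"\W+", "_", tag["Key"].strip().lower())
        facts["ec2_tag_" + name] = tag["Value"]
    return facts


def role_class(facts):
    """Mirror the site.pp logic: pick the role class from the tm_role tag or fail."""
    role = facts.get("ec2_tag_tm_role")
    if not role:
        raise RuntimeError("No role found. Nothing to configure.")
    return "role::" + role


if __name__ == "__main__":
    facts = tags_to_facts(EXAMPLE_TAGS)
    for name in sorted(facts):
        print(f"{name} => {facts[name]}")
    print(role_class(facts))
```

The point of the design is that nothing here depends on the hostname: the role and the Hiera hierarchy levels are derived entirely from tags set at provisioning time.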

Page 28: Improving Operations Efficiency with Puppet

New merging and reviewing rules

●  Everyone can commit Puppet code
●  Allow everyone to review a Puppet change (+1)

●  Allow SE and SRE to validate a Puppet change (+2)

●  Automatic validation and merging in dev if at least 80% of tests pass (+2)

Page 29: Improving Operations Efficiency with Puppet

Next improvements

●  Acceptance testing with Beaker and Docker ●  Full test provisioning with ServerSpec

●  PuppetDB to improve the reporting

●  Dedicated Puppet Masters

Page 30: Improving Operations Efficiency with Puppet

OpenSource Modules

●  tubemogul-aptly
●  tubemogul-blackbox
●  tubemogul-codedeploy
●  tubemogul-gor
●  tubemogul-packer
●  tubemogul-tmfile
●  tubemogul-storm
●  tubemogul-kafka

Page 31: Improving Operations Efficiency with Puppet

Nicolas Brousse: @orieg
Julien Fabre: @julien_fabre