the road to the white house with puppet & aws
DESCRIPTION
Learn how the Obama campaign leveraged Amazon Web Services (AWS) and Puppet to rapidly and sustainably scale its infrastructure for the needs of the election. Using the automation that AWS and Puppet enabled, the Obama campaign built a significant AWS infrastructure (http://awsofa.info) with a lean DevOps team, tight deadlines and applications that needed to be highly available. Learn about bootstrapping Puppet on Amazon EC2 instances with CloudInit, using it with Auto Scaling groups, and secure handling of credentials in manifests. Find out how to scale Puppet masters and take advantage of Amazon S3 backed RPM/Debian repos with them.
Leo Zhadanovsky, Senior Solutions Architect, Amazon Web Services
Leo Zhadanovsky is a Senior Solutions Architect at Amazon Web Services. He helps customers make the best use of AWS services to build highly available, scalable and elastic architectures for their business needs. He was previously the Director of Systems Engineering at the Democratic National Committee. From 2009 to early 2013, he ran the DNC's physical server and cloud footprint and supported infrastructure that was in use by the Obama campaign and by state and local Democratic parties. In 2010, the DNC successfully deployed and ran many applications, such as a Call Tool and a Voter Registration website, that were written in Ruby and ran on AWS. In 2012, the DNC supported the Obama campaign with various backend APIs, web sites, voter file databases and a large data warehouse.
TRANSCRIPT
The Road to the White House with Puppet & AWS
Leo Zhadanovsky – Solutions Architect – [email protected] @leozh
What am I talking about today?
What was OFA Tech?
• Who did it?
• What did they build?
How did they do that?
• Technologies and Tradeoffs
• Services vs. Software
How did they leverage puppet?
What did they learn from building something so big?
Who Am I?
I work for AWS
I worked for the DNC 2009-2012
I was embedded at OFA
AWS does not endorse political candidates
I love Star Trek (TNG is the best)
So here’s the Idea
~30th biggest E-commerce operation, globally
~200 distinct new applications, many mobile
Hundreds of new, untested analytical approaches
Processing hundreds of TB of data on thousands of servers
Spikes of hundreds of thousands of concurrent users
FUN FUN FUN
a few constraints…
Critically compressed budget
Less than a year to execute
Volunteer and near-volunteer development team
Core systems will be used for a single critical day
Constitutionally-mandated completion date
NOT FUN NOT FUN NOT FUN
Built by guys and gals like these: Obama For America
Business as usual..
…for a technology startup
Election Day – OFA Headquarters
So they built it all, and it worked
Typical Charts
How?
The old approach, even from Amazon
The old approach.. Might have some problems..
No Up-Front Capital Expense
Pay Only for What You Use
Self-Service Infrastructure
Easily Scale Up and Down
Improve Agility & Time-to-Market
Low Cost
Cloud Computing Benefits
Deploy
OFA’s Infrastructure
awsofa.info
Web-Scale Applications
500k+ IOPS DB Systems
Services API
Ingredients
Ubuntu nginx boundary Unity jQuery SQLServer hbase NewRelic EC2 node.js Cybersource hive ElasticSearch Ruby Twilio EE S3 ELB boto Magento PHP EMR SES Route53 SimpleDB Campfire nagios Paypal CentOS CloudSearch levelDB mongoDB python securitygroups Ushahidi PostgreSQL Github apache bootstrap SNS cloudformation Jekyll RoR EBS FPS VPC Mashery Vertica RDS Optimizely MySQL puppet tsunamiUDP R asgard cloudwatch ElastiCache cloudopt SQS cloudinit DirectConnect BSD rsync STS Objective-C DynamoDB
Data Stores
Development Frameworks
Infrastructure, Configuration Management & Monitoring
Configuration Management: Puppet
In mid-2011, we looked at options for configuration management and chose Puppet
We needed to make it scale, and to get it to work with stateless, horizontally scalable infrastructure
How did we do this?
Bootstrapping Puppet with CloudInit
CloudInit is built into Ubuntu and Amazon Linux
• Allows you to pass bootstrap parameters in the Amazon EC2 user-data field, in YAML format
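To make the user-data idea concrete, here is a minimal Python sketch that assembles a cloud-config document of the kind CloudInit reads. The package names and the shell command are hypothetical placeholders, not values from the campaign; only the `#cloud-config` header and the `packages` and `runcmd` keys are standard cloud-config.

```python
# Sketch: build a cloud-config (YAML) user-data string for an EC2 instance.
# The packages and commands passed in below are hypothetical examples.

EC2_USER_DATA_LIMIT = 16 * 1024  # EC2 accepts at most 16 KB of raw user data


def build_user_data(packages, commands):
    """Return a cloud-config document that installs packages and runs commands."""
    lines = ["#cloud-config", "packages:"]
    lines += [f"  - {name}" for name in packages]
    lines.append("runcmd:")
    # Each command is emitted as a double-quoted YAML string; this sketch
    # assumes the commands contain no double quotes or backslashes.
    lines += [f'  - "{command}"' for command in commands]
    document = "\n".join(lines) + "\n"
    if len(document.encode("utf-8")) > EC2_USER_DATA_LIMIT:
        raise ValueError("user data exceeds the 16 KB EC2 limit")
    return document


user_data = build_user_data(
    packages=["puppet", "s3cmd"],
    commands=["echo bootstrap finished"],
)
print(user_data)
```

The resulting string is what would be pasted into (or passed as) the instance's user-data field at launch.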
Bootstrapping Puppet with CloudInit
Don’t store creds in puppet manifests, store them in private Amazon S3 buckets
Either pass Amazon S3 creds through CloudInit:
Even better – avoid this by using AWS Identity and Access Management (IAM) roles and the version of s3cmd in github
Bootstrapping Puppet with CloudInit
Built-in puppet support
Use certname with %i for instance id to name the node
Puppetmaster must have auto sign turned on
• Use security groups and/or NACLs for network-level security
In nodes.pp, use regex to match node names
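The matching logic can be sketched outside Puppet. The Python below mimics the `%i` substitution and then classifies the resulting node name with regular expressions, much as a regex `node` definition in nodes.pp would. The `<role>-<instance id>` naming scheme and the role names are assumptions made for illustration; the slides do not show the actual patterns.

```python
import re

# Hypothetical naming scheme: "<role>-<instance id>", e.g. "web-i-0abc1234".
# In nodes.pp the equivalent would be a regex node definition per role.
NODE_PATTERNS = [
    (re.compile(r"^web-i-[0-9a-f]+$"), "webserver"),
    (re.compile(r"^api-i-[0-9a-f]+$"), "api"),
]


def expand_certname(template, instance_id):
    """Mimic the %i substitution: replace %i with the EC2 instance id."""
    return template.replace("%i", instance_id)


def classify(certname):
    """Return the role of the first pattern that matches, else 'default'."""
    for pattern, role in NODE_PATTERNS:
        if pattern.match(certname):
            return role
    return "default"


certname = expand_certname("web-%i", "i-0abc1234")
print(certname, "->", classify(certname))
```

Because every instance gets a unique name but a predictable prefix, new instances in an Auto Scaling group pick up the right classes without anyone editing the node list.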
Puppet Tips
Use a base class to define your standard install
Use runstages
Don’t store credentials in puppet, store them in private Amazon S3 buckets
• Use AWS IAM to secure the credentials bucket/folders within that bucket
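One way to scope access to a folder inside the credentials bucket is an IAM policy that allows reads only under a given prefix. The sketch below generates such a policy document in Python. The bucket name and prefix are hypothetical, and the policy is a minimal illustration rather than the campaign's actual configuration.

```python
import json


def credentials_read_policy(bucket, prefix):
    """Build an IAM policy allowing read-only access to one folder of a bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                # Only objects under "<prefix>/" in the bucket are readable.
                "Resource": [f"arn:aws:s3:::{bucket}/{prefix}/*"],
            }
        ],
    }


# Hypothetical bucket and folder names, for illustration only.
policy = credentials_read_policy("example-credentials-bucket", "webapp")
print(json.dumps(policy, indent=2))
```

Attaching a policy like this to the role or user a given application tier runs as means a web server can read its own credentials folder but not another application's.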
Puppet Tips
Use puppet only for configuration files and what makes your apps unique
For undifferentiated parts of apps, use Amazon S3 backed RPM/Debian repositories
• Can be either public or private repos, depending on your needs
• Amazon S3 Private RPM Repos: http://git.io/YAcsbg
• Amazon S3 Private Debian Repos: http://git.io/ecCjWQ
Puppet Tips
By using packages for application deploys, you can set ensure => latest, and just bump the package in the repo to update
Log everything with rsyslog/graylog/loggly/NewRelic/splunk
Scaling the Puppet Masters
Use an Auto Scaling group for puppet masters
• Min size => 2, use multiple Availability Zones
Either have them build themselves off of existing puppet masters in the group or off packages stored in Amazon S3 and bootstrapped through user-data
Auto-sign must be on
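The sizing rules above can be captured as a small validation step before the group is created. The Python sketch below builds a parameter dictionary whose keys mirror the Auto Scaling CreateAutoScalingGroup parameters. It does not call AWS, and the group name, launch configuration name and zones are hypothetical.

```python
def puppet_master_group_params(name, launch_config, zones, min_size=2, max_size=4):
    """Validate and build parameters for a puppet master Auto Scaling group."""
    unique_zones = sorted(set(zones))
    if min_size < 2:
        raise ValueError("keep at least two puppet masters running")
    if len(unique_zones) < 2:
        raise ValueError("spread puppet masters over multiple Availability Zones")
    if max_size < min_size:
        raise ValueError("max_size must be at least min_size")
    return {
        "AutoScalingGroupName": name,
        "LaunchConfigurationName": launch_config,
        "MinSize": min_size,
        "MaxSize": max_size,
        "AvailabilityZones": unique_zones,
    }


# Hypothetical names, for illustration only.
params = puppet_master_group_params(
    name="puppet-masters",
    launch_config="puppet-master-lc",
    zones=["us-east-1a", "us-east-1b"],
)
print(params)
```

A dictionary in this shape could then be handed to whatever tool creates the group. The point of the check is that a single master, or a group confined to one Availability Zone, is rejected before anything is launched.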
Sites
Communications
Ad Targeting
Ops Tools
Analytics
Apps
Micro-targeting
Micro-listening
Reporting
Registrations
Volunteer Coordination
Etc, etc, etc.
Technology Choice | Expected Tradeoff | Upside
Polyglot Development | More Complex Ops | Build as little as possible, rev-1 faster, reuse dev skills
Cloud Hosting | Less Infra Control, performance | Scale, Speed, Cost
Diverse, App-centered Databases | More Complex Ops, Fragility, Data Corruption | Heterogeneous Resilience, right tools for the job
SOA, queue-based system integrations | Dev Complexity, slower system performance | Scalability, serviceability, operational flexibility, and substantially faster in aggregate
$5.2B retail business
7,800 employees
A whole lot of servers
2003
2012
Every day, AWS adds enough server capacity to power this $5B enterprise
Amazon Simple Queue Service (SQS)
Thousands of customers
A whole lot of servers
Over 5 Billion Queued Events
2006-8
2012
OFA
Produced 8.4 Billion Amazon SQS Queued Events
Just the last month of the campaign
No time to waste
This applies to lots of services!
Elastic Load Balancing
Amazon ElastiCache
Amazon RDS
Amazon CloudSearch
Amazon Route53
Amazon S3
Amazon CloudFront
Amazon DynamoDB
You can mostly do these on your own…
But do you have extra: focus, expertise, time, research, money, risk-tolerance, staff, dedication to innovate, operations coverage, scalability in design...
Looks pretty simple.
Inserts 7.5m records in Amazon DynamoDB, in 8 minutes
One thing that is difficult to prepare for…
No pressure…
They had this built for the previous 3 months, all on the East Coast.
We built this part in 9 hours to be safe.
AWS + Puppet + Netflix Asgard + CloudOpt + DevOps = Cross-Continent Fault-Tolerance On-Demand
Replication across the continent..
http://tsunami-udp.sourceforge.net/
478.18 Mbps cross-continental data transit rate for a single cc2.8xlarge instance
1.72 Tb an hour
27 Tb of data to move
3.92 Hours required to move the data across the continent with four cc2.8xlarge instances
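The figures on these slides are consistent with each other, which can be checked with a few lines of Python. The sketch assumes that "Tb" means terabits in decimal units (1 Tb = 1,000,000 Mb), which is the reading under which the slide's numbers agree.

```python
# Figures from the slides.
MBPS_PER_INSTANCE = 478.18  # megabits per second, single cc2.8xlarge
INSTANCES = 4
DATA_TB = 27.0              # assumed to be terabits (decimal units)


def terabits_per_hour(mbps):
    """Convert a rate in megabits per second to terabits per hour."""
    return mbps * 3600 / 1_000_000


def hours_to_move(data_tb, mbps, instances):
    """Hours needed to move data_tb terabits over several parallel instances."""
    return data_tb / (terabits_per_hour(mbps) * instances)


per_instance = terabits_per_hour(MBPS_PER_INSTANCE)
total_hours = hours_to_move(DATA_TB, MBPS_PER_INSTANCE, INSTANCES)
print(f"{per_instance:.2f} Tb/hour per instance")
print(f"{total_hours:.2f} hours with {INSTANCES} instances")
```

Running it reproduces the 1.72 Tb an hour per instance and the 3.92 hours for four instances quoted above.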
So what did they learn?
HA in Depth: Amazon S3 static pages, de-coupled UI, jekyll/hyde
Game Day: Practice failures so you know what to do. (http://www.awsgameday.com)
Loose-Coupling: Ops easy, scale easy, test easy, fix easy…
Fail-Forward: features, quality, and focus are all critical.
Cloud works.
We showed it to the world at re:Invent 2012 together with the OFA DevOps crew
We presented in Tokyo…
Born from the Campaign
What will you do next?
Maybe look at some of their Ruby code?
Register Now! reinvent.awsevents.com
$200 Off Discount Code: Zoltan2013
Gain New Skills & Knowledge
Choose from 175+ technical sessions, training bootcamps, hands-on labs, and hackathons.
Dive Deeper into AWS
Dive deep into foundational AWS services and learn about the latest services and features.
Get Your Questions Answered
Get your technical questions answered by AWS architects, engineers, and product leads.
Learn Best Practices
Discover best practices, tips and tricks, and lessons learned from expert customers.
Thank you!
Questions? • Come talk to an AWS Solutions Architect at Table 22
Contact me!
• @leozh
• [email protected]