2nd Eucalyptus Bay Area Meetup with Rich Wolski
TRANSCRIPT
© 2012 Eucalyptus Systems, Inc. -- confidential
Eucalyptus Architecture and Implementation
Rich Wolski, CTO
March 1, 2012
Eucalyptus Multi-tiered Service Architecture
[Figure: tier diagram. Labels: User Transactions (1 box); Inventory and Scheduling (3 boxes); Actualization (5 boxes); User Requests; Service Delivery.]
Eucalyptus Components
• Cloud Controller (CLC)
– User request processing (except for Walrus), credentials management, VM (instance) state management
• Walrus (S3)
– S3 user request processing, append-only Put/Get object storage
• Cluster Controller (CC)
– VM inventory, network provisioning/security group implementation
• Storage Controller (SC)
– Block-level, network-attached storage (SAN and Linux)
• Node Controller (NC)
– Hypervisor interface and control, VM launch/decommissioning
• VMware Broker
– Gateway between CC and ESX and/or vSphere for VMware
Component Architecture
[Figure: component hierarchy, in order of appearance: CLC and Walrus; three CC/SC pairs; five NC/VMWareB nodes. Labels: User Requests; Service Delivery.]
Eucalyptus Generations
• Eucalyptus 1.X (June 08 through Sep. 10)
– University code
• Eucalyptus 2.X (June 10 through Feb. 11)
– Commercial focus, early production
• Eucalyptus 3.X (present - )
– Production operational improvements
– Full commercial feature set (almost)
• Few, if any, features deprecated
– BitTorrent?
New Eucalyptus 3.0 Features
• High availability (HA) of the Eucalyptus service
– Hot fail-over and repair for all components except NC
• AWS Identity and Access Management (IAM) API plus extensions for private clouds
– Quotas and metering
• Eucalyptus Block Storage improvements
– AWS volume-backed instance API (persistent instances, “bootable”)
– NetApp and JBOD support added to existing Dell EqualLogic
• Full support for Windows images
– Seven different versions, AWS-compatible authentication, sysprep, ephemeral disk
• Accounting/usage reporting
– Charge-back interface linked to quotas
Eucalyptus 3.0 Platform Improvements
• Revamped image caching in the NC
– Faster instance starts using copy-on-write
• Refactored VMware Broker
– Faster and more robust image preparation, support for vSphere 4.X, improved scale, more extensive deployment topologies
• Extended Linux distro support
– RHEL 5 and RHEL 6, packages for Canonical LTS (Ubuntu 10.04)
• Substantial improvement in automated QA
– Full QA sequence is 5 days (features + distros + hypervisors + deployment topologies + networking modes)
• Re-designed administrative web UI
• Improved command-line admin tools
• Re-designed packaging, upgrade, and dependency management
• Re-designed installation mechanism (package repositories)
Eucalyptus in The Wild
• Eucalyptus 2.0 Deployments
– Games, mobile infrastructure, media, telecom
• Tons of feedback
– Not all of it angry
• Top 3
– Platform HA -> VM connectivity and request service
– Quotas, accounting, reporting
– Windows (fast image creation and start)
High Availability
• Eliminate single point of failure
– Host failure
– Network connectivity failure (including network partitions)
• Tolerate as many multiple-failure cases as possible
• Avoid data loss at all costs
– Fail stop is better than data loss
• Availability of the services that Eucalyptus offers
– Eucalyptus requests
– VM connectivity and storage
– Not VM HA -> application level
HA Web Service Architecture
• All Eucalyptus components are implemented as Web Services
– CLC, Walrus, SC, and VMware Broker: Java
– CC and NC: C
• CC and NC are each implemented in a separate Axis2/C service container
• CLC, Walrus, SC, and VMware Broker share a web service stack and JVM when co-located
PoC Configuration
[Figure: two Linux hosts, each running the web service and DB management stack with CLC, Walrus, SC, VMWb, and CC; five Linux hosts, each running an NC.]
Multi-component Failure
[Figure: the same layout as the PoC configuration (two Linux hosts running the web service and DB management stack with CLC, Walrus, SC, VMWb, and CC; five Linux hosts each running an NC), drawn with different components active on each of the two front-end hosts.]
Production
[Figure: two CLCs, two Walrus (Wal) instances, two SCs, two CCs, and two VMware Brokers (VMb); five Linux hosts, each running an NC.]
Group Membership and Heartbeat
• HA is from the perspective of the “master” CLC
• JGroups determines which machines are “up”
– The network connecting the “up” machines is unpartitioned
• Heartbeat determines which services are available within the “up” group
• Back-up CLC monitors the “up” group to determine if it contains a master
– If not, it becomes the master
• Master and back-up DBs are kept synced
– Resync when a failed CLC is restored
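The takeover rule on this slide can be sketched in a few lines of Python. This is only an illustration, not Eucalyptus code: the `Member` type, its `role` and `clc_healthy` fields, and `backup_should_take_over` are hypothetical names, and the “up” group is passed in as a plain list instead of being obtained from JGroups and the heartbeat.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    host: str
    role: str          # "master", "backup", or "other" (hypothetical labels)
    clc_healthy: bool  # heartbeat result for the CLC service on this host


def has_master(up_group):
    """True if some member of the "up" group is a master with a healthy CLC."""
    return any(m.role == "master" and m.clc_healthy for m in up_group)


def backup_should_take_over(up_group):
    """The back-up CLC promotes itself only when its view contains no master."""
    return not has_master(up_group)


# Normal operation: the back-up sees a healthy master and stays passive.
view = [Member("clc-a", "master", True), Member("clc-b", "backup", True)]
print(backup_should_take_over(view))                # prints False

# The master host has dropped out of the "up" group: the back-up takes over.
view_after_failure = [Member("clc-b", "backup", True)]
print(backup_should_take_over(view_after_failure))  # prints True
```

The sketch makes the decision purely from the back-up's own view of the group, which is why a partition that hides a still-running master from the back-up needs separate handling.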
Interesting Wrinkles
• CLC and Walrus have externally visible URLs
– DNS remapping service is built into the CLC
• What happens if the master loses connectivity with the user?
– Back-up may have an alternative path to the user
– If DNS remaps and the back-up becomes active, the system may experience a “split brain”
  • Fail stop
  • Arbitrator service
• Multi-failure can cause split brain
– Master fails over, new master fails before the original is back, original is then brought up => fail stop
IAM, Quotas, and Reporting
• IAM is AWS “Identity and Access Management”
– Accounts, users, and groups of users
– JSON-based policies define the calls that users and groups can execute
– Also possible to attach policies to S3 resources (buckets for now)
• Eucalyptus extends the IAM predicates with inequalities
– Implements quotas as tests against IAM policies
• Resource usage information is exportable in a variety of formats and through the GUI
For Example
[Figure: entities labeled eucalyptus, support, sales, and dev, annotated with an EC2 image permission, an S3 bucket ACL, and two quotas.]
{
  "Version": "2012-02-12",
  "Statement": [{
    "Sid": "2",
    "Effect": "Limit",
    "Action": "ec2:RunInstances",
    "Resource": "*",
    "Condition": {
      "NumericLessThanEquals": {
        "ec2:quota-vminstancenumber": "256"
      }
    }
  }]
}
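As a rough illustration of how a quota expressed this way could be tested, the Python sketch below checks a request against the policy on this slide. It is not the Eucalyptus evaluator: `within_quota` is a hypothetical helper, and it assumes the quota key is compared with the instance count the account would have if the request were granted. It also handles only a single string in "Action", not a list.

```python
import json

POLICY = json.loads("""
{
  "Version": "2012-02-12",
  "Statement": [{
    "Sid": "2",
    "Effect": "Limit",
    "Action": "ec2:RunInstances",
    "Resource": "*",
    "Condition": {
      "NumericLessThanEquals": {"ec2:quota-vminstancenumber": "256"}
    }
  }]
}
""")


def within_quota(policy, action, usage_after_request):
    """Return False if a matching Limit statement's bound would be exceeded.

    usage_after_request maps a quota key to the value the resource count
    would have if the request were granted (an assumption of this sketch).
    """
    for stmt in policy["Statement"]:
        # Only "Limit" statements for the requested action act as quotas here.
        if stmt["Effect"] != "Limit" or stmt["Action"] != action:
            continue
        bounds = stmt.get("Condition", {}).get("NumericLessThanEquals", {})
        for key, limit in bounds.items():
            if usage_after_request.get(key, 0) > float(limit):
                return False
    return True


key = "ec2:quota-vminstancenumber"
print(within_quota(POLICY, "ec2:RunInstances", {key: 256}))  # prints True
print(within_quota(POLICY, "ec2:RunInstances", {key: 257}))  # prints False
```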
Evaluation Logic
[Flowchart: decision nodes, in order: Sys admin? -> Account-level permission satisfied? -> Account admin or IAM user policy allowed? -> Allocating resources? -> Exceeding quota? Each decision has Yes/No branches that end in Accept or Reject or lead to the next decision.]
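One plausible reading of this flowchart is sketched below in Python. The branch wiring is an assumption, since the arrows did not survive in the transcript: a system admin is accepted outright, a failed account-level or user-level check rejects, and the quota test applies only to requests that allocate resources. The function and parameter names are hypothetical.

```python
def evaluate(is_sys_admin, account_permits, is_account_admin,
             user_policy_allows, allocates_resources, exceeds_quota):
    """Return "Accept" or "Reject" for a request (assumed branch order)."""
    if is_sys_admin:
        return "Accept"
    if not account_permits:
        return "Reject"
    if not (is_account_admin or user_policy_allows):
        return "Reject"
    if not allocates_resources:
        # Nothing is being allocated, so there is no quota to check.
        return "Accept"
    return "Reject" if exceeds_quota else "Accept"


# An ordinary IAM user launching instances while over quota.
print(evaluate(is_sys_admin=False, account_permits=True,
               is_account_admin=False, user_policy_allows=True,
               allocates_resources=True, exceeds_quota=True))  # prints Reject
```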
Windows
• Windows images are big
– One customer wants 200 GB images
– Ephemeral within the C: drive
• Need a way to use CoW to improve Windows launch time
The Blob Store
• Blobs are (sparse) files on the file system
– remember to use ‘ls -s’ to see disk space allocated
– files are mounted on loopback when in use
– future implementation could use LVM volumes instead of files
• Mapping and copy-on-write snapshots are implemented using the Linux kernel’s device-mapper (same as LVM snapshots)
– once snapshotted or mapped, the file access method cannot be used
– i.e., the backing file on disk no longer has the bits you want
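The difference between a sparse file's apparent size and its allocated size, which is what ‘ls -s’ exposes, can be seen with a short Python sketch. This is an illustration only, not the blob store's code: `make_sparse_blob` and `sizes` are hypothetical helpers, and the example assumes a Linux file system that supports sparse files.

```python
import os
import tempfile


def make_sparse_blob(path, size_bytes):
    """Create a file of the given apparent size without writing any data."""
    with open(path, "wb") as f:
        f.truncate(size_bytes)


def sizes(path):
    """Return (apparent_bytes, allocated_bytes) for a file.

    st_blocks counts 512-byte units on Linux; it is not available on Windows.
    """
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512


with tempfile.TemporaryDirectory() as d:
    blob = os.path.join(d, "blob.img")
    make_sparse_blob(blob, 1 << 30)  # 1 GiB apparent size
    apparent, allocated = sizes(blob)
    # On a file system with sparse-file support, allocated is far below apparent.
    print(apparent, allocated)
```

The allocated figure is the one that matters for cache accounting on the NC, since it reflects the disk space actually consumed.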
Image -> Instance in the NC
[Figure: Eucalyptus Linux image on the NC. Regions: Walrus, NC cache area, NC work space. Objects: EMI, ERI, EKI, swap, ephemeral0, PT, zero, EMI + KEY. Operations: download (Walrus to cache), mkswap and mkfs.ext3 (create swap and ephemeral0), copy, snap, map.]
• NC’s cache keeps objects from Walrus and partitions created from scratch, one per size/type
• LRU eviction policy for non-pinned objects limits disk use
• EKI and ERI are copied to work space due to libvirt requirement
• Other objects are snapshotted, tuned, and then mapped to compose the disk
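The eviction rule described above (least recently used first, never a pinned object) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the NC's implementation: `BlobCache`, its methods, and the object names in the usage example are all hypothetical, and sizes are abstract units.

```python
from collections import OrderedDict


class BlobCache:
    """Size-bounded cache that evicts least-recently-used, non-pinned entries."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0
        self.entries = OrderedDict()  # name -> size, least recently used first
        self.pinned = set()

    def touch(self, name):
        """Mark an existing entry as most recently used."""
        self.entries.move_to_end(name)

    def pin(self, name):
        self.pinned.add(name)

    def unpin(self, name):
        self.pinned.discard(name)

    def add(self, name, size):
        """Insert an entry, evicting unpinned LRU entries to make room.

        Returns False, leaving the cache unchanged, if room cannot be made.
        """
        if name in self.entries:
            self.touch(name)
            return True
        victims, freed = [], 0
        for old in self.entries:  # iterates from least recently used
            if self.used - freed + size <= self.capacity:
                break
            if old not in self.pinned:
                victims.append(old)
                freed += self.entries[old]
        if self.used - freed + size > self.capacity:
            return False
        for old in victims:
            self.used -= self.entries.pop(old)
        self.entries[name] = size
        self.used += size
        return True


cache = BlobCache(capacity=100)
cache.add("emi-1", 60)
cache.add("swap-512", 30)
cache.pin("emi-1")                # in use by a running instance
print(cache.add("emi-2", 40))     # prints True: swap-512 is evicted
print(list(cache.entries))        # prints ['emi-1', 'emi-2']
print(cache.add("emi-3", 50))     # prints False: pinned emi-1 cannot be evicted
```

Collecting victims before evicting anything keeps a failed insert from discarding objects for no benefit.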
What’s Next?
• Eucalyptus 3.1 (Q2)
– Refactoring for packaged plug-ins
– Postgres instead of MySQL
• Eucalyptus 3.2 (Q4)
– Feature release
– Possibilities
  • ELB, CloudWatch, Auto Scaling
  • Tags
• Eucalyptus 4 in 2013 and Eucalyptus 5 in 2014
– Application features -> services and API
– Operational features -> ease of use, maintenance, performance
• Please help!
– Tell us what Eucalyptus needs and when it needs it
Thanks!
• [email protected] • @richwolski
Questions?