Going Cloud Native with IBM Cloud and NetflixOSS for Dev@Pulse
DESCRIPTION
Dev@Pulse 2014 lightning talk on how to use IBM Cloud and NetflixOSS for high availability and automatic recovery, elastic web scale, and high-velocity continuous delivery. The talk also includes a live chaos-testing demo (specifically Chaos Gorilla) showing that the application is highly available enough to survive an entire datacenter/availability zone outage.
TRANSCRIPT
Going cloud native for your applications and services
Jerry Cuomo (@JerryCuomo)
Andrew Spyker (@aspyker)
Topics
Jerry is going to cover:
– Our journey to cloud services
– A stop along the way: winning the Netflix Cloud Prize
– Our goals in 2014 for delivering cloud services
Andrew is going to:
– Describe the “Zen, methodology, approach” to building world-class services
– Highlight new capabilities, running on IBM Cloud, that support this methodology
– Prove this by example
Our Journey to Cloud Services
• From my blog: http://bit.ly/cuomoblog
• In 2014, we will continue driving our software to the cloud. To complement our packaged software business, we are transforming our development operations to also deliver our wares as self-service, cloud-native offerings within the IBM Cloud (SoftLayer, Bluemix, PureApp).
• You know you have a cloud service if it is addressable via a URL, has terms and conditions (Ts&Cs), and has an operations team running it 24x7x365.
Acme Air and Winning the Netflix Cloud Prize
• Acme Air: a cloud and mobile sample application and benchmark
• Acme Air + NetflixOSS + IBM SoftLayer
– A port to IBM SoftLayer that embraces the NetflixOSS platform
– Winner: Best Example Mash-Up Application category
Cloud Services Goals
• We will follow the “Zen” of operating cloud services
• “We will rule the cloud; the cloud will not rule us”
– Be proactive about failure testing, security testing, and automatic recovery
• Move from a reactive model to a predictive model
– We are always watching and anticipating
• A scalable service fabric and an operations excellence team
– Tools, libraries, services, practices, and a center of excellence (COE) for cloud
• Focus on key areas, including:
– Elastic and web scale
– High availability and automatic recovery
– High-velocity continuous delivery
Elastic and Web Scale
[Image: “Doing this” vs. “not doing that”. Source: Programmableweb.com, 2012]
Elastic and Web Scale
[Diagram: load balancers, a front-end API (browser and mobile), an Authentication Service, a Booking Service, temporal caching, and durable storage]
Strategy: Benefit
• Automate deployments: elastic scale is impossible without automation
• Expose a well-designed API to users: offloads presentation complexity to clients
• Remove state from mid-tier services: allows easy elastic scale-out
• Push temporal state to the client and caching tier: leverages clients and avoids overloading the data tier
• Use partitioned data storage: data design and storage scale with high availability
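The last two strategies (a stateless mid tier and partitioned data storage) rely on being able to route any key to a fixed set of storage nodes without a central lookup. Below is a minimal Python sketch of one common way to do that: consistent hashing with replicas. The talk does not say which partitioning scheme the Acme Air data tier uses, so the class, node names, and replica count are hypothetical.

```python
import bisect
import hashlib


class HashRing:
    """Consistent-hash ring mapping each key to `replicas` distinct storage nodes.

    Hypothetical sketch: node names, virtual-node count, and replica count
    are illustrative assumptions, not part of the original talk.
    """

    def __init__(self, nodes, vnodes=64, replicas=2):
        if replicas > len(nodes):
            raise ValueError("replicas cannot exceed the number of nodes")
        self.replicas = replicas
        # Each node owns many points on the ring so load spreads evenly.
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._hashes = [h for h, _ in self._ring]

    @staticmethod
    def _hash(value):
        # Stable 64-bit hash; Python's built-in hash() is salted per process.
        return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")

    def nodes_for(self, key):
        """Walk clockwise from the key's position, collecting distinct nodes."""
        start = bisect.bisect(self._hashes, self._hash(key))
        owners = []
        for offset in range(len(self._ring)):
            _, node = self._ring[(start + offset) % len(self._ring)]
            if node not in owners:
                owners.append(node)
                if len(owners) == self.replicas:
                    break
        return owners


ring = HashRing(["store-a", "store-b", "store-c"])
print(ring.nodes_for("booking:1234"))  # two distinct store nodes
```

When a node is added, a key's primary owner changes only if the key now falls on one of the new node's points. The storage tier can therefore grow without reshuffling all of its data.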
HA and Automatic Recovery
[Image: “Feeling this” vs. “not feeling that”]
Highly Available Service Runtime Recipe
[Diagram: a web app front end (REST services) calls the auth-service through a Ribbon REST client with Eureka, wrapped in Hystrix with a fallback implementation; the auth-service micro service implementation runs in Karyon alongside replicated Eureka servers]
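The client half of this recipe can be illustrated with a toy model. A registry tracks instances that renew a lease (the role Eureka plays), and a client round-robins across live instances, retrying the next one on a transient failure (the role Ribbon plays). This is a hypothetical Python sketch, not the Eureka or Ribbon API. The lease length, attempt count, and `send` callback are assumptions.

```python
import itertools
import time


class Registry:
    """Toy service registry in the spirit of Eureka (hypothetical, not its API).

    Instances renew a lease with heartbeats; an instance whose lease has
    expired silently drops out of lookups.
    """

    def __init__(self, lease_seconds=30.0, clock=time.monotonic):
        self.lease_seconds = lease_seconds
        self.clock = clock
        self._last_seen = {}  # (app, instance) -> time of last heartbeat

    def heartbeat(self, app, instance):
        self._last_seen[(app, instance)] = self.clock()

    def instances(self, app):
        now = self.clock()
        return sorted(
            inst
            for (name, inst), seen in self._last_seen.items()
            if name == app and now - seen < self.lease_seconds
        )


class RoundRobinClient:
    """Toy client-side load balancer in the spirit of Ribbon (hypothetical).

    Rotates across the registry's live instances and, on a transient
    failure, retries the next instance instead of failing the request.
    """

    def __init__(self, registry, app, max_attempts=3):
        self.registry = registry
        self.app = app
        self.max_attempts = max_attempts
        self._counter = itertools.count()

    def call(self, send):
        """`send(instance)` performs the request and raises ConnectionError on failure."""
        live = self.registry.instances(self.app)
        if not live:
            raise ConnectionError(f"no live instances of {self.app}")
        last_error = None
        for _ in range(min(self.max_attempts, len(live))):
            instance = live[next(self._counter) % len(live)]
            try:
                return send(instance)
            except ConnectionError as err:
                last_error = err  # try the next instance
        raise last_error
```

In this sketch, each service instance would call `heartbeat` periodically, and the web app front end would route its auth-service request through `client.call(...)`. An instance that stops heartbeating simply stops being chosen.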
Implementation detail: Benefits
• Decompose into micro services: the key user path is always available, and failure does not propagate across service boundaries
• Karyon with automatic Eureka registration: new instances are found quickly, and failing individual instances disappear
• Ribbon client with Eureka awareness: load balances and retries across instances with “smarts”, handling transient instance failure
• Hystrix as a dependency circuit breaker: allows fast failure and provides graceful cross-service degradation and recovery
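Hystrix's role in the recipe is the circuit-breaker pattern: stop calling a failing dependency for a while and serve a fallback instead. Below is a minimal Python sketch of that pattern. It is not the Hystrix API, and the failure threshold, reset window, and fallback value are hypothetical choices.

```python
import time


class CircuitBreaker:
    """Minimal circuit breaker with a fallback (hypothetical, not the Hystrix API).

    After `failure_threshold` consecutive failures the circuit opens and calls
    go straight to the fallback until `reset_seconds` pass; then one trial
    ("half-open") call is let through to test the dependency again.
    """

    def __init__(self, failure_threshold=3, reset_seconds=5.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_seconds = reset_seconds
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def is_open(self):
        if self.opened_at is None:
            return False
        # Once the reset window has passed, allow a trial call through.
        return self.clock() - self.opened_at < self.reset_seconds

    def call(self, primary, fallback):
        if self.is_open():
            return fallback()  # fail fast without touching the sick dependency
        try:
            result = primary()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            return fallback()
        self.failures = 0
        self.opened_at = None  # a success closes the circuit
        return result
```

Failing fast keeps threads from piling up behind a slow dependency, and the fallback is what turns a dependency outage into graceful degradation rather than an error page.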
IaaS High Availability
[Diagram: the Dallas region spanning datacenters DAL01, DAL05, and DAL06, with Eureka, local LBs, Web App, Auth Service, and Booking Service clusters, cluster auto recovery and scaling services, and global load balancers]
Rule: Why?
• Always have more than 2 of everything: 1 is a single point of failure (SPOF); 2 doesn't web scale and makes disaster recovery (DR) slow
• Including IaaS and cloud services: you're only as strong as your weakest dependency
• Use auto scaling/recovery monitoring: clusters guarantee availability and service latency
• Use application-level health checks: an instance being on the network != healthy
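The last rule distinguishes “reachable” from “healthy”. Below is a hypothetical Python sketch of an application-level check, in which the instance reports healthy only if its own dependency checks pass. The check names and the 200/503 status mapping are assumptions; the slides do not specify them.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class HealthCheck:
    """Application-level health check (hypothetical sketch).

    An instance counts as healthy only if every registered check of its own
    dependencies passes, not merely because it answers on the network.
    """

    checks: Dict[str, Callable[[], bool]] = field(default_factory=dict)

    def register(self, name, check):
        self.checks[name] = check

    def status(self):
        results = {}
        for name, check in self.checks.items():
            try:
                results[name] = bool(check())
            except Exception:
                results[name] = False  # a check that blows up counts as a failure
        healthy = all(results.values())
        # In this sketch a health endpoint reports 200 when healthy, 503 otherwise.
        return (200 if healthy else 503), results
```

An auto-recovery service polling such an endpoint can replace an instance that is up but unable to reach its booking database, which a plain network ping would never catch.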
DEMO TIME!
Let’s prove it
• What if you lost a random instance?
• What if you lost a whole datacenter?
Demonstrated as part of the Netflix Cloud Prize: bit.ly/noss-sl-blog
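Both demo questions can be rehearsed in a small simulation before you run real chaos tools. The Python sketch below is hypothetical. It spreads a six-instance cluster over the three Dallas datacenters named in the slides, then kills either one random instance (as Chaos Monkey does) or a whole datacenter (as Chaos Gorilla does). An assumed least-loaded auto-recovery policy then restores capacity in the surviving datacenters.

```python
import random
from collections import Counter


def place(cluster_size, datacenters):
    """Spread instances round-robin so no datacenter holds the whole cluster."""
    return Counter(datacenters[n % len(datacenters)] for n in range(cluster_size))


def chaos_monkey(cluster, rng):
    """Terminate one randomly chosen instance; returns the datacenter it ran in."""
    victim_dc = rng.choice(sorted(cluster.elements()))
    cluster[victim_dc] -= 1
    return victim_dc


def chaos_gorilla(cluster, datacenter):
    """Simulate an entire datacenter outage; returns how many instances were lost."""
    return cluster.pop(datacenter, 0)


def auto_recover(cluster, desired, healthy_dcs):
    """Hypothetical recovery policy: refill to `desired` in the least-loaded healthy datacenters."""
    while sum(cluster.values()) < desired:
        target = min(healthy_dcs, key=lambda dc: cluster[dc])
        cluster[target] += 1


dcs = ["DAL01", "DAL05", "DAL06"]
cluster = place(6, dcs)                    # 2 instances per datacenter
chaos_monkey(cluster, random.Random(42))   # lose a random instance
auto_recover(cluster, 6, dcs)
chaos_gorilla(cluster, "DAL05")            # lose a whole datacenter
auto_recover(cluster, 6, ["DAL01", "DAL06"])
print(dict(cluster))                       # {'DAL01': 3, 'DAL06': 3}
```

With more than two datacenters, losing one still leaves two-thirds of the capacity serving traffic while recovery refills the rest. This is the “more than 2 of everything” rule in action.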
DEMO Overview
[Diagram: the same Dallas region topology (DAL01, DAL05, DAL06), with Chaos Gorilla (✗) taking out an entire datacenter]
DEMO Success!
[Diagram: the same Dallas region topology after the Chaos Gorilla datacenter outage (✗)]
Online video (also shows recovery): http://bit.ly/sl-gorillavid
Continuous Delivery
[Image: “Reading this” vs. “not this”]
Continuous Delivery
[Diagram: Cluster v1, then a Canary v2, then Cluster v2]
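The canary step in this flow reduces to a comparison between the canary (v2) and the baseline cluster (v1). That is followed by a red/black switch that keeps v1 around for instant rollback. Below is a hypothetical Python sketch. The error-rate metric, 1% tolerance, and minimum request count are assumptions, not values from the talk.

```python
from dataclasses import dataclass


@dataclass
class ClusterStats:
    requests: int
    errors: int

    @property
    def error_rate(self):
        return self.errors / self.requests if self.requests else 0.0


def canary_verdict(baseline, canary, tolerance=0.01, min_requests=100):
    """Hypothetical canary check: promote v2 only if its error rate is at most
    `tolerance` above v1's, once it has seen enough traffic to judge."""
    if canary.requests < min_requests:
        return "wait"
    if canary.error_rate <= baseline.error_rate + tolerance:
        return "promote"
    return "rollback"


class RedBlackRouter:
    """Red/black deployment: the old cluster stays running (but disabled) after
    the switch, so rolling back is just flipping traffic back."""

    def __init__(self, live):
        self.live = live
        self.standby = None

    def switch_to(self, new_cluster):
        self.standby, self.live = self.live, new_cluster

    def rollback(self):
        if self.standby is None:
            raise RuntimeError("no previous cluster to roll back to")
        self.live, self.standby = self.standby, self.live
```

Keeping the old cluster warm costs extra instances for a while, but it makes a bad release a traffic switch rather than a redeploy.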
Step: Technology
• Developers test locally: unit test frameworks
• Continuous build: a continuous build server based on Gradle builds
• The build “bakes” a full instance image: Imaginator (inspired by Aminator) creates SoftLayer images
• Developers work across dev and test: Archaius allows for environment-based context
• Developers do canary tests and red/black deployments in prod: the Asgard console provides a common devops approach for app clusters, security patterns, and visibility
[Diagram: continuous build server baked to SoftLayer image templates]
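The Archaius row refers to environment-based context: the same baked image reads different property values in dev, test, and prod. Below is a hypothetical Python sketch of that layering idea. It is not the Archaius API, and the property names and environments are illustrative.

```python
class LayeredConfig:
    """Environment-aware configuration (hypothetical sketch, not the Archaius API).

    A property is resolved from the most specific layer first: the current
    environment's overrides, then the shared defaults.
    """

    def __init__(self, defaults, overrides_by_env, environment):
        self.environment = environment
        self.layers = [overrides_by_env.get(environment, {}), defaults]

    def get(self, name, default=None):
        for layer in self.layers:
            if name in layer:
                return layer[name]
        return default


defaults = {"auth-service.timeout.ms": 1000, "booking.cache.enabled": True}
overrides = {
    "prod": {"auth-service.timeout.ms": 250},
    "test": {"booking.cache.enabled": False},
}
prod = LayeredConfig(defaults, overrides, "prod")
print(prod.get("auth-service.timeout.ms"))  # 250
```

Because only the environment name differs between deployments, the image baked by the build can be promoted from test to prod unchanged.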
More details?
• Session PAS-1418A: Porting the Netflix OSS Cloud Architecture to SoftLayer
– Today, 5:00–6:00, Room 116
• All code is available on GitHub
– netflix.github.io
– github.com/EmergingTechnologyInstitute
• Blog: iSpyker.blogspot.com
• Twitter: @aspyker