(ISM309) Efficient Innovation: High-Velocity Cost Management at Netflix
TRANSCRIPT
© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Andrew Park, Manager – Technology FP&A
10.7.15
Efficient Innovation: High Velocity Cost Management at Netflix
What to Expect from the Session
• Managing efficiency vs. innovation & availability
• How and why cloud mgmt. has changed at Netflix
• Best practices & future goals
The Efficiency Challenge
Netflix: world’s largest subscription Internet TV business
Business Stats:
>60m members
2,000+ employees
80+ countries
>100m hours watched per day
Strategy: innovation & availability prioritized above efficiency
Engineering Stats:
1,400 Tech & Dev. engineers
40+ independent teams
500+ microservices
90,000+ instances (~15% autoscaling)
Cloud Cost Timeline
[Charts: Dollars (normalized) and Cost per stream (normalized), Aug-10 to Jun-15. Annotation: "~Not 20x Growth"]
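The two series on this slide, dollars and cost per stream, can be derived from raw spend and a stream count. The following is a minimal sketch, assuming "normalized" means each series is divided by its own peak (the chart axes run from 0 to roughly 1). The function names and all figures are hypothetical and are not Netflix data.

```python
def normalize(series):
    """Scale a series so that its maximum value equals 1.0."""
    peak = max(series)
    return [value / peak for value in series]


def cost_per_stream(costs, streams):
    """Divide each period's cost by that period's stream count."""
    return [cost / count for cost, count in zip(costs, streams)]


# Made-up figures for illustration only.
monthly_cost = [100.0, 200.0, 400.0]
monthly_streams = [10.0, 40.0, 160.0]

normalized_cost = normalize(monthly_cost)
normalized_unit_cost = normalize(cost_per_stream(monthly_cost, monthly_streams))
```

In this made-up example total spend rises fourfold while cost per stream falls to a quarter of its starting value, which is the kind of contrast the chart pair is meant to show.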
The Foundations / Present Day / The Dream
Our foundations
• Forming the Cloud Capacity Team
• Matching processes with strategy
• Developing transparency tools
The Who: Cloud Capacity Planners
“Serving customers, not keeping gates”
Main Responsibilities
Strategy & Operations
• Scalable cloud growth
• Ad-hoc internal consulting
• Capacity liaisons with AWS
Capacity Planning
• Purchasing capacity
• Planning with major teams
• Retroactive RI purchasing
Retroactive Reservation Purchasing
• Look-back purchases of reservations to cover existing on-demand usage
• Bi-weekly process includes rebalancing unused RIs
• Purchasing considerations:
Implementing Retroactive Purchasing
• Assumptions & processes required:
• Understanding about infrastructure & usage by account
• Irregular growth to be communicated
• Robust usage dashboards & cost tools
• Benefits:
• Lowered deployment friction accelerating innovation
• Reduced operational management overhead
• Potential reduction of “capacity sandbagging”
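A look-back purchase decision can be sketched as follows. This is a rough illustration rather than the team's actual process: it assumes hourly on-demand instance counts for one instance type are available for the look-back window, and that a reservation pays off once it would have been in use for at least some break-even fraction of hours. The `reservations_to_buy` name and the 0.7 default are hypothetical placeholders, not AWS pricing figures.

```python
def reservations_to_buy(hourly_on_demand, break_even=0.7):
    """Return how many reservations the look-back window would justify.

    The n-th reservation is justified only if at least `break_even`
    of the observed hours had n or more instances running on-demand.
    """
    hours = len(hourly_on_demand)
    if hours == 0:
        return 0
    count = 0
    for n in range(1, max(hourly_on_demand) + 1):
        covered = sum(1 for used in hourly_on_demand if used >= n)
        if covered / hours >= break_even:
            count = n
        else:
            break  # coverage only shrinks as n grows
    return count


# Ten made-up hourly samples of on-demand instance counts.
sample_window = [4, 6, 6, 8, 10, 10, 10, 10, 3, 10]
suggested = reservations_to_buy(sample_window)
```

With the sample window above, six reservations clear the assumed threshold, and the remaining peak usage stays on-demand.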
[Chart: On-demand in Aug’15. Annotations: C* scale-up; edge service migration; global memcache replication]
General Cloud Capacity Strategies
• Strat: Service oriented architecture @ massive scale
• Process: centralized cloud capacity planning function
• Strat: Unconstrained deployment capabilities
• Process: develop contextual efficiency information via tooling
• Strat: Improve overall availability
• Process: dedicated failover capacity, critical services on general instance families
Transparency Through Tooling
• Invest in robust tooling for capacity team
• Reveal AWS usage & cost back to service teams
• Select a business metric to set growth context
Historic Cloud Cost Email
KPI dashboard
Detailed Cost Dashboards - Demo
VP Level
Director / 1st Level
Engineering Manager
Application Level
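The four dashboard levels suggest a roll-up of application cost along an ownership hierarchy. Below is a minimal sketch, assuming each cost record carries its VP, director, manager, and application. The record layout, the names, and the `roll_up` helper are hypothetical and do not come from the Netflix tooling.

```python
from collections import defaultdict

# Hypothetical records: (vp, director, manager, application, monthly cost).
RECORDS = [
    ("vp_a", "dir_1", "mgr_x", "app_search", 120.0),
    ("vp_a", "dir_1", "mgr_x", "app_recs", 80.0),
    ("vp_a", "dir_2", "mgr_y", "app_api", 50.0),
    ("vp_b", "dir_3", "mgr_z", "app_encode", 200.0),
]


def roll_up(records, depth):
    """Sum cost by the first `depth` levels of the ownership path.

    depth=1 is the VP view, 2 the director view, 3 the engineering
    manager view, and 4 the application view.
    """
    totals = defaultdict(float)
    for *path, cost in records:
        totals[tuple(path[:depth])] += cost
    return dict(totals)


vp_view = roll_up(RECORDS, 1)
manager_view = roll_up(RECORDS, 3)
```

Keeping one flat record set and varying only the depth means every level of the dashboard sums to the same total, so a VP-level number can always be traced down to individual applications.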
Netflix Today
• Decentralizing cloud cost responsibilities
• Active ROI analysis with largest service teams
• Exposing efficiency metrics beyond instance usage
Real Time Actionable Data
“ROI” Based Mindset
• Cause: today’s scale requires thoughtful deployment strategy & service architecture
• Cloud capacity team engaged on a per-project basis
Pro-tip: Internal Unused RI Borrowing
[Chart labels: Prod Heavy RI Usage; Encoding Heavy RI Borrowed]
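The slide labels suggest that encoding workloads borrow reservations that production leaves idle. The transcript does not describe the mechanism, so the following is only a sketch under that assumption: for each hour, any reservation the owner is not using is lent to the borrower, and the remaining borrower demand runs on-demand. The `borrowable_hours` function and the sample numbers are hypothetical.

```python
def borrowable_hours(reserved, owner_usage, borrower_demand):
    """Per hour, lend the owner's idle reservations to the borrower.

    Returns a list of (lent, still_on_demand) tuples, one per hour.
    """
    result = []
    for owner, borrower in zip(owner_usage, borrower_demand):
        idle = max(reserved - owner, 0)   # reservations the owner is not using
        lent = min(idle, borrower)        # never lend more than is needed
        result.append((lent, borrower - lent))
    return result


# Made-up example: production peaks in hour 0 and tapers off afterwards.
schedule = borrowable_hours(
    reserved=10,
    owner_usage=[10, 6, 3],
    borrower_demand=[4, 4, 4],
)
```

In the example, the borrower pays on-demand only during the owner's peak hour and runs entirely on borrowed reservations afterwards.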
Efficiency Score Cards
The Future of Cloud Cost Management
• Bin-packing at the service level through containers
• Dynamic traffic shifting between regions
• Automated ROI calculations at testing & deployment
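Bin-packing, named in the first bullet above, can be illustrated with the first-fit decreasing heuristic. This is a generic textbook sketch and not a description of how Netflix schedules containers. It assumes each container has a single integer resource demand (for example vCPUs) and that every host has the same capacity. The function name is hypothetical.

```python
def first_fit_decreasing(demands, capacity):
    """Pack demands into bins of `capacity` using first-fit decreasing.

    Returns a list of bins, each a list of the demands placed in it.
    The heuristic is fast but does not guarantee the minimum bin count.
    """
    bins = []
    for demand in sorted(demands, reverse=True):
        if demand > capacity:
            raise ValueError(f"demand {demand} exceeds capacity {capacity}")
        for contents in bins:
            if sum(contents) + demand <= capacity:
                contents.append(demand)
                break
        else:
            # No existing bin had room, so open a new one.
            bins.append([demand])
    return bins


# Made-up container sizes packed onto hosts with 10 units each.
packed = first_fit_decreasing([4, 8, 1, 4, 2, 1], capacity=10)
```

Here six containers fit on two fully used hosts, whereas one container per instance would need six.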
Session Recap
• EC2: Everyone Contributes Cost
• S3: Smart Strategy Saves
• Data Transfer: Develop Transparency
• RDS: Remove Datacenter Sentiment
• Managing efficiency vs. innovation & availability
Related Sessions
• SPOT302 – Availability: The New Kind of Innovator’s Dilemma
• DVO203 – A Day in the Life of a Netflix Engineer using 37% of the Internet
• ISM301 – Engineering Netflix Global Operations in the Cloud