mircoservices, dev ops and engineering best practices at wix.com
TRANSCRIPT
@aviranm
Aviran MordoHead of
Microservices, DevOps & Engineering at Wix.com
www.linkedin.com/in/aviran
@aviranm
http://www.aviransplace.com
@aviranm
@aviranm
Wix In NumbersOver 87M users
Static storage is >2Pb of data
3 data centers + 2 clouds (Google, Amazon)
2B HTTP requests/day
1200 people work at Wix
@aviranm
Over 200 Microservices on Production
@aviranm
Microservices - What Does it Take
@aviranm
How to Get There? (Wix’s journey)
http://gpstrackit.com/wp-content/uploads/2013/11/VanishingPointwRoadSigns.jpg
@aviranmhttp://p1.pichost.me/i/11/1339236.jpg
In the Beginning of Time
About 6 years ago
@aviranm
The Monolithic GiantOne monolithic server that handled everything
Dependency between features
Changes in unrelated areas caused deployment of the whole system
Failure in unrelated areas will cause system wide downtimeLighttpd
(file serving) MySQLDB
Wix(Tomcat)
@aviranm
Breaking the System Apart
https://upload.wikimedia.org/wikipedia/commons/6/67/Broken_glass.jpg
@aviranm
@aviranm
Concerns and SLA
Many feature request
Lower performance requirement
Lower availability requirement
Write intensive
Edit websites
Not many product changes
High performance
High availability
Read intensive
View sites, created by Wix editor
@aviranm
Mono-Wix
Phase 1
@aviranm
Extract Public Service
Editor service (Mono-Wix) Public service
@aviranm
Divide and Conquer
Editor service Public service
Guideline: No runtime, deployment or data dependency
@aviranm
Separation by Product LifecycleDecouple architecture => Decouple teams
Deployment independence
Areas with frequent changes
Editor service Public service
@aviranm
Separation by Service LevelScale independently Use different data storeOptimize data per use case (Read vs Write)Run on different datacenters / clouds / zonesSystem resiliency (degradation of service vs. downtime)Faster recovery time
Editor service Public service
@aviranmhttp://blogs.adobe.com/captivate/2011/03/training-adding-interactivity-to-elearning-courses-with-adobe-captivate-5.html/time-to-learn-clock
@aviranm
Service Boundary
@aviranm
Separation of DatabasesCopy data between segments
Optimize data per use case (read vs. write intensive)
Different data stores
Public serviceEditor serviceCopy necessary data
@aviranm
Serialization
@aviranm
Serialization / ProtocolBinary?JSON / XML / Text?HTTP?
Public serviceEditor service ?
@aviranm
Serialization / Protocol - TradeoffsReadability?
Performance?Debug?Tools?Monitoring?Dependency?
Public serviceEditor service ?
@aviranm
API Transport/Protocol
@aviranm
How to Expose an APIREST?RPC?SOAP?
Public serviceEditor service ?
@aviranm
Wix’s Choices RESTHTTP
Public serviceEditor service
BinaryJSON-RPCHTTP
@aviranm
API Versioning
@aviranm
API Versioning
Public serviceEditor serviceBackward compatibility
Maybe here
API Schema /v1/v2YAGNI
@aviranm
Service Discovery
@aviranm
Service Discovery
Public serviceEditor serviceConfiguration (DNS+LB)
Zookeeper?Consul?Etcd?Eureka?
YAGNI
@aviranm
Resilience
@aviranm
What does the Arrow Mean?
Public serviceEditor service
Failure Point
Every Network Hop Reduces Availability
@aviranm
Failure Points = Network I/O
Public serviceEditor service
Failure Point
Retry policyCircuit breakerThrottlers
Be careful – you may cause downtime
Retry only on idempotent operations
@aviranm
Degradation of Service
Public serviceEditor service
Failure Point
Feature killer (Killer feature)FallbacksSelf healing
@aviranm
Testing
@aviranm
Test a Distributed System (at Wix)
Public serviceEditor service
Unit TestIntegration TestServer E2EAutomation
Client
@aviranm
Distributed Logging
@aviranm
Build visibility into service
@aviranm
What is the Right Size of a Microservice?
@aviranm
The Size of a Microservice is the Size of the Team That is
Building it.
“Organizations which design systems ... are constrained to produce designs which are copies of the communication structures of these organizations”Conway, Melvin
@aviranm
How many Engineers can Work on a Single Project ?
http://cdn.wp.sunmotors.co.uk/get/2014/03/cars.28.620x413.jpg
@aviranm
Microservices = Small TeamsSmall Teams = Small Rooms
@aviranm
@aviranm
Ownership
@aviranm
Team Work
Microservice is owned by a team
You build it – you run it
No microservice is left without a clear owner
Microservice is NOT a library – it is a live production system
@aviranm
Culture that ROQS
ROQS
esponsibility
wnership
uality
haring@aviranm
@aviranm
What Is The Common Denominator? Product manager
Project manager
QA
Operations
DBA
Develperscan dothese jobs
@aviranm
@aviranm
Developer
Product
QA
Management
Operation
BI
Dev Centric Culture – Involve The Developer Product definition (with product)
Development
Testing (with QA engineers)
Deployment / Rollback (with operations)
Monitoring / BI (with BI team)
DevOps – to enable deployment and rollback, fully automated
Support Circle
@aviranm
What did you Learn from Just 2 Services
● Service boundary● Monitoring infrastructure● Serialization format● Synchronous communication protocol (HTTP/Binary)● Asynchronous (queuing infra)● Service SLA● API definition (REST/ RPC / Versioning)● Data separation● Deployment strategy● Testing infrastructure (integration test, e2e test)● Compatibility (backwards / forward)
@aviranm
Continue to Extract More Microservices
@aviranm
HTML Editor
Flash Editor
MSM
Private Media
Public Media
Editor Segment Public Segment
Premium Services
List DB
App Builder
App Store
App Market
Dashboard
Mailer
TimeZone
Public HTML API
Public API (Flash)
MSP
Public Server
HTML Renderer
HTML SEO Renderer
Flash Renderer
Flash SEO Renderer
Sitemap Renderer
Robots.txt Renderer
User Server
Template Viewer
ContactsHUBActivity
Site Members
Store MgrComments
Snapshoter
User Pref
Feed Me
Shout-out Hotels
PETRI
Site Pref
Dist LoggerSlicer
eCom Renderer
eCom Cart
eCom Checkout
eCom Catalog
eCom Orders
Payment Facade
Account Info
HTML API
HTML Embeder
BlogMobile
Mostly writes
2 Data centers
Db active-standby (preferably active-active)
Performance < 2s 99%
Serves mostly site builders
Uptime > 99.9
Mostly reads
>2 Data centers
Db active-active(-active)
Performance < 500ms 99%
Serves mostly site viewers
Uptime > 99.99
@aviranm
When to Extract a New Microservice
@aviranm
Microservice or Library?
Do I create deployment dependency?
What is DevOps overhead (managing middleware) ?
Who owns it?
Does it have its own development lifecycle?
Does it fit the scalability / availability concerns?
Can a different team develop it?
I need time zone from an IP address
@aviranm
Microservice has Ops, Library is Only Computational
@aviranm
Which Technology Stack to Use
@aviranm
Free to Chose?Microservices gives the freedom to use a different technology stacks.Enables innovation
@aviranm
Default to the Stack You Know how to Operate.
@aviranm
Innovate on Non Critical Microservices and Take Full Responsibility for its Operation.
@aviranmhttp://wallpaperbeta.com/dogs_kiss_noses_animals_hd-wallpaper-242054/
@aviranm
Why Microservices
Scale engineering
Development Velocity
Scale system ?
@aviranm
Microservices is the First Post DevOps Architecture
@aviranm
Every Microservice is a Overhead
@aviranm
Wix’s Company Model
Company
Guild
Guild
Company
Gang GangGang Gang
Product Product
Cross-Engineering Teams
A Guild is a group of people that share expertise, knowledge, tools, code and practice
Company focus on large segmentHas all the resources it needs to be independentPeople within the company are aligned with the Guilds
@aviranm
Production State Changes Every 5 min
@aviranm @aviranm
@aviranm @aviranm
Quality = Better Engineers
Better Engineers = Professional Growth
Professional Growth = Investment in People
@aviranm
Guild Day
Engineers work 4 days with their company
Thursday is Guild day.
Developers conduct quality enhancing activities with the Guild.
@aviranm
Guild Day Goals Builds cross-team relationships
Shares knowledge
Assimilate the culture
Lesson learned
Continuous improvement
Promotes innovation
Professional development@aviranm
@aviranm
Guild Day Schedule 10:00-11:00 Open Space11:00-11:15 Break11:15-11:40 Project spotlight11:45-13:00 Tech talk or Workshop
@aviranm
@aviranm
Open Space
@aviranm
Post MortemAny production downtime is investigatedUnderstand the causeLesson learnedAI to avoid same issue in the futureFindings are published in dapulse (public forum)
@aviranm
Guild Week – Games of GangsOne week each quarterPair programming with a person from another company
Enhancing infrastructureBuilding toolsHelping companiesWork on open sourceGoal #1 – Improve engineering skills and
quality
@aviranm
FastFeatures
BetterQuality
@aviranm
Thank You
@aviranm
Q&A@aviranm
http://www.aviransplace.com
www.linkedin.com/in/aviran
Aviran MordoHead of Engineering
http://engineering.wix.com
@WixEng