Keystone Event Processing Pipeline on a Dockerized Microservices Architecture

Post on 09-Jan-2017

Keystone Event Processing Pipeline on a Dockerized Microservices Architecture

Zhenzhong Xu

About Me

● Real-time Data Infrastructure – Netflix

● Cloud Infrastructure – Microsoft

About Netflix

● 83M+ Subscribers

● 125M+ Streaming Hours / Day

● > 1/3 Peak NA Internet Traffic

● Thousands of Device Types

● Many Tens of Thousands of VMs

● 3 Active-Active Regions Across the World

[Diagram, built up over several slides: an Observe → Orient → Decide → Act loop, with the labels CD, Innovation, Big Data, Culture, and Cloud added one per slide]

Microservices Ecosystem

Why an Event Processing Platform at Netflix?

● 500+ Billion events generated per day

● 1+ trillion events processed per day

○ > 1 PB

○ 4M – 16M / sec

○ 13 GB – 43 GB / sec

● Message payload: 3 KB – 10 MB
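
As a rough sanity check on the figures above, the daily total can be converted to an average per-second rate. This is only a sketch: it assumes "1+ trillion" is exactly 1e12, that GB means decimal gigabytes, and that the low and high ends of the two per-second ranges pair up. The variable names are illustrative and not from the talk.

```python
# Back-of-the-envelope check of the throughput figures on this slide.
# Assumption: "1+ trillion events per day" is taken as exactly 1e12.
SECONDS_PER_DAY = 24 * 60 * 60

events_per_day = 1_000_000_000_000
avg_events_per_sec = events_per_day / SECONDS_PER_DAY

# Average bytes per event implied by pairing the low ends and the high ends
# of the events/sec and GB/sec ranges (decimal GB assumed).
low_bytes_per_event = 13e9 / 4e6
high_bytes_per_event = 43e9 / 16e6

print(f"average rate: {avg_events_per_sec / 1e6:.1f}M events/sec")
print(f"implied size at low end: {low_bytes_per_event:.0f} bytes/event")
print(f"implied size at high end: {high_bytes_per_event:.0f} bytes/event")
```

The average works out to roughly 11.6 million events per second, which sits inside the 4M – 16M / sec range, and the implied average event size is around 3 KB, near the lower end of the stated payload range.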

Data Driven Culture

● Realtime System Failure Detection

● A/B Testing

● Recommendation Algorithm

● Fraud Detection

● Distributed Tracing

● Log Querying

Paved Road in a Microservices Ecosystem

[Diagram: microservices produce events → Event Processing Pipeline → storage services and batch/stream processing services]

Supports Batch & Streaming

Evolution of Netflix Keystone Pipeline

In the Old Days ...

EMR

EventProducers

About a year ago

[Diagram labels: EventProducer, Suro, SuroProxy, Kafka, Suro Router, ConsumerKafka, Druid, Stream Consumers, EMR]

Today

[Diagram labels: EventProducer, KSProxy, FrontingKafka, SamzaRouter, ConsumerKafka, Stream Consumers, EMR, Control Plane, Self Service UI, Event flow]

Keystone Pipeline As a Service

[Diagram, repeated across several build slides: EventProducer, FrontingKafka, SamzaRouter, ConsumerKafka, Stream Consumers, EMR, Control Plane, Self Service UI]

What exactly is Keystone?

Keystone is ...

… a collection of microservices & components

Stream Processing Service

Elastic Pub/Sub Queue

Producer API

Control Plane

Consumer API

Self Service UI

Keystone is ...

… a single self-contained logical service

Event Processing Pipeline

Keystone is ...

… a self-scaling, multi-tenant service that embraces CI/CD

Keystone is ...

… a self-healing, cloud-failure-tolerant service that guarantees at-least-once delivery semantics

Let’s drill down ...

For the purposes of this talk, we’ll focus on...

Stream Processing Service

Elastic Pub/Sub Queue

Producer SDK

Control Plane

Consumer SDK

Self Service UI

Overview

Self Service UI

Routing Infrastructure

[Diagram labels: EC2 Instances; Zookeeper (Instance Id assignment); Job (x3); ksnode; Checkpointing Cluster; Server Group (Cluster); Store logs in S3]

Routing Infrastructure + Checkpointing Cluster

[Technology labels: 0.9.1, Go, C language]

Control Plane

Custom Cluster Orchestration and Scheduling Layer

Control Plane

• Decides container resources

• Schedules container placements

• Orchestrates cluster deployments

Design Decisions?

Distributed systems are all about trade-offs.

Container

● Process Isolation

● Fast Startup

Service Protocol

● Declarative

● Idempotent

● Reconciliation
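
The three properties above can be illustrated with a small reconciliation loop. This is a minimal sketch, not Keystone's control plane: `JobSpec`, `reconcile`, and the job names are hypothetical, and "starting" or "stopping" containers is reduced to recording an action string.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobSpec:
    """Declarative description of a routing job (hypothetical fields)."""
    name: str
    containers: int


def reconcile(desired: dict, actual: dict) -> list:
    """Drive `actual` (job name -> running containers) toward `desired`.

    Returns the actions taken. Calling it again with the same desired
    state is a no-op, which is what makes the operation idempotent.
    """
    actions = []
    for name, spec in desired.items():
        running = actual.get(name, 0)
        if running < spec.containers:
            actions.append(f"start {spec.containers - running} x {name}")
        elif running > spec.containers:
            actions.append(f"stop {running - spec.containers} x {name}")
        actual[name] = spec.containers
    # Anything running that is no longer declared gets removed.
    for name in [n for n in actual if n not in desired]:
        actions.append(f"stop {actual.pop(name)} x {name}")
    return actions


desired = {"router-a": JobSpec("router-a", 3)}
actual = {"router-a": 1, "router-old": 2}
first = reconcile(desired, actual)   # converges on the declared state
second = reconcile(desired, actual)  # nothing left to do
print(first, second)
```

The caller only declares what should exist; the loop works out the difference. Because a repeated call changes nothing, it is safe to retry after a timeout or partial failure.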

State Management

● Stateless vs Stateful service

● Single source of truth

Scaling

● Self Scaling

● Partition boundary

● Idempotent operations

● Immutable server deployments
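
One possible reading of "partition boundary" is that a job's parallelism cannot usefully exceed the number of partitions in its source topic, so a self-scaling decision has to be clamped to that limit. The sketch below assumes that reading; `target_containers` and its parameters are hypothetical and not taken from the talk.

```python
import math


def target_containers(lag_msgs: int, msgs_per_container_per_sec: int,
                      drain_seconds: int, partitions: int) -> int:
    """Containers needed to drain `lag_msgs` within `drain_seconds`,
    clamped to the range [1, partitions]."""
    capacity_per_container = msgs_per_container_per_sec * drain_seconds
    needed = math.ceil(lag_msgs / capacity_per_container)
    return max(1, min(needed, partitions))


# Moderate lag: scale to what the backlog requires.
moderate = target_containers(1_200_000, 1_000, 60, partitions=32)
# Heavy lag: the partition count caps the answer.
heavy = target_containers(10_000_000, 1_000, 60, partitions=32)
# No lag: keep at least one container running.
idle = target_containers(0, 1_000, 60, partitions=32)
print(moderate, heavy, idle)
```

Because the function is a pure computation of a target size, it pairs naturally with idempotent operations and immutable deployments: the result can be applied by replacing the cluster, and recomputing it with the same inputs gives the same answer.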

Delivery Semantics

● At-most-once

● At-least-once (best effort)

● Exactly-once

At-least-once under failure condition

● Checkpointing mechanism

● Optimize for writes

● Occasional reads
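
The checkpointing idea can be sketched as follows: a message is delivered first and its offset is checkpointed afterwards, so a crash between the two steps causes a replay instead of a loss. This is an illustrative toy, not Keystone's or Samza's implementation; the in-memory `log`, `checkpoint` dict, and `sink` list are assumptions standing in for a partition, a checkpoint store, and a downstream system.

```python
def consume(log, checkpoint, sink, crash_before_commit_at=None):
    """Process `log` from the last committed offset.

    The offset is committed only AFTER the message reaches the sink, so a
    crash between the two steps causes a replay (duplicate), never a loss.
    """
    offset = checkpoint.get("offset", 0)   # read once, on (re)start
    while offset < len(log):
        sink.append(log[offset])           # 1. deliver
        if offset == crash_before_commit_at:
            raise RuntimeError("simulated crash before checkpoint")
        offset += 1
        checkpoint["offset"] = offset      # 2. then checkpoint (a write)


log = ["e0", "e1", "e2"]
checkpoint, sink = {}, []
try:
    consume(log, checkpoint, sink, crash_before_commit_at=1)
except RuntimeError:
    pass
consume(log, checkpoint, sink)  # restart from the last checkpoint
print(sink)
```

After the restart, `e1` appears twice in the sink and nothing is missing, which is the at-least-once guarantee. The checkpoint is written on every message but read only on restart, matching the write-optimized, occasional-read access pattern listed above.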

Multi-tenancy

● Isolation

● Heterogeneous

● Cluster fragmentation

Failure Recovery

• Back pressure

• Network blip

• Container level failure

• Instance level failure

• Zone level failure

• Cluster level failure - Kafka-Kong

• Regional failure - Chaos-Kong

Stream Processing Engine

• Discovery integration

• Custom wire format integration

• Samza: Per partition serialized process loop

• Samza: Simple payload transformation

• Pluggable abstraction

Current Scale - Routing Service

● 14,000+ Docker containers

● 1,400+ EC2 c3.4xlarge instances

● 3 regions

Future Improvements

● Integrate with a more sophisticated orchestration/scheduling/cluster management ecosystem

● Unlock value in real-time unbounded data streams

● Data Discovery

● Data Silos

Questions?
