real-time data real-time data processing at rtb house ... · real-time data processing at rtb house...

58
REAL-TIME DATA PROCESSING AT RTB HOUSE REAL-TIME DATA PROCESSING AT RTB HOUSE Real-Time Data Processing at RTB House How we have grown 10x within 2 years Bartosz Łoś, 2019

Upload: others

Post on 31-May-2020

31 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Real-Time Data REAL-TIME DATA PROCESSING AT RTB HOUSE ... · REAL-TIME DATA PROCESSING AT RTB HOUSE Real-Time Data Processing at RTB House How we have grown 10x within 2 years Bartosz

REAL-TIME DATA PROCESSING AT RTB HOUSEREAL-TIME DATA PROCESSING AT RTB HOUSEReal-Time Data Processingat RTB House

How we have grown 10x within 2 years

Bartosz Łoś, 2019

Page 2: Real-Time Data REAL-TIME DATA PROCESSING AT RTB HOUSE ... · REAL-TIME DATA PROCESSING AT RTB HOUSE Real-Time Data Processing at RTB House How we have grown 10x within 2 years Bartosz

AGENDA

● our RTB platform

Page 3: Real-Time Data REAL-TIME DATA PROCESSING AT RTB HOUSE ... · REAL-TIME DATA PROCESSING AT RTB HOUSE Real-Time Data Processing at RTB House How we have grown 10x within 2 years Bartosz

AGENDA

● our RTB platform● the previous iterations: three different architectures

Page 4: Real-Time Data REAL-TIME DATA PROCESSING AT RTB HOUSE ... · REAL-TIME DATA PROCESSING AT RTB HOUSE Real-Time Data Processing at RTB House How we have grown 10x within 2 years Bartosz

AGENDA

● our RTB platform● the previous iterations: three different architectures● the fourth iteration: multi-dc architecture

Page 5: Real-Time Data REAL-TIME DATA PROCESSING AT RTB HOUSE ... · REAL-TIME DATA PROCESSING AT RTB HOUSE Real-Time Data Processing at RTB House How we have grown 10x within 2 years Bartosz

AGENDA

● our RTB platform● the previous iterations: three different architectures● the fourth iteration: multi-dc architecture● our use cases: requirements and processing patterns

Page 6: Real-Time Data REAL-TIME DATA PROCESSING AT RTB HOUSE ... · REAL-TIME DATA PROCESSING AT RTB HOUSE Real-Time Data Processing at RTB House How we have grown 10x within 2 years Bartosz

AGENDA

● our RTB platform● the previous iterations: three different architectures● the fourth iteration: multi-dc architecture● our use cases: requirements and processing patterns● kafka workers

Page 7: Real-Time Data REAL-TIME DATA PROCESSING AT RTB HOUSE ... · REAL-TIME DATA PROCESSING AT RTB HOUSE Real-Time Data Processing at RTB House How we have grown 10x within 2 years Bartosz

OUR RTB PLATFORM

Page 8: Real-Time Data REAL-TIME DATA PROCESSING AT RTB HOUSE ... · REAL-TIME DATA PROCESSING AT RTB HOUSE Real-Time Data Processing at RTB House How we have grown 10x within 2 years Bartosz

OUR RTB PLATFORM: THE CONTEXT

Page 9: Real-Time Data REAL-TIME DATA PROCESSING AT RTB HOUSE ... · REAL-TIME DATA PROCESSING AT RTB HOUSE Real-Time Data Processing at RTB House How we have grown 10x within 2 years Bartosz

OUR RTB PLATFORM: THE CONTEXT

Page 10: Real-Time Data REAL-TIME DATA PROCESSING AT RTB HOUSE ... · REAL-TIME DATA PROCESSING AT RTB HOUSE Real-Time Data Processing at RTB House How we have grown 10x within 2 years Bartosz

THE PREVIOUS ITERATIONS

Page 11: Real-Time Data REAL-TIME DATA PROCESSING AT RTB HOUSE ... · REAL-TIME DATA PROCESSING AT RTB HOUSE Real-Time Data Processing at RTB House How we have grown 10x within 2 years Bartosz

THE 1ST ITERATION: MUTABLE IMPRESSIONS

Page 12: Real-Time Data REAL-TIME DATA PROCESSING AT RTB HOUSE ... · REAL-TIME DATA PROCESSING AT RTB HOUSE Real-Time Data Processing at RTB House How we have grown 10x within 2 years Bartosz

THE 2ND ITERATION: LAMBDA ARCHITECTURE

Page 13: Real-Time Data REAL-TIME DATA PROCESSING AT RTB HOUSE ... · REAL-TIME DATA PROCESSING AT RTB HOUSE Real-Time Data Processing at RTB House How we have grown 10x within 2 years Bartosz

THE 3RD ITERATION: IMMUTABLE STREAMS OF EVENTS

Page 14: Real-Time Data REAL-TIME DATA PROCESSING AT RTB HOUSE ... · REAL-TIME DATA PROCESSING AT RTB HOUSE Real-Time Data Processing at RTB House How we have grown 10x within 2 years Bartosz

THE FOURTH ITERATION: MULTI-DC

Page 15: Real-Time Data REAL-TIME DATA PROCESSING AT RTB HOUSE ... · REAL-TIME DATA PROCESSING AT RTB HOUSE Real-Time Data Processing at RTB House How we have grown 10x within 2 years Bartosz

THE 4TH ITERATION: MAIN CHANGES

● 10x larger scale:● from 350K to 3.5M bid requests/s within 2 years

Page 16: Real-Time Data REAL-TIME DATA PROCESSING AT RTB HOUSE ... · REAL-TIME DATA PROCESSING AT RTB HOUSE Real-Time Data Processing at RTB House How we have grown 10x within 2 years Bartosz

THE 4TH ITERATION: MAIN CHANGES

● 10x larger scale:● from 350K to 3.5M bid requests/s within 2 years

● full multi-dc architecture:● synchronization of user profiles● merging streams of events

Page 17: Real-Time Data REAL-TIME DATA PROCESSING AT RTB HOUSE ... · REAL-TIME DATA PROCESSING AT RTB HOUSE Real-Time Data Processing at RTB House How we have grown 10x within 2 years Bartosz

THE 4TH ITERATION: MAIN CHANGES

● 10x larger scale:● from 350K to 3.5M bid requests/s within 2 years

● full multi-dc architecture:● synchronization of user profiles● merging streams of events

● fixed partitioning in all DCs:● parallelism, merging, end-to-end lag

Page 18: Real-Time Data REAL-TIME DATA PROCESSING AT RTB HOUSE ... · REAL-TIME DATA PROCESSING AT RTB HOUSE Real-Time Data Processing at RTB House How we have grown 10x within 2 years Bartosz

THE 4TH ITERATION: MAIN CHANGES

● 10x larger scale:● from 350K to 3.5M bid requests/s within 2 years

● full multi-dc architecture:● synchronization of user profiles● merging streams of events

● fixed partitioning in all DCs:● parallelism, merging, end-to-end lag

● end-to-end exactly-once processing:● at-least-once output semantics & deduplication

Page 19: Real-Time Data REAL-TIME DATA PROCESSING AT RTB HOUSE ... · REAL-TIME DATA PROCESSING AT RTB HOUSE Real-Time Data Processing at RTB House How we have grown 10x within 2 years Bartosz

THE 4TH ITERATION: MAIN CHANGES

● 10x larger scale:● from 350K to 3.5M bid requests/s within 2 years

● full multi-dc architecture:● synchronization of user profiles● merging streams of events

● fixed partitioning in all DCs:● parallelism, merging, end-to-end lag

● end-to-end exactly-once processing:● at-least-once output semantics & deduplication

● a few better components:● new stats-counter, new data-flow● logstash● merger, dispatcher & loader

Page 20: Real-Time Data REAL-TIME DATA PROCESSING AT RTB HOUSE ... · REAL-TIME DATA PROCESSING AT RTB HOUSE Real-Time Data Processing at RTB House How we have grown 10x within 2 years Bartosz

THE 4TH ITERATION: MULTI-DC ARCHITECTURE

Page 21: Real-Time Data REAL-TIME DATA PROCESSING AT RTB HOUSE ... · REAL-TIME DATA PROCESSING AT RTB HOUSE Real-Time Data Processing at RTB House How we have grown 10x within 2 years Bartosz

THE 4TH ITERATION: MULTI-DC ARCHITECTURE

Page 22: Real-Time Data REAL-TIME DATA PROCESSING AT RTB HOUSE ... · REAL-TIME DATA PROCESSING AT RTB HOUSE Real-Time Data Processing at RTB House How we have grown 10x within 2 years Bartosz

THE 4TH ITERATION: MULTI-DC ARCHITECTURE

Page 23: Real-Time Data REAL-TIME DATA PROCESSING AT RTB HOUSE ... · REAL-TIME DATA PROCESSING AT RTB HOUSE Real-Time Data Processing at RTB House How we have grown 10x within 2 years Bartosz

THE 4TH ITERATION: MULTI-DC ARCHITECTURE

Page 24: Real-Time Data REAL-TIME DATA PROCESSING AT RTB HOUSE ... · REAL-TIME DATA PROCESSING AT RTB HOUSE Real-Time Data Processing at RTB House How we have grown 10x within 2 years Bartosz

THE 4TH ITERATION: MULTI-DC ARCHITECTURE

Page 25: Real-Time Data REAL-TIME DATA PROCESSING AT RTB HOUSE ... · REAL-TIME DATA PROCESSING AT RTB HOUSE Real-Time Data Processing at RTB House How we have grown 10x within 2 years Bartosz

THE 4TH ITERATION: MULTI-DC ARCHITECTURE

Page 26: Real-Time Data REAL-TIME DATA PROCESSING AT RTB HOUSE ... · REAL-TIME DATA PROCESSING AT RTB HOUSE Real-Time Data Processing at RTB House How we have grown 10x within 2 years Bartosz

THE 4TH ITERATION: MULTI-DC ARCHITECTURE

Page 27: Real-Time Data REAL-TIME DATA PROCESSING AT RTB HOUSE ... · REAL-TIME DATA PROCESSING AT RTB HOUSE Real-Time Data Processing at RTB House How we have grown 10x within 2 years Bartosz

THE 4TH ITERATION: MULTI-DC ARCHITECTURE

Page 28: Real-Time Data REAL-TIME DATA PROCESSING AT RTB HOUSE ... · REAL-TIME DATA PROCESSING AT RTB HOUSE Real-Time Data Processing at RTB House How we have grown 10x within 2 years Bartosz

THE 4TH ITERATION: MULTI-DC ARCHITECTURE

Page 29: Real-Time Data REAL-TIME DATA PROCESSING AT RTB HOUSE ... · REAL-TIME DATA PROCESSING AT RTB HOUSE Real-Time Data Processing at RTB House How we have grown 10x within 2 years Bartosz

THE 4TH ITERATION: MULTI-DC ARCHITECTURE

Page 30: Real-Time Data REAL-TIME DATA PROCESSING AT RTB HOUSE ... · REAL-TIME DATA PROCESSING AT RTB HOUSE Real-Time Data Processing at RTB House How we have grown 10x within 2 years Bartosz

OUR USE CASES

Page 31: Real-Time Data REAL-TIME DATA PROCESSING AT RTB HOUSE ... · REAL-TIME DATA PROCESSING AT RTB HOUSE Real-Time Data Processing at RTB House How we have grown 10x within 2 years Bartosz

STATS-COUNTER: STORM TOPOLOGY (THE 2ND ITERATION)

Page 32: Real-Time Data REAL-TIME DATA PROCESSING AT RTB HOUSE ... · REAL-TIME DATA PROCESSING AT RTB HOUSE Real-Time Data Processing at RTB House How we have grown 10x within 2 years Bartosz

APACHE STORM: TRIDENT + EXACTLY-ONCE STATE

Page 33: Real-Time Data REAL-TIME DATA PROCESSING AT RTB HOUSE ... · REAL-TIME DATA PROCESSING AT RTB HOUSE Real-Time Data Processing at RTB House How we have grown 10x within 2 years Bartosz

APACHE STORM: PARALLELISM MODEL

Page 34: Real-Time Data REAL-TIME DATA PROCESSING AT RTB HOUSE ... · REAL-TIME DATA PROCESSING AT RTB HOUSE Real-Time Data Processing at RTB House How we have grown 10x within 2 years Bartosz

MERGER (THE 4TH ITERATION)

Page 35: Real-Time Data REAL-TIME DATA PROCESSING AT RTB HOUSE ... · REAL-TIME DATA PROCESSING AT RTB HOUSE Real-Time Data Processing at RTB House How we have grown 10x within 2 years Bartosz

MERGER: KAFKA CONSUMER API

Page 36: Real-Time Data REAL-TIME DATA PROCESSING AT RTB HOUSE ... · REAL-TIME DATA PROCESSING AT RTB HOUSE Real-Time Data Processing at RTB House How we have grown 10x within 2 years Bartosz

DATA-FLOW: KAFKA STREAMS (THE 4TH ITERATION)

Page 37: Real-Time Data REAL-TIME DATA PROCESSING AT RTB HOUSE ... · REAL-TIME DATA PROCESSING AT RTB HOUSE Real-Time Data Processing at RTB House How we have grown 10x within 2 years Bartosz

KAFKA STREAMS: PARALLELISM MODEL

Page 38: Real-Time Data REAL-TIME DATA PROCESSING AT RTB HOUSE ... · REAL-TIME DATA PROCESSING AT RTB HOUSE Real-Time Data Processing at RTB House How we have grown 10x within 2 years Bartosz

KAFKA STREAMS: PARALLELISM MODEL

Page 39: Real-Time Data REAL-TIME DATA PROCESSING AT RTB HOUSE ... · REAL-TIME DATA PROCESSING AT RTB HOUSE Real-Time Data Processing at RTB House How we have grown 10x within 2 years Bartosz

KAFKA STREAMS: EXACTLY-ONCE DELIVERY

Kafka Streams:● processing.guarantee = exactly-once

Page 40: Real-Time Data REAL-TIME DATA PROCESSING AT RTB HOUSE ... · REAL-TIME DATA PROCESSING AT RTB HOUSE Real-Time Data Processing at RTB House How we have grown 10x within 2 years Bartosz

KAFKA STREAMS: EXACTLY-ONCE DELIVERY

Kafka Streams:● processing.guarantee = exactly-once

Producer:● transactions● enable.idempotence = true

Page 41: Real-Time Data REAL-TIME DATA PROCESSING AT RTB HOUSE ... · REAL-TIME DATA PROCESSING AT RTB HOUSE Real-Time Data Processing at RTB House How we have grown 10x within 2 years Bartosz

KAFKA STREAMS: EXACTLY-ONCE DELIVERY

Kafka Streams:● processing.guarantee = exactly-once

Producer:● transactions● enable.idempotence = true

Consumer:● isolation.level = read_committed

Page 42: Real-Time Data REAL-TIME DATA PROCESSING AT RTB HOUSE ... · REAL-TIME DATA PROCESSING AT RTB HOUSE Real-Time Data Processing at RTB House How we have grown 10x within 2 years Bartosz

KAFKA WORKERS

Page 43: Real-Time Data REAL-TIME DATA PROCESSING AT RTB HOUSE ... · REAL-TIME DATA PROCESSING AT RTB HOUSE Real-Time Data Processing at RTB House How we have grown 10x within 2 years Bartosz

KAFKA WORKERS: MAIN FEATURES

● higher level of distribution

Page 44: Real-Time Data REAL-TIME DATA PROCESSING AT RTB HOUSE ... · REAL-TIME DATA PROCESSING AT RTB HOUSE Real-Time Data Processing at RTB House How we have grown 10x within 2 years Bartosz

KAFKA WORKERS: MAIN FEATURES

● higher level of distribution

Page 45: Real-Time Data REAL-TIME DATA PROCESSING AT RTB HOUSE ... · REAL-TIME DATA PROCESSING AT RTB HOUSE Real-Time Data Processing at RTB House How we have grown 10x within 2 years Bartosz

KAFKA WORKERS: MAIN FEATURES

● higher level of distribution

public interface WorkerPartitioner<K, V> {

int subpartition(ConsumerRecord<K, V> consumerRecord);

}

Page 46: Real-Time Data REAL-TIME DATA PROCESSING AT RTB HOUSE ... · REAL-TIME DATA PROCESSING AT RTB HOUSE Real-Time Data Processing at RTB House How we have grown 10x within 2 years Bartosz

KAFKA WORKERS: MAIN FEATURES

● higher level of distribution● possibility to pause and resume processing for given partition

Page 47: Real-Time Data REAL-TIME DATA PROCESSING AT RTB HOUSE ... · REAL-TIME DATA PROCESSING AT RTB HOUSE Real-Time Data Processing at RTB House How we have grown 10x within 2 years Bartosz

KAFKA WORKERS: MAIN FEATURES

● higher level of distribution● possibility to pause and resume processing for given partition

Page 48: Real-Time Data REAL-TIME DATA PROCESSING AT RTB HOUSE ... · REAL-TIME DATA PROCESSING AT RTB HOUSE Real-Time Data Processing at RTB House How we have grown 10x within 2 years Bartosz

KAFKA WORKERS: MAIN FEATURES

● higher level of distribution● possibility to pause and resume processing for given partition

public interface WorkerTask<K, V> {

boolean accept(WorkerRecord<K, V> record);

void process(WorkerRecord<K, V> record, RecordStatusObserver observer);

}

Page 49: Real-Time Data REAL-TIME DATA PROCESSING AT RTB HOUSE ... · REAL-TIME DATA PROCESSING AT RTB HOUSE Real-Time Data Processing at RTB House How we have grown 10x within 2 years Bartosz

KAFKA WORKERS: MAIN FEATURES

● higher level of distribution● possibility to pause and resume processing for given partition● asynchronous processing

● tighter control of offsets commits● backpressure● processing timeouts

Page 50: Real-Time Data REAL-TIME DATA PROCESSING AT RTB HOUSE ... · REAL-TIME DATA PROCESSING AT RTB HOUSE Real-Time Data Processing at RTB House How we have grown 10x within 2 years Bartosz

KAFKA WORKERS: MAIN FEATURES

● higher level of distribution● possibility to pause and resume processing for given partition● asynchronous processing

● tighter control of offsets commits● backpressure● processing timeouts

public interface RecordStatusObserver {

void onSuccess();

void onFailure(Exception exception);

}

Page 51: Real-Time Data REAL-TIME DATA PROCESSING AT RTB HOUSE ... · REAL-TIME DATA PROCESSING AT RTB HOUSE Real-Time Data Processing at RTB House How we have grown 10x within 2 years Bartosz

KAFKA WORKERS: MAIN FEATURES

● higher level of distribution● possibility to pause and resume processing for given partition● asynchronous processing

● tighter control of offsets commits● backpressure● processing timeouts

Page 52: Real-Time Data REAL-TIME DATA PROCESSING AT RTB HOUSE ... · REAL-TIME DATA PROCESSING AT RTB HOUSE Real-Time Data Processing at RTB House How we have grown 10x within 2 years Bartosz

KAFKA WORKERS: MAIN FEATURES

● higher level of distribution● possibility to pause and resume processing for given partition● asynchronous processing

● tighter control of offsets commits● backpressure● processing timeouts

● at-least-once semantics

Page 53: Real-Time Data REAL-TIME DATA PROCESSING AT RTB HOUSE ... · REAL-TIME DATA PROCESSING AT RTB HOUSE Real-Time Data Processing at RTB House How we have grown 10x within 2 years Bartosz

KAFKA WORKERS: MAIN FEATURES

● higher level of distribution● possibility to pause and resume processing for given partition● asynchronous processing

● tighter control of offsets commits● backpressure● processing timeouts

● at-least-once semantics● handling failures

Page 54: Real-Time Data REAL-TIME DATA PROCESSING AT RTB HOUSE ... · REAL-TIME DATA PROCESSING AT RTB HOUSE Real-Time Data Processing at RTB House How we have grown 10x within 2 years Bartosz

KAFKA WORKERS: MAIN FEATURES

● higher level of distribution● possibility to pause and resume processing for given partition● asynchronous processing

● tighter control of offsets commits● backpressure● processing timeouts

● at-least-once semantics● handling failures● kafka-to-kafka, hdfs, bigquery, elasticsearch connectors

Page 55: Real-Time Data REAL-TIME DATA PROCESSING AT RTB HOUSE ... · REAL-TIME DATA PROCESSING AT RTB HOUSE Real-Time Data Processing at RTB House How we have grown 10x within 2 years Bartosz

KAFKA WORKERS: MAIN FEATURES

● higher level of distribution● possibility to pause and resume processing for given partition● asynchronous processing

● tighter control of offsets commits● backpressure● processing timeouts

● at-least-once semantics● handling failures● kafka-to-kafka, hdfs, bigquery, elasticsearch connectors● github.com/RTBHOUSE/kafka-workers

Page 56: Real-Time Data REAL-TIME DATA PROCESSING AT RTB HOUSE ... · REAL-TIME DATA PROCESSING AT RTB HOUSE Real-Time Data Processing at RTB House How we have grown 10x within 2 years Bartosz

KAFKA WORKERS: PARALLELISM MODEL

Page 57: Real-Time Data REAL-TIME DATA PROCESSING AT RTB HOUSE ... · REAL-TIME DATA PROCESSING AT RTB HOUSE Real-Time Data Processing at RTB House How we have grown 10x within 2 years Bartosz

THE 5TH ITERATION: KAFKA WORKERS

Page 58: Real-Time Data REAL-TIME DATA PROCESSING AT RTB HOUSE ... · REAL-TIME DATA PROCESSING AT RTB HOUSE Real-Time Data Processing at RTB House How we have grown 10x within 2 years Bartosz

techblog.rtbhouse.com/jobs