Extreme Programming meets Real Time Data - QCon London

Extreme Programming meets Real Time Data Gel Goldsby & Tom Johnson, Unruly



When Santa Got Stuck Up The Chimney

When Data Got Stuck Up The Chimney

Hello. My name is...

Gel Goldsby, Reporting and Data Team Lead

Tom Johnson, Senior Developer

We Believe In XP

Extreme Programming Values

● Communication

● Simplicity

● Feedback

● Courage

Simplicity

Our Reporting Pipeline

[Diagram: events, pipeline]

Our Reporting Pipelines

[Diagram: events, old pipeline, super duper wizzy pipeline]

Shut It Off!

[Diagram: events, old pipeline, super duper wizzy pipeline]

A Closer Look At Our Pipeline

[Diagram: events, pipeline, events consumer]

It’s Not A Truck, It’s A Series of Tubes

[Diagram: events, nginx, parser, sequencer, consumer]

Queueing with S3

[Diagram: events, nginx, parser, sequencer, consumer; S3 labelled three times, then six times on the next build]

We Need More Power, Cap’n

[Diagram: events, nginx, parser, sequencer, consumer; across the builds nginx and parser go from one instance each to two, then three]

Two Writes Can Make A Wrong

[Diagram: events, nginx, parser, sequencer, consumer]

Christmas was saved!

Simplicity

● Each component does one thing and does it well

Just Another Report, Right?

● Improving targeting

● Correlate events for same ad call

● Need to join on session id

● Needs disaggregated data
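To make the "join on session id" requirement concrete, here is a minimal sketch of correlating raw events that belong to the same ad call. The event layout and the field names `session_id` and `type` are assumptions for illustration; the slides do not show the real event schema.

```python
from collections import defaultdict

# Hypothetical raw events. The field names are assumptions, not the
# schema used in the talk.
events = [
    {"session_id": "Wo5Meiri", "type": "ad_call"},
    {"session_id": "Xotaipu6", "type": "ad_call"},
    {"session_id": "Wo5Meiri", "type": "auction"},
    {"session_id": "Wo5Meiri", "type": "user_interaction"},
]


def correlate_by_session(events):
    """Group event types by session id, preserving arrival order."""
    by_session = defaultdict(list)
    for event in events:
        by_session[event["session_id"]].append(event["type"])
    return dict(by_session)


correlated = correlate_by_session(events)
```

This only works while each event still carries its own session id, which is why the report needs disaggregated data.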

Aggregation

Campaign   Site
Acme       Zombo.com
Acme       Zombo.com
Acme       Zombo.com
Acme       Nyan.cat
Brawndo    Zombo.com
Brawndo    Nyan.cat
Brawndo    Nyan.cat

Aggregation

Count   Campaign   Site
1       Acme       Zombo.com
1       Acme       Zombo.com
1       Acme       Zombo.com
1       Acme       Nyan.cat
1       Brawndo    Zombo.com
1       Brawndo    Nyan.cat
1       Brawndo    Nyan.cat

Aggregation

Count   Campaign   Site
3       Acme       Zombo.com
1       Acme       Nyan.cat
1       Brawndo    Zombo.com
2       Brawndo    Nyan.cat

Aggregation

Count   Campaign   Site        Lots More
3       Acme       Zombo.com   ...   ...
1       Acme       Nyan.cat    ...   ...
1       Brawndo    Zombo.com   ...   ...
2       Brawndo    Nyan.cat    ...   ...
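The aggregation step in the tables above can be sketched in a few lines. This is an illustration using the seven example events from the slides, not the production code; the function name `aggregate` is made up for the example.

```python
from collections import Counter

# The seven (campaign, site) events shown on the slides.
events = [
    ("Acme", "Zombo.com"),
    ("Acme", "Zombo.com"),
    ("Acme", "Zombo.com"),
    ("Acme", "Nyan.cat"),
    ("Brawndo", "Zombo.com"),
    ("Brawndo", "Nyan.cat"),
    ("Brawndo", "Nyan.cat"),
]


def aggregate(events):
    """Count events per (campaign, site) pair."""
    return Counter(events)


counts = aggregate(events)
for (campaign, site), count in counts.items():
    print(count, campaign, site)
```

Seven input rows collapse to four output rows, with the same counts as the aggregated table.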

Lots of buckets

Micro-Aggregations

● Roughly 20k events per second

● Batched: window size 20s

● x7 reduction factor

● Reduces writes to db
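A minimal sketch of batching events into 20-second windows before writing, assuming each event is a `(timestamp_seconds, campaign, site)` tuple. That tuple layout and the sample timestamps are assumptions; only the 20s window size comes from the slide.

```python
from collections import Counter

WINDOW_SECONDS = 20  # window size from the slide


def micro_aggregate(events, window_seconds=WINDOW_SECONDS):
    """Count events per (window start, campaign, site).

    `events` is an iterable of (timestamp_seconds, campaign, site)
    tuples; this layout is an assumption for the sketch.
    """
    counts = Counter()
    for timestamp, campaign, site in events:
        window_start = int(timestamp // window_seconds) * window_seconds
        counts[(window_start, campaign, site)] += 1
    return counts


# Hypothetical sample spanning two windows.
events = [
    (0.5, "Acme", "Zombo.com"),
    (3.2, "Acme", "Zombo.com"),
    (19.9, "Acme", "Zombo.com"),
    (20.0, "Acme", "Zombo.com"),
    (21.7, "Brawndo", "Nyan.cat"),
    (39.0, "Brawndo", "Nyan.cat"),
]
rows = micro_aggregate(events)
reduction = len(events) / len(rows)  # events in, divided by rows written
```

Each row in `rows` is one database write, so the reduction factor is the number of events divided by the number of distinct keys per window. The toy sample gives a factor of 2; the slide reports roughly 7 on real traffic.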

Make America Aggregate Again

● Daily

● From ~800 million events

● Compacts to ~2 million rows

● 400x reduction

● Reduces disk usage

● Speeds up queries

Querying data

[Diagram: user, query, view, historic data, today’s data]
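One way to read this slide is that a single view sits over the compacted historic rows and today's micro-aggregated rows, so a user query does not need to know where a row lives. The sketch below shows that idea with SQLite from the Python standard library. The table names, column names and sample rows are all hypothetical, and the talk does not say the view was built this way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Table and column names are hypothetical.
    CREATE TABLE historic_data (day TEXT, campaign TEXT, site TEXT, event_count INTEGER);
    CREATE TABLE todays_data   (day TEXT, campaign TEXT, site TEXT, event_count INTEGER);

    -- A single view over both tables.
    CREATE VIEW report AS
        SELECT day, campaign, site, event_count FROM historic_data
        UNION ALL
        SELECT day, campaign, site, event_count FROM todays_data;
""")

# Historic rows are already compacted to one row per key per day.
conn.executemany(
    "INSERT INTO historic_data VALUES (?, ?, ?, ?)",
    [("yesterday", "Acme", "Zombo.com", 3), ("yesterday", "Brawndo", "Nyan.cat", 2)],
)
# Today's rows are still micro-aggregated, so a key can appear many times.
conn.executemany(
    "INSERT INTO todays_data VALUES (?, ?, ?, ?)",
    [("today", "Acme", "Zombo.com", 1), ("today", "Acme", "Zombo.com", 1)],
)

totals = conn.execute(
    "SELECT campaign, site, SUM(event_count) FROM report "
    "GROUP BY campaign, site ORDER BY campaign, site"
).fetchall()
```

Because counts are additive, summing over the view gives the same answer whether or not today's rows have been compacted yet.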

Aggregatable facts

Campaign   Site
Acme       Zombo.com
Acme       Zombo.com
Acme       Zombo.com
Acme       Nyan.cat
Brawndo    Zombo.com
Brawndo    Nyan.cat
Brawndo    Nyan.cat

Add in session ids

Campaign   Site        Session Id
Acme       Zombo.com   Wo5Meiri
Acme       Zombo.com   Xotaipu6
Acme       Zombo.com   Xu1goor7
Acme       Nyan.cat    eVai6OhS
Brawndo    Zombo.com   oiMoo7Du
Brawndo    Nyan.cat    aiSh1eej
Brawndo    Nyan.cat    rae8ieY5

Does not aggregate well
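The effect of the session id column can be checked directly on the seven rows from the slide: grouping on (campaign, site) compacts them, while adding the session id to the key leaves every row in its own bucket. A small sketch:

```python
from collections import Counter

# The seven rows from the slide, including session ids.
rows = [
    ("Acme", "Zombo.com", "Wo5Meiri"),
    ("Acme", "Zombo.com", "Xotaipu6"),
    ("Acme", "Zombo.com", "Xu1goor7"),
    ("Acme", "Nyan.cat", "eVai6OhS"),
    ("Brawndo", "Zombo.com", "oiMoo7Du"),
    ("Brawndo", "Nyan.cat", "aiSh1eej"),
    ("Brawndo", "Nyan.cat", "rae8ieY5"),
]

# Key without the session id: rows collapse into shared buckets.
without_session = Counter((campaign, site) for campaign, site, _ in rows)

# Key with the session id: every row is unique, so nothing collapses.
with_session = Counter(rows)

print(len(rows), "rows ->", len(without_session), "buckets without session id")
print(len(rows), "rows ->", len(with_session), "buckets with session id")
```

With a near-unique column in the key, the number of buckets grows with the number of events, so the reduction factor drops towards 1.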

What next? Spikes!

Big Data!

Big data: big choices

● Many options

● Available documentation was:

○ Academic

○ Evangelical

○ Naive/Trivial

Spark!

Big data: big costs

● Infrastructure

● Language (Scala)

● Incompatible with current approach

● Performance tradeoffs

Why we could step away

● Understood our data better

● Underestimated costs

● We know our code

● We can change our code

Feedback

● Regular retrospectives

● Shared understanding of “research”

● Shared understanding of value

Courage

● Not afraid to try new things

● Not afraid to change direction

● Not lured by what we “ought” to do

The Shape of our Data

Disaggregated

Unsampled

Real Time

Programmatic Pacing

Disaggregated

Unsampled

Real Time

Operational Debugging

Disaggregated

Unsampled

Real Time

Auction Data

Disaggregated

Unsampled

Real Time

Advertising 101

[Diagram: user loads page, ad call, auction, user interaction, payments]

Funnel of data

[Diagram: user loads page, ad call, auction, user interaction, payments]

Pipelines to match data shape

[Diagram: user loads page, ad call, auction, user interaction, payments]

Our Actual Reporting Pipelines

[Diagram: events, payments pipeline, ad call pipeline, user interaction pipeline, auction pipeline]

When We Get Overloaded...

[Diagram: events, payments pipeline, ad call pipeline, user interaction pipeline, auction pipeline]
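The slides do not say how the separate pipelines behave under load, so the following is only a sketch of one way per-type isolation could work: each event type gets its own bounded queue, and a flood of one type cannot crowd out another. The class name, the capacities and the drop-when-full policy are all assumptions for illustration.

```python
from collections import deque

# Event types from the slides. The capacities are made-up numbers,
# with a tiny auction queue so the overload is easy to see.
CAPACITY = {
    "payments": 1000,
    "ad_call": 1000,
    "user_interaction": 1000,
    "auction": 3,
}


class Pipelines:
    """One bounded queue per event type, so overload stays local."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.queues = {kind: deque() for kind in capacity}
        self.dropped = {kind: 0 for kind in capacity}

    def offer(self, kind, event):
        """Enqueue an event, or count it as dropped if its queue is full."""
        queue = self.queues[kind]
        if len(queue) >= self.capacity[kind]:
            self.dropped[kind] += 1  # assumed policy: shed load, do not block
            return False
        queue.append(event)
        return True


pipelines = Pipelines(CAPACITY)
for i in range(10):
    pipelines.offer("auction", {"id": i})
pipelines.offer("payments", {"id": "p1"})
```

In this sketch the auction queue fills up and sheds seven of its ten events, while the payments event is still accepted.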

Ensuring real time performance

Communication

● How data was used

● Performance requirements

○ What was needed

○ What wasn’t needed

○ Hard vs soft requirements

Simplicity

● Green cards

● 10 pair-days total

● Incremental

● Separable

Let's talk about our databases

Row-based database

[Diagram: table with Column A, Column B, Column C, Column D, Column E]

Columnar database

[Diagram: table with Column A, Column B, Column C, Column D, Column E]
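The difference between the two layouts can be sketched with plain Python structures. The values are hypothetical and only three of the five columns are shown; the point is which data a query has to touch, not how a real database stores bytes on disk.

```python
# Row-based layout: each record keeps all of its columns together.
rows = [
    {"Column A": "Acme", "Column B": "Zombo.com", "Column C": 3},
    {"Column A": "Acme", "Column B": "Nyan.cat", "Column C": 1},
    {"Column A": "Brawndo", "Column B": "Nyan.cat", "Column C": 2},
]

# Columnar layout: each column's values are kept together.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# Fetching one whole record is natural in the row layout...
second_record = rows[1]

# ...while scanning one column is natural in the columnar layout.
total_row_based = sum(row["Column C"] for row in rows)  # visits every record
total_columnar = sum(columns["Column C"])               # visits one column
```

Both totals are the same; what differs is how much unrelated data sits next to the values being read, which is why the two layouts suit different kinds of query.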

Vectorwise or Postgres?

Query-based routing

[Diagram: user, query, api, Vectorwise, Postgres]
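The slides do not state the rule the API uses to pick a database, so this is only a sketch under an assumed rule: queries that aggregate go to the columnar store (Vectorwise), everything else goes to the row-based store (Postgres). The function name `route`, the regular expression and the example queries are made up for illustration.

```python
import re

# Assumed heuristic: an aggregate function call or a GROUP BY clause
# marks a query as analytic.
AGGREGATE_PATTERN = re.compile(
    r"\b(SUM|COUNT|AVG|MIN|MAX)\s*\(|\bGROUP\s+BY\b",
    re.IGNORECASE,
)


def route(sql):
    """Return the name of the backend a query would be sent to."""
    return "vectorwise" if AGGREGATE_PATTERN.search(sql) else "postgres"


analytic = route("SELECT campaign, SUM(event_count) FROM facts GROUP BY campaign")
lookup = route("SELECT * FROM events WHERE session_id = 'Wo5Meiri'")
```

Here `analytic` resolves to the columnar backend and `lookup` to the row-based one. A real router would more likely decide from a structured query description than from SQL text.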

Conclusion

● Simplicity

● Communication

● Feedback

● Courage

Thank you!

Questions?

(this space intentionally left blank)