Real-Time Analytics as a Service at King

Posted: 09-Jan-2017
TRANSCRIPT

© King.com Ltd 2016 – Commercially confidential

Real-Time Analytics as a Service at King

Gyula Fóra

Data Warehouse Engineer

Apache Flink PMC

Page 2


We make awesome mobile games

463 million monthly active users

30 billion events per day

And a lot of data…

Page 3

About King


From a streaming perspective…

Page 4

[Diagram: 30 billion events per day flow from several databases into analytics/processing applications that hold terabytes of state]


This is awesome, but…

Page 5

End-users are often not Java/Scala developers

Writing streaming applications is pretty hard

Large state and windowing doesn’t help either

Seems to work in my IDE, what next?

We need a “turnkey” solution


The RBea platform

Page 6

Powered by Apache Flink

Scripting on the live streams

Window aggregates

Stateful computations

Scalable + fault tolerant


RBea architecture

Page 7

[Architecture diagram: data scientists use the RBEA web frontend, which talks to the backend over a REST API; events flow in and output flows out, with shared libraries alongside]


RBea backend implementation

Page 8

One stateful Flink job per game

Stream events and scripts

Events are partitioned by user id

Scripts are broadcast

Output/Aggregation happens downstream

[Diagram: a CoFlatMap operator connects the event stream with an add/remove-scripts channel; each parallel instance loops over the deployed scripts (S1–S5) to process every event and emits output based on API calls]


Dissecting the DSL

Page 9

@ProcessEvent(semanticClass = SCPurchase.class)
def process(SCPurchase purchase, Output out, Aggregators agg) {

    long amount = purchase.getAmount()
    String curr = purchase.getCurrency()
    out.writeToKafka("purchases", curr + "\t" + amount)

    Counter numPurchases = agg.getCounter("PurchaseCount", MINUTES_10)
    numPurchases.increment()
}


Dissecting the DSL

Page 10

@ProcessEvent(semanticClass = SCPurchase.class)
def process(SCPurchase purchase, Output out, Aggregators agg) {

    long amount = purchase.getAmount()
    String curr = purchase.getCurrency()
    out.writeToKafka("purchases", curr + "\t" + amount)

    Counter numPurchases = agg.getCounter("PurchaseCount", MINUTES_10)
    numPurchases.increment()
}

Processing methods by annotation

Event filter conditions

Flexible argument list

Code-generate Java classes

=> void processEvent(Event e, Context ctx);


Dissecting the DSL

Page 11

@ProcessEvent(semanticClass = SCPurchase.class)
def process(SCPurchase purchase, Output out, Aggregators agg) {

    long amount = purchase.getAmount()
    String curr = purchase.getCurrency()
    out.writeToKafka("purchases", curr + "\t" + amount)

    Counter numPurchases = agg.getCounter("PurchaseCount", MINUTES_10)
    numPurchases.increment()
}

Output calls create Output events

Output(KAFKA, "purchases", "…")

These events are filtered downstream and sent to a Kafka sink


Dissecting the DSL

Page 12

@ProcessEvent(semanticClass = SCPurchase.class)
def process(SCPurchase purchase, Output out, Aggregators agg) {

    long amount = purchase.getAmount()
    String curr = purchase.getCurrency()
    out.writeToKafka("purchases", curr + "\t" + amount)

    Counter numPurchases = agg.getCounter("PurchaseCount", MINUTES_10)
    numPurchases.increment()
}

Aggregator calls create Aggregate events

Aggr(MYSQL, 60000, "PurchaseCount", 1)

Flink window operators do the aggregation


Aggregators

Page 13

long size = aggregate.getWindowSize();
long start = timestamp - (timestamp % size);
long end = start + size;
TimeWindow tw = new TimeWindow(start, end);

Event time windows

Window size set per aggregator

[Diagram: aggregate events from Script1 and Script2 are assigned to Window 1 and Window 2 separately for each aggregator (NumGames, Revenue, MyAggregator), each keeping its own per-window totals]

Dynamic window assignment

Page 14

RBea physical plan


How do we run Flink?

Page 15

Standalone => YARN

Few heavy streaming jobs => more and more

RocksDB state backend

Custom deployment/monitoring tools


Monitoring our jobs

Page 16


King Streaming SDK (sneak preview)

Page 17

Goal: Bridge the gap between RBea and Flink

Build data pipelines from RBea processors

Strict event format, limited set of operations

Easy "stream joins" and pattern matching

Thin wrapper around Flink


King Streaming SDK (sneak preview)

Page 18

Last<GameStart> lastGS = Last.semanticClass(GameStart.class);

readFromKafka("event.myevents.log", "gyula")
    .keyByCoreUserID()
    .join(lastGS)
    .process((event, context) -> {
        context.getJoined(lastGS).ifPresent(lastGameStart -> {
            context.getAggregators()
                   .getCounter("Purchases", MINUTES_10)
                   .setDimensions(lastGameStart.getLevel())
                   .increment();
        });
    });


Closing

Page 19

RBea makes streaming accessible to every data scientist at King

We leverage Flink’s stateful and windowed processing capabilities

People love it because it’s simple and powerful

Thank you!