Using Spark at Vungle

Uploaded by Vungle, posted 06-Aug-2015

● Introduction

● Old Architecture

● New Architecture

● Decoupling

● Streaming

● Conclusion

● Legacy Java process
  ○ “Crunches” data
  ○ Sends data downstream to our own datastores and to third-party analytics
  ○ Runs every hour

● Growth
  ○ A single run can take longer than an hour
  ○ Heap grew from 12 GB to 24 GB in less than a year
  ○ Cron is a horrible job management system
  ○ A failure requires rerunning the job from the beginning

● 2.0
  ○ Horizontally scalable
  ○ Real-time ETL
  ○ Reusable

ETL @ Vungle

● ~1 Billion Events / Day

● Deduplication

● Calculating $$$

● Outputting data to various destinations
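For scale, ~1 billion events per day averages out to 1,000,000,000 / 86,400 ≈ 11,574 events per second, before accounting for peaks. The deduplication step can be sketched as below. This is a minimal plain-Python illustration, not Vungle's Spark code, and it assumes each event carries a unique identifier under a hypothetical `event_id` key. The `seen` set is shared across calls so that a duplicate arriving in a later batch is still dropped.

```python
from typing import Iterable, Iterator

EVENTS_PER_DAY = 1_000_000_000
SECONDS_PER_DAY = 24 * 60 * 60


def average_rate(events_per_day: int = EVENTS_PER_DAY) -> float:
    """Average events per second needed to keep up with the daily volume."""
    return events_per_day / SECONDS_PER_DAY


def deduplicate(events: Iterable[dict], seen: set) -> Iterator[dict]:
    """Yield each event the first time its (hypothetical) 'event_id' appears.

    'seen' is passed in so the state persists across batches, mimicking a
    stream where a duplicate may show up in a later micro-batch.
    """
    for event in events:
        key = event["event_id"]
        if key in seen:
            continue  # duplicate delivery: drop it
        seen.add(key)
        yield event
```

An unbounded in-memory set only works for a demo; at this volume the seen IDs would need to live in a datastore or be expired over time.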

Old Architecture

New Architecture

Decoupling

Set up the connection and Spark streams

Map each log line to a Mongo object and insert it into Mongo

Set up the connection and Spark streams

Mapping to Mongo objects and insertions
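The per-line mapping and insertion can be sketched as follows. This assumes the logs are JSON lines (the deck later lists switching from JSON to ProtoBuf as a next step) and uses hypothetical field names `event_id`, `type`, and `ts`. The insert function is injected so the sketch runs without a database; in practice it would be something like pymongo's `Collection.insert_many`.

```python
import json
from typing import Callable, Iterable, List, Optional


def to_document(line: str) -> Optional[dict]:
    """Map one JSON log line to a Mongo-style document (a plain dict).

    Field names are hypothetical. Malformed lines return None so one bad
    record does not fail the whole batch.
    """
    try:
        raw = json.loads(line)
    except json.JSONDecodeError:
        return None
    if not isinstance(raw, dict) or "event_id" not in raw:
        return None
    return {
        "event_id": raw["event_id"],
        "type": raw.get("type"),
        "ts": raw.get("ts"),
    }


def insert_partition(lines: Iterable[str],
                     insert_many: Callable[[List[dict]], object],
                     batch_size: int = 1000) -> int:
    """Convert one partition's lines and insert them in batches.

    'insert_many' stands in for a real bulk-insert call. Returns the number
    of documents handed to it.
    """
    batch: List[dict] = []
    total = 0
    for line in lines:
        doc = to_document(line)
        if doc is None:
            continue  # skip unparseable or incomplete lines
        batch.append(doc)
        if len(batch) >= batch_size:
            insert_many(batch)
            total += len(batch)
            batch = []  # start a fresh list; the old one was handed off
    if batch:
        insert_many(batch)  # flush the final partial batch
        total += len(batch)
    return total
```

In a Spark Streaming job, a function like `insert_partition` would typically run inside `foreachRDD` via `rdd.foreachPartition(...)`, so each partition opens one MongoDB connection and writes in batches instead of connecting per record; the Spark Streaming programming guide recommends this pattern for output operations.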

Questions

Streaming

Ingestion

Ingestion Table Schema

Event ID | Request | View | Install | ... | Request Added | View Added | Install Added | Value
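The slide gives only column names, so the following is one plausible reading rather than the actual implementation: one row per event ID, Request/View/Install flags recording which stages have arrived, "... Added" flags recording whether each stage has already been counted into the fact table, and Value accumulating a hypothetical per-stage amount. Keying rows on event ID is what would make redelivered events harmless.

```python
from dataclasses import dataclass, field
from typing import Dict

# Hypothetical stage names mirroring the Request / View / Install columns.
STAGES = ("request", "view", "install")


@dataclass
class IngestionRow:
    event_id: str
    # Which stages have been seen for this event.
    seen: Dict[str, bool] = field(default_factory=lambda: {s: False for s in STAGES})
    # Which stages have already been counted into the fact table.
    added: Dict[str, bool] = field(default_factory=lambda: {s: False for s in STAGES})
    value: float = 0.0  # hypothetical accumulated amount


def ingest(table: Dict[str, IngestionRow], event_id: str,
           stage: str, value: float = 0.0) -> IngestionRow:
    """Upsert one event stage into the ingestion table keyed by event ID.

    Seeing the same (event_id, stage) twice changes nothing, so duplicate
    deliveries do not inflate counts or value.
    """
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage}")
    if event_id not in table:
        table[event_id] = IngestionRow(event_id)
    row = table[event_id]
    if not row.seen[stage]:
        row.seen[stage] = True
        row.value += value
    return row
```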

Fact Table Schema

... | Date | Time | Deliveries | Views | Installs | Processed Deliveries | Processed Views | Processed Installs
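Under the same assumed reading of the ingestion table, the fact table could be fed incrementally: each stage that has been seen but not yet added is counted into its time bucket and then flagged as added, so rerunning the step never double-counts. The mapping of Request to Deliveries, the `date`/`hour` fields (assumed to live in the elided columns), and the dict row layout are all hypothetical, and the Processed columns are not modeled.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical mapping from ingestion stages to fact-table count columns.
STAGE_TO_COLUMN = {"request": "deliveries", "view": "views", "install": "installs"}


def roll_up(rows: List[dict]) -> Dict[Tuple[str, int], Dict[str, int]]:
    """Return per-(date, hour) count increments for not-yet-added stages.

    Each row is assumed to look like:
      {"date": "2015-08-06", "hour": 13,
       "seen":  {"request": True, "view": True, "install": False},
       "added": {"request": True, "view": False, "install": False}}
    A stage contributes only if it is seen and not yet added; it is then
    marked added in place, so a second run yields no new increments.
    """
    facts: Dict[Tuple[str, int], Dict[str, int]] = defaultdict(
        lambda: {col: 0 for col in STAGE_TO_COLUMN.values()})
    for row in rows:
        bucket = (row["date"], row["hour"])
        for stage, column in STAGE_TO_COLUMN.items():
            if row["seen"][stage] and not row["added"][stage]:
                facts[bucket][column] += 1
                row["added"][stage] = True  # mark as counted
    return dict(facts)
```

The returned increments would be added to the existing fact rows rather than overwriting them, which is what lets the job resume after a failure instead of rerunning from the beginning.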

Ingestion

Process

Next Steps

● Switching from JSON to ProtoBuf

● Using YARN to run multiple jobs on one cluster

● Data Science

● Who knows?

Questions

Thank you!
