Thomas Schreiter Insight
![Page 1: Thomas schreiter Insight](https://reader030.vdocuments.site/reader030/viewer/2022032620/55cf24ffbb61eb9f2c8b4665/html5/thumbnails/1.jpg)
Ingestion Comparison
Thomas Schreiter, Insight Data Engineering Fellow
![Page 2: Thomas schreiter Insight](https://reader030.vdocuments.site/reader030/viewer/2022032620/55cf24ffbb61eb9f2c8b4665/html5/thumbnails/2.jpg)
Ingestion = Message Queuing System
[Diagram: multiple producers and multiple consumers connected through the message queuing system]
![Page 3: Thomas schreiter Insight](https://reader030.vdocuments.site/reader030/viewer/2022032620/55cf24ffbb61eb9f2c8b4665/html5/thumbnails/3.jpg)
Research Question: How fast can data be produced into Kinesis/Kafka if all producers run on only one node?
[Diagram: multiple producers running on 1x m3.medium]
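A minimal sketch of how such a single-node measurement could be set up, assuming several producer threads on one machine. `CountingSink`, `producer`, and `measure_throughput` are hypothetical names, and the sink is an in-memory stand-in rather than a real Kafka or Kinesis client, so the numbers it reports say nothing about either system.

```python
import threading
import time


class CountingSink:
    """Hypothetical stand-in for a Kafka topic or Kinesis stream."""

    def __init__(self):
        self._lock = threading.Lock()
        self.count = 0

    def send(self, message):
        # A real client would perform network I/O here; this only counts.
        with self._lock:
            self.count += 1


def producer(sink, producer_id, n_messages):
    """Send n_messages numbered messages to the sink."""
    for i in range(n_messages):
        sink.send(f"Message #{i} from producer {producer_id}")


def measure_throughput(n_producers, n_messages=1000):
    """Run n_producers threads on this node; return (messages sent, msg/sec)."""
    sink = CountingSink()
    threads = [
        threading.Thread(target=producer, args=(sink, p, n_messages))
        for p in range(n_producers)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Guard against a zero interval on very coarse clocks.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return sink.count, sink.count / elapsed


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        sent, rate = measure_throughput(n)
        print(f"{n} producers: {sent} messages, {rate:.0f} msg/sec")
```

Swapping `CountingSink.send` for a real client call would turn this into an actual benchmark; the thread-per-producer layout is one possible design, not necessarily the one used in the project.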
![Page 5: Thomas schreiter Insight](https://reader030.vdocuments.site/reader030/viewer/2022032620/55cf24ffbb61eb9f2c8b4665/html5/thumbnails/5.jpg)
Throughput over #producers
![Page 6: Thomas schreiter Insight](https://reader030.vdocuments.site/reader030/viewer/2022032620/55cf24ffbb61eb9f2c8b4665/html5/thumbnails/6.jpg)
Throughput over Bulk Size
[Figure: throughput [msg/sec] (axis 0 to 35000) over bulk size [msg] (1, 2, 5, 10, 20, 50, 100, 200, 500); series: Kafka, Kinesis]
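The chart above plots throughput against bulk size, that is, the number of messages sent per request. A minimal sketch of bulk sending follows, assuming a hypothetical `send_batch` callable in place of a real Kafka or Kinesis client call; `chunks` and `send_in_bulk` are illustrative names as well. One plausible reading, not stated on the slide, is that larger bulks reduce the number of requests and so spread the per-request overhead over more messages.

```python
def chunks(messages, bulk_size):
    """Yield consecutive lists of at most bulk_size messages."""
    if bulk_size < 1:
        raise ValueError("bulk_size must be at least 1")
    batch = []
    for message in messages:
        batch.append(message)
        if len(batch) == bulk_size:
            yield batch
            batch = []
    if batch:
        # Final, possibly smaller, batch.
        yield batch


def send_in_bulk(messages, bulk_size, send_batch):
    """Send messages in bulks; return the number of requests issued."""
    requests = 0
    for batch in chunks(messages, bulk_size):
        send_batch(batch)  # hypothetical: one network request per batch
        requests += 1
    return requests


if __name__ == "__main__":
    messages = [f"Message #{i}" for i in range(1000)]
    for bulk_size in (1, 10, 100, 500):
        sent = []
        n_requests = send_in_bulk(messages, bulk_size, sent.append)
        print(f"bulk size {bulk_size}: {n_requests} requests")
```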
![Page 7: Thomas schreiter Insight](https://reader030.vdocuments.site/reader030/viewer/2022032620/55cf24ffbb61eb9f2c8b4665/html5/thumbnails/7.jpg)
[Diagram: Producer.py processes on 1x m3.medium write to Kafka (4x m3.large); Producer.py processes on 1x m3.medium write to Kinesis (1 stream)]
“Message #0 to Kafka @ 12:39:04.300” “Message #1 to Kafka @ 12:39:04.310” …
“Message #0 to Kinesis @ 13:00:05.700” “Message #1 to Kinesis @ 13:00:05.702” …
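Log lines in the format shown above carry enough information to derive a message rate. The following is a sketch only: the regular expression and the name `throughput_per_system` are assumptions, and the slides do not say how the throughput figures were actually computed. The rate is taken as the number of intervals between messages divided by the time between the first and last timestamp.

```python
import re
from collections import defaultdict
from datetime import datetime

# Matches e.g. "Message #0 to Kafka @ 12:39:04.300"
LOG_PATTERN = re.compile(r"Message #(\d+) to (\w+) @ (\d{2}:\d{2}:\d{2}\.\d+)")


def throughput_per_system(lines):
    """Return {system: messages per second} from timestamped log lines."""
    stamps_by_system = defaultdict(list)
    for line in lines:
        match = LOG_PATTERN.search(line)
        if match is None:
            continue  # skip lines that are not producer log messages
        _, system, stamp = match.groups()
        stamps_by_system[system].append(datetime.strptime(stamp, "%H:%M:%S.%f"))

    rates = {}
    for system, stamps in stamps_by_system.items():
        span = (max(stamps) - min(stamps)).total_seconds()
        if len(stamps) >= 2 and span > 0:
            # n messages span n - 1 intervals.
            rates[system] = (len(stamps) - 1) / span
    return rates


if __name__ == "__main__":
    sample = [
        "Message #0 to Kafka @ 12:39:04.300",
        "Message #1 to Kafka @ 12:39:04.310",
        "Message #0 to Kinesis @ 13:00:05.700",
        "Message #1 to Kinesis @ 13:00:05.702",
    ]
    print(throughput_per_system(sample))
```

The sample lines are the illustrative messages from the slide, not measurements, so the rates printed for them should not be read as results.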
![Page 8: Thomas schreiter Insight](https://reader030.vdocuments.site/reader030/viewer/2022032620/55cf24ffbb61eb9f2c8b4665/html5/thumbnails/8.jpg)
[Diagram: Producer.py processes on 1x m3.medium write to Kafka (4x m3.large) and to Kinesis (1 stream); a logger and a metrics component are added]
“Message #0 to Kafka @ 12:39:04.300” “Message #1 to Kafka @ 12:39:04.310” …
“Message #0 to Kinesis @ 13:00:05.700” “Message #1 to Kinesis @ 13:00:05.702” …
![Page 9: Thomas schreiter Insight](https://reader030.vdocuments.site/reader030/viewer/2022032620/55cf24ffbb61eb9f2c8b4665/html5/thumbnails/9.jpg)
[Diagram: Producer.py processes on 1x m3.medium write to Kafka (4x m3.large) and to Kinesis (1 stream); logger and metrics components; further node labels: 1x m3.medium, 1x t2.micro]
“Message #0 to Kafka @ 12:39:04.300” “Message #1 to Kafka @ 12:39:04.310” …
“Message #0 to Kinesis @ 13:00:05.700” “Message #1 to Kinesis @ 13:00:05.702” …
![Page 10: Thomas schreiter Insight](https://reader030.vdocuments.site/reader030/viewer/2022032620/55cf24ffbb61eb9f2c8b4665/html5/thumbnails/10.jpg)
Engineering Challenges
Install scripts: tried to automate everything ☺
![Page 11: Thomas schreiter Insight](https://reader030.vdocuments.site/reader030/viewer/2022032620/55cf24ffbb61eb9f2c8b4665/html5/thumbnails/11.jpg)
Engineering Challenges
Install scripts: tried to automate everything ☺
broke Kafka installation in Week 2 ☹
![Page 12: Thomas schreiter Insight](https://reader030.vdocuments.site/reader030/viewer/2022032620/55cf24ffbb61eb9f2c8b4665/html5/thumbnails/12.jpg)
Engineering Challenges
Install scripts: tried to automate everything ☺
broke Kafka installation in Week 2 ☹
and again in Week 4 ☹ ☹ ☹
![Page 13: Thomas schreiter Insight](https://reader030.vdocuments.site/reader030/viewer/2022032620/55cf24ffbb61eb9f2c8b4665/html5/thumbnails/13.jpg)
but Engineering puzzles are really fun
☺☺☺
![Page 14: Thomas schreiter Insight](https://reader030.vdocuments.site/reader030/viewer/2022032620/55cf24ffbb61eb9f2c8b4665/html5/thumbnails/14.jpg)
And I read Kafka for the first time
![Page 15: Thomas schreiter Insight](https://reader030.vdocuments.site/reader030/viewer/2022032620/55cf24ffbb61eb9f2c8b4665/html5/thumbnails/15.jpg)
Thomas Schreiter [[email protected]]
M.Sc. + B.Sc. in Computer Science @Karlsruhe Institute of Technology, Germany
Ph.D. in Transportation @Delft University of Technology, The Netherlands
Before Insight: Research Engineer in Transportation @UC Berkeley
![Page 16: Thomas schreiter Insight](https://reader030.vdocuments.site/reader030/viewer/2022032620/55cf24ffbb61eb9f2c8b4665/html5/thumbnails/16.jpg)
![Page 17: Thomas schreiter Insight](https://reader030.vdocuments.site/reader030/viewer/2022032620/55cf24ffbb61eb9f2c8b4665/html5/thumbnails/17.jpg)
AWS Costs
![Page 18: Thomas schreiter Insight](https://reader030.vdocuments.site/reader030/viewer/2022032620/55cf24ffbb61eb9f2c8b4665/html5/thumbnails/18.jpg)
![Page 19: Thomas schreiter Insight](https://reader030.vdocuments.site/reader030/viewer/2022032620/55cf24ffbb61eb9f2c8b4665/html5/thumbnails/19.jpg)
Throughput over #partitions
[Figure: throughput [#msg/sec] (axis 0 to 1200) for 1, 2, 3, and 4 partitions; series: Kafka, Kinesis]
![Page 20: Thomas schreiter Insight](https://reader030.vdocuments.site/reader030/viewer/2022032620/55cf24ffbb61eb9f2c8b4665/html5/thumbnails/20.jpg)
Older results
[Figure: throughput [#msg/sec] (axis 0 to 2000) for 1, 2, 3, and 4 partitions; series: Kafka, Kinesis]