![Page 1: Jay Kreps, Neha Narkhede, Jun Rao LinkedIn · 2019-07-17 · Kafka: a Distributed Messaging System for Log Processing Jay Kreps, Neha Narkhede, Jun Rao LinkedIn](https://reader035.vdocuments.site/reader035/viewer/2022070919/5fb885583687392f216d56e9/html5/thumbnails/1.jpg)
Kafka: a Distributed Messaging System for Log Processing
Jay Kreps, Neha Narkhede, Jun Rao
LinkedIn
AGENDA
• Kafka usage at LinkedIn
• Kafka design
• Kafka roadmap
ABOUT LINKEDIN
• Professional social network platform
• Among the 50 largest sites in the world by traffic
• 100M+ members
LOGGING OVERVIEW
• Many types of events
• user activity events: impression, search, ads, etc.
• operational events: call stack, service metrics, etc.
• High volume: billions of events per day
• Both online and offline use cases
• offline: reporting, batch analysis
• online: security, news feeds, performance dashboards, ...
DEPLOYMENT
[Diagram. Main site: Frontend (x3), VIP, Kafka (x3), Realtime service (x2). Analysis site: Kafka (x3), Hadoop. Also shown: Oracle, Aster Data.]
KAFKA DESIGN PRINCIPLES
• Simple API
• Efficient
• Distributed
PRODUCER API
void send(String topic, ByteBufferMessageSet messages)
producer = new KafkaProducer(…);
message = new Message("test message str".getBytes());
set = new ByteBufferMessageSet(message);
producer.send("test", set);
CONSUMER API
streams[] = Consumer.createMessageStreams("test", 1)
for (message : streams[0]) {
    bytes = message.payload()
    // do something with bytes
}
EFFICIENCY #1: SIMPLE STORAGE
• Each topic has an ever-growing log
• A log == a list of files
• A message is addressed by a log offset
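The storage model above can be illustrated with a minimal sketch. It is not Kafka's implementation: the class `SegmentedLog`, its method names, the 4-byte length prefix, and the use of in-memory buffers in place of files are all assumptions made for illustration. It shows one way a log can be a list of segments while each message is still addressed by a single log offset (here, the byte position at which the message starts).

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory stand-in for a log stored as a list of segment files. */
final class SegmentedLog {
    private final int maxSegmentBytes;
    // Base offset of each segment -> segment contents (each buffer stands in for one file).
    private final TreeMap<Long, ByteArrayOutputStream> segments = new TreeMap<>();
    private long nextOffset = 0;

    SegmentedLog(int maxSegmentBytes) {
        this.maxSegmentBytes = maxSegmentBytes;
        segments.put(0L, new ByteArrayOutputStream());
    }

    /** Appends a length-prefixed message and returns the log offset where it starts. */
    long append(byte[] payload) {
        ByteArrayOutputStream active = segments.lastEntry().getValue();
        // Roll to a new segment when the active one would exceed the size limit.
        if (active.size() > 0 && active.size() + 4 + payload.length > maxSegmentBytes) {
            active = new ByteArrayOutputStream();
            segments.put(nextOffset, active);
        }
        long offset = nextOffset;
        byte[] length = ByteBuffer.allocate(4).putInt(payload.length).array();
        active.write(length, 0, 4);
        active.write(payload, 0, payload.length);
        nextOffset += 4 + payload.length;
        return offset;
    }

    /** Reads the message that starts at the given log offset. */
    byte[] read(long offset) {
        // The segment holding the offset is the one with the greatest base offset <= offset.
        Map.Entry<Long, ByteArrayOutputStream> entry = segments.floorEntry(offset);
        byte[] data = entry.getValue().toByteArray();
        int pos = (int) (offset - entry.getKey());
        int length = ByteBuffer.wrap(data, pos, 4).getInt();
        return Arrays.copyOfRange(data, pos + 4, pos + 4 + length);
    }

    int segmentCount() {
        return segments.size();
    }
}
```

Because segments are keyed by base offset, locating a message needs only a sorted lookup followed by a read at a known position; no per-message index is required in this sketch.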
EFFICIENCY #2: CAREFUL TRANSFER
• Batch send and fetch
• No message caching in Kafka layer
• Rely on file system page cache
• mostly sequential access patterns
• Zero-copy transfer: file -> socket
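On the JVM, a file-to-socket transfer of this kind is typically expressed with `FileChannel.transferTo`, which lets the operating system move bytes from the page cache to the destination without copying them through an application buffer when the platform supports it. The following is a minimal sketch; the class `ZeroCopySend` and the method `sendRange` are hypothetical names, not part of Kafka.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper that sends a byte range of a log file to a channel. */
final class ZeroCopySend {
    /**
     * Transfers up to count bytes of the file, starting at position, to the target
     * channel (for example a SocketChannel). Returns the number of bytes sent.
     */
    static long sendRange(Path file, long position, long count, WritableByteChannel target)
            throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            long sent = 0;
            while (sent < count) {
                // transferTo may move fewer bytes than requested, so loop until done.
                long n = channel.transferTo(position + sent, count - sent, target);
                if (n <= 0) {
                    break; // nothing more was transferred (e.g. end of file reached)
                }
                sent += n;
            }
            return sent;
        }
    }
}
```

Whether the transfer is truly zero-copy depends on the operating system and on the target being a socket channel; with other targets the call still works but may fall back to ordinary copying.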
EFFICIENCY #3: STATELESS BROKER
• Each consumer maintains its own state
• Message deletion driven by retention policy, not by tracking consumption
• acceptable in practice
• rewindable consumer
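A minimal sketch of this division of responsibility follows. All names (`StatelessBrokerSketch`, `Broker`, `Consumer`, `expire`, `rewind`) are hypothetical, offsets are simplified to message indices, and the time-based retention rule is an assumption. The point is that the broker never records who has read what, so a consumer can rewind simply by resetting its own offset.

```java
import java.util.ArrayList;
import java.util.List;

final class StatelessBrokerSketch {
    /** Broker side: stores messages and keeps no per-consumer state. */
    static final class Broker {
        private final List<String> messages = new ArrayList<>();
        private final List<Long> timestamps = new ArrayList<>();
        private long oldestOffset = 0; // offset of the first retained message

        void append(String message, long nowMillis) {
            messages.add(message);
            timestamps.add(nowMillis);
        }

        /** Drops messages older than retentionMillis, whether or not anyone consumed them. */
        void expire(long nowMillis, long retentionMillis) {
            while (!timestamps.isEmpty() && nowMillis - timestamps.get(0) > retentionMillis) {
                messages.remove(0);
                timestamps.remove(0);
                oldestOffset++;
            }
        }

        long oldestOffset() {
            return oldestOffset;
        }

        /** Returns every retained message at or after the given offset. */
        List<String> fetch(long offset) {
            int from = (int) Math.max(0, offset - oldestOffset);
            from = Math.min(from, messages.size());
            return new ArrayList<>(messages.subList(from, messages.size()));
        }
    }

    /** Consumer side: owns its position in the log. */
    static final class Consumer {
        private long offset = 0;

        List<String> poll(Broker broker) {
            // Skip ahead if the requested data has already been deleted by retention.
            long start = Math.max(offset, broker.oldestOffset());
            List<String> batch = broker.fetch(start);
            offset = start + batch.size();
            return batch;
        }

        /** Rewinding is purely local: the broker is not informed. */
        void rewind(long toOffset) {
            offset = toOffset;
        }
    }
}
```

In this sketch a consumer that rewinds past the retention boundary simply resumes from the oldest message still retained.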
AUTO CONSUMER LOAD BALANCING
• brokers and consumers register in ZooKeeper
• consumers listen to broker and consumer changes
• each change triggers consumer rebalancing
[Diagram: four brokers, two consumers, and ZooKeeper.]
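The slide does not say how partitions are divided during a rebalance. One common approach, sketched below under that assumption, is for every consumer to compute its own share independently: sort the partitions and the consumer IDs, then take a contiguous range determined by the consumer's position in the sorted list. As long as all consumers see the same membership, they reach the same assignment without talking to each other. The class `RangeRebalance`, the method `assign`, and the string IDs are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical range-style assignment that each consumer computes for itself. */
final class RangeRebalance {
    /** Returns the partitions owned by consumer "me" given the current membership. */
    static List<String> assign(List<String> partitions, List<String> consumers, String me) {
        List<String> parts = new ArrayList<>(partitions);
        List<String> members = new ArrayList<>(consumers);
        // Sorting gives every consumer the same view of the ordering.
        Collections.sort(parts);
        Collections.sort(members);

        int index = members.indexOf(me);
        if (index < 0) {
            return new ArrayList<>(); // not a registered member: owns nothing
        }
        int base = parts.size() / members.size();
        int extra = parts.size() % members.size();
        // The first "extra" consumers take one additional partition each.
        int start = index * base + Math.min(index, extra);
        int count = base + (index < extra ? 1 : 0);
        return new ArrayList<>(parts.subList(start, start + count));
    }
}
```

Each consumer would rerun `assign` with the updated lists whenever it is notified of a broker or consumer change.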
PRODUCER PERFORMANCE
CONSUMER PERFORMANCE
ROADMAP
• New Kafka features
• compression
• replication
• stream processing (online M/R)
• http://sna-projects.com/kafka/