![Page 1: Jitney, Kafka at Airbnb](https://reader034.vdocuments.site/reader034/viewer/2022052122/589c56dc1a28abc4358b4d91/html5/thumbnails/1.jpg)
Jitney, Kafka at Airbnb
ALEXIS MIDON & KRISHNA PUTTASWAMY / 2016-02-23 / KAFKA MEETUP
![Page 2: Jitney, Kafka at Airbnb](https://reader034.vdocuments.site/reader034/viewer/2022052122/589c56dc1a28abc4358b4d91/html5/thumbnails/2.jpg)
Jitney?!
a bus carrying passengers for a low fare
![Page 3: Jitney, Kafka at Airbnb](https://reader034.vdocuments.site/reader034/viewer/2022052122/589c56dc1a28abc4358b4d91/html5/thumbnails/3.jpg)
Some Kafka Facts
• 1 production cluster
• v0.8.2
• 90 “small” brokers, d2.2xlarge
• 70 topics
• Replication Factor of 3
• 5 billion events / day
• IN: 80MB / second
• OUT: 1.5GB / second
• Network bound
• Super stable
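A rough back-of-the-envelope reading of these figures can be sketched as follows. It assumes the rates are sustained averages and that MB and GB are decimal units; the slide states neither, and the class and method names are purely illustrative.

```java
// Back-of-the-envelope arithmetic on the cluster figures above.
// Assumptions (not stated on the slide): the rates are sustained averages
// and MB/GB are decimal units (10^6 and 10^9 bytes).
final class ClusterMath {
    static final double SECONDS_PER_DAY = 24 * 60 * 60;

    // Average event rate implied by a daily event count.
    static double eventsPerSecond(double eventsPerDay) {
        return eventsPerDay / SECONDS_PER_DAY;
    }

    // Average inbound bytes per event.
    static double averageEventBytes(double bytesInPerSecond, double eventsPerDay) {
        return bytesInPerSecond / eventsPerSecond(eventsPerDay);
    }

    // How many bytes leave the cluster for every byte that enters it.
    static double fanOut(double bytesOutPerSecond, double bytesInPerSecond) {
        return bytesOutPerSecond / bytesInPerSecond;
    }
}

final class ClusterMathDemo {
    public static void main(String[] args) {
        double eventsPerDay = 5e9;   // 5 billion events / day
        double in = 80e6;            // IN: 80MB / second
        double out = 1.5e9;          // OUT: 1.5GB / second
        System.out.printf("events/s: %.0f%n", ClusterMath.eventsPerSecond(eventsPerDay));
        System.out.printf("bytes/event: %.1f%n", ClusterMath.averageEventBytes(in, eventsPerDay));
        System.out.printf("fan-out: %.2f%n", ClusterMath.fanOut(out, in));
    }
}
```

Under those assumptions this works out to roughly 58,000 events per second, about 1.4 KB per event, and close to 19 bytes read for every byte written, which fits the "network bound" remark. Whether the OUT figure includes replication traffic is not stated.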
![Page 4: Jitney, Kafka at Airbnb](https://reader034.vdocuments.site/reader034/viewer/2022052122/589c56dc1a28abc4358b4d91/html5/thumbnails/4.jpg)
Why Jitney?
![Page 5: Jitney, Kafka at Airbnb](https://reader034.vdocuments.site/reader034/viewer/2022052122/589c56dc1a28abc4358b4d91/html5/thumbnails/5.jpg)
Pick any metric
![Page 6: Jitney, Kafka at Airbnb](https://reader034.vdocuments.site/reader034/viewer/2022052122/589c56dc1a28abc4358b4d91/html5/thumbnails/6.jpg)
Standardization!
![Page 7: Jitney, Kafka at Airbnb](https://reader034.vdocuments.site/reader034/viewer/2022052122/589c56dc1a28abc4358b4d91/html5/thumbnails/7.jpg)
What Use Cases?
![Page 8: Jitney, Kafka at Airbnb](https://reader034.vdocuments.site/reader034/viewer/2022052122/589c56dc1a28abc4358b4d91/html5/thumbnails/8.jpg)
Classic Message Bus
[Diagram labels: Jitney, MySQL, Monorail, MySQL, Elasticsearch, Jitney Client & Schemas]
![Page 9: Jitney, Kafka at Airbnb](https://reader034.vdocuments.site/reader034/viewer/2022052122/589c56dc1a28abc4358b4d91/html5/thumbnails/9.jpg)
Message Bus
• Decouple services
• Standard events
• At-least-once delivery
• Standard clients, for Java and Ruby
• Easy to use
• Conventions over configuration
![Page 10: Jitney, Kafka at Airbnb](https://reader034.vdocuments.site/reader034/viewer/2022052122/589c56dc1a28abc4358b4d91/html5/thumbnails/10.jpg)
User Activity Logging
• Site and image load times, OOM events
• Searches, requests, bookings, etc.
• Experiment assignments
• Event data is critical for building data products
• Data ingestion should be reliable: timely and complete
![Page 11: Jitney, Kafka at Airbnb](https://reader034.vdocuments.site/reader034/viewer/2022052122/589c56dc1a28abc4358b4d91/html5/thumbnails/11.jpg)
Challenges
• JSON events without schemas
• Easy to break events during evolution/code changes
• One topic overall for 800+ event types
• Improper producer configs
• Lack of monitoring
• Led to:
  • Too many data outages and data loss incidents
  • Lack of trust in data systems
![Page 12: Jitney, Kafka at Airbnb](https://reader034.vdocuments.site/reader034/viewer/2022052122/589c56dc1a28abc4358b4d91/html5/thumbnails/12.jpg)
Data Stability
CEO dashboard and Magical booking dashboards were regularly broken.
A Year Ago
![Page 13: Jitney, Kafka at Airbnb](https://reader034.vdocuments.site/reader034/viewer/2022052122/589c56dc1a28abc4358b4d91/html5/thumbnails/13.jpg)
Data Stability
ERF was unstable and experimentation culture was weak
Hi team,
This is partly a PSA to let you know ERF dashboard data hasn't been up to date/accurate for several weeks now. Do not rely on the ERF dashboard for information about your experiment.
A Year Ago
![Page 14: Jitney, Kafka at Airbnb](https://reader034.vdocuments.site/reader034/viewer/2022052122/589c56dc1a28abc4358b4d91/html5/thumbnails/14.jpg)
Join Forces!
Data Infrastructure & Production Infrastructure
![Page 15: Jitney, Kafka at Airbnb](https://reader034.vdocuments.site/reader034/viewer/2022052122/589c56dc1a28abc4358b4d91/html5/thumbnails/15.jpg)
Jitney Components
![Page 16: Jitney, Kafka at Airbnb](https://reader034.vdocuments.site/reader034/viewer/2022052122/589c56dc1a28abc4358b4d91/html5/thumbnails/16.jpg)
Jitney Components
Schema Repository
![Page 17: Jitney, Kafka at Airbnb](https://reader034.vdocuments.site/reader034/viewer/2022052122/589c56dc1a28abc4358b4d91/html5/thumbnails/17.jpg)
Thrift Schema Repository
Why Thrift?
• Easy syntax
• Good performance in Ruby
• Ubiquitous
Advantages of schema repo?
• Great Catalyst for communication, documentation, etc
• It ships JARs and gems
• Will developers hate you for this? No
![Page 18: Jitney, Kafka at Airbnb](https://reader034.vdocuments.site/reader034/viewer/2022052122/589c56dc1a28abc4358b4d91/html5/thumbnails/18.jpg)
Schema Evolution
• Standard field in the event schema
• Managed explicitly
• Use semantic versioning:
1.0.0 = MODEL . REVISION . ADDITION
MODEL is a change which breaks the rules of backward compatibility.
Example: changing the type of a field.
REVISION is a change which is backward compatible but not forward compatible.
Example: adding a new field to a union type.
ADDITION is a change which is both backward compatible and forward compatible.
Example: adding a new optional field.
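The MODEL . REVISION . ADDITION rules can be turned into a mechanical compatibility check. The sketch below is one possible reading, not Jitney's actual code: `SchemaVersion` and `canRead` are hypothetical names, and "reader" means code compiled against one schema version decoding an event written with another.

```java
// Hypothetical helper illustrating the MODEL.REVISION.ADDITION rules.
// None of these names come from Jitney itself.
final class SchemaVersion {
    final int model;     // bumped when backward compatibility is broken
    final int revision;  // bumped for backward-compatible, not forward-compatible changes
    final int addition;  // bumped for changes compatible in both directions

    SchemaVersion(int model, int revision, int addition) {
        this.model = model;
        this.revision = revision;
        this.addition = addition;
    }

    static SchemaVersion parse(String text) {
        String[] parts = text.trim().split("\\.");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected MODEL.REVISION.ADDITION, got: " + text);
        }
        return new SchemaVersion(
                Integer.parseInt(parts[0]),
                Integer.parseInt(parts[1]),
                Integer.parseInt(parts[2]));
    }

    /** True if code built against this version can decode data written with {@code writer}. */
    boolean canRead(SchemaVersion writer) {
        if (model != writer.model) {
            return false; // a MODEL change breaks compatibility in both directions
        }
        // A newer REVISION on the writer side is not forward compatible for an older reader.
        // ADDITION never affects compatibility, so it is not compared.
        return revision >= writer.revision;
    }
}

final class SchemaVersionDemo {
    public static void main(String[] args) {
        SchemaVersion reader = SchemaVersion.parse("1.2.0");
        System.out.println(reader.canRead(SchemaVersion.parse("1.1.7"))); // true: older revision
        System.out.println(reader.canRead(SchemaVersion.parse("1.2.9"))); // true: only additions differ
        System.out.println(reader.canRead(SchemaVersion.parse("1.3.0"))); // false: newer revision
        System.out.println(reader.canRead(SchemaVersion.parse("2.0.0"))); // false: different model
    }
}
```

A check like this could run in a consumer before deserializing, so an incompatible event fails loudly instead of being silently misread.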
![Page 19: Jitney, Kafka at Airbnb](https://reader034.vdocuments.site/reader034/viewer/2022052122/589c56dc1a28abc4358b4d91/html5/thumbnails/19.jpg)
Example of Thrift Event
because the event is your API
![Page 20: Jitney, Kafka at Airbnb](https://reader034.vdocuments.site/reader034/viewer/2022052122/589c56dc1a28abc4358b4d91/html5/thumbnails/20.jpg)
Jitney Components
• Schema Repository
• Topic Repository
![Page 21: Jitney, Kafka at Airbnb](https://reader034.vdocuments.site/reader034/viewer/2022052122/589c56dc1a28abc4358b4d91/html5/thumbnails/21.jpg)
Topic Repository
• Declare all Jitney topics
• Aggregate all characteristics of a topic:
name
ordering (partitioning function)
whitelist of accepted schemas
• Great for documentation purposes
• DRY
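The slide's own topic example did not survive extraction. As a stand-in, here is a minimal sketch of what a declared topic could carry, based only on the three characteristics listed above. `TopicDefinition`, `ListingEvent`, the topic name, and the schema names are all hypothetical, and the hash-based partitioning is a simplification that does not mirror Kafka's own partitioner.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Function;

// Hypothetical topic declaration: a name, an ordering (partitioning) function,
// and a whitelist of accepted schemas. Not Jitney's real format.
final class TopicDefinition<E> {
    final String name;
    final Function<E, String> orderingKey;   // events with equal keys stay in order
    final Set<String> acceptedSchemas;

    TopicDefinition(String name, Function<E, String> orderingKey, String... acceptedSchemas) {
        this.name = name;
        this.orderingKey = orderingKey;
        this.acceptedSchemas =
                Collections.unmodifiableSet(new HashSet<>(Arrays.asList(acceptedSchemas)));
    }

    // Producers can reject events whose schema is not whitelisted for this topic.
    boolean accepts(String schemaName) {
        return acceptedSchemas.contains(schemaName);
    }

    // Simplified partitioning: same ordering key, same partition.
    int partitionFor(E event, int partitionCount) {
        return Math.floorMod(orderingKey.apply(event).hashCode(), partitionCount);
    }
}

// Hypothetical event type used only for this illustration.
final class ListingEvent {
    final long listingId;
    final String schemaName;

    ListingEvent(long listingId, String schemaName) {
        this.listingId = listingId;
        this.schemaName = schemaName;
    }
}

final class TopicDefinitionDemo {
    public static void main(String[] args) {
        TopicDefinition<ListingEvent> topic = new TopicDefinition<>(
                "listing_changes",
                event -> Long.toString(event.listingId),
                "ListingCreated", "ListingUpdated");

        ListingEvent event = new ListingEvent(42L, "ListingUpdated");
        System.out.println(topic.accepts(event.schemaName));
        System.out.println(topic.partitionFor(event, 12));
    }
}
```

Keeping the name, ordering, and whitelist in one declaration is what lets producers and consumers share it instead of repeating the same settings, which is the DRY point above.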
![Page 22: Jitney, Kafka at Airbnb](https://reader034.vdocuments.site/reader034/viewer/2022052122/589c56dc1a28abc4358b4d91/html5/thumbnails/22.jpg)
Example of a Topic
![Page 23: Jitney, Kafka at Airbnb](https://reader034.vdocuments.site/reader034/viewer/2022052122/589c56dc1a28abc4358b4d91/html5/thumbnails/23.jpg)
Jitney Components
• Schema Repository
• Topic Repository
• Clients
![Page 24: Jitney, Kafka at Airbnb](https://reader034.vdocuments.site/reader034/viewer/2022052122/589c56dc1a28abc4358b4d91/html5/thumbnails/24.jpg)
Jitney Clients
• Kafka clients are hard to use correctly
  • It’s better with 0.9
• Committing offsets is tricky, someone will get it wrong
  • Even with 0.9
• Configuration is a mess
![Page 25: Jitney, Kafka at Airbnb](https://reader034.vdocuments.site/reader034/viewer/2022052122/589c56dc1a28abc4358b4d91/html5/thumbnails/25.jpg)
Jitney Clients
It provides:
• metrics reporting: github.com/airbnb/kafka-statsd-metrics2
• configuration for default clusters
• built-in support for Schema Repository and Topic Repository
Consumer:
• offset management to implement at-least-once delivery
• polymorphic dispatching to event handler
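The two consumer features can be illustrated without a real Kafka client. The sketch below is an in-memory model, not the Jitney client: a `List` stands in for one partition, `EventDispatcher` routes each event to a handler registered for its class, and the offset is committed only after the handler returns. All class names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Routes each event to the handler registered for its concrete class.
final class EventDispatcher {
    private final Map<Class<?>, Consumer<Object>> handlers = new HashMap<>();

    <T> void register(Class<T> type, Consumer<T> handler) {
        handlers.put(type, event -> handler.accept(type.cast(event)));
    }

    void dispatch(Object event) {
        Consumer<Object> handler = handlers.get(event.getClass());
        if (handler == null) {
            throw new IllegalStateException("no handler for " + event.getClass().getName());
        }
        handler.accept(event);
    }
}

// In-memory stand-in for a consumer reading one partition.
final class AtLeastOnceConsumer {
    private final List<Object> log;            // stands in for a Kafka partition
    private final EventDispatcher dispatcher;
    private int committedOffset = 0;           // where processing resumes

    AtLeastOnceConsumer(List<Object> log, EventDispatcher dispatcher) {
        this.log = log;
        this.dispatcher = dispatcher;
    }

    /** Processes events from the committed offset; returns false if a handler failed. */
    boolean poll() {
        for (int offset = committedOffset; offset < log.size(); offset++) {
            try {
                dispatcher.dispatch(log.get(offset));
            } catch (RuntimeException e) {
                return false; // offset not committed: the event is redelivered on the next poll
            }
            committedOffset = offset + 1; // commit only after the handler returned
        }
        return true;
    }

    int committedOffset() {
        return committedOffset;
    }
}

// Hypothetical event types for the illustration.
final class BookingCreated {
    final long bookingId;
    BookingCreated(long bookingId) { this.bookingId = bookingId; }
}

final class SearchPerformed {
    final String query;
    SearchPerformed(String query) { this.query = query; }
}

final class AtLeastOnceDemo {
    public static void main(String[] args) {
        EventDispatcher dispatcher = new EventDispatcher();
        dispatcher.register(BookingCreated.class, e -> System.out.println("booking " + e.bookingId));
        dispatcher.register(SearchPerformed.class, e -> System.out.println("search " + e.query));

        List<Object> log = new ArrayList<>();
        log.add(new BookingCreated(1L));
        log.add(new SearchPerformed("paris"));

        AtLeastOnceConsumer consumer = new AtLeastOnceConsumer(log, dispatcher);
        consumer.poll();
        System.out.println("committed offset: " + consumer.committedOffset());
    }
}
```

Because the commit happens after processing, a failure between the two means the same event is handled again on retry. Handlers therefore need to tolerate duplicates, which is the price of at-least-once delivery.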
![Page 26: Jitney, Kafka at Airbnb](https://reader034.vdocuments.site/reader034/viewer/2022052122/589c56dc1a28abc4358b4d91/html5/thumbnails/26.jpg)
Example of a Java Producer
![Page 27: Jitney, Kafka at Airbnb](https://reader034.vdocuments.site/reader034/viewer/2022052122/589c56dc1a28abc4358b4d91/html5/thumbnails/27.jpg)
Example of a Java Consumer
![Page 28: Jitney, Kafka at Airbnb](https://reader034.vdocuments.site/reader034/viewer/2022052122/589c56dc1a28abc4358b4d91/html5/thumbnails/28.jpg)
Jitney Components
• Schema Repository
• Topic Repository
• Clients
• HTTP Proxy
![Page 29: Jitney, Kafka at Airbnb](https://reader034.vdocuments.site/reader034/viewer/2022052122/589c56dc1a28abc4358b4d91/html5/thumbnails/29.jpg)
Jitney Components
• Schema Repository
• Topic Repository
• Clients
• HTTP Proxy
• Warehouse Integration
![Page 30: Jitney, Kafka at Airbnb](https://reader034.vdocuments.site/reader034/viewer/2022052122/589c56dc1a28abc4358b4d91/html5/thumbnails/30.jpg)
Data Ingestion Pipeline
• Stack: Jitney, Spark Streaming, HBase, HDFS
• Spark Streaming 1.5 with Kafka “direct” connect
• Process 1 minute batches
• Write to HBase after deserializing with the right schema
• Dump data to HDFS every hour (with dedup) and add a Hive partition
• But live data can be queried via “current” partition
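The hourly dump with dedup can be sketched in plain Java, leaving Spark, HBase, and HDFS out entirely. The sketch assumes each event carries a unique id that identifies redelivered copies, and it uses a `ds=.../hr=...` partition naming scheme. Both are assumptions for illustration, not details given in the talk.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical event shape: a unique id (assumed, used for dedup) and a timestamp.
final class LoggedEvent {
    final String eventId;
    final Instant timestamp;

    LoggedEvent(String eventId, Instant timestamp) {
        this.eventId = eventId;
        this.timestamp = timestamp;
    }
}

final class HourlyDump {
    // Assumed partition naming; the real Hive layout is not given in the talk.
    private static final DateTimeFormatter PARTITION =
            DateTimeFormatter.ofPattern("'ds='yyyy-MM-dd'/hr='HH").withZone(ZoneOffset.UTC);

    /** Drops duplicate event ids, then groups the remaining events by hourly partition. */
    static Map<String, List<LoggedEvent>> partition(List<LoggedEvent> events) {
        Set<String> seen = new HashSet<>();
        Map<String, List<LoggedEvent>> byHour = new LinkedHashMap<>();
        for (LoggedEvent event : events) {
            if (!seen.add(event.eventId)) {
                continue; // duplicate produced by at-least-once delivery
            }
            String key = PARTITION.format(event.timestamp);
            byHour.computeIfAbsent(key, k -> new ArrayList<>()).add(event);
        }
        return byHour;
    }
}

final class HourlyDumpDemo {
    public static void main(String[] args) {
        List<LoggedEvent> events = new ArrayList<>();
        events.add(new LoggedEvent("a", Instant.parse("2016-02-23T10:15:00Z")));
        events.add(new LoggedEvent("a", Instant.parse("2016-02-23T10:15:00Z"))); // redelivered copy
        events.add(new LoggedEvent("b", Instant.parse("2016-02-23T11:00:00Z")));

        HourlyDump.partition(events).forEach(
                (partition, rows) -> System.out.println(partition + " -> " + rows.size()));
    }
}
```

The dedup step matters because at-least-once delivery upstream can hand the same event to the pipeline more than once.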
![Page 31: Jitney, Kafka at Airbnb](https://reader034.vdocuments.site/reader034/viewer/2022052122/589c56dc1a28abc4358b4d91/html5/thumbnails/31.jpg)
Data Ingestion Pipeline, end to end
![Page 32: Jitney, Kafka at Airbnb](https://reader034.vdocuments.site/reader034/viewer/2022052122/589c56dc1a28abc4358b4d91/html5/thumbnails/32.jpg)
Audit
[Diagram labels: 124, 124, 3]
![Page 33: Jitney, Kafka at Airbnb](https://reader034.vdocuments.site/reader034/viewer/2022052122/589c56dc1a28abc4358b4d91/html5/thumbnails/33.jpg)
Event Schema for Audit Metadata
![Page 34: Jitney, Kafka at Airbnb](https://reader034.vdocuments.site/reader034/viewer/2022052122/589c56dc1a28abc4358b4d91/html5/thumbnails/34.jpg)
How is Jitney used in the org?
![Page 35: Jitney, Kafka at Airbnb](https://reader034.vdocuments.site/reader034/viewer/2022052122/589c56dc1a28abc4358b4d91/html5/thumbnails/35.jpg)
Use cases currently powered
• DB change ingestion
• Payment processing via pub/sub
• Experimentation
• User activity ingestion
• Cache invalidation
![Page 36: Jitney, Kafka at Airbnb](https://reader034.vdocuments.site/reader034/viewer/2022052122/589c56dc1a28abc4358b4d91/html5/thumbnails/36.jpg)
Key takeaways
1. Standardization!
2. Auditing pipeline
3. Huge advantage for the organization
![Page 37: Jitney, Kafka at Airbnb](https://reader034.vdocuments.site/reader034/viewer/2022052122/589c56dc1a28abc4358b4d91/html5/thumbnails/37.jpg)
Thank You!