Fluentd and Kafka
Hadoop / Spark Conference Japan 2016, Feb 8, 2016
Who are you?
• Masahiro Nakagawa
  • github: @repeatedly
• Treasure Data Inc.
  • Fluentd / td-agent developer
  • Fluentd Enterprise support
• I love OSS :)
  • D Language, MessagePack, the organizer of several meetups, etc.
Fluentd
• Pluggable streaming event collector
  • Lightweight, robust, and flexible
  • Lots of plugins on rubygems
  • Used by AWS, GCP, MS, and many other companies
• Resources
  • http://www.fluentd.org/
  • Webinar: https://www.youtube.com/watch?v=6uPB_M7cbYk
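As a minimal sketch of what a pluggable Fluentd pipeline looks like (the path and tag here are hypothetical, and the syntax follows the v0.12-era plugins), a source plugin tails a JSON log file and a match section routes events to an output plugin:

```
<source>
  @type tail
  path /var/log/app.log   # hypothetical path
  tag app.access
  format json
</source>

<match app.**>
  @type stdout            # swap in any output plugin here
</match>
```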
Popular case

App --(push)--> Forwarder --(push)--> Aggregator --> Destination
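The forwarder/aggregator topology above is typically wired with the built-in forward plugins; a hedged sketch, assuming an aggregator host named aggregator.example.com listening on the default port:

```
# forwarder node
<match app.**>
  @type forward
  <server>
    host aggregator.example.com   # assumed hostname
    port 24224
  </server>
</match>

# aggregator node
<source>
  @type forward
  port 24224
</source>
```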
Apache Kafka
• Distributed messaging system
• Producer - Broker - Consumer pattern
• Pull model, replication, etc.

App --(push)--> Producer --> Broker --(pull)--> Consumer --> Destination
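To get a feel for the pull model, Kafka ships console clients; a sketch assuming a running broker on localhost:9092 and ZooKeeper on localhost:2181, with flags matching the Kafka 0.9-era tools:

```sh
# produce an event to topic "web"
echo '{"msg":"hello"}' | kafka-console-producer.sh \
  --broker-list localhost:9092 --topic web

# the consumer pulls events from the broker at its own pace
kafka-console-consumer.sh \
  --zookeeper localhost:2181 --topic web --from-beginning
```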
Push vs Pull
• Push:
  • Easy to transfer data to multiple destinations
  • Hard to control the stream ratio across multiple streams
• Pull:
  • Easy to control stream flow / ratio
  • Consumers must be managed correctly
There are 2 ways to connect Fluentd and Kafka
• fluent-plugin-kafka
• kafka-fluentd-consumer
fluent-plugin-kafka
• Input / Output plugin for Kafka
  • https://github.com/htgc/fluent-plugin-kafka
  • in_kafka, in_kafka_group, out_kafka, out_kafka_buffered
• Pros
  • Easy to use; also supports output
• Cons
  • Performance is not the primary goal
Configuration example
```
<source>
  @type kafka
  topics web,system
  format json
  add_prefix kafka.
  # more options
</source>

<match kafka.**>
  @type kafka_buffered
  output_data_type msgpack
  default_topic metrics
  compression_codec gzip
  required_acks 1
</match>
```
https://github.com/htgc/fluent-plugin-kafka#usage
kafka-fluentd-consumer
• Stand-alone Kafka consumer for Fluentd
  • https://github.com/treasure-data/kafka-fluentd-consumer
  • Sends consumed events to Fluentd's in_forward
• Pros
  • High performance and Java API features
• Cons
  • Needs a Java runtime
Run consumer
• Edit the log4j and fluentd-consumer properties files
• Run the following command:

```sh
$ java \
  -Dlog4j.configuration=file:///path/to/log4j.properties \
  -jar path/to/kafka-fluentd-consumer-0.2.1-all.jar \
  path/to/fluentd-consumer.properties
```
Properties example

```
fluentd.tag.prefix=kafka.event.
fluentd.record.format=regexp               # default is json
fluentd.record.pattern=(?<text>.*)         # for regexp format
fluentd.consumer.topics=app.*              # can use Java Regex
fluentd.consumer.topics.pattern=blacklist  # default is whitelist
fluentd.consumer.threads=5
```
https://github.com/treasure-data/kafka-fluentd-consumer/blob/master/config/fluentd-consumer.properties
With Fluentd example
```
<source>
  @type forward
</source>

<source>
  @type exec
  command java -Dlog4j.configuration=file:///path/to/log4j.properties -jar /path/to/kafka-fluentd-consumer-0.2.1-all.jar /path/to/config/fluentd-consumer.properties
  tag dummy
  format json
</source>
```
https://github.com/treasure-data/kafka-fluentd-consumer#run-kafka-consumer-for-fluentd-via-in_exec
Conclusion
• Kafka has become an important component of data platforms
• Fluentd can communicate with Kafka
  • via the Fluentd plugin or the stand-alone Kafka consumer
• Build reliable and flexible data pipelines with Fluentd and Kafka