fluentd - flexible, stable, scalable

Fluentd Flexible, Stable, Scalable Suiting @Taipei.py

Upload: shu-ting-tseng

Post on 21-Apr-2017

Category:

Data & Analytics


TRANSCRIPT

Page 1: Fluentd - Flexible, Stable, Scalable

Fluentd - Flexible, Stable, Scalable

Suiting @Taipei.py

Page 2: Fluentd - Flexible, Stable, Scalable

Who am I

Suiting (@suitingtseng)

Gogolook Inc.

Data Team

Page 3: Fluentd - Flexible, Stable, Scalable

Before

Page 4: Fluentd - Flexible, Stable, Scalable

What is Fluentd?

• Fluentd is an open source data collector, which lets you unify the data collection and consumption for a better use and understanding of data.

• td-agent: the stable Fluentd distribution packaged by Treasure Data

Page 6: Fluentd - Flexible, Stable, Scalable

What is a log?

Page 7: Fluentd - Flexible, Stable, Scalable

Log definition

Time + Tag + Content
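
As a rough illustration of the "Time + Tag + Content" definition, the sketch below models one log event in Python. The class name `LogEvent` and the field values are assumptions made for this example; they are not part of Fluentd or its client libraries.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class LogEvent:
    """One log event: time + tag + content (hypothetical helper type)."""
    time: int                # Unix timestamp in seconds
    tag: str                 # dot-separated label used for routing
    record: Dict[str, Any]   # the content, as key/value pairs


# Example values are made up for illustration.
event = LogEvent(
    time=1492732800,
    tag="nginx.access",
    record={"method": "GET", "path": "/", "status": 200},
)
```

The tag is what a `<match>` section selects on, and the record is the payload an output plugin writes.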

Page 8: Fluentd - Flexible, Stable, Scalable

After

Page 9: Fluentd - Flexible, Stable, Scalable

How?

• Lightweight: C + Ruby + MessagePack

• Pluggable architecture

• Built-in Reliability

Page 10: Fluentd - Flexible, Stable, Scalable

Input plugins

• forward

• tail

• AWS Simple Queue Service

• AWS CloudWatch

Page 11: Fluentd - Flexible, Stable, Scalable

input: tail

$ cat /etc/td-agent/conf.d
<source>
  type      tail
  path      /var/log/nginx/access.log
  pos_file  /var/log/td-agent/httpd-access.log.pos
  tag       nginx.access
</source>
<match nginx.access>
  blah blah
</match>
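
To show the idea behind `pos_file`, here is a minimal Python sketch of tailing a file while remembering the read offset between runs. It is an illustration only: the function name `read_new_lines` is hypothetical, the position file here holds a single decimal offset (an assumption, not Fluentd's actual pos_file format), and log rotation is not handled.

```python
import os


def read_new_lines(log_path, pos_path):
    """Return complete lines appended to log_path since the saved offset."""
    offset = 0
    if os.path.exists(pos_path):
        with open(pos_path) as f:
            offset = int(f.read().strip() or 0)

    with open(log_path, "rb") as f:
        f.seek(offset)
        data = f.read()

    # Consume only complete lines; a partial last line waits for the next call.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8").splitlines()

    # Persist the new offset so a restart resumes where this call stopped.
    with open(pos_path, "w") as f:
        f.write(str(offset + end))
    return lines
```

Because the offset is stored on disk, restarting the reader does not re-send lines that were already consumed.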

Page 12: Fluentd - Flexible, Stable, Scalable

input: forward

$ cat /etc/td-agent/conf.d
<source>
  type  forward
  port  24224
</source>
<match flask.index>
  blah blah
</match>

Page 13: Fluentd - Flexible, Stable, Scalable

input: forward

$ cat ~/example.py
from fluent import sender
from fluent import event

# Tag prefix 'flask' plus label 'index' gives the tag 'flask.index'
sender.setup('flask', host='localhost', port=24224)
event.Event("index", {
    "user": "foo",
    "token": "bar",
    "action": "POST"
})
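
The example above emits events tagged `flask.index`, which the `<match flask.index>` section picks up. The sketch below shows how such tag routing could work in principle. Both function names are hypothetical, and the wildcard rules are a simplified reading of Fluentd's match patterns (`*` for exactly one tag part, `**` for zero or more parts), not its real implementation.

```python
def full_tag(prefix, label):
    """Join the sender's tag prefix and the event label with a dot."""
    return f"{prefix}.{label}"


def tag_matches(pattern, tag):
    """Simplified tag matcher: '*' is one part, '**' is zero or more parts."""
    def walk(p, t):
        if not p:
            return not t
        head, rest = p[0], p[1:]
        if head == "**":
            # Try letting '**' swallow 0, 1, 2, ... tag parts.
            return any(walk(rest, t[i:]) for i in range(len(t) + 1))
        if not t:
            return False
        if head == "*" or head == t[0]:
            return walk(rest, t[1:])
        return False

    return walk(pattern.split("."), tag.split("."))


tag = full_tag("flask", "index")          # "flask.index"
routed = tag_matches("flask.index", tag)  # True
```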

Page 14: Fluentd - Flexible, Stable, Scalable

Output plugins

• forward

• copy

• Elasticsearch / MongoDB

• statsd / influxDB / graphite

• S3 / GCS / BigQuery

Page 15: Fluentd - Flexible, Stable, Scalable

output: elasticsearch

$ cat /etc/td-agent/conf.d
<source>
  foo             bar
  tag             nginx.access
</source>
<match nginx.access>
  type            elasticsearch
  hosts           es-host1,es-host2
  index_name      nginx
  type_name       access
  flush_interval  60s
</match>

Page 16: Fluentd - Flexible, Stable, Scalable

output: splunk

$ cat /etc/td-agent/conf.d
<source>
  foo    bar
  tag    nginx.access
</source>
<match nginx.access>
  type   splunk
  hosts  splunk-host1
</match>

Page 17: Fluentd - Flexible, Stable, Scalable

Filter plugins

• grok

• grep

• record-modifier / record-reformer

• geoip

Page 18: Fluentd - Flexible, Stable, Scalable

Buffer types

• Memory

• File

Page 19: Fluentd - Flexible, Stable, Scalable

Buffer example

$ cat /etc/td-agent/conf.d
<source>
  foo                 bar
  tag                 nginx.access
</source>
<match nginx.access>
  type                splunk
  hosts               splunk-host1
  buffer_chunk_limit  10m
  buffer_queue_limit  1000
  flush_interval      5m
</match>
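
To make the buffer parameters concrete, the sketch below computes the worst-case buffered size for the settings above and models a chunked buffer with a bounded queue. It assumes `10m` means 10 MiB; the class `ChunkBuffer` and the exception `QueueFull` are hypothetical names for this illustration and do not reflect Fluentd's internals.

```python
from collections import deque

CHUNK_LIMIT = 10 * 1024 * 1024   # buffer_chunk_limit 10m (assumed to be MiB)
QUEUE_LIMIT = 1000               # buffer_queue_limit 1000


def max_buffered_bytes(chunk_limit, queue_limit):
    """Upper bound on queued data: every queue slot holds a full chunk."""
    return chunk_limit * queue_limit


class QueueFull(Exception):
    """Raised when no more chunks can be queued."""


class ChunkBuffer:
    def __init__(self, chunk_limit, queue_limit):
        self.chunk_limit = chunk_limit
        self.queue_limit = queue_limit
        self.current = bytearray()   # chunk currently being filled
        self.queue = deque()         # full chunks waiting to be flushed

    def append(self, data):
        # Start a new chunk when this write would exceed the chunk limit.
        if self.current and len(self.current) + len(data) > self.chunk_limit:
            self._enqueue()
        self.current.extend(data)

    def _enqueue(self):
        if len(self.queue) >= self.queue_limit:
            raise QueueFull("buffer queue is full")
        self.queue.append(bytes(self.current))
        self.current = bytearray()

    def flush(self):
        """Return all pending chunks, e.g. once per flush interval."""
        if self.current:
            self._enqueue()
        chunks = list(self.queue)
        self.queue.clear()
        return chunks
```

Under that assumption, the settings above allow up to 10,485,760,000 bytes (about 9.8 GiB) to queue up while the destination is unreachable, so the buffer location needs that much room.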

Page 20: Fluentd - Flexible, Stable, Scalable
Page 21: Fluentd - Flexible, Stable, Scalable

Scalability

• Scale up: multi-process plugin

• Scale out: out-forward plugin
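
Scaling out means each application node forwards to one of several aggregator nodes. The following sketch shows one simple way to spread chunks across aggregators and skip nodes marked as down. The class name and the host names are made up for this example, and the real out-forward plugin has more features (such as weights and heartbeats) that are not modelled here.

```python
from itertools import cycle


class ForwardBalancer:
    """Round-robin over aggregators, skipping ones marked as down."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.down = set()
        self._ring = cycle(self.servers)

    def pick(self):
        # Look at each server at most once per call.
        for _ in range(len(self.servers)):
            server = next(self._ring)
            if server not in self.down:
                return server
        raise RuntimeError("no aggregator available")


# Hypothetical aggregator addresses.
balancer = ForwardBalancer(["agg1:24224", "agg2:24224"])
first_four = [balancer.pick() for _ in range(4)]
```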

Page 22: Fluentd - Flexible, Stable, Scalable

Diagram labels: App + Fluentd (x3), Fluentd (x1), Elasticsearch (x4)

Page 23: Fluentd - Flexible, Stable, Scalable

Diagram labels: App + Fluentd (x3), Fluentd (x2), Elasticsearch (x4)

Page 24: Fluentd - Flexible, Stable, Scalable

Diagram labels: App + Fluentd (x3), Load balance, Fluentd (x3), Elasticsearch (x4), Auto scaling group

Page 25: Fluentd - Flexible, Stable, Scalable

Stability

• Auto retry

• Persistent file buffer

• At-most-once delivery
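
Auto retry is commonly implemented with exponential backoff: wait, retry, and double the wait after each failure. The sketch below illustrates that idea only. The function names and parameters are hypothetical, and Fluentd's own retry settings and defaults differ by version.

```python
import time


def retry_waits(initial_wait, max_retries, max_wait=None):
    """Wait times in seconds that double after every failed attempt."""
    waits = []
    wait = initial_wait
    for _ in range(max_retries):
        waits.append(wait if max_wait is None else min(wait, max_wait))
        wait *= 2
    return waits


def flush_with_retry(send, chunk, waits, sleep=time.sleep):
    """Try to send a chunk once, then retry after each wait in `waits`."""
    for wait in [None] + list(waits):
        if wait is not None:
            sleep(wait)
        try:
            send(chunk)
            return True
        except OSError:
            continue   # keep the chunk and try again later
    return False       # gave up; the caller decides what to do with the chunk
```

Combined with a persistent file buffer, the chunk survives a process restart between attempts instead of living only in memory.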

Page 26: Fluentd - Flexible, Stable, Scalable

Message Delivery

• At-most-once: data may be lost

• At-least-once: data may be duplicated

• Exactly-once: no loss and no duplicates (the ideal)

Page 27: Fluentd - Flexible, Stable, Scalable

Idempotent

• HTTP PUT

• Maintain a unique id at the application level, or

• Concatenate (instance-id, time, ….) as the id
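
The idea is that a retried (duplicated) record overwrites itself instead of creating a second copy. Below is a minimal sketch of that approach. The `sequence` field stands in for whatever else is concatenated on the slide and is purely an assumption, as are the function and class names; the in-memory dict is only a stand-in for a store that supports writes keyed by id.

```python
import hashlib


def record_id(instance_id, timestamp, sequence):
    """Derive a stable id from fields that identify one record.

    `sequence` is a hypothetical extra field, not taken from the slides.
    """
    raw = f"{instance_id}:{timestamp}:{sequence}"
    return hashlib.sha1(raw.encode("utf-8")).hexdigest()


class IdempotentStore:
    """Stand-in for a store with PUT-style writes keyed by id."""

    def __init__(self):
        self.docs = {}

    def put(self, doc_id, doc):
        # Writing the same id twice leaves exactly one document.
        self.docs[doc_id] = doc


store = IdempotentStore()
doc = {"user": "foo", "action": "POST"}
doc_id = record_id("i-0123", 1492732800, 1)

store.put(doc_id, doc)
store.put(doc_id, doc)   # duplicate delivery after a retry
```

With ids like this, at-least-once delivery plus idempotent writes gives the same end result as exactly-once delivery.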

Page 28: Fluentd - Flexible, Stable, Scalable

Gogolook use cases

• MongoDB, nginx log

• API, worker log

• Monitor

• Benchmark

Page 29: Fluentd - Flexible, Stable, Scalable

Active users by day

Page 30: Fluentd - Flexible, Stable, Scalable

System monitor

Page 31: Fluentd - Flexible, Stable, Scalable

Queue monitor

Page 32: Fluentd - Flexible, Stable, Scalable

Benchmark?

Diagram labels: App + Fluentd, Fluentd, DB

Page 33: Fluentd - Flexible, Stable, Scalable

Benchmark?

Diagram labels: App + Fluentd, Fluentd, DB, Local files

Page 35: Fluentd - Flexible, Stable, Scalable

Q & A