Log Analysis System and Its Designs in LINE Corp., Early 2014
LINE Developer Meetup in Fukuoka #1 (#LINE_DM)
![Page 1: Log Analysis System And its designs in LINE Corp. 2014 early](https://reader037.vdocuments.site/reader037/viewer/2022103109/540de0878d7f728d7e8b4b4a/html5/thumbnails/1.jpg)
Log Analysis Systems
And its designs
In LINE Corp. 2014 Early
2014/02/20 (Thu)
@tagomoris (TAGOMORI Satoshi)
LINE Corp.
LINE Developer Meetup in Fukuoka #1
Thursday, February 20, 2014
![Page 2: Log Analysis System And its designs in LINE Corp. 2014 early](https://reader037.vdocuments.site/reader037/viewer/2022103109/540de0878d7f728d7e8b4b4a/html5/thumbnails/2.jpg)
TAGOMORI Satoshi (@tagomoris)
LINE Corp.
Development Support Team
![Page 3: Log Analysis System And its designs in LINE Corp. 2014 early](https://reader037.vdocuments.site/reader037/viewer/2022103109/540de0878d7f728d7e8b4b4a/html5/thumbnails/3.jpg)
![Page 4: Log Analysis System And its designs in LINE Corp. 2014 early](https://reader037.vdocuments.site/reader037/viewer/2022103109/540de0878d7f728d7e8b4b4a/html5/thumbnails/4.jpg)
![Page 5: Log Analysis System And its designs in LINE Corp. 2014 early](https://reader037.vdocuments.site/reader037/viewer/2022103109/540de0878d7f728d7e8b4b4a/html5/thumbnails/5.jpg)
Data Collecting, Aggregation, Analytics, Visualization
![Page 6: Log Analysis System And its designs in LINE Corp. 2014 early](https://reader037.vdocuments.site/reader037/viewer/2022103109/540de0878d7f728d7e8b4b4a/html5/thumbnails/6.jpg)
See also:
「OSSで支えられるライブドアの巨大ログ集計」 ("Livedoor's huge log aggregation, supported by OSS") (2012 Summer) http://www.slideshare.net/tagomoris/oss-nhntech
「Log analysis system with Hadoop in livedoor 2013 Winter」 (2013 early) http://www.slideshare.net/tagomoris/log-analysis-with-hadoop-in-livedoor-2013
「Batch and Stream processing with SQL」 (2013 Fall) http://www.slideshare.net/tagomoris/batch-and-stream-processing-with-sql
![Page 7: Log Analysis System And its designs in LINE Corp. 2014 early](https://reader037.vdocuments.site/reader037/viewer/2022103109/540de0878d7f728d7e8b4b4a/html5/thumbnails/7.jpg)
disclaimer:
This talk is about "a" log analysis system in LINE.
![Page 8: Log Analysis System And its designs in LINE Corp. 2014 early](https://reader037.vdocuments.site/reader037/viewer/2022103109/540de0878d7f728d7e8b4b4a/html5/thumbnails/8.jpg)
Do you like SQL?
![Page 9: Log Analysis System And its designs in LINE Corp. 2014 early](https://reader037.vdocuments.site/reader037/viewer/2022103109/540de0878d7f728d7e8b4b4a/html5/thumbnails/9.jpg)
System Overview (2014)
[Diagram] Components: Web Servers, Fluentd Cluster, Archive Storage (scribed), Fluentd Watchers, Graph Tools, Notifications (IRC), Hadoop Cluster (HDFS, MR), webhdfs, Huahin Manager, hiveserver, Shib, ShibUI, Norikra, Presto Cluster
Flow labels: STREAM, BATCH, SCHEDULED BATCH
![Page 10: Log Analysis System And its designs in LINE Corp. 2014 early](https://reader037.vdocuments.site/reader037/viewer/2022103109/540de0878d7f728d7e8b4b4a/html5/thumbnails/10.jpg)
System Overview (2014)
[Diagram] Same components as the first System Overview slide, annotated with implementation languages: Java, Node, Perl, Ruby
![Page 11: Log Analysis System And its designs in LINE Corp. 2014 early](https://reader037.vdocuments.site/reader037/viewer/2022103109/540de0878d7f728d7e8b4b4a/html5/thumbnails/11.jpg)
System Overview (2014)
[Diagram] Same components as the first System Overview slide, annotated with: SQL, fluentd.conf
![Page 12: Log Analysis System And its designs in LINE Corp. 2014 early](https://reader037.vdocuments.site/reader037/viewer/2022103109/540de0878d7f728d7e8b4b4a/html5/thumbnails/12.jpg)
Who uses it?
Internet Messaging Service
Public Web Service
Game
Private Web Service (closed, person-to-person)
Internal Web Service (administrator only)
Data Analytics Service
![Page 14: Log Analysis System And its designs in LINE Corp. 2014 early](https://reader037.vdocuments.site/reader037/viewer/2022103109/540de0878d7f728d7e8b4b4a/html5/thumbnails/14.jpg)
Data analytics players
ADMINISTRATOR: Storages, Hadoop Cluster, Visualization Tools
PROGRAMMER: Raw Log Formats, Application Logs, Data Sizes, Data Semantics
SERVICE DIRECTOR, SALES, BOARD MEMBER, ...: Whatever Metrics They Want
![Page 15: Log Analysis System And its designs in LINE Corp. 2014 early](https://reader037.vdocuments.site/reader037/viewer/2022103109/540de0878d7f728d7e8b4b4a/html5/thumbnails/15.jpg)
Data analytics players
WE NEED A QUERY LANGUAGE THAT THEY ALL CAN RUN AND UNDERSTAND!!!!!!!!!!
![Page 16: Log Analysis System And its designs in LINE Corp. 2014 early](https://reader037.vdocuments.site/reader037/viewer/2022103109/540de0878d7f728d7e8b4b4a/html5/thumbnails/16.jpg)
[Diagram] Components: Web Servers, Fluentd Cluster, Archive Storage (scribed), Fluentd Watchers, Graph Tools, Notifications (IRC), Hadoop Cluster (HDFS, MR), webhdfs, Huahin Manager, hiveserver, Shib, ShibUI
Flow labels: STREAM, BATCH, SCHEDULED BATCH
![Page 17: Log Analysis System And its designs in LINE Corp. 2014 early](https://reader037.vdocuments.site/reader037/viewer/2022103109/540de0878d7f728d7e8b4b4a/html5/thumbnails/17.jpg)
[Diagram] Components: Web Servers, Fluentd Cluster, Archive Storage (scribed), Fluentd Watchers, Graph Tools, Notifications (IRC), Hadoop Cluster (HDFS, MR), webhdfs, Huahin Manager, hiveserver, Shib, ShibUI, Norikra, Presto Cluster
Flow labels: STREAM, BATCH, SCHEDULED BATCH
![Page 18: Log Analysis System And its designs in LINE Corp. 2014 early](https://reader037.vdocuments.site/reader037/viewer/2022103109/540de0878d7f728d7e8b4b4a/html5/thumbnails/18.jpg)
![Page 19: Log Analysis System And its designs in LINE Corp. 2014 early](https://reader037.vdocuments.site/reader037/viewer/2022103109/540de0878d7f728d7e8b4b4a/html5/thumbnails/19.jpg)
SQL: Hive
![Page 21: Log Analysis System And its designs in LINE Corp. 2014 early](https://reader037.vdocuments.site/reader037/viewer/2022103109/540de0878d7f728d7e8b4b4a/html5/thumbnails/21.jpg)
Schema-less Stream Processing with SQL
Norikra
![Page 22: Log Analysis System And its designs in LINE Corp. 2014 early](https://reader037.vdocuments.site/reader037/viewer/2022103109/540de0878d7f728d7e8b4b4a/html5/thumbnails/22.jpg)
![Page 23: Log Analysis System And its designs in LINE Corp. 2014 early](https://reader037.vdocuments.site/reader037/viewer/2022103109/540de0878d7f728d7e8b4b4a/html5/thumbnails/23.jpg)
Software Stack
Hadoop: CDH 4.5.0 w/ JDK6 (WebHDFS, Hive, HiveServer)
Presto: 0.59 w/ JDK7
Shib: v0.3.0 w/ Node.js v0.10
Fluentd: v0.10.39 w/ Ruby 2.0.0
And many plugins
Norikra: v0.1.3 w/ JRuby 1.7.4
![Page 24: Log Analysis System And its designs in LINE Corp. 2014 early](https://reader037.vdocuments.site/reader037/viewer/2022103109/540de0878d7f728d7e8b4b4a/html5/thumbnails/24.jpg)
![Page 25: Log Analysis System And its designs in LINE Corp. 2014 early](https://reader037.vdocuments.site/reader037/viewer/2022103109/540de0878d7f728d7e8b4b4a/html5/thumbnails/25.jpg)
Batches and Streams
Hadoop is for batches
High-performance batch processing is important
HDFS has good performance
Stream log writing and calculations are also VERY VERY IMPORTANT
Hybrid System: Stream processing + Batch
![Page 26: Log Analysis System And its designs in LINE Corp. 2014 early](https://reader037.vdocuments.site/reader037/viewer/2022103109/540de0878d7f728d7e8b4b4a/html5/thumbnails/26.jpg)
Collect and deliver as STREAM
Calculate as BATCH
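A rough sketch of this split, not taken from the talk: records are appended to time-sliced buckets as they arrive (the stream side), and a separate job later aggregates one closed bucket (the batch side). `HourlyStore`, `batch_count_status`, the hourly slicing, and the log line format are all hypothetical stand-ins chosen for illustration.

```python
from collections import defaultdict
from datetime import datetime, timezone


class HourlyStore:
    """Hypothetical in-memory stand-in for time-sliced log storage."""

    def __init__(self):
        self._buckets = defaultdict(list)

    def append(self, timestamp, line):
        # Stream side: each record is written as soon as it arrives,
        # into the bucket for the hour it belongs to (UTC, assumed).
        key = datetime.fromtimestamp(timestamp, tz=timezone.utc).strftime("%Y%m%d_%H")
        self._buckets[key].append(line)

    def read(self, key):
        # Return a copy so batch jobs cannot mutate stored data.
        return list(self._buckets.get(key, []))


def batch_count_status(store, hour_key):
    """Batch side: aggregate one closed hourly bucket in a single pass."""
    counts = defaultdict(int)
    for line in store.read(hour_key):
        status = line.split()[-1]  # assumed format: "<method> <path> <status>"
        counts[status] += 1
    return dict(counts)


store = HourlyStore()
store.append(0, "GET /a 200")
store.append(60, "GET /b 404")
store.append(3600, "GET /c 200")  # lands in the next hourly bucket

first_hour = batch_count_status(store, "19700101_00")
print(first_hour)
```

The batch job only touches the bucket it is asked for, so late writes to the current hour do not affect an aggregation over an hour that has already closed.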
![Page 27: Log Analysis System And its designs in LINE Corp. 2014 early](https://reader037.vdocuments.site/reader037/viewer/2022103109/540de0878d7f728d7e8b4b4a/html5/thumbnails/27.jpg)
1st gen: First implementation
[Diagram] Components: Web Servers, Scribed, Archive Storage (scribed), Hadoop Cluster CDH3b2 (Hadoop Streaming), hiveserver, Shib
Labels: STREAM, BATCH, (LIBHDFS)
![Page 28: Log Analysis System And its designs in LINE Corp. 2014 early](https://reader037.vdocuments.site/reader037/viewer/2022103109/540de0878d7f728d7e8b4b4a/html5/thumbnails/28.jpg)
Hadoop and Hive
Filesystem (HDFS)
Processing Framework (Hadoop MapReduce)
Query Compiler: SQL -> MR (Hive)
Thrift API Server (HiveServer)
Old style Java (....)
![Page 29: Log Analysis System And its designs in LINE Corp. 2014 early](https://reader037.vdocuments.site/reader037/viewer/2022103109/540de0878d7f728d7e8b4b4a/html5/thumbnails/29.jpg)
Shib
WebUI Client for Hive
Query editor/executor + result viewer
HTTP JSON API Gateway for Hive query execution
Node.js
![Page 30: Log Analysis System And its designs in LINE Corp. 2014 early](https://reader037.vdocuments.site/reader037/viewer/2022103109/540de0878d7f728d7e8b4b4a/html5/thumbnails/30.jpg)
2nd gen: +Fluentd
[Diagram] Components: Web Servers, Fluentd Cluster, Archive Storage (scribed), Hadoop Cluster CDH3u2 (Hive), Cloudera Hoop, Huahin Manager, hiveserver, Shib
Flow labels: STREAM, BATCH
![Page 31: Log Analysis System And its designs in LINE Corp. 2014 early](https://reader037.vdocuments.site/reader037/viewer/2022103109/540de0878d7f728d7e8b4b4a/html5/thumbnails/31.jpg)
Fluentd
Log collector
Apache-like configuration
Pluggable Input/Output/Buffer on public plugin repository (rubygems.org)
Ruby 1.9 or later
Collect, and Store
collect: fluent-agent-lite (perl)
store: fluent-plugin-webhdfs
![Page 32: Log Analysis System And its designs in LINE Corp. 2014 early](https://reader037.vdocuments.site/reader037/viewer/2022103109/540de0878d7f728d7e8b4b4a/html5/thumbnails/32.jpg)
Collect and deliver as STREAM
Calculate as BATCH
Monitor as STREAM
![Page 33: Log Analysis System And its designs in LINE Corp. 2014 early](https://reader037.vdocuments.site/reader037/viewer/2022103109/540de0878d7f728d7e8b4b4a/html5/thumbnails/33.jpg)
3rd gen: +Monitoring
[Diagram] Components: Web Servers, Fluentd Cluster, Archive Storage (scribed), Fluentd Watchers, Graph Tools, Notifications (IRC), Hadoop Cluster CDH3u5 (Hive), webhdfs, Huahin Manager, hiveserver, Shib, ShibUI
Flow labels: STREAM, BATCH, SCHEDULED BATCH
![Page 34: Log Analysis System And its designs in LINE Corp. 2014 early](https://reader037.vdocuments.site/reader037/viewer/2022103109/540de0878d7f728d7e8b4b4a/html5/thumbnails/34.jpg)
Fluentd plugins
Monitoring in real-time
message num/size counting
min, max, average and percentiles
Visualization and Notification
Graph tools (GrowthForecast / Focuslight)
IRC (or Mail, HipChat, ...)
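As an illustration of the metrics named above (message count, min, max, average, and percentiles), here is a minimal sketch over one monitoring interval. It is not the code of any Fluentd plugin; `summarize`, the nearest-rank percentile method, and the sample values are assumptions made for this example.

```python
def summarize(values, percentiles=(90, 95, 99)):
    """Summarize one interval of numeric samples (e.g. response times)."""
    if not values:
        return {"num": 0}
    ordered = sorted(values)
    n = len(ordered)
    summary = {
        "num": n,
        "min": ordered[0],
        "max": ordered[-1],
        "avg": sum(ordered) / n,
    }
    for p in percentiles:
        # Nearest-rank percentile: rank = ceil(p * n / 100), computed with
        # integer arithmetic to avoid floating-point rounding surprises.
        rank = max(1, (p * n + 99) // 100)
        summary[f"percentile_{p}"] = ordered[rank - 1]
    return summary


# Hypothetical samples: response times of 1..100 ms seen in one interval.
response_times_ms = list(range(1, 101))
stats = summarize(response_times_ms)
print(stats)
```

In a streaming setup, a summary like this would be emitted once per interval and forwarded to a graph tool or a notification channel.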
![Page 35: Log Analysis System And its designs in LINE Corp. 2014 early](https://reader037.vdocuments.site/reader037/viewer/2022103109/540de0878d7f728d7e8b4b4a/html5/thumbnails/35.jpg)
4th gen: +HA (hadoop)
[Diagram] Components: Web Servers, Fluentd Cluster, Archive Storage (scribed), Fluentd Watchers, Graph Tools, Notifications (IRC), Hadoop Cluster CDH4 (HDFS, YARN), webhdfs, Huahin Manager, hiveserver, Shib, ShibUI
Flow labels: STREAM, BATCH, SCHEDULED BATCH
![Page 36: Log Analysis System And its designs in LINE Corp. 2014 early](https://reader037.vdocuments.site/reader037/viewer/2022103109/540de0878d7f728d7e8b4b4a/html5/thumbnails/36.jpg)
Collect and deliver as STREAM
Calculate as BATCH
Monitor as STREAM
Calculate as STREAM on demand
![Page 37: Log Analysis System And its designs in LINE Corp. 2014 early](https://reader037.vdocuments.site/reader037/viewer/2022103109/540de0878d7f728d7e8b4b4a/html5/thumbnails/37.jpg)
5th gen: +Norikra
[Diagram] Components: Web Servers, Fluentd Cluster, Archive Storage (scribed), Fluentd Watchers, Graph Tools, Notifications (IRC), Hadoop Cluster (HDFS, MR), webhdfs, Huahin Manager, hiveserver, Shib, ShibUI, Norikra
Flow labels: STREAM, BATCH, SCHEDULED BATCH
![Page 38: Log Analysis System And its designs in LINE Corp. 2014 early](https://reader037.vdocuments.site/reader037/viewer/2022103109/540de0878d7f728d7e8b4b4a/html5/thumbnails/38.jpg)
Norikra
SQL Query for Streams
Add/Remove on demand (without restarts)
... and many features
HTTP JSON API
JRuby on JVM with Esper
![Page 39: Log Analysis System And its designs in LINE Corp. 2014 early](https://reader037.vdocuments.site/reader037/viewer/2022103109/540de0878d7f728d7e8b4b4a/html5/thumbnails/39.jpg)
Norikra Queries: (1)
SELECT name, age FROM events
WHERE current="Fukuoka"
Input event: {"name":"tagomoris", "age":34, "address":"Tokyo", "corp":"LINE", "current":"Fukuoka"}
Output: {"name":"tagomoris","age":34}
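To make the semantics of query (1) concrete, here is a small sketch of the same filter and projection over plain Python dicts. This only mimics what the query computes and does not use Norikra or its API; `select_name_age` and the second sample event are hypothetical.

```python
def select_name_age(events):
    """Yield {name, age} for each event whose 'current' field is 'Fukuoka'."""
    for event in events:
        # Events are schema-less dicts, so read fields with .get() and
        # simply skip events that do not match the WHERE condition.
        if event.get("current") == "Fukuoka":
            yield {"name": event.get("name"), "age": event.get("age")}


events = [
    {"name": "tagomoris", "age": 34, "address": "Tokyo",
     "corp": "LINE", "current": "Fukuoka"},
    {"name": "someone", "age": 30, "current": "Tokyo"},  # hypothetical event
]

result = list(select_name_age(events))
print(result)
```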
![Page 40: Log Analysis System And its designs in LINE Corp. 2014 early](https://reader037.vdocuments.site/reader037/viewer/2022103109/540de0878d7f728d7e8b4b4a/html5/thumbnails/40.jpg)
Norikra Queries: (2)
SELECT age, COUNT(*) AS cnt FROM events.win:time_batch(5 mins)
WHERE current="Fukuoka" GROUP BY age
Input event: {"name":"tagomoris", "age":34, "address":"Tokyo", "corp":"LINE", "current":"Fukuoka"}
Output (every 5 mins): {"age":34,"cnt":3}, {"age":33,"cnt":1}, ...
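The windowed query (2) can be approximated in plain Python as below. This is a simplified sketch, not Norikra or Esper code: windows here are aligned to multiples of 5 minutes since the epoch, which may differ from how the engine actually aligns its batches, and events without an `age` field are skipped by assumption. `count_by_age_per_window` and the sample events are hypothetical.

```python
from collections import Counter, defaultdict

WINDOW_SECONDS = 5 * 60


def count_by_age_per_window(timed_events):
    """Group (timestamp, event) pairs into 5-minute batches and count by age."""
    windows = defaultdict(Counter)
    for timestamp, event in timed_events:
        if event.get("current") != "Fukuoka" or "age" not in event:
            continue
        # Align each event to the start of its 5-minute window.
        window_start = int(timestamp // WINDOW_SECONDS) * WINDOW_SECONDS
        windows[window_start][event["age"]] += 1
    # One output batch per window, listing ages in descending order.
    return {
        start: [
            {"age": age, "cnt": cnt}
            for age, cnt in sorted(counts.items(), key=lambda kv: kv[0], reverse=True)
        ]
        for start, counts in sorted(windows.items())
    }


timed_events = [
    (0, {"age": 34, "current": "Fukuoka"}),
    (10, {"age": 34, "current": "Fukuoka"}),
    (20, {"age": 34, "current": "Fukuoka"}),
    (30, {"age": 33, "current": "Fukuoka"}),
    (40, {"age": 20, "current": "Tokyo"}),     # filtered out by WHERE
    (310, {"age": 34, "current": "Fukuoka"}),  # falls into the next window
]

batches = count_by_age_per_window(timed_events)
print(batches)
```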
![Page 41: Log Analysis System And its designs in LINE Corp. 2014 early](https://reader037.vdocuments.site/reader037/viewer/2022103109/540de0878d7f728d7e8b4b4a/html5/thumbnails/41.jpg)
Collect and deliver as STREAM
Calculate as BATCH
Monitor as STREAM
Calculate as STREAM on demand
Calculate as BATCH immediately on demand
![Page 42: Log Analysis System And its designs in LINE Corp. 2014 early](https://reader037.vdocuments.site/reader037/viewer/2022103109/540de0878d7f728d7e8b4b4a/html5/thumbnails/42.jpg)
5th gen: +Presto
[Diagram] Components: Web Servers, Fluentd Cluster, Archive Storage (scribed), Fluentd Watchers, Graph Tools, Notifications (IRC), Hadoop Cluster (HDFS, MR), webhdfs, Huahin Manager, hiveserver, Shib, ShibUI, Norikra, Presto Cluster
Flow labels: STREAM, BATCH, SCHEDULED BATCH
![Page 43: Log Analysis System And its designs in LINE Corp. 2014 early](https://reader037.vdocuments.site/reader037/viewer/2022103109/540de0878d7f728d7e8b4b4a/html5/thumbnails/43.jpg)
Presto
Open sourced by Facebook on 2013/11/07
MPP Engine: Massively Parallel Processing Engine
like Google BigQuery (Dremel), Cloudera Impala
Low-latency queries (which are not the main use case for Hive)
SQL
HTTP JSON API
Java 7 !
![Page 44: Log Analysis System And its designs in LINE Corp. 2014 early](https://reader037.vdocuments.site/reader037/viewer/2022103109/540de0878d7f728d7e8b4b4a/html5/thumbnails/44.jpg)
Shib v0.3.0: presto support
[Diagram] Clients: User (browser), Analysis Batches, Service Admin Tools
Gateway: Shib (clients connect via HTTP JSON API)
Backends: HiveServer (THRIFT), HiveServer2 (THRIFT), Presto (HTTP JSON API)
![Page 45: Log Analysis System And its designs in LINE Corp. 2014 early](https://reader037.vdocuments.site/reader037/viewer/2022103109/540de0878d7f728d7e8b4b4a/html5/thumbnails/45.jpg)
Non-monolithic architecture
Many subsystems for many purposes
Add/Update/Replace per subsystem
High interoperability by RPC-based connections
Gateway can hide backend implementations
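The last point can be sketched as a small gateway that exposes one query interface and dispatches to interchangeable backends. This is an illustration only and not Shib's implementation; `QueryGateway`, `HiveBackend`, `PrestoBackend`, and the result shape are all hypothetical, and the backends return canned results instead of talking to real servers.

```python
class HiveBackend:
    """Hypothetical stand-in for a client that would talk to HiveServer."""

    def run(self, sql):
        return {"engine": "hive", "sql": sql, "rows": []}


class PrestoBackend:
    """Hypothetical stand-in for a client that would talk to Presto."""

    def run(self, sql):
        return {"engine": "presto", "sql": sql, "rows": []}


class QueryGateway:
    """Single entry point: callers send SQL and never see backend details."""

    def __init__(self, backends, default):
        if default not in backends:
            raise ValueError(f"unknown default engine: {default}")
        self._backends = dict(backends)
        self._default = default

    def execute(self, sql, engine=None):
        name = engine or self._default
        try:
            backend = self._backends[name]
        except KeyError:
            raise ValueError(f"unknown engine: {name}") from None
        return backend.run(sql)


gateway = QueryGateway(
    {"hive": HiveBackend(), "presto": PrestoBackend()},
    default="hive",
)
batch_result = gateway.execute("SELECT COUNT(*) FROM access_log")
fast_result = gateway.execute("SELECT COUNT(*) FROM access_log", engine="presto")
print(batch_result["engine"], fast_result["engine"])
```

Because clients depend only on `execute`, a backend can be added or replaced by changing the mapping passed to the gateway, without touching the callers.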
![Page 46: Log Analysis System And its designs in LINE Corp. 2014 early](https://reader037.vdocuments.site/reader037/viewer/2022103109/540de0878d7f728d7e8b4b4a/html5/thumbnails/46.jpg)
WHAT TO DO IS
NOT WHAT WE WANT TO DO, BUT
WHAT WE ARE ASKED TO DO.
![Page 47: Log Analysis System And its designs in LINE Corp. 2014 early](https://reader037.vdocuments.site/reader037/viewer/2022103109/540de0878d7f728d7e8b4b4a/html5/thumbnails/47.jpg)
THERE IS A LOT TO DO!
THANKS!
![Page 48: Log Analysis System And its designs in LINE Corp. 2014 early](https://reader037.vdocuments.site/reader037/viewer/2022103109/540de0878d7f728d7e8b4b4a/html5/thumbnails/48.jpg)
Software list:
http://fluentd.org/
http://prestodb.io/
http://norikra.github.io/
https://github.com/tagomoris/shib