Batch and Stream processing with SQL
![Page 1: Batch and Stream processing with SQL](https://reader034.vdocuments.site/reader034/viewer/2022052216/540dd2838d7f728d7e8b4b16/html5/thumbnails/1.jpg)
Batch and Stream processing with SQL
2013/11/07 Cloudera World Tokyo 2013
TAGOMORI Satoshi @tagomoris
LINE Corp.
Thursday, November 7, 2013
![Page 2: Batch and Stream processing with SQL](https://reader034.vdocuments.site/reader034/viewer/2022052216/540dd2838d7f728d7e8b4b16/html5/thumbnails/2.jpg)
SQL, the Universe, and the Answer to Everything
SELECT 42 FROM anywhere
![Page 3: Batch and Stream processing with SQL](https://reader034.vdocuments.site/reader034/viewer/2022052216/540dd2838d7f728d7e8b4b16/html5/thumbnails/3.jpg)
TAGOMORI Satoshi (@tagomoris)
LINE Corp.
Hadoop, Fluentd, Norikra, ...
![Page 4: Batch and Stream processing with SQL](https://reader034.vdocuments.site/reader034/viewer/2022052216/540dd2838d7f728d7e8b4b16/html5/thumbnails/4.jpg)
![Page 5: Batch and Stream processing with SQL](https://reader034.vdocuments.site/reader034/viewer/2022052216/540dd2838d7f728d7e8b4b16/html5/thumbnails/5.jpg)
![Page 6: Batch and Stream processing with SQL](https://reader034.vdocuments.site/reader034/viewer/2022052216/540dd2838d7f728d7e8b4b16/html5/thumbnails/6.jpg)
![Page 7: Batch and Stream processing with SQL](https://reader034.vdocuments.site/reader034/viewer/2022052216/540dd2838d7f728d7e8b4b16/html5/thumbnails/7.jpg)
Data Collecting, Aggregation, Analytics, Visualization
![Page 8: Batch and Stream processing with SQL](https://reader034.vdocuments.site/reader034/viewer/2022052216/540dd2838d7f728d7e8b4b16/html5/thumbnails/8.jpg)
Do you like SQL?
![Page 9: Batch and Stream processing with SQL](https://reader034.vdocuments.site/reader034/viewer/2022052216/540dd2838d7f728d7e8b4b16/html5/thumbnails/9.jpg)
How to write M/R (or a Storm app, or ...)
Java (or Scala, Clojure, JRuby, ...)
Hadoop Streaming
Pig
Hive, Impala (SQL!)
![Page 10: Batch and Stream processing with SQL](https://reader034.vdocuments.site/reader034/viewer/2022052216/540dd2838d7f728d7e8b4b16/html5/thumbnails/10.jpg)
Our log traffic
Daily
2.1+ TB (non compressed)
6.8+ Billion lines / day
Peak time
150,000+ lines / sec
380+ Mbps
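As a quick sanity check on these figures, the arithmetic below derives the average line rate and line size they imply. It is only a rough sketch: it assumes 1 TB = 10^12 bytes and treats each "+" figure as exact, neither of which the slide states.

```python
# Rough sanity check of the traffic figures on the slide.
# Assumptions (not from the slide): 1 TB = 10**12 bytes, "+" values taken as exact.
DAILY_BYTES = 2.1e12          # 2.1+ TB per day, uncompressed
DAILY_LINES = 6.8e9           # 6.8+ billion lines per day
PEAK_LINES_PER_SEC = 150_000  # 150,000+ lines per second at peak
PEAK_BITS_PER_SEC = 380e6     # 380+ Mbps at peak

SECONDS_PER_DAY = 24 * 60 * 60

avg_lines_per_sec = DAILY_LINES / SECONDS_PER_DAY
avg_bytes_per_line = DAILY_BYTES / DAILY_LINES
peak_bytes_per_line = PEAK_BITS_PER_SEC / 8 / PEAK_LINES_PER_SEC
peak_to_avg = PEAK_LINES_PER_SEC / avg_lines_per_sec

print(f"average rate:       {avg_lines_per_sec:,.0f} lines/sec")
print(f"average line size:  {avg_bytes_per_line:.0f} bytes")
print(f"peak line size:     {peak_bytes_per_line:.0f} bytes")
print(f"peak / average:     {peak_to_avg:.1f}x")
```

Under those assumptions the daily and peak figures agree with each other: roughly 79,000 lines/sec on average, a little over 300 bytes per line in both cases, and a peak of about 1.9 times the average rate.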
![Page 11: Batch and Stream processing with SQL](https://reader034.vdocuments.site/reader034/viewer/2022052216/540dd2838d7f728d7e8b4b16/html5/thumbnails/11.jpg)
Our Hadoop cluster
CDH 4.2.0
Master Nodes: 3 (NameNodeHA+QJM)
NameNode, JournalNode, JobTracker
Slave Nodes: 20
![Page 12: Batch and Stream processing with SQL](https://reader034.vdocuments.site/reader034/viewer/2022052216/540dd2838d7f728d7e8b4b16/html5/thumbnails/12.jpg)
What we want to do
COUNT PV, UU and others (daily/realtime)
COUNT Service metrics (daily/hourly/realtime)
FIND Surprising Errors [4xx, 5xx] (immediately)
CHECK Response Times (immediately)
SEARCH Logs when in trouble (hourly/immediately)
VISUALIZE/NOTIFY App Status (realtime)
![Page 13: Batch and Stream processing with SQL](https://reader034.vdocuments.site/reader034/viewer/2022052216/540dd2838d7f728d7e8b4b16/html5/thumbnails/13.jpg)
Batches and Streams
Hadoop is for batches
High performance batch is important
HDFS has good performance
Stream log writing and calculations are also VERY VERY IMPORTANT
Hybrid System: Stream processing + Batch
![Page 14: Batch and Stream processing with SQL](https://reader034.vdocuments.site/reader034/viewer/2022052216/540dd2838d7f728d7e8b4b16/html5/thumbnails/14.jpg)
System Overview
[Diagram] Components: Web Servers, Fluentd Cluster, Archive Storage (scribed), Fluentd Watchers, Graph Tools, Notifications (IRC), Hadoop Cluster (HDFS, MR), webhdfs, Huahin Manager, hiveserver, Shib, ShibUI, Norikra
[Diagram] Flow labels: STREAM, BATCH, SCHEDULED BATCH
![Page 15: Batch and Stream processing with SQL](https://reader034.vdocuments.site/reader034/viewer/2022052216/540dd2838d7f728d7e8b4b16/html5/thumbnails/15.jpg)
Data analytics players
Storages, Hadoop Cluster, Visualization Tools
ADMINISTRATOR
Raw Log Formats, Application Logs, Data Sizes, Data Semantics
PROGRAMMER
SERVICE DIRECTOR, SALES
Whatever Metrics They Want
BOARD MEMBER
...
![Page 16: Batch and Stream processing with SQL](https://reader034.vdocuments.site/reader034/viewer/2022052216/540dd2838d7f728d7e8b4b16/html5/thumbnails/16.jpg)
Data analytics players
Storages, Hadoop Cluster, Visualization Tools
ADMINISTRATOR
Raw Log Formats, Application Logs, Data Sizes, Data Semantics
PROGRAMMER
SERVICE DIRECTOR, SALES
Whatever Metrics They Want
BOARD MEMBER
...
WE NEED A QUERY LANGUAGE THAT THEY ALL CAN RUN AND UNDERSTAND!!!
![Page 17: Batch and Stream processing with SQL](https://reader034.vdocuments.site/reader034/viewer/2022052216/540dd2838d7f728d7e8b4b16/html5/thumbnails/17.jpg)
SQL: Hive
![Page 18: Batch and Stream processing with SQL](https://reader034.vdocuments.site/reader034/viewer/2022052216/540dd2838d7f728d7e8b4b16/html5/thumbnails/18.jpg)
SQL: Hive
![Page 19: Batch and Stream processing with SQL](https://reader034.vdocuments.site/reader034/viewer/2022052216/540dd2838d7f728d7e8b4b16/html5/thumbnails/19.jpg)
Hive
SQL: w/o compile, w/o deployment
HiveServer: w/o server login
Shib: Select only
![Page 20: Batch and Stream processing with SQL](https://reader034.vdocuments.site/reader034/viewer/2022052216/540dd2838d7f728d7e8b4b16/html5/thumbnails/20.jpg)
![Page 21: Batch and Stream processing with SQL](https://reader034.vdocuments.site/reader034/viewer/2022052216/540dd2838d7f728d7e8b4b16/html5/thumbnails/21.jpg)
Hive:
Simplify versioning problems
Hive 0.10 of CDH 4.2.0
Upgrade CDH for only Hive version
![Page 22: Batch and Stream processing with SQL](https://reader034.vdocuments.site/reader034/viewer/2022052216/540dd2838d7f728d7e8b4b16/html5/thumbnails/22.jpg)
Hive: Pros
Many Scheduled Queries, Metrics, On-Demand Queries
![Page 23: Batch and Stream processing with SQL](https://reader034.vdocuments.site/reader034/viewer/2022052216/540dd2838d7f728d7e8b4b16/html5/thumbnails/23.jpg)
Hive: Cons
Too many scheduled queries for short time windows
![Page 24: Batch and Stream processing with SQL](https://reader034.vdocuments.site/reader034/viewer/2022052216/540dd2838d7f728d7e8b4b16/html5/thumbnails/24.jpg)
Stream processing
Queries for fixed windows
every 1 hour, 10 minutes, 1 minute, ...
latest 10 events, ...
all events
Once a query is registered, it runs forever
Results appear automatically
NO MORE STORAGE
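The idea of a query that is registered once and then produces a result for every new event can be sketched in a few lines. The example below models a "latest N events" window. It is a minimal illustration, not how any particular stream processor is implemented, and the `LengthWindowQuery` class, the `ms` field, and the sample values are all hypothetical.

```python
from collections import deque
from typing import Callable, Deque, List


class LengthWindowQuery:
    """Hypothetical helper: keeps only the latest `size` events in memory
    and re-evaluates the aggregate each time a new event arrives."""

    def __init__(self, size: int, aggregate: Callable[[List[dict]], dict]) -> None:
        # deque(maxlen=...) silently drops the oldest event when full.
        self.window: Deque[dict] = deque(maxlen=size)
        self.aggregate = aggregate

    def send(self, event: dict) -> dict:
        self.window.append(event)
        return self.aggregate(list(self.window))


def avg_response_time(events: List[dict]) -> dict:
    return {"avg_ms": sum(e["ms"] for e in events) / len(events)}


# Register the query once, then feed events as they arrive.
query = LengthWindowQuery(size=3, aggregate=avg_response_time)
results = [query.send({"ms": ms}) for ms in (100, 200, 300, 400)]
print(results[-1])  # {'avg_ms': 300.0}
```

Only the events inside the window are held in memory, which is the sense in which no additional storage is needed for the raw stream.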
![Page 25: Batch and Stream processing with SQL](https://reader034.vdocuments.site/reader034/viewer/2022052216/540dd2838d7f728d7e8b4b16/html5/thumbnails/25.jpg)
Stream processing
And
SQL
![Page 26: Batch and Stream processing with SQL](https://reader034.vdocuments.site/reader034/viewer/2022052216/540dd2838d7f728d7e8b4b16/html5/thumbnails/26.jpg)
Norikra: Schema-less Stream Processing with SQL
![Page 27: Batch and Stream processing with SQL](https://reader034.vdocuments.site/reader034/viewer/2022052216/540dd2838d7f728d7e8b4b16/html5/thumbnails/27.jpg)
Norikra (1)
Schema-less event stream:
Add/Remove data fields whenever you want
SQL:
No more restarts to add/remove queries
w/ JOINs, w/ SubQueries
w/ UDF
Truly Complex events:
Nested Hash/Array, accessible directly from SQL
![Page 28: Batch and Stream processing with SQL](https://reader034.vdocuments.site/reader034/viewer/2022052216/540dd2838d7f728d7e8b4b16/html5/thumbnails/28.jpg)
Norikra (2)
Open source software:
Licensed under GPLv2
Based on Esper
UDF plugins from rubygems.org
Ultra-fast bootstrap & small start:
3 mins to install/start
1 server
![Page 29: Batch and Stream processing with SQL](https://reader034.vdocuments.site/reader034/viewer/2022052216/540dd2838d7f728d7e8b4b16/html5/thumbnails/29.jpg)
Norikra Queries: (1)
SELECT name, age
FROM events
![Page 30: Batch and Stream processing with SQL](https://reader034.vdocuments.site/reader034/viewer/2022052216/540dd2838d7f728d7e8b4b16/html5/thumbnails/30.jpg)
Norikra Queries: (1)
SELECT name, age
FROM events
Input: {"name":"tagomoris", "age":34, "address":"Tokyo", "corp":"LINE", "current":"Meguro"}
Output: {"name":"tagomoris", "age":34}
![Page 31: Batch and Stream processing with SQL](https://reader034.vdocuments.site/reader034/viewer/2022052216/540dd2838d7f728d7e8b4b16/html5/thumbnails/31.jpg)
Norikra Queries: (1)
SELECT name, age
FROM events
Input: {"name":"tagomoris", "address":"Tokyo", "corp":"LINE", "current":"Meguro"}
Output: nothing
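The behavior on these two slides (an event with every selected field yields a projected result, an event missing a field yields nothing) can be mimicked with a small function. This is only a sketch of the observable behavior; `select_fields` is a hypothetical name, and how Norikra decides internally which events reach a query is not shown on the slides.

```python
from typing import Optional, Sequence


def select_fields(event: dict, fields: Sequence[str]) -> Optional[dict]:
    """Return the projected event, or None when any requested field is absent."""
    if not all(name in event for name in fields):
        return None
    return {name: event[name] for name in fields}


# Events taken from the slides.
full = {"name": "tagomoris", "age": 34, "address": "Tokyo",
        "corp": "LINE", "current": "Meguro"}
no_age = {"name": "tagomoris", "address": "Tokyo",
          "corp": "LINE", "current": "Meguro"}

print(select_fields(full, ["name", "age"]))    # {'name': 'tagomoris', 'age': 34}
print(select_fields(no_age, ["name", "age"]))  # None
```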
![Page 32: Batch and Stream processing with SQL](https://reader034.vdocuments.site/reader034/viewer/2022052216/540dd2838d7f728d7e8b4b16/html5/thumbnails/32.jpg)
Norikra Queries: (2)
SELECT name, age
FROM events
WHERE current="Meguro"
Input: {"name":"tagomoris", "age":34, "address":"Tokyo", "corp":"LINE", "current":"Meguro"}
Output: {"name":"tagomoris", "age":34}
![Page 33: Batch and Stream processing with SQL](https://reader034.vdocuments.site/reader034/viewer/2022052216/540dd2838d7f728d7e8b4b16/html5/thumbnails/33.jpg)
Norikra Queries: (2)
SELECT name, age
FROM events
WHERE current="Meguro"
Input: {"name":"frsyuki", "age":25, "address":"MountainView", "corp":"TD", "current":"BayArea"}
Output: nothing
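Adding the WHERE clause means an event must both carry the referenced fields and satisfy the condition before anything is emitted. A minimal sketch of that filter follows; `query_meguro` is a hypothetical name, and treating `current` as a required field is an assumption about the behavior rather than something stated on the slide.

```python
from typing import Optional


def query_meguro(event: dict) -> Optional[dict]:
    """Mimics: SELECT name, age FROM events WHERE current="Meguro"."""
    needed = ("name", "age", "current")
    if any(field not in event for field in needed):
        return None  # assumption: events lacking a referenced field yield nothing
    if event["current"] != "Meguro":
        return None  # WHERE condition not satisfied
    return {"name": event["name"], "age": event["age"]}


# Events taken from the slides.
in_meguro = {"name": "tagomoris", "age": 34, "address": "Tokyo",
             "corp": "LINE", "current": "Meguro"}
in_bay_area = {"name": "frsyuki", "age": 25, "address": "MountainView",
               "corp": "TD", "current": "BayArea"}

print(query_meguro(in_meguro))    # {'name': 'tagomoris', 'age': 34}
print(query_meguro(in_bay_area))  # None
```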
![Page 34: Batch and Stream processing with SQL](https://reader034.vdocuments.site/reader034/viewer/2022052216/540dd2838d7f728d7e8b4b16/html5/thumbnails/34.jpg)
Norikra Queries: (3)
SELECT age, COUNT(*) as cnt
FROM events.win:time_batch(5 mins)
GROUP BY age
![Page 35: Batch and Stream processing with SQL](https://reader034.vdocuments.site/reader034/viewer/2022052216/540dd2838d7f728d7e8b4b16/html5/thumbnails/35.jpg)
Norikra Queries: (3)
SELECT age, COUNT(*) as cnt
FROM events.win:time_batch(5 mins)
GROUP BY age
Input: {"name":"tagomoris", "age":34, "address":"Tokyo", "corp":"LINE", "current":"Meguro"}
Output (every 5 mins): {"age":34,"cnt":3}, {"age":33,"cnt":1}, ...
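A time-batch window collects events for a fixed interval and emits one aggregated result per interval. The sketch below replays timestamped events and counts them per `age` in 5-minute buckets. It simplifies in two ways that are assumptions of this example: buckets are aligned to multiples of the window length, and events are processed as a finished list rather than as a live stream. The function name and sample timestamps are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

WINDOW_SECONDS = 5 * 60


def time_batch_counts(
    events: Iterable[Tuple[float, dict]], window: int = WINDOW_SECONDS
) -> Dict[int, List[dict]]:
    """Group (timestamp, event) pairs into fixed windows and count events per age."""
    batches: Dict[int, Counter] = {}
    for ts, event in events:
        if "age" not in event:
            continue  # events without the grouped field contribute nothing
        start = int(ts // window) * window  # start of the window this event falls in
        batches.setdefault(start, Counter())[event["age"]] += 1
    return {
        start: [{"age": age, "cnt": cnt}
                for age, cnt in sorted(counts.items(), reverse=True)]
        for start, counts in sorted(batches.items())
    }


sample = [
    (0, {"name": "a", "age": 34}),
    (60, {"name": "b", "age": 34}),
    (120, {"name": "c", "age": 33}),
    (299, {"name": "d", "age": 34}),
    (300, {"name": "e", "age": 51}),  # falls into the next 5-minute window
]
output = time_batch_counts(sample)
print(output[0])    # [{'age': 34, 'cnt': 3}, {'age': 33, 'cnt': 1}]
print(output[300])  # [{'age': 51, 'cnt': 1}]
```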
![Page 36: Batch and Stream processing with SQL](https://reader034.vdocuments.site/reader034/viewer/2022052216/540dd2838d7f728d7e8b4b16/html5/thumbnails/36.jpg)
Norikra Queries: (4)
SELECT age, COUNT(*) as cnt
FROM events.win:time_batch(5 mins)
GROUP BY age
Input: {"name":"tagomoris", "age":34, "address":"Tokyo", "corp":"LINE", "current":"Meguro"}
Output (every 5 mins): {"age":34,"cnt":3}, {"age":33,"cnt":1}, ...
SELECT max(age) as max
FROM events.win:time_batch(5 mins)
Output: {"max":51}
![Page 37: Batch and Stream processing with SQL](https://reader034.vdocuments.site/reader034/viewer/2022052216/540dd2838d7f728d7e8b4b16/html5/thumbnails/37.jpg)
Norikra Queries: (5)
SELECT age, COUNT(*) as cnt
FROM events.win:time_batch(5 mins)
GROUP BY age
Input: {"name":"tagomoris", "user":{"age":34, "corp":"LINE", "address":"Tokyo"}, "current":"Meguro", "speaker":true, "attend":[true,true,false, ...]}
![Page 38: Batch and Stream processing with SQL](https://reader034.vdocuments.site/reader034/viewer/2022052216/540dd2838d7f728d7e8b4b16/html5/thumbnails/38.jpg)
Norikra Queries: (5)
SELECT user.age, COUNT(*) as cnt
FROM events.win:time_batch(5 mins)
GROUP BY user.age
Input: {"name":"tagomoris", "user":{"age":34, "corp":"LINE", "address":"Tokyo"}, "current":"Meguro", "speaker":true, "attend":[true,true,false, ...]}
![Page 39: Batch and Stream processing with SQL](https://reader034.vdocuments.site/reader034/viewer/2022052216/540dd2838d7f728d7e8b4b16/html5/thumbnails/39.jpg)
Norikra Queries: (5)
SELECT user.age, COUNT(*) as cnt
FROM events.win:time_batch(5 mins)
WHERE current="Meguro" AND attend.$0 AND attend.$1
GROUP BY user.age
Input: {"name":"tagomoris", "user":{"age":34, "corp":"LINE", "address":"Tokyo"}, "current":"Meguro", "speaker":true, "attend":[true,true,false, ...]}
![Page 40: Batch and Stream processing with SQL](https://reader034.vdocuments.site/reader034/viewer/2022052216/540dd2838d7f728d7e8b4b16/html5/thumbnails/40.jpg)
Before: Hive
EVERY HOUR!
SELECT
  yyyymmdd, hh, campaign_id, region, lang,
  count(*) AS click,
  count(distinct member_id) AS uu
FROM (
  SELECT
    yyyymmdd, hh,
    get_json_object(log, '$.campaign.id') AS campaign_id,
    get_json_object(log, '$.member.region') AS region,
    get_json_object(log, '$.member.lang') AS lang,
    get_json_object(log, '$.member.id') AS member_id
  FROM applog
  WHERE service='myservice'
    AND yyyymmdd='20131101' AND hh='00'
    AND get_json_object(log, '$.type') = 'click'
) x
GROUP BY yyyymmdd, hh, campaign_id, region, lang
![Page 41: Batch and Stream processing with SQL](https://reader034.vdocuments.site/reader034/viewer/2022052216/540dd2838d7f728d7e8b4b16/html5/thumbnails/41.jpg)
After: Norikra
SELECT
  campaign.id AS campaign_id, member.region AS region,
  count(*) AS click,
  count(distinct member.id) AS uu
FROM myservice.win:time_batch(1 hours)
WHERE type="click"
GROUP BY campaign.id, member.region
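Both queries compute the same kind of result: for one hour of click events, a click count and a distinct-member count per group. The sketch below spells out that aggregation over a list of already-parsed events, following the grouping of the Norikra query (campaign and region). The function name and the sample events are hypothetical and exist only to make the example runnable.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def hourly_clicks(events: Iterable[dict]) -> List[dict]:
    """Count clicks and unique members per (campaign id, member region)."""
    clicks: Dict[Tuple[str, str], int] = defaultdict(int)
    members: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for event in events:
        if event.get("type") != "click":
            continue  # WHERE type="click"
        key = (event["campaign"]["id"], event["member"]["region"])
        clicks[key] += 1                       # count(*)
        members[key].add(event["member"]["id"])  # count(distinct member.id)
    return [
        {"campaign_id": c, "region": r,
         "click": clicks[(c, r)], "uu": len(members[(c, r)])}
        for (c, r) in sorted(clicks)
    ]


# Hypothetical events standing in for one hour of the "myservice" stream.
sample = [
    {"type": "click", "campaign": {"id": "c1"}, "member": {"id": "m1", "region": "JP"}},
    {"type": "click", "campaign": {"id": "c1"}, "member": {"id": "m1", "region": "JP"}},
    {"type": "click", "campaign": {"id": "c1"}, "member": {"id": "m2", "region": "JP"}},
    {"type": "view",  "campaign": {"id": "c1"}, "member": {"id": "m3", "region": "JP"}},
    {"type": "click", "campaign": {"id": "c2"}, "member": {"id": "m1", "region": "US"}},
]
rows = hourly_clicks(sample)
for row in rows:
    print(row)
```

With the sample data, campaign `c1` in region `JP` gets 3 clicks from 2 unique members, and the `view` event is ignored.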
![Page 42: Batch and Stream processing with SQL](https://reader034.vdocuments.site/reader034/viewer/2022052216/540dd2838d7f728d7e8b4b16/html5/thumbnails/42.jpg)
Norikra: Current Status
v0.1.0: Released on 2013/11/01
by tagomoris
http://norikra.github.io/
Documents: under development
Just started to use in production
![Page 43: Batch and Stream processing with SQL](https://reader034.vdocuments.site/reader034/viewer/2022052216/540dd2838d7f728d7e8b4b16/html5/thumbnails/43.jpg)
SQL Queries: for batches, for streams
![Page 44: Batch and Stream processing with SQL](https://reader034.vdocuments.site/reader034/viewer/2022052216/540dd2838d7f728d7e8b4b16/html5/thumbnails/44.jpg)
Hiring broadly for planning and development roles
• Data marketing
• Database engineers
• BI planning and development
• etc…
![Page 45: Batch and Stream processing with SQL](https://reader034.vdocuments.site/reader034/viewer/2022052216/540dd2838d7f728d7e8b4b16/html5/thumbnails/45.jpg)
[Diagram] Service categories: MUSIC, SHOPPING, COOKING, MOVIE, GAME, TRAVEL, NEWS, MOM&KIDS, SPORTS, BOOK, GIRLS
Variety, Volume, Velocity
![Page 46: Batch and Stream processing with SQL](https://reader034.vdocuments.site/reader034/viewer/2022052216/540dd2838d7f728d7e8b4b16/html5/thumbnails/46.jpg)
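Hiring broadly for planning and development roles
Please apply via our corporate site!
Data analysis and analytics: scaling up and strengthening the team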
![Page 47: Batch and Stream processing with SQL](https://reader034.vdocuments.site/reader034/viewer/2022052216/540dd2838d7f728d7e8b4b16/html5/thumbnails/47.jpg)
See Also:
Log analysis system with Hadoop in livedoor 2013
http://www.slideshare.net/tagomoris/log-analysis-with-hadoop-in-livedoor-2013
Norikra
http://norikra.github.io/
https://github.com/norikra
Shib
https://github.com/tagomoris/shib
Fluentd
http://fluentd.org/
https://github.com/fluent/fluentd