Norikra: SQL Stream Processing in Ruby
Presentation at RubyConf 2014
![Page 1: Norikra: SQL Stream Processing In Ruby](https://reader035.vdocuments.site/reader035/viewer/2022062419/559446a11a28abfc728b4634/html5/thumbnails/1.jpg)
Norikra: SQL Stream Processing in Ruby
2014/11/19, RubyConf 2014 Day 3
Satoshi Tagomori (@tagomoris)
![Page 2: Norikra: SQL Stream Processing In Ruby](https://reader035.vdocuments.site/reader035/viewer/2022062419/559446a11a28abfc728b4634/html5/thumbnails/2.jpg)
Topics
Why I wrote Norikra
Norikra overview
Norikra queries
Use cases in production
JRuby for me
![Page 3: Norikra: SQL Stream Processing In Ruby](https://reader035.vdocuments.site/reader035/viewer/2022062419/559446a11a28abfc728b4634/html5/thumbnails/3.jpg)
Satoshi Tagomori (@tagomoris)
Tokyo, Japan
LINE Corporation
![Page 4: Norikra: SQL Stream Processing In Ruby](https://reader035.vdocuments.site/reader035/viewer/2022062419/559446a11a28abfc728b4634/html5/thumbnails/4.jpg)
![Page 5: Norikra: SQL Stream Processing In Ruby](https://reader035.vdocuments.site/reader035/viewer/2022062419/559446a11a28abfc728b4634/html5/thumbnails/5.jpg)
![Page 6: Norikra: SQL Stream Processing In Ruby](https://reader035.vdocuments.site/reader035/viewer/2022062419/559446a11a28abfc728b4634/html5/thumbnails/6.jpg)
![Page 7: Norikra: SQL Stream Processing In Ruby](https://reader035.vdocuments.site/reader035/viewer/2022062419/559446a11a28abfc728b4634/html5/thumbnails/7.jpg)
Monitoring/Data Analytics Overview
[Diagram: pipeline stages: collect, parse/clean up, process, visualize, store, process; input: Access logs, Application logs, ...]
![Page 8: Norikra: SQL Stream Processing In Ruby](https://reader035.vdocuments.site/reader035/viewer/2022062419/559446a11a28abfc728b4634/html5/thumbnails/8.jpg)
![Page 9: Norikra: SQL Stream Processing In Ruby](https://reader035.vdocuments.site/reader035/viewer/2022062419/559446a11a28abfc728b4634/html5/thumbnails/9.jpg)
![Page 10: Norikra: SQL Stream Processing In Ruby](https://reader035.vdocuments.site/reader035/viewer/2022062419/559446a11a28abfc728b4634/html5/thumbnails/10.jpg)
![Page 11: Norikra: SQL Stream Processing In Ruby](https://reader035.vdocuments.site/reader035/viewer/2022062419/559446a11a28abfc728b4634/html5/thumbnails/11.jpg)
![Page 12: Norikra: SQL Stream Processing In Ruby](https://reader035.vdocuments.site/reader035/viewer/2022062419/559446a11a28abfc728b4634/html5/thumbnails/12.jpg)
![Page 13: Norikra: SQL Stream Processing In Ruby](https://reader035.vdocuments.site/reader035/viewer/2022062419/559446a11a28abfc728b4634/html5/thumbnails/13.jpg)
Fluentd stream aggregation: Good for simple data/calculation
![Page 14: Norikra: SQL Stream Processing In Ruby](https://reader035.vdocuments.site/reader035/viewer/2022062419/559446a11a28abfc728b4634/html5/thumbnails/14.jpg)
Our services:
More and more different services
Many changes in a day (including logging)
Many kinds of logs for each service
Many different metrics for each service
![Page 15: Norikra: SQL Stream Processing In Ruby](https://reader035.vdocuments.site/reader035/viewer/2022062419/559446a11a28abfc728b4634/html5/thumbnails/15.jpg)
Fluentd stream aggregation: Not good for processing in a complex/fragile environment...
![Page 16: Norikra: SQL Stream Processing In Ruby](https://reader035.vdocuments.site/reader035/viewer/2022062419/559446a11a28abfc728b4634/html5/thumbnails/16.jpg)
We want to:
add/remove queries anytime we want
write many queries for a service log stream
ignore events without data we want
let our service directors / growth hackers write their own queries!
![Page 17: Norikra: SQL Stream Processing In Ruby](https://reader035.vdocuments.site/reader035/viewer/2022062419/559446a11a28abfc728b4634/html5/thumbnails/17.jpg)
![Page 18: Norikra: SQL Stream Processing In Ruby](https://reader035.vdocuments.site/reader035/viewer/2022062419/559446a11a28abfc728b4634/html5/thumbnails/18.jpg)
break.
![Page 19: Norikra: SQL Stream Processing In Ruby](https://reader035.vdocuments.site/reader035/viewer/2022062419/559446a11a28abfc728b4634/html5/thumbnails/19.jpg)
![Page 20: Norikra: SQL Stream Processing In Ruby](https://reader035.vdocuments.site/reader035/viewer/2022062419/559446a11a28abfc728b4634/html5/thumbnails/20.jpg)
Norikra: Schema-less Stream Processing with SQL
Server software, written in JRuby, runs on the JVM
Open source software (GPLv2)
http://norikra.github.io/
https://github.com/norikra/norikra
![Page 21: Norikra: SQL Stream Processing In Ruby](https://reader035.vdocuments.site/reader035/viewer/2022062419/559446a11a28abfc728b4634/html5/thumbnails/21.jpg)
How to Set Up Norikra:
Install JRuby
download jruby.tar.gz, extract it and export $PATH
use rbenv
rbenv install jruby-1.7.xx
rbenv shell jruby-..
Install Norikra
gem install norikra
Execute Norikra server
norikra start
![Page 22: Norikra: SQL Stream Processing In Ruby](https://reader035.vdocuments.site/reader035/viewer/2022062419/559446a11a28abfc728b4634/html5/thumbnails/22.jpg)
Norikra Interface:
CLI client/Client library: norikra-client
norikra-client target open ...
norikra-client query add ...
tail -f ... | norikra-client event send ...
WebUI
show status
show/add/remove queries
HTTP API
JSON, MessagePack
![Page 23: Norikra: SQL Stream Processing In Ruby](https://reader035.vdocuments.site/reader035/viewer/2022062419/559446a11a28abfc728b4634/html5/thumbnails/23.jpg)
Norikra:
Schema-less event stream:
Add/Remove data fields whenever you want
SQL:
No more restarts to add/remove queries
w/ JOINs, w/ SubQueries
w/ UDF (in Java/Ruby as rubygems)
Truly Complex events:
Nested Hash/Array, accessible directly from SQL
![Page 24: Norikra: SQL Stream Processing In Ruby](https://reader035.vdocuments.site/reader035/viewer/2022062419/559446a11a28abfc728b4634/html5/thumbnails/24.jpg)
Norikra Queries: (1)
SELECT name, age
FROM events
target: events
![Page 25: Norikra: SQL Stream Processing In Ruby](https://reader035.vdocuments.site/reader035/viewer/2022062419/559446a11a28abfc728b4634/html5/thumbnails/25.jpg)
Norikra Queries: (1)
SELECT name, age
FROM events
{"name":"tagomoris", "age":35, "address":"Tokyo", "corp":"LINE", "current":"San Diego"}
{"name":"tagomoris","age":35}
![Page 26: Norikra: SQL Stream Processing In Ruby](https://reader035.vdocuments.site/reader035/viewer/2022062419/559446a11a28abfc728b4634/html5/thumbnails/26.jpg)
Norikra Queries: (1)
SELECT name, age
FROM events
nothing
without "age"
{"name":"tagomoris", "address":"Tokyo", "corp":"LINE", "current":"San Diego"}
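The behaviour on the Queries (1) slides can be mimicked in plain Ruby. This is only a sketch of the semantics, not how Norikra is implemented: `select_fields` is a hypothetical helper, and it assumes an event is a Hash with string keys.

```ruby
# Hypothetical helper: project the requested fields out of an event.
# Returns nil (no output) when the event lacks any requested field,
# mirroring the "nothing" result on the slide.
def select_fields(event, fields)
  return nil unless fields.all? { |f| event.key?(f) }
  event.slice(*fields)
end

full = { "name" => "tagomoris", "age" => 35, "address" => "Tokyo",
         "corp" => "LINE", "current" => "San Diego" }
no_age = full.reject { |key, _| key == "age" }

p select_fields(full, %w[name age])    # only "name" and "age" survive
p select_fields(no_age, %w[name age])  # nil: the event has no "age"
```

The point of the sketch is that a missing field drops the event for that query only; other queries over the same stream are unaffected.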
![Page 27: Norikra: SQL Stream Processing In Ruby](https://reader035.vdocuments.site/reader035/viewer/2022062419/559446a11a28abfc728b4634/html5/thumbnails/27.jpg)
Norikra Queries: (2)
SELECT name, age
FROM events
WHERE current="San Diego"
{"name":"tagomoris","age":35}
{"name":"tagomoris", "address":"Tokyo", "corp":"LINE", "current":"San Diego"}
![Page 28: Norikra: SQL Stream Processing In Ruby](https://reader035.vdocuments.site/reader035/viewer/2022062419/559446a11a28abfc728b4634/html5/thumbnails/28.jpg)
Norikra Queries: (2)
SELECT name, age
FROM events
WHERE current="San Diego"
nothing
{"name":"nobu", "age":0, "address":"Somewhere", "corp":"Heroku", "current":"SAN"}
current is not "San Diego"
![Page 29: Norikra: SQL Stream Processing In Ruby](https://reader035.vdocuments.site/reader035/viewer/2022062419/559446a11a28abfc728b4634/html5/thumbnails/29.jpg)
Norikra Queries: (3)
SELECT age, COUNT(*) as cnt
FROM events.win:time_batch(5 mins)
GROUP BY age
![Page 30: Norikra: SQL Stream Processing In Ruby](https://reader035.vdocuments.site/reader035/viewer/2022062419/559446a11a28abfc728b4634/html5/thumbnails/30.jpg)
Norikra Queries: (3)
SELECT age, COUNT(*) as cnt
FROM events.win:time_batch(5 mins)
GROUP BY age
{"age":35,"cnt":3}, {"age":33,"cnt":1}, ...
every 5 mins
{"name":"tagomoris", "address":"Tokyo", "corp":"LINE", "current":"San Diego"}
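What the batched, grouped count computes can be sketched in plain Ruby. This is a simulation under stated assumptions, not Norikra's engine: `count_by_age` is a hypothetical helper, and each event is assumed to carry an integer `"time"` field in seconds so that windows can be derived from the data rather than from a clock.

```ruby
# Hypothetical stand-in for win:time_batch(5 mins) with GROUP BY age:
# bucket events into fixed windows, then count events per age in each window.
def count_by_age(events, window_sec: 300)
  with_age = events.select { |e| e.key?("age") }  # events without "age" are ignored
  windows = with_age.group_by { |e| e["time"] / window_sec }
  windows.transform_values do |batch|
    batch.group_by { |e| e["age"] }
         .map { |age, group| { "age" => age, "cnt" => group.size } }
  end
end

events = [
  { "time" => 10,  "age" => 35 },
  { "time" => 20,  "age" => 35 },
  { "time" => 30,  "age" => 33 },
  { "time" => 40,  "name" => "tagomoris" },  # no "age": not counted
  { "time" => 310, "age" => 35 },            # falls into the next window
]

result = count_by_age(events)
result.each { |window, rows| puts "window #{window}: #{rows.inspect}" }
```

Each window yields one row per distinct age, which corresponds to the list of `{"age":..., "cnt":...}` results emitted every 5 minutes on the slide.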
![Page 31: Norikra: SQL Stream Processing In Ruby](https://reader035.vdocuments.site/reader035/viewer/2022062419/559446a11a28abfc728b4634/html5/thumbnails/31.jpg)
Norikra Queries: (4)
SELECT age, COUNT(*) as cnt
FROM events.win:time_batch(5 mins)
GROUP BY age
{"age":35,"cnt":3}, {"age":33,"cnt":1}, ...
SELECT max(age) as max
FROM events.win:time_batch(5 mins)
{"max":51}
every 5 mins
{"name":"tagomoris", "address":"Tokyo", "corp":"LINE", "current":"San Diego"}
![Page 32: Norikra: SQL Stream Processing In Ruby](https://reader035.vdocuments.site/reader035/viewer/2022062419/559446a11a28abfc728b4634/html5/thumbnails/32.jpg)
Norikra Queries: (5)
SELECT age, COUNT(*) as cnt
FROM events.win:time_batch(5 mins)
GROUP BY age
{"name":"tagomoris", "user":{"age":35, "corp":"LINE", "address":"Tokyo"}, "current":"San Diego", "speaker":true, "attend":[true,true,false, ...]}
![Page 33: Norikra: SQL Stream Processing In Ruby](https://reader035.vdocuments.site/reader035/viewer/2022062419/559446a11a28abfc728b4634/html5/thumbnails/33.jpg)
Norikra Queries: (5)
SELECT user.age, COUNT(*) as cnt
FROM events.win:time_batch(5 mins)
GROUP BY user.age
{"name":"tagomoris", "user":{"age":35, "corp":"LINE", "address":"Tokyo"}, "current":"San Diego", "speaker":true, "attend":[true,true,false, ...]}
![Page 34: Norikra: SQL Stream Processing In Ruby](https://reader035.vdocuments.site/reader035/viewer/2022062419/559446a11a28abfc728b4634/html5/thumbnails/34.jpg)
Norikra Queries: (5)
SELECT user.age, COUNT(*) as cnt
FROM events.win:time_batch(5 mins)
WHERE current="San Diego"
AND attend.$0 AND attend.$1
GROUP BY user.age
{"name":"tagomoris", "user":{"age":35, "corp":"LINE", "address":"Tokyo"}, "current":"San Diego", "speaker":true, "attend":[true,true,false, ...]}
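The dotted paths in the Queries (5) slides (`user.age`, `attend.$0`) can be read as lookups into the nested event. A minimal Ruby sketch of that reading follows; `fetch_path` is a hypothetical helper written for illustration, not part of Norikra, and the three-element `attend` array is a shortened stand-in for the elided one on the slide.

```ruby
# Hypothetical resolver for dotted paths such as "user.age" or "attend.$0".
# A "$N" segment is treated as array index N; other segments are hash keys.
def fetch_path(event, path)
  path.split(".").reduce(event) do |value, segment|
    if segment.start_with?("$")
      value.is_a?(Array) ? value[segment[1..-1].to_i] : nil
    else
      value.is_a?(Hash) ? value[segment] : nil
    end
  end
end

event = {
  "name" => "tagomoris",
  "user" => { "age" => 35, "corp" => "LINE", "address" => "Tokyo" },
  "current" => "San Diego",
  "speaker" => true,
  "attend" => [true, true, false],
}

# Same condition as the WHERE clause on the slide.
matches = fetch_path(event, "current") == "San Diego" &&
          fetch_path(event, "attend.$0") &&
          fetch_path(event, "attend.$1")

puts fetch_path(event, "user.age") if matches
```

A path that does not resolve yields `nil` in this sketch, which lines up with the earlier rule that events lacking a referenced field produce no output.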
![Page 35: Norikra: SQL Stream Processing In Ruby](https://reader035.vdocuments.site/reader035/viewer/2022062419/559446a11a28abfc728b4634/html5/thumbnails/35.jpg)
break.
next: use cases
![Page 36: Norikra: SQL Stream Processing In Ruby](https://reader035.vdocuments.site/reader035/viewer/2022062419/559446a11a28abfc728b4634/html5/thumbnails/36.jpg)
Use case 1: External API call reports for partners (LINE)
External API call for LINE Business Connect
LINE backend sends requests to partner’s API endpoint using users’ messages
http://developers.linecorp.com/blog/?p=3386
![Page 37: Norikra: SQL Stream Processing In Ruby](https://reader035.vdocuments.site/reader035/viewer/2022062419/559446a11a28abfc728b4634/html5/thumbnails/37.jpg)
Use case 1: External API call reports for partners (LINE)
[Diagram: channel gateway, partner's server, logs, query results, MySQL, Mail]
SELECT
  channelId AS channel_id,
  reason,
  detail,
  count(*) AS error_count,
  min(timestamp) AS first_timestamp,
  max(timestamp) AS last_timestamp
FROM api_error_log.win:time_batch(60 sec)
GROUP BY channelId, reason, detail
HAVING count(*) > 0
http://developers.linecorp.com/blog/?p=3386
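For readers less used to SQL aggregates, the per-batch summary that the error-report query computes can be sketched in Ruby. `summarize_errors` is a hypothetical helper, and the sample channel IDs, reasons, and timestamps are made up for illustration; only the field names come from the slide.

```ruby
# Hypothetical equivalent of the GROUP BY channelId, reason, detail query,
# applied to one 60-second batch of error events.
def summarize_errors(batch)
  groups = batch.group_by { |e| e.values_at("channelId", "reason", "detail") }
  groups.map do |(channel, reason, detail), errors|
    times = errors.map { |e| e["timestamp"] }
    {
      "channel_id" => channel,
      "reason" => reason,
      "detail" => detail,
      "error_count" => errors.size,  # every group has at least one event,
                                     # so HAVING count(*) > 0 always holds here
      "first_timestamp" => times.min,
      "last_timestamp" => times.max,
    }
  end
end

batch = [
  { "channelId" => 1, "reason" => "timeout",  "detail" => "connect", "timestamp" => 100 },
  { "channelId" => 1, "reason" => "timeout",  "detail" => "connect", "timestamp" => 130 },
  { "channelId" => 2, "reason" => "http_500", "detail" => "",        "timestamp" => 110 },
]

rows = summarize_errors(batch)
rows.each { |row| p row }
```

An empty batch produces no rows in this sketch, so nothing would be stored or mailed for a quiet minute.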
![Page 38: Norikra: SQL Stream Processing In Ruby](https://reader035.vdocuments.site/reader035/viewer/2022062419/559446a11a28abfc728b4634/html5/thumbnails/38.jpg)
Use case 1: External API call reports for partners (LINE)
API error response summaries
http://developers.linecorp.com/blog/?p=3386
![Page 39: Norikra: SQL Stream Processing In Ruby](https://reader035.vdocuments.site/reader035/viewer/2022062419/559446a11a28abfc728b4634/html5/thumbnails/39.jpg)
Use case 2: Lambda architecture
Prompt reports for Ad service console
Prompt reports with Norikra + Fixed reports with Hive
[Diagram: app servers, impression logs, Fluentd, HDFS, console service; fetch query results (frequently), execute hive query (daily)]
![Page 40: Norikra: SQL Stream Processing In Ruby](https://reader035.vdocuments.site/reader035/viewer/2022062419/559446a11a28abfc728b4634/html5/thumbnails/40.jpg)
Use case 2: Prompt reports for Ad service console
Hive query for fixed reports
SELECT
  yyyymmdd, hh, campaign_id, region, lang,
  COUNT(*) AS click,
  COUNT(DISTINCT member_id) AS uu
FROM (
  SELECT
    yyyymmdd, hh,
    get_json_object(log, '$.campaign.id') AS campaign_id,
    get_json_object(log, '$.member.region') AS region,
    get_json_object(log, '$.member.lang') AS lang,
    get_json_object(log, '$.member.id') AS member_id
  FROM applog
  WHERE service='myservice'
    AND yyyymmdd='20140913'
    AND get_json_object(log, '$.type')='click'
) x
GROUP BY yyyymmdd, hh, campaign_id, region, lang
![Page 41: Norikra: SQL Stream Processing In Ruby](https://reader035.vdocuments.site/reader035/viewer/2022062419/559446a11a28abfc728b4634/html5/thumbnails/41.jpg)
Use case 2: Prompt reports for Ad service console
Norikra query for prompt reports
SELECT
  campaign.id AS campaign_id,
  member.region AS region,
  member.lang AS lang,
  COUNT(*) AS click,
  COUNT(DISTINCT member.id) AS uu
FROM myservice.win:time_batch(1 hours)
WHERE type="click"
GROUP BY campaign.id, member.region, member.lang
![Page 42: Norikra: SQL Stream Processing In Ruby](https://reader035.vdocuments.site/reader035/viewer/2022062419/559446a11a28abfc728b4634/html5/thumbnails/42.jpg)
Use case 3: Realtime access dashboard on Google Platform
Access log visualization
Count using Norikra (2-step), Store on Google BigQuery
Dashboard on Google Spreadsheet + Apps Script
https://www.youtube.com/watch?v=EZkw5TDcCGw
http://qiita.com/kazunori279/items/6329df57635799405547
![Page 43: Norikra: SQL Stream Processing In Ruby](https://reader035.vdocuments.site/reader035/viewer/2022062419/559446a11a28abfc728b4634/html5/thumbnails/43.jpg)
Use case 3: Realtime access dashboard on Google Platform
https://www.youtube.com/watch?v=EZkw5TDcCGw
http://qiita.com/kazunori279/items/6329df57635799405547
[Diagram: Server with nginx, access log, Fluentd, and a norikra query to aggregate locally; access logs go to BigQuery, norikra query results go to the aggregate node]
![Page 44: Norikra: SQL Stream Processing In Ruby](https://reader035.vdocuments.site/reader035/viewer/2022062419/559446a11a28abfc728b4634/html5/thumbnails/44.jpg)
Use case 3: Realtime access dashboard on Google Platform
https://www.youtube.com/watch?v=EZkw5TDcCGw
http://qiita.com/kazunori279/items/6329df57635799405547
70 servers, 120,000 requests/sec (or more!)
[Diagram: many nginx + Fluentd servers; counts per host; total count; logs to store; Google BigQuery; Google Spreadsheet + Apps Script]
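The 2-step counting in this use case (count locally on each host, then sum on the aggregate node) can be sketched in a few lines of Ruby. This only illustrates the arithmetic: `local_count` and `total_count` are hypothetical helpers, and the host names and log lines are made up.

```ruby
# Step 1 (per host): count requests locally, as the local query would.
def local_count(access_log_lines)
  { "count" => access_log_lines.size }
end

# Step 2 (aggregate node): sum the per-host counts into one total.
def total_count(per_host_results)
  per_host_results.sum { |result| result["count"] }
end

per_host = {
  "web01" => local_count(["GET /", "GET /a", "GET /b"]),
  "web02" => local_count(["GET /"]),
}

puts total_count(per_host.values)  # prints 4
```

Counting on each host first means the aggregate node receives one small record per host per interval instead of every raw request, which is what makes the request rate quoted on the slide manageable.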
![Page 45: Norikra: SQL Stream Processing In Ruby](https://reader035.vdocuments.site/reader035/viewer/2022062419/559446a11a28abfc728b4634/html5/thumbnails/45.jpg)
Why Norikra is written in JRuby
Esper
CEP (Complex Event Processing) library, written in Java
Rubygems.org
Open repository, for public UDF plugins of Norikra provided as gem
![Page 46: Norikra: SQL Stream Processing In Ruby](https://reader035.vdocuments.site/reader035/viewer/2022062419/559446a11a28abfc728b4634/html5/thumbnails/46.jpg)
JRuby for me
Ruby! (by great JRuby developer team!)
makes developing Norikra dramatically faster
with rubygems and rubygems.org for easy deployment/installation
with Java libraries, ex: Jetty, Esper, ...
There are not so many users in Tokyo :(
![Page 47: Norikra: SQL Stream Processing In Ruby](https://reader035.vdocuments.site/reader035/viewer/2022062419/559446a11a28abfc728b4634/html5/thumbnails/47.jpg)
More queries, more simplicity
and less latency
in data processing
Thanks!
photo: by my co-workers
http://norikra.github.io/
https://github.com/norikra/norikra
![Page 48: Norikra: SQL Stream Processing In Ruby](https://reader035.vdocuments.site/reader035/viewer/2022062419/559446a11a28abfc728b4634/html5/thumbnails/48.jpg)
See also:
http://norikra.github.io/
"Lambda Architecture Platform Using SQL"
http://www.slideshare.net/tagomoris/lambda-architecture-using-sql-hadoopcon-2014-taiwan
"Stream processing and Norikra"
http://www.slideshare.net/tagomoris/stream-processing-and-norikra
"Batch processing and Stream processing by SQL"
http://www.slideshare.net/tagomoris/hcj2014-sql
"Norikra in Action"
http://www.slideshare.net/tagomoris/norikra-in-action-ver-2014-spring
http://www.slideshare.net/tagomoris/presentations
![Page 49: Norikra: SQL Stream Processing In Ruby](https://reader035.vdocuments.site/reader035/viewer/2022062419/559446a11a28abfc728b4634/html5/thumbnails/49.jpg)
Storm or Norikra?
Simple and fixed workload for huge traffic
Use Storm!
Complex and fragile workload for non-huge traffic
Use Norikra!
![Page 50: Norikra: SQL Stream Processing In Ruby](https://reader035.vdocuments.site/reader035/viewer/2022062419/559446a11a28abfc728b4634/html5/thumbnails/50.jpg)
Scalability?
10,000 - 100,000 events/sec
on a 2-CPU, 8-core server
![Page 51: Norikra: SQL Stream Processing In Ruby](https://reader035.vdocuments.site/reader035/viewer/2022062419/559446a11a28abfc728b4634/html5/thumbnails/51.jpg)
HA? Distributed?
NO!
I have some ideas, but no time to implement them
There is no need for HA/Distributed processing
![Page 52: Norikra: SQL Stream Processing In Ruby](https://reader035.vdocuments.site/reader035/viewer/2022062419/559446a11a28abfc728b4634/html5/thumbnails/52.jpg)
Data flow & API?
Use Fluentd!