Norikra: SQL Stream Processing In Ruby
2014/11/19 RubyConf 2014 DAY 3
Satoshi Tagomori (@tagomoris)
Topics
Why I wrote Norikra
Norikra overview
Norikra queries
Use cases in production
JRuby for me
Satoshi Tagomori (@tagomoris), Tokyo, Japan
LINE Corporation
Monitoring/Data Analytics Overview
[Diagram: data pipeline stages: collect, parse / clean up, process, visualize, store. Input: access logs, application logs, ...]
Fluentd stream aggregation: Good for simple data/calculation
Our services:
More and more different services
Many changes in a day (including logging)
Many kinds of logs for each service
Many different metrics for each service
Fluentd stream aggregation: Not good for processing in a complex/fragile environment...
We want to:
add/remove queries anytime we want
write many queries for a service log stream
ignore events without data we want
let our service directors / growth hackers write their own queries!
break.
Norikra: Schema-less Stream Processing with SQL
Server software, written in JRuby, runs on JVM
Open source software (GPLv2)
http://norikra.github.io/
https://github.com/norikra/norikra
How To Setup Norikra:
Install JRuby
download jruby.tar.gz, extract it and export $PATH
use rbenv
rbenv install jruby-1.7.xx
rbenv shell jruby-..
Install Norikra
gem install norikra
Execute Norikra server
norikra start
Norikra Interface:
CLI client/Client library: norikra-client
norikra-client target open ...
norikra-client query add ...
tail -f ... | norikra-client event send ...
WebUI
show status
show/add/remove queries
HTTP API
JSON, MessagePack
Norikra:
Schema-less event stream: Add/Remove data fields whenever you want
SQL: No more restarts to add/remove queries
w/ JOINs, w/ SubQueries
w/ UDF (in Java/Ruby as rubygems)
Truly Complex events: Nested Hash/Array, accessible directly from SQL
Norikra Queries: (1)
SELECT name, age
FROM events
"events" is the target
Norikra Queries: (1)
SELECT name, age
FROM events
{"name":"tagomoris", "age":35, "address":"Tokyo", "corp":"LINE", "current":"San Diego"}
{"name":"tagomoris","age":35}
Norikra Queries: (1)
SELECT name, age
FROM events
nothing
without "age"
{"name":"tagomoris", "address":"Tokyo", "corp":"LINE", "current":"San Diego"}
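The schema-less behavior on these slides can be sketched in plain Ruby. This is only an illustration of the idea, not Norikra's implementation: the method name `select_name_age` and the constants are hypothetical, and the assumption is that an event reaches a query only when it carries every field the query references.

```ruby
# Fields referenced by: SELECT name, age FROM events
NAME_AGE_FIELDS = %w[name age].freeze

# Returns the projected row, or nil when the event lacks a referenced field
# (the "nothing" case on the slide).
def select_name_age(event)
  return nil unless NAME_AGE_FIELDS.all? { |field| event.key?(field) }

  NAME_AGE_FIELDS.each_with_object({}) { |field, row| row[field] = event[field] }
end

# Sample events taken from the slides.
EVENT_WITH_AGE = { "name" => "tagomoris", "age" => 35, "address" => "Tokyo",
                   "corp" => "LINE", "current" => "San Diego" }.freeze
EVENT_WITHOUT_AGE = EVENT_WITH_AGE.reject { |key, _| key == "age" }.freeze

p select_name_age(EVENT_WITH_AGE)     # row holding only name and age
p select_name_age(EVENT_WITHOUT_AGE)  # nil: the event is ignored
```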
Norikra Queries: (2)
SELECT name, age
FROM events
WHERE current="San Diego"
{"name":"tagomoris","age":35}
{"name":"tagomoris", "address":"Tokyo", "corp":"LINE", "current":"San Diego"}
Norikra Queries: (2)
SELECT name, age
FROM events
WHERE current="San Diego"
nothing
{"name":"nobu", "age":0, "address":"Somewhere", "corp":"Heroku", "current":"SAN"}
current is not "San Diego"
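A rough Ruby model of this filtered query follows. It is a sketch, not Norikra code: `select_in_san_diego` is a hypothetical name, and it assumes that fields used in WHERE count as referenced fields just like those in SELECT.

```ruby
# Fields referenced anywhere in:
#   SELECT name, age FROM events WHERE current="San Diego"
SAN_DIEGO_QUERY_FIELDS = %w[name age current].freeze

def select_in_san_diego(event)
  # Assumption: an event missing any referenced field never reaches the query.
  return nil unless SAN_DIEGO_QUERY_FIELDS.all? { |field| event.key?(field) }
  # WHERE current="San Diego"
  return nil unless event["current"] == "San Diego"

  { "name" => event["name"], "age" => event["age"] }
end

# Sample event from the slide: all fields present, but filtered out by WHERE.
NOBU_EVENT = { "name" => "nobu", "age" => 0, "address" => "Somewhere",
               "corp" => "Heroku", "current" => "SAN" }.freeze

p select_in_san_diego(NOBU_EVENT)  # nil: current is not "San Diego"
```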
Norikra Queries: (3)
SELECT age, COUNT(*) as cnt
FROM events.win:time_batch(5 mins)
GROUP BY age
{"age":35,"cnt":3}, {"age":33,"cnt":1}, ...
every 5 mins
{"name":"tagomoris", "address":"Tokyo", "corp":"LINE", "current":"San Diego"}
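To make the batch-window semantics concrete, here is a small Ruby simulation of a 5-minute tumbling window with GROUP BY age. It is a simplification under stated assumptions: the method name and the `[unix_seconds, event]` input shape are hypothetical, batches are aligned to multiples of 300 seconds (the real engine may align windows differently), and batches with no events are simply absent from the result.

```ruby
BATCH_SECONDS = 5 * 60

# timed_events: array of [unix_seconds, event] pairs.
# Returns one array of {"age", "cnt"} rows per 5-minute batch, oldest first.
def count_by_age_per_batch(timed_events)
  batches = timed_events.group_by { |seconds, _event| seconds / BATCH_SECONDS }
  batches.sort_by { |bucket, _pairs| bucket }.map do |_bucket, pairs|
    counts = Hash.new(0)
    # Events without "age" are skipped, as in the schema-less examples.
    pairs.each { |_seconds, event| counts[event["age"]] += 1 if event.key?("age") }
    counts.map { |age, cnt| { "age" => age, "cnt" => cnt } }
  end
end

stream = [
  [0,   { "name" => "a", "age" => 35 }],
  [10,  { "name" => "b", "age" => 35 }],
  [299, { "name" => "c", "age" => 33 }],
  [300, { "name" => "d", "age" => 35 }]  # falls into the second batch
]
count_by_age_per_batch(stream).each { |rows| p rows }
```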
Norikra Queries: (4)
SELECT age, COUNT(*) as cnt
FROM events.win:time_batch(5 mins)
GROUP BY age
{"age":35,"cnt":3}, {"age":33,"cnt":1}, ...
SELECT max(age) as max
FROM events.win:time_batch(5 mins)
{"max":51}
every 5 mins
{"name":"tagomoris", "address":"Tokyo", "corp":"LINE", "current":"San Diego"}
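The point of this slide, several queries reading the same stream at once, can be sketched as a table of named queries applied to one batch. This is an illustrative Ruby model only; `STANDING_QUERIES` and `run_standing_queries` are hypothetical names, and each lambda stands in for one registered SQL query.

```ruby
# Each entry models one registered query over the same 5-minute batch.
STANDING_QUERIES = {
  # SELECT age, COUNT(*) as cnt ... GROUP BY age
  "count_by_age" => ->(batch) {
    batch.group_by { |event| event["age"] }
         .map { |age, rows| { "age" => age, "cnt" => rows.size } }
  },
  # SELECT max(age) as max ...
  "max_age" => ->(batch) { [{ "max" => batch.map { |event| event["age"] }.max }] }
}.freeze

# Runs every query against the same batch and returns results keyed by query name.
def run_standing_queries(batch)
  usable = batch.select { |event| event.key?("age") } # both queries reference age
  STANDING_QUERIES.each_with_object({}) do |(name, query), results|
    results[name] = query.call(usable)
  end
end

batch = [35, 35, 35, 33, 51].map { |age| { "name" => "someone", "age" => age } }
run_standing_queries(batch).each { |name, rows| puts "#{name}: #{rows.length} row(s)" }
```

Adding or removing an entry in the table changes what is computed without touching the event source, which mirrors the "add/remove queries anytime" goal stated earlier.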
Norikra Queries: (5)
SELECT age, COUNT(*) as cnt
FROM events.win:time_batch(5 mins)
GROUP BY age
{"name":"tagomoris", "user":{"age":35, "corp":"LINE", "address":"Tokyo"}, "current":"San Diego", "speaker":true, "attend":[true,true,false, ...]}
Norikra Queries: (5)
SELECT user.age, COUNT(*) as cnt
FROM events.win:time_batch(5 mins)
GROUP BY user.age
{"name":"tagomoris", "user":{"age":35, "corp":"LINE", "address":"Tokyo"}, "current":"San Diego", "speaker":true, "attend":[true,true,false, ...]}
Norikra Queries: (5)
SELECT user.age, COUNT(*) as cnt
FROM events.win:time_batch(5 mins)
WHERE current="San Diego" AND attend.$0 AND attend.$1
GROUP BY user.age
{"name":"tagomoris", "user":{"age":35, "corp":"LINE", "address":"Tokyo"}, "current":"San Diego", "speaker":true, "attend":[true,true,false, ...]}
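The nested-field notation used here (`user.age` for a Hash key, `attend.$0` for an Array index) can be illustrated with a small path resolver in Ruby. This is a sketch of the notation's meaning, not Norikra's parser; `resolve_field` and `where_clause_matches?` are hypothetical helpers, and the sample event truncates the `attend` array to the three values visible on the slide.

```ruby
# Resolves a dotted path such as "user.age" or "attend.$0" against an event.
# "$N" segments index into an Array; other segments look up a Hash key.
# Returns nil when any step of the path is missing.
def resolve_field(event, path)
  path.split(".").reduce(event) do |node, key|
    if key.start_with?("$")
      return nil unless node.is_a?(Array)
      node[Integer(key[1..-1])]
    else
      return nil unless node.is_a?(Hash) && node.key?(key)
      node[key]
    end
  end
end

# WHERE current="San Diego" AND attend.$0 AND attend.$1
def where_clause_matches?(event)
  resolve_field(event, "current") == "San Diego" &&
    resolve_field(event, "attend.$0") == true &&
    resolve_field(event, "attend.$1") == true
end

NESTED_EVENT = {
  "name" => "tagomoris",
  "user" => { "age" => 35, "corp" => "LINE", "address" => "Tokyo" },
  "current" => "San Diego",
  "speaker" => true,
  "attend" => [true, true, false]
}.freeze

puts resolve_field(NESTED_EVENT, "user.age")
puts where_clause_matches?(NESTED_EVENT)
```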
break.
next: use cases
Use case 1: External API call reports for partners (LINE)
External API call for LINE Business Connect
LINE backend sends requests to partner’s API endpoint using users’ messages
http://developers.linecorp.com/blog/?p=3386
Use case 1: External API call reports for partners (LINE)
[Diagram labels: channel gateway, partner’s server, logs, query results, MySQL, Mail]
SELECT
  channelId AS channel_id,
  reason,
  detail,
  count(*) AS error_count,
  min(timestamp) AS first_timestamp,
  max(timestamp) AS last_timestamp
FROM api_error_log.win:time_batch(60 sec)
GROUP BY channelId, reason, detail
HAVING count(*) > 0
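What this query computes for one 60-second batch can be sketched in Ruby as a group-and-summarize step. The method name `summarize_api_errors` and all sample values (channel ids, reasons, timestamps) are made up for illustration; only the field names come from the query.

```ruby
# Summarizes one 60-second batch of API error log events:
# one row per (channelId, reason, detail) with a count and first/last timestamps.
def summarize_api_errors(batch)
  groups = batch.group_by { |e| [e["channelId"], e["reason"], e["detail"]] }
  groups.map do |(channel_id, reason, detail), rows|
    timestamps = rows.map { |e| e["timestamp"] }
    {
      "channel_id" => channel_id,
      "reason" => reason,
      "detail" => detail,
      "error_count" => rows.size,
      "first_timestamp" => timestamps.min,
      "last_timestamp" => timestamps.max
    }
  end
end

# Hypothetical sample batch.
errors = [
  { "channelId" => 1, "reason" => "timeout", "detail" => "read", "timestamp" => 100 },
  { "channelId" => 1, "reason" => "timeout", "detail" => "read", "timestamp" => 130 },
  { "channelId" => 2, "reason" => "http_500", "detail" => "", "timestamp" => 115 }
]
summarize_api_errors(errors).each { |row| p row }
```

In this model a group exists only if it has at least one row, so the `HAVING count(*) > 0` condition is satisfied by construction, and a batch with no errors produces no rows at all.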
Use case 1: External API call reports for partners (LINE)
API error response summaries
Use case 2: Lambda architecture
Prompt reports for Ad service console
Prompt reports with Norikra + Fixed reports with Hive
[Diagram labels: app servers, impression logs, Fluentd, HDFS, console service, fetch query results (frequently), execute hive query (daily)]
SELECT
  yyyymmdd, hh, campaign_id, region, lang,
  COUNT(*) AS click,
  COUNT(DISTINCT member_id) AS uu
FROM (
  SELECT
    yyyymmdd, hh,
    get_json_object(log, '$.campaign.id') AS campaign_id,
    get_json_object(log, '$.member.region') AS region,
    get_json_object(log, '$.member.lang') AS lang,
    get_json_object(log, '$.member.id') AS member_id
  FROM applog
  WHERE service='myservice'
    AND yyyymmdd='20140913'
    AND get_json_object(log, '$.type')='click'
) x
GROUP BY yyyymmdd, hh, campaign_id, region, lang
Hive query for fixed reports
Use case 2: Prompt reports for Ad service console
SELECT
  campaign.id AS campaign_id,
  member.region AS region,
  member.lang AS lang,
  COUNT(*) AS click,
  COUNT(DISTINCT member.id) AS uu
FROM myservice.win:time_batch(1 hours)
WHERE type="click"
GROUP BY campaign.id, member.region, member.lang
Norikra query for prompt reports
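As a rough model of the prompt report, the following Ruby sketch computes clicks and unique users (uu) per campaign, region, and language for one hourly batch. The method name `prompt_click_report` and the sample events are hypothetical, and the sketch assumes every click event carries well-formed `campaign` and `member` hashes.

```ruby
# Models one 1-hour batch of the prompt report:
# click = COUNT(*), uu = COUNT(DISTINCT member.id), grouped by campaign/region/lang.
def prompt_click_report(batch)
  clicks = batch.select { |e| e["type"] == "click" } # WHERE type="click"
  groups = clicks.group_by do |e|
    [e["campaign"]["id"], e["member"]["region"], e["member"]["lang"]]
  end
  groups.map do |(campaign_id, region, lang), rows|
    {
      "campaign_id" => campaign_id,
      "region" => region,
      "lang" => lang,
      "click" => rows.size,
      "uu" => rows.map { |e| e["member"]["id"] }.uniq.size
    }
  end
end

# Hypothetical sample events: member 7 clicks twice, member 8 once.
hour = [
  { "type" => "click", "campaign" => { "id" => 1 }, "member" => { "id" => 7, "region" => "JP", "lang" => "ja" } },
  { "type" => "click", "campaign" => { "id" => 1 }, "member" => { "id" => 7, "region" => "JP", "lang" => "ja" } },
  { "type" => "click", "campaign" => { "id" => 1 }, "member" => { "id" => 8, "region" => "JP", "lang" => "ja" } },
  { "type" => "impression", "campaign" => { "id" => 1 }, "member" => { "id" => 9, "region" => "JP", "lang" => "ja" } }
]
prompt_click_report(hour).each { |row| p row }
```

Compared with the Hive version, the nested fields are read directly instead of through `get_json_object`, which is what the Norikra query's `campaign.id` and `member.id` notation expresses.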
Use case 3: Realtime access dashboard on Google Platform
Access log visualization
Count using Norikra (2-step), Store on Google BigQuery
Dashboard on Google Spreadsheet + Apps Script
https://www.youtube.com/watch?v=EZkw5TDcCGw
http://qiita.com/kazunori279/items/6329df57635799405547
Use case 3: Realtime access dashboard on Google Platform
[Diagram: on each server, nginx writes an access log that Fluentd reads; access logs go to BigQuery; a norikra query aggregates locally, and its results go to the aggregate node]
Use case 3: Realtime access dashboard on Google Platform
70 servers, 120,000 requests/sec (or more!)
[Diagram: many nginx servers, each with Fluentd, send counts per host, which are combined into a total count; logs to store go to Google BigQuery; the dashboard is Google Spreadsheet + Apps Script]
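The 2-step count can be sketched in Ruby as a local count on each server followed by a sum on the aggregate node. The method names, host names, and log lines below are hypothetical; the sketch only shows why the second step handles one small row per host instead of every request.

```ruby
# Step 1, on each server: count the access-log lines seen in one window.
def count_on_host(host, access_log_lines)
  { "host" => host, "cnt" => access_log_lines.size }
end

# Step 2, on the aggregate node: sum the per-host counts into a total.
def total_from_hosts(per_host_counts)
  { "total" => per_host_counts.map { |row| row["cnt"] }.reduce(0, :+) }
end

# Hypothetical window of access-log lines on two servers.
per_host = [
  count_on_host("web01", ["GET /", "GET /a", "GET /b"]),
  count_on_host("web02", ["GET /", "GET /c"])
]
p total_from_hosts(per_host)
```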
Why Norikra is written in JRuby
Esper
CEP (Complex Event Processing) library, written in Java
Rubygems.org
Open repository, for public UDF plugins of Norikra provided as gem
JRuby for me
Ruby! (by great JRuby developer team!)
makes developing Norikra dramatically faster
with rubygems and rubygems.org for easy deployment/installation
with Java libraries, e.g. Jetty, Esper, ...
There are not so many users in Tokyo :(
More queries, more simplicity and less latency in data processing
Thanks!
photo: by my co-workers
http://norikra.github.io/
https://github.com/norikra/norikra
See also:
http://norikra.github.io/
"Lambda Architecture Platform Using SQL" http://www.slideshare.net/tagomoris/lambda-architecture-using-sql-hadoopcon-2014-taiwan
"Stream processing and Norikra" http://www.slideshare.net/tagomoris/stream-processing-and-norikra
"Batch processing and Stream processing by SQL" http://www.slideshare.net/tagomoris/hcj2014-sql
"Norikra in Action" http://www.slideshare.net/tagomoris/norikra-in-action-ver-2014-spring
http://www.slideshare.net/tagomoris/presentations
Storm or Norikra?
Simple and fixed workload for huge traffic
Use Storm!
Complex and fragile workload for non-huge traffic
Use Norikra!
Scalability?
10,000 - 100,000 events/sec
on a 2-CPU, 8-core server
HA? Distributed?
NO!
I have some ideas, but no time to implement them
There is no need for HA/distributed processing
Data flow & API?
Use Fluentd!