API analytics with BigQuery, by Javier Ramírez from teowaki
DESCRIPTION
At https://teowaki.com we have a system for API usage analytics, with Redis as a fast intermediate store and Google BigQuery as a big data backend. As a result, we can run aggregated queries on our traffic and usage data in just a few seconds and look for usage patterns that wouldn't be obvious otherwise. In this session I will talk about how we entered the Big Data world, which alternatives we evaluated, and how we are using Redis and BigQuery to solve our problem.
TRANSCRIPT
![Page 1: API analytics with Bigquery, by Javier Ramirez from teowaki](https://reader033.vdocuments.site/reader033/viewer/2022052505/554f59a5b4c905524c8b53f0/html5/thumbnails/1.jpg)
javier ramirez @supercoco9
API Analytics with Redis, BigQuery, and Apps Script
![Page 2: API analytics with Bigquery, by Javier Ramirez from teowaki](https://reader033.vdocuments.site/reader033/viewer/2022052505/554f59a5b4c905524c8b53f0/html5/thumbnails/2.jpg)
a two-person start-up
![Page 3: API analytics with Bigquery, by Javier Ramirez from teowaki](https://reader033.vdocuments.site/reader033/viewer/2022052505/554f59a5b4c905524c8b53f0/html5/thumbnails/3.jpg)
![Page 4: API analytics with Bigquery, by Javier Ramirez from teowaki](https://reader033.vdocuments.site/reader033/viewer/2022052505/554f59a5b4c905524c8b53f0/html5/thumbnails/4.jpg)
![Page 5: API analytics with Bigquery, by Javier Ramirez from teowaki](https://reader033.vdocuments.site/reader033/viewer/2022052505/554f59a5b4c905524c8b53f0/html5/thumbnails/5.jpg)
a different league...
![Page 6: API analytics with Bigquery, by Javier Ramirez from teowaki](https://reader033.vdocuments.site/reader033/viewer/2022052505/554f59a5b4c905524c8b53f0/html5/thumbnails/6.jpg)
... or maybe not
![Page 7: API analytics with Bigquery, by Javier Ramirez from teowaki](https://reader033.vdocuments.site/reader033/viewer/2022052505/554f59a5b4c905524c8b53f0/html5/thumbnails/7.jpg)
![Page 8: API analytics with Bigquery, by Javier Ramirez from teowaki](https://reader033.vdocuments.site/reader033/viewer/2022052505/554f59a5b4c905524c8b53f0/html5/thumbnails/8.jpg)
![Page 9: API analytics with Bigquery, by Javier Ramirez from teowaki](https://reader033.vdocuments.site/reader033/viewer/2022052505/554f59a5b4c905524c8b53f0/html5/thumbnails/9.jpg)
moral of the story
you can do big, if you know how
![Page 10: API analytics with Bigquery, by Javier Ramirez from teowaki](https://reader033.vdocuments.site/reader033/viewer/2022052505/554f59a5b4c905524c8b53f0/html5/thumbnails/10.jpg)
![Page 11: API analytics with Bigquery, by Javier Ramirez from teowaki](https://reader033.vdocuments.site/reader033/viewer/2022052505/554f59a5b4c905524c8b53f0/html5/thumbnails/11.jpg)
![Page 12: API analytics with Bigquery, by Javier Ramirez from teowaki](https://reader033.vdocuments.site/reader033/viewer/2022052505/554f59a5b4c905524c8b53f0/html5/thumbnails/12.jpg)
![Page 13: API analytics with Bigquery, by Javier Ramirez from teowaki](https://reader033.vdocuments.site/reader033/viewer/2022052505/554f59a5b4c905524c8b53f0/html5/thumbnails/13.jpg)
Set a distance.
Set an expiration time.
Bye bye noise.
![Page 14: API analytics with Bigquery, by Javier Ramirez from teowaki](https://reader033.vdocuments.site/reader033/viewer/2022052505/554f59a5b4c905524c8b53f0/html5/thumbnails/14.jpg)
![Page 15: API analytics with Bigquery, by Javier Ramirez from teowaki](https://reader033.vdocuments.site/reader033/viewer/2022052505/554f59a5b4c905524c8b53f0/html5/thumbnails/15.jpg)
![Page 16: API analytics with Bigquery, by Javier Ramirez from teowaki](https://reader033.vdocuments.site/reader033/viewer/2022052505/554f59a5b4c905524c8b53f0/html5/thumbnails/16.jpg)
![Page 17: API analytics with Bigquery, by Javier Ramirez from teowaki](https://reader033.vdocuments.site/reader033/viewer/2022052505/554f59a5b4c905524c8b53f0/html5/thumbnails/17.jpg)
![Page 18: API analytics with Bigquery, by Javier Ramirez from teowaki](https://reader033.vdocuments.site/reader033/viewer/2022052505/554f59a5b4c905524c8b53f0/html5/thumbnails/18.jpg)
![Page 19: API analytics with Bigquery, by Javier Ramirez from teowaki](https://reader033.vdocuments.site/reader033/viewer/2022052505/554f59a5b4c905524c8b53f0/html5/thumbnails/19.jpg)
![Page 20: API analytics with Bigquery, by Javier Ramirez from teowaki](https://reader033.vdocuments.site/reader033/viewer/2022052505/554f59a5b4c905524c8b53f0/html5/thumbnails/20.jpg)
![Page 21: API analytics with Bigquery, by Javier Ramirez from teowaki](https://reader033.vdocuments.site/reader033/viewer/2022052505/554f59a5b4c905524c8b53f0/html5/thumbnails/21.jpg)
javier ramirez @supercoco9 https://teowaki.com
REST API (Ruby on Rails) +
Web on top (AngularJS)
![Page 22: API analytics with Bigquery, by Javier Ramirez from teowaki](https://reader033.vdocuments.site/reader033/viewer/2022052505/554f59a5b4c905524c8b53f0/html5/thumbnails/22.jpg)
![Page 23: API analytics with Bigquery, by Javier Ramirez from teowaki](https://reader033.vdocuments.site/reader033/viewer/2022052505/554f59a5b4c905524c8b53f0/html5/thumbnails/23.jpg)
data that’s an order of magnitude greater than data you’re accustomed to
Doug Laney, VP Research, Business Analytics and Performance Management at Gartner
![Page 24: API analytics with Bigquery, by Javier Ramirez from teowaki](https://reader033.vdocuments.site/reader033/viewer/2022052505/554f59a5b4c905524c8b53f0/html5/thumbnails/24.jpg)
data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn’t fit the structures of your database architectures.
Edd Dumbill, program chair for the O'Reilly Strata Conference
![Page 25: API analytics with Bigquery, by Javier Ramirez from teowaki](https://reader033.vdocuments.site/reader033/viewer/2022052505/554f59a5b4c905524c8b53f0/html5/thumbnails/25.jpg)
big data is doing a full scan of 330MM rows, matching them against a regexp, and getting the result (223MM rows) in just 5 seconds
Javier Ramírez, impressionable teowaki founder
![Page 26: API analytics with Bigquery, by Javier Ramirez from teowaki](https://reader033.vdocuments.site/reader033/viewer/2022052505/554f59a5b4c905524c8b53f0/html5/thumbnails/26.jpg)
1. non-intrusive metrics
2. keep the history
3. avoid vendor lock-in
4. interactive queries
5. cheap
6. extra ball: real time
![Page 27: API analytics with Bigquery, by Javier Ramirez from teowaki](https://reader033.vdocuments.site/reader033/viewer/2022052505/554f59a5b4c905524c8b53f0/html5/thumbnails/27.jpg)
Twitter, Stack Overflow, Pinterest, Booking.com, World of Warcraft, YouPorn, HipChat, Snapchat
ntopng, Logstash
![Page 28: API analytics with Bigquery, by Javier Ramirez from teowaki](https://reader033.vdocuments.site/reader033/viewer/2022052505/554f59a5b4c905524c8b53f0/html5/thumbnails/28.jpg)
Non-intrusive metrics
Capture data really fast.
Then process the data in the background.
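A minimal sketch of the capture-then-process split, assuming a Redis list as the intermediate buffer. To keep it runnable without a Redis server, Ruby's thread-safe `Queue` stands in for the list (in a real setup the request path would `LPUSH` and a worker would `RPOP` or `BRPOP`). `MetricsBuffer`, `record` and `drain` are hypothetical names, not teowaki's code.

```ruby
require "json"
require "time"

# Hypothetical stand-in for a Redis list used as a metrics buffer.
class MetricsBuffer
  def initialize
    @queue = Queue.new # thread-safe FIFO from Ruby core
  end

  # Hot path, called once per API request: serialize and enqueue only.
  # No aggregation and no network calls to the warehouse happen here.
  def record(method:, uri:, status:, duration_ms:, at: Time.now.utc)
    @queue << JSON.generate(
      "method" => method, "uri" => uri, "status" => status,
      "duration_ms" => duration_ms, "timestamp" => at.iso8601
    )
  end

  # Background path: pop up to batch_size entries without blocking.
  def drain(batch_size = 500)
    batch = []
    while batch.size < batch_size
      begin
        batch << JSON.parse(@queue.pop(true)) # non-blocking pop
      rescue ThreadError # raised when the queue is empty
        break
      end
    end
    batch
  end
end

buffer = MetricsBuffer.new
buffer.record(method: "GET", uri: "/users/javier/shouts", status: 200, duration_ms: 12)
worker = Thread.new { buffer.drain } # in practice, a scheduled job
p worker.value
```

The request only pays for one JSON serialization and an enqueue; parsing, batching and shipping the rows onwards all happen in `drain`.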
![Page 29: API analytics with Bigquery, by Javier Ramirez from teowaki](https://reader033.vdocuments.site/reader033/viewer/2022052505/554f59a5b4c905524c8b53f0/html5/thumbnails/29.jpg)
![Page 30: API analytics with Bigquery, by Javier Ramirez from teowaki](https://reader033.vdocuments.site/reader033/viewer/2022052505/554f59a5b4c905524c8b53f0/html5/thumbnails/30.jpg)
![Page 31: API analytics with Bigquery, by Javier Ramirez from teowaki](https://reader033.vdocuments.site/reader033/viewer/2022052505/554f59a5b4c905524c8b53f0/html5/thumbnails/31.jpg)
Gzip to AWS S3/Glacier
or Google Cloud Storage
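Archiving the raw history could look like the sketch below, assuming rows are kept as newline-delimited JSON. It only builds the gzipped bytes with Ruby's standard `zlib`; the upload to S3/Glacier or Google Cloud Storage is left out, and the method names are hypothetical.

```ruby
require "json"
require "stringio"
require "zlib"

# Gzip a batch of rows as newline-delimited JSON (one object per line).
# The returned binary string is what would be uploaded to S3/Glacier or
# Google Cloud Storage; the upload itself is not shown.
def gzip_ndjson(rows)
  io = StringIO.new("".b)
  gz = Zlib::GzipWriter.new(io)
  rows.each { |row| gz.puts(JSON.generate(row)) }
  gz.finish # writes the gzip trailer and leaves the StringIO open
  io.string
end

# Inverse operation, for checking an archive or re-reading old history.
def gunzip_ndjson(bytes)
  Zlib::GzipReader.new(StringIO.new(bytes)).read.each_line.map { |line| JSON.parse(line) }
end

archive = gzip_ndjson([{ "uri" => "/users/javier/shouts", "status" => 200 }])
p gunzip_ndjson(archive)
```

Keeping the archive as gzipped newline-delimited JSON also keeps it in a shape BigQuery can load directly from Cloud Storage, which helps with the "avoid vendor lock-in" and "keep the history" goals.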
![Page 32: API analytics with Bigquery, by Javier Ramirez from teowaki](https://reader033.vdocuments.site/reader033/viewer/2022052505/554f59a5b4c905524c8b53f0/html5/thumbnails/32.jpg)
![Page 33: API analytics with Bigquery, by Javier Ramirez from teowaki](https://reader033.vdocuments.site/reader033/viewer/2022052505/554f59a5b4c905524c8b53f0/html5/thumbnails/33.jpg)
tools we considered:
Hadoop, Cassandra, Hadoop + Voldemort + Kafka, HBase, …, Amazon Redshift
![Page 34: API analytics with Bigquery, by Javier Ramirez from teowaki](https://reader033.vdocuments.site/reader033/viewer/2022052505/554f59a5b4c905524c8b53f0/html5/thumbnails/34.jpg)
but...
hard to set up and monitor
not interactive enough
expensive cluster
![Page 35: API analytics with Bigquery, by Javier Ramirez from teowaki](https://reader033.vdocuments.site/reader033/viewer/2022052505/554f59a5b4c905524c8b53f0/html5/thumbnails/35.jpg)
Our choice:
Google BigQuery
Data analysis as a service
http://developers.google.com/bigquery
![Page 36: API analytics with Bigquery, by Javier Ramirez from teowaki](https://reader033.vdocuments.site/reader033/viewer/2022052505/554f59a5b4c905524c8b53f0/html5/thumbnails/36.jpg)
Based on “Dremel”
Specifically designed for interactive queries over petabytes of real-time data
![Page 37: API analytics with Bigquery, by Javier Ramirez from teowaki](https://reader033.vdocuments.site/reader033/viewer/2022052505/554f59a5b4c905524c8b53f0/html5/thumbnails/37.jpg)
loading data
You just send the data in text (CSV) or JSON format
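BigQuery load jobs take CSV or newline-delimited JSON plus a schema listing each field's `name`, `type` and `mode`. The sketch below assumes flat rows. It infers such a schema from one sample row and serializes rows one JSON object per line. The Ruby-to-BigQuery type mapping and the method names are assumptions for illustration, and the actual load call (API or `bq` tool) is not shown.

```ruby
require "json"
require "time"

# Assumed mapping from Ruby values to BigQuery column types (flat rows only).
def bigquery_type(value)
  case value
  when Integer then "INTEGER"
  when Float then "FLOAT"
  when true, false then "BOOLEAN"
  when Time then "TIMESTAMP"
  else "STRING"
  end
end

# Schema in BigQuery's JSON shape: [{ "name", "type", "mode" }, ...].
def infer_schema(sample_row)
  sample_row.map do |name, value|
    { "name" => name.to_s, "type" => bigquery_type(value), "mode" => "NULLABLE" }
  end
end

# Newline-delimited JSON: one object per line, no enclosing array.
# Times are written as ISO 8601 strings in UTC.
def to_ndjson(rows)
  rows.map do |row|
    JSON.generate(row.transform_values { |v| v.is_a?(Time) ? v.utc.iso8601 : v })
  end.join("\n") + "\n"
end

row = { "uri" => "/users/javier/shouts", "status" => 200, "at" => Time.utc(2014, 5, 16) }
puts JSON.pretty_generate(infer_schema(row))
print to_ndjson([row])
```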
![Page 38: API analytics with Bigquery, by Javier Ramirez from teowaki](https://reader033.vdocuments.site/reader033/viewer/2022052505/554f59a5b4c905524c8b53f0/html5/thumbnails/38.jpg)
SQL
select name from users order by date;
select count(*) from users;
select max(date) from users;
select user, sum(total) from orders group by user;
![Page 39: API analytics with Bigquery, by Javier Ramirez from teowaki](https://reader033.vdocuments.site/reader033/viewer/2022052505/554f59a5b4c905524c8b53f0/html5/thumbnails/39.jpg)
specific extensions for analytics
within, flatten, nest
stddev
top, first, last, nth
variance
var_pop, var_samp
covar_pop, covar_samp
quantiles
correlations
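The `_pop`/`_samp` pairs differ only in the divisor: population statistics divide by n, sample statistics by n - 1. Here is a plain-Ruby sketch of what these aggregates compute over one column (or a pair of columns). It is written for illustration, not as BigQuery code.

```ruby
def mean(xs)
  xs.sum.to_f / xs.size
end

# Covariance: population divides by n, sample divides by n - 1.
def covariance(xs, ys, sample: false)
  mx = mean(xs)
  my = mean(ys)
  total = xs.zip(ys).sum { |x, y| (x - mx) * (y - my) }
  total / (sample ? xs.size - 1 : xs.size)
end

# Variance is the covariance of a column with itself.
def variance(xs, sample: false)
  covariance(xs, xs, sample: sample)
end

# Pearson correlation: covariance scaled by both standard deviations,
# so the n versus n - 1 choice cancels out.
def correlation(xs, ys)
  covariance(xs, ys) / Math.sqrt(variance(xs) * variance(ys))
end

latencies = [2, 4, 4, 4, 5, 5, 7, 9]
puts variance(latencies)               # like var_pop
puts variance(latencies, sample: true) # like var_samp
```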
![Page 40: API analytics with Bigquery, by Javier Ramirez from teowaki](https://reader033.vdocuments.site/reader033/viewer/2022052505/554f59a5b4c905524c8b53f0/html5/thumbnails/40.jpg)
Things you always wanted to try but were too scared to
select count(*) from publicdata:samples.wikipedia
where REGEXP_MATCH(title, "[0-9]*") AND wp_namespace = 0;
223,163,387
Query complete (5.6s elapsed, 9.13 GB processed, Cost: 32¢)
![Page 41: API analytics with Bigquery, by Javier Ramirez from teowaki](https://reader033.vdocuments.site/reader033/viewer/2022052505/554f59a5b4c905524c8b53f0/html5/thumbnails/41.jpg)
columnar storage
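Columnar storage matters for a pay-per-byte engine. Once rows are pivoted into per-column arrays, a query that references one column only has to read that column, and BigQuery bills a query by the bytes in the columns it references. Below is a small sketch of the idea. Byte counts are a rough estimate from each value's string form, a simplification made for illustration.

```ruby
# Pivot row-oriented records (array of hashes) into a column store:
# column name => array of that column's values, in row order.
def to_columns(rows)
  rows.each_with_object(Hash.new { |h, k| h[k] = [] }) do |row, columns|
    row.each { |name, value| columns[name] << value }
  end
end

# Rough bytes read by a query that touches only the `wanted` columns.
def bytes_scanned(columns, wanted)
  wanted.sum { |name| columns.fetch(name).sum { |value| value.to_s.bytesize } }
end

rows = [
  { "uri" => "/users/javier/shouts", "method" => "GET", "country" => "ES" },
  { "uri" => "/teams/nosqlmatters-cgn/links", "method" => "POST", "country" => "DE" }
]
columns = to_columns(rows)
puts bytes_scanned(columns, columns.keys) # every column, like a full scan
puts bytes_scanned(columns, ["country"])  # a single column
```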
![Page 42: API analytics with Bigquery, by Javier Ramirez from teowaki](https://reader033.vdocuments.site/reader033/viewer/2022052505/554f59a5b4c905524c8b53f0/html5/thumbnails/42.jpg)
highly distributed execution using a tree
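In Dremel, a query fans out from a root server through intermediate servers to leaf servers that scan the data. Partial results flow back up and are merged at each level. Below is an in-process sketch of that idea for a filtered `COUNT`, where each array plays the role of one leaf's shard. The fan-out value and the method names are arbitrary choices for illustration.

```ruby
# Leaf servers: each scans only its own shard and returns a partial count.
def leaf_count(shard, &predicate)
  shard.count(&predicate)
end

# Intermediate servers and the root only merge partial results; for COUNT
# (and SUM) merging is addition, so no single server reads every row.
def merge(partials)
  partials.sum
end

# Merge partial results level by level, fan_out children per parent,
# until a single value remains at the root.
def tree_count(shards, fan_out: 2, &predicate)
  raise ArgumentError, "fan_out must be at least 2" if fan_out < 2

  partials = shards.map { |shard| leaf_count(shard, &predicate) }
  partials = partials.each_slice(fan_out).map { |group| merge(group) } while partials.size > 1
  partials.first || 0
end

shards = [[1, 2, 3], [4, 5], [6, 7, 8, 9], [10]]
puts tree_count(shards) { |x| x.even? }
```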
![Page 43: API analytics with Bigquery, by Javier Ramirez from teowaki](https://reader033.vdocuments.site/reader033/viewer/2022052505/554f59a5b4c905524c8b53f0/html5/thumbnails/43.jpg)
web console screenshot
![Page 44: API analytics with Bigquery, by Javier Ramirez from teowaki](https://reader033.vdocuments.site/reader033/viewer/2022052505/554f59a5b4c905524c8b53f0/html5/thumbnails/44.jpg)
country segmented traffic
![Page 45: API analytics with Bigquery, by Javier Ramirez from teowaki](https://reader033.vdocuments.site/reader033/viewer/2022052505/554f59a5b4c905524c8b53f0/html5/thumbnails/45.jpg)
window functions
![Page 46: API analytics with Bigquery, by Javier Ramirez from teowaki](https://reader033.vdocuments.site/reader033/viewer/2022052505/554f59a5b4c905524c8b53f0/html5/thumbnails/46.jpg)
our most active user
![Page 47: API analytics with Bigquery, by Javier Ramirez from teowaki](https://reader033.vdocuments.site/reader033/viewer/2022052505/554f59a5b4c905524c8b53f0/html5/thumbnails/47.jpg)
new users per month
![Page 48: API analytics with Bigquery, by Javier Ramirez from teowaki](https://reader033.vdocuments.site/reader033/viewer/2022052505/554f59a5b4c905524c8b53f0/html5/thumbnails/48.jpg)
10 requests we should be caching
![Page 49: API analytics with Bigquery, by Javier Ramirez from teowaki](https://reader033.vdocuments.site/reader033/viewer/2022052505/554f59a5b4c905524c8b53f0/html5/thumbnails/49.jpg)
5 most created resources
select uri, count(*) total from stats where method = 'POST' group by uri order by total desc limit 5;
![Page 50: API analytics with Bigquery, by Javier Ramirez from teowaki](https://reader033.vdocuments.site/reader033/viewer/2022052505/554f59a5b4c905524c8b53f0/html5/thumbnails/50.jpg)
...but
/users/javier/shouts
/users/rgo/shouts
/teams/javier-community/links
/teams/nosqlmatters-cgn/links
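The raw URIs carry user and team names, so grouping by `uri` splits one kind of resource into many rows. The sketch below normalizes each URI into a route template before counting. It assumes, hypothetically, that the segment after `users` or `teams` is always an identifier. `route_template` and `most_created` are illustrative names.

```ruby
# Hypothetical rule: the path segment right after "users" or "teams"
# is an identifier (a user or team name), so it becomes ":id".
ID_PARENTS = %w[users teams].freeze

def route_template(uri)
  segments = uri.split("/")
  segments.each_with_index.map do |segment, i|
    i.positive? && ID_PARENTS.include?(segments[i - 1]) ? ":id" : segment
  end.join("/")
end

# Count POST requests per route template and keep the n largest.
def most_created(requests, n = 5)
  requests
    .select { |r| r[:method] == "POST" }
    .group_by { |r| route_template(r[:uri]) }
    .map { |template, rs| [template, rs.size] }
    .max_by(n) { |_template, total| total }
end

puts route_template("/teams/nosqlmatters-cgn/links")
```

In BigQuery itself, the same normalization could be pushed into the query with a regular-expression replace on `uri` before the `group by`.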
![Page 51: API analytics with Bigquery, by Javier Ramirez from teowaki](https://reader033.vdocuments.site/reader033/viewer/2022052505/554f59a5b4c905524c8b53f0/html5/thumbnails/51.jpg)
5 most created resources
![Page 52: API analytics with Bigquery, by Javier Ramirez from teowaki](https://reader033.vdocuments.site/reader033/viewer/2022052505/554f59a5b4c905524c8b53f0/html5/thumbnails/52.jpg)
![Page 53: API analytics with Bigquery, by Javier Ramirez from teowaki](https://reader033.vdocuments.site/reader033/viewer/2022052505/554f59a5b4c905524c8b53f0/html5/thumbnails/53.jpg)
SELECT repository_name, repository_language, repository_description,
  COUNT(repository_name) AS cnt, repository_url
FROM github.timeline
WHERE type = "WatchEvent"
  AND PARSE_UTC_USEC(created_at) >= PARSE_UTC_USEC("#{yesterday} 20:00:00")
  AND repository_url IN (
    SELECT repository_url
    FROM github.timeline
    WHERE type = "CreateEvent"
      AND PARSE_UTC_USEC(repository_created_at) >= PARSE_UTC_USEC('#{yesterday} 20:00:00')
      AND repository_fork = "false"
      AND payload_ref_type = "repository"
    GROUP BY repository_url
  )
GROUP BY repository_name, repository_language, repository_description, repository_url
HAVING cnt >= 5
ORDER BY cnt DESC
LIMIT 25
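The `#{yesterday}` fragments are Ruby string interpolation, so the query text is built in the application before it is sent. The slide doesn't show how `yesterday` is defined. The sketch assumes a `Date` rendered as `YYYY-MM-DD`, and `cutoff` and `since_cutoff` are hypothetical helper names.

```ruby
require "date"

# Assumption: `yesterday` is the previous day rendered as YYYY-MM-DD.
def cutoff(today = Date.today)
  "#{(today - 1).strftime('%Y-%m-%d')} 20:00:00"
end

# Builds one of the two time filters used in the query. PARSE_UTC_USEC
# turns the string into microseconds since the epoch (UTC), so both sides
# compare as numbers. Only our own column names and generated dates go in:
# string interpolation does no SQL escaping.
def since_cutoff(column, today = Date.today)
  %(PARSE_UTC_USEC(#{column}) >= PARSE_UTC_USEC("#{cutoff(today)}"))
end

day = Date.new(2014, 5, 16)
puts since_cutoff("created_at", day)
puts since_cutoff("repository_created_at", day)
```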
![Page 54: API analytics with Bigquery, by Javier Ramirez from teowaki](https://reader033.vdocuments.site/reader033/viewer/2022052505/554f59a5b4c905524c8b53f0/html5/thumbnails/54.jpg)
![Page 55: API analytics with Bigquery, by Javier Ramirez from teowaki](https://reader033.vdocuments.site/reader033/viewer/2022052505/554f59a5b4c905524c8b53f0/html5/thumbnails/55.jpg)
![Page 56: API analytics with Bigquery, by Javier Ramirez from teowaki](https://reader033.vdocuments.site/reader033/viewer/2022052505/554f59a5b4c905524c8b53f0/html5/thumbnails/56.jpg)
![Page 57: API analytics with Bigquery, by Javier Ramirez from teowaki](https://reader033.vdocuments.site/reader033/viewer/2022052505/554f59a5b4c905524c8b53f0/html5/thumbnails/57.jpg)
NO
![Page 58: API analytics with Bigquery, by Javier Ramirez from teowaki](https://reader033.vdocuments.site/reader033/viewer/2022052505/554f59a5b4c905524c8b53f0/html5/thumbnails/58.jpg)
Automation with Apps Script
Read from BigQuery
Create a spreadsheet on Drive
E-mail it every day as a PDF
![Page 59: API analytics with Bigquery, by Javier Ramirez from teowaki](https://reader033.vdocuments.site/reader033/viewer/2022052505/554f59a5b4c905524c8b53f0/html5/thumbnails/59.jpg)
![Page 63: API analytics with Bigquery, by Javier Ramirez from teowaki](https://reader033.vdocuments.site/reader033/viewer/2022052505/554f59a5b4c905524c8b53f0/html5/thumbnails/63.jpg)
![Page 64: API analytics with Bigquery, by Javier Ramirez from teowaki](https://reader033.vdocuments.site/reader033/viewer/2022052505/554f59a5b4c905524c8b53f0/html5/thumbnails/64.jpg)
BigQuery pricing
$26 per stored TB
1,000,000 rows => $0.00416 / month (£0.00243 / month)
$5 per processed TB
1 full scan = 160 MB
1 count = 0 MB
1 full scan over 1 column = 5.4 MB
100 GB => $0.05 / month (£0.03)
![Page 65: API analytics with Bigquery, by Javier Ramirez from teowaki](https://reader033.vdocuments.site/reader033/viewer/2022052505/554f59a5b4c905524c8b53f0/html5/thumbnails/65.jpg)
£0.054307 / month* per 1MM rows
*the first 1 TB every month is free of charge
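The storage figure on the pricing slide can be reproduced with simple arithmetic. The sketch uses the listed prices ($26 per stored TB per month, $5 per processed TB) and decimal units (1 TB = 1,000,000 MB), which is what makes 160 MB come out at $0.00416. The free monthly terabyte from the footnote is not modeled, and the method names are illustrative.

```ruby
# Prices as listed on the slide.
STORAGE_USD_PER_TB_MONTH = 26.0
QUERY_USD_PER_TB = 5.0
# Decimal units: 1 TB = 1,000,000 MB.
MB_PER_TB = 1_000_000.0

# Monthly cost of keeping `mb` megabytes stored.
def storage_cost(mb)
  mb / MB_PER_TB * STORAGE_USD_PER_TB_MONTH
end

# Cost of one query that processes `mb` megabytes.
def query_cost(mb)
  mb / MB_PER_TB * QUERY_USD_PER_TB
end

# Slide figures: 1MM rows take 160 MB; a full scan reads 160 MB,
# a single-column scan 5.4 MB, and a count reads 0 MB.
puts format("store 1MM rows:     $%.5f / month", storage_cost(160))
puts format("full scan:          $%.6f", query_cost(160))
puts format("single-column scan: $%.6f", query_cost(5.4))
puts format("count(*):           $%.6f", query_cost(0))
```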
![Page 66: API analytics with Bigquery, by Javier Ramirez from teowaki](https://reader033.vdocuments.site/reader033/viewer/2022052505/554f59a5b4c905524c8b53f0/html5/thumbnails/66.jpg)
1. non-intrusive metrics
2. keep the history
3. avoid vendor lock-in
4. interactive queries
5. cheap
6. extra ball: real time
![Page 67: API analytics with Bigquery, by Javier Ramirez from teowaki](https://reader033.vdocuments.site/reader033/viewer/2022052505/554f59a5b4c905524c8b53f0/html5/thumbnails/67.jpg)
![Page 68: API analytics with Bigquery, by Javier Ramirez from teowaki](https://reader033.vdocuments.site/reader033/viewer/2022052505/554f59a5b4c905524c8b53f0/html5/thumbnails/68.jpg)
Find related links at
https://teowaki.com/teams/javier-community/link-categories/bigquery-talk
Thanks! תודה ("thank you" in Hebrew)
Javier Ramírez @supercoco9