OWF14 - Big Data Track: Take back control of your web tracking. Go further by doing it yourself
www.dataiku.com
Take back control of your web tracking
@ClementStenac, CTO, Dataiku
Give me dashboards!
Choose one:
• Raw data
• Do what you want
• Your money
Access to raw data is a premium feature
Who cares about raw data?
• SaaS analytics are full-featured
• Custom variables let you link with your backend data
• But did you really join all the data you will need in the future?
• Do you have access to all the necessary data, and do you want to push it all to the JS?
• What kinds of analysis will you do later on?
A real example: segmentation and tracking of user satisfaction
Raw tracking data → User-level stats → User base segmentation → Metrics per segment → Tracking over time
(data volumes go from TB of raw tracking data down to GB)
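The first step of this pipeline, going from raw tracking data to user-level stats, is essentially a group-by on the visitor id. A minimal Python sketch, assuming each raw event carries a visitor id, a session id, a page and a timestamp (the `Event` fields and the chosen stats are illustrative, not taken from the slides):

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Event:
    # Hypothetical raw tracking event; real trackers record many more fields.
    visitor_id: str
    session_id: str
    page: str
    timestamp: float  # seconds since the epoch


def user_level_stats(events):
    """Aggregate raw events (TB-scale in real life) into one small row per visitor."""
    sessions = defaultdict(set)
    page_views = defaultdict(int)
    first_seen = {}
    last_seen = {}
    for e in events:
        v = e.visitor_id
        sessions[v].add(e.session_id)
        page_views[v] += 1
        first_seen[v] = min(first_seen.get(v, e.timestamp), e.timestamp)
        last_seen[v] = max(last_seen.get(v, e.timestamp), e.timestamp)
    return {
        v: {
            "sessions": len(sessions[v]),
            "page_views": page_views[v],
            "pages_per_session": page_views[v] / len(sessions[v]),
            "active_span_days": (last_seen[v] - first_seen[v]) / 86400,
        }
        for v in page_views
    }
```

At TB scale, the same group-by would typically run in Hive or a similar engine rather than in a Python loop; the logic stays the same.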
User-level data
Clustering
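The slides do not say which clustering algorithm was used. As an illustration only, here is plain k-means (Lloyd's algorithm) over user-level feature vectors, in dependency-free Python; `kmeans` is a hypothetical helper, and in practice the features would usually be standardized first and a library implementation used:

```python
import math
import random


def kmeans(points, k, iterations=50, seed=0):
    """Lloyd's algorithm. points: list of equal-length numeric tuples. Returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # k distinct starting points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append([sum(coords) / len(members) for coords in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # empty cluster: keep the old centroid
        if new_centroids == centroids:
            break  # converged: assignments will not change any more
        centroids = new_centroids
    return centroids, labels
```

Each resulting cluster is only a group of similar visitors; giving it a meaning is the labeling step that follows.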
Labeling
• Search for a specific topic
• Newcomer from Google News
• Foreigner discovering the site
• Fan who loves to comment
• Home page wanderer
• Dark bot (competitor?)
Here you need your business knowledge
Compute metrics per segment
• Search for a specific topic
• Newcomer from Google News
• Foreigner discovering the site
• Fan who loves to comment
• Home page wanderer
• Dark bot (competitor?)
Metrics per segment:
– 0.3€ per session, 0.23€ acquisition costs, 13k sessions
– 1.3€ per session, 0.23€ acquisition costs, 938k sessions
– 0.3€ per session, 0.23€ acquisition costs, 738k sessions
– 0.83€ per session, 0.73€ acquisition costs, 68k sessions
– 0.3€ per session, 1.23€ acquisition costs, 1k sessions
– 0€ per session, 0€ acquisition costs
Here you need to cross-reference with your CRM
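Crossing segments with CRM data boils down to a join on the visitor or session id, followed by per-segment averages. A minimal Python sketch, assuming the CRM can export revenue and acquisition cost per session id; `metrics_per_segment` and its dictionary inputs are hypothetical:

```python
from collections import defaultdict


def metrics_per_segment(segment_of, sessions, crm_revenue, acquisition_cost):
    """
    segment_of: visitor_id -> segment label (output of clustering + labeling)
    sessions: list of (visitor_id, session_id) pairs from the tracking data
    crm_revenue: session_id -> revenue in euros (hypothetical CRM export)
    acquisition_cost: session_id -> acquisition cost in euros (hypothetical)
    """
    totals = defaultdict(lambda: {"sessions": 0, "revenue": 0.0, "cost": 0.0})
    for visitor_id, session_id in sessions:
        t = totals[segment_of[visitor_id]]
        t["sessions"] += 1
        # Sessions unknown to the CRM count as zero revenue / zero cost.
        t["revenue"] += crm_revenue.get(session_id, 0.0)
        t["cost"] += acquisition_cost.get(session_id, 0.0)
    return {
        segment: {
            "sessions": t["sessions"],
            "revenue_per_session": t["revenue"] / t["sessions"],
            "cost_per_session": t["cost"] / t["sessions"],
        }
        for segment, t in totals.items()
    }
```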
Track metrics over time
• Search for a specific topic
• Newcomer from Google News
• Foreigner discovering the site
• Fan who loves to comment
• Home page wanderer
• Dark bot (competitor?)
Using your already-computed segments
"Damn, our latest release has diverging effects on segments"
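Once every visitor has a segment, tracking over time means computing the same metric per segment and per day, then comparing periods. A Python sketch that compares each segment's average daily metric before and after a release date; the input layout, `release_effect` and the relative-change measure are assumptions for illustration:

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, Tuple


def release_effect(daily_metric: List[Tuple[date, str, float]], release_day: date) -> Dict[str, float]:
    """
    daily_metric: one (day, segment, value) row per day and segment, e.g. sessions per day.
    Returns, per segment, the relative change of the mean daily value
    from before release_day to release_day and after.
    """
    before = defaultdict(list)
    after = defaultdict(list)
    for day, segment, value in daily_metric:
        (after if day >= release_day else before)[segment].append(value)
    effects = {}
    for segment in before.keys() & after.keys():
        mean_before = sum(before[segment]) / len(before[segment])
        mean_after = sum(after[segment]) / len(after[segment])
        if mean_before != 0:  # a relative change is undefined from a zero baseline
            effects[segment] = (mean_after - mean_before) / mean_before
    return effects
```

Effects with opposite signs across segments are the "diverging effects" from the slide; a real analysis would also check that the differences are larger than normal day-to-day noise.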
A few other examples
• Churn prediction and explanation
• Customer lifetime value prediction
OK, I WANT TO DO IT
So, I have these Apache logs
• First level of web tracking
• "Nothing required"
Are backend logs a solution?
Challenge 1: identify a visitor
• IP?
– NAT / proxy: not everyone has a public IP address
• IP + user-agent?
– Big companies! Many employees share one public IP and often the same browser
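To see why IP + user-agent is a weak visitor identifier, here is how one would derive it from Apache logs. A minimal Python sketch, assuming the logs use Apache's standard "combined" format; `visitor_key` is a hypothetical helper:

```python
import hashlib
import re

# Apache "combined" log format: %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"
COMBINED_LOG = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def visitor_key(log_line):
    """Best-effort visitor key from IP + user agent, or None if the line does not parse."""
    m = COMBINED_LOG.match(log_line)
    if m is None:
        return None
    raw = f"{m['ip']}|{m['agent']}"
    # Hash so that the key does not expose the raw IP address downstream.
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()[:16]
```

Two employees of the same big company, behind the same NAT and using the same corporate browser, produce identical keys and get merged into a single "visitor".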
Are backend logs a solution?
Challenge 2: re-create sessions
• Using expiration times
• Advanced SQL / Hive / … makes this easier
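Re-creating sessions with an expiration time means starting a new session whenever a visitor has been idle for longer than a timeout. A minimal Python sketch, assuming hits have already been reduced to (visitor key, timestamp) pairs; `sessionize` is a hypothetical helper, and the 30-minute timeout is a common convention, not something the slides specify:

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # assumption: common default, tune to your site


def sessionize(hits):
    """
    hits: iterable of (visitor_key, timestamp) pairs, in any order.
    Returns (visitor_key, timestamp, session_number) triples sorted by visitor then time;
    a new session starts after more than SESSION_TIMEOUT of inactivity.
    """
    result = []
    last_seen = {}
    session_no = {}
    for visitor, ts in sorted(hits, key=lambda h: (h[0], h[1])):
        if visitor not in last_seen or ts - last_seen[visitor] > SESSION_TIMEOUT:
            session_no[visitor] = session_no.get(visitor, 0) + 1
        last_seen[visitor] = ts
        result.append((visitor, ts, session_no[visitor]))
    return result
```

In SQL or Hive, the same logic is usually written with the LAG() window function to compute the gap to the previous hit, then a running SUM() over "gap > timeout" flags to number the sessions.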
Are backend logs a solution?
Challenge 3: single-page webapps
• Track behaviour within each page
• Track events, not pages
Also: getting logs from IT is sometimes another challenge
Client-side tracking
• visitor_id and session_id handled with cookies
• Tracking page loads and various events
• Historically, "tracking" = fetching a 1x1 image
• Now: AJAX calls
Flow: the browser loads www.website.com, runs the JS tracking code, and sends tracking calls to tracker.com
Are cookies good for your (web) health?
• Each cookie belongs to a domain (and its subdomains)
• Who can write a cookie?
– The HTTP server, which becomes the owner (via the Set-Cookie HTTP header)
– JS code running on the "owner" domain
• Who can read a cookie?
– The owner HTTP server (the browser sends the cookie along with each request)
– JS code running on the "owner" domain (unless the cookie is marked HttpOnly)
First-party cookies
• Set by the originating server (HTTP) or JS code
• Belong to the originating domain
• Sent by HTTP to the originating domain only
• Readable by JS code
Sequence (initially, no cookies for www.website.com):
• Browser → www.website.com: GET / (cookies: none) → page contents
• Browser → tracker.com: fetch the tracking script
• Tracking JS code: read the cookies for www.website.com (none yet)
• Tracking JS code: create a visitor id and set the cookie
Next (cookies for www.website.com: visitor_id=d37ecba):
• JS code: send an AJAX request to tracker.com with the visitor_id
• Browser → tracker.com: GET /track?visitor_id=d37ecba (cookies: none)
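In the browser this logic lives in the tracker's JS code; to keep all examples in one language, here is the same first-party logic sketched in Python. It assumes the input string is what `document.cookie` would return for www.website.com; `get_or_create_visitor_id`, `tracking_url`, the 1-year lifetime and the `page` parameter are hypothetical:

```python
import secrets
from http.cookies import SimpleCookie
from urllib.parse import urlencode


def get_or_create_visitor_id(cookie_string):
    """
    cookie_string: the first-party cookies of www.website.com, as document.cookie returns them.
    Returns (visitor_id, cookie_to_write); cookie_to_write is None if the id already existed.
    """
    cookies = SimpleCookie()
    cookies.load(cookie_string or "")
    if "visitor_id" in cookies:
        return cookies["visitor_id"].value, None
    visitor_id = secrets.token_hex(8)  # random id, carries no personal information
    new_cookie = SimpleCookie()
    new_cookie["visitor_id"] = visitor_id
    new_cookie["visitor_id"]["path"] = "/"
    new_cookie["visitor_id"]["max-age"] = 365 * 24 * 3600  # hypothetical 1-year lifetime
    return visitor_id, new_cookie["visitor_id"].OutputString()


def tracking_url(visitor_id, page):
    """Build the AJAX call to the tracker: the id travels in the URL, not in a cookie."""
    return "https://tracker.com/track?" + urlencode({"visitor_id": visitor_id, "page": page})
```

Because the cookie belongs to www.website.com, tracker.com never receives it; the id only reaches the tracker because the JS code copies it into the request URL.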
Third-party cookies
• Set (in HTTP) by the tracker's domain
– Belong to the tracker's domain
• Not sent by HTTP to the originating domain (they do not belong to it)
• NOT readable by JS code running on the originating domain (they do not belong to it)
Sequence (initially, no cookies for www.website.com or tracker.com):
• Browser → www.website.com: GET / (cookies: none) → page contents
• Browser → tracker.com: fetch the tracking script
Next (still no cookies for www.website.com or tracker.com):
• Browser → tracker.com: GET /track (cookies: none)
• Tracker code: assign a visitor_id
• tracker.com → Browser: 200 OK, Set-Cookie: visitor_id=33d7
On later calls (cookies for tracker.com: visitor_id=33d7; cookies for www.website.com: none):
• Browser → tracker.com: GET /track, Cookie: visitor_id=33d7
• Tracker code: read the visitor_id
• tracker.com → Browser: 200 OK
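On the tracker side, the third-party flow is a plain HTTP handler: reuse the visitor_id cookie if the browser sent one, otherwise assign one and return it with Set-Cookie. A framework-free Python sketch (`handle_track` is a hypothetical name). It also sets `SameSite=None; Secure`, which current browsers require before they send a cookie in a cross-site context; that detail postdates the slides:

```python
import secrets
from http.cookies import SimpleCookie


def handle_track(request_headers):
    """
    /track endpoint of tracker.com using a third-party cookie.
    request_headers: dict of request headers with canonical names (e.g. "Cookie").
    Returns (status, response_headers, visitor_id).
    """
    cookies = SimpleCookie()
    cookies.load(request_headers.get("Cookie", ""))
    if "visitor_id" in cookies:
        # Returning visitor: the browser sent tracker.com's own cookie, just read it.
        return 200, {}, cookies["visitor_id"].value
    # New visitor: assign an id and ask the browser to store it for tracker.com.
    visitor_id = secrets.token_hex(8)
    out = SimpleCookie()
    out["visitor_id"] = visitor_id
    out["visitor_id"]["path"] = "/"
    out["visitor_id"]["max-age"] = 365 * 24 * 3600  # hypothetical 1-year lifetime
    out["visitor_id"]["samesite"] = "None"  # allow sending from any embedding site
    out["visitor_id"]["secure"] = True      # SameSite=None requires Secure (HTTPS)
    return 200, {"Set-Cookie": out["visitor_id"].OutputString()}, visitor_id
```

The same visitor_id comes back from every site that embeds the tracker, which is what makes cross-site tracking possible.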
Why each?
First-party cookie
• Tracks on a single website
• Requires JS code for tracking
• Reduced privacy impact: no exchange of information between sites
• Rarely blocked (also used for logins)
• Usage: track your users' behaviour
Third-party cookie
• Tracks across all websites using the same tracker
• More frowned upon
• Blocked by up to 40% of visitors
• Usage: generally ads, but also tracking across multiple websites
What are your obligations?
With ALL cookies
• You should ask users whether they accept cookies
• Even cookies unrelated to tracking
• Yes, even login-related ones
What are your obligations?
With third-party cookies
• Obey the Do-Not-Track header
– Browser → tracker.com: GET /track (cookies: none, DNT: 1)
– Tracker code: DO NOTHING
– tracker.com → Browser: 200 OK
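Obeying Do-Not-Track is a single header check placed before any tracking or cookie logic. A minimal Python sketch; `should_track` and `handle_track_with_dnt` are hypothetical names, and request headers are assumed to arrive as a dict:

```python
def should_track(request_headers):
    """False when the browser sent "DNT: 1"; header names are matched case-insensitively."""
    for name, value in request_headers.items():
        if name.lower() == "dnt" and value.strip() == "1":
            return False
    return True


def handle_track_with_dnt(request_headers, recorded_events):
    """Record the hit only if tracking is allowed; answer 200 OK either way."""
    if should_track(request_headers):
        recorded_events.append(dict(request_headers))
    # With DNT: 1, do nothing: no event stored and no Set-Cookie header.
    return 200, {}
```

Answering 200 OK in both cases keeps the page working normally; only the tracker's side effects are skipped.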
What are your obligations?
With third-party cookies
• Provide an opt-out URL
– Lets the user /optin, /optout, or check their /status
See it in action: www.youronlinechoices.com
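A common way to implement these URLs is to store the user's choice in a cookie on the tracker's own domain, so that it applies on every site embedding the tracker. A Python sketch under that assumption; `handle_optout_endpoint` and the `optout` cookie name are hypothetical:

```python
from http.cookies import SimpleCookie

OPTOUT_COOKIE = "optout"  # hypothetical cookie name, set on the tracker's domain


def handle_optout_endpoint(path, request_headers):
    """
    Hypothetical /optin, /optout and /status endpoints of a third-party tracker.
    Returns (status, response_headers, body).
    """
    cookies = SimpleCookie()
    cookies.load(request_headers.get("Cookie", ""))
    opted_out = OPTOUT_COOKIE in cookies and cookies[OPTOUT_COOKIE].value == "1"

    out = SimpleCookie()
    if path == "/optout":
        out[OPTOUT_COOKIE] = "1"
        out[OPTOUT_COOKIE]["max-age"] = 5 * 365 * 24 * 3600  # remember the choice for years
    elif path == "/optin":
        out[OPTOUT_COOKIE] = "0"
        out[OPTOUT_COOKIE]["max-age"] = 0  # Max-Age=0 deletes the opt-out cookie
    elif path == "/status":
        return 200, {}, "opted out" if opted_out else "opted in"
    else:
        return 404, {}, "not found"
    out[OPTOUT_COOKIE]["path"] = "/"  # so the cookie is also sent to /track
    body = "opted out" if path == "/optout" else "opted in"
    return 200, {"Set-Cookie": out[OPTOUT_COOKIE].OutputString()}, body
```

The tracking endpoint would then check the same cookie and, when it says "opted out", do nothing, exactly like a DNT request.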
What are your obligations?
With third-party cookies
• Provide a P3P policy: "What are you doing with my data?"
• Otherwise, older versions of Internet Explorer block your cookies
It is sent in a P3P HTTP header and looks like this:
CP="IDC DSP COR ADM DEVi TAIi PSA PSD IVAi IVDi CONi HIS OUR IND CNT"
Tracking in mobile apps
• Preserve battery
– Each network call is costly
– Do not track everything synchronously
• Network access is intermittent
– Queue events and wait for network access
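Both constraints lead to the same design: record events locally, then upload them in batches when the network is available. A minimal Python sketch of such a queue; `EventQueue`, its batch size and the JSON payload shape are hypothetical, and a real mobile library would also persist the queue to disk:

```python
import json
import time
from collections import deque
from itertools import islice


class EventQueue:
    """Buffers events locally and uploads them in batches when the network is available."""

    def __init__(self, send_batch, batch_size=20):
        self.send_batch = send_batch  # callable(bytes) -> bool, True if the upload succeeded
        self.batch_size = batch_size
        self.pending = deque()

    def track(self, name, **params):
        # No network call here: tracking an event only appends to the local queue.
        self.pending.append({"event": name, "ts": time.time(), "params": params})

    def flush(self, network_available):
        """Upload queued events batch by batch; return how many were sent."""
        sent = 0
        while network_available and self.pending:
            batch = list(islice(self.pending, self.batch_size))
            if not self.send_batch(json.dumps(batch).encode("utf-8")):
                break  # keep the events and retry on the next flush
            for _ in batch:
                self.pending.popleft()
            sent += len(batch)
        return sent
```

Batching trades a little latency for far fewer radio wake-ups, and events are only dropped from the queue after a confirmed upload.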
So, what are my choices?
• You might really want to be your own web tracker
• Most used open-source web tracker: Piwik
• Provides both raw data and nice dashboards
– MySQL backend
– Raw data via API
– Slightly less suited for analytics
WT1: YOUR OWN TRACKER IN MINUTES
WT1
An open-source (Apache License) server for building your own web tracking
https://github.com/dataiku/wt1
• Designed to provide you with raw data, directly usable for analytics
• Very high performance and scalability
Features
• 1st- or 3rd-party cookies
– Handling of DNT and opt-out
– Helps with handling P3P
• Track events or pages with key-value data
• Visitor-scope and session-scope variables
• "Live view" debugging console
Features
• Dashboards: None
• Event processing and storage
– Filesystem, S3
– Event queues: Flume
– Custom processors
• JSON API for custom tracking
• iOS library
Architecture
Clients, sending events to the WT1 server as JSON POST:
• Client-side JS tracker
– 1st or 3rd party cookies
– Event-level tracking
• iOS library
– Automatic batching
– Queuing to deal with network interruptions
WT1 Server
• Java
• > 20K events / second
• Handles DNT, P3P, opt-out, …
Outputs of the WT1 server:
• Raw storage: Filesystem, S3
• Event processors: real-time aggregations, custom code
• Event queues: Flume, Kafka, RabbitMQ, …
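The server side can be pictured as decoding each JSON POST into a batch of events and handing every event to a list of processors: raw storage, real-time aggregations, custom code. The following Python sketch only illustrates that fan-out idea; it is not WT1's actual (Java) processor API, and all class and function names are hypothetical:

```python
import json
from collections import Counter


class RawStorage:
    """Stands in for filesystem/S3 storage: keeps one JSON line per event."""

    def __init__(self):
        self.lines = []

    def process(self, event):
        self.lines.append(json.dumps(event, sort_keys=True))


class LiveCounter:
    """Real-time aggregation: counts events per (page, event type)."""

    def __init__(self):
        self.counts = Counter()

    def process(self, event):
        self.counts[(event.get("page"), event.get("type"))] += 1


def dispatch(post_body, processors):
    """Decode a JSON POST body holding a batch of events and feed each event to every processor."""
    events = json.loads(post_body)
    for event in events:
        for processor in processors:
            processor.process(event)
    return len(events)
```

Keeping raw storage as just another processor means the untouched events are always available for later analyses, whatever aggregations are added on top.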
Future work
• Android library
• More event queues supported out of the box
– Kafka
– RabbitMQ
• Avro storage