Appboy Analytics - NYC MUG 11/19/13
TRANSCRIPT
![Page 1: Appboy analytics - NYC MUG 11/19/13](https://reader033.vdocuments.site/reader033/viewer/2022052905/55853a63d8b42a5e018b46fc/html5/thumbnails/1.jpg)
Appboy Analytics Jon Hyman NY MongoDB User Group, November 19, 2013 eBay NYC
@appboy @jon_hyman
![Page 2: Appboy analytics - NYC MUG 11/19/13](https://reader033.vdocuments.site/reader033/viewer/2022052905/55853a63d8b42a5e018b46fc/html5/thumbnails/2.jpg)
A LITTLE BIT ABOUT US & APPBOY
Jon Hyman, CIO :: @jon_hyman
Appboy is a mobile relationship management platform for apps
(who we are and what we do)
Harvard Bridgewater
![Page 3: Appboy analytics - NYC MUG 11/19/13](https://reader033.vdocuments.site/reader033/viewer/2022052905/55853a63d8b42a5e018b46fc/html5/thumbnails/3.jpg)
Appboy improves engagement by helping you understand your app users
• IDENTIFY - Understand demographics, social and behavioral data
• SEGMENT - Organize customers into groups based on behaviors, events, user attributes, and location
• ENGAGE - Message users through push notifications, emails, and multiple forms of in-app messages
![Page 4: Appboy analytics - NYC MUG 11/19/13](https://reader033.vdocuments.site/reader033/viewer/2022052905/55853a63d8b42a5e018b46fc/html5/thumbnails/4.jpg)
Use Case: Customer engagement begins with onboarding
Urban Outfitters textPlus Shape Magazine
![Page 5: Appboy analytics - NYC MUG 11/19/13](https://reader033.vdocuments.site/reader033/viewer/2022052905/55853a63d8b42a5e018b46fc/html5/thumbnails/5.jpg)
Agenda
• How to quickly store time series data in MongoDB using flexible schemas
• Learn how flexible schemas can easily provide breakdowns across dimensions
• Counting quickly: statistical analysis on top of MongoDB queries
![Page 6: Appboy analytics - NYC MUG 11/19/13](https://reader033.vdocuments.site/reader033/viewer/2022052905/55853a63d8b42a5e018b46fc/html5/thumbnails/6.jpg)
What kinds of analytics does Appboy track?
• Lots of time series data
  • App opens over time
  • Events over time
  • Revenue over time
  • Marketing campaign stats and efficacy over time
![Page 7: Appboy analytics - NYC MUG 11/19/13](https://reader033.vdocuments.site/reader033/viewer/2022052905/55853a63d8b42a5e018b46fc/html5/thumbnails/7.jpg)
What kinds of analytics does Appboy track?
• Breakdowns*
  • Device types
  • Device OS versions
  • Screen resolutions
  • Revenue by product
* We also care about this over time!
![Page 8: Appboy analytics - NYC MUG 11/19/13](https://reader033.vdocuments.site/reader033/viewer/2022052905/55853a63d8b42a5e018b46fc/html5/thumbnails/8.jpg)
What kinds of analytics does Appboy track?
• User segment membership
  • How many users are in each segment?
  • How many can be emailed or reached via push notifications?
  • What is the average revenue per user in the segment?
  • Per paying user?
![Page 9: Appboy analytics - NYC MUG 11/19/13](https://reader033.vdocuments.site/reader033/viewer/2022052905/55853a63d8b42a5e018b46fc/html5/thumbnails/9.jpg)
Pre-aggregated Analytics:
APP OPENS OVER TIME
![Page 10: Appboy analytics - NYC MUG 11/19/13](https://reader033.vdocuments.site/reader033/viewer/2022052905/55853a63d8b42a5e018b46fc/html5/thumbnails/10.jpg)
Typical time series collection
Log a new row for each open received

    {
      timestamp: 2013-11-14 00:00:00 UTC,
      app_id: App identifier
    }

    db.app_opens.find({app_id: A, timestamp: {$gte: date}})

Con: You need to aggregate the data before drawing the chart; lots of documents read into memory, lots of dirty pages
Pro: Really, really simple. Easy to add attribution to users.
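To make the "Con" concrete, here is a minimal sketch of the client-side work needed before a chart can be drawn from raw open events. The `events` array and the `opensPerHour` helper are illustrative assumptions, not code from the talk; in practice the documents would come from the `find()` above.

```javascript
// Hypothetical raw events, shaped like the documents in this collection.
const events = [
  { app_id: "A", timestamp: new Date("2013-11-14T00:05:00Z") },
  { app_id: "A", timestamp: new Date("2013-11-14T00:40:00Z") },
  { app_id: "A", timestamp: new Date("2013-11-14T13:10:00Z") },
];

// Count opens per UTC hour. Every matching document has to be read,
// which is the cost the slide is pointing at.
function opensPerHour(docs) {
  const counts = new Array(24).fill(0);
  for (const doc of docs) {
    counts[doc.timestamp.getUTCHours()] += 1;
  }
  return counts;
}

const hourly = opensPerHour(events);
console.log(hourly[0], hourly[13]); // 2 1
```

The work grows with the number of events in the charted period, which is what the pre-aggregated layouts on the following slides avoid.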
![Page 11: Appboy analytics - NYC MUG 11/19/13](https://reader033.vdocuments.site/reader033/viewer/2022052905/55853a63d8b42a5e018b46fc/html5/thumbnails/11.jpg)
Fewer documents with pre-aggregation iteration 1
Create a document that groups by the time period

    {
      app_id: App identifier,
      date: Date of the document,
      hour: 0-23 based hour this document represents,
      opens: Number of opens this hour
    }

    db.app_opens.update({date: D, app_id: A, hour: 0}, {$inc: {opens: 1}})

Con: We never care about an hour by itself. We lose attribution.
Pro: Really easy to draw histograms
![Page 12: Appboy analytics - NYC MUG 11/19/13](https://reader033.vdocuments.site/reader033/viewer/2022052905/55853a63d8b42a5e018b46fc/html5/thumbnails/12.jpg)
Fewer documents with pre-aggregation iteration 2
Create a document by day and have each hour be a field

    {
      app_id: App identifier,
      date: Date of the document,
      total_opens: Total number of opens this day,
      0: Number of opens at midnight,
      1: Number of opens at 1am,
      ...
      23: Number of opens at 11pm
    }

    db.app_opens.update(
      {date: D, app_id: A},
      {$inc: {"0": 1, total_opens: 1}}
    )

Pro: Document count is low, easy to use aggregation framework for longer spans, fast: document should be in working set
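As a rough sketch of how one open could be turned into the selector and `$inc` update for the per-day document: the helper name `dailyOpenUpdate`, the use of UTC hours, and applying it as an upsert are assumptions made for illustration, not details stated in the talk.

```javascript
// Build the selector and $inc update for one app open (hypothetical helper).
function dailyOpenUpdate(appId, openedAt) {
  // Midnight UTC of the day the open happened identifies the document.
  const day = new Date(Date.UTC(
    openedAt.getUTCFullYear(), openedAt.getUTCMonth(), openedAt.getUTCDate()
  ));
  const hourField = String(openedAt.getUTCHours()); // "0" .. "23"
  return {
    query: { app_id: appId, date: day },
    update: { $inc: { [hourField]: 1, total_opens: 1 } },
  };
}

const op = dailyOpenUpdate("A", new Date("2013-11-14T13:10:00Z"));
// In the mongo shell this could be applied as an upsert (an assumption), e.g.
//   db.app_opens.update(op.query, op.update, { upsert: true })
console.log(JSON.stringify(op.update)); // {"$inc":{"13":1,"total_opens":1}}
```

Because `$inc` creates missing numeric fields, the hour fields do not have to be initialized in advance.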
![Page 13: Appboy analytics - NYC MUG 11/19/13](https://reader033.vdocuments.site/reader033/viewer/2022052905/55853a63d8b42a5e018b46fc/html5/thumbnails/13.jpg)
Fewer documents with pre-aggregation iteration 2
• What about looking at different dimensions?
  • App opens by device type (e.g., how do iPads compare to iPhones?)
  • Demographics (gender, age group)
![Page 14: Appboy analytics - NYC MUG 11/19/13](https://reader033.vdocuments.site/reader033/viewer/2022052905/55853a63d8b42a5e018b46fc/html5/thumbnails/14.jpg)
Solution!
FLEXIBLE SCHEMAS!
![Page 15: Appboy analytics - NYC MUG 11/19/13](https://reader033.vdocuments.site/reader033/viewer/2022052905/55853a63d8b42a5e018b46fc/html5/thumbnails/15.jpg)
Fewer documents with pre-aggregation iteration 3
    {
      app_id: App identifier,
      date: Date of the document,
      totals: {
        app_opens: Total number of opens this day,
        devices: {
          "iPad Air": Total number of opens on the iPad Air,
          "iPhone 4": Total number of opens on the iPhone 4
        },
        genders: {
          male: Total number of opens from male users,
          female: Total number of opens from female users
        },
        ...
      },
      0: {
        app_opens: Number of opens at midnight,
        devices: {
          "iPad Air": Number of opens on the iPad Air at midnight,
          "iPhone 4": Number of opens on the iPhone 4 at midnight
        },
        ...
      },
      ...
    }

    db.app_opens.update(
      {date: D, app_id: A},
      {$inc: {
        "totals.app_opens": 1,
        "totals.devices.iPad Air": 1,
        "0.app_opens": 1,
        "0.devices.iPad Air": 1
      }}
    )

Dynamically add dimensions in the document
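One possible way to generate the `$inc` document for the nested layout above is to build dotted field paths from whatever dimensions arrive with an open. This is a sketch under assumptions: the helper name `dimensionIncrements` and the shape of its `dimensions` argument are illustrative and not taken from the talk.

```javascript
// Build a $inc document that bumps the daily totals and the hourly bucket
// for every dimension value attached to one open (names are illustrative).
function dimensionIncrements(hour, dimensions) {
  const inc = {};
  for (const scope of ["totals", String(hour)]) {
    inc[`${scope}.app_opens`] = 1;
    for (const [dimension, value] of Object.entries(dimensions)) {
      inc[`${scope}.${dimension}.${value}`] = 1;
    }
  }
  return { $inc: inc };
}

const update = dimensionIncrements(0, { devices: "iPad Air", genders: "female" });
console.log(Object.keys(update.$inc));
// e.g. db.app_opens.update({date: D, app_id: A}, update) in the mongo shell
```

A new dimension only requires passing one more key; no schema change is needed. Values containing a dot would be read as nested paths, so a real implementation would have to escape or normalize them first.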
![Page 16: Appboy analytics - NYC MUG 11/19/13](https://reader033.vdocuments.site/reader033/viewer/2022052905/55853a63d8b42a5e018b46fc/html5/thumbnails/16.jpg)
Pre-aggregated analytics
• Pros
  • Easily extensible to add other dimensions
  • Still only using one document, therefore you can create charts very quickly
  • You get breakdowns over a time period for free
• Cons
  • Pre-aggregated data has no attribution
  • Have to know questions ahead of time
Follow up: What if we wanted to look at a graph by age group?
![Page 17: Appboy analytics - NYC MUG 11/19/13](https://reader033.vdocuments.site/reader033/viewer/2022052905/55853a63d8b42a5e018b46fc/html5/thumbnails/17.jpg)
Pre-aggregated analytics summary
• Get started tracking time series data quickly
• You get breakdowns for free
• Adding dimensions is super simple
• No attribution, need to know questions ahead of time
• Don’t just rely on pre-aggregated analytics
![Page 18: Appboy analytics - NYC MUG 11/19/13](https://reader033.vdocuments.site/reader033/viewer/2022052905/55853a63d8b42a5e018b46fc/html5/thumbnails/18.jpg)
Counting quickly:
USER SEGMENTATION & STATISTICAL ANALYSIS
![Page 19: Appboy analytics - NYC MUG 11/19/13](https://reader033.vdocuments.site/reader033/viewer/2022052905/55853a63d8b42a5e018b46fc/html5/thumbnails/19.jpg)
User Segmentation
• A group of users who match some set of filters
![Page 20: Appboy analytics - NYC MUG 11/19/13](https://reader033.vdocuments.site/reader033/viewer/2022052905/55853a63d8b42a5e018b46fc/html5/thumbnails/20.jpg)
Counting quickly
Appboy shows you segment membership in real-time as you add/edit/remove filters.
How do we do it quickly?
We estimate the population sizes of segments when using our web UI.
![Page 21: Appboy analytics - NYC MUG 11/19/13](https://reader033.vdocuments.site/reader033/viewer/2022052905/55853a63d8b42a5e018b46fc/html5/thumbnails/21.jpg)
Counting quickly
Goal: Quickly get the count() of an arbitrary query
Problem: MongoDB counts are slow, especially unindexed ones
![Page 22: Appboy analytics - NYC MUG 11/19/13](https://reader033.vdocuments.site/reader033/viewer/2022052905/55853a63d8b42a5e018b46fc/html5/thumbnails/22.jpg)
Counting quickly
10 million documents that represent people:

    {
      favorite_color: "blue",
      age: 27,
      gender: "M",
      favorite_food: "pizza",
      city: "NYC",
      shoe_size: 11,
      attractiveness: 10,
      ...
    }

![Page 23: Appboy analytics - NYC MUG 11/19/13](https://reader033.vdocuments.site/reader033/viewer/2022052905/55853a63d8b42a5e018b46fc/html5/thumbnails/23.jpg)
Counting quickly
10 million documents that represent people:

    {
      favorite_color: "blue",
      age: 27,
      gender: "M",
      favorite_food: "pizza",
      city: "NYC",
      shoe_size: 11,
      attractiveness: 10,
      ...
    }

• How many people like blue?
• How many live in NYC and love pizza?
• How many men have a shoe size less than 10?
![Page 24: Appboy analytics - NYC MUG 11/19/13](https://reader033.vdocuments.site/reader033/viewer/2022052905/55853a63d8b42a5e018b46fc/html5/thumbnails/24.jpg)
Big Question: How do you estimate counts?
Answer: The same way news networks do it.
With confidence.
![Page 25: Appboy analytics - NYC MUG 11/19/13](https://reader033.vdocuments.site/reader033/viewer/2022052905/55853a63d8b42a5e018b46fc/html5/thumbnails/25.jpg)
Counting quickly
Add a random number in a known range to each document. Say, between 0 and 9999.

    {
      random: 4583,
      favorite_color: "blue",
      age: 27,
      gender: "M",
      favorite_food: "pizza",
      city: "NYC",
      shoe_size: 11,
      attractiveness: 10,
      ...
    }

Add an index on the random number:

    db.users.ensureIndex({random: 1})

![Page 26: Appboy analytics - NYC MUG 11/19/13](https://reader033.vdocuments.site/reader033/viewer/2022052905/55853a63d8b42a5e018b46fc/html5/thumbnails/26.jpg)
Counting quickly
Step 1: Get a random sample
I have 10 million documents. Of my 10,000 random “buckets”, I should expect each “bucket” to hold about 1,000 users. E.g.,

    db.users.find({random: 123}).count()   // ~1000
    db.users.find({random: 9043}).count()  // ~1000
    db.users.find({random: 4982}).count()  // ~1000

![Page 27: Appboy analytics - NYC MUG 11/19/13](https://reader033.vdocuments.site/reader033/viewer/2022052905/55853a63d8b42a5e018b46fc/html5/thumbnails/27.jpg)
Counting quickly
Step 1: Get a random sample
Let’s take a random 100,000 users. Grab a random range that “holds” those users. These all work:

    db.users.find({random: {$gt: 0, $lt: 101}})
    db.users.find({random: {$gt: 503, $lt: 604}})
    db.users.find({random: {$gt: 8938, $lt: 9039}})
    db.users.find({$or: [
      {random: {$gt: 9955}},
      {random: {$lt: 56}}
    ]})

Tip: Limit $maxScan to 100,000 just to be safe
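The range selection above can be sketched as a small helper that picks a starting bucket and wraps around the end of the 0-9999 range when needed. The function name `randomRangeFilter` and its parameters are assumptions for illustration; it uses inclusive bounds (`$gte`/`$lte`) instead of the exclusive ones on the slide, and selects the same buckets.

```javascript
// Build a filter selecting `width` consecutive random buckets, wrapping
// around the end of the range (helper and parameter names are assumptions).
function randomRangeFilter(start, width, buckets = 10000) {
  const end = start + width - 1; // inclusive last bucket
  if (end < buckets) {
    return { random: { $gte: start, $lte: end } };
  }
  return {
    $or: [
      { random: { $gte: start } },
      { random: { $lte: end - buckets } },
    ],
  };
}

// 100 buckets of ~1,000 users each gives a sample of ~100,000 users.
const start = Math.floor(Math.random() * 10000);
const filter = randomRangeFilter(start, 100);
console.log(JSON.stringify(filter));
```

For example, `randomRangeFilter(9956, 100)` produces the same selection as the `$or` query above: buckets 9956-9999 plus 0-55.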
![Page 28: Appboy analytics - NYC MUG 11/19/13](https://reader033.vdocuments.site/reader033/viewer/2022052905/55853a63d8b42a5e018b46fc/html5/thumbnails/28.jpg)
Counting quickly
Step 2: Learn about that random sample

    db.users.find(
      {
        random: {$gt: 0, $lt: 101},
        gender: "M",
        favorite_color: "blue",
        shoe_size: {$gt: 10}
      }
    )
    ._addSpecial("$maxScan", 100000)
    .explain()

Explain Result:

    {
      nscannedObjects: 100000,
      n: 11302,
      ...
    }

![Page 29: Appboy analytics - NYC MUG 11/19/13](https://reader033.vdocuments.site/reader033/viewer/2022052905/55853a63d8b42a5e018b46fc/html5/thumbnails/29.jpg)
Counting quickly
Step 3: Do the math
Population: 10,000,000
Sample size: 100,000
Num matches: 11,302
Percentage of users who matched: 11.3%
Estimated total count: about 1,130,000, +/- 0.2 percentage points (roughly 20,000 users) with 95% confidence
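The arithmetic above can be reproduced with the usual normal-approximation margin of error for a sample proportion. The talk does not say which formula was used, so treat this as one plausible reading; the function name `estimateCount` and the default `z = 1.96` (about 95% confidence) are assumptions.

```javascript
// Estimate a population count from a sample, with a normal-approximation
// margin of error (z = 1.96 corresponds to roughly 95% confidence).
function estimateCount(population, sampleSize, matches, z = 1.96) {
  const p = matches / sampleSize;
  const standardError = Math.sqrt((p * (1 - p)) / sampleSize);
  const margin = z * standardError; // expressed as a proportion
  return {
    proportion: p,
    estimate: Math.round(p * population),
    marginOfError: margin,
    low: Math.round((p - margin) * population),
    high: Math.round((p + margin) * population),
  };
}

const result = estimateCount(10000000, 100000, 11302);
console.log(result.estimate);                         // 1130200
console.log((result.marginOfError * 100).toFixed(1)); // "0.2"
```

The margin depends on the sample size and the matched proportion, not on the population size, which is why a fixed 100,000-document sample keeps response time predictable as the collection grows.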
![Page 30: Appboy analytics - NYC MUG 11/19/13](https://reader033.vdocuments.site/reader033/viewer/2022052905/55853a63d8b42a5e018b46fc/html5/thumbnails/30.jpg)
Counting quickly
Step 4: Optimize
• Limit $maxScan to (100,000/numShards) to be even faster
• Cache the random range for a few hours
• Add more RAM (or shards)
• Cache results to not hit the database for the same query
![Page 31: Appboy analytics - NYC MUG 11/19/13](https://reader033.vdocuments.site/reader033/viewer/2022052905/55853a63d8b42a5e018b46fc/html5/thumbnails/31.jpg)
Counting quickly
Step 5: Improve
• Get more than one count: use the aggregation framework on top of the population sample
• Work around all sorts of Mongo bugs :-(
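As a sketch of the first bullet, an aggregation pipeline can first `$match` the random range and then `$group` by a field, so one pass over the sample yields several counts that are scaled up afterwards. The helper names, the grouped field, and the `sampleGroups` numbers below are hypothetical and only illustrate the shape of the computation.

```javascript
// Pipeline that counts sampled users per value of one field (assumed names).
function sampledBreakdownPipeline(sampleFilter, field) {
  return [
    { $match: sampleFilter },
    { $group: { _id: `$${field}`, count: { $sum: 1 } } },
  ];
}

// Scale per-group sample counts up to population estimates.
function scaleToPopulation(groups, sampleSize, population) {
  return groups.map((g) => ({
    _id: g._id,
    estimate: Math.round((g.count / sampleSize) * population),
  }));
}

const pipeline = sampledBreakdownPipeline({ random: { $gt: 0, $lt: 101 } }, "gender");
// e.g. db.users.aggregate(pipeline) in the mongo shell.

// Hypothetical aggregation output for a 100,000-user sample:
const sampleGroups = [{ _id: "M", count: 48000 }, { _id: "F", count: 52000 }];
const scaled = scaleToPopulation(sampleGroups, 100000, 10000000);
console.log(scaled);
```

Each scaled estimate carries its own margin of error, which can be computed per group in the same way as for a single count.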
![Page 32: Appboy analytics - NYC MUG 11/19/13](https://reader033.vdocuments.site/reader033/viewer/2022052905/55853a63d8b42a5e018b46fc/html5/thumbnails/32.jpg)
Summarize
• Pre-aggregated analytics
  • Create a document that represents event occurrences in some time period
  • Takes full advantage of MongoDB’s flexible schemas
  • Not a catch-all for analytics, you should still store event data
![Page 33: Appboy analytics - NYC MUG 11/19/13](https://reader033.vdocuments.site/reader033/viewer/2022052905/55853a63d8b42a5e018b46fc/html5/thumbnails/33.jpg)
Summarize
• Counting quickly
  • Estimate results of arbitrary queries using population sample sizes
  • Depending on your app, this could be a great way to keep response time predictable as you scale