Ben Porterfield Founder, VP Engineering
Business Analytics: Asking the Right Questions
BUSINESS INTELLIGENCE
Operational Control: How many sales did I do today?
Understand & Improve Experience: Are users engaging? Do they like the new features?
Make business decisions: Should we start delivering in a new city?
—ANDREW LEONARD Salon
“Data indicated that the same subscribers who loved the original BBC production also gobbled down movies starring Kevin Spacey or directed by David Fincher”
1 Tracking Data
2 Storing Data
3 Merging Data (ETL)
4 Retrieving Data
5 Analysis & Decision Making
The Analytical Process
Tracking Data
What To Track?
Event: Views, Clicks, In-app actions
Transactional: Users, Orders, Inventory
Tracking - Event Data
Embed in product process: Every new feature should come with events
Server-side too: Lots of non-transactional events happen on server
Taxonomy matters: Big flat event space becomes unwieldy
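One way to keep the event space from going flat is to namespace event names and validate them at the point of tracking. The sketch below is only an illustration: the `category.object.action` naming scheme, the `Event` class, and the `track` helper are all hypothetical and not taken from the talk or from any particular event platform.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Event:
    # Hypothetical convention: "category.object.action" keeps names hierarchical.
    name: str
    user_id: int
    source: str  # "client" or "server"
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    properties: dict = field(default_factory=dict)

    @property
    def category(self) -> str:
        # The first name segment is the top-level grouping.
        return self.name.split(".")[0]


def track(log: list, name: str, user_id: int, source: str, **properties) -> Event:
    """Append an event to an in-memory log (a stand-in for a real event pipeline)."""
    if name.count(".") != 2:
        raise ValueError(f"event name must look like category.object.action: {name!r}")
    event = Event(name=name, user_id=user_id, source=source, properties=properties)
    log.append(event)
    return event


log = []
track(log, "checkout.cart.viewed", user_id=1, source="client")
# Server-side events go through the same call, carrying the IDs they know about.
track(log, "billing.invoice.emailed", user_id=1, source="server", order_id=42)
```

Rejecting names that do not fit the scheme is a deliberate choice here: it makes a new feature's events follow the taxonomy from day one rather than being cleaned up later.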
Storing Data
Storing - Transactional Data
Go with SQL: NoSQL could be a burden long-term
Store all states: Even offline processes
Keep it clean: Messy schema = complicated analytics
—MICHAEL ERASMUS Back-end Engineer, Buffer
“We were relying on MongoDB…while it was easier for developers to play with the data, it became a hurdle for other team members.”
Storing - Event Data
Own it: Or, at a minimum, be able to get it
Use the ecosystem too: Lots of great SaaS event platforms
Store all the IDs: Need to be able to correlate events to transactions
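To illustrate why storing IDs on events matters, here is a minimal sketch that ties event data back to transactional data. The table layouts, column names, and the `promo_banner_clicked` event are assumptions made up for the example, and SQLite stands in for whatever database actually holds the data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, user_id INTEGER, total REAL);
-- Hypothetical event table: every event carries the IDs it can be joined on.
CREATE TABLE event  (event_id INTEGER PRIMARY KEY, name TEXT,
                     user_id INTEGER, order_id INTEGER);
INSERT INTO orders VALUES (10, 1, 25.0), (11, 2, 40.0);
INSERT INTO event VALUES
  (1, 'promo_banner_clicked', 1, NULL),
  (2, 'checkout_completed',   1, 10),
  (3, 'checkout_completed',   2, 11);
""")

# Revenue from users who clicked the promo banner at any point.
rows = conn.execute("""
SELECT orders.user_id, SUM(orders.total) AS revenue
FROM orders
WHERE orders.user_id IN (
    SELECT user_id FROM event WHERE name = 'promo_banner_clicked'
)
GROUP BY orders.user_id
""").fetchall()

print(rows)
```

Without `user_id` on the event rows, the question "did the people who clicked the banner actually buy?" could not be answered from these two tables.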
Merging Data
Diagram labels: Other Data? | Transactional Data | Event Data | Raw Queries | Biz-User Tools
You should combine transaction and event data, +more
Use an analytical database
Redshift is current leader
Difficult - data is heavy
Application
WITH user_order_activity AS (
  -- one row per purchasing user; age lives on users, not orders
  SELECT user_id
  FROM orders
  GROUP BY user_id
)
SELECT AVG(users.age) AS average_age_of_purchaser
FROM user_order_activity
LEFT JOIN users ON user_order_activity.user_id = users.user_id
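The purchaser-age query can be tried end to end with a small in-memory database. This is a sketch under assumptions: the sample rows are invented, `age` is assumed to be a column on `users`, and SQLite stands in for the analytical database. It also shows why the query first reduces orders to one row per user: averaging over raw orders would weight frequent buyers more heavily.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users  (user_id INTEGER PRIMARY KEY, age INTEGER);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, user_id INTEGER);
INSERT INTO users  VALUES (1, 20), (2, 40), (3, 90);   -- user 3 never ordered
INSERT INTO orders VALUES (100, 1), (101, 1), (102, 1), (103, 2);
""")

# One row per purchaser, then average their ages.
per_purchaser = conn.execute("""
WITH user_order_activity AS (
    SELECT user_id FROM orders GROUP BY user_id
)
SELECT AVG(users.age)
FROM user_order_activity
LEFT JOIN users ON user_order_activity.user_id = users.user_id
""").fetchone()[0]

# Naive version: joins every order, so user 1 is counted three times.
per_order = conn.execute("""
SELECT AVG(users.age)
FROM orders
LEFT JOIN users ON orders.user_id = users.user_id
""").fetchone()[0]

print(per_purchaser, per_order)
```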
SUMMARY
Traditional Approach
SILOED: OLAP / Data Summaries
LIMITED: Restricted Q&A
DIFFICULT & CUMBERSOME: ETL - Heavy Transformation
END USER | BI TEAM | ETL TEAM | EDW TEAM
WANT TO ASK NEW QUESTIONS?
EVENT DATA
TRANSACTIONAL DATA
Modern Approach
3RD PARTY APP | API | ANY DEVICE
FLEXIBLE: Transformation at Query
ACCESSIBLE: Anywhere for Anyone
CONSOLIDATED: Simple Extract & Load
AGILE: Data Modeling Layer
DATA TEAM | END USERS
Data Model
- name: first_purchasers
  type: single_value
  base_view: orders
  measures: [orders.first_purchase_count]
  listen:
- name: orders_by_day_and_category
  title: "Orders by Day and Category"
  type: looker_area
  base_view: order_items
INNOVATION
TRANSACTIONAL DATA
EVENT DATA
MPP | REDSHIFT | IMPALA
Asana’s Data Infrastructure
Retrieving Data
—TODD LEHR SVP Engineering, Dollar Shave Club
“We have a developer named Juan and any reports we needed would flow through him.”
“When he got backlogged, our team didn’t have access to the data immediately.”
—ANNIE CORBETT Business Intelligence Analyst, Venmo
“Initially whenever we were asked for data, we would write a custom script…”
“…and then repeat this process whenever the product team wanted to extend the timeframe.”
Questions from a retail buyer at an e-commerce store:
What’s selling? What colors and sizes is it selling in?
What’s getting returned? Is there a particular size/color?
Is there a product people buy first that increases their likelihood of becoming a repeat customer?
Self-Service is Key
Get them the tool: People with questions are running the businesses.
Decisions vs. data science: “Should we open a new market in Maine?”
Game-changing insights: Don’t only come from analyst group
Analysis and Decision Making
1 Clearly define success metrics
2 Look for low-hanging fruit
3 Go one level deeper
Analysis and Decision Making: Success Metrics
Focus on desired outcome: What do you want users to experience?
Measure Engagement: In most cases this is first-line business analytics
Measure Retention: Are people coming back?
SUCCESS METRICS
HOW TO TRACK ENGAGEMENT?
Not with page views: Usually not even with time on page
Upworthy’s attention minutes: Lots of indicators (mouse, video, etc.)
Looker’s approximate usage: Any event in a 2-minute window
Deriving Approximate Usage
SELECT
  DATE(event.created_at) AS created_date,
  event.user_id AS user_id,
  COUNT(*) AS count,
  -- each distinct 2-minute window with at least one event counts as 2 minutes
  COUNT(DISTINCT CONCAT(
    CONCAT(event.user_id, '|', event.user_browser_id),
    FLOOR(UNIX_TIMESTAMP(event.created_at) / (60 * 2))
  )) * 2 AS approximate_usage_in_minutes
FROM event
GROUP BY created_date, user_id
created_date  user_id  count  approximate_usage
1/10          1        123    100 minutes
1/10          2        228    50 minutes
1/10          3        45     80 minutes
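The approximate-usage idea (count the distinct 2-minute windows in which a user produced any event, then multiply by 2) can be checked on a few rows. This is a sketch, not the original query: the sample events are invented, and because SQLite has no `UNIX_TIMESTAMP` or `CONCAT` with this signature, it uses `strftime('%s', ...)`, integer division, and `||` as stand-ins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE event (user_id INTEGER, user_browser_id TEXT, created_at TEXT);
INSERT INTO event VALUES
  (1, 'a', '2015-01-10 09:00:05'),
  (1, 'a', '2015-01-10 09:01:30'),  -- same 2-minute window as the first event
  (1, 'a', '2015-01-10 09:02:10'),  -- next window
  (1, 'a', '2015-01-10 14:30:00'),  -- a later visit
  (2, 'b', '2015-01-10 09:00:00');
""")

rows = conn.execute("""
SELECT DATE(created_at) AS created_date,
       user_id,
       COUNT(*) AS count,
       -- epoch seconds / 120 (integer division) identifies the 2-minute window
       COUNT(DISTINCT user_browser_id || '|' ||
             (CAST(strftime('%s', created_at) AS INTEGER) / 120)) * 2
           AS approximate_usage_in_minutes
FROM event
GROUP BY created_date, user_id
ORDER BY user_id
""").fetchall()

for row in rows:
    print(row)
```

User 1 has four events but only three active windows, so the estimate is 6 minutes rather than anything derived from the raw event count.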
Derived Tables
SELECT
  orders.user_id AS user_id,
  COUNT(*) AS lifetime_orders,
  MIN(orders.created_at) AS first_order,
  MAX(orders.created_at) AS latest_order,
  COUNT(DISTINCT DATE_TRUNC('month', orders.created_at)) AS distinct_months_with_orders
FROM orders
GROUP BY user_id
Transactional
Event
Analytical
Derived Table
Insights
Derived Tables
Start simple: Subselects until slow, SQL on cron works surprisingly well
Most useful at row level: Don’t roll up data, pre-compute facts
Great for cohorts and sessionization: Tiered derived dimension vs. some other metric
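A "tiered derived dimension vs. some other metric" might look like the sketch below: a per-user fact (lifetime order count) is bucketed into tiers and then used to slice average order value. The tier boundaries, the `total` column, and the sample rows are assumptions for illustration; the derived table is written as a subselect, in the "start simple" spirit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, user_id INTEGER, total REAL);
INSERT INTO orders VALUES
  (1, 1, 10.0), (2, 1, 20.0), (3, 1, 30.0),   -- user 1: three orders
  (4, 2, 50.0),                               -- user 2: one order
  (5, 3, 70.0);                               -- user 3: one order
""")

rows = conn.execute("""
WITH user_order_facts AS (          -- row-level fact per user, not a rollup
    SELECT user_id, COUNT(*) AS lifetime_orders
    FROM orders
    GROUP BY user_id
)
SELECT CASE WHEN f.lifetime_orders = 1 THEN '1 order'
            ELSE '2+ orders' END   AS lifetime_orders_tier,
       COUNT(DISTINCT o.user_id)   AS users,
       AVG(o.total)                AS average_order_value
FROM orders AS o
JOIN user_order_facts AS f ON f.user_id = o.user_id
GROUP BY lifetime_orders_tier
ORDER BY lifetime_orders_tier
""").fetchall()

for row in rows:
    print(row)
```

Because the fact stays at the user level, the same derived table can be joined against any other metric later without being rebuilt.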
Derived Table - User Order Facts
SELECT
  orders.user_id AS user_id,
  COUNT(*) AS lifetime_orders,
  MIN(orders.created_at) AS first_order,
  MAX(orders.created_at) AS latest_order,
  COUNT(DISTINCT DATE_TRUNC('month', orders.created_at)) AS distinct_months_with_orders
FROM orders
GROUP BY user_id
user_id  lifetime_orders  first_order  latest_order  distinct_months_with_orders
1        10               1/10/15      2/14/15       2
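As a rough sketch of how a user order facts table could be materialized (for example by a scheduled job), the snippet below builds it with `CREATE TABLE ... AS`. The table name `user_order_facts` and the sample orders are assumptions, and since SQLite lacks `DATE_TRUNC`, `strftime('%Y-%m', ...)` is used as a stand-in for truncating to the month.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, user_id INTEGER, created_at TEXT);
INSERT INTO orders VALUES
  (1, 1, '2015-01-10'), (2, 1, '2015-01-22'), (3, 1, '2015-02-14'),
  (4, 2, '2015-03-01');

-- Materialize the derived table; a cron job could drop and rebuild it nightly.
CREATE TABLE user_order_facts AS
SELECT user_id,
       COUNT(*)                                      AS lifetime_orders,
       MIN(created_at)                               AS first_order,
       MAX(created_at)                               AS latest_order,
       COUNT(DISTINCT strftime('%Y-%m', created_at)) AS distinct_months_with_orders
FROM orders
GROUP BY user_id;
""")

facts = conn.execute(
    "SELECT * FROM user_order_facts ORDER BY user_id"
).fetchall()

for row in facts:
    print(row)
```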
Derived Table + Sourcing
Churn: Users that will likely never do X again
Usage: How likely to purchase if they do X
Time to transaction: How long till first X
Retention: Are users coming back
???: Invent a metric
Repeat buyers: What’s different about them
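Two of these metrics, time to transaction and repeat buyers, can be sketched per user from a signup date and an orders table. The schema (`signed_up_at`, `created_at`) and the sample rows are assumptions, and SQLite's `julianday` is used here for date arithmetic in place of whatever the analytical database offers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users  (user_id INTEGER PRIMARY KEY, signed_up_at TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, user_id INTEGER, created_at TEXT);
INSERT INTO users  VALUES (1, '2015-01-01'), (2, '2015-01-01'), (3, '2015-01-05');
INSERT INTO orders VALUES
  (1, 1, '2015-01-03'), (2, 1, '2015-02-10'), (3, 2, '2015-01-11');
""")

rows = conn.execute("""
SELECT u.user_id,
       -- days between signup and first order; NULL if the user never ordered
       CAST(julianday(MIN(o.created_at)) - julianday(u.signed_up_at) AS INTEGER)
           AS days_to_first_order,
       COUNT(o.order_id) > 1 AS is_repeat_buyer
FROM users AS u
LEFT JOIN orders AS o ON o.user_id = u.user_id
GROUP BY u.user_id
ORDER BY u.user_id
""").fetchall()

for row in rows:
    print(row)
```

The `LEFT JOIN` keeps users with no orders in the result, which matters for churn-style questions where the absence of a transaction is the signal.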
Pay/Charge Mistake.
It was clear some users were accidentally paying instead of charging, but it wasn't clear how widespread the problem was and whether it was worth prioritizing a fix.
Inventing Metrics
Identify behavior: Can be good or bad, just something possibly significant
Measure % of population: Who is doing this thing?
Experiment: Ability to play with numbers is crucial
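Measuring what share of the population shows a behavior is simple arithmetic once the behavior has been identified. The helper below is a hypothetical sketch: `behavior_share` and the sample user IDs are invented, and how users get flagged as showing the behavior is left out entirely.

```python
def behavior_share(population, did_behavior):
    """Return (count, percent) of users in `population` who show the behavior."""
    users = set(population)
    if not users:
        return 0, 0.0
    # Ignore flagged IDs that are not part of the population being measured.
    matching = users & set(did_behavior)
    return len(matching), 100.0 * len(matching) / len(users)


all_users = range(1, 201)            # 200 users in the population
flagged = {3, 17, 42, 150, 999}      # 999 is not in the population
count, percent = behavior_share(all_users, flagged)
print(count, percent)
```

A small percentage does not settle whether a fix is worth prioritizing, but it turns "some users" into a number that can be compared and tracked over time.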
Analysis and Decision Making: Low-hanging Fruit
This is the kind of very visual, very data-driven piece of analysis that helps us think, “Is opening the sale at noon the right decision?”
???
Low-hanging Fruit
Out of stocks are huge detractors from the customer experience - it sucks ordering something and then not getting it - as well as revenue we failed to capture.
Low-hanging Fruit
Analysis and Decision Making: One Level Deeper
While this immediate insight might have led us to focus on small groups, this didn’t match our expectations of people planning an outing on a Friday night, prompting us to look further.
One Level Deeper
Charts: Time To Book; Group Size
We analyze all the platform data available: when someone attempts to sign up, completes the signup, pushes an app, has spend, etc.
One Level Deeper
Even though it looks like we were having nice incremental growth, looking into the details we see some things to look into further.
One Level Deeper
Don’t confuse an increase in a metric with success.
Takeaways
Put data in an analytical database: Make sure it’s fast and speaks SQL
Give business users a tool: Empower them to answer their own questions
Define success metrics: Focus on engagement and retention