Analytics at Motorola, a journey to enable
self-service using Google tools
Patrick Deglon, June 2015
Patrick Deglon Bio
After a PhD in Particle Physics and 10 years at the University of Geneva studying the creation of the Universe, Patrick spent the next decade driving business insights at eBay and Motorola Mobility.
At eBay, he led significant improvements in marketing effectiveness by developing methods to measure incremental sales and by running large-scale experiments on Internet marketing channels.
At Motorola Mobility, he raised the bar in analytics and onboarded open Google tools and technologies, including Google Docs, BigQuery, App Engine and Compute Engine.
In June 2015, he founded Deglon Consulting to help companies adopt the latest technologies in Cloud computing and integrate sound analytical methodologies to measure business impact and marketing incrementality.
He is married with two kids and recently moved to Sarasota, Florida.
Agenda
● Industry overview (mobile)
● Motorola Example
o Daily Activations Report (BQ + Spreadsheet)
o Big Feed (Big Query ETL)
o Moto Insights (BQ + gChart + gDrive + GA)
o Moto Monitor (BQ usage & cost optimization)
o Self-Serve Email (Next gen of self-service)
● Q & A
Imagine a world...
… where information is ubiquitous (anytime & anywhere)
… where buildings can recognize your presence
… where even streetlights are connected to the Internet
Welcome to a digital world
Mobile was a revolution, but Mobile is an outdated concept.
The Digital World (Cloud, Internet, World’s Information, Digital Personal Assistant…) will be available everywhere: phones, watches, glasses, cars, appliances, microchip implants, ...
Evolution of mankind
1973: First hand-held portable telephone
1989: Web proposal
2009: First microchip implant
...
Homo Sapiens
Homo Technicus
Motorola: 80+ YEARS OF INNOVATION
1928: Galvin Manufacturing Corp
1936: Motorola introduced Police Cruiser Radio Receiver
1943: World’s first portable FM two-way radio
1947: (caption not recovered)
1955: World’s first high-power transistor in commercial production
1963: World’s first truly rectangular color TV tube
1969: First words from moon relayed via Motorola radio
1973: Demonstrated prototype of the DynaTAC portable cellular system
1983: World’s first commercial handheld cellular phone; the DynaTAC 8000X weighed 28 ounces (794 grams)
1990: World’s first HDTV technical standard
1991: World’s first GSM cellular system
(year not recovered): World’s first dual-mode cellular phone
1996: The 3.1 ounce (88 grams) StarTAC wearable cellular phone is the world’s smallest and lightest
1999: World’s first handset, iDEN i1000plus, to combine a digital phone, two-way radio, Internet microbrowser, e-mail, fax and two-way messaging
2000: World’s first general packet radio service (GPRS) wireless phone for always-on Internet access
2002: World’s first wireless cable modem gateway introduced
2004: Iconic RAZR V3 wireless phone introduced
2006: MING smart phone recognizes 10,000+ handwritten characters from Chinese alphabet
2009: Motorola DROID #1 on Time’s Top Ten of 2009
2012: (caption not recovered)
2013-2015: Launch Moto X, Moto G; fast upgrades; Moto E; Moto 360
Motorola Cloud Customers Ecosystem
[Diagram: Motorola Cloud at the center, surrounded by: Consumers (Phones, Wearables & Companion Products), Internal Business Teams, Partners & Carriers, MotoMaker, Web, Product, Sales, Business Operation, Customer Support, Marketing, Finance, Engineering]
Motorola Cloud Applications & Services
● Infrastructure as a Service
● On-Device Applications & Services
● Web Applications & Software as a Service
● Platform as a Service
Google Cloud Platform (GCP) 101
● GAE (Google App Engine): Web server with automatic scaling
● GCE (Google Compute Engine): Virtual Linux/Windows server
● GCS (Google Cloud Storage): “FTP”
● BQ (Google BigQuery): Big Data warehouse, public version of Dremel that is powering Google Search
● GA (Google Analytics): Website, mobile and IoT tracking & analysis
Analytics Ecosystem
● Device Instrumentation. Check it out: Android Settings → Motorola Privacy → Help Improve Motorola Products (On/Off), Moto Care (On/Off)
● Motorola Big Data Environment: Motorola Cloud (GAE/GCE), Big Data (BQ), Moto Insights (GAE), BigFeed (GAE)
● Confluence’s Data Wiki
● OSQA’s FAQ (“Stackoverflow”)
● Data & Analytics Summits
● Solution Engineering
Daily Activations Report
How to provide a global source of truth, available on any form factor with an outreach mentality?
Existing Situation
- Numerous (conflicting) sources of truth
- Too many variations of same data cube
- “Table in your face” approach
- No global business definition
- No curation of manually entered data points
- Report accessible on an internal portal only (through VPN)
- No mobile form factor
Simplified Business Flow: Motorola Factory → Distribution Channels → First Usage
Key Performance Indicators: # Shipments → # Sales → # Activations
Data Flow
Motorola Factory (# Shipments) → Distribution Channels (# Sales) → First Usage (# Activations)
[Diagram labels: Motorola IT, Motorola Cloud, Google BigQuery, Insights, ...]
Demo Daily Report: Final Email
Features of Daily Report
● Get data (pivot) from BigQuery
● Spreadsheet magic
● Insights: WoW trends with statistics test, key driver for growth, key milestone, internal QA tests
● Email
● Embedded Chart
● Scheduler
Demo: https://docs.google.com/spreadsheet/ccc?key=0AjgpL8JvOwsvdDJjV0s3NFphS3RnRzBXakNpZUR1ZGc#gid=21
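The "WoW trends with statistics test" feature can be illustrated with a short sketch. It assumes weekly activation counts behave like Poisson counts, so the variance of their difference is roughly their sum. The function name and the 2-sigma threshold are illustrative assumptions, not the spreadsheet's actual logic.

```python
import math

def wow_change(this_week, last_week, z_threshold=2.0):
    """Return (relative change, z-score, is_significant) for two weekly counts.

    Assumption: both counts are Poisson distributed, so Var(a - b) ~ a + b.
    """
    if last_week <= 0 or this_week < 0:
        raise ValueError("last_week must be positive and this_week non-negative")
    change = this_week / last_week - 1.0
    z = (this_week - last_week) / math.sqrt(this_week + last_week)
    return change, z, abs(z) >= z_threshold
```

The same +5% week-over-week change is flagged as noise on a small market (1,000 activations) and as signal on a large one (100,000 activations), which is the point of separating signal from noise before emailing a trend.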
Description of the illustrative simulation
• Simulated daily activations (sales) for United States, Canada, Brazil, India, Russia, China, Germany and United Kingdom with various launch dates per region
• Add random noise to theoretical daily activations (Poisson)
• Assume sales follow a diffusion S-shape with a Marketing term (first) and a Word of mouth term (second), i.e.
ΔN = a (Nmax - N) + b N (Nmax - N)
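The diffusion equation above can be iterated day by day. The sketch below is an illustration only: the parameter values are made up, and the Poisson noise is approximated by a Gaussian with variance equal to the mean (the same shortcut the deck uses later in SQL).

```python
import math
import random

def simulate_diffusion(n_max, a, b, days, seed=0):
    """Iterate dN = a*(Nmax - N) + b*N*(Nmax - N) and add pseudo-Poisson noise.

    Returns (expected daily activations, noisy daily activations).
    """
    rng = random.Random(seed)
    n = 0.0  # cumulative activations so far
    expected, simulated = [], []
    for _ in range(days):
        # marketing term + word-of-mouth term
        delta = max(0.0, a * (n_max - n) + b * n * (n_max - n))
        expected.append(delta)
        # Gaussian approximation of Poisson noise: mean = variance = delta
        simulated.append(max(0, round(rng.gauss(delta, math.sqrt(delta)))))
        n += delta
    return expected, simulated
```

With, for example, `simulate_diffusion(100000, 0.002, 5e-7, 365)`, daily activations rise, peak, and decay, so the cumulative curve takes the S-shape described on the slide.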
Step 1: Create a backbone table
motorola.com:sandbox:demo.backbone:
SELECT
  A.CAL_DT AS CAL_DT,
  B.Country AS Country,
  B.Launch_Date AS Launch_Date,
  B.Scale AS Scale
FROM
  (SELECT CAL_DT, 1 AS Dummy FROM [motorola.com:sandbox:pdeglon.calendar]) AS A
INNER JOIN
  (SELECT
     Country,
     CASE
       WHEN Country IN ('United States','Canada') THEN '2013-08-01'
       WHEN Country IN ('Brazil','Russia','India','China') THEN '2013-10-01'
       ELSE '2013-12-01'
     END AS Launch_Date,
     GDP_USD/1e7 AS Scale,
     1 AS Dummy
   FROM [motorola.com:sandbox:pdeglon.countries]
   WHERE Country IN ('United States','Canada','Brazil','Russia','India','China','Germany','United Kingdom')) AS B
ON A.Dummy=B.Dummy
WHERE
  A.CAL_DT>=B.Launch_Date
Step 2: Calculate KPI value over time
motorola.com:sandbox:demo.baseline:
SELECT
  CAL_DT,
  Country,
  'Phone 123' AS Model,
  INTEGER(Scale *
    EXP(-POW(DATEDIFF(TIMESTAMP(CAL_DT),TIMESTAMP(Launch_Date)) - 150, 2)/2/POW(75,2))/(75*SQRT(2*PI()))
  ) AS Daily_Activations
FROM
  [motorola.com:sandbox:demo.backbone]
Normal Distribution: daily activations follow a bell curve centered 150 days after launch, with a standard deviation of 75 days.
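As a cross-check of the Step 2 expression, the same bell curve can be written in a few lines of Python. This is a sketch: the function name is hypothetical, the scale value used in the example is made up, and `int()` truncation stands in for the SQL `INTEGER()` cast.

```python
import math

def baseline_activations(days_since_launch, scale, mean=150.0, sigma=75.0):
    """Scale times the Normal pdf evaluated at days_since_launch, truncated to an integer."""
    pdf = math.exp(-((days_since_launch - mean) ** 2) / 2 / sigma ** 2) / (sigma * math.sqrt(2 * math.pi))
    return int(scale * pdf)
```

For a scale of 1,000,000 the curve peaks at `baseline_activations(150, 1e6)` and is symmetric around day 150, so days 100 and 200 give the same value.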
Step 3: Add Random Noise
motorola.com:sandbox:demo.simulation:
SELECT
  CAL_DT,
  Model,
  Country,
  INTEGER(
    Daily_Activations + SQRT(Daily_Activations) * SQRT(-2*LN(RAND()))*COS(2*PI()*RAND())
  ) AS Daily_Activations
FROM
  [motorola.com:sandbox:demo.baseline]
SQRT(-2*LN(RAND()))*COS(2*PI()*RAND()): Normal (Gaussian) Random Number (mu=0, sigma=1)
Daily_Activations + SQRT(Daily_Activations) * Normal: (pseudo) Poisson distribution for N=Daily_Activations
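The noise term in Step 3 is the Box-Muller transform, which turns two uniform random numbers into one standard normal draw. A minimal Python sketch of the same idea follows; the function names are illustrative, and `1.0 - rng.random()` is used to avoid taking the logarithm of zero.

```python
import math
import random

def standard_normal(rng):
    """Box-Muller transform: two uniform draws -> one N(0, 1) draw."""
    u1 = 1.0 - rng.random()  # in (0, 1], so log(u1) is always defined
    u2 = rng.random()
    return math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)

def pseudo_poisson(mean, rng):
    """Gaussian approximation of a Poisson count: mean + sqrt(mean) * N(0, 1)."""
    # Truncated like the SQL cast; can go slightly negative for very small means.
    return int(mean + math.sqrt(mean) * standard_normal(rng))
```

The approximation is reasonable for large daily counts, where a Poisson distribution is close to a Gaussian with variance equal to its mean.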
Step 4: Final Pivot for report
SELECT
  CAL_DT,
  SUM(Daily_Activations) AS Total,
  SUM(CASE WHEN Country IN ('United States','Canada') THEN Daily_Activations ELSE 0 END) AS NA,
  SUM(CASE WHEN Country IN ('Brazil','Russia','India','China') THEN Daily_Activations ELSE 0 END) AS BRIC,
  SUM(CASE WHEN Country IN ('Germany','United Kingdom') THEN Daily_Activations ELSE 0 END) AS EU,
  SUM(CASE WHEN Country='United States' THEN Daily_Activations ELSE 0 END) AS UnitedStates,
  SUM(CASE WHEN Country='Canada' THEN Daily_Activations ELSE 0 END) AS Canada,
  SUM(CASE WHEN Country='Brazil' THEN Daily_Activations ELSE 0 END) AS Brazil,
  SUM(CASE WHEN Country='Russia' THEN Daily_Activations ELSE 0 END) AS Russia,
  SUM(CASE WHEN Country='India' THEN Daily_Activations ELSE 0 END) AS India,
  SUM(CASE WHEN Country='China' THEN Daily_Activations ELSE 0 END) AS China,
  SUM(CASE WHEN Country='Germany' THEN Daily_Activations ELSE 0 END) AS Germany,
  SUM(CASE WHEN Country='United Kingdom' THEN Daily_Activations ELSE 0 END) AS UnitedKingdom
FROM
  [motorola.com:sandbox:demo.simulation]
WHERE
  CAL_DT<CURRENT_DATE()
GROUP BY 1
ORDER BY 1 DESC
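The pivot in Step 4 turns one row per (date, country) into one row per date with a column per region and per country. The Python sketch below mirrors that logic on in-memory rows; the region groupings come from the query, while the function and variable names are illustrative. Unlike the SQL, columns with no data are simply absent rather than 0.

```python
from collections import defaultdict

REGIONS = {
    "NA": {"United States", "Canada"},
    "BRIC": {"Brazil", "Russia", "India", "China"},
    "EU": {"Germany", "United Kingdom"},
}

def pivot(rows):
    """rows: iterable of (cal_dt, country, daily_activations).

    Returns {cal_dt: {column: total}} ordered by date, newest first.
    """
    out = defaultdict(lambda: defaultdict(int))
    for cal_dt, country, activations in rows:
        record = out[cal_dt]
        record["Total"] += activations
        for region, members in REGIONS.items():
            if country in members:
                record[region] += activations
        # Column name without spaces, as in the query (e.g. UnitedStates)
        record[country.replace(" ", "")] += activations
    return {d: dict(out[d]) for d in sorted(out, reverse=True)}
```

Dates are assumed to be ISO-formatted strings (YYYY-MM-DD), so sorting them as text matches chronological order.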
Demo Daily Report:
● New Menu Item
● Edit Code (Apps Script)
● Example for adding a new menu item
● Key Query (pivot)
● Running the query
● Parsing the results
● Data Sheet
● Summary sheet
● Preparing email
● Email template sheet
● Preparing email
● Sending email
● Signal vs Noise and Main Drivers
● Key Milestones
● Final Email
● Scheduling
BigFeed: BigQuery to BigQuery ETL
Check-in Data (PB) → Staging Data (TB) → Reporting Data (GB)
Moto Insights: Democratizing Business Intelligence
Old Analytics Portal
1. Requires VPN
2. A new report takes weeks to develop
3. New portal features take months
4. Tableau incompatibility with BigQuery
5. Reports are produced by a centralized team
6. Role management is becoming out of control
Moto Insights (GAE+BQ+gChart)
1. Global access with App Engine
2. Programmatic approach (SQL + metadata)
3. Lightweight App Engine framework (Go/AngularJS) using Google APIs
4. Google Charts and native BQ SQL
5. Google Drive API
6. Google Groups
Product Architecture
[Diagram components: Device Instrumentation, App Engine, Google Analytics data, BigQuery datasets, Big Feed (App Engine), Machine Learned Models, Moto Insights (App Engine), Users and Reports Datastore, Google Drive, Tableau reports, gChart + D3 + Tableau API, Users]
Features of Moto Insights
● Responsive design
● Report metadata, Chart widget metadata
● Report sharing
● Report viewing
● Google Drive integration
Demo Moto Insights:
● Main menu & Responsive Design (front-end based on Bootstrap CSS and AngularJS)
● Report Details
● 1st Chart Details
● Chart Types
● Dummy Example
Enlightenment Questions for an Analyst
● When was this BigQuery table last refreshed?
● How often is it refreshed?
● How was it created? Which underlying data sources/tables is it using?
● Who created this table?
● Who knows how to use this table?
● Where can I find this great query I ran?
● Who knows how to use this tag?
● How much bandwidth am I using in BigQuery?
● How much space are my tables using?
● How much does my usage of BigQuery cost?
Cartoon: www.holymolecartoon.com
How to track BigQuery usage?
Google does not provide a data feed on Motorola’s usage of BigQuery. However, three APIs can help us:
● bigquery.projects.list: List all (visible) projects
● bigquery.jobs.list: List all the jobs in a specified project. Note: use projection = full to get the email of the user
● bigquery.jobs.get: Retrieve the specified job by ID
We created an App Engine application (Moto Monitor) to crawl the Google APIs so that we can recursively collect all queries run (since mid-2013, for a specific list of projects). The queries are parsed to extract the underlying tables used, and the data is stored in the App Engine datastore as well as in BigQuery through the streaming API (every 15 minutes).
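The deck does not say how queries are parsed to extract the underlying tables. One simple possibility, sketched here as an assumption, relies on the fact that legacy BigQuery SQL writes table references in square brackets, as in the queries shown in this deck. The function name and regular expression are illustrative.

```python
import re

# Legacy BigQuery SQL table references look like [project:dataset.table]
# or [dataset.table]; this pattern captures the text inside the brackets.
TABLE_REF = re.compile(r"\[([^\[\]\s]+)\]")

def extract_tables(query):
    """Return the distinct bracketed table references in a query, in order of appearance."""
    tables = []
    for ref in TABLE_REF.findall(query):
        if "." in ref and ref not in tables:
            tables.append(ref)
    return tables
```

A regular expression like this is only a rough approach: it would also pick up bracketed text inside string literals or comments, so a production crawler would likely need a real SQL parser.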
Product Architecture
Moto Monitor App Engine (Web Service)
● default module: web pages, CSS, JS, etc.
● bqusage module: user requests
● worker module: CRON jobs
[Diagram labels: Google APIs, BigQuery, datastore (queries/tables information)]
Moto Monitor Browsing
Browse a table
https://moto-monitor.appspot.com/bq/info/{long table name}
e.g. https://moto-monitor.appspot.com/bq/info/motorola.com:analytics-data:activations.gcp_activations_shipments
Browse a job
https://moto-monitor.appspot.com/bq/jobinfo/{long job name}
e.g. https://moto-monitor.appspot.com/bq/jobinfo/bold-site-589:job_pz_J3anj2HjIz5AEX0_STXPtWb4
Browse a flow
https://moto-monitor.appspot.com/bq/flow/{long table name}
e.g. https://moto-monitor.appspot.com/bq/flow/motorola.com:analytics-data:devices.asn_r12
Browse your usage
https://moto-monitor.appspot.com/bq/about/me
How often is activations.gcp_foundation3 refreshed?
How is activations.gcp_foundation3 being populated?
Example: 1000 files limit bug for BQ Load (Apr 17th)
Moto Monitor BQ data:
SELECT
  STRING(MSEC_TO_TIMESTAMP(creationtime)) AS creationtime,
  id, User_email, DestinationProjectId, DestinationDatasetId, InputFiles
FROM [moto-monitor:usage.bq_raw]
WHERE creationtime>=TIMESTAMP_TO_MSEC(TIMESTAMP('2015-04-16 00:00:00'))
  AND JobType='Load'
  AND InputFiles>=1000
ORDER BY creationtime
Moto Monitor portal:
Project: motorola.com:ds-prod
Job: job_qrH3tNUgKT_R84PKWJXM_AATRPM
State: DONE
Statistics: CreationTime: Thu Apr 16 12:30:16 2015; StartTime: Thu Apr 16 12:33:41 2015; EndTime: Thu Apr 16 12:36:08 2015; Gate Time: 205259; Run Time: 146249; TotalBytesProcessed: 0; InputFileBytes: 134834356368; InputFiles: 2048; OutputBytes: 39957719968; OutputRows: 89549136; Dry Run flag: false
Load: AllowJaggedRows: false; AllowQuotedNewlines: false; IgnoreUnknownValues: false; Load.MaxBadRecords: 0; SkipLeadingRows: 0; SourceFormat: NEWLINE_DELIMITED_JSON; SourceUris: [gs://dspipeline-event-export/transfer/20150416_12/job_gdi_1429186260434*]
Project: motorola.com:ds-prod
Job: job_XWekiaqjKnHg2kpZQZp_BJbclS0
State: DONE
Statistics: CreationTime: Fri Apr 17 12:27:02 2015; StartTime: Fri Apr 17 12:28:48 2015; EndTime: Fri Apr 17 12:30:02 2015; Gate Time: 105788; Run Time: 74061; TotalBytesProcessed: 0; InputFileBytes: 67622530704; InputFiles: 1000; OutputBytes: 20037554506; OutputRows: 44919153; Dry Run flag: false
Load: AllowJaggedRows: false; AllowQuotedNewlines: false; IgnoreUnknownValues: false; Load.MaxBadRecords: 0; SkipLeadingRows: 0; SourceFormat: NEWLINE_DELIMITED_JSON; SourceUris: [gs://dspipeline-event-export/transfer/20150417_12/job_gdi_1429272660897*]
Which tables are impacted by motorola.com:analytics-data:devices.asn_r12?
Where can I find this great query I ran the other day?
https://moto-monitor.appspot.com/bq/about/me
https://moto-monitor.appspot.com/bq/about/me?before=2014-12-09
https://moto-monitor.appspot.com/bq/about/me?before=2014-12-09&limit=500
https://moto-monitor.appspot.com/bq/about/me?before=1418223600&limit=500
Moto Monitor is available in BigQuery too
Id STRING Unique Identifier for the job (ProjectId:JobId)
ProjectId STRING Project Id under which the job was run
JobId STRING Job Id
CreationTime INTEGER Unix Time when the job was submitted
StartTime INTEGER Unix Time when the job started
EndTime INTEGER Unix Time when the job finished
GateTime INTEGER Gating time in ms between CreationTime and StartTime
RunTime INTEGER Running time in ms between StartTime and EndTime
TotalBytesProcessed INTEGER Total Bytes scanned for the job
CacheHit BOOLEAN Boolean flag to indicate if cache was used
User STRING Email of user running the job
MD5 STRING MD5 of the full query
Query STRING Query truncated to 18,000 characters
Status STRING Status of the job (=DONE here)
AllowLargeResults BOOLEAN If true, allows the query to produce arbitrarily large result tables at a slight cost in performance. Requires destinationTable to be set.
Priority STRING Specifies a priority for the query. Possible values include INTERACTIVE and BATCH.
UseQueryCache BOOLEAN Boolean flag to indicate if cache was requested in the job
DestinationProjectId STRING Define the project where the results of the query will be written.
DestinationDatasetId STRING Define the dataset where the results of the query will be written.
DestinationTableId STRING Define the table where the results of the query will be written.
ErrorLocation STRING Specifies where the error occurred, if present.
ErrorMessage STRING A human-readable description of the error.
ErrorReason STRING A short error code that summarizes the error.
moto-monitor:usage.bq_raw
Who was using the tag MOT_DEVICE_STATS_L1 in the last 7 days?
How much bandwidth am I using in BigQuery?
Use the view moto-monitor:usage.bq_view
Beyond Queries, we also scan Tables
● bigquery.projects.list: List projects visible
● bigquery.datasets.list: List datasets within a project
● bigquery.tables.list: List tables within a dataset
● bigquery.tables.get: Get details about a table
[Diagram labels: datastore, queries information, user email, store details]
A snapshot of Table statistics is kept as well
moto-monitor:usage.daily_table (daily snapshot) or moto-monitor:usage.snapshot (latest manual snapshot with self-destruction after 3 days)
CreationTime INTEGER Table Creation Time (Unix)
Description STRING Table Description
Etag STRING NULLABLE
ExpirationTime INTEGER Expiration Time (Unix)
FriendlyName STRING Friendly name
Id STRING Unique Id
LastModifiedTime INTEGER Last Modified Time (Unix)
NumBytes INTEGER Number of bytes
NumRows INTEGER Number of rows
Fields STRING Schema definition
ProjectId STRING Project Id
DatasetId STRING Dataset Id
TableId STRING Table Id
Type STRING TABLE or VIEW
View STRING View Query
User STRING Last user who populated
Query STRING Last query used to populate
JobId STRING Last Job Id to populate
RefreshedTime INTEGER Last time it was populated
SnapshotTime STRING Snapshot timestamp
How did the size of a dataset grow over time?
How much space are my tables using?
BigQuery storage cost = $0.02 per GB per month, i.e. $0.68 per TB per day, i.e. $246 per TB per year
How much does my usage of BigQuery cost?
Storage Cost: $0.02 per GB per month, i.e. $0.68 per TB per day, i.e. $246 per TB per year
Query Cost:
● On-demand: $5 per TB
● Reserved capacity: $20,000 per month for a 5 GB/s unit, i.e. $1.58 per TB*
* Note: for continuous usage of the 5 GB/s bandwidth
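The $1.58 per TB figure for reserved capacity follows from dividing the monthly fee by the volume a 5 GB/s unit can scan when used continuously. The sketch below reproduces it, assuming a 30-day month and 1 TB = 1024 GB (these conventions are assumptions that happen to match the slide's number); the function names are illustrative.

```python
SECONDS_PER_DAY = 86_400

def reserved_cost_per_tb(monthly_fee_usd=20_000.0, gb_per_second=5.0,
                         days_per_month=30, gb_per_tb=1024):
    """Effective price per TB of a reserved unit that is busy 100% of the time."""
    tb_per_month = gb_per_second * SECONDS_PER_DAY * days_per_month / gb_per_tb
    return monthly_fee_usd / tb_per_month

def breakeven_tb_per_month(monthly_fee_usd=20_000.0, on_demand_usd_per_tb=5.0):
    """Monthly query volume above which a reserved unit is cheaper than on-demand."""
    return monthly_fee_usd / on_demand_usd_per_tb
```

Under these assumptions a fully used unit scans about 12,656 TB per month, and the reserved unit only pays off above 4,000 TB of queries per month at the $5 per TB on-demand price.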
How much does my usage of BigQuery cost?
Assuming that the Motorola bandwidth is elastic, i.e. we always pay for the optimal number of units (5 GB/s), we can use $1.58 per TB as a proxy.
Caveat: API Volume ~ Billing Volume <> Real Volume Used
Weekly Email to largest BQ users
Usage statistics
Caveat: API Volume ~ Billing Volume <> Real Volume Used
BigQuery outage?
Caveat: API Volume ~ Billing Volume <> Real Volume Used
What’s next with Moto Monitor?
Alert & Exceptions Report
Examples
Data Issue (illustrative)
[Chart: Number of Active Users using their camera in US; x-axis: time (day); y-axis: # Active Users; shaded Normal Band]
Possible Root Causes
● Some files don’t get loaded properly in BigQuery, creating gaps in user count.
● The instrumentation changed on the device
● Customer behavior
Business Issue (real life)
[Chart: Number of System Restarts in Brazil in Oct ‘14; y-axis: # System Restarts]
Real life Root Cause
A buggy Android app (Color Notes) doesn’t handle the timezone change in Brazil properly, crashing the devices.
Approach
1. Define a multi-dimensional cube with real data. For example: Day, Product, Market, # Users taking a picture
2. Each cell then becomes a time series
3. Clean the data (remove seasonality, weekday cycle and any other known perturbation)
4. Fit trend and establish volatility band (2 standard deviations)
5. Measure variance versus prediction for each cell (e.g. market/product/metric) and trigger an exception if outside band
6. Collect all exceptions into a matrix and apply fuzzy logic* to propose potential root causes (prescriptive analytics)
* Note: Bayesian likelihood with knowledge base
[Diagram: data cube with axes markets (e.g. BR), products and metrics]
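Steps 4 and 5 of the approach can be sketched for a single cell of the cube as follows. This is an illustration under simplifying assumptions: a straight-line trend fitted by least squares, a band of 2 standard deviations of the residuals, and input data that has already been cleaned of seasonality. The function names are hypothetical and the deck does not describe the actual fitting method.

```python
import statistics

def fit_linear_trend(values):
    """Least-squares fit of values against day index 0..n-1; returns (slope, intercept)."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean

def check_exception(history, actual, n_sigma=2.0):
    """Predict the next point from the trend and flag it if outside the volatility band."""
    slope, intercept = fit_linear_trend(history)
    residuals = [y - (intercept + slope * x) for x, y in enumerate(history)]
    sigma = statistics.stdev(residuals)
    predicted = intercept + slope * len(history)
    low, high = predicted - n_sigma * sigma, predicted + n_sigma * sigma
    return {"predicted": predicted, "band": (low, high),
            "exception": not (low <= actual <= high)}
```

Running `check_exception` over every market/product/metric cell and collecting the flagged cells would give the exception matrix described in step 6.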
Exceptions Report POC
https://moto-monitor.appspot.com/fcst/matrix
https://moto-monitor.appspot.com/fcst/matrix?d=2015-01-06
Note: During POC, access is strictly restricted
Exceptions Report POC
Real life example with Moto E in Spain
https://moto-monitor.appspot.com/fcst/trend?market=Spain&product=Moto%20E
Trend: where we should have been
Actuals: where we are
Story
An investigation with the Spain GTM team showed that this large increase is seasonal and due to “The Three Kings Day” (Día De Los Reyes Magos), when sales are usually larger than before the holidays.
Demo Exception Report
● Daily Email on Exceptions/Anomalies
● Online Report & Drilldown
● Immediate Learning & Findings
Juno Storm impact on Daily Activations: Daily Activation WoW on Jan 27th 2015
Self-Service Email System
How to democratize daily/weekly email reports with an App Engine solution?
Existing Situation
- Numerous teams use Spreadsheet to send weekly/daily emails
- Enables very agile development of the email body
- Ease of connection to BigQuery
- Can’t easily enable customization and open-rate tracking at a user level
- Can’t leverage advanced statistics (R in GCE)
Self-Service Email System
Email Widget (App Engine)
Underlying Widgets
● HTML Header template
● HTML Body template
● HTML Footer template
● SQL to produce data for Body
● gDrive Objects (image, attachment)
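The email widget combines three HTML templates with the rows returned by a SQL query. How the actual system renders them is not described in the deck; the sketch below is one hypothetical arrangement using Python's standard `string.Template`, where the body template is rendered once per data row. The class and function names are assumptions, and the query is stored but not executed here.

```python
from dataclasses import dataclass
from string import Template

@dataclass
class EmailWidget:
    header: str  # HTML header template
    body: str    # HTML body template, rendered once per data row
    footer: str  # HTML footer template
    sql: str     # query producing the rows for the body (not executed in this sketch)

def render_email(widget, rows, **context):
    """Fill the header and footer from context and the body from each row of data."""
    parts = [Template(widget.header).safe_substitute(context)]
    parts += [Template(widget.body).safe_substitute(row) for row in rows]
    parts.append(Template(widget.footer).safe_substitute(context))
    return "\n".join(parts)
```

Keeping the templates and the SQL together in one widget is what would let a team change the email content without touching application code. Note that this sketch does no HTML escaping of the data values.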