AWS Game Analytics - GDC 2014
DESCRIPTION
Use AWS to learn how much players love your game by analyzing in-game metrics to measure engagement and retention. Start simple by uploading data to S3 and analyzing it with Redshift. Add more game data sources and dive deeper with cohort analysis. Finally, I cover real-time analytics with Kinesis and Spark.
TRANSCRIPT
AWS Gaming Solutions | GDC 2014
Game Analytics with AWS
Or, How to learn what your players love so they will love your game
Nate Wiger @nateware | Principal Gaming Solutions Architect
Mobile Game Landscape
• Free To Play
• In-App Purchases
• Long-Tail
• Cross-Platform
• Go Global
• User Retention = Revenue
Projected Mobile App Revenue
[Chart: projected mobile app revenue by year, 2011 to 2017, broken down into Ads, IAP, and Paid; y-axis 0 to 90,000]
Source: Gartner
Winning at Free to Play
• Phase 1: Collect Data
• Phase 2: Analyze
• Phase 3: Profit
Analyze What?
Emotions
• Enjoying game
• Engaged
• Like/dislike new content
• Stuck on a level
• Bored
• Abandonment
Behaviors
• Hours played per day/week
• Number of sessions per day
• Level progression
• Friend invites/referrals
• Response to mobile push
• Money spent per week
Example: Level Progression (One Metric)
[Chart "Tries / Level": # of tries (0 to 10) for levels L1 through L10]
Example: Level Progression (Two Metrics)
[Chart "Tries / Level": # of Tries (0 to 10) and % Highest Level (0 to 60) for levels L1 through L10]
Key Takeaways
• Multiple data sources
• Correlate variables
• Deltas vs. absolutes
• Settle on terminology (game vs. level)
• Time matters
Events & Metrics
• Event = Moment in Time
  – Login/quit
  – Game start/end
  – Level up
  – In-app purchase
• Metrics = What to Measure
  – KISS
  – Numbers
  – Booleans
  – Strings (enums)
• Always Include (ALWAYS)
  – User
  – Action
  – Session (context-dependent)
  – Timestamp in ISO 8601, e.g. 2014-03-16T16:28:26
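A minimal Python sketch of one event record carrying the always-include fields. The field names (`user`, `action`, `session`, `timestamp`) and the choice to store UTC with an explicit offset are assumptions for illustration, not a required schema.

```python
from datetime import datetime, timezone


def make_event(user, action, session, when=None):
    """Build one analytics event with the always-include fields.

    Field names are illustrative, not a fixed schema.
    """
    if when is None:
        when = datetime.now(timezone.utc)
    return {
        "user": user,
        "action": action,
        "session": session,
        # ISO 8601 with an explicit offset, e.g. 2014-03-16T16:28:26+00:00
        "timestamp": when.isoformat(timespec="seconds"),
    }


event = make_event("nateware", "login", "e4df",
                   datetime(2014, 3, 16, 16, 28, 26, tzinfo=timezone.utc))
# event["timestamp"] == "2014-03-16T16:28:26+00:00"
```

Writing the offset (`+00:00`) avoids ambiguity when players span time zones; the example timestamp on the slide omits it.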
Off The Shelf Analytics
• Easy To Integrate
• Pre-Baked Reports
• Rate Limits
• Retention Windows
• Data Lock-In
Ok, A Real Business Plan
Ingest Store Process Analyze
Ok, A Real Business Plan
Ingest: HTTP PUT, Kafka, Kinesis, Scribe
Store: S3, DynamoDB, HDFS, Redshift
Process: EMR (Hadoop), Spark, Storm
Analyze: Tableau, Pentaho, Jaspersoft
Start Simple
• Write Events File on Device
• Periodically Upload to S3
• Process into Redshift
• Point GUI Tool to Redshift
2014-01-24,nateware,e4df,login
2014-01-24,nateware,e4df,gamestart
2014-01-24,nateware,e4df,gameend
2014-01-25,nateware,a88c,login
2014-01-25,nateware,a88c,friendlist
2014-01-25,nateware,a88c,gamestart
Profit!
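The "write on device, upload periodically" flow can be sketched as a small buffer that emits CSV in the `date,user,session,action` layout above. The `EventLog` class, its method names, and the `events/<date>/<device>-<seq>.csv` key layout are hypothetical; the actual upload (for example, with an AWS SDK using the STS credentials from the plumbing steps) is left out.

```python
import csv
import io
from datetime import date


class EventLog:
    """Hypothetical on-device buffer for date,user,session,action rows."""

    def __init__(self, device_id):
        self.device_id = device_id
        self.rows = []
        self.batch_seq = 0

    def record(self, day, user, session, action):
        # Store the date as YYYY-MM-DD, matching the sample rows.
        self.rows.append((day.isoformat(), user, session, action))

    def flush(self):
        """Return (s3_key, csv_text) for pending rows and clear the buffer,
        or None if nothing is pending. The key layout is an assumption."""
        if not self.rows:
            return None
        buf = io.StringIO()
        csv.writer(buf, lineterminator="\n").writerows(self.rows)
        key = f"events/{self.rows[0][0]}/{self.device_id}-{self.batch_seq:06d}.csv"
        self.rows.clear()
        self.batch_seq += 1
        return key, buf.getvalue()


log = EventLog("device42")
log.record(date(2014, 1, 24), "nateware", "e4df", "login")
log.record(date(2014, 1, 24), "nateware", "e4df", "gamestart")
key, body = log.flush()
# key  == "events/2014-01-24/device42-000000.csv"
# body == "2014-01-24,nateware,e4df,login\n2014-01-24,nateware,e4df,gamestart\n"
```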
Redshift at a Glance
[Diagram: SQL clients/BI tools connect via JDBC/ODBC to the leader node; the leader node coordinates compute nodes (each 128GB RAM, 16TB disk, 16 cores) over a 10 GigE (HPC) interconnect; ingestion, backup, and restore run against Amazon S3/DynamoDB]
• Leader Node
  – SQL endpoint
  – Stores metadata
  – Coordinates query execution
• Compute Nodes
  – Columnar table storage
  – Load, backup, restore via Amazon S3
  – Parallel load from Amazon DynamoDB
• Single node version available
Tableau + Redshift
Plumbing
① Create an S3 bucket ("mygame-analytics-events")
② Request a security token for your mobile app: http://docs.aws.amazon.com/STS/latest/UsingSTS/Welcome.html
③ Upload data from your users' devices
④ Run a scheduled copy to Redshift
⑤ Set up Tableau to access Redshift
⑥ Go to the Beach
Loading Redshift from S3
copy events
from 's3://mygame-analytics-events'
credentials 'aws_access_key_id=<access-key-id>;aws_secret_access_key=<secret-access-key>'
delimiter ',';
Scheduled Redshift Load using Data Pipeline: http://aws.amazon.com/articles/1143507459230804
More Data Sources
• Also Collect Server Logs
• Periodically Upload to S3
• Stuff into Redshift
• External Analytics Data Too
[Diagram: EC2 server logs and External Analytics as additional data sources]
Logrotate to S3
/var/log/apache2/*.log {
    compress
    sharedscripts
    postrotate
        /usr/sbin/apache2ctl graceful
        s3cmd sync /var/log/apache2/*.gz s3://mygame-logs/
    endscript
}
Blog entry on log rotation: http://www.dowdandassociates.com/blog/content/howto-rotate-logs-to-s3/
And/or, use ELB access logs: http://docs.aws.amazon.com/ElasticLoadBalancing/latest/DeveloperGuide/access-log-collection.html
Dealing With Messy Data
• Different File Formats
• Device vs Apache vs CDN
• Cleanup with EMR Job
• Output to Clean Bucket
• Load into Redshift
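The per-record cleanup that an EMR job would run can be sketched in plain Python: parse each source's format and emit one common record shape. The regex targets the start of Apache's common/combined log format; the output field names and the mapping of request path to `action` are assumptions for illustration.

```python
import csv
import re
from datetime import datetime

# Start of Apache's common/combined log format:
# host ident authuser [time] "METHOD path protocol" status bytes ...
APACHE_RE = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)


def from_apache(line):
    """Normalize one Apache access-log line, or return None if it doesn't parse."""
    m = APACHE_RE.match(line)
    if m is None:
        return None  # in a real job, route rejects somewhere for inspection
    ts = datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z")
    return {"timestamp": ts.isoformat(), "user": m["user"], "session": None,
            "action": f'{m["method"]} {m["path"]}', "source": "apache"}


def from_device(line):
    """Normalize one device row in the date,user,session,action layout."""
    day, user, session, action = next(csv.reader([line]))
    return {"timestamp": day, "user": user, "session": session,
            "action": action, "source": "device"}
```

Both functions return the same keys, so their output can be written to the clean bucket and loaded into one Redshift table.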
Redshift vs Elastic MapReduce
Redshift
• Columnar DB
• Familiar SQL
• Structured Data
• Batch Load
• Faster to Query
• Long-term Storage
Elastic MapReduce
• Hadoop
• Hive/Pig are SQL-like
• Unstructured Data
• Streaming Loop
• Scales past petabytes
• Transient
Direct From DynamoDB
• Integrate Game DB
• Load Directly into Redshift
• Redshift does Intelligent Merge
• Tracks Hash Keys, Columns
• Or Stream into EMR
Loading Redshift from DynamoDB
copy games
from 'dynamodb://games'
credentials 'aws_access_key_id=<access-key-id>;aws_secret_access_key=<secret-access-key>'
-- readratio is required for DynamoDB loads; 50 (% of provisioned read throughput) is an example value
readratio 50;

copy events
from 's3://mygame-analytics-events'
credentials 'aws_access_key_id=<access-key-id>;aws_secret_access_key=<secret-access-key>'
delimiter ',';
Funnel Cake
Back To Basics
2014-01-24,nateware,e4df,login
2014-01-24,nateware,e4df,gamestart
2014-01-24,nateware,e4df,gameend
2014-01-25,nateware,a88c,login
2014-01-25,nateware,a88c,friendlist
2014-01-25,nateware,a88c,gamestart
Measure Retention: Repeated Plays
create view events_by_user_by_month as
select user_id,
       date_trunc('month', event_date) as month_active,
       count(*) as total_events
from events
group by user_id, month_active;
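For readers who want to check the view's logic outside Redshift, a Python analogue of `events_by_user_by_month`, assuming events arrive as hypothetical `(user_id, event_date)` pairs of `str` and `datetime.date`:

```python
from collections import Counter
from datetime import date


def events_by_user_by_month(events):
    """Count events per (user_id, first day of month), like the SQL view."""
    counts = Counter()
    for user_id, event_date in events:
        month_active = event_date.replace(day=1)  # like date_trunc('month', ...)
        counts[(user_id, month_active)] += 1
    return counts
```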
First-Pass Retention – Too Noisy
[Chart: # play sessions per month (0 to 40) for individual players nateware, Lazyd0g, AK187, 3strikes]
Cohorts & Cambria
• Enables calculating relative metrics
• Group users by a common attribute
  – Month game installed
  – Demographics
• Run analysis by cohort
  – Join with metrics
• Use Redshift, since it's SQL
  – Example of where SQL is a good fit
Creating Cohorts with Redshift
create view cohort_by_first_event_date as
select user_id,
       date_trunc('month', min(event_date)) as first_month
from events
group by user_id;
http://snowplowanalytics.com/analytics/customer-analytics/cohort-analysis.html
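A Python sketch of the cohort view joined back to activity, assuming the same hypothetical `(user_id, event_date)` pairs. Unlike the retention chart, which plots sessions per week, this counts distinct active users per month since each user's first month; changing the bucketing functions switches between the two.

```python
from collections import defaultdict
from datetime import date


def month_of(d):
    return d.replace(day=1)


def cohort_by_first_event_date(events):
    """user_id -> month of that user's earliest event (mirrors the view)."""
    first = {}
    for user_id, event_date in events:
        if user_id not in first or event_date < first[user_id]:
            first[user_id] = event_date
    return {u: month_of(d) for u, d in first.items()}


def months_between(start, end):
    return (end.year - start.year) * 12 + (end.month - start.month)


def retention_by_cohort(events):
    """cohort month -> {months since cohort month: distinct active users}."""
    cohorts = cohort_by_first_event_date(events)
    active = defaultdict(lambda: defaultdict(set))
    for user_id, event_date in events:
        cohort = cohorts[user_id]  # the "join" with the cohort table
        offset = months_between(cohort, month_of(event_date))
        active[cohort][offset].add(user_id)
    return {c: {k: len(v) for k, v in sorted(offs.items())}
            for c, offs in active.items()}
```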
Retention by Cohort – Join Events with Cohort
[Chart: # sessions per week (0 to 25), x-axis Week 1 to Week 7, one series per monthly cohort: 2013-11, 2013-12, 2014-01, 2014-02, 2014-03, 2014-04]
Moar Cohorts
• Define multiple cohorts
  – By activity, time, demographics
  – As many as you like
• Change cohort depending on analysis
• Join same metrics with different cohorts
  – Retention by date
  – Retention by demographic
  – Retention by average plays/month quartile
Example Event Stream
2014-03-17T09:52:08-07:00,nateware,e4b5,login
2014-03-17T09:52:54-07:00,nateware,e4b5,gamestart
2014-03-17T09:53:15-07:00,nateware,e4b5,levelup
2014-03-17T09:54:06-07:00,nateware,e4b5,gameend
2014-03-17T09:54:23-07:00,nateware,30a4,gamestart
2014-03-17T09:55:14-07:00,nateware,30a4,gameend
2014-03-17T09:55:41-07:00,nateware,30a4,gamestart
2014-03-17T09:57:12-07:00,nateware,6ebd,levelup
2014-03-17T09:58:50-07:00,nateware,6ebd,levelup
2014-03-17T09:59:52-07:00,nateware,6ebd,gameend
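Treating the third column as the session id (one of the always-include fields), the stream above can be rolled up per session. This is an illustrative sketch rather than a prescribed pipeline; `summarize_sessions` is a hypothetical helper.

```python
import csv
from datetime import datetime


def summarize_sessions(lines):
    """Roll up 'timestamp,user,session,action' rows per session id:
    owner, seconds from first to last event, and a count per action."""
    sessions = {}
    for ts, user, session, action in csv.reader(lines):
        when = datetime.fromisoformat(ts)  # handles offsets like -07:00
        s = sessions.setdefault(
            session, {"user": user, "start": when, "end": when, "actions": {}})
        s["start"] = min(s["start"], when)
        s["end"] = max(s["end"], when)
        s["actions"][action] = s["actions"].get(action, 0) + 1
    return {
        sid: {"user": s["user"],
              "seconds": int((s["end"] - s["start"]).total_seconds()),
              "actions": s["actions"]}
        for sid, s in sessions.items()
    }
```

On this sample, session `30a4` ends with a `gamestart` and no `gameend`, so its 78 seconds is a lower bound; real code needs a timeout rule for sessions that never close.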
Cohorts by Type of Activity
create view cohort_by_first_play_date as
select user_id,
       date_trunc('month', min(event_date)) as first_month
from events
where action = 'gamestart'
group by user_id;
Post-Match Heatmaps
Real-Time Analytics
Batch
• What game modes do people like best?
• How many people have downloaded DLC pack 2?
• Where do most people die on map 4?
• How many daily players are there on average?
Real-Time
• What game modes are people playing now?
• Are more or fewer people downloading DLC today?
• Are people dying in the same places? Different ones?
• How many people are playing today? What's the variance?
Why Real-Time Analytics?
30x in 24 hours
What if you ran a promo?
Real-Time Tools
Spark
• High-performance Hadoop alternative
• From UC Berkeley (AMPLab)
• Compatible with HiveQL
• Up to 100x faster than Hadoop MapReduce (in memory)
• Runs on EMR
Kinesis
• Amazon's fully managed streaming data layer
• Similar to Kafka
• Streams contain shards
• Each shard ingests up to 1MB/sec and 1000 TPS
• Data stored for 24 hours
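The per-shard limits above make capacity a small calculation: a stream needs enough shards for whichever limit, bytes or records, is hit first. This sketch hard-codes the 1MB/sec and 1000 TPS figures from the slide and treats 1MB as 1024KB; check current Kinesis limits before relying on them.

```python
import math

# Per-shard ingest limits as listed on the slide; verify against current docs.
SHARD_MB_PER_SEC = 1.0
SHARD_RECORDS_PER_SEC = 1000


def shards_needed(records_per_sec, avg_record_kb):
    """Smallest shard count that satisfies both the byte and record limits."""
    mb_per_sec = records_per_sec * avg_record_kb / 1024  # assumes 1MB = 1024KB
    by_bytes = math.ceil(mb_per_sec / SHARD_MB_PER_SEC)
    by_records = math.ceil(records_per_sec / SHARD_RECORDS_PER_SEC)
    return max(1, by_bytes, by_records)
```

For example, a 30x promo spike from 200 to 6000 events/sec at 0.5KB each is record-bound: `shards_needed(6000, 0.5)` gives 6, while 1000 events/sec at 4KB each is byte-bound and needs 4.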
Back To Basics [Dubstep Remix]
• Always Batch Due to S3
Need Data Faster!
• Stream Data With Kinesis
• Multiple Writers and Readers
• Still Output to Redshift
Lots of Ins and Outs
• Stream Data With Kinesis
• Multiple Writers and Readers
• Still Output to Redshift
• Stream to Spark on EMR
• Storm via Kinesis Spout
• Custom EC2 Workers
Introducing Amazon Kinesis: Service for Real-Time Big Data Ingestion
[Diagram: data sources send records through an AWS endpoint into a stream of shards (Shard 1, Shard 2, ... Shard N) spanning three Availability Zones; consumer applications read the stream: App.1 (Aggregate & De-Duplicate), App.2 (Metric Extraction), App.3 (Sliding Window Analysis), App.4 (Machine Learning); S3, DynamoDB, and Redshift appear as destinations]
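App.3 in the diagram is labeled sliding-window analysis. A toy in-memory version, assuming events arrive in timestamp order as `(epoch_seconds, player_id)`, answers "how many distinct players in the last N seconds?". `SlidingWindowCounter` is a hypothetical name, and a production consumer would read records from a Kinesis shard rather than receive direct method calls.

```python
from collections import deque


class SlidingWindowCounter:
    """Distinct players seen in the last `window_s` seconds (toy, in-memory).

    Assumes events arrive in timestamp order, with timestamps in epoch seconds.
    """

    def __init__(self, window_s):
        self.window_s = window_s
        self.events = deque()  # (timestamp, player_id), oldest first
        self.counts = {}       # player_id -> events still inside the window

    def add(self, ts, player_id):
        self.events.append((ts, player_id))
        self.counts[player_id] = self.counts.get(player_id, 0) + 1
        self._evict(ts)

    def _evict(self, now):
        # Drop events at or before now - window_s.
        while self.events and self.events[0][0] <= now - self.window_s:
            _, old = self.events.popleft()
            self.counts[old] -= 1
            if self.counts[old] == 0:
                del self.counts[old]

    def active_players(self, now=None):
        if now is not None:
            self._evict(now)
        return len(self.counts)
```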
Putting Data into Kinesis
• Producers use PUT to send data to a Stream
• PutRecord {Data, PartitionKey, StreamName}
• Partition Key distributes PUTs across Shards
• Unique Sequence # returned on PUT call
• Documentation: http://docs.aws.amazon.com/kinesis/latest/dev/introduction.html
[Diagram: many producers PUT records into a Kinesis stream made of Shard 1 through Shard n]
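Kinesis hashes each partition key with MD5 into a 128-bit integer, and each shard owns a contiguous range of that hash key space. Assuming the shards split the space into roughly equal ranges (as evenly created shards do), the key-to-shard mapping can be approximated as below; `shard_for` is a hypothetical helper, not part of any AWS SDK, and exact range boundaries may differ by rounding.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 yields a 128-bit value


def shard_for(partition_key, shard_count):
    """Index of the shard whose (assumed equal-sized) hash key range
    contains the MD5 of partition_key."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    h = int.from_bytes(digest, "big")
    return h * shard_count // HASH_SPACE
```

Using a `game_id` as the partition key keeps all of one match's events on the same shard; using a player id spreads load more evenly across shards.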
Writing to a Kinesis Stream
POST / HTTP/1.1
Host: kinesis.<region>.<domain>
x-amz-Date: <Date>
Authorization: AWS4-HMAC-SHA256 Credential=<Credential>, SignedHeaders=content-type;date;host;user-agent;x-amz-date;x-amz-target;x-amzn-requestid, Signature=<Signature>
User-Agent: <UserAgentString>
Content-Type: application/x-amz-json-1.1
Content-Length: <PayloadSizeBytes>
Connection: Keep-Alive
X-Amz-Target: Kinesis_20131202.PutRecord

{
  "StreamName": "exampleStreamName",
  "Data": "XzxkYXRhPl8x",
  "PartitionKey": "partitionKey"
}
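The request body above is plain JSON in which `Data` is the base64 encoding of the raw record bytes (in the example, `_<data>_1`). A sketch of building that body, leaving out SigV4 request signing, which an AWS SDK would normally handle; `put_record_body` is a hypothetical helper.

```python
import base64
import json


def put_record_body(stream_name, data, partition_key):
    """JSON body for a Kinesis PutRecord call; `data` is raw bytes."""
    return json.dumps({
        "StreamName": stream_name,
        "Data": base64.b64encode(data).decode("ascii"),
        "PartitionKey": partition_key,
    })
```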
Kinesis + Spark
http://aws.amazon.com/articles/4926593393724923
Death in Real-Time
PUT "kills" {"game_id":"e4b5","map":"Boston","killer":38,"victim":39,"coord":"274,591,48"}
PUT "kills" {"game_id":"e4b5","map":"Boston","killer":13,"victim":27,"coord":"101,206,35"}
PUT "kills" {"game_id":"e4b5","map":"Boston","killer":38,"victim":39,"coord":"165,609,17"}
PUT "kills" {"game_id":"e4b5","map":"Boston","killer":6,"victim":29,"coord":"120,422,26"}
PUT "kills" {"game_id":"30a4","map":"Los Angeles","killer":34,"victim":18,"coord":"163,677,18"}
PUT "kills" {"game_id":"30a4","map":"Los Angeles","killer":20,"victim":37,"coord":"71,473,20"}
PUT "kills" {"game_id":"30a4","map":"Los Angeles","killer":21,"victim":19,"coord":"332,381,17"}
PUT "kills" {"game_id":"30a4","map":"Los Angeles","killer":0,"victim":10,"coord":"14,108,25"}
PUT "kills" {"game_id":"6ebd","map":"Seattle","killer":32,"victim":18,"coord":"13,685,32"}
PUT "kills" {"game_id":"6ebd","map":"Seattle","killer":7,"victim":14,"coord":"16,233,16"}
PUT "kills" {"game_id":"6ebd","map":"Seattle","killer":27,"victim":19,"coord":"16,498,29"}
PUT "kills" {"game_id":"6ebd","map":"Seattle","killer":1,"victim":38,"coord":"138,732,21"}
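A real-time heatmap is, at its core, a count of events per grid cell. This sketch assumes `coord` is `x,y,z` and bins on `x,y` only; the cell size and the `heatmap` helper are illustrative choices, and a streaming consumer would update the counts record by record as they arrive from Kinesis.

```python
import json
from collections import Counter


def heatmap(payloads, cell=100):
    """Count kills per (map, cell_x, cell_y) from JSON kill records.

    Assumes "coord" is "x,y,z" and bins on x and y only.
    """
    grid = Counter()
    for payload in payloads:
        kill = json.loads(payload)
        x, y, _z = (int(v) for v in kill["coord"].split(","))
        grid[(kill["map"], x // cell, y // cell)] += 1
    return grid
```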
Real-Time Heatmaps
Put A Bow On It
• Collect data from the start
• Store it even if you can't process it (yet)
• Start simple: S3 + Redshift
• Add data sources: process with EMR
• Real-time: Kinesis + Spark
• Tons of untapped potential for gaming
Fallback Plan
Cheers – Nate Wiger @nateware