Amazon Redshift, Customer Acquisition Cost & Advertising ROI Presented with Aggregate Knowledge
DESCRIPTION
In today's world, consumer habits change fast, and marketing decisions need to be made within seconds, not days. Delivering engaging advertising experiences requires real-time, high-performing architectures that let digital advertisers measure and improve the performance of their campaigns and tie them more closely to corporate goals. The insights gleaned from the massive amounts of data collected can then be used to dynamically adjust media spend and creative execution for optimal performance. The AWS Cloud enables you to deliver marketing content and advertisements with the levels of availability, performance, and personalization that your customers expect. Plus, AWS lowers your costs. Join us to learn how big data and low-latency, high-performing architectures are changing the game for digital advertising.
TRANSCRIPT
October 3, 2013
Amazon Redshift, Customer Acquisition Cost & Advertising ROI
Rahul Pathak, AWS (@rahulpathak)
Timon Karnezos, Aggregate Knowledge
AWS Database Services
• Amazon RDS: Fully managed SQL database service for OLTP workloads
• Amazon DynamoDB: Fully managed NoSQL service for massively scalable, high throughput, low latency workloads
• Amazon Redshift: Fully managed, fast and powerful, petabyte-scale data warehouse service
• Amazon ElastiCache: Fully managed, Memcached-compliant in-memory caching service
Data Warehousing the AWS way
• Deploy: easy to provision
• Pay as you go, no upfront costs
• Fast, cheap, easy to use
• SQL
• Fastest growing service in AWS history
• 1,000+ customers; adding over a hundred a week
• Over 20 partners; adding one a week
• SOC1, SOC2, PCI certification obtained with more on the way
• Available in US East (N. Virginia), US West (Oregon), EU (Ireland), Asia Pacific (Tokyo), with more regions coming soon
Progress since launch on Feb 14, 2013
• LZO/LZOP compression support
• JSON, Regex, Cursors
• UTF-8 4 byte and invalid character substitution
• CRC32, SHA1, MD5
• Statement and workload queue timeouts
• Time zone support
• JDBC Fetch size
• UNLOAD encrypted files
New features since launch on Feb 14, 2013
Full list: http://docs.aws.amazon.com/redshift/latest/dg/doc-history.html
Common Customer Use Cases
Traditional Enterprise DW
• Reduce costs by extending DW rather than adding HW
• Migrate completely from existing DW systems
• Respond faster to business; provision in minutes
Companies with Big Data
• Improve performance by an order of magnitude
• Make more data available for analysis
• Access business data via standard reporting tools
SaaS Companies
• Add analytic functionality to applications
• Scale DW capacity as demand grows
• Reduce HW & SW costs by an order of magnitude
• Customer acquisition
– Ad spend
– Traffic sources
• Customer behavior
– Clickstream
– Referrals, sharing
– Actions taken
• Lifetime value
– Conversions
– Churn rate
Digital marketing and advertising use cases
[Diagram labels: Amazon S3, Amazon EMR, Amazon Redshift, DynamoDB, Amazon RDS, JDBC/ODBC]
Amazon Redshift Customers
“[Amazon Redshift] took an industry famous for its opaque pricing, high TCO and unreliable results and completely turned it on its head.”
“Redshift is twenty times faster than Hive…The cost saving is even more impressive…Our analysts like [it] so much they don’t want to go back.”
“Queries that used to take hours came back in seconds. Our analysts are orders of magnitude more productive.”
“We saw 50% reduction in costs with 2x improvement in query times.”
“We use Redshift anytime we need fast, interactive analysis.”
Amazon Redshift Customers
“When we want to answer a question with Redshift, we just write a SQL query and get an answer within a few minutes – if not seconds.”
“[We] run queries up to 50 times faster than our current OLAP solution.”
“Customers can get consistent, accurate, and useful data fast - in weeks not months or years.”
“Did I mention it's ridiculously fast? We'll be using it immediately to provide our analysts an alternative to Hadoop.”
“Team played with Redshift today and concluded it is ****** awesome. Un-indexed complex queries returning in < 10s.”
• Leader Node
– SQL endpoint
– Stores metadata
– Coordinates query execution
• Compute Nodes
– Local, columnar storage
– Execute queries in parallel
– Load, backup, restore via Amazon S3
– Parallel load from Amazon DynamoDB
• Single node version available
Amazon Redshift architecture
10 GigE (HPC)
Ingestion Backup Restore
JDBC/ODBC
• Optimized for I/O intensive workloads
• High disk density
• Runs in HPC - fast network
• HS1.8XL available on Amazon EC2
Amazon Redshift runs on optimized hardware
HS1.8XL: 128 GB RAM, 16 Cores, 24 Spindles, 16 TB compressed user storage, 2 GB/sec scan rate
HS1.XL: 16 GB RAM, 2 Cores, 3 Spindles, 2 TB compressed customer storage
Amazon Redshift parallelizes and distributes everything
• Query
• Load
• Backup/Restore
• Resize
• Load in parallel from Amazon S3 or Amazon DynamoDB
• Columnar storage, automatic compression
• Data automatically distributed and sorted according to DDL
• Scales linearly with number of nodes
• Backups to Amazon S3 are automatic, continuous and incremental
• Configurable system snapshot retention period
• Take user snapshots on-demand
• Streaming restores enable you to resume querying faster
• Resize while remaining online
• Provision a new cluster in the background
• Copy data in parallel from node to node
• Only charged for source cluster
• Automatic SQL endpoint switchover via DNS
• Decommission the source cluster
• Simple operation via AWS Console or API
Extra Large Node (HS1.XL) 3 spindles, 2 TB, 16 GB RAM, 2 cores
Single Node (2 TB)
Cluster 2-32 Nodes (4 TB – 64 TB)
Amazon Redshift lets you start small and grow big
Eight Extra Large Node (HS1.8XL) 24 spindles, 16 TB, 128 GB RAM, 16 cores, 10 GigE
Cluster 2-100 Nodes (32 TB – 1.6 PB)
Note: Nodes not to scale
HS1.XL Single Node    Price Per Hour    Effective Hourly Price per TB    Effective Annual Price per TB
On-Demand             $0.850            $0.425                           $3,723
1 Year Reservation    $0.500            $0.250                           $2,190
3 Year Reservation    $0.228            $0.114                           $999
Amazon Redshift is priced to let you analyze all your data
Simple Pricing
Number of Nodes x Cost per Hour
No charge for Leader Node
No upfront costs
Pay as you go
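The effective per-TB figures in the pricing table follow from simple arithmetic on the hourly node price. The sketch below reproduces them, assuming 2 TB of storage per HS1.XL node (as stated on the hardware slide) and an 8,760-hour year; the function names are illustrative, not part of any AWS API.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours; leap years ignored (assumption)

def effective_prices(node_price_per_hour, node_tb=2.0):
    """Return (hourly price per TB, annual price per TB) for one node."""
    hourly_per_tb = node_price_per_hour / node_tb
    annual_per_tb = hourly_per_tb * HOURS_PER_YEAR
    return hourly_per_tb, annual_per_tb

def cluster_cost_per_hour(num_nodes, node_price_per_hour):
    """Simple pricing from the slide: number of nodes x cost per hour."""
    return num_nodes * node_price_per_hour

# Hourly HS1.XL rates taken from the pricing table
RATES = {
    "On-Demand": 0.850,
    "1 Year Reservation": 0.500,
    "3 Year Reservation": 0.228,
}

for plan, rate in RATES.items():
    hourly, annual = effective_prices(rate)
    print(f"{plan}: ${hourly:.3f}/TB/hour, ${annual:,.0f}/TB/year")
```

Because the leader node is not charged, `cluster_cost_per_hour` only needs the number of compute nodes.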
• Provision in minutes
• Monitor query performance
• Point and click resize
• Built in security
• Automatic backups
Amazon Redshift is easy to use
Amazon Redshift continuously backs up your data and recovers from failures
• Replication within the cluster and backup to Amazon S3 to maintain multiple copies of data at all times
• Backups to Amazon S3 are continuous, automatic, and incremental
– Designed for eleven nines of durability
• Continuous monitoring and automated recovery from failures of drives and nodes
• Able to restore snapshots to any Availability Zone within a region
• SSL to secure data in transit
• Encryption to secure data at rest
– AES-256; hardware accelerated
– All blocks on disks and in Amazon S3 encrypted
• No direct access to compute nodes
• Amazon VPC support
Amazon Redshift has security built-in
10 GigE (HPC)
Ingestion Backup Restore
Customer VPC
Internal Security Group
JDBC/ODBC
MTA and Redshift
Understanding the Cost of Customer Acquisition and Marketing ROI
Timon Karnezos | @timonk
The Future of Digital Advertising with Cloud Computing
San Francisco, CA – 10/03/2013
MTA vs. LTA
Framing MTA
Browsing the Web – Monday: tracking impression (Site A)
Browsing the Web – Tuesday: tracking impression (Site A)
Search – Wednesday: tracking impression (Search)
Convert – Wednesday: conversion
View Chains by Site
[Diagram: separate timelines for Site A and Search, ending at the Wednesday conversion]
Properties of the Conversion Chains
Position: 3, Day: 2
Position: 2, Day: 1
Position: 1, Day: 0 (same as conv.)
Conversion: Wednesday
Chain Length: 3 (touches)
Chain Age: 2 (days)
Last-Touch Attribution (LTA)
Site A: 0 / 1 = 0.0 Conversions
Search: 1 / 1 = 1.0 Conversions
Position: 1, Day: 0 (same as conv.)
Multi-Touch Attribution – 2 Positions (Touches)
Site A: 1 / 2 = 0.5 Conversions
Search: 1 / 2 = 0.5 Conversions
Position: 2, Day: 1
Position: 1, Day: 0 (same as conv.)
Multi-Touch Attribution – 3 Positions (Touches)
Site A: 2 / 3 = 0.67 Conversions
Search: 1 / 3 = 0.33 Conversions
Position: 3, Day: 2
Position: 2, Day: 1
Position: 1, Day: 0 (same as conv.)
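The fractions on the attribution slides can be reproduced with a few lines of code. This is a minimal sketch, assuming each credited touch receives an equal share of the conversion (which matches the slide numbers); the `attribute` function and its parameters are hypothetical names, not Aggregate Knowledge's implementation.

```python
def attribute(chain, max_positions):
    """Split one conversion evenly across the most recent touches.

    chain: site names ordered most recent first (position 1 first).
    max_positions: how many touches earn credit (1 = last-touch attribution).
    """
    credit = {site: 0.0 for site in chain}  # every site starts with no credit
    credited = chain[:max_positions]
    for site in credited:
        credit[site] += 1.0 / len(credited)
    return credit

# The example user: Search on Wednesday, Site A on Tuesday and Monday
chain = ["Search", "Site A", "Site A"]

lta = attribute(chain, 1)   # last-touch: only position 1 is credited
mta2 = attribute(chain, 2)  # two positions share the conversion
mta3 = attribute(chain, 3)  # three positions share the conversion
print(lta, mta2, mta3)
```

Widening `max_positions` moves credit from Search back toward Site A, which is the shift the three slides illustrate.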
Richer model space…
More nuanced…
Adaptable to client’s business!
Why do we sell MTA?
1. Build user sessions (chains) by site
2. Window over report period
3. Assign credit to sites
4. Aggregate by day
Why is MTA hard?
Daily scale
~10^9 impressions, cookies
~10^7 conversions
~10^4 sites
x 90 per report
Redshift is MPP
Why is MTA easier with RS?
Fast
columnar scans
~10^9 rows in ~10s
Sorting
as main index
100k/s/$ load
Even
work distribution
Cookie ID shards well
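To illustrate why a high-cardinality key such as a cookie ID spreads work evenly, the sketch below hashes synthetic IDs into a fixed number of shards and counts rows per shard. It is an illustration only: the slides do not say which hash Redshift uses internally, so SHA-256 and the `shard_for` helper are assumptions, and the cookie IDs are made up.

```python
import hashlib
from collections import Counter

def shard_for(cookie_id: str, num_shards: int) -> int:
    """Map a cookie ID to a shard using a stable hash (illustrative choice)."""
    digest = hashlib.sha256(cookie_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

def shard_counts(cookie_ids, num_shards):
    """Count how many cookie IDs land on each shard."""
    counts = Counter(shard_for(c, num_shards) for c in cookie_ids)
    return [counts.get(i, 0) for i in range(num_shards)]

# Synthetic cookie IDs; real IDs would come from the impression logs
cookies = [f"cookie-{i}" for i in range(100_000)]
counts = shard_counts(cookies, 8)
print(counts)
print("max/min ratio:", max(counts) / min(counts))
```

With many distinct keys, each shard receives close to the same number of rows, so no single node becomes a bottleneck.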
Redshift is SQL
Why is MTA easier with RS?
Logical
decomposition
CTEs cut complexity
Powerful
aggregates
COUNT DISTINCT works
Window
functions
Market basket is easy
Example: CTE
WITH chains AS (
    -- Rank each user's impressions per advertiser, most recent first
    SELECT campaign_id, site_id,
           DENSE_RANK() OVER (
               PARTITION BY user_id, advertiser_id
               ORDER BY record_date DESC
           ) AS position
    FROM impressions
    WHERE record_date >= 'YYYY-MM-DD'
      AND record_date <  'YYYY-MM-DD'
)
-- Count impressions by campaign, site, and chain position
SELECT campaign_id, site_id, position, COUNT(*) AS ct
FROM chains
GROUP BY 1, 2, 3;
Example: Window Function
WITH chains AS (
    -- Rank each user's impressions per advertiser, most recent first
    SELECT campaign_id, site_id,
           DENSE_RANK() OVER (
               PARTITION BY user_id, advertiser_id
               ORDER BY record_date DESC
           ) AS position
    FROM impressions
    WHERE record_date >= 'YYYY-MM-DD'
      AND record_date <  'YYYY-MM-DD'
)
-- Count impressions by campaign, site, and chain position
SELECT campaign_id, site_id, position, COUNT(*) AS ct
FROM chains
GROUP BY 1, 2, 3;
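For readers less familiar with window functions, here is a rough pure-Python equivalent of what the query above computes. It is a sketch under stated assumptions: rows are plain dicts with the same column names as the query, dates are ISO strings, the date-range filter is omitted, and the sample rows are synthetic.

```python
from collections import Counter, defaultdict

def chain_position_counts(impressions):
    """Count impressions by (campaign_id, site_id, position).

    position mirrors DENSE_RANK() over record_date DESC within each
    (user_id, advertiser_id) partition: ties share a rank, with no gaps.
    """
    partitions = defaultdict(list)
    for row in impressions:
        partitions[(row["user_id"], row["advertiser_id"])].append(row)

    counts = Counter()
    for rows in partitions.values():
        # Distinct dates, newest first; rank 1 is the most recent date
        dates = sorted({r["record_date"] for r in rows}, reverse=True)
        rank = {d: i + 1 for i, d in enumerate(dates)}
        for r in rows:
            counts[(r["campaign_id"], r["site_id"], rank[r["record_date"]])] += 1
    return counts

# Synthetic rows for the Monday/Tuesday/Wednesday example user
rows = [
    {"user_id": "u1", "advertiser_id": "a1", "campaign_id": "c1",
     "site_id": "Site A", "record_date": "2013-09-30"},
    {"user_id": "u1", "advertiser_id": "a1", "campaign_id": "c1",
     "site_id": "Site A", "record_date": "2013-10-01"},
    {"user_id": "u1", "advertiser_id": "a1", "campaign_id": "c1",
     "site_id": "Search", "record_date": "2013-10-02"},
]
result = chain_position_counts(rows)
for key in sorted(result):
    print(key, result[key])
```

The SQL version expresses the same partition, rank, and count steps declaratively and lets the cluster run them in parallel.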
Redshift is EASY
Bigger Picture: Why is X easier with RS?
Credit card
= eval
No reps, no PoC
Operations
made simple
Dashboards rock
Integrations
are outstanding
S3 means no pain
Redshift changed
the game.
http://bit.ly/rs_ak