Amazon Redshift, Customer Acquisition Cost & Advertising ROI, presented with Aggregate Knowledge

October 3, 2013

Amazon Redshift, Customer Acquisition Cost & Advertising ROI

Rahul Pathak, AWS (@rahulpathak)

Timon Karnezos, Aggregate Knowledge

AWS Database Services

• Amazon RDS: Fully managed SQL database service for OLTP workloads
• Amazon DynamoDB: Fully managed NoSQL service for massively scalable, high-throughput, low-latency workloads
• Amazon Redshift: Fully managed, fast and powerful, petabyte-scale data warehouse service
• Amazon ElastiCache: Fully managed, Memcached-compliant in-memory caching service

Data Warehousing the AWS way

• Deploy: easy to provision
• Pay as you go, no up-front costs
• Fast, cheap, easy to use
• SQL

Progress since launch on Feb 14, 2013

• Fastest growing service in AWS history
• 1,000+ customers; adding over a hundred a week
• Over 20 partners; adding one a week
• SOC1, SOC2, PCI certification obtained, with more on the way
• Available in US East (N. Virginia), US West (Oregon), EU (Ireland), Asia Pacific (Tokyo), with more regions coming soon

New features since launch on Feb 14, 2013

• LZO/LZOP compression support
• JSON, Regex, Cursors
• UTF-8 4-byte and invalid character substitution
• CRC32, SHA1, MD5
• Statement and workload queue timeouts
• Time zone support
• JDBC Fetch size
• UNLOAD encrypted files

Full list: http://docs.aws.amazon.com/redshift/latest/dg/doc-history.html



Common Customer Use Cases

Traditional Enterprise DW

• Reduce costs by extending DW rather than adding HW
• Migrate completely from existing DW systems
• Respond faster to business; provision in minutes

Companies with Big Data

• Improve performance by an order of magnitude
• Make more data available for analysis
• Access business data via standard reporting tools

SaaS Companies

• Add analytic functionality to applications
• Scale DW capacity as demand grows
• Reduce HW & SW costs by an order of magnitude

Digital marketing and advertising use cases

• Customer acquisition
– Ad spend
– Traffic sources
• Customer behavior
– Clickstream
– Referrals, sharing
– Actions taken
• Lifetime value
– Conversions
– Churn rate

[Diagram labels: Amazon S3; Amazon EMR; Amazon DynamoDB; Amazon RDS; Amazon Redshift; JDBC/ODBC]

Amazon Redshift Customers

“[Amazon Redshift] took an industry famous for its opaque pricing, high TCO and unreliable results and completely turned it on its head.”

“Redshift is twenty times faster than Hive…The cost saving is even more impressive…Our analysts like [it] so much they don’t want to go back.”

“Queries that used to take hours came back in seconds. Our analysts are orders of magnitude more productive.”

“We saw 50% reduction in costs with 2x improvement in query times.”

“We use Redshift anytime we need fast, interactive analysis.”

Amazon Redshift Customers

“When we want to answer a question with Redshift, we just write a SQL query and get an answer within a few minutes – if not seconds.”

“[We] run queries up to 50 times faster than our current OLAP solution.”

“Customers can get consistent, accurate, and useful data fast - in weeks not months or years.”

“Did I mention it's ridiculously fast? We'll be using it immediately to provide our analysts an alternative to Hadoop.”

“Team played with Redshift today and concluded it is ****** awesome. Un-indexed complex queries returning in < 10s.”

Amazon Redshift architecture

• Leader Node
– SQL endpoint
– Stores metadata
– Coordinates query execution
• Compute Nodes
– Local, columnar storage
– Execute queries in parallel
– Load, backup, restore via Amazon S3
– Parallel load from Amazon DynamoDB
• Single node version available

[Diagram labels: JDBC/ODBC; 10 GigE (HPC); Ingestion, Backup, Restore]

Amazon Redshift runs on optimized hardware

• Optimized for I/O intensive workloads
• High disk density
• Runs in HPC: fast network
• HS1.8XL available on Amazon EC2

HS1.8XL: 128 GB RAM, 16 Cores, 24 Spindles, 16 TB compressed user storage, 2 GB/sec scan rate
HS1.XL: 16 GB RAM, 2 Cores, 3 Spindles, 2 TB compressed customer storage

Amazon Redshift parallelizes and distributes everything

• Query
• Load
• Backup/Restore
• Resize

Load

• Load in parallel from Amazon S3 or Amazon DynamoDB
• Columnar storage, automatic compression
• Data automatically distributed and sorted according to DDL
• Scales linearly with number of nodes

Backup/Restore

• Backups to Amazon S3 are automatic, continuous and incremental
• Configurable system snapshot retention period
• Take user snapshots on-demand
• Streaming restores enable you to resume querying faster

Resize

• Resize while remaining online
• Provision a new cluster in the background
• Copy data in parallel from node to node
• Only charged for source cluster
• Automatic SQL endpoint switchover via DNS
• Decommission the source cluster
• Simple operation via AWS Console or API


Amazon Redshift lets you start small and grow big

Extra Large Node (HS1.XL): 3 spindles, 2 TB, 16 GB RAM, 2 cores
• Single Node (2 TB)
• Cluster 2-32 Nodes (4 TB – 64 TB)

Eight Extra Large Node (HS1.8XL): 24 spindles, 16 TB, 128 GB RAM, 16 cores, 10 GigE
• Cluster 2-100 Nodes (32 TB – 1.6 PB)

Note: Nodes not to scale
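The cluster size ranges above follow directly from storage per node times node count. A minimal sketch of that arithmetic, using only the figures on the slide; the dictionary layout and the function name `cluster_capacity_tb` are illustrative, not part of any AWS API:

```python
# Per-node storage and cluster node limits as listed on the slide.
NODE_TYPES = {
    "HS1.XL": {"tb_per_node": 2, "min_nodes": 2, "max_nodes": 32},
    "HS1.8XL": {"tb_per_node": 16, "min_nodes": 2, "max_nodes": 100},
}


def cluster_capacity_tb(node_type: str, nodes: int) -> int:
    """Return compressed storage (TB) for a multi-node cluster."""
    spec = NODE_TYPES[node_type]
    if not spec["min_nodes"] <= nodes <= spec["max_nodes"]:
        raise ValueError(
            f"{node_type} clusters range from {spec['min_nodes']} "
            f"to {spec['max_nodes']} nodes"
        )
    return spec["tb_per_node"] * nodes


if __name__ == "__main__":
    for node_type, spec in NODE_TYPES.items():
        low = cluster_capacity_tb(node_type, spec["min_nodes"])
        high = cluster_capacity_tb(node_type, spec["max_nodes"])
        print(f"{node_type}: {low} TB to {high} TB")
```

The largest HS1.8XL cluster comes out at 1,600 TB, which is the 1.6 PB quoted on the slide.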

Amazon Redshift is priced to let you analyze all your data

                      Price Per Hour for    Effective Hourly    Effective Annual
                      HS1.XL Single Node    Price per TB        Price per TB
On-Demand             $ 0.850               $ 0.425             $ 3,723
1 Year Reservation    $ 0.500               $ 0.250             $ 2,190
3 Year Reservation    $ 0.228               $ 0.114             $ 999

Simple Pricing

• Number of Nodes x Cost per Hour
• No charge for Leader Node
• No upfront costs
• Pay as you go
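The effective per-TB prices in the table can be reproduced from the hourly node price. A short sketch, assuming the 2 TB of storage per HS1.XL node stated elsewhere in the deck and an 8,760-hour year; the function names are illustrative:

```python
HOURS_PER_YEAR = 24 * 365      # 8,760 hours
TB_PER_HS1_XL_NODE = 2         # from the HS1.XL spec in this deck

# Hourly price for one HS1.XL node, as listed in the table (USD).
HOURLY_PRICE_USD = {
    "On-Demand": 0.850,
    "1 Year Reservation": 0.500,
    "3 Year Reservation": 0.228,
}


def effective_prices(hourly_price_per_node: float) -> tuple[float, float]:
    """Return (hourly price per TB, annual price per TB)."""
    hourly_per_tb = hourly_price_per_node / TB_PER_HS1_XL_NODE
    return hourly_per_tb, hourly_per_tb * HOURS_PER_YEAR


def cluster_cost_per_hour(nodes: int, hourly_price_per_node: float) -> float:
    """Simple pricing: number of nodes x cost per hour (leader node is free)."""
    return nodes * hourly_price_per_node


if __name__ == "__main__":
    for plan, price in HOURLY_PRICE_USD.items():
        hourly_tb, annual_tb = effective_prices(price)
        print(f"{plan}: ${hourly_tb:.3f}/TB/hour, ${annual_tb:,.0f}/TB/year")
```

Rounded to whole dollars, the annual figures match the table's last column.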

Amazon Redshift is easy to use

• Provision in minutes
• Monitor query performance
• Point and click resize
• Built in security
• Automatic backups

Amazon Redshift continuously backs up your data and recovers from failures

• Replication within the cluster and backup to Amazon S3 to maintain multiple copies of data at all times

• Backups to Amazon S3 are continuous, automatic, and incremental

– Designed for eleven nines of durability

• Continuous monitoring and automated recovery from failures of drives and nodes

• Able to restore snapshots to any Availability Zone within a region

Amazon Redshift has security built-in

• SSL to secure data in transit
• Encryption to secure data at rest
– AES-256; hardware accelerated
– All blocks on disks and in Amazon S3 encrypted
• No direct access to compute nodes
• Amazon VPC support

[Diagram labels: JDBC/ODBC; Customer VPC; Internal Security Group; 10 GigE (HPC); Ingestion, Backup, Restore]

MTA and Redshift

Understanding the Cost of Customer Acquisition and Marketing ROI

Timon Karnezos | @timonk

The Future of Digital Advertising with Cloud Computing

San Francisco, CA – 10/03/2013


Framing MTA

MTA (multi-touch attribution) vs. LTA (last-touch attribution)

Browsing the Web – Monday

[Timeline: tracking impression (Site A) on Monday]

Browsing the Web – Tuesday

[Timeline: tracking impression (Site A) on Tuesday]

Search – Wednesday

[Timeline: tracking impression (Search) on Wednesday]

Convert – Wednesday

[Timeline: conversion on Wednesday]

View Chains by Site

[Timeline: Site A and Search impression chains, followed by the conversion on Wednesday]

Properties of the Conversion Chains

• Position: 3, Day: 2 (Site A)
• Position: 2, Day: 1 (Site A)
• Position: 1, Day: 0 (same as conv.) (Search)

Conversion: Wednesday
Chain Length: 3 (touches)
Chain Age: 2 (days)
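The chain properties above can be derived mechanically from the list of touches and the conversion day. A minimal sketch, assuming position 1 is the touch closest to the conversion and "day" is the number of days before it; the calendar dates and the function name `chain_properties` are illustrative stand-ins for the Monday/Tuesday/Wednesday example:

```python
from datetime import date


def chain_properties(touches, conversion_day):
    """touches: iterable of (site, day) pairs in any order.

    Returns (rows, chain_length, chain_age), where rows are ordered
    from position 1 (most recent touch) outward.
    """
    ordered = sorted(touches, key=lambda t: t[1], reverse=True)
    rows = [
        {"position": i, "site": site, "day": (conversion_day - day).days}
        for i, (site, day) in enumerate(ordered, start=1)
    ]
    chain_age = max((r["day"] for r in rows), default=0)
    return rows, len(rows), chain_age


# Illustrative dates: a Monday, Tuesday, and Wednesday.
touches = [
    ("Site A", date(2013, 9, 30)),
    ("Site A", date(2013, 10, 1)),
    ("Search", date(2013, 10, 2)),
]
rows, chain_length, chain_age = chain_properties(touches, date(2013, 10, 2))

if __name__ == "__main__":
    for row in rows:
        print(row)
    print("Chain Length:", chain_length, "Chain Age:", chain_age)
```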

Last-Touch Attribution (LTA)

Site A: 0 / 1 = 0.0 Conversions
Search: 1 / 1 = 1.0 Conversions

[Timeline: Site A and Search chains; Position: 1 marked, followed by the conversion]
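Last-touch attribution gives the whole conversion to the touch in position 1. A minimal sketch of that rule, assuming each chain is a list of sites ordered most recent first and corresponds to one conversion; the function name is illustrative:

```python
from collections import Counter


def last_touch_credit(chains):
    """Credit each conversion entirely to the most recent touch.

    chains: list of chains; each chain lists sites with position 1 first.
    """
    credit = Counter()
    for chain in chains:
        if chain:  # a conversion with no touches credits nobody
            credit[chain[0]] += 1.0
    return credit


# The example chain: Search (Wednesday), Site A (Tuesday), Site A (Monday).
credit = last_touch_credit([["Search", "Site A", "Site A"]])

if __name__ == "__main__":
    for site in ("Site A", "Search"):
        print(f"{site}: {credit[site]:.1f} Conversions")
```

For the example chain this reproduces the slide's split: Search receives the full conversion and Site A receives none, despite two earlier impressions.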

Multi-Touch Attribution – 2 Positions (Touches)

Site A:

[Timeline: Site A and Search chains; Position: 2 (Day: 1) and Position: 1 marked, followed by the conversion]

Multi-Touch Attribution – 3 Positions (Touches)

Site A:

[Timeline: Site A and Search chains; Position: 3 (Day: 2), Position: 2 (Day: 1), and Position: 1 marked, followed by the conversion]
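The credit values for these multi-touch slides did not survive in this transcript, so the following is only a sketch of one possible model: it assumes each of the last N positions gets an equal share of the conversion. The even split and the name `even_credit` are assumptions for illustration, not necessarily the weighting Aggregate Knowledge uses.

```python
from collections import defaultdict


def even_credit(chain, max_positions):
    """Split one conversion evenly across the first `max_positions` touches.

    chain: sites ordered most recent first (position 1 first).
    Assumption: equal weight per position; real models may weight by
    position or by days before conversion.
    """
    window = chain[:max_positions]
    credit = defaultdict(float)
    for site in window:
        credit[site] += 1.0 / len(window)
    return dict(credit)


chain = ["Search", "Site A", "Site A"]  # positions 1, 2, 3

if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, "positions:", even_credit(chain, n))
```

With `max_positions=1` the function reduces to last-touch attribution; widening the window shifts credit toward the earlier Site A impressions.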

Why do we sell MTA?

Richer model space…
More nuanced…
Adaptable to client’s business!

Why is MTA hard?

1. Build user sessions (chains) by site
2. Window over report period
3. Assign credit to sites
4. Aggregate by day

Daily scale

~10^9 impressions, cookies
~10^7 conversions
~10^4 sites
x 90 per report
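The four steps above can be sketched end to end on toy data. This is a minimal in-memory illustration, not the production approach: it assumes "window" means a lookback of a fixed number of days before each conversion, and it uses an even credit split across the retained positions. The record layouts, parameter names, and `daily_site_credit` are all hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta


def daily_site_credit(impressions, conversions, lookback_days, max_positions):
    """impressions: iterable of (cookie_id, site, day)
    conversions: iterable of (cookie_id, day)
    Returns {(conversion_day, site): credit}.
    """
    # Step 1: build chains, keyed by cookie.
    by_cookie = defaultdict(list)
    for cookie_id, site, day in impressions:
        by_cookie[cookie_id].append((day, site))

    totals = defaultdict(float)
    for cookie_id, conv_day in conversions:
        # Step 2: keep only touches inside the window before the conversion.
        window_start = conv_day - timedelta(days=lookback_days)
        touches = [
            (day, site)
            for day, site in by_cookie.get(cookie_id, [])
            if window_start <= day <= conv_day
        ]
        touches.sort(reverse=True)          # most recent first
        touches = touches[:max_positions]
        # Step 3: assign credit (assumed even split), and
        # Step 4: aggregate by conversion day and site.
        for _, site in touches:
            totals[(conv_day, site)] += 1.0 / len(touches)
    return dict(totals)


impressions = [
    ("c1", "Site A", date(2013, 9, 30)),
    ("c1", "Site A", date(2013, 10, 1)),
    ("c1", "Search", date(2013, 10, 2)),
    ("c2", "Site A", date(2013, 10, 2)),
]
conversions = [("c1", date(2013, 10, 2)), ("c2", date(2013, 10, 2))]
report = daily_site_credit(impressions, conversions, lookback_days=90, max_positions=3)
```

At the daily scale listed on the slide, the per-cookie grouping and sorting are the expensive parts, which is why the work has to be distributed rather than run in a single process like this.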

Why is MTA easier with RS?

Redshift is MPP

• Fast columnar scans: ~10^9 rows in ~10 s
• Sorting as main index: 100k/s/$ load
• Even work distribution: Cookie ID shards well
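"Cookie ID shards well" means that hashing a high-cardinality key spreads rows almost evenly across workers. A small sketch of that idea, using MD5 purely as a stand-in hash; the deck does not say which hash Redshift uses, and the slice count and cookie ID format here are made up:

```python
import hashlib
from collections import Counter


def slice_for(cookie_id: str, num_slices: int) -> int:
    """Map a cookie ID to a slice by hashing (MD5 is an illustrative choice)."""
    digest = hashlib.md5(cookie_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_slices


def slice_counts(cookie_ids, num_slices):
    """Count how many cookie IDs land on each slice."""
    return Counter(slice_for(c, num_slices) for c in cookie_ids)


cookie_ids = [f"cookie-{i}" for i in range(100_000)]
counts = slice_counts(cookie_ids, 16)
skew = max(counts.values()) / min(counts.values())

if __name__ == "__main__":
    print("rows per slice:", sorted(counts.values()))
    print(f"max/min skew: {skew:.3f}")
```

Because every impression for a cookie lands on the same slice, each chain can be built locally without moving rows between nodes, and a skew close to 1 means no single slice becomes the bottleneck.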


Redshift is SQL

• Logical decomposition: CTEs cut complexity
• Powerful aggregates: COUNT DISTINCT works
• Window functions: Market basket is easy


Example: CTE

WITH chains AS (
    -- Name the intermediate result so the final query stays simple
    SELECT campaign_id, site_id,
           DENSE_RANK() OVER
               (PARTITION BY user_id, advertiser_id
                ORDER BY record_date DESC) AS position
    FROM impressions
    WHERE record_date >= 'YYYY-MM-DD' AND
          record_date < 'YYYY-MM-DD'
)
-- Count impressions by campaign, site, and chain position
SELECT campaign_id, site_id, position, COUNT(*) AS ct
FROM chains
GROUP BY 1, 2, 3;


Example: Window Function

WITH chains AS (
    SELECT campaign_id, site_id,
           -- Position 1 is each user's most recent impression per advertiser
           DENSE_RANK() OVER
               (PARTITION BY user_id, advertiser_id
                ORDER BY record_date DESC) AS position
    FROM impressions
    WHERE record_date >= 'YYYY-MM-DD' AND
          record_date < 'YYYY-MM-DD'
)
SELECT campaign_id, site_id, position, COUNT(*) AS ct
FROM chains
GROUP BY 1, 2, 3;
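To see what the query returns, it can be run on a handful of rows. The sketch below uses Python's built-in sqlite3 module as a local stand-in for Redshift (an assumption for illustration; window functions need SQLite 3.25 or newer). The table columns are inferred from the query, the sample rows are made up, and the date placeholders are replaced by bound parameters. An ORDER BY is added only to make the output deterministic.

```python
import sqlite3

QUERY = """
WITH chains AS (
    SELECT campaign_id, site_id,
           DENSE_RANK() OVER (
               PARTITION BY user_id, advertiser_id
               ORDER BY record_date DESC
           ) AS position
    FROM impressions
    WHERE record_date >= :start_date
      AND record_date < :end_date
)
SELECT campaign_id, site_id, position, COUNT(*) AS ct
FROM chains
GROUP BY 1, 2, 3
ORDER BY 1, 2, 3;
"""


def position_counts(rows, start_date, end_date):
    """Load rows into an in-memory table and run the chain-position query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE impressions (user_id TEXT, advertiser_id INTEGER, "
            "campaign_id INTEGER, site_id TEXT, record_date TEXT)"
        )
        conn.executemany("INSERT INTO impressions VALUES (?, ?, ?, ?, ?)", rows)
        params = {"start_date": start_date, "end_date": end_date}
        return conn.execute(QUERY, params).fetchall()
    finally:
        conn.close()


# Hypothetical sample data: (user_id, advertiser_id, campaign_id, site_id, record_date)
sample_rows = [
    ("u1", 1, 10, "Site A", "2013-09-30"),
    ("u1", 1, 10, "Site A", "2013-10-01"),
    ("u1", 1, 10, "Search", "2013-10-02"),
    ("u2", 1, 10, "Site A", "2013-10-02"),
]
result = position_counts(sample_rows, "2013-09-01", "2013-10-03")
```

Note that DENSE_RANK gives impressions with the same `record_date` the same position, so two touches on one day share a slot in the chain.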

Bigger Picture: Why is X easier with RS?

Redshift is EASY

• Credit card = eval: No reps, no PoC
• Operations made simple: Dashboards rock
• Integrations are outstanding: S3 means no pain

Redshift changed the game.

http://bit.ly/rs_ak
