(BDT306) Mission-Critical Stream Processing with Amazon EMR and Amazon Kinesis | AWS re:Invent 2014
DESCRIPTION
Organizations processing mission-critical, high-volume data must achieve high levels of throughput and durability in their data processing workflows. In this session, we will learn how DataXu uses Amazon Kinesis, Amazon S3, and Amazon EMR for its patented approach to programmatic marketing. Every second, the DataXu Marketing Cloud processes over 1 million ad requests and makes more than 40 billion decisions to select and bid on the ad impressions that are most likely to convert. In addition to addressing the scalability and availability of the platform, we will explore Amazon Kinesis producer and consumer applications that support high levels of scalability and durability in mission-critical record processing.
TRANSCRIPT
![Page 1: (BDT306) Mission-Critical Stream Processing with Amazon EMR and Amazon Kinesis | AWS re:Invent 2014](https://reader034.vdocuments.site/reader034/viewer/2022052601/559447781a28ab0f0d8b4615/html5/thumbnails/1.jpg)
![Page 2: (BDT306) Mission-Critical Stream Processing with Amazon EMR and Amazon Kinesis | AWS re:Invent 2014](https://reader034.vdocuments.site/reader034/viewer/2022052601/559447781a28ab0f0d8b4615/html5/thumbnails/2.jpg)
Collect: AWS Import/Export, AWS Direct Connect, Amazon Kinesis
Store: Amazon Glacier, Amazon S3, Amazon DynamoDB
Analyze: Amazon Redshift, Amazon EMR, Amazon EC2
![Page 3: (BDT306) Mission-Critical Stream Processing with Amazon EMR and Amazon Kinesis | AWS re:Invent 2014](https://reader034.vdocuments.site/reader034/viewer/2022052601/559447781a28ab0f0d8b4615/html5/thumbnails/3.jpg)
Big data
• Hourly server logs: were your systems misbehaving 1 hr ago
• Weekly / monthly bill: what you spent this billing cycle
• Daily customer-preferences report from your website's click stream: what deal or ad to try next time
• Daily fraud reports: was there fraud yesterday
• what went wrong now
• prevent overspending now
• what to offer the current customer now
• block fraudulent use now
![Page 4: (BDT306) Mission-Critical Stream Processing with Amazon EMR and Amazon Kinesis | AWS re:Invent 2014](https://reader034.vdocuments.site/reader034/viewer/2022052601/559447781a28ab0f0d8b4615/html5/thumbnails/4.jpg)
Sending: HTTP Post, AWS SDK, LOG4J, Flume, Fluentd
Reading: Get* APIs, Kinesis Client Library + Connector Library, Apache Storm, Amazon Elastic MapReduce
![Page 5: (BDT306) Mission-Critical Stream Processing with Amazon EMR and Amazon Kinesis | AWS re:Invent 2014](https://reader034.vdocuments.site/reader034/viewer/2022052601/559447781a28ab0f0d8b4615/html5/thumbnails/5.jpg)
Client/Sensor → Aggregator → Continuous Processing → Storage → Analytics + Reporting
![Page 6: (BDT306) Mission-Critical Stream Processing with Amazon EMR and Amazon Kinesis | AWS re:Invent 2014](https://reader034.vdocuments.site/reader034/viewer/2022052601/559447781a28ab0f0d8b4615/html5/thumbnails/6.jpg)
![Page 7: (BDT306) Mission-Critical Stream Processing with Amazon EMR and Amazon Kinesis | AWS re:Invent 2014](https://reader034.vdocuments.site/reader034/viewer/2022052601/559447781a28ab0f0d8b4615/html5/thumbnails/7.jpg)
![Page 8: (BDT306) Mission-Critical Stream Processing with Amazon EMR and Amazon Kinesis | AWS re:Invent 2014](https://reader034.vdocuments.site/reader034/viewer/2022052601/559447781a28ab0f0d8b4615/html5/thumbnails/8.jpg)
![Page 9: (BDT306) Mission-Critical Stream Processing with Amazon EMR and Amazon Kinesis | AWS re:Invent 2014](https://reader034.vdocuments.site/reader034/viewer/2022052601/559447781a28ab0f0d8b4615/html5/thumbnails/9.jpg)
![Page 10: (BDT306) Mission-Critical Stream Processing with Amazon EMR and Amazon Kinesis | AWS re:Invent 2014](https://reader034.vdocuments.site/reader034/viewer/2022052601/559447781a28ab0f0d8b4615/html5/thumbnails/10.jpg)
![Page 11: (BDT306) Mission-Critical Stream Processing with Amazon EMR and Amazon Kinesis | AWS re:Invent 2014](https://reader034.vdocuments.site/reader034/viewer/2022052601/559447781a28ab0f0d8b4615/html5/thumbnails/11.jpg)
![Page 12: (BDT306) Mission-Critical Stream Processing with Amazon EMR and Amazon Kinesis | AWS re:Invent 2014](https://reader034.vdocuments.site/reader034/viewer/2022052601/559447781a28ab0f0d8b4615/html5/thumbnails/12.jpg)
![Page 13: (BDT306) Mission-Critical Stream Processing with Amazon EMR and Amazon Kinesis | AWS re:Invent 2014](https://reader034.vdocuments.site/reader034/viewer/2022052601/559447781a28ab0f0d8b4615/html5/thumbnails/13.jpg)
DataXu
![Page 14: (BDT306) Mission-Critical Stream Processing with Amazon EMR and Amazon Kinesis | AWS re:Invent 2014](https://reader034.vdocuments.site/reader034/viewer/2022052601/559447781a28ab0f0d8b4615/html5/thumbnails/14.jpg)
![Page 15: (BDT306) Mission-Critical Stream Processing with Amazon EMR and Amazon Kinesis | AWS re:Invent 2014](https://reader034.vdocuments.site/reader034/viewer/2022052601/559447781a28ab0f0d8b4615/html5/thumbnails/15.jpg)
DataXu Records
Confirmation Record
tx_id: "AFTfN0uAWZ"
exchange: "APPNEXUS"
request_id: "bb656107-3bf7-47a7-8548-8229563e9dc9"
….
adslot: {slot_id: "2686449714718898993", uuid: "9d2403f1-fc6c-4d38-b6b1-839fe4b42455", price_micro_cpm: 661385, currency: "USD", seat_id: "12-914", campaign_id: "C0513n7", creative_id: "R53a537"}
…
time_stamp: 1415393474434
serviced_by_host: "cr02.us-east-01"
Fraud Record
[- 69.120.26.172 - - [08/Nov/2014:21:59:54 -0500] "GET /rs?id=fc6f2106175a43df8ae4f3b7e6fa8c37&t=marketing&cbust=1415502000191662 HTTP/1.1" 302 - "http://ads-by.madadsmedia.com/tags/25628/10217/iframe/728x90.html" "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; Trident/6.0)" "wfivefivec=c876d00e-1831-4eba-b78d-cd99188e951a" "OWW=-"
![Page 16: (BDT306) Mission-Critical Stream Processing with Amazon EMR and Amazon Kinesis | AWS re:Invent 2014](https://reader034.vdocuments.site/reader034/viewer/2022052601/559447781a28ab0f0d8b4615/html5/thumbnails/16.jpg)
Stages: Producers, Aggregator, Continuous Processing, Storage, Analytics
Diagram components: CDN; Real-time Bidding; Retargeting; Platform; Reporting; Qubole; Real Time Apps; KCL Apps; Archiver; Amazon Kinesis; Event Replay; Amazon S3; Redshift
![Page 17: (BDT306) Mission-Critical Stream Processing with Amazon EMR and Amazon Kinesis | AWS re:Invent 2014](https://reader034.vdocuments.site/reader034/viewer/2022052601/559447781a28ab0f0d8b4615/html5/thumbnails/17.jpg)
![Page 18: (BDT306) Mission-Critical Stream Processing with Amazon EMR and Amazon Kinesis | AWS re:Invent 2014](https://reader034.vdocuments.site/reader034/viewer/2022052601/559447781a28ab0f0d8b4615/html5/thumbnails/18.jpg)
![Page 19: (BDT306) Mission-Critical Stream Processing with Amazon EMR and Amazon Kinesis | AWS re:Invent 2014](https://reader034.vdocuments.site/reader034/viewer/2022052601/559447781a28ab0f0d8b4615/html5/thumbnails/19.jpg)
![Page 21: (BDT306) Mission-Critical Stream Processing with Amazon EMR and Amazon Kinesis | AWS re:Invent 2014](https://reader034.vdocuments.site/reader034/viewer/2022052601/559447781a28ab0f0d8b4615/html5/thumbnails/21.jpg)
![Page 22: (BDT306) Mission-Critical Stream Processing with Amazon EMR and Amazon Kinesis | AWS re:Invent 2014](https://reader034.vdocuments.site/reader034/viewer/2022052601/559447781a28ab0f0d8b4615/html5/thumbnails/22.jpg)
https://github.com/awslabs/kinesis-log4j-appender
![Page 23: (BDT306) Mission-Critical Stream Processing with Amazon EMR and Amazon Kinesis | AWS re:Invent 2014](https://reader034.vdocuments.site/reader034/viewer/2022052601/559447781a28ab0f0d8b4615/html5/thumbnails/23.jpg)
![Page 24: (BDT306) Mission-Critical Stream Processing with Amazon EMR and Amazon Kinesis | AWS re:Invent 2014](https://reader034.vdocuments.site/reader034/viewer/2022052601/559447781a28ab0f0d8b4615/html5/thumbnails/24.jpg)
Amazon Kinesis storage is replicated across Availability Zones
• Millions of sources producing 100s of terabytes per hour
• Front end: authentication, authorization
• Durable, highly consistent storage replicates data across three data centers (Availability Zones)
• Ordered stream of events supports multiple readers
• Real-time dashboards and alarms
• Machine learning algorithms or sliding window analytics
• Aggregate analysis in Hadoop or a data warehouse
• Aggregate and archive to S3
• Inexpensive: $0.028 per million puts
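As a rough worked example of the put price quoted on this slide, the sketch below estimates the monthly PUT charge for a sustained record rate. The 1,000,000 records/sec figure is borrowed from the session description's ad-request rate purely for illustration. The function name and the 30-day month are assumptions, and shard-hour or data-transfer charges are not included.

```python
PUT_PRICE_PER_MILLION_USD = 0.028  # price quoted on the slide (2014)


def monthly_put_cost_usd(records_per_second, days=30):
    """Estimate the PUT charge for a sustained record rate.

    Hypothetical helper: it covers only the per-put price from the slide,
    not shard-hour or data-transfer charges.
    """
    puts = records_per_second * 60 * 60 * 24 * days
    return puts / 1_000_000 * PUT_PRICE_PER_MILLION_USD


if __name__ == "__main__":
    # Illustrative rate only: one put per ad request at 1 million requests/sec.
    print(f"${monthly_put_cost_usd(1_000_000):,.2f} per 30-day month")
```

At that rate the put charge alone works out to roughly $72,576 for a 30-day month (2,592,000 million puts at $0.028 each), assuming one record per put.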
![Page 25: (BDT306) Mission-Critical Stream Processing with Amazon EMR and Amazon Kinesis | AWS re:Invent 2014](https://reader034.vdocuments.site/reader034/viewer/2022052601/559447781a28ab0f0d8b4615/html5/thumbnails/25.jpg)
![Page 26: (BDT306) Mission-Critical Stream Processing with Amazon EMR and Amazon Kinesis | AWS re:Invent 2014](https://reader034.vdocuments.site/reader034/viewer/2022052601/559447781a28ab0f0d8b4615/html5/thumbnails/26.jpg)
[Chart: 1 KB messages/sec (y-axis, 0 to 1,200,000) versus shards (x-axis, 0 to 1,100)]
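The chart on this slide plots 1 KB messages per second against shard count. A minimal sketch of the sizing arithmetic behind such a chart follows, assuming the per-shard ingest limits documented for Amazon Kinesis (1,000 records/sec and 1 MB/sec per shard). Those limits and the helper name are assumptions brought in for illustration; the slide itself only shows the plot.

```python
import math

# Per-shard ingest limits as documented for Amazon Kinesis. Treat them as
# assumptions here, since the slide itself only shows the chart.
SHARD_MAX_RECORDS_PER_SEC = 1_000
SHARD_MAX_BYTES_PER_SEC = 1024 * 1024


def shards_needed(records_per_second, record_size_bytes):
    """Return the minimum shard count that satisfies both ingest limits."""
    by_count = math.ceil(records_per_second / SHARD_MAX_RECORDS_PER_SEC)
    by_bytes = math.ceil(
        records_per_second * record_size_bytes / SHARD_MAX_BYTES_PER_SEC
    )
    return max(by_count, by_bytes, 1)


if __name__ == "__main__":
    # 1 KB records at one million records/sec, the top of the chart's range.
    print(shards_needed(1_000_000, 1024))
```

For 1 KB records the record-count limit is the binding one, so throughput scales at about 1,000 messages/sec per shard, which matches the axis ranges on the chart.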
![Page 28: (BDT306) Mission-Critical Stream Processing with Amazon EMR and Amazon Kinesis | AWS re:Invent 2014](https://reader034.vdocuments.site/reader034/viewer/2022052601/559447781a28ab0f0d8b4615/html5/thumbnails/28.jpg)
![Page 29: (BDT306) Mission-Critical Stream Processing with Amazon EMR and Amazon Kinesis | AWS re:Invent 2014](https://reader034.vdocuments.site/reader034/viewer/2022052601/559447781a28ab0f0d8b4615/html5/thumbnails/29.jpg)
![Page 30: (BDT306) Mission-Critical Stream Processing with Amazon EMR and Amazon Kinesis | AWS re:Invent 2014](https://reader034.vdocuments.site/reader034/viewer/2022052601/559447781a28ab0f0d8b4615/html5/thumbnails/30.jpg)
[Diagram: an Amazon Kinesis shard (Shard-i) holding events with sequence numbers including 3, 5, 8, 10, 14, 17, 18, 21, 23, for example {Event 10, …}; two hosts, Host A and Host B; a table with columns Shard ID, Lock, Seq num; and a table with columns Shard ID, Last Archived]
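The diagram on this slide appears to show two hosts (Host A and Host B) coordinating over one shard through a lock table that also tracks a sequence number, plus a "Last Archived" position per shard. The sketch below is one possible reading of that idea, not DataXu's archiver: an in-memory lock table with expiring locks, where only the lock holder may advance the last-archived sequence number, so a second host resumes where the first stopped. `LeaseTable`, `archive`, the lease duration, and the list used as a sink are all hypothetical.

```python
import time


class LeaseTable:
    """In-memory stand-in for a lock table keyed by shard ID (hypothetical)."""

    def __init__(self, lease_seconds=10.0, clock=time.monotonic):
        self._rows = {}  # shard_id -> {"owner", "expires", "seq"}
        self._lease_seconds = lease_seconds
        self._clock = clock

    def acquire(self, shard_id, host):
        """Grant the lock if it is free, expired, or already held by host."""
        now = self._clock()
        row = self._rows.setdefault(
            shard_id, {"owner": None, "expires": 0.0, "seq": None}
        )
        if row["owner"] in (None, host) or row["expires"] <= now:
            row["owner"] = host
            row["expires"] = now + self._lease_seconds
            return True
        return False

    def checkpoint(self, shard_id, host, seq):
        """Record the last archived sequence number; lock holder only."""
        row = self._rows.get(shard_id)
        if row is None or row["owner"] != host or row["expires"] <= self._clock():
            return False
        row["seq"] = seq
        return True

    def last_archived(self, shard_id):
        row = self._rows.get(shard_id)
        return row["seq"] if row else None


def archive(table, shard_id, host, events, sink):
    """Archive events newer than the checkpoint while host holds the lock.

    Returns the number of events checkpointed. An event is written before it
    is checkpointed, so a lost lock can leave a duplicate (at-least-once).
    """
    if not table.acquire(shard_id, host):
        return 0
    last = table.last_archived(shard_id)
    written = 0
    for seq, payload in events:
        if last is not None and seq <= last:
            continue  # already archived by an earlier lock holder
        sink.append((seq, payload))
        if not table.checkpoint(shard_id, host, seq):
            break
        written += 1
    return written


if __name__ == "__main__":
    now = [0.0]  # fake clock so lock expiry can be simulated
    table = LeaseTable(lease_seconds=10.0, clock=lambda: now[0])
    events = [(s, f"event-{s}") for s in (3, 5, 8, 10, 14, 17, 18, 21, 23)]
    sink = []
    archive(table, "Shard-i", "Host A", events[:4], sink)  # Host A archives up to 10
    archive(table, "Shard-i", "Host B", events, sink)      # refused: Host A holds the lock
    now[0] = 11.0                                          # Host A's lock expires
    archive(table, "Shard-i", "Host B", events, sink)      # Host B resumes after 10
    print([seq for seq, _ in sink])
```

Keeping the lock and the checkpoint in the same row means a takeover and a resume position are read together, which is also how the Kinesis Client Library's lease table is organized.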
![Page 32: (BDT306) Mission-Critical Stream Processing with Amazon EMR and Amazon Kinesis | AWS re:Invent 2014](https://reader034.vdocuments.site/reader034/viewer/2022052601/559447781a28ab0f0d8b4615/html5/thumbnails/32.jpg)
![Page 33: (BDT306) Mission-Critical Stream Processing with Amazon EMR and Amazon Kinesis | AWS re:Invent 2014](https://reader034.vdocuments.site/reader034/viewer/2022052601/559447781a28ab0f0d8b4615/html5/thumbnails/33.jpg)
Diagram components: CDN; Real Time Bidding; Retargeting; Platform; Reporting; Qubole; Real Time Apps; KCL Apps; Archiver; Kinesis; Event Replay; S3
![Page 34: (BDT306) Mission-Critical Stream Processing with Amazon EMR and Amazon Kinesis | AWS re:Invent 2014](https://reader034.vdocuments.site/reader034/viewer/2022052601/559447781a28ab0f0d8b4615/html5/thumbnails/34.jpg)
Stages: Producers, Aggregator, Continuous Processing, Storage, Analytics
Diagram components: CDN; Real-time Bidding; Retargeting; Platform; Reporting; Qubole; Real Time Apps; KCL Apps; Archiver; Amazon Kinesis; Event Replay; Amazon S3; Amazon Redshift
![Page 35: (BDT306) Mission-Critical Stream Processing with Amazon EMR and Amazon Kinesis | AWS re:Invent 2014](https://reader034.vdocuments.site/reader034/viewer/2022052601/559447781a28ab0f0d8b4615/html5/thumbnails/35.jpg)
![Page 36: (BDT306) Mission-Critical Stream Processing with Amazon EMR and Amazon Kinesis | AWS re:Invent 2014](https://reader034.vdocuments.site/reader034/viewer/2022052601/559447781a28ab0f0d8b4615/html5/thumbnails/36.jpg)
![Page 37: (BDT306) Mission-Critical Stream Processing with Amazon EMR and Amazon Kinesis | AWS re:Invent 2014](https://reader034.vdocuments.site/reader034/viewer/2022052601/559447781a28ab0f0d8b4615/html5/thumbnails/37.jpg)
• Unordered processing
– Randomize partition key to distribute events over many shards and use multiple workers
• Exact order processing
– Control the partition key to ensure events are grouped onto the same shard and read by the same worker
• Need both? Get global sequence number
Diagram labels: Producer; Get Global Sequence; Get Event Metadata; Unordered Stream; Campaign Centric Stream; Fraud Inspection Stream
Id, event, stream and partition key:
1, confirmation, Campaign-centric stream, UUID
2, fraud, Fraud-inspection stream, sessionid
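The two partition-key strategies on this slide can be sketched as follows. Amazon Kinesis hashes the partition key with MD5 and routes the record to the shard whose hash-key range contains the result. The even split of the hash space across shards, the helper names, and the sample event are assumptions made for illustration.

```python
import hashlib
import uuid

HASH_SPACE = 2 ** 128  # MD5 yields a 128-bit hash key


def shard_for(partition_key, shard_count):
    """Map a partition key to a shard index, assuming evenly split hash ranges."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, "big")
    return hash_key * shard_count // HASH_SPACE


def unordered_key():
    """Random partition key: spreads events over many shards."""
    return uuid.uuid4().hex


def ordered_key(event, field):
    """Controlled partition key: events sharing the field land on one shard."""
    return str(event[field])


if __name__ == "__main__":
    shards = 8
    # Hypothetical fraud events that share one session id.
    fraud_events = [{"sessionid": "s-42", "n": i} for i in range(5)]
    same_shard = {shard_for(ordered_key(e, "sessionid"), shards) for e in fraud_events}
    spread = {shard_for(unordered_key(), shards) for _ in range(1000)}
    print("ordered events use shards:", same_shard)
    print("unordered events touched", len(spread), "of", shards, "shards")
```

Keying the fraud-inspection stream by session id keeps each session's events on one shard and therefore in order for one worker, while random keys trade ordering for an even spread across workers.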
![Page 38: (BDT306) Mission-Critical Stream Processing with Amazon EMR and Amazon Kinesis | AWS re:Invent 2014](https://reader034.vdocuments.site/reader034/viewer/2022052601/559447781a28ab0f0d8b4615/html5/thumbnails/38.jpg)
Sending: HTTP Post, AWS SDK, LOG4J, Flume, Fluentd
Reading: Get* APIs, Apache Storm, Amazon Elastic MapReduce
Also shown: Amazon EMR, Playback, Amazon S3, Archiver
![Page 39: (BDT306) Mission-Critical Stream Processing with Amazon EMR and Amazon Kinesis | AWS re:Invent 2014](https://reader034.vdocuments.site/reader034/viewer/2022052601/559447781a28ab0f0d8b4615/html5/thumbnails/39.jpg)
![Page 40: (BDT306) Mission-Critical Stream Processing with Amazon EMR and Amazon Kinesis | AWS re:Invent 2014](https://reader034.vdocuments.site/reader034/viewer/2022052601/559447781a28ab0f0d8b4615/html5/thumbnails/40.jpg)
Please give us your feedback on this session.
Complete session evaluations and earn re:Invent swag.
http://bit.ly/awsevals