Amazon Kinesis Capture, Deliver, and Process Real-time Data Streams on AWS
What to Expect from this 30-Minute Session
• Amazon Kinesis overview
• Kinesis data ingestion model
• 6 things you need to know about Kinesis Streams
  • How to think about partition keys
  • Sizing the stream
  • Extended retention
  • PutRecords API for high throughput
  • Kinesis Producer Library
  • Timestamps in your data
  • Scaling your stream
• Consuming from Kinesis Streams
Amazon Kinesis Overview
Amazon Kinesis: Managed service to capture and expose data streams for processing
[Architecture diagram: Amazon Web Services, three Availability Zones (AZ)]
• Millions of sources producing 100s of terabytes per hour
• Front end: authentication and authorization
• Durable, highly consistent storage replicates data across three data centers (availability zones)
• Ordered stream of events supports multiple readers
  • Aggregate and archive to S3
  • Real-time dashboards and alarms
  • Machine learning algorithms or sliding window analytics
  • Aggregate analysis in Hadoop or a data warehouse
• Inexpensive: $0.028 per million puts
Amazon Kinesis Streams: Foundational service for stream data processing
Real-time Ingest
• Highly scalable
• Durable
• Elastic
• Replay-able reads
Continuous Processing FX
• Elastic
• Load-balancing incoming streams
• Fault tolerance, checkpoint / replay
• Enable multiple processing apps in parallel
Enable data movement into Stores / Processing Engines
Managed Service
Low end-to-end latency
Amazon Kinesis: Streaming Data Ingestion
Kinesis Stream: Managed Entity to Capture and Store Data
• Provisioned entity called a Stream
• Composed of Shards
• Each Shard ingests data up to 1 MB/sec and up to 1,000 TPS, and egresses up to 2 MB/sec
• All data is stored for 24 hours
• Scale Kinesis streams at any time by splitting or merging Shards
• Replay data inside of the 24-hour window
Putting Data into Kinesis: Simple Put interface to store data in Kinesis
• Producers use a PUT call to store data in a Stream. Each record <= 1 MB
  • PutRecord {Data, StreamName, PartitionKey}
  • PutRecords {Records {Data, PartitionKey}, StreamName}
• A Partition Key is supplied by the producer and used to distribute the PUTs (via an MD5 hash) across the hash key ranges of the Shards
• A unique Sequence # is returned to the Producer upon a successful PUT call
• An arrival timestamp is affixed to each record
[Diagram: many Producers write into Kinesis, which spreads the records across Shard 1, Shard 2, Shard 3, Shard 4, ... Shard n]
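The MD5-based routing described above can be illustrated with a small sketch. It assumes the 128-bit hash space is split into equal, contiguous ranges, which is only one possible layout; the helper names (`hash_key`, `even_hash_ranges`, `shard_for`) are hypothetical and not part of any AWS SDK.

```python
import hashlib

HASH_SPACE = 2 ** 128  # an MD5 digest is 128 bits wide


def hash_key(partition_key: str) -> int:
    """Map a partition key to a 128-bit integer via MD5."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def even_hash_ranges(shard_count: int) -> list:
    """Split the hash space into contiguous inclusive ranges (assumed: equal sizes)."""
    bounds = [i * HASH_SPACE // shard_count for i in range(shard_count + 1)]
    return [(bounds[i], bounds[i + 1] - 1) for i in range(shard_count)]


def shard_for(partition_key: str, ranges: list) -> int:
    """Return the index of the range (shard) that contains the key's hash."""
    value = hash_key(partition_key)
    for index, (start, end) in enumerate(ranges):
        if start <= value <= end:
            return index
    raise ValueError("hash key falls outside every range")


if __name__ == "__main__":
    ranges = even_hash_ranges(4)
    for key in ("device-17", "device-18", "AMZN"):
        print(key, "-> shard", shard_for(key, ranges))
```

The same key always lands on the same shard, which is what makes per-key ordering and aggregation possible.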
Topic #1: Thinking about ingestion. Workload determines partition key strategy
Managed Buffer
• Care about a reliable, scalable way to capture data
• Defer all other aggregation to the consumer
• Generate random partition keys
• Ensure a high cardinality for partition keys with respect to shards, to spray evenly across available shards
Streaming Map-Reduce
• Leverage partition keys as a natural way to aggregate data
• E.g., partition keys per billing customer, per DeviceId, per stock symbol
• Design partition keys to scale
• Be aware of "hot" partition keys or shards
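A rough way to compare the two strategies is to count how many records each shard would receive. The sketch below assumes an evenly split hash space and uses hypothetical helpers (`shard_index`, `random_key`, `shard_load`); it simulates the distribution locally and does not call Kinesis.

```python
import hashlib
import uuid
from collections import Counter


def shard_index(partition_key: str, shard_count: int) -> int:
    """Shard index for a key, assuming the MD5 hash space is split evenly."""
    value = int.from_bytes(hashlib.md5(partition_key.encode("utf-8")).digest(), "big")
    return (value * shard_count) >> 128


def random_key() -> str:
    """Managed-buffer style: a fresh high-cardinality key per record."""
    return uuid.uuid4().hex


def shard_load(keys, shard_count: int) -> Counter:
    """Count how many records each shard would receive."""
    return Counter(shard_index(key, shard_count) for key in keys)


if __name__ == "__main__":
    # Random keys spread records roughly evenly across shards.
    print(shard_load((random_key() for _ in range(4000)), 4))
    # One dominant entity key sends every record to a single "hot" shard.
    print(shard_load(["device-1"] * 4000, 4))
```

Entity keys such as a DeviceId keep related records together for aggregation, at the cost of skew when one entity dominates the traffic.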
Topic #2: Sizing the Kinesis Stream. Provision adequate Shards; you can always change them
• For ingress needs: capture all incoming data
  • Count likely data producers: log servers, sensors/things, smartphone app installs
  • Individual payload size and (desired) frequency of Puts
• For egress needs: feed all consuming applications
  • Each Shard can do 2 MB/sec on egress
  • Add more Shards for more applications
• Include headroom for "catch-up" with data in the stream in the event of application failures
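The sizing steps above reduce to simple arithmetic over the per-shard limits quoted earlier (1 MB/sec and 1,000 records/sec in, 2 MB/sec out). The function below is a back-of-the-envelope sketch: the name `estimate_shards`, its parameters, and the default 25% headroom are assumptions for illustration, and it assumes every consuming application reads the full stream.

```python
import math

INGRESS_MB_PER_SHARD = 1.0        # per-shard write limit from the slides
INGRESS_RECORDS_PER_SHARD = 1000  # per-shard records/sec limit from the slides
EGRESS_MB_PER_SHARD = 2.0         # per-shard read limit from the slides


def estimate_shards(producers, record_kb, puts_per_sec_per_producer,
                    consuming_apps, headroom=1.25):
    """Estimate the shard count needed for the given workload."""
    records_per_sec = producers * puts_per_sec_per_producer
    ingress_mb = records_per_sec * record_kb / 1024
    egress_mb = ingress_mb * consuming_apps  # each app reads every record
    needed = max(
        ingress_mb / INGRESS_MB_PER_SHARD,
        records_per_sec / INGRESS_RECORDS_PER_SHARD,
        egress_mb / EGRESS_MB_PER_SHARD,
    )
    # Headroom leaves capacity for consumers to catch up after a failure.
    return max(1, math.ceil(needed * headroom))


if __name__ == "__main__":
    # 2,000 producers, 2 KB records, 1 put/sec each, 3 consuming applications.
    print(estimate_shards(2000, 2, 1, 3))
```

In this example egress is the binding constraint: three applications reading about 3.9 MB/sec each need more shards than ingress alone would.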
Topic #3: Kinesis PutRecords API. High-throughput API for efficient writes to Kinesis
• PutRecords {Records {Data, PartitionKey}, StreamName}
  • Supports up to 500 records per request
  • Each record can be <= 1 MB, up to 5 MB for the entire API request
  • Can include records with different partition keys
• Response
  • PutRecords is not atomic. It can fail partially.
  • The API response includes an array of response Records, covering both successful and unsuccessful records.
  • An unsuccessful response record includes ErrorCode and ErrorMessage values
  • You must write code that examines the PutRecordsResult objects to detect individual record failures and take appropriate action.
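One way to handle partial failures is to resend only the entries whose result carries an ErrorCode. The sketch below assumes `client` exposes a boto3-style `put_records(StreamName=..., Records=[...])` call (for example `boto3.client("kinesis")`); the function name, retry count, and backoff values are illustrative assumptions, and the caller is expected to keep each batch within the 500-record and 5 MB limits.

```python
import time


def put_records_with_retry(client, stream_name, records,
                           max_attempts=3, backoff_seconds=0.1):
    """Send records with PutRecords, resending only entries that failed.

    Each record is a dict with "Data" (bytes) and "PartitionKey" (str).
    Returns the records that were still unsent after the last attempt.
    """
    pending = list(records)
    for attempt in range(max_attempts):
        if not pending:
            break
        response = client.put_records(StreamName=stream_name, Records=pending)
        if response.get("FailedRecordCount", 0) == 0:
            return []
        # Result entries come back in the same order as the request entries;
        # failed ones carry ErrorCode / ErrorMessage instead of SequenceNumber.
        pending = [
            record
            for record, result in zip(pending, response["Records"])
            if "ErrorCode" in result
        ]
        if attempt < max_attempts - 1:
            time.sleep(backoff_seconds * 2 ** attempt)  # simple exponential backoff
    return pending
```

Resending only the failed entries avoids duplicating records that were already stored; the Kinesis Producer Library covered in the next topic does this kind of bookkeeping for you.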
Topic #4: Kinesis Producer Library. Highly configurable library to write to Kinesis
• Collects records and uses PutRecords for high-throughput writes
• Can write to multiple Streams with automatic and configurable retries
• Retries in case of errors, with the ability to distinguish between retryable and non-retryable errors
• Tracks record age and enforces maximum buffering times
• Aggregates user records to increase payload size and improve throughput
• Integrates seamlessly with the Amazon Kinesis Client Library (KCL) to de-aggregate batched records
• Submits Amazon CloudWatch metrics on your behalf to provide visibility into producer performance
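To make the "collect records, track age, enforce a maximum buffering time" idea concrete, here is a minimal sketch of such a buffer. It is not the Kinesis Producer Library and does not mirror its API; the `RecordBuffer` class, its parameters, and the default limits are hypothetical, and a real producer would also flush from a background timer rather than only when called.

```python
import time


class RecordBuffer:
    """Collect records and flush them as a batch by count or by age."""

    def __init__(self, flush, max_records=500, max_age_seconds=0.1,
                 clock=time.monotonic):
        self._flush = flush            # callable that receives a list of records
        self._max_records = max_records
        self._max_age = max_age_seconds
        self._clock = clock            # injectable for testing
        self._records = []
        self._oldest = None            # arrival time of the oldest buffered record

    def add(self, data: bytes, partition_key: str) -> None:
        if not self._records:
            self._oldest = self._clock()
        self._records.append({"Data": data, "PartitionKey": partition_key})
        self.flush_if_due()

    def flush_if_due(self) -> None:
        """Flush when the batch is full or its oldest record is too old."""
        if not self._records:
            return
        too_many = len(self._records) >= self._max_records
        too_old = self._clock() - self._oldest >= self._max_age
        if too_many or too_old:
            self.flush()

    def flush(self) -> None:
        if self._records:
            batch, self._records = self._records, []
            self._flush(batch)
```

The `flush` callable could hand each batch to a PutRecords call; the batch size of 500 matches the per-request record limit from Topic #3.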
Topic #5: Including "Time" in your data (New!). ApproximateArrivalTimestamp: records in the Stream are timestamped at millisecond precision
• Each Amazon Kinesis record includes an approximate arrival timestamp at millisecond precision
• Set when the Stream successfully receives the record
• No guarantees about the timestamp accuracy, or that it is increasing across records in a shard or stream
• ApproximateArrivalTimestamp is exposed in the processRecords API call
• Use it when the data producer can't tell time, when you want to build a time-windowed application, or when you want to know the age of the oldest unread record in the Shard
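Two of the uses listed above, record age and time windows, can be sketched with plain datetime arithmetic. The example assumes each record is a dict whose `ApproximateArrivalTimestamp` is a timezone-aware `datetime`, which is how boto3 returns it from GetRecords; the helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def record_age_seconds(record, now=None):
    """Seconds elapsed since the stream received this record."""
    now = now or datetime.now(timezone.utc)
    return (now - record["ApproximateArrivalTimestamp"]).total_seconds()


def window_start(record, window_seconds=60):
    """Start of the fixed-size (tumbling) window the record falls into."""
    arrived = record["ApproximateArrivalTimestamp"]
    offset = (arrived - EPOCH) % timedelta(seconds=window_seconds)
    return arrived - offset


if __name__ == "__main__":
    arrived = datetime(2015, 10, 1, 12, 0, 42, 500000, tzinfo=timezone.utc)
    record = {"ApproximateArrivalTimestamp": arrived}
    print(record_age_seconds(record, now=arrived + timedelta(seconds=5)))
    print(window_start(record))
```

Because the timestamp is not guaranteed to increase across records, windows built this way should tolerate records that arrive slightly out of order.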
Topic #6: Dealing with provisioned throughput exceeded. Metrics and Resharding (SplitShard/MergeShards)
• Keep track of your metrics
  • Monitor CloudWatch metrics: the PutRecord.Bytes and GetRecords.Bytes metrics keep track of shard usage
• Retry if the rise in input rate is temporary
• Reshard to adjust the number of shards
  • SplitShard: adds more shards
  • MergeShards: removes shards
• Use the Kinesis Scaling Utility (on GitHub)

| Metric | Units |
| --- | --- |
| PutRecords.Bytes | Bytes |
| PutRecords.Latency | Milliseconds |
| PutRecords.Success | Count |
| PutRecords.Records | Count |
| IncomingBytes | Bytes |
| IncomingRecords | Count |
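Splitting a shard requires choosing the hash key at which the second child shard begins. A common choice, assumed here, is the midpoint of the parent's hash key range. The sketch takes a boto3-style `client` with a `split_shard(StreamName=..., ShardToSplit=..., NewStartingHashKey=...)` call and a shard description shaped like the entries returned by DescribeStream; the helper names are hypothetical.

```python
def split_point(starting_hash_key: str, ending_hash_key: str) -> str:
    """Midpoint of an inclusive hash key range, as a decimal string."""
    start, end = int(starting_hash_key), int(ending_hash_key)
    if end <= start:
        raise ValueError("range is too small to split")
    return str((start + end + 1) // 2)


def split_shard_evenly(client, stream_name: str, shard: dict) -> str:
    """Split one shard into two children of (nearly) equal hash range."""
    key_range = shard["HashKeyRange"]
    new_start = split_point(key_range["StartingHashKey"], key_range["EndingHashKey"])
    client.split_shard(
        StreamName=stream_name,
        ShardToSplit=shard["ShardId"],
        NewStartingHashKey=new_start,
    )
    return new_start
```

An even split only halves the load if keys are spread evenly across the range; a shard that is hot because of a single partition key will stay hot after the split.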
Sending & Reading Data from Kinesis Streams
Sending
• AWS SDK
• AWS Mobile SDK
• Kinesis Producer Library
• LOG4J
• Flume
• Fluentd
Consuming
• Get* APIs
• Kinesis Client Library + Connector Library
• AWS Lambda
• Amazon Elastic MapReduce
• Apache Storm
• Apache Spark
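For the simplest consuming option, the Get* APIs, a polling loop over one shard might look like the sketch below. It assumes a boto3-style `client` with `get_shard_iterator` and `get_records`; the function name, poll count, and interval are illustrative assumptions, and there is no checkpointing or load balancing, which is what the Kinesis Client Library adds.

```python
import time


def read_shard(client, stream_name, shard_id, handle,
               max_polls=10, poll_interval=1.0):
    """Read one shard from its oldest record and pass each record to `handle`."""
    iterator = client.get_shard_iterator(
        StreamName=stream_name,
        ShardId=shard_id,
        ShardIteratorType="TRIM_HORIZON",  # start at the oldest stored record
    )["ShardIterator"]
    seen = 0
    for _ in range(max_polls):
        if iterator is None:  # the shard was closed by a split or merge
            break
        response = client.get_records(ShardIterator=iterator, Limit=1000)
        for record in response["Records"]:
            handle(record)
            seen += 1
        iterator = response.get("NextShardIterator")
        time.sleep(poll_interval)  # stay under the per-shard read limits
    return seen
```

Starting from `TRIM_HORIZON` replays everything still inside the retention window, which is how a new or recovering application catches up.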