introduction to amazon kinesis analytics
TRANSCRIPT
© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Roger Barga, GM Amazon Kinesis
Ryan Nienhuis, Product Manager, Amazon Kinesis
08/11/2016
Introduction to Amazon Kinesis
AnalyticsEasily Analyze Streaming Data with Standard SQL
Most data is produced continuously
Mobile Apps Web Clickstream Application Logs
Metering Records IoT Sensors Smart Buildings
[Wed Oct 11 14:32:52
2000] [error] [client
127.0.0.1] client
denied by server
configuration:
/export/home/live/ap/ht
docs/test
The diminishing value of data
Recent data is highly valuable If you act on it in time
Perishable Insights (M. Gualtieri, Forrester)
Old + Recent data is more valuable If you have the means to combine them
Processing real-time, streaming data
• Durable
• Continuous
• Fast
• Correct
• Reactive
• Reliable
What are the key requirements?
Ingest Transform Analyze React Persist
Amazon Kinesis platform overview
Amazon Kinesis Streams
Easy administration: Create a stream, set capacity level with shards. Scale to
match your data throughput rate & volume.
Build real-time applications: Process streaming data with Kinesis Client
Library (KCL), Apache Spark/Storm, AWS Lambda, ....
Low cost: Cost-efficient for workloads of any scale.
Amazon Kinesis Firehose
Zero administration: Capture and deliver streaming data to Amazon S3, Amazon
Redshift, or Amazon Elasticsearch Service without writing an app or managing
infrastructure.
Direct-to-data-store integration: Batch, compress, and encrypt streaming data for
delivery in as little as 60 seconds.
Seamless elasticity: Seamlessly scales to match data throughput without
intervention.
Capture and submit
streaming data to Firehose
Analyze streaming data using your
favorite BI tools Firehose loads streaming data
continuously into S3, Amazon Redshift,
and Amazon ES
Amazon Kinesis Analytics
Apply SQL on streams: Easily connect to a Kinesis stream or Firehose
delivery stream and apply SQL skills.
Build real-time applications: Perform continual processing on streaming big
data with sub-second processing latencies.
Easy scalability: Elastically scales to match data throughput.
Connect to Kinesis streams,
Firehose delivery streamsRun standard SQL queries
against data streams
Kinesis Analytics can send processed data
to analytics tools so you can create alerts
and respond in real time
Amazon Kinesis: streaming data made easyServices make it easy to capture, deliver, and process streams on
AWS
Kinesis Analytics
For all developers, data scientists
Easily analyze data streams
using standard SQL queries
Kinesis Firehose
For all developers, data scientists
Easily load massive
volumes of streaming data
into S3, Amazon Redshift,
or Amazon ES
Kinesis Streams
For Technical Developers
Collect and stream data
for ordered, replayable,
real-time processing
Amazon Kinesis Analytics
Kinesis Analytics
Pay for only what you use
Automatic elasticity
Standard SQL for analytics
Real-time processing
Easy to use
Use SQL to build real-time applications
Easily write SQL code to process
streaming data
Connect to streaming source
Continuously deliver SQL results
Connect to streaming source
• Streaming data sources include Firehose or
Streams
• Input formats include JSON, .csv, variable
column, unstructured text
• Each input has a schema; schema is inferred,
but you can edit
• Reference data sources (S3) for data
enrichment
Write SQL code
• Build streaming applications with one-to-many
SQL statements
• Robust SQL support and advanced analytic
functions
• Extensions to the SQL standard to work
seamlessly with streaming data
• Support for at-least-once processing
semantics
Continuously deliver SQL results
• Send processed data to multiple destinations
• S3, Amazon Redshift, Amazon ES (through
Firehose)
• Streams (with AWS Lambda integration for
custom destinations)
• End-to-end processing speed as low as sub-
second
• Separation of processing and data delivery
Generate time series analytics
• Compute key performance indicates over-time windows
• Combine with historical data in S3 or Amazon Redshift
Analytics
Streams
Firehose
Amazon
Redshift
S3
Streams
Firehose
Custom, real-
time
destinations
Feed real-time dashboards
• Validate and transform raw data, and then process to calculate
meaningful statistics
• Send processed data downstream for visualization in BI and
visualization services
Amazon
QuickSight
Analytics
Amazon ES
Amazon
Redshift
Amazon
RDS
Streams
Firehose
Create real-time alarms and notifications
• Build sequences of events from the stream, like user sessions in a
clickstream or app behavior through logs
• Identify events (or a series of events) of interest, and react to the
data through alarms and notifications
Analytics
Streams
Firehose
Streams
Amazon
SNS
Amazon
CloudWatch
Lambda
SQL on streaming data
• SQL is an API to your data
• Ask for what you want, system decides how to get it
• For all data, not just “flat” data in a database
• Opportunity for novel data organization and algorithms
• A standard (ANSI 2008, 2011) and the most commonly
used data manipulation language
A simple streaming query
• Tweets about the AWS NYC Summit
• Selecting from a STREAM of tweets, an in-application
stream
• Each row has a corresponding ROWTIME
SELECT STREAM ROWTIME, author, text
FROM Tweets
WHERE text LIKE ‘%#AWSNYCSummit%'
A streaming table is a STREAM
• In relational databases, you work with SQL tables
• With Analytics, you work with STREAMS
• SELECT, INSERT, and CREATE can be used with STREAMs
CREATE STREAM Tweets
(author VARCHAR(20),
text VARCHAR(140));
INSERT INTO Tweets
SELECT …
Writing queries on unbounded data sets
• Streams are unbounded data sets
• Need continuous queries, row-by-row or across rows
• WINDOWs define a start and end to the query
SELECT STREAM author,
count(author) OVER ONE_MINUTE
FROM Tweets
WINDOW ONE_MINUTE AS
(PARTITION BY author
RANGE INTERVAL '1' MINUTE PRECEDING);
Different types of windows
Tumbling
Sliding
Custom
Real-time analytical patterns
• Pre-processing: filtering, transformations
• Basic analytics: simple counts, aggregates over windows
• Advanced analytics: detecting anomalies, event
correlation
• Post-processing: alerting, triggering, final filters
Simple pricing model
• Analytics elastically scales based upon your data
throughput and query complexity
• Automatically provision Kinesis Processing Units (KPU),
which represent 1 vCPU and 4 GB memory
• KPU-hour is $0.11 in us-east-1
• Approximate examples
• Filtering events to different destinations for 1 MB/sec stream
is ~$80/month
• Aggregations using 1-minute window for 5 MB/sec stream is
~$150/month
Questions?
Remember to complete
your evaluations!
Thank you!