introduction to amazon kinesis analytics

29
© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Roger Barga, GM Amazon Kinesis Ryan Nienhuis, Product Manager, Amazon Kinesis 08/11/2016 Introduction to Amazon Kinesis Analytics Easily Analyze Streaming Data with Standard SQL

Upload: amazon-web-services

Post on 11-Jan-2017

1.375 views

Category:

Technology


3 download

TRANSCRIPT

Page 1: Introduction to Amazon Kinesis Analytics

© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Roger Barga, GM Amazon Kinesis

Ryan Nienhuis, Product Manager, Amazon Kinesis

08/11/2016

Introduction to Amazon Kinesis

AnalyticsEasily Analyze Streaming Data with Standard SQL

Page 2: Introduction to Amazon Kinesis Analytics

Most data is produced continuously

Mobile Apps Web Clickstream Application Logs

Metering Records IoT Sensors Smart Buildings

[Wed Oct 11 14:32:52

2000] [error] [client

127.0.0.1] client

denied by server

configuration:

/export/home/live/ap/ht

docs/test

Page 3: Introduction to Amazon Kinesis Analytics

The diminishing value of data

Recent data is highly valuable If you act on it in time

Perishable Insights (M. Gualtieri, Forrester)

Old + Recent data is more valuable If you have the means to combine them

Page 4: Introduction to Amazon Kinesis Analytics

Processing real-time, streaming data

• Durable

• Continuous

• Fast

• Correct

• Reactive

• Reliable

What are the key requirements?

Ingest Transform Analyze React Persist

Page 5: Introduction to Amazon Kinesis Analytics

Amazon Kinesis platform overview

Page 6: Introduction to Amazon Kinesis Analytics

Amazon Kinesis Streams

Easy administration: Create a stream, set capacity level with shards. Scale to

match your data throughput rate & volume.

Build real-time applications: Process streaming data with Kinesis Client

Library (KCL), Apache Spark/Storm, AWS Lambda, ....

Low cost: Cost-efficient for workloads of any scale.

Page 7: Introduction to Amazon Kinesis Analytics

Amazon Kinesis Firehose

Zero administration: Capture and deliver streaming data to Amazon S3, Amazon

Redshift, or Amazon Elasticsearch Service without writing an app or managing

infrastructure.

Direct-to-data-store integration: Batch, compress, and encrypt streaming data for

delivery in as little as 60 seconds.

Seamless elasticity: Seamlessly scales to match data throughput without

intervention.

Capture and submit

streaming data to Firehose

Analyze streaming data using your

favorite BI tools Firehose loads streaming data

continuously into S3, Amazon Redshift,

and Amazon ES

Page 8: Introduction to Amazon Kinesis Analytics

Amazon Kinesis Analytics

Apply SQL on streams: Easily connect to a Kinesis stream or Firehose

delivery stream and apply SQL skills.

Build real-time applications: Perform continual processing on streaming big

data with sub-second processing latencies.

Easy scalability: Elastically scales to match data throughput.

Connect to Kinesis streams,

Firehose delivery streamsRun standard SQL queries

against data streams

Kinesis Analytics can send processed data

to analytics tools so you can create alerts

and respond in real time

Page 9: Introduction to Amazon Kinesis Analytics

Amazon Kinesis: streaming data made easyServices make it easy to capture, deliver, and process streams on

AWS

Kinesis Analytics

For all developers, data scientists

Easily analyze data streams

using standard SQL queries

Kinesis Firehose

For all developers, data scientists

Easily load massive

volumes of streaming data

into S3, Amazon Redshift,

or Amazon ES

Kinesis Streams

For Technical Developers

Collect and stream data

for ordered, replayable,

real-time processing

Page 10: Introduction to Amazon Kinesis Analytics

Amazon Kinesis Analytics

Page 11: Introduction to Amazon Kinesis Analytics

Kinesis Analytics

Pay for only what you use

Automatic elasticity

Standard SQL for analytics

Real-time processing

Easy to use

Page 12: Introduction to Amazon Kinesis Analytics

Use SQL to build real-time applications

Easily write SQL code to process

streaming data

Connect to streaming source

Continuously deliver SQL results

Page 13: Introduction to Amazon Kinesis Analytics

Connect to streaming source

• Streaming data sources include Firehose or

Streams

• Input formats include JSON, .csv, variable

column, unstructured text

• Each input has a schema; schema is inferred,

but you can edit

• Reference data sources (S3) for data

enrichment

Page 14: Introduction to Amazon Kinesis Analytics

Write SQL code

• Build streaming applications with one-to-many

SQL statements

• Robust SQL support and advanced analytic

functions

• Extensions to the SQL standard to work

seamlessly with streaming data

• Support for at-least-once processing

semantics

Page 15: Introduction to Amazon Kinesis Analytics

Continuously deliver SQL results

• Send processed data to multiple destinations

• S3, Amazon Redshift, Amazon ES (through

Firehose)

• Streams (with AWS Lambda integration for

custom destinations)

• End-to-end processing speed as low as sub-

second

• Separation of processing and data delivery

Page 16: Introduction to Amazon Kinesis Analytics

Generate time series analytics

• Compute key performance indicates over-time windows

• Combine with historical data in S3 or Amazon Redshift

Analytics

Streams

Firehose

Amazon

Redshift

S3

Streams

Firehose

Custom, real-

time

destinations

Page 17: Introduction to Amazon Kinesis Analytics

Feed real-time dashboards

• Validate and transform raw data, and then process to calculate

meaningful statistics

• Send processed data downstream for visualization in BI and

visualization services

Amazon

QuickSight

Analytics

Amazon ES

Amazon

Redshift

Amazon

RDS

Streams

Firehose

Page 18: Introduction to Amazon Kinesis Analytics

Create real-time alarms and notifications

• Build sequences of events from the stream, like user sessions in a

clickstream or app behavior through logs

• Identify events (or a series of events) of interest, and react to the

data through alarms and notifications

Analytics

Streams

Firehose

Streams

Amazon

SNS

Amazon

CloudWatch

Lambda

Page 19: Introduction to Amazon Kinesis Analytics

SQL on streaming data

• SQL is an API to your data

• Ask for what you want, system decides how to get it

• For all data, not just “flat” data in a database

• Opportunity for novel data organization and algorithms

• A standard (ANSI 2008, 2011) and the most commonly

used data manipulation language

Page 20: Introduction to Amazon Kinesis Analytics

A simple streaming query

• Tweets about the AWS NYC Summit

• Selecting from a STREAM of tweets, an in-application

stream

• Each row has a corresponding ROWTIME

SELECT STREAM ROWTIME, author, text

FROM Tweets

WHERE text LIKE ‘%#AWSNYCSummit%'

Page 21: Introduction to Amazon Kinesis Analytics

A streaming table is a STREAM

• In relational databases, you work with SQL tables

• With Analytics, you work with STREAMS

• SELECT, INSERT, and CREATE can be used with STREAMs

CREATE STREAM Tweets

(author VARCHAR(20),

text VARCHAR(140));

INSERT INTO Tweets

SELECT …

Page 22: Introduction to Amazon Kinesis Analytics

Writing queries on unbounded data sets

• Streams are unbounded data sets

• Need continuous queries, row-by-row or across rows

• WINDOWs define a start and end to the query

SELECT STREAM author,

count(author) OVER ONE_MINUTE

FROM Tweets

WINDOW ONE_MINUTE AS

(PARTITION BY author

RANGE INTERVAL '1' MINUTE PRECEDING);

Page 23: Introduction to Amazon Kinesis Analytics

Different types of windows

Tumbling

Sliding

Custom

Page 24: Introduction to Amazon Kinesis Analytics

Real-time analytical patterns

• Pre-processing: filtering, transformations

• Basic analytics: simple counts, aggregates over windows

• Advanced analytics: detecting anomalies, event

correlation

• Post-processing: alerting, triggering, final filters

Page 25: Introduction to Amazon Kinesis Analytics

Simple pricing model

• Analytics elastically scales based upon your data

throughput and query complexity

• Automatically provision Kinesis Processing Units (KPU),

which represent 1 vCPU and 4 GB memory

• KPU-hour is $0.11 in us-east-1

• Approximate examples

• Filtering events to different destinations for 1 MB/sec stream

is ~$80/month

• Aggregations using 1-minute window for 5 MB/sec stream is

~$150/month

Page 26: Introduction to Amazon Kinesis Analytics
Page 27: Introduction to Amazon Kinesis Analytics

Questions?

Page 28: Introduction to Amazon Kinesis Analytics

Remember to complete

your evaluations!

Page 29: Introduction to Amazon Kinesis Analytics

Thank you!