![Page 1: NGA1-7 AWS Hausmann Analyzing Streaming Data in Real-time ...aws-de-media.s3-eu-west-1.amazonaws.com/images/AWS_Summit_… · Amazon Kinesis makes it easy to work with real-time streaming](https://reader033.vdocuments.site/reader033/viewer/2022042218/5ec45e90a278ab1da5430c0f/html5/thumbnails/1.jpg)
© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Dr. Steffen Hausmann, Solutions Architect, AWS
May 18, 2017
Analyzing Streaming Data in Real-Time with Amazon Kinesis Analytics
![Page 2: NGA1-7 AWS Hausmann Analyzing Streaming Data in Real-time ...aws-de-media.s3-eu-west-1.amazonaws.com/images/AWS_Summit_… · Amazon Kinesis makes it easy to work with real-time streaming](https://reader033.vdocuments.site/reader033/viewer/2022042218/5ec45e90a278ab1da5430c0f/html5/thumbnails/2.jpg)
Amazon Kinesis makes it easy to work with real-time streaming data
Amazon Kinesis Streams
• For technical developers
• Collect and stream data for ordered, replayable, real-time processing
Amazon Kinesis Firehose
• For all developers, data scientists
• Easily load massive volumes of streaming data into Amazon S3, Amazon Redshift, Amazon Elasticsearch Service
Amazon Kinesis Analytics
• For all developers, data scientists
• Easily analyze data streams using standard SQL queries
![Page 3: NGA1-7 AWS Hausmann Analyzing Streaming Data in Real-time ...aws-de-media.s3-eu-west-1.amazonaws.com/images/AWS_Summit_… · Amazon Kinesis makes it easy to work with real-time streaming](https://reader033.vdocuments.site/reader033/viewer/2022042218/5ec45e90a278ab1da5430c0f/html5/thumbnails/3.jpg)
Amazon Kinesis Analytics
Pay for only what you use
Automatic elasticity
Standard SQL for analytics
Real-time processing
Easy to use
![Page 4: NGA1-7 AWS Hausmann Analyzing Streaming Data in Real-time ...aws-de-media.s3-eu-west-1.amazonaws.com/images/AWS_Summit_… · Amazon Kinesis makes it easy to work with real-time streaming](https://reader033.vdocuments.site/reader033/viewer/2022042218/5ec45e90a278ab1da5430c0f/html5/thumbnails/4.jpg)
Connect to streaming source
• Streaming data sources include Amazon Kinesis Firehose or Amazon Kinesis Streams
• Input formats include JSON, .csv, variable column, or unstructured text
• Each input has a schema; schema is inferred, but you can edit
• Reference data sources (S3) for data enrichment
![Page 5: NGA1-7 AWS Hausmann Analyzing Streaming Data in Real-time ...aws-de-media.s3-eu-west-1.amazonaws.com/images/AWS_Summit_… · Amazon Kinesis makes it easy to work with real-time streaming](https://reader033.vdocuments.site/reader033/viewer/2022042218/5ec45e90a278ab1da5430c0f/html5/thumbnails/5.jpg)
Write SQL code
• Build streaming applications with one-to-many SQL statements
• Robust SQL support and advanced analytic functions
• Extensions to the SQL standard to work seamlessly with streaming data
• Support for at-least-once processing semantics
![Page 6: NGA1-7 AWS Hausmann Analyzing Streaming Data in Real-time ...aws-de-media.s3-eu-west-1.amazonaws.com/images/AWS_Summit_… · Amazon Kinesis makes it easy to work with real-time streaming](https://reader033.vdocuments.site/reader033/viewer/2022042218/5ec45e90a278ab1da5430c0f/html5/thumbnails/6.jpg)
Continuously deliver SQL results
• Send processed data to multiple destinations
• S3, Amazon Redshift, Amazon ES (through Firehose)
• Streams (with AWS Lambda integration for custom destinations)
• End-to-end processing speed as low as sub-second
• Separation of processing and data delivery
![Page 7: NGA1-7 AWS Hausmann Analyzing Streaming Data in Real-time ...aws-de-media.s3-eu-west-1.amazonaws.com/images/AWS_Summit_… · Amazon Kinesis makes it easy to work with real-time streaming](https://reader033.vdocuments.site/reader033/viewer/2022042218/5ec45e90a278ab1da5430c0f/html5/thumbnails/7.jpg)
What are common uses for Amazon Kinesis Analytics?
![Page 8: NGA1-7 AWS Hausmann Analyzing Streaming Data in Real-time ...aws-de-media.s3-eu-west-1.amazonaws.com/images/AWS_Summit_… · Amazon Kinesis makes it easy to work with real-time streaming](https://reader033.vdocuments.site/reader033/viewer/2022042218/5ec45e90a278ab1da5430c0f/html5/thumbnails/8.jpg)
Generate time series analytics
• Compute key performance indicators over time periods
• Combine with static or historical data in S3 or Amazon Redshift
[Diagram components: Streams, Firehose, Analytics, Amazon Redshift, S3, custom real-time destinations]
![Page 9: NGA1-7 AWS Hausmann Analyzing Streaming Data in Real-time ...aws-de-media.s3-eu-west-1.amazonaws.com/images/AWS_Summit_… · Amazon Kinesis makes it easy to work with real-time streaming](https://reader033.vdocuments.site/reader033/viewer/2022042218/5ec45e90a278ab1da5430c0f/html5/thumbnails/9.jpg)
Feed real-time dashboards
• Validate and transform raw data, and then process to calculate meaningful statistics
• Send processed data downstream for visualization in BI and visualization services
[Diagram components: Streams, Firehose, Analytics, Amazon ES, Amazon Redshift, Amazon RDS, Amazon QuickSight]
![Page 10: NGA1-7 AWS Hausmann Analyzing Streaming Data in Real-time ...aws-de-media.s3-eu-west-1.amazonaws.com/images/AWS_Summit_… · Amazon Kinesis makes it easy to work with real-time streaming](https://reader033.vdocuments.site/reader033/viewer/2022042218/5ec45e90a278ab1da5430c0f/html5/thumbnails/10.jpg)
Create real-time alarms and notifications
• Build sequences of events from the stream, like user sessions in a clickstream or app behavior through logs
• Identify events (or a series of events) of interest, and react to the data through alarms and notifications
[Diagram components: Streams, Firehose, Analytics, Amazon SNS, Amazon CloudWatch, Lambda]
![Page 11: NGA1-7 AWS Hausmann Analyzing Streaming Data in Real-time ...aws-de-media.s3-eu-west-1.amazonaws.com/images/AWS_Summit_… · Amazon Kinesis makes it easy to work with real-time streaming](https://reader033.vdocuments.site/reader033/viewer/2022042218/5ec45e90a278ab1da5430c0f/html5/thumbnails/11.jpg)
Example: Bundesliga Tweet Analysis
![Page 12: NGA1-7 AWS Hausmann Analyzing Streaming Data in Real-time ...aws-de-media.s3-eu-west-1.amazonaws.com/images/AWS_Summit_… · Amazon Kinesis makes it easy to work with real-time streaming](https://reader033.vdocuments.site/reader033/viewer/2022042218/5ec45e90a278ab1da5430c0f/html5/thumbnails/12.jpg)
Example Scenario Requirements
Data to capture
• Filter for soccer-related tweets
• Total number of tweets per hour that contain hashtags for soccer teams
• Top 5 mentioned team names per hour
Output Requirements
• Filtered tweets are saved to Amazon S3
• Hourly aggregate count is saved to Amazon ES
• Full team names of the top 5 hashtags are saved to Amazon ES
![Page 13: NGA1-7 AWS Hausmann Analyzing Streaming Data in Real-time ...aws-de-media.s3-eu-west-1.amazonaws.com/images/AWS_Summit_… · Amazon Kinesis makes it easy to work with real-time streaming](https://reader033.vdocuments.site/reader033/viewer/2022042218/5ec45e90a278ab1da5430c0f/html5/thumbnails/13.jpg)
Why use Amazon Kinesis Analytics for this solution?
Challenges
• Twitter stream can be noisy
• Tweet structure is complex, with several levels of nested JSON
• Soccer-related tweet volume is cyclical
With Amazon Kinesis Analytics:
• Easily filter out unwanted tweets
• Normalize tweet schema for simple SQL queries
• Automatically scale to meet demand
![Page 14: NGA1-7 AWS Hausmann Analyzing Streaming Data in Real-time ...aws-de-media.s3-eu-west-1.amazonaws.com/images/AWS_Summit_… · Amazon Kinesis makes it easy to work with real-time streaming](https://reader033.vdocuments.site/reader033/viewer/2022042218/5ec45e90a278ab1da5430c0f/html5/thumbnails/14.jpg)
End-to-End Architecture
[Diagram components: EC2 instance, Amazon Kinesis Streams, Amazon Kinesis Analytics, reference data, Amazon Kinesis Firehose, Amazon Elasticsearch Service, Amazon S3]
![Page 15: NGA1-7 AWS Hausmann Analyzing Streaming Data in Real-time ...aws-de-media.s3-eu-west-1.amazonaws.com/images/AWS_Summit_… · Amazon Kinesis makes it easy to work with real-time streaming](https://reader033.vdocuments.site/reader033/viewer/2022042218/5ec45e90a278ab1da5430c0f/html5/thumbnails/15.jpg)
How is streaming data accessed with SQL?
STREAM
• Analogous to a TABLE
• Represents continuous data flow
CREATE OR REPLACE STREAM "BL_TWEET_STREAM" (ID BIGINT, TWEET_TEXT VARCHAR(140), HASHTAG VARCHAR(140));
PUMP
• Continuous INSERT query
• Inserts data from one in-application stream to another
CREATE OR REPLACE PUMP "BL_TWEET_PUMP" AS
INSERT INTO "BL_TWEET_STREAM"
SELECT STREAM * FROM . . .
![Page 16: NGA1-7 AWS Hausmann Analyzing Streaming Data in Real-time ...aws-de-media.s3-eu-west-1.amazonaws.com/images/AWS_Summit_… · Amazon Kinesis makes it easy to work with real-time streaming](https://reader033.vdocuments.site/reader033/viewer/2022042218/5ec45e90a278ab1da5430c0f/html5/thumbnails/16.jpg)
Kinesis Analytics Application Overview
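[Diagram: in-application streams and their columns, with the labeled inputs and outputs]
• Amazon Kinesis stream (input)
• SOURCE_STREAM: id, text, tag
• BL_TWEET_STREAM: ID, TEXT, HASHTAG
• TeamName (reference data): hashtag, team
• TOTAL_TWEETS_STREAM: TWEET_COUNT
• MENTION_COUNT_STREAM: TEAMNAME, MENTIONCOUNT
• Amazon Kinesis Firehose (shown twice, as outputs)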
![Page 17: NGA1-7 AWS Hausmann Analyzing Streaming Data in Real-time ...aws-de-media.s3-eu-west-1.amazonaws.com/images/AWS_Summit_… · Amazon Kinesis makes it easy to work with real-time streaming](https://reader033.vdocuments.site/reader033/viewer/2022042218/5ec45e90a278ab1da5430c0f/html5/thumbnails/17.jpg)
How are tweets mapped to a schema?
Amazon Kinesis stream → Amazon Kinesis Analytics
{ "id": 795296435386388500, "text": "#FCB Spiel heute Abend! #bl", "created_at": "11-06-2016 16:07:00", "tags": [{ "tag": "FCB" }, { "tag": "bl" }] }

| id | text | created_at | tag |
| --- | --- | --- | --- |
| 795… | #FCB… | 11-06-2016… | FCB |
| 795… | #FCB… | 11-06-2016… | bl |

Source data for Amazon Kinesis Analytics
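The mapping above turns one nested tweet into one flat row per entry in its `tags` array. A minimal Python sketch of that flattening follows; it only models the idea and is not how Kinesis Analytics applies its input schema. The function name `flatten_tweet` is hypothetical.

```python
import json

def flatten_tweet(raw):
    """Return one flat row per entry in the tweet's "tags" array."""
    tweet = json.loads(raw)
    return [
        {
            "id": tweet["id"],
            "text": tweet["text"],
            "created_at": tweet["created_at"],
            "tag": entry["tag"],
        }
        # A tweet without tags produces no rows in this sketch.
        for entry in tweet.get("tags", [])
    ]

# The sample tweet from the slide, with straight quotes.
raw_tweet = (
    '{"id": 795296435386388500, "text": "#FCB Spiel heute Abend! #bl", '
    '"created_at": "11-06-2016 16:07:00", '
    '"tags": [{"tag": "FCB"}, {"tag": "bl"}]}'
)

rows = flatten_tweet(raw_tweet)
for row in rows:
    print(row["id"], row["created_at"], row["tag"])
```

Each output row repeats the tweet-level fields, which is why the table shows the same id and text twice.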
![Page 18: NGA1-7 AWS Hausmann Analyzing Streaming Data in Real-time ...aws-de-media.s3-eu-west-1.amazonaws.com/images/AWS_Summit_… · Amazon Kinesis makes it easy to work with real-time streaming](https://reader033.vdocuments.site/reader033/viewer/2022042218/5ec45e90a278ab1da5430c0f/html5/thumbnails/18.jpg)
How do we filter unwanted tweets?
Use PUMP to insert filtered data into STREAM
SOURCE_STREAM (id, text, tag) → BL_TWEET_STREAM (ID, TEXT, HASHTAG)
CREATE OR REPLACE PUMP "BL_TWEET_PUMP" AS
INSERT INTO "BL_TWEET_STREAM"
SELECT STREAM "id", "text", LOWER("tag")
FROM "SOURCE_STREAM"
WHERE LOWER("tag") NOT IN ('bl', 'bundesliga');
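To make the pump's behavior concrete, here is a small Python sketch that models the same filter over rows that have already been flattened to one row per tag. It is only an illustration of the semantics; `pump_filtered` and the sample rows are hypothetical.

```python
# Generic hashtags excluded by the WHERE clause of the pump.
GENERIC_TAGS = {"bl", "bundesliga"}

def pump_filtered(source_rows):
    """Yield BL_TWEET_STREAM-style rows for every row whose tag is not generic."""
    for row in source_rows:
        tag = row["tag"].lower()  # mirrors LOWER("tag")
        if tag not in GENERIC_TAGS:
            yield {"ID": row["id"], "TEXT": row["text"], "HASHTAG": tag}

# Hypothetical sample of SOURCE_STREAM rows.
source_stream = [
    {"id": 1, "text": "#FCB Spiel heute Abend! #bl", "tag": "FCB"},
    {"id": 1, "text": "#FCB Spiel heute Abend! #bl", "tag": "bl"},
    {"id": 2, "text": "#Bundesliga", "tag": "Bundesliga"},
]

bl_tweet_stream = list(pump_filtered(source_stream))
print(bl_tweet_stream)
```

Because the comparison is done on the lowercased tag, `Bundesliga` and `bl` are both dropped, and only the team hashtag row remains.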
![Page 19: NGA1-7 AWS Hausmann Analyzing Streaming Data in Real-time ...aws-de-media.s3-eu-west-1.amazonaws.com/images/AWS_Summit_… · Amazon Kinesis makes it easy to work with real-time streaming](https://reader033.vdocuments.site/reader033/viewer/2022042218/5ec45e90a278ab1da5430c0f/html5/thumbnails/19.jpg)
How do we get team name from the hashtag?
• Create CSV file with hashtag to team name map in S3
• Configure Amazon Kinesis Analytics application to import file as reference data
• Reference data appears as a table
• Join streaming data on reference data
hashtag,team
FCB,FC Bayern München
Bayern,FC Bayern München
BVB,Borussia Dortmund
Borussia,Borussia Dortmund
TSV,TSV 1860 München
s3://mybucket/team_map.csv
![Page 20: NGA1-7 AWS Hausmann Analyzing Streaming Data in Real-time ...aws-de-media.s3-eu-west-1.amazonaws.com/images/AWS_Summit_… · Amazon Kinesis makes it easy to work with real-time streaming](https://reader033.vdocuments.site/reader033/viewer/2022042218/5ec45e90a278ab1da5430c0f/html5/thumbnails/20.jpg)
Use Reference Data in Query
SELECT STREAM tn."team"
FROM "BL_TWEET_STREAM" tweets
INNER JOIN "TeamName" tn
ON tweets."HASHTAG" = LOWER(tn."hashtag")
[Diagram: BL_TWEET_STREAM (ID, TEXT, HASHTAG) joined with the TeamName reference table (hashtag, team) loaded from s3://mybucket/team_map.csv]
Example output:
FC Bayern München
FC Bayern München
Werder Bremen
Borussia Dortmund
Hertha BSC
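The join against reference data can be pictured as a dictionary lookup. The Python sketch below loads the CSV rows shown on the slide and resolves lowercased hashtags to team names, dropping unmatched rows as an inner join would. The helper names and the sample tweets are hypothetical, and the CSV is inlined rather than read from S3.

```python
import csv
import io

# Contents of the reference file from the slide, inlined for the sketch.
TEAM_MAP_CSV = """hashtag,team
FCB,FC Bayern München
Bayern,FC Bayern München
BVB,Borussia Dortmund
Borussia,Borussia Dortmund
TSV,TSV 1860 München
"""

def load_team_names(csv_text):
    """Build a lookup keyed by lowercased hashtag, like LOWER(tn."hashtag")."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["hashtag"].lower(): row["team"] for row in reader}

def join_team_names(tweets, team_names):
    """Inner join: yield a team name only when the hashtag has a match."""
    for tweet in tweets:
        team = team_names.get(tweet["HASHTAG"])
        if team is not None:
            yield team

team_names = load_team_names(TEAM_MAP_CSV)
tweets = [
    {"ID": 1, "TEXT": "...", "HASHTAG": "fcb"},
    {"ID": 2, "TEXT": "...", "HASHTAG": "bvb"},
    {"ID": 3, "TEXT": "...", "HASHTAG": "unknown"},
]
teams = list(join_team_names(tweets, team_names))
print(teams)
```

Several hashtags can map to the same team, which is what lets the later aggregation count mentions per team rather than per hashtag.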
![Page 21: NGA1-7 AWS Hausmann Analyzing Streaming Data in Real-time ...aws-de-media.s3-eu-west-1.amazonaws.com/images/AWS_Summit_… · Amazon Kinesis makes it easy to work with real-time streaming](https://reader033.vdocuments.site/reader033/viewer/2022042218/5ec45e90a278ab1da5430c0f/html5/thumbnails/21.jpg)
How do we aggregate streaming data?
• A common requirement in streaming analytics is to perform set-based operations (count, average, max, min, ...) over events that arrive within a specified period of time
• Cannot simply aggregate over an entire table as in a typical static database
• How do we define a subset in a potentially infinite stream?
• Windowing functions!
![Page 22: NGA1-7 AWS Hausmann Analyzing Streaming Data in Real-time ...aws-de-media.s3-eu-west-1.amazonaws.com/images/AWS_Summit_… · Amazon Kinesis makes it easy to work with real-time streaming](https://reader033.vdocuments.site/reader033/viewer/2022042218/5ec45e90a278ab1da5430c0f/html5/thumbnails/22.jpg)
Windowing Concepts
• Windows can be tumbling or sliding
• Windows are fixed length
Output record will have the timestamp of the end of the window
[Diagram: events with values 1, 5, 4, 2, 6, 8, 6, 4 arrive over time (t1 to t6) and fall into Window 1, Window 2, and Window 3; the aggregate function (Sum) emits output events 18 and 14]
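A tumbling window can be sketched in a few lines of Python: each event is assigned to exactly one fixed-length window, and the aggregate is reported with the window's end timestamp. The event values below are the ones from the diagram; the timestamps and the 10-second window size are assumptions chosen so that the first two windows sum to 18 and 14.

```python
def tumbling_sum(events, window_seconds):
    """Sum values per fixed-length, non-overlapping window.

    events: iterable of (timestamp_seconds, value) pairs.
    Returns (window_end_timestamp, total) pairs in time order.
    """
    sums = {}
    for ts, value in events:
        # Every event belongs to exactly one window.
        window_start = (ts // window_seconds) * window_seconds
        sums[window_start] = sums.get(window_start, 0) + value
    # Each output record carries the timestamp of the end of its window.
    return [(start + window_seconds, total) for start, total in sorted(sums.items())]

# Values from the diagram; timestamps are hypothetical.
events = [(0, 1), (2, 5), (4, 4), (6, 2), (8, 6), (11, 8), (15, 6), (21, 4)]
print(tumbling_sum(events, 10))
```

A running application would emit a window's result only once that window closes, which is why the diagram shows only two output events; this batch sketch reports every window it has seen, including the third.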
![Page 23: NGA1-7 AWS Hausmann Analyzing Streaming Data in Real-time ...aws-de-media.s3-eu-west-1.amazonaws.com/images/AWS_Summit_… · Amazon Kinesis makes it easy to work with real-time streaming](https://reader033.vdocuments.site/reader033/viewer/2022042218/5ec45e90a278ab1da5430c0f/html5/thumbnails/23.jpg)
How do we aggregate team mentions per hour?
• Use TOP_K_ITEMS_TUMBLING function
• Pass cursor to team name stream
• Define window size of 3600 seconds
INSERT INTO "MENTION_COUNT_STREAM"
SELECT STREAM * FROM TABLE(TOP_K_ITEMS_TUMBLING(
  CURSOR(SELECT STREAM tn."team"... ),
  'teamname', -- name of column to aggregate
  5, -- number of top items
  3600 -- tumbling window size in seconds
));
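The effect of a top-K aggregation over a tumbling window can be modeled in Python with a counter per window. This is a sketch of the semantics only, not the implementation behind TOP_K_ITEMS_TUMBLING; the function name, sample rows, and timestamps are hypothetical.

```python
from collections import Counter, defaultdict

def top_k_items_tumbling(rows, k, window_seconds):
    """Return (window_end, item, count) for the k most frequent items per window.

    rows: iterable of (timestamp_seconds, item) pairs.
    """
    windows = defaultdict(Counter)
    for ts, item in rows:
        windows[ts // window_seconds][item] += 1
    results = []
    for index in sorted(windows):
        window_end = (index + 1) * window_seconds
        for item, count in windows[index].most_common(k):
            results.append((window_end, item, count))
    return results

# Hypothetical team mentions: six in the first hour, one in the second.
mentions = [
    (10, "FC Bayern München"),
    (20, "Borussia Dortmund"),
    (30, "FC Bayern München"),
    (40, "Hertha BSC"),
    (50, "FC Bayern München"),
    (60, "Borussia Dortmund"),
    (3700, "Borussia Dortmund"),
]
top = top_k_items_tumbling(mentions, k=2, window_seconds=3600)
for record in top:
    print(record)
```

With k set to 5 and a 3600-second window, a model like this yields at most five records per hour, matching the output cadence described for the application.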
![Page 24: NGA1-7 AWS Hausmann Analyzing Streaming Data in Real-time ...aws-de-media.s3-eu-west-1.amazonaws.com/images/AWS_Summit_… · Amazon Kinesis makes it easy to work with real-time streaming](https://reader033.vdocuments.site/reader033/viewer/2022042218/5ec45e90a278ab1da5430c0f/html5/thumbnails/24.jpg)
Output to Amazon Kinesis Firehose
[Diagram: MENTION_COUNT_STREAM (TEAMNAME, MENTIONCOUNT) → Amazon Kinesis Firehose → Amazon Elasticsearch Service → Kibana]
{ "aggregationtime": "2016-11-06T14:42:03.335", "teamname": "Hertha BSC", "mentioncount": 604 }
5 records, every hour
![Page 25: NGA1-7 AWS Hausmann Analyzing Streaming Data in Real-time ...aws-de-media.s3-eu-west-1.amazonaws.com/images/AWS_Summit_… · Amazon Kinesis makes it easy to work with real-time streaming](https://reader033.vdocuments.site/reader033/viewer/2022042218/5ec45e90a278ab1da5430c0f/html5/thumbnails/25.jpg)
Visualize Results with Kibana
![Page 26: NGA1-7 AWS Hausmann Analyzing Streaming Data in Real-time ...aws-de-media.s3-eu-west-1.amazonaws.com/images/AWS_Summit_… · Amazon Kinesis makes it easy to work with real-time streaming](https://reader033.vdocuments.site/reader033/viewer/2022042218/5ec45e90a278ab1da5430c0f/html5/thumbnails/26.jpg)
Amazon Kinesis Analytics Best Practices
![Page 27: NGA1-7 AWS Hausmann Analyzing Streaming Data in Real-time ...aws-de-media.s3-eu-west-1.amazonaws.com/images/AWS_Summit_… · Amazon Kinesis makes it easy to work with real-time streaming](https://reader033.vdocuments.site/reader033/viewer/2022042218/5ec45e90a278ab1da5430c0f/html5/thumbnails/27.jpg)
Managing Applications
Set up CloudWatch alarms
• MillisBehindLatest metric tracks how far behind the application is from the source
• Alarm on MillisBehindLatest metric. Consider triggering when 1 hour behind, on a 1-minute average. Adjust accordingly for applications with lower end-to-end processing needs.
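The alarm condition above amounts to comparing a 1-minute average of MillisBehindLatest against a threshold of one hour expressed in milliseconds. The Python sketch below shows only that arithmetic; it does not call CloudWatch, and the function name, the sample values, and the choice of "greater than or equal" are assumptions.

```python
from statistics import mean

# One hour expressed in milliseconds, the unit of MillisBehindLatest.
ONE_HOUR_MS = 60 * 60 * 1000

def should_alarm(samples_ms, threshold_ms=ONE_HOUR_MS):
    """Return True when the average of one period's samples reaches the threshold.

    samples_ms: MillisBehindLatest values observed within a single 1-minute period.
    An empty period is treated as "no alarm" in this sketch.
    """
    if not samples_ms:
        return False
    return mean(samples_ms) >= threshold_ms

print(ONE_HOUR_MS)
print(should_alarm([3_700_000, 3_650_000]))
print(should_alarm([1_000, 2_000]))
```

For applications with tighter latency needs, the same check applies with a smaller `threshold_ms`, for example five minutes instead of one hour.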
![Page 28: NGA1-7 AWS Hausmann Analyzing Streaming Data in Real-time ...aws-de-media.s3-eu-west-1.amazonaws.com/images/AWS_Summit_… · Amazon Kinesis makes it easy to work with real-time streaming](https://reader033.vdocuments.site/reader033/viewer/2022042218/5ec45e90a278ab1da5430c0f/html5/thumbnails/28.jpg)
Managing Applications
Increase input parallelism to improve performance
• By default, a single source in-application stream is created
• If application is not keeping up with input stream, consider increasing input parallelism to create multiple source in-application streams
![Page 29: NGA1-7 AWS Hausmann Analyzing Streaming Data in Real-time ...aws-de-media.s3-eu-west-1.amazonaws.com/images/AWS_Summit_… · Amazon Kinesis makes it easy to work with real-time streaming](https://reader033.vdocuments.site/reader033/viewer/2022042218/5ec45e90a278ab1da5430c0f/html5/thumbnails/29.jpg)
Managing Applications
Limit number of applications reading from same source
• Avoid ReadProvisionedThroughputExceeded exceptions
• For an Amazon Kinesis Streams source, limit to 2 total applications
• For an Amazon Kinesis Firehose source, limit to 1 application
![Page 30: NGA1-7 AWS Hausmann Analyzing Streaming Data in Real-time ...aws-de-media.s3-eu-west-1.amazonaws.com/images/AWS_Summit_… · Amazon Kinesis makes it easy to work with real-time streaming](https://reader033.vdocuments.site/reader033/viewer/2022042218/5ec45e90a278ab1da5430c0f/html5/thumbnails/30.jpg)
Defining Input Schema
• Review and adequately test inferred input schema
• Manually update schema to handle nested JSON with greater than 2 levels of depth
• Use SQL functions in your application for unstructured data
![Page 31: NGA1-7 AWS Hausmann Analyzing Streaming Data in Real-time ...aws-de-media.s3-eu-west-1.amazonaws.com/images/AWS_Summit_… · Amazon Kinesis makes it easy to work with real-time streaming](https://reader033.vdocuments.site/reader033/viewer/2022042218/5ec45e90a278ab1da5430c0f/html5/thumbnails/31.jpg)
Authoring Application Code
• Avoid time-based windows greater than one hour
• Keep window sizes small during development
• Use smaller SQL queries, with multiple in-application streams, rather than a single, large query
![Page 32: NGA1-7 AWS Hausmann Analyzing Streaming Data in Real-time ...aws-de-media.s3-eu-west-1.amazonaws.com/images/AWS_Summit_… · Amazon Kinesis makes it easy to work with real-time streaming](https://reader033.vdocuments.site/reader033/viewer/2022042218/5ec45e90a278ab1da5430c0f/html5/thumbnails/32.jpg)
Thank you!