AWS APAC Webinar Week - Real-Time Data Processing with Kinesis

Post on 16-Apr-2017


aws.amazon.com/webinars/apac/webinar-week | #AWSWebinarWeek

Real-time Data Processing
Kinesis and beyond

Santanu Dutt

INDEX

A. What is real-time?

B. Examples and Challenges

C. Kinesis and beyond

1. Kinesis Stream
2. Kinesis Firehose
3. Kinesis Analytics (SQL)
4. DynamoDB Streams
5. ElasticSearch
6. IoT Rule Engine

D. Demo


Examples
• Algorithmic Trading < 10 msec
• Real time bidding < 100 msec
• Common IoT scenarios < 5 to 10 sec
• Infrastructure Monitoring Dashboard < 1 min
• Google Maps Traffic < 5 mins
• Social Network and Media recommendation < 15 min to a Day
• Most Business Analytics Scenarios < 30 mins
• Social Network listening: depends on how fast you want to respond!


Challenges

A. Speed of Analytics and Response

B. Volume of data

C. Maturity or Capabilities of Analytics Framework

D. Storing and Presentation of results

The Motivation for Continuous Processing


Some statistics about AWS Data Services

Metering service
• 10s of millions of records per second
• Terabytes per hour
• Hundreds of thousands of sources
• Auditors guarantee 100% accuracy at month end

Data Warehouse
• 100s of extract-transform-load (ETL) jobs every day
• Hundreds of thousands of files per load cycle
• Hundreds of daily users
• Hundreds of queries per hour

Metering Service


Internal AWS Metering Service

Workload
• 10s of millions of records/sec
• Multiple TB per hour
• 100,000s of sources

Pain points
• Doesn’t scale elastically
• Customers want real-time alerts
• Expensive to operate
• Relies on eventually consistent storage


Our Big Data Transition

Old requirements
• Capture huge amounts of data and process it in hourly or daily batches

New requirements
• Make decisions faster, sometimes in real-time
• Scale entire system elastically
• Make it easy to “keep everything”
• Multiple applications can process data in parallel

A General Purpose Data Flow
Many different technologies, at different stages of evolution

Client/Sensor → Aggregator → Continuous Processing → Storage → Analytics + Reporting

Kafka

?

Kinesis

Movement or activity in response to a stimulus.

A fully managed service for real-time processing of high-volume, streaming data. Kinesis can store and process terabytes of data an hour from hundreds of thousands of sources. Data is replicated across multiple Availability Zones to ensure high durability and availability.

Customer Scenarios across Industry Segments (Customer View)

Scenarios
1. Accelerated Ingest-Transform-Load
2. Continual Metrics/KPI Extraction
3. Responsive Data Analysis

Data Types
IT infrastructure, application logs, social media, financial market data, web clickstreams, sensors, geo/location data

Software/Technology
1. IT server and app logs ingestion
2. IT operational metrics dashboards
3. Devices/sensor operational intelligence

Digital Ad Tech./Marketing
1. Advertising data aggregation
2. Advertising metrics like coverage, yield, conversion
3. Analytics on user engagement with ads, optimized bid/buy engines

Financial Services
1. Market/financial transaction order data collection
2. Financial market data metrics
3. Fraud monitoring, Value-at-Risk assessment, auditing of market order data

Consumer Online/E-Commerce
1. Online customer engagement data aggregation
2. Consumer engagement metrics like page views, CTR
3. Customer clickstream analytics, recommendation engines

What Biz. Problem needs to be solved?

Mobile/Social Gaming

Deliver continuous, real-time game insight data from 100’s of game servers

Custom-built solutions are operationally complex to manage, & not scalable

• Delay with critical business data delivery
• Developer burden in building reliable, scalable platform for real-time data ingestion/processing
• Slow-down of real-time customer insights

Accelerate time to market of elastic, real-time applications, while minimizing operational overhead

Digital Advertising Tech.

Generate real-time metrics, KPIs for online ad performance for advertisers/publishers

Store + Forward fleet of log servers, and Hadoop based processing pipeline

• Lost data with Store/Forward layer
• Operational burden in managing reliable, scalable platform for real-time data ingestion/processing
• Batch-driven real-time customer insights

Generate freshest analytics on advertiser performance to optimize marketing spend, and increase responsiveness to clients


Amazon Kinesis Streams
Build your own data streaming applications

• Easy administration: Simply create a new stream, and set the desired level of capacity with shards. Scale to match your data throughput rate and volume.

• Build real-time applications: Perform continual processing on streaming big data using Kinesis Client Library (KCL), Apache Spark/Storm, AWS Lambda, and more.

• Low cost: Cost-efficient for workloads of any scale.

Kinesis Architecture

Run code in response to an event and automatically manage compute.

Amazon Kinesis – An Overview

Kinesis Stream: Managed ability to capture and store data

• Streams are made of Shards
• Each Shard ingests data up to 1 MB/sec, and up to 1,000 TPS
• Each Shard emits up to 2 MB/sec
• All data is stored for 24 hours
• Scale Kinesis streams by adding or removing Shards
• Replay data within the 24-hour window
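The per-shard limits above suggest a simple way to size a stream. The sketch below is only an illustration under stated assumptions: the function name `estimate_shards` and its parameters are hypothetical, the limits are the ones quoted on the slide, and every consumer is assumed to read the full stream.

```python
import math

# Per-shard limits as quoted on the slide.
INGEST_MB_PER_SEC = 1.0
INGEST_RECORDS_PER_SEC = 1000
EGRESS_MB_PER_SEC = 2.0


def estimate_shards(records_per_sec, avg_record_kb, consumers=1):
    """Return the smallest shard count that satisfies all three limits."""
    ingest_mb = records_per_sec * avg_record_kb / 1024.0
    # Assumption: each consumer application reads every record once.
    egress_mb = ingest_mb * consumers
    return max(
        1,
        math.ceil(ingest_mb / INGEST_MB_PER_SEC),
        math.ceil(records_per_sec / INGEST_RECORDS_PER_SEC),
        math.ceil(egress_mb / EGRESS_MB_PER_SEC),
    )
```

For example, 5,000 records/sec of 2 KB each with three consumers is bounded by the read limit: `estimate_shards(5000, 2, consumers=3)` returns 15.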

Putting Data into Kinesis
Simple Put interface to store data in Kinesis

• Producers use a PUT call to store data in a Stream
• PutRecord {Data, PartitionKey, StreamName}
• A Partition Key is supplied by the producer and used to distribute the PUTs across Shards
• Kinesis MD5-hashes the supplied partition key; the resulting value falls within the hash key range of one Shard
• A unique Sequence # is returned to the Producer upon a successful PUT call
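How a partition key ends up on a shard can be sketched locally. This is a minimal illustration, not the service's code: it assumes the 128-bit hash space is split evenly across shards (real streams can have uneven ranges after splits and merges), and the helper names `even_hash_ranges` and `shard_for_key` are hypothetical.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit value


def even_hash_ranges(shard_count):
    """Split the hash space into contiguous, inclusive (start, end) ranges."""
    size = HASH_SPACE // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * size
        # The last range absorbs any remainder so the whole space is covered.
        end = HASH_SPACE - 1 if i == shard_count - 1 else start + size - 1
        ranges.append((start, end))
    return ranges


def shard_for_key(partition_key, ranges):
    """Return the index of the range that contains the key's MD5 hash."""
    hashed = int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)
    for index, (start, end) in enumerate(ranges):
        if start <= hashed <= end:
            return index
    raise ValueError("hash ranges do not cover the key space")
```

The same key always maps to the same shard, which is why a low-cardinality partition key concentrates traffic on a few shards.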

Creating and Sizing a Kinesis Stream

Building Kinesis Processing Apps: Kinesis Client Library
Client library for fault-tolerant, at-least-once, continuous processing

o Java client library, source available on GitHub
o Build & deploy app with KCL on your EC2 instance(s)
o KCL is the intermediary between your application & stream
  • Automatically starts a Kinesis Worker for each shard
  • Simplifies reading by abstracting individual shards
  • Increases / decreases Workers as # of shards changes
  • Checkpoints to keep track of a Worker’s location in the stream, restarts Workers if they fail
o Integrates with AutoScaling groups to redistribute workers to new instances
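Checkpointing is what makes processing at-least-once rather than exactly-once. The simulation below is a sketch of that idea only and does not use the KCL: `CheckpointStore`, `run_worker` and the `crash_after` knob are hypothetical names, and the checkpoint is advanced after every record for simplicity.

```python
class CheckpointStore:
    """Holds the index of the last record known to be fully processed."""

    def __init__(self):
        self.position = -1


def run_worker(records, store, handler, crash_after=None):
    """Process every record after the stored checkpoint.

    crash_after simulates a worker failing after handling that index
    but before its checkpoint is written.
    """
    for index in range(store.position + 1, len(records)):
        handler(records[index])
        if index == crash_after:
            raise RuntimeError("simulated worker failure")
        store.position = index  # checkpoint only after successful handling
```

If a worker handles a record and fails before checkpointing, the restarted worker resumes from the last checkpoint and handles that record again, so handlers should tolerate duplicates.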

Amazon Kinesis Connector Library
Customizable, Open Source code to Connect Kinesis with S3, Redshift, DynamoDB

ITransformer

• Defines the transformation of records from the Amazon Kinesis stream in order to suit the user-defined data model

IFilter

• Excludes irrelevant records from the processing.

IBuffer

• Buffers the set of records to be processed by specifying size limit (# of records) & total byte count

IEmitter

• Makes client calls to other AWS services and persists the records stored in the buffer.
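The four roles chain together as transform, filter, buffer, emit. The Connector Library itself is Java; the Python sketch below only mirrors the flow, with plain callables standing in for the interfaces. `run_pipeline` is a hypothetical name, and the buffer here flushes on record count only, not on byte count.

```python
def run_pipeline(records, transform, keep, buffer_limit, emit):
    """Push records through the four stages; return the number of emit calls."""
    buffer = []
    flushes = 0
    for raw in records:
        item = transform(raw)            # ITransformer role
        if not keep(item):               # IFilter role
            continue
        buffer.append(item)              # IBuffer role
        if len(buffer) >= buffer_limit:
            emit(list(buffer))           # IEmitter role
            buffer.clear()
            flushes += 1
    if buffer:                           # flush whatever is left at the end
        emit(list(buffer))
        flushes += 1
    return flushes
```

In this sketch `emit` could be any function that writes a batch to a destination; here it is left to the caller.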


Use Cases

Ultra Low Latency Analytics (seconds)

Complex Computations
• Complex algorithm execution
• Tuple processing: every piece of data is processed independently, as opposed to aggregation, which runs from the first row to the last row
• Moving window analysis: e.g. a moving car observed from the 2nd to the 3rd minute and then from the 5th to the 6th minute


Amazon Kinesis Firehose
Load massive volumes of streaming data into Amazon S3 and Amazon Redshift

• Zero administration: Capture and deliver streaming data into S3, Redshift, and other destinations without writing an application or managing infrastructure.

• Direct-to-data store integration: Batch, compress, and encrypt streaming data for delivery into data destinations in as little as 60 secs using simple configurations.

• Seamless elasticity: Seamlessly scales to match data throughput without intervention
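The batching behaviour described above (deliver when enough data or enough time has accumulated) can be sketched as follows. This is an illustration, not Firehose's implementation: `micro_batches` and its parameters are hypothetical, and a batch is only closed when the next event arrives, whereas a real service would also flush on a timer.

```python
def micro_batches(events, max_bytes, max_seconds):
    """Group (timestamp, payload) events into batches.

    A batch is closed when adding the next payload would exceed max_bytes,
    or when the next event arrives max_seconds or more after the batch began.
    """
    batches, current, size, started = [], [], 0, None
    for ts, payload in events:
        too_old = current and ts - started >= max_seconds
        too_big = current and size + len(payload) > max_bytes
        if too_old or too_big:
            batches.append(current)
            current, size = [], 0
        if not current:
            started = ts  # the first event of a batch starts its clock
        current.append(payload)
        size += len(payload)
    if current:
        batches.append(current)
    return batches
```

With a 60-second interval and a small size limit, a quiet stream yields one batch per minute while a busy one is cut by size first.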

Capture and submit streaming data to Firehose

Firehose loads streaming data continuously into S3 and Redshift

Analyze streaming data using your favorite BI tools


Amazon Kinesis Firehose to Redshift
A two-step process

1. Use customer-provided S3 bucket as an intermediate destination
• Still the most efficient way to do large scale loads to Redshift.
• Never lose data, always safe, and available in your S3 bucket.

2. Firehose issues customer-provided COPY command synchronously. It continuously issues a COPY command once the previous COPY command is finished and acknowledged back from Redshift.


Use Cases

Use Kinesis Firehose when you need to run batches more frequently, as long as the analysis can be done with SQL.

Micro-batching scenarios where latencies of more than 60 seconds are tolerable

With Redshift as the target: analytics that can be achieved with standard SQL and User Defined Functions (UDFs)

Most “Real-Time Business Insights” kind of scenarios can be easily supported with Kinesis Firehose + Redshift!


Amazon Kinesis Analytics
Analyze data streams continuously with standard SQL

• Apply SQL on streams: Easily connect to data streams and apply existing SQL skills.

• Build real-time applications: Perform continual processing on streaming big data with sub-second processing latencies

• Scale elastically: Elastically scales to match data throughput without any operator intervention.

Announcement Only!


Connect to Kinesis streams, Firehose delivery streams

Run standard SQL queries against data streams

Kinesis Analytics can send processed data to analytics tools so you can create alerts and respond in real-time


Use Cases

Low latency time series analytics

Analytics that can be achieved within the confines of the supported SQL
• Running Totals
• Moving Averages
• Number of people entering a stadium
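Running totals and moving averages are both computed incrementally, one record at a time. The slide expresses them in streaming SQL; the Python sketch below only illustrates the arithmetic, and the function names are hypothetical.

```python
from collections import deque


def running_totals(values):
    """Yield the cumulative sum after each incoming value."""
    total = 0
    for value in values:
        total += value
        yield total


def moving_averages(values, window):
    """Yield the mean of the most recent `window` values after each arrival."""
    recent = deque(maxlen=window)  # old values fall out automatically
    for value in values:
        recent.append(value)
        yield sum(recent) / len(recent)
```

For the stadium case, feeding per-gate entry counts `[3, 5, 2]` into `running_totals` gives `[3, 8, 10]` people inside so far.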


Amazon DynamoDB Streams: time-ordered sequence of item-level changes

• Time and partition ordered log

• Provides a stream of inserts, deletes, updates
  • Old item
  • New item
  • Primary key
  • Change type

• Stream items delivered exactly once

• Streams are asynchronous

• Scales with your table

DynamoDB → DynamoDB Streams
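A consumer of the stream sees one record per item-level change. The sketch below assumes the event shape that AWS Lambda passes for a DynamoDB Streams source (`Records`, `eventName`, and a `dynamodb` object with `Keys`, `OldImage`, `NewImage`); the function name `summarize_changes` is hypothetical, and which images are present depends on the stream's view type.

```python
def summarize_changes(event):
    """Flatten stream records into change type, primary key, old and new item."""
    changes = []
    for record in event.get("Records", []):
        data = record["dynamodb"]
        changes.append({
            "type": record["eventName"],   # INSERT, MODIFY or REMOVE
            "key": data["Keys"],
            "old": data.get("OldImage"),   # absent for inserts
            "new": data.get("NewImage"),   # absent for deletes
        })
    return changes
```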


Use Cases

Ultra Low Latency Analytics (seconds) when data is available in Kinesis and DynamoDB Streams, e.g.

Energy meter data coming into Kinesis, to continuously update billing info.

Changes to a social network profile stored in DynamoDB, to transmit updates to connections immediately (e.g. a user adds a new job to their profile).
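The energy meter case can be sketched as a function over a batch of Kinesis records. It assumes the event shape that AWS Lambda passes for a Kinesis source, where each record's payload is base64-encoded under `kinesis.data`; the payload fields `meter_id` and `kwh` and the name `usage_by_meter` are hypothetical.

```python
import base64
import json


def usage_by_meter(event, context=None):
    """Sum the kWh readings in one batch of Kinesis records, per meter."""
    usage = {}
    for record in event["Records"]:
        # Kinesis payloads arrive base64-encoded; here they are assumed JSON.
        reading = json.loads(base64.b64decode(record["kinesis"]["data"]))
        meter = reading["meter_id"]
        usage[meter] = usage.get(meter, 0.0) + reading["kwh"]
    return usage
```

A real billing update would then write these per-batch sums to a store; that step is left out here.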


How Elasticsearch can help

• Combined with Logstash and Kibana, the ELK stack provides a tool for real-time analytics and data visualization

Plug-ins
A. Kibana 3
B. Kibana 4
C. Jetty
D. cloud-aws
E. Kuromoji
F. icu


ElasticSearch API
• Query
• Aggregation

Aggregation and Filtering
Documents → Query → Buckets → Metrics (123, 420, 510)
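The documents, query, buckets, metrics flow maps onto a single search request body. The sketch below builds such a body using the Elasticsearch query DSL (`term` query, `terms` bucket aggregation, `avg` metric aggregation); the function name and the field names in the usage example are hypothetical.

```python
import json


def build_search(term_field, term_value, bucket_field, metric_field):
    """Build a request body: filter documents, bucket them, compute a metric."""
    return {
        "size": 0,  # return only aggregation results, no matching documents
        "query": {"term": {term_field: term_value}},
        "aggs": {
            "by_bucket": {
                "terms": {"field": bucket_field},
                "aggs": {"avg_metric": {"avg": {"field": metric_field}}},
            }
        },
    }


# Example: average latency per host for documents with status "error".
body = build_search("status", "error", "host", "latency_ms")
print(json.dumps(body, indent=2))
```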


Use Cases

Real-Time Dashboards (Kibana)

Alerting (Percolator API)

Real-Time Text Analytics, as in Social Media Listening

Real-Time Geospatial Queries and Geospatial Analysis


AWS IoT

“Securely connect one or one-billion devices to AWS, so they can interact with applications and other devices”


AWS IoT

• DEVICE SDK: Set of client libraries to connect, authenticate and exchange messages
• DEVICE GATEWAY: Communicate with devices via MQTT and HTTP
• AUTHENTICATION / AUTHORIZATION: Secure with mutual authentication and encryption
• RULES ENGINE: Transform messages based on rules and route to AWS Services
• DEVICE SHADOW: Persistent thing state during intermittent connections
• DEVICE REGISTRY: Identity and Management of your things

Also shown in the diagram: APPLICATIONS, AWS IoT API, AWS Services, 3P Services


Use Cases

Processing sensor data (millions of data points from hundreds of thousands of sensors) in real time for Alerting

Redirecting sensor data for multi-data-point analysis to Kinesis, DynamoDB
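Redirecting sensor data to Kinesis is expressed as a rule: a SQL statement over an MQTT topic plus an action. The sketch below builds a payload in the shape of the AWS IoT CreateTopicRule request (`sql`, `ruleDisabled`, `actions` with a `kinesis` action), as far as that shape is assumed here; the function name, topic filter, condition, stream name and role ARN are all placeholders.

```python
def kinesis_rule_payload(topic_filter, condition, stream_name, role_arn):
    """Build a topic rule payload that forwards matching messages to a stream."""
    return {
        "sql": f"SELECT * FROM '{topic_filter}' WHERE {condition}",
        "ruleDisabled": False,
        "actions": [
            {
                "kinesis": {
                    "roleArn": role_arn,          # placeholder IAM role
                    "streamName": stream_name,    # placeholder stream
                    # Assumption: use the MQTT topic as the partition key.
                    "partitionKey": "${topic()}",
                }
            }
        ],
    }


payload = kinesis_rule_payload(
    "sensors/+/telemetry",
    "temperature > 50",
    "sensor-stream",
    "arn:aws:iam::123456789012:role/example-iot-role",
)
```

Only messages that satisfy the WHERE clause are forwarded, which is how the same rule mechanism serves both alerting and redirection.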

Difficulty of working with (figure labels)
• Spark/Storm
• Lambda (arbitrary; Node, Python, Java)
• Redshift (structured, SQL)
• ElasticSearch (un-structured, JSON)
• HIVE SQL
• Quick Sight (GUI)
• Kinesis Analytics (Limited SQL)
• IoT Rule Engine (SQL)

Latency vs. Capabilities (figure labels)
Latency axis: Sub-second, Few seconds, 2-5 Minutes
• Spark/Storm + Kinesis
• ElasticSearch + Logstash
• Lambda + Kinesis
• Kinesis Analytics
• Redshift + DMS
• Redshift + Firehose
• MR/HIVE/Impala/Presto + Firehose
• Quick Sight
• IoT Rule Engine
• Storm + Kafka


Demo Time.

Website - https://secure.amitksh.net/cdn/webinarWeek.html
Real time updates from Kinesis - https://secure.amitksh.net/rtChart.html

Interesting Possibilities!

Quick Sight

Online Labs & Training
Gain confidence and hands-on experience with AWS. Watch free Instructional Videos and explore Self-Paced Labs

Instructor Led Classes
Learn how to design, deploy and operate highly available, cost-effective and secure applications on AWS in courses led by qualified AWS instructors

AWS Certification
Validate your technical expertise with AWS and use practice exams to help you prepare for AWS Certification

More info at http://aws.amazon.com/training
