Post on 20-May-2020

Devices → Device Connectivity → Storage → Analytics → Presentation & Action

Device Connectivity: Event Hubs, Service Bus, External Data Sources

Storage: SQL Database, Table/Blob Storage, DocumentDB, External Data Sources

Analytics: Machine Learning, Stream Analytics, HDInsight, Data Factory

Presentation & Action: App Service, Power BI, Notification Hubs, Mobile Services, BizTalk Services

• What happened?

• What is happening?

• Why did it happen?

• What will happen?

Past

Present

Future

“Understand the pulse of the Organization”

Everything around us produces data

Traditional Business Intelligence first collects data and analyzes it afterwards

But we live in a fast paced world

Offline data is of little use

We work with streaming data

We want to monitor and analyze data in near real time

We don’t have time to stop, copy the data and then analyze it; we have to work on the streams of data directly

Batch Analytics

Data Parking into Relational database

Big data notions: velocity, variety, volume

Data in motion & live event

Cost effective way

Sort of queries on data warehouse

Business competitive age

Intake millions of events per second

At variable loads

Transform, augment, correlate, temporal operations

Elasticity of the cloud for scale out

No hardware (PaaS offering)

Rapid development

Time

Development and operations resources

Infrastructure – Procure and setup

Develop solution (code) for ingress, processing and egress

Develop solutions to integrate with other components like ML, BI etc.

Develop solutions to manage resiliency, such as infrastructure failures

Develop solutions and infrastructure for increasing scale with business growth

Monitoring and troubleshooting of solution

From Event or Data Streams to Real-Time Insights in less time and with fewer people

End-to-End Architecture Overview

Data Source → Collect → Process → Deliver → Consume

Event Inputs
- Event Hub
- Azure Blob

Reference Data
- Azure Blob

Azure Stream Analytics
- Transform: temporal joins, filter, aggregates, projections, windows, etc.
- Enrich
- Correlate

Outputs
- SQL Azure
- Azure Blobs
- Event Hub

Azure Storage

• Temporal Semantics

• Guaranteed delivery

• Guaranteed up time
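The Process stage above (transform, enrich, correlate) can be pictured with a small plain-Python sketch. This is only an illustration of the idea, not how Azure Stream Analytics is implemented: the sensor events, the `REFERENCE` lookup table, the `process` function and the `min_temp` threshold are all hypothetical names invented for the example.

```python
# Hypothetical reference data, e.g. a small lookup file kept in blob storage:
# sensor id -> location.
REFERENCE = {"s1": "Building A", "s2": "Building B"}


def process(events, reference, min_temp=30.0):
    """Lazily filter, project and enrich a stream of event dicts."""
    for event in events:
        if event["temp"] < min_temp:  # filter: keep only hot readings
            continue
        yield {
            # projection: keep only the fields we care about
            "sensor": event["sensor"],
            "temp": event["temp"],
            # enrichment: add static reference data to the streaming event
            "location": reference.get(event["sensor"], "unknown"),
        }


# Hypothetical input events.
events = [
    {"sensor": "s1", "temp": 35.5, "humidity": 40},
    {"sensor": "s2", "temp": 21.0, "humidity": 55},
    {"sensor": "s3", "temp": 31.0, "humidity": 60},
]

alerts = list(process(events, REFERENCE))
```

Because `process` is a generator, events are handled one at a time as they arrive rather than being collected first, which is the difference from the batch approach described earlier.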

Rapid Development

Lower the bar for creating stream processing solutions via a SQL-like language

Easily filter, project, aggregate, join streams, add static data with streaming data, detect patterns or lack of patterns with a few lines of SQL

Built-in temporal semantics

Development and debugging experience through Azure Portal

Manage out-of-order events & actions on late arriving events via configurations

SELECT COUNT(*), Topic FROM Tweets GROUP BY Topic, TumblingWindow(second, 5)
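The configurable handling of late-arriving events mentioned on this slide can be pictured roughly as follows. This is a simplified sketch, not the service's actual algorithm: the function name `apply_late_arrival_policy`, the tuple layout and the two action names `"adjust"` and `"drop"` are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def apply_late_arrival_policy(events, tolerance, action="adjust"):
    """Apply a late-arrival policy to (event_time, arrival_time, payload) tuples.

    Events that arrive within `tolerance` of their event time pass through
    unchanged. Later events are either dropped or have their timestamp moved
    forward to (arrival_time - tolerance), depending on `action`.
    """
    if action not in ("adjust", "drop"):
        raise ValueError("action must be 'adjust' or 'drop'")
    accepted = []
    for event_time, arrival_time, payload in events:
        delay = arrival_time - event_time
        if delay <= tolerance:
            accepted.append((event_time, payload))
        elif action == "adjust":
            accepted.append((arrival_time - tolerance, payload))
        # action == "drop": the late event is discarded
    return accepted


t0 = datetime(2014, 9, 10, 12, 0, 0)
events = [
    (t0, t0 + timedelta(seconds=2), "on time"),
    (t0, t0 + timedelta(seconds=30), "late"),
]
adjusted = apply_late_arrival_policy(events, timedelta(seconds=5), "adjust")
dropped = apply_late_arrival_policy(events, timedelta(seconds=5), "drop")
```

With a 5-second tolerance, the second event is 30 seconds late, so it is either re-stamped to 25 seconds after `t0` or removed from the stream.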

Pain Points with other Streaming Solutions

Not an end to end solution

Hard to develop

Need expertise and special skills

Costs a lot of money in development

@ApplicationAnnotation(name="WordCountDemo")
public class Application implements StreamingApplication
{
  protected String fileName = "com/datatorrent/demos/wordcount/samplefile.txt";
  private Locality locality = null;

  @Override
  public void populateDAG(DAG dag, Configuration conf)
  {
    locality = Locality.CONTAINER_LOCAL;
    WordCountInputOperator input = dag.addOperator("wordinput", new WordCountInputOperator());
    input.setFileName(fileName);
    UniqueCounter<String> wordCount = dag.addOperator("count", new UniqueCounter<String>());
    dag.addStream("wordinput-count", input.outputPort, wordCount.data).setLocality(locality);
    ConsoleOutputOperator consoleOperator = dag.addOperator("console", new ConsoleOutputOperator());
    dag.addStream("count-console", wordCount.count, consoleOperator.input);
  }
}

No code compilation, easy to author and deploy

Brings together event streams, reference data and machine learning extensions

All operators respect, and some use, the temporal properties of events

These should (mostly) look familiar if you know relational databases

Filters, projections, joins, windowed (temporal) aggregates, text and date manipulation

Our toll station has multiple toll booths, where a sensor placed on top of the booth scans an RFID card affixed to the windshield of the vehicles as they pass the toll booth.

The passage of vehicles through these toll stations can be modelled as event streams over which interesting operations can be performed.

Entry stream:

Toll Id | EntryTime               | LicensePlate | State | Make   | Model | Vehicle Type | Vehicle Weight | Toll | Tag
1       | 2014-09-10 12:01:00.000 | JNB 7001     | NY    | Honda  | CRV   | 1            | 1535           | 7    |
2       | 2014-09-10 12:02:00.000 | YXZ 1001     | NY    | Toyota | Camry | 1            | 1399           | 4    | 123456789

Exit stream:

Toll Id | ExitTime                     | LicensePlate
1       | 2014-09-10T12:03:00.0000000Z | JNB 7001
2       | 2014-09-10T12:03:00.0000000Z | YXZ 1001
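A temporal join correlates events from two streams that belong together and occur within a bounded time of each other. The sketch below joins the sample entry and exit rows above on toll id and license plate to compute how long each vehicle spent at the booth. It is a plain-Python illustration, not Stream Analytics code; the function name `temporal_join` and the 15-minute bound are assumptions chosen for the example.

```python
from datetime import datetime, timedelta

# (toll_id, timestamp, license_plate) taken from the sample rows.
entries = [
    (1, datetime(2014, 9, 10, 12, 1), "JNB 7001"),
    (2, datetime(2014, 9, 10, 12, 2), "YXZ 1001"),
]
exits = [
    (1, datetime(2014, 9, 10, 12, 3), "JNB 7001"),
    (2, datetime(2014, 9, 10, 12, 3), "YXZ 1001"),
]


def temporal_join(entries, exits, max_gap=timedelta(minutes=15)):
    """Pair each entry with exits of the same vehicle within max_gap."""
    results = []
    for toll_id, entry_time, plate in entries:
        for exit_toll_id, exit_time, exit_plate in exits:
            same_vehicle = toll_id == exit_toll_id and plate == exit_plate
            # The exit must follow the entry by at most max_gap.
            in_window = timedelta(0) <= exit_time - entry_time <= max_gap
            if same_vehicle and in_window:
                minutes = (exit_time - entry_time).total_seconds() / 60
                results.append((toll_id, plate, minutes))
    return results


durations = temporal_join(entries, exits)
```

The time bound is what keeps such a join workable on an unbounded stream: without it, every entry would have to be remembered forever in case a matching exit eventually arrived.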

Projections

Input events:
1, 1450, “VW”, “Golf”, (…)
2, 1230, “Toyota”, “Camry”, (…)
1, 2400, “VW”, “Passat”, (…)
1, 980, “Ford”, “Fiesta”, (…)

SELECT TollId, VehicleWeight / 1000 AS Tons FROM EntryStream

Output:
1, 1.45
2, 1.23
1, 2.40
1, 0.980

Show me the Toll Id and Vehicle Weight in Tons for all vehicles passing through the Toll Booth
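For readers more at home in a general-purpose language, the projection above amounts to mapping each event to a smaller record. The following plain-Python sketch uses the four sample events from the slide; the dict layout and the function name `project_tons` are assumptions for illustration.

```python
# The four sample events from the slide, with the elided fields left out.
entry_stream = [
    {"TollId": 1, "VehicleWeight": 1450, "Make": "VW", "Model": "Golf"},
    {"TollId": 2, "VehicleWeight": 1230, "Make": "Toyota", "Model": "Camry"},
    {"TollId": 1, "VehicleWeight": 2400, "Make": "VW", "Model": "Passat"},
    {"TollId": 1, "VehicleWeight": 980, "Make": "Ford", "Model": "Fiesta"},
]


def project_tons(events):
    """Keep only the toll id and the weight converted to tons."""
    return [(e["TollId"], e["VehicleWeight"] / 1000) for e in events]


tons = project_tons(entry_stream)
```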

Filters

SELECT Model FROM EntryStream WHERE Make = 'VW'

Input events:
1, 1450, “VW”, “Golf”, (…)
2, 1230, “Toyota”, “Camry”, (…)
1, 2400, “VW”, “Passat”, (…)
1, 980, “Ford”, “Fiesta”, (…)

Output:
“Golf”
“Passat”

Show me the Model of vehicles manufactured by Volkswagen
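The filter above keeps only the events that satisfy a predicate and then projects a single column. A plain-Python equivalent, using the same four sample events, might look like this; the dict layout and the function name `models_by_make` are assumptions for illustration.

```python
# The four sample events from the slide, reduced to the fields used here.
entry_stream = [
    {"TollId": 1, "Make": "VW", "Model": "Golf"},
    {"TollId": 2, "Make": "Toyota", "Model": "Camry"},
    {"TollId": 1, "Make": "VW", "Model": "Passat"},
    {"TollId": 1, "Make": "Ford", "Model": "Fiesta"},
]


def models_by_make(events, make):
    """Return the Model of every event whose Make matches."""
    return [e["Model"] for e in events if e["Make"] == make]


vw_models = models_by_make(entry_stream, "VW")
```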

Tumbling Windows

SELECT TollId, COUNT(*) FROM EntryStream GROUP BY TollId, TumblingWindow(minute, 5)

How many vehicles entered each toll booth every 5 minutes?
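A tumbling window cuts the timeline into fixed-size, non-overlapping intervals, so every event falls into exactly one window. The sketch below emulates a per-booth count over 5-minute tumbling windows in plain Python. The function name `tumbling_counts`, the sample timestamps and the choice to treat each window as the interval from its start (inclusive) to its end (exclusive) are assumptions for illustration; the service's own boundary convention may differ.

```python
from collections import Counter
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def tumbling_counts(events, size=timedelta(minutes=5)):
    """Count (toll_id, timestamp) events per toll booth and window.

    Results are keyed by (toll_id, window_end).
    """
    counts = Counter()
    for toll_id, timestamp in events:
        # Number of whole windows that fit before this timestamp.
        index = (timestamp - EPOCH) // size
        window_end = EPOCH + (index + 1) * size
        counts[(toll_id, window_end)] += 1
    return dict(counts)


# Hypothetical entry events: three for booth 1, one for booth 2.
events = [
    (1, datetime(2014, 9, 10, 12, 1)),
    (2, datetime(2014, 9, 10, 12, 2)),
    (1, datetime(2014, 9, 10, 12, 3)),
    (1, datetime(2014, 9, 10, 12, 7)),
]
counts = tumbling_counts(events)
```

Here booth 1 has two vehicles in the window ending at 12:05 and one in the window ending at 12:10, while booth 2 has a single vehicle in the first window.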

Aggregate functions

Scalar functions

Date and time:

String:

Types

Type | Description
bigint | Integers in the range -2^63 (-9,223,372,036,854,775,808) to 2^63-1 (9,223,372,036,854,775,807).
float | Floating point numbers in the range -1.79E+308 to -2.23E-308, 0, and 2.23E-308 to 1.79E+308.
nvarchar(max) | Text values, comprised of Unicode characters. Note: A value other than max is not supported.
datetime | Defines a date that is combined with a time of day with fractional seconds that is based on a 24-hour clock and relative to UTC (time zone offset 0).