Marco Parenzan
Implementing a canonical IoT backend in Azure with Azure Stream Analytics
IoT Experiments @Intel - Assago (Milano)
DotNetLombardia
Wednesday, May 27, 2015 from 9:00 AM to 6:00 PM (CEST)
Milano Fiori, Italy
Speaker info / Marco Parenzan
www.slideshare.net/marco.parenzan
www.github.com/marcoparenzan
marco [dot] parenzan [at] 1nn0va [dot] it
www.1nnova.it
@marco_parenzan
Training, outreach and consulting with 1nn0va
Microsoft MVP 2014 for Microsoft Azure
Cloud Architect, .NET developer
Loves Functional Programming, HTML5 Game Programming and Internet of Things
AZURE COMMUNITY
BOOTCAMP 2015
IoT Day - 08/05/2015
@1nn0va #microservicesconf2015, May 9, 2015
IoT as a hobby (now…?)
Canonical Stream Analytics Pattern
[Figure: pipeline stages with the components of each stage]
• Event production: Applications; Legacy IoT (custom protocols); Devices: IP-capable devices (Windows/Linux), Low-power devices (RTOS)
• Collection: Cloud gateways (web APIs); Field gateways
• Ingestion: Event Hubs
• Stream Analysis
• Storage and Batch Analysis
• Presentation and action: Search and query; Data analytics (Power BI); Web/thick client dashboards
Azure services shown on the diagram: Event Hubs, Stream Analytics, SQL DB, Storage Tables, Storage Blobs, Machine Learning, Power BI, Devices to take action, more to come…
• Analytics on Data in motion
• Focus on building solutions
• … not on solution infrastructure
• … and get there faster
Scenario
• Archiving
• Dashboarding
• Triggering workflows
What is Streaming Data?
Data in Motion
Data at Rest
Introducing Azure Stream Analytics
Mission critical reliability and scale
Enables rapid development
Fully managed real-time analytics
Real-time analytics
Real-time Analytics
• Intake millions of events per second (up to 1 GB/s)
• Low processing latency, auto adaptive (sub-second to seconds)
• Correlate between different streams, or with reference data
• Find patterns or lack of patterns in data in real-time
Fully Managed Cloud Service
• No hardware acquisition and maintenance
• No platform/infrastructure deployment and maintenance
• Easily expand your business globally leveraging Azure regions
Mission critical
Mission Critical Reliability
• Guaranteed event delivery
• Guaranteed business continuity: Automatic and fast recovery
Effective Audits
• Privacy and security properties of solutions are evident
• Azure integration for monitoring and ops alerting
Easy To Scale
• Scale from small to large on demand
Rapid development
Rapid Development with SQL like language
• High-level: focus on stream analytics solution
• Concise: less code to maintain
• Fast test: Rapid development and debugging
• First-class support for event streams and reference data
Built in temporal semantics
• Built-in temporal windowing and joining
• Simple policy configuration to manage out-of-order events and late arrivals
SAQL – Language & Library
• DML: SELECT, FROM, WHERE, GROUP BY, HAVING, CASE WHEN THEN ELSE, INNER/LEFT OUTER JOIN, UNION, CROSS/OUTER APPLY, CAST, INTO, ORDER BY (ASC, DESC)
• Scaling Extensions: WITH, PARTITION BY, OVER
• Date and Time Functions: DateName, DatePart, Day, Month, Year, DateTimeFromParts, DateDiff, DateAdd
• Windowing Extensions: TumblingWindow, HoppingWindow, SlidingWindow
• Aggregate Functions: Sum, Count, Avg, Min, Max, StDev, StDevP, Var, VarP
• String Functions: Len, Concat, CharIndex, Substring, PatIndex
• Temporal Functions: Lag, IsFirst, CollectTop
Pipeline
SELECT UserName, TimeZone
INTO OutputTable
FROM InputStream
Put the data in a static data container
Filters
SELECT UserName, TimeZone
FROM InputStream
WHERE Topic = 'XBox'
Show me the user name and time zone of tweets on the topic XBox
Input events (over time):
"Zach Dotseth", "London", "Football", (…)
"Haroon", "Eastern Time (US & Canada)", "XBox", (…)
"XO", "London", "XBox", (…)
Output events:
"Haroon", "Eastern Time (US & Canada)"
"XO", "London"
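The filter above can be mimicked outside Stream Analytics to see what it does to each event. The following is only a minimal Python sketch, not how the service executes the query: the dictionary field names mirror the columns in the query, and `filter_topic` is a hypothetical helper introduced for illustration.

```python
from typing import Iterable, Iterator, Tuple

def filter_topic(events: Iterable[dict], topic: str) -> Iterator[Tuple[str, str]]:
    """Mimic SELECT UserName, TimeZone FROM InputStream WHERE Topic = <topic>."""
    for event in events:
        if event["Topic"] == topic:                      # WHERE clause
            yield event["UserName"], event["TimeZone"]   # SELECT projection

# Sample events taken from the slide.
input_stream = [
    {"UserName": "Zach Dotseth", "TimeZone": "London", "Topic": "Football"},
    {"UserName": "Haroon", "TimeZone": "Eastern Time (US & Canada)", "Topic": "XBox"},
    {"UserName": "XO", "TimeZone": "London", "Topic": "XBox"},
]

result = list(filter_topic(input_stream, "XBox"))
print(result)
```

The "Football" event is dropped and only the two projected columns survive, which matches the output rows shown on the slide.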
Windowing Concepts
• Windows can be tumbling, hopping, or sliding
• Windows are fixed length
• Must be used in a GROUP BY clause
• Output event will have the timestamp of the end of the window
[Figure: events on a time axis (t1 to t6) grouped into Window 1, Window 2 and Window 3; an aggregate function (Sum) applied to each window produces the output events, e.g. 18 and 14]
Tumbling Windows
SELECT Topic, Count(*) AS TotalTweets
FROM TwitterStream TIMESTAMP BY CreatedAt
GROUP BY Topic, TumblingWindow(second, 10)
“Give me the count of tweets every 10 seconds”
[Figure: A 10-second Tumbling Window; time axis in seconds (0, 10, 20, 30) with the events that fall into each window]
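To make the tumbling-window query more concrete, here is a small Python simulation. It is a sketch under stated assumptions, not the service's implementation: timestamps are integer seconds, `tumbling_counts` is a hypothetical helper, and a window is assumed to cover the interval (end - size, end], so an event exactly on a boundary belongs to the window that ends there.

```python
from collections import Counter

def tumbling_counts(events, size):
    """Count events per (window_end, topic) for back-to-back windows of `size` seconds."""
    counts = Counter()
    for ts, topic in events:
        # Ceiling division: the first multiple of `size` at or after ts.
        # Assumed window bounds: (window_end - size, window_end].
        window_end = -(-ts // size) * size
        counts[(window_end, topic)] += 1
    return dict(counts)

# Hypothetical tweets as (timestamp in seconds, topic).
tweets = [(1, "XBox"), (3, "XBox"), (7, "Football"),
          (12, "XBox"), (19, "XBox"), (20, "XBox")]

for (end, topic), total in sorted(tumbling_counts(tweets, 10).items()):
    print(end, topic, total)
```

Each event is counted in exactly one window, and each result is keyed by the end of its window, in line with the "Windowing Concepts" slide.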
Hopping Windows
SELECT Topic, Count(*) AS TotalTweets
FROM TwitterStream TIMESTAMP BY CreatedAt
GROUP BY Topic, HoppingWindow(second, 10, 5)
“Every 5 seconds give me the count of tweets over the last 10 seconds”
[Figure: A 10-second Hopping Window with a 5-second "Hop"; time axis in seconds (0 to 30) with overlapping windows and the events that fall into each]
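The overlap between hopping windows is easier to see in a simulation. This Python sketch makes the same assumptions as before (integer-second timestamps, windows covering (end - size, end]); `hopping_counts` is a hypothetical helper, not part of any Azure SDK.

```python
def hopping_counts(events, size, hop):
    """Count events per (window_end, topic); windows last `size` s and end every `hop` s."""
    counts = {}
    for ts, topic in events:
        # First window end (a multiple of `hop`) at or after the event.
        end = -(-ts // hop) * hop
        # The event belongs to every window (end - size, end] that contains ts.
        while end - size < ts:
            counts[(end, topic)] = counts.get((end, topic), 0) + 1
            end += hop
    return counts

# Hypothetical tweets as (timestamp in seconds, topic).
tweets = [(3, "XBox"), (7, "XBox"), (12, "XBox")]

for (end, topic), total in sorted(hopping_counts(tweets, size=10, hop=5).items()):
    print(end, topic, total)
```

With a 10-second window and a 5-second hop, every event is counted in two windows, which is why the same sample "can repeat".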
Sliding Windows
SELECT Topic, Count(*) AS TotalTweets
FROM TwitterStream TIMESTAMP BY CreatedAt
GROUP BY Topic, SlidingWindow(second, 10)
“Give me the count of tweets in every distinct 10-second window”
[Figure: Every 10-second Sliding Window with changes; time axis in seconds (0, 10, 20, 30) with the events that fall into each window]
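A sliding window can be simulated in the same style. The sketch below rests on assumptions that should be checked against the product documentation: a result is produced only when the window content changes (an event enters at its timestamp or leaves `size` seconds later), empty windows produce nothing, and the grouping by Topic is left out to keep it short. `sliding_counts` is a hypothetical helper.

```python
def sliding_counts(timestamps, size):
    """Return {window_end: count} for every instant at which the window content changes."""
    # The content changes when an event enters (at ts) or leaves (at ts + size).
    change_points = sorted({t for ts in timestamps for t in (ts, ts + size)})
    result = {}
    for end in change_points:
        # Assumed window bounds: (end - size, end].
        count = sum(1 for ts in timestamps if end - size < ts <= end)
        if count:  # assumed: an empty window yields no output
            result[end] = count
    return result

# Hypothetical tweet timestamps in seconds.
print(sliding_counts([1, 5, 12], size=10))
```

Unlike tumbling and hopping windows, the output times here are driven by the events themselves rather than by a fixed schedule.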
Using Windowing
• Tumbling
  • Sample that cannot repeat
  • Sampling in a production line (an item exists in just one window)
• Hopping
  • Sample that can repeat
  • Sampling in a “fixed group” (an item exists in multiple windows)
• Sliding
  • Every sample counts
  • Sampling
Reference Data
Seamless correlation of event streams with reference data
Static or slowly-changing data stored in blobs
CSV and JSON files in Azure Blobs; scanned for new snapshots on a settable cadence
JOIN (INNER or LEFT OUTER) between streams and reference data sources
Reference data appears like another input:
SELECT myRefData.Name, myStream.Value
FROM myStream
JOIN myRefData
ON myStream.myKey = myRefData.myKey
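Conceptually, the join against reference data is a lookup of each incoming event in a small, mostly static table. The Python sketch below illustrates that idea only; the field names follow the query, the sample rows are invented, and `join_with_reference` is a hypothetical helper rather than anything the service exposes.

```python
def join_with_reference(stream, reference):
    """Mimic an INNER JOIN of stream events with reference rows on myKey."""
    by_key = {row["myKey"]: row for row in reference}   # index the slowly-changing data
    for event in stream:
        ref = by_key.get(event["myKey"])
        if ref is not None:                  # INNER JOIN: unmatched events are dropped
            yield ref["Name"], event["Value"]

# Invented sample data, for illustration only.
my_ref_data = [{"myKey": 1, "Name": "sensor-a"}, {"myKey": 2, "Name": "sensor-b"}]
my_stream = [{"myKey": 1, "Value": 20.5}, {"myKey": 3, "Value": 7.0}, {"myKey": 2, "Value": 18.0}]

joined = list(join_with_reference(my_stream, my_ref_data))
print(joined)
```

A LEFT OUTER JOIN would instead keep the unmatched event and emit an empty name for it.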
Multiple steps, multiple outputs
WITH Step1 AS (
SELECT Count(*) AS CountTweets, Topic
FROM TwitterStream PARTITION BY PartitionId
GROUP BY TumblingWindow(second, 3), Topic, PartitionId
),
Step2 AS (
SELECT Avg(CountTweets)
FROM Step1
GROUP BY TumblingWindow(minute, 3)
)
SELECT * INTO Output1 FROM Step1
SELECT * INTO Output2 FROM Step2
SELECT * INTO Output3 FROM Step2
• A query can have multiple steps to enable pipeline execution
• A step is a sub-query defined using WITH (“common table expression”)
• Can be used to develop complex queries more elegantly by creating an intermediate named result
• Creates unit of execution for scaling out when PARTITION BY is used
• Each step’s output can be sent to multiple output targets using INTO
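The two-step query can be read as a small pipeline in which the output of Step1 becomes the input of Step2. The Python sketch below imitates that flow under simplifying assumptions: timestamps are integer seconds, windows cover (end - size, end], and each Step1 row is stamped with the end of its window. The function names are hypothetical and chosen to match the step names in the query.

```python
from collections import defaultdict

def window_end(ts, size):
    """End of the tumbling window (end - size, end] that contains ts (seconds)."""
    return -(-ts // size) * size

def step1(events, size=3):
    """Count tweets per 3-second window, topic and partition."""
    counts = defaultdict(int)
    for ts, topic, partition in events:
        counts[(window_end(ts, size), topic, partition)] += 1
    # Each result row carries the window end as its timestamp.
    return [(end, topic, part, n) for (end, topic, part), n in sorted(counts.items())]

def step2(step1_rows, size=180):
    """Average the Step1 counts per 3-minute window."""
    groups = defaultdict(list)
    for end, _topic, _part, n in step1_rows:
        groups[window_end(end, size)].append(n)
    return {end: sum(ns) / len(ns) for end, ns in groups.items()}

# Invented events as (timestamp, topic, partition id).
events = [(1, "XBox", 0), (2, "XBox", 0), (4, "XBox", 1), (200, "Football", 0)]

output1 = step1(events)              # SELECT * INTO Output1 FROM Step1
output2 = output3 = step2(output1)   # the same step feeds Output2 and Output3
print(output1, output2)
```

Because a step is just a named intermediate result, it can be sent to several outputs without being computed by a separate query.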
Scaling using Partitions
Partitioning allows for parallel execution over scaled-out resources
SELECT Count(*) AS Count, Topic
FROM TwitterStream PARTITION BY PartitionId
GROUP BY TumblingWindow(minute, 3), Topic, PartitionId
[Figure: an Event Hub with PartitionId = 1, 2 and 3 feeds Stream Analytics; each partition is processed separately and produces Query Result 1, Query Result 2 and Query Result 3]
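The idea behind PARTITION BY is that each partition can be processed on its own, so the work can run in parallel. The Python sketch below illustrates this with a thread pool; it is an analogy only, it omits the time window for brevity, and `count_by_topic` and `run_partitioned` are hypothetical helpers.

```python
from collections import Counter, defaultdict
from concurrent.futures import ThreadPoolExecutor

def count_by_topic(events):
    """The per-partition query: count events per topic."""
    return dict(Counter(topic for _pid, topic in events))

def run_partitioned(events):
    """Split events by PartitionId and run the query on each partition in parallel."""
    partitions = defaultdict(list)
    for pid, topic in events:
        partitions[pid].append((pid, topic))
    with ThreadPoolExecutor() as pool:
        # One independent query instance per partition.
        results = pool.map(count_by_topic, partitions.values())
        return dict(zip(partitions.keys(), results))

# Invented events as (partition id, topic).
events = [(1, "XBox"), (2, "XBox"), (1, "XBox"), (3, "Football"), (2, "Football")]
print(run_partitioned(events))
```

Keeping PartitionId in the grouping key means no instance needs data from another partition, which is what allows the scale-out.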
Demo
Pricing
• Volume of data processed by the streaming job
  • €0.0008/GB
• Streaming Unit (blended measure of CPU, memory, throughput)
  • €0.0231/hr
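As a rough worked example of the two prices, the Python sketch below estimates the cost of one job. It assumes that the two meters simply add up and that the Streaming Unit price applies per unit per hour; the workload figures are invented and `job_cost` is a hypothetical helper.

```python
PRICE_PER_GB = 0.0008        # EUR per GB processed (from the slide)
PRICE_PER_SU_HOUR = 0.0231   # EUR per Streaming Unit per hour (from the slide)

def job_cost(gb_processed, streaming_units, hours):
    """Estimated cost in EUR, assuming both meters are billed additively."""
    data_cost = gb_processed * PRICE_PER_GB
    compute_cost = streaming_units * hours * PRICE_PER_SU_HOUR
    return data_cost + compute_cost

# Invented workload: 1 Streaming Unit running for 30 days, 100 GB processed.
monthly = job_cost(gb_processed=100, streaming_units=1, hours=24 * 30)
print(f"{monthly:.2f} EUR")
```

In this example almost all of the cost comes from the Streaming Unit hours rather than from the data volume.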
Azure Machine Learning
Understand the “sequence” of data in the history to predict the future
http://www.slideshare.net/davidemauri/azureml-creating-and-using-machine-learning-solutions-italian
Marco Parenzan
Thank you