1
William Markito@william_markito
Fred Melo@fredmelo_br
An Open-Source Streaming Machine Learning and Real-Time Analytics Architecture
Using an IoT example
2
Traditional Data Analytics - Limitations
HDFS
Data Lake
Store Analytics
• Hard to change
• Labor intensive
• Inefficient
• No real-time information
• ETL based
• Data-source specific
3
Stream-based, Real-Time Closed-Loop Analytics
HDFS
Data Lake
Expert System / Machine Learning
In-Memory Real-Time Data
• Continuous Learning
• Continuous Improvement
• Continuous Adapting
Data Stream Pipeline
• Multiple Data Sources
• Real-Time Processing
• Store Everything
4
A Streaming Machine Learning Example for IoT
Predictive Maintenance Scenario
Sensor Data
Smart System
Historical
Learns with HISTORICAL TRENDS
"How were the temperature and vibration sensors reading when the latest failures happened?"
Live data becomes historical over time
Real-Time
Evaluates LIVE DATA
"According to historical trends, there’s an 80% chance this equipment will fail in the next 12 hours"
5
Streaming Machine Learning
Info
Analysis
• Look at past trends (for similar input)
• Evaluate current input
• Score / Predict
Machine Learning
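The three analysis steps on this slide can be sketched in a few lines. This is only an illustration, not the method used in the talk: it swaps in a simple nearest-neighbour vote, and the `HISTORY` readings, the feature choice (temperature, vibration) and the function name `failure_score` are all hypothetical.

```python
import math

# Hypothetical historical readings (assumption, not data from the talk):
# (temperature in °C, vibration in mm/s, did the equipment fail within 12 h?)
HISTORY = [
    (60.0, 2.0, False),
    (62.0, 2.5, False),
    (65.0, 3.0, False),
    (80.0, 7.0, True),
    (85.0, 8.0, True),
    (90.0, 9.0, True),
]


def failure_score(temperature, vibration, history=HISTORY, k=3):
    """Fraction of the k most similar past readings that preceded a failure."""
    # Look at past trends (for similar input): rank history by distance
    # to the current reading. Features are left unscaled to keep this short.
    nearest = sorted(
        history,
        key=lambda row: math.hypot(row[0] - temperature, row[1] - vibration),
    )[:k]
    # Score / predict: share of those similar readings that ended in failure.
    return sum(1 for row in nearest if row[2]) / k


# Evaluate current input: one hot, shaky reading and one calm reading.
print(failure_score(83.0, 7.5))
print(failure_score(61.0, 2.2))
```

In a streaming setup, each live reading would be appended to the history once its outcome is known, which is how live data becomes historical over time.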
12
Streaming Machine Learning
Supervised Learning Example
Neural Network
Train
In-Memory Data Grid
Real-time scoring
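As a rough illustration of the train-then-score loop, the sketch below trains a single sigmoid neuron one labeled example at a time and then scores new inputs. It is a deliberately small stand-in for the neural network on the slide; the class name `OnlineNeuron`, the learning rate and the labeled samples are assumptions made for this example.

```python
import math


class OnlineNeuron:
    """A single sigmoid neuron updated one labeled example at a time."""

    def __init__(self, n_inputs, learning_rate=0.5):
        self.weights = [0.0] * n_inputs
        self.bias = 0.0
        self.learning_rate = learning_rate

    def score(self, features):
        # Real-time scoring: weighted sum squashed into the range (0, 1).
        z = self.bias + sum(w * x for w, x in zip(self.weights, features))
        return 1.0 / (1.0 + math.exp(-z))

    def train(self, features, label):
        # One gradient step on the log-loss for a single labeled example.
        error = self.score(features) - label
        self.weights = [
            w - self.learning_rate * error * x
            for w, x in zip(self.weights, features)
        ]
        self.bias -= self.learning_rate * error


# Hypothetical labeled history: (scaled temperature, scaled vibration) -> failed?
labeled = [((0.1, 0.2), 0), ((0.2, 0.1), 0), ((0.8, 0.9), 1), ((0.9, 0.8), 1)]

model = OnlineNeuron(n_inputs=2)
for _ in range(200):
    for features, label in labeled:
        model.train(features, label)

print(model.score((0.85, 0.85)))  # high inputs, expected to score above 0.5
print(model.score((0.15, 0.15)))  # low inputs, expected to score below 0.5
```

Because `train` takes one example at a time, the same object can keep learning as new labeled events arrive while `score` serves live requests.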
13
A Streaming Machine Learning Reference Architecture
Spring XD
Ingest
Transform
Sink
Store / Analyze
Fast Data
Distributed Computing
Predict / Machine Learning
Other Sources and Destinations
JMS
Trilateration and its limitations
15
Noisy Data
Physical Barriers
Large Overlap Areas
Moving Targets
Inaccuracy
The Solution
18
1. Capture signal strength
2. Calculate distance from antenna
3. Trilaterate different sensors to predict location in real-time
4. Show on a map with live updates
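Step 3 can be illustrated with a minimal 2D trilateration sketch. It assumes exactly three antennas at known positions and noise-free distances, which the talk itself flags as unrealistic, and it uses the textbook approach of subtracting the first circle equation from the other two to get a linear system. The function name and the coordinates are hypothetical.

```python
import math


def trilaterate(anchors, distances):
    """Estimate (x, y) from three antenna positions and three distances."""
    (x1, y1), (x2, y2), (x3, y3) = anchors
    d1, d2, d3 = distances

    # Subtracting circle 1 from circles 2 and 3 removes the squared unknowns
    # and leaves two linear equations: a11*x + a12*y = b1, a21*x + a22*y = b2.
    a11, a12 = 2 * (x2 - x1), 2 * (y2 - y1)
    a21, a22 = 2 * (x3 - x1), 2 * (y3 - y1)
    b1 = d1**2 - d2**2 + x2**2 - x1**2 + y2**2 - y1**2
    b2 = d1**2 - d3**2 + x3**2 - x1**2 + y3**2 - y1**2

    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("antennas are collinear; position is ambiguous")

    # Solve the 2x2 system with Cramer's rule.
    x = (b1 * a22 - a12 * b2) / det
    y = (a11 * b2 - a21 * b1) / det
    return x, y


# Hypothetical antennas in metres and a device actually located at (3, 4).
antennas = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
measured = [math.dist((3.0, 4.0), antenna) for antenna in antennas]
print(trilaterate(antennas, measured))
```

With noisy distances the three circles rarely meet at one point, so this closed-form answer drifts; that is the gap a learned model is meant to close.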
Architecture Overview
19
Spring XD
Ingest
Transform
Sink
JSON HTTP
Groovy
+ Distance
Calculate Device Distance
Predict Location
Spring Boot
Application Platform
GUI
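The transform step in this diagram enriches each JSON message with a distance. The sketch below shows one way that could look, written in Python rather than the Groovy used in the talk. The formula is the common log-distance path-loss model, which is an assumption here since the slides do not say how distance is derived; the field names `signal_dbm` and `distance_m` and both default parameters are hypothetical.

```python
import json


def add_distance(payload, rssi_at_1m=-40.0, path_loss_exponent=2.0):
    """Parse a JSON reading and return it with an estimated distance added.

    Assumes the log-distance path-loss model:
        distance = 10 ** ((rssi_at_1m - rssi) / (10 * path_loss_exponent))
    where rssi_at_1m is the signal strength measured one metre from the
    antenna and path_loss_exponent describes the environment (2.0 = free space).
    """
    reading = json.loads(payload)
    rssi = reading["signal_dbm"]  # hypothetical field name
    reading["distance_m"] = 10 ** ((rssi_at_1m - rssi) / (10 * path_loss_exponent))
    return json.dumps(reading)


# A hypothetical message as it might arrive over HTTP.
print(add_distance('{"device": "a", "signal_dbm": -60}'))
```

Both parameters would need calibrating per site, since walls and other physical barriers change how quickly the signal fades.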
20
Geode Basic Concepts
• Cache: Configurable through XML, Java
• Region: Distributed java.util.Map on steroids; highly available, redundant
• Member: Locator, Server, Client
• Callbacks: Listener, Writer, AsyncEventListener, Parallel/Serial
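To make the region and callback ideas concrete, here is a toy, single-process sketch of a map that notifies listeners on every write. It is not Geode's API and has none of its distribution, redundancy or persistence; `ToyRegion` and its methods are hypothetical names chosen for this illustration only.

```python
class ToyRegion:
    """A dictionary-like store that calls listeners after each put."""

    def __init__(self, name):
        self.name = name
        self._entries = {}
        self._listeners = []

    def add_listener(self, callback):
        # callback(key, old_value, new_value) runs after every put.
        self._listeners.append(callback)

    def put(self, key, value):
        old_value = self._entries.get(key)
        self._entries[key] = value
        for callback in self._listeners:
            callback(key, old_value, value)

    def get(self, key):
        return self._entries.get(key)


events = []
region = ToyRegion("device-locations")
region.add_listener(lambda key, old, new: events.append((key, old, new)))

region.put("device-a", (3.0, 4.0))
region.put("device-a", (3.5, 4.2))
print(region.get("device-a"))
print(events)
```

The listener is the hook that lets downstream code, such as a live map, react to each new location as it is stored instead of polling for changes.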
Spring XD
22
A stream is composed from modules. Each module is deployed to a container and its channels are bound to the transport.
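The idea of a stream built from modules can be sketched with plain Python generators: a source, some processors and a sink chained together. This only mimics the shape of the concept in one process; it does not use Spring XD, containers or a message transport, and every function and field name below is hypothetical.

```python
import json


def http_source(raw_messages):
    """Source module: emits each raw JSON message in arrival order."""
    for raw in raw_messages:
        yield raw


def parse_json(stream):
    """Processor module: turns JSON text into dictionaries."""
    for raw in stream:
        yield json.loads(raw)


def keep_strong_signals(stream, threshold_dbm=-80):
    """Processor module: drops readings weaker than the threshold."""
    for reading in stream:
        if reading["signal_dbm"] >= threshold_dbm:
            yield reading


def list_sink(stream, store):
    """Sink module: appends every item that reaches the end of the stream."""
    for item in stream:
        store.append(item)


def run_stream(source, processors, sink):
    """Chain the modules so each one consumes the previous one's output."""
    stream = source
    for processor in processors:
        stream = processor(stream)
    sink(stream)


store = []
messages = [
    '{"device": "a", "signal_dbm": -60}',
    '{"device": "b", "signal_dbm": -95}',
]
run_stream(
    http_source(messages),
    [parse_json, keep_strong_signals],
    lambda stream: list_sink(stream, store),
)
print(store)
```

Each module knows only its input and output, so modules can be swapped or reordered without touching the others.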
24
Why we selected these projects
• Iterative & Exploratory model
• Web based REPL
• Multiple Interpreters: Apache Geode, Apache Spark, Markdown, Flink, Python…
• In-memory & Persistent
• Highly Consistent
• Extreme transaction processing
• Thousands of concurrent clients
• Reliable event model
• Productivity
• Built-in connectors
• Cloud Agnostic
• Highly Scalable
• Easy to set up
• Streams without coding
25
Source code and detailed instructions available at: https://github.com/Pivotal-Open-Source-Hub/WifiAnalyticsIoT
25
William Markito@william_markito
Fred Melo@fredmelo_br
Follow us on GitHub!