virdata: lessons learned from the internet of things and m2m cloud services @ ibm big data...
DESCRIPTION
Presentation I gave at the IBM Big Data Developers meetup group in San Jose, CA. A video of this talk is also available at: https://www.youtube.com/watch?v=TSt49yPBmW0&t=7m59s
TRANSCRIPT
Big Data Developers - Virdata, Internet of Things #virdata
Big Data & IoT: lessons learned
Big Data Developers Meetup, San Jose, CA - June 5, 2014 #virdata | @nathan_gs
Who is Technicolor?
Domains
● Media Services
● Entertainment Services
● Connected Home
● Emerging Ventures
● Technology & Innovations
Who We Are
Technicolor, a worldwide technology leader in the media and entertainment sector, is at the forefront of digital innovation. Our world-class research and innovation laboratories and our creative talent pool enable us to lead the market in delivering advanced services to content creators and distributors. We also benefit from an extensive intellectual property portfolio focused on imaging and sound technologies, supporting our thriving licensing business.
Virdata – OUR CORE CLOUD SERVICES
Device Monitoring
Device Management
Big Data Analytics
Big Data Queries
Application Monitoring
Virdata Cloud APIs
MQTT
Virdata - 2 COMPONENTS: A CLOUD & A LIBRARY
Cloud
★ Elastic and scalable cutting-edge technologies
★ APIs for different types of information/data consumption
★ Cloud agnostic through self-built monitoring tools
★ Running on both public & private cloud infrastructure
★ Bi-directional messaging
★ High-performance broker architecture
Library
★ Lightweight and portable library
★ Multiple programming languages
★ Supports multiple transport protocols
★ Available for all HW and OS
★ Supports any type of data in any format/syntax
★ Payload is compressed and encrypted
Virdata - SERVICE ARCHITECTURE
Millions of simultaneous persistent bi-directional connections
Millions of messages per second
Real-time Complex Event Processing
Distributed Pub/Sub Messaging
Historical Data Archiving
Pre-computed Data
In-Memory real-time Data
REST API
Launch Queries - Launch Jobs
INTEGRATION
CUSTOMIZATION
NOC, OPERATIONS, MGMT REPORTS, TRENDS, ANALYTICS
Virdata - VERTICAL INDUSTRIES
AUTOMOTIVE
● Fleet Management
● Insurance
● Emergency Services
UTILITIES
● Remote Meter Management
● Monitor Energy Consumption
● Optimize Subscription Plan
CONSUMER ELECTRONICS
● Monitoring & Management
● Upsell Services
● Enhanced End User Experience
CUSTOMER CARE
● Monitor Device & Application
● One Button Care
● Call Avoidance
RETAIL
● Geo-location Based Adverts
● Heat Mapping
● Individualized Offering
HEALTH
● Promote Patient Independence
● Time-Series Analysis
● Pro-active Responses
Live Demo
Contact us for a live demo at [email protected] or virdata.com.
Connected “Things”
Huge variety in devices and OSs.
Virdata Client Libraries
APIs
Northbound and Southbound API
Northbound API = Cloud API
● Messaging API
  ○ REST
  ○ PUB/SUB
  ○ MQTT
  ○ JMS
● Data Processing API
  ○ SQL
  ○ JobAPI
  ○ Query/REST
Southbound API provided at the device level
Integration of Virdata into IBM BlueMix
Objectives
• Show the strengths of the Virdata Internet of Things platform
  • Scalability to support millions of connected devices
  • Real-time and historical data processing
  • Cloud APIs powering new data-driven services across vertical markets
• Demonstrate the power of the IBM BlueMix solution
  • Rapid development and deployment of new applications
  • Platform as a Service marketplace
• Highlight the value of combining both
  • Internet of Things platform as a service
Use-case
• Virdata provides real-time car data
• App acts upon car trouble codes
• Invokes manufacturer analytics service
• Initiates recommended actions, e.g. through Maximo workflow service
• Schedules car dealer appointment
• Informs the car driver
Messaging & Broker
Messaging Architecture: Device to Platform
(Diagram: Protocol Adapters, Kafka, Storm, State, Data Processing, API)
Messaging Architecture: Device to Device(s)
(Diagram: Protocol Adapters, Kafka, Storm, State, Data Processing, API)
Messaging Architecture: Large Fan Out
(Diagram: Protocol Adapters, Kafka, Storm, State, Data Processing, API)
Horizontally scalable… and elastic as well.
Messaging
Persistent connections
Broker
Real-time bidirectional communication
MQTT Pub/Sub
Protocol Adapter
MQTT: QoS levels
QoS 0: at most once (best effort)
QoS 1: at least once
QoS 2: exactly once
Protocol Adapter
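The three QoS levels differ in what happens when a message or its acknowledgement gets lost. The sketch below is a toy simulation, not an MQTT client: `Receiver`, `publish` and the scripted `message_lost` / `ack_lost` lists are hypothetical names made up for illustration, and the real QoS 2 exchange is a four-step handshake (PUBLISH, PUBREC, PUBREL, PUBCOMP) that is reduced here to simple de-duplication by packet id.

```python
from dataclasses import dataclass, field


@dataclass
class Receiver:
    """Hypothetical receiving endpoint that records what it hands to the app."""
    delivered: list = field(default_factory=list)
    seen_ids: set = field(default_factory=set)

    def on_message(self, packet_id, payload, qos):
        # QoS 2 (simplified): remember packet ids so that a retransmission
        # of the same packet is not delivered a second time.
        if qos == 2:
            if packet_id in self.seen_ids:
                return
            self.seen_ids.add(packet_id)
        self.delivered.append(payload)


def publish(receiver, packet_id, payload, qos, message_lost, ack_lost):
    """Simulate one publish; returns the number of send attempts.

    message_lost[i] / ack_lost[i] script whether the i-th attempt's message
    or acknowledgement is dropped by the simulated network.
    """
    attempts = 0
    for lost, ack_dropped in zip(message_lost, ack_lost):
        attempts += 1
        if not lost:
            receiver.on_message(packet_id, payload, qos)
        if qos == 0:
            break  # QoS 0: fire and forget, never retransmit
        if not lost and not ack_dropped:
            break  # acknowledged: stop retransmitting
    return attempts


def demo(qos):
    """First attempt arrives but its ack is lost; the second attempt succeeds."""
    receiver = Receiver()
    publish(receiver, 1, "temp=21", qos,
            message_lost=[False, False], ack_lost=[True, False])
    return receiver.delivered
```

With a lost acknowledgement, `demo(1)` shows the duplicate that "at least once" allows, while `demo(2)` delivers the payload a single time.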
Kafka
Queues
Storm
Messaging
Message passing
Storm
Stream/Message partitioning, as well as grouping.
Storm
Storm
(Diagram: Nimbus and ZooKeeper; three Supervisors, each with a Worker Node running three Executors.)
Storm
Tuple
Stream
(Diagram: a TUPLE is a row of fields, Field 1 | Field 2 | Field 3 | Field 4 | Field 5; a STREAM is a sequence of TUPLEs.)
Storm
Spout
Bolt
(Diagram: a SPOUT emits tuples (T) to BOLTs; BOLTs pass tuples on to further BOLTs and to the API.)
Storm
Grouping
(Diagram: a spout (S) feeding several bolts (B), with a GROUPING between each stage.)
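A grouping decides which bolt task receives each tuple. The sketch below illustrates two common strategies in plain Python rather than through the Storm API: a fields grouping, which sends all tuples with the same field value to the same task, and a shuffle grouping, which spreads tuples evenly. The function names, the dictionary tuples and the `device_id` field are assumptions for illustration, and round-robin is used here as a deterministic stand-in for random shuffling.

```python
import zlib
from itertools import count


def fields_grouping(tuple_, field, num_tasks):
    """Pick a task index from the value of one field (hash partitioning)."""
    key = str(tuple_[field]).encode("utf-8")
    # crc32 is stable across runs, unlike Python's built-in hash() for strings.
    return zlib.crc32(key) % num_tasks


def make_shuffle_grouping(num_tasks):
    """Return a chooser that spreads tuples evenly over the tasks."""
    counter = count()

    def choose(_tuple):
        return next(counter) % num_tasks

    return choose


if __name__ == "__main__":
    tuples = [{"device_id": "car-1", "speed": 50},
              {"device_id": "car-2", "speed": 80},
              {"device_id": "car-1", "speed": 55}]
    shuffle = make_shuffle_grouping(3)
    for t in tuples:
        print(t["device_id"],
              "fields ->", fields_grouping(t, "device_id", 3),
              "shuffle ->", shuffle(t))
```

A fields grouping is what makes per-device state possible inside a bolt, since every tuple for a given device lands on the same task.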
Data Processing
Events used to manipulate the master data.
Events: Before
Today, events are the master data.
Events: After
Let’s store everything.
Data System
Data is Immutable.
Data System
Data is Time Based.
Data System
The data you query is often transformed, aggregated, ...
It is rarely used in its original form.
Query
Query = function ( all data )
Query
Functional computation, based on immutable inputs, is idempotent.
Batch Layer
Query: Number of cars living in each city
Car | Location | Timestamp
BMW 1 | Antwerp | 2008-10-11
Aston Martin | Cologne | 2010-01-23
BMW 2 | Antwerp | 2012-09-12
BMW 1 | Cologne | 2014-04-29

Location | Count
Antwerp | 1
Cologne | 2
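The example above can be read as "Query = function ( all data )": the counts are derived by replaying every immutable, timestamped event. A minimal sketch in plain Python, assuming the events are (car, location, timestamp) tuples with ISO dates; `EVENTS` and `cars_per_city` are illustrative names, not part of any Virdata API.

```python
from collections import Counter

# The immutable, time-based events from the slide.
EVENTS = [
    ("BMW 1", "Antwerp", "2008-10-11"),
    ("Aston Martin", "Cologne", "2010-01-23"),
    ("BMW 2", "Antwerp", "2012-09-12"),
    ("BMW 1", "Cologne", "2014-04-29"),
]


def cars_per_city(events):
    """Count cars per city, using only each car's most recent location."""
    latest_location = {}
    # ISO-8601 dates sort correctly as plain strings.
    for car, location, _timestamp in sorted(events, key=lambda e: e[2]):
        latest_location[car] = location  # later events overwrite earlier ones
    return dict(Counter(latest_location.values()))


if __name__ == "__main__":
    print(cars_per_city(EVENTS))
```

Because the function only reads its input, running it again on the same events gives the same view, which is the property the batch layer relies on.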
Query
(Diagram: All Data, Precomputed View, Query)
Layered Architecture
Batch Layer
Speed Layer
Serving Layer
Layered Architecture
(Diagram: Incoming Data, Spark, C*, Query)
Batch Layer
(Diagram: Incoming Data, Spark, C*)
Batch Layer
The batch layer can calculate anything, given enough time...
Unrestrained computation.
Keep the data in its original format.
The batch layer stores the data normalized; the generated views are often, if not always, denormalized.
Batch Layer
Horizontally scalable.
Batch Layer
Stores a master copy of the data set… append only
Batch Layer
High Latency.
Let’s pretend for now that the update latency doesn’t matter.
Batch Layer
In-memory storage
Spark
Advanced DAG execution engine
Cyclic data flow, in-memory computing.
Spark
Multilanguage support, interactive shells
Scala, Java & Python
Spark
Write programs in terms of transformations on distributed datasets.
RDDs are collections of objects, stored in RAM or on disk. They are built through parallel transformations and are automatically rebuilt on failure.
Spark
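The idea of a dataset that is built through transformations and rebuilt on failure can be sketched without Spark. The `MiniRDD` class below is a hypothetical, single-machine toy, not the Spark API: it only records its lineage (the source plus the list of transformations) and recomputes the result from that lineage whenever the cached copy is missing.

```python
from functools import reduce


class MiniRDD:
    """Toy stand-in for an RDD: a source plus a recorded chain of transformations."""

    def __init__(self, source, lineage=()):
        self._source = source    # must be re-iterable (list, range, ...)
        self._lineage = lineage  # tuple of ("map" | "filter", function)
        self._cache = None       # materialized result, may be lost

    def map(self, fn):
        return MiniRDD(self._source, self._lineage + (("map", fn),))

    def filter(self, predicate):
        return MiniRDD(self._source, self._lineage + (("filter", predicate),))

    def collect(self):
        if self._cache is None:  # nothing cached: rebuild from lineage
            data = list(self._source)
            for kind, fn in self._lineage:
                if kind == "map":
                    data = [fn(x) for x in data]
                else:
                    data = [x for x in data if fn(x)]
            self._cache = data
        return list(self._cache)

    def reduce(self, fn):
        return reduce(fn, self.collect())

    def lose_cache(self):
        """Simulate a failure that wipes the materialized data."""
        self._cache = None


if __name__ == "__main__":
    even_squares = MiniRDD(range(1, 6)).map(lambda x: x * x).filter(lambda x: x % 2 == 0)
    print(even_squares.collect())
    even_squares.lose_cache()
    print(even_squares.reduce(lambda a, b: a + b))
```

Nothing is computed until `collect` or `reduce` is called, and losing the cache costs only a recomputation, never the data itself.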
Spark: API
map
reduce
Spark: API
map, filter, groupBy, sort, union, join, leftOuterJoin, rightOuterJoin, count, fold, reduceByKey, groupByKey,
reduce, cogroup, cross, zip, sample, take, first, partitionBy, mapWith, pipe, save, ...
Spark Ecosystem
(Diagram: Spark, Spark Streaming, Shark / Spark SQL, GraphX, MLlib, Mahout, BlinkDB, Velox, MRv1, HDFS, Tachyon, Mesos, YARN)
Every iteration produces the views from scratch.
Batch Layer
Batch View Databases
We need a (read-only) database to store those views.
Example: the automotive market
Real Time Tracking, Engine Block Performance, Fleet Management
3rd Party API integration, Integration with Informix, Big Data Visualization
3rd Party Application Creation, BlueMix Platform as a Service, Process Integrations
The Open Source Route | Enterprise Integration | Bringing Analytics to the Data
Batch Layer
We are not done yet…
(Diagram: a timeline up to Now. Most data is absorbed into Batch Views; just a few hours of data are not yet absorbed.)
Speed Layer
(Diagram: Incoming Data, Spark, C*)
Stream processing.
Speed Layer
Continuous computation.
Speed Layer
Storing a limited window of data.
Compensating for the last few hours of data.
Speed Layer
All the complexity is isolated in the Speed Layer.
If anything goes wrong, it’s auto-corrected.
Speed Layer
CAP
You have a choice between:
● Availability
  ○ Queries are eventually consistent
● Consistency
  ○ Queries are consistent
(Diagram: Consistency, Availability, Partition Tolerance)
Eventual accuracy
Some algorithms are hard to implement in real-time. For those cases we could estimate the results.
Speed Layer
Spark Streaming
Micro batches
Spark Streaming
Stateful
Spark Streaming
Exactly once
Incremental algorithms
Spark Streaming
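Micro batches, state and incremental algorithms fit together as follows: the stream is cut into small batches, and each batch only updates a running state instead of recomputing from scratch. The sketch below is plain Python, not the Spark Streaming API; the function names are illustrative, and batches are cut by element count as a simplification, whereas Spark Streaming cuts them by time interval.

```python
from collections import Counter


def micro_batches(events, batch_size):
    """Cut a list of events into consecutive small batches."""
    for start in range(0, len(events), batch_size):
        yield events[start:start + batch_size]


def update_state(state, batch):
    """Incremental step: fold one batch into the running counts."""
    new_state = Counter(state)
    new_state.update(batch)
    return new_state


def run(events, batch_size):
    """Process all batches and return the state after each one."""
    state = Counter()
    snapshots = []
    for batch in micro_batches(events, batch_size):
        state = update_state(state, batch)
        snapshots.append(dict(state))
    return snapshots


if __name__ == "__main__":
    locations = ["Antwerp", "Cologne", "Antwerp", "Cologne", "Cologne"]
    for snapshot in run(locations, batch_size=2):
        print(snapshot)
```

Each snapshot is a real-time view that is available long before a batch job over the full data set would finish.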
IBM InfoSphere Streams
Serving Layer
(Diagram: Incoming Data, Spark, C*, Query)
Serving Layer
Random reads.
This layer queries the batch & real-time views and merges them.
Serving Layer
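For views that hold additive values such as counts, merging can be as simple as summing per key. A minimal sketch, assuming both views are dictionaries and that the real-time view contains only data not yet absorbed by the batch view; `merge_views` and the sample values are hypothetical.

```python
def merge_views(batch_view, realtime_view):
    """Combine a precomputed batch view with the real-time increments."""
    merged = dict(batch_view)  # copy, so neither input view is modified
    for key, delta in realtime_view.items():
        merged[key] = merged.get(key, 0) + delta
    return merged


if __name__ == "__main__":
    batch_view = {"Antwerp": 1, "Cologne": 2}  # absorbed by the batch layer
    realtime_view = {"Cologne": 1, "Ghent": 1}  # last few hours only
    print(merge_views(batch_view, realtime_view))
```

Non-additive views (distinct counts, averages) need a merge function of their own; summing is only the simplest case.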
Lambda Architecture
Lambda Architecture
The Lambda Architecture can discard any view, batch and real-time, and just recreate everything from the master data.
Mistakes are corrected via recomputation.
Write bad data? Remove the data & recompute.
Bug in view generation? Just recompute the view.
Lambda Architecture
Using a new schema?
No problem, keep your data, keep your input F, change your output.
Lambda Architecture
Data storage is highly optimized.
Lambda Architecture
Control Plane
Cloud Agnostic
Control Plane
IBM SoftLayer
Experiences & Observations
1. Smooth migration from SCE 2.2 to SoftLayer in 1 month’s time, including:
■ Development of a SoftLayer-specific FOG abstraction layer expansion to accommodate Virdata’s DevOps tooling (Chef)
■ Complete on-boarding of the Virdata Platform
■ Complete launch of simulation and emulation clusters
■ Very exhaustive and complete API
2. Very constructive and professional support throughout the complete on-boarding process
3. Availability of bare metal seen as a differentiator
Cluster Management & Orchestration
Control Plane
RGOSSIP
Monitoring and Logging
Control Plane
Wrap-up
Virdata - SERVICE ARCHITECTURE
Millions of simultaneous persistent bi-directional connections
Millions of messages per second
Real-time Complex Event Processing
Distributed Pub/Sub Messaging
Historical Data Archiving
Pre-computed Data
In-Memory real-time Data
REST API
Launch Queries - Launch Jobs
INTEGRATION
CUSTOMIZATION
NOC, OPERATIONS, MGMT REPORTS, TRENDS, ANALYTICS
Questions?
@virdata_iot | #virdata
@nathan_gs
Acknowledgements
I would like to thank Nathan Marz for writing a very insightful book, where the idea of the Lambda Architecture comes from.
Lambda: Big Data - Nathan Marz, published by Manning
Lambda, Storm: A real-time architecture using Hadoop & Storm - Nathan Bijnens & Geert Van Landeghem at FOSDEM 2013
Spark: Apache Spark website
Spark: Apache Spark - the light at the end of the tunnel? - Michael Hausenblas, MapR at Data Science Day Berlin 2014
Thank you
virdata.com | +1 (937) 569 4220 | [email protected]
#virdata | @virdata_iot
@nathan_gs | [email protected]