Codemotion Milano 2014 - MongoDB and the Internet of Things
DESCRIPTION
Time series are a classic example of the flexibility of the document approach. In this presentation you will see how to shape documents into a schema optimized for time series.
TRANSCRIPT
Massimo Brignoli
Senior Solutions Architect
MongoDB Inc.
@massimobrignoli
MongoDB and
The Internet of Things
The Problem
• If you're thinking about designing an ideal data
structure for your Internet of Things application, then
here's what you should do:
don't do it.
The Problem
• The Internet of Things requires a great deal of
flexibility.
Why?
• Because there are billions of heterogeneous objects
that will begin interacting with each other in ways we
can't predict.
• The rigid, structured tables offered by traditional
databases won't help us because they require a
predefined set of properties and tables, which, again,
we can't predict.
The Problem
• Let's say we want to measure water levels in a large
number of wells. A simplified data architecture for
this application would look like this:
The Problem
• This looks just fine and should work perfectly using
a relational database. But then, 2 years after the
system has been up and running, someone has an
idea:
"Hey, now that we've bought these new Internet-enabled
diesel generators to power the water pumps, let's see
their live data!"
The Problem
• To make this change, we would have to add a new
table called "Power Plants" and a new column to the
table "Wells":
The Solution
• A great way of handling IoT data is the document-
oriented approach
• Instead of fixed tables, columns, and rows, you have
documents describing each object.
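As a rough illustration of the point above, the wells example could be modeled as documents in which the generator data is simply an optional sub-document. This is a minimal sketch: every field name and value here (waterLevelM, generator, fuelPct, and so on) is an assumption for illustration and does not come from the slides.

```javascript
// Hypothetical well documents; all field names are assumptions for illustration.
const wellV1 = { _id: "well-001", name: "North field", waterLevelM: 12.4 };

// Two years later: a well with an Internet-enabled generator just carries
// an extra sub-document. No existing document has to change.
const wellV2 = {
  _id: "well-002",
  name: "South field",
  waterLevelM: 9.8,
  generator: { model: "DG-500", fuelPct: 73, running: true },
};

// Readers treat the new field as optional, so old documents keep working.
function describeWell(well) {
  const power = well.generator
    ? `generator ${well.generator.model} at ${well.generator.fuelPct}% fuel`
    : "no generator data";
  return `${well.name}: ${well.waterLevelM} m, ${power}`;
}

console.log(describeWell(wellV1));
console.log(describeWell(wellV2));
```

Both shapes can live in the same collection, which is the flexibility the relational version lacks without a schema migration.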
MongoDB
Document Database
Open-Source
General Purpose
Documents Are Core
Relational MongoDB
{
first_name: "Paul",
surname: "Miller",
city: "London",
location: [45.123,47.232],
cars: [
{ model: "Bentley",
year: 1973,
value: 100000, … },
{ model: "Rolls Royce",
year: 1965,
value: 330000, … }
]
}
Modeling time series data
in MongoDB
Rexroth NEXO Cordless Nutrunner
Time series schema design goal
• Store event data
• Support Analytical Queries
• Find best compromise of:
- Memory utilization
- Write performance
- Read/Analytical Query Performance
• Accomplish with realistic amount of hardware
Modeling time series data
• Document per event
• Document per minute (average)
• Document per minute (by second)
• Document per hour
Document per event
• Relational-centric approach
• Insert-driven workload
{
deviceId: "Test123",
timestamp: ISODate("2014-07-03T22:07:38.000Z"),
temperature: 21
}
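With one document per event, writes are plain inserts and any aggregate has to be computed at read time. The sketch below shows that trade-off in plain JavaScript, without a database connection; the helper names toEvent and minuteAverage are hypothetical, and the sample readings are made up.

```javascript
// Build one document per reading, in the shape shown on the slide.
function toEvent(deviceId, isoTime, temperature) {
  return { deviceId, timestamp: new Date(isoTime), temperature };
}

// Average for one minute, computed at read time by scanning the events.
function minuteAverage(events, minuteIso) {
  const start = new Date(minuteIso).getTime();
  const inMinute = events.filter((e) => {
    const t = e.timestamp.getTime();
    return t >= start && t < start + 60000;
  });
  if (inMinute.length === 0) return null;
  const sum = inMinute.reduce((acc, e) => acc + e.temperature, 0);
  return sum / inMinute.length;
}

const events = [
  toEvent("Test123", "2014-07-03T22:07:38.000Z", 21),
  toEvent("Test123", "2014-07-03T22:07:39.000Z", 19),
  toEvent("Test123", "2014-07-03T22:08:00.000Z", 30),
];
console.log(minuteAverage(events, "2014-07-03T22:07:00.000Z"));
```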
Document per minute (average)
• Pre-aggregate to compute average per minute more easily
• Update-driven workload
• Resolution at the minute level
{
deviceId: "Test123",
timestamp: ISODate("2014-07-03T22:07:00.000Z"),
temperature_num: 18,
temperature_sum: 357
}
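One way the pre-aggregated document above could be maintained is with an upsert that increments both counters, leaving the average to be derived on read. The sketch only builds the filter and update documents; it assumes they would be passed to an update call with the upsert option, and the helper names are hypothetical.

```javascript
// Truncate a timestamp to the start of its minute (UTC).
function minuteBucket(date) {
  const d = new Date(date.getTime());
  d.setUTCSeconds(0, 0);
  return d;
}

// Filter and update for one reading. Used with an upsert, $inc creates the
// counters on the first reading of the minute and increments them afterwards.
function averageUpdate(deviceId, date, temperature) {
  return {
    filter: { deviceId, timestamp: minuteBucket(date) },
    update: { $inc: { temperature_num: 1, temperature_sum: temperature } },
  };
}

// The average is derived at read time from the two counters.
function averageOf(doc) {
  return doc.temperature_sum / doc.temperature_num;
}

const op = averageUpdate("Test123", new Date("2014-07-03T22:07:38.000Z"), 21);
console.log(JSON.stringify(op));
// The slide's document: 357 / 18 is about 19.83.
console.log(averageOf({ temperature_num: 18, temperature_sum: 357 }).toFixed(2));
```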
Document per minute (by second)
• Store per-second data at the minute level
• Update-driven workload
• Pre-allocate structure to avoid document moves
{
deviceId: "Test123",
timestamp: ISODate("2014-07-03T22:07:00.000Z"),
temperature: { 0: 18, 1: 18, …, 58: 21, 59: 21 }
}
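The pre-allocation mentioned above might look like the following sketch: create the minute document with all 60 slots filled in advance, then overwrite one slot per reading using dot notation. The placeholder value 0 is an assumption (the slides do not say what is used); it is chosen so that placeholder and real values have the same type. The helper names are hypothetical.

```javascript
// Pre-allocate all 60 per-second slots so later updates overwrite in place.
// The placeholder 0 is an assumption; the slides do not specify one.
function preallocateMinute(deviceId, minuteStart) {
  const temperature = {};
  for (let s = 0; s < 60; s++) temperature[s] = 0;
  return { deviceId, timestamp: minuteStart, temperature };
}

// Update for one reading: dot notation addresses a single slot.
function secondUpdate(date, value) {
  return { $set: { [`temperature.${date.getUTCSeconds()}`]: value } };
}

const minuteDoc = preallocateMinute("Test123", new Date("2014-07-03T22:07:00.000Z"));
console.log(Object.keys(minuteDoc.temperature).length);
console.log(JSON.stringify(secondUpdate(new Date("2014-07-03T22:07:38.000Z"), 21)));
```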
Document per hour (by second)
• Store per-second data at the hourly level
• Update-driven workload
• Pre-allocate structure to avoid document moves
• Updating last second requires 3599 steps
{
deviceId: "Test123",
timestamp: ISODate("2014-07-03T22:00:00.000Z"),
temperature: { 0: 18, 1: 18, …, 3598: 20, 3599: 20 }
}
Document per hour (by second)
• Store per-second data at the hourly level with nesting
• Update-driven workload
• Pre-allocate structure to avoid document moves
• Updating last second requires 59 + 59 steps
{
deviceId: "Test123",
timestamp: ISODate("2014-07-03T22:00:00.000Z"),
temperature: {
0: { 0: 18, …, 59: 18 },
…,
59: { 0: 21, …, 59: 20 }
}
}
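The step counts quoted for the flat and nested hourly layouts can be reproduced with a small model. This is only a sketch of the slides' tally, assuming the cost of an update is the number of fields skipped before the target slot is reached; it is not a measurement of server behavior. The function names are hypothetical.

```javascript
// Dot-notation update for one reading in the nested per-hour layout.
function nestedUpdate(date, value) {
  const path = `temperature.${date.getUTCMinutes()}.${date.getUTCSeconds()}`;
  return { $set: { [path]: value } };
}

// Flat layout: slots 0..3599 in one sub-document.
function flatSteps(secondOfHour) {
  return secondOfHour;
}

// Nested layout: skip whole minutes first, then seconds within the minute.
function nestedSteps(secondOfHour) {
  const minute = Math.floor(secondOfHour / 60);
  const second = secondOfHour % 60;
  return minute + second;
}

console.log(JSON.stringify(nestedUpdate(new Date("2014-07-03T22:59:59.000Z"), 20)));
console.log(flatSteps(3599), nestedSteps(3599)); // 3599 versus 59 + 59 = 118
```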
Rexroth NEXO schema
{
_id: ObjectId("52ecf3d6bf1e623a52000001"),
assetId: "NEXO 109",
hour: ISODate("2014-07-03T22:00:00.000Z"),
status: "Online",
type: "Nutrunner",
serialNo: "100-210-ABC",
ip: "127.0.0.1",
positions: {
0: {
0: { x: "10", y: "40", zone: "itc-1", accuracy: "20" },
…,
59: { x: "15", y: "30", zone: "itc-1", accuracy: "25" }
},
…,
59: {
0: { x: "22", y: "27", zone: "itc-1", accuracy: "22" },
…,
59: { x: "18", y: "23", zone: "itc-1", accuracy: "24" }
}
}
}
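Reading a single position back out of a document shaped like the one above comes down to indexing by minute and then by second. The sketch below assumes an in-memory copy of such a document; positionAt is a hypothetical helper, and the sample holds only one slot. Because the schema stores x, y, and accuracy as strings, the helper converts them to numbers.

```javascript
// Look up the position recorded at a given moment within the document's hour.
function positionAt(doc, date) {
  const minute = doc.positions[date.getUTCMinutes()];
  const raw = minute && minute[date.getUTCSeconds()];
  if (!raw) return null;
  // The schema stores coordinates as strings; convert for arithmetic.
  return {
    x: Number(raw.x),
    y: Number(raw.y),
    zone: raw.zone,
    accuracy: Number(raw.accuracy),
  };
}

// Cut-down sample with a single populated slot (minute 0, second 0).
const sample = {
  assetId: "NEXO 109",
  hour: new Date("2014-07-03T22:00:00.000Z"),
  positions: { 0: { 0: { x: "10", y: "40", zone: "itc-1", accuracy: "20" } } },
};

console.log(positionAt(sample, new Date("2014-07-03T22:00:00.000Z")));
console.log(positionAt(sample, new Date("2014-07-03T22:05:00.000Z")));
```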
Demo
How to scale
Scaling Up
Scaling Out
First Edition (1771)
3 Volumes
Fifteenth Edition (2010)
32 Volumes
Shards and Shard Keys
Shard
Shard key
range
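The idea of routing by shard key range can be sketched as a lookup over ordered ranges. The ranges, shard names, and the choice of assetId as the shard key below are all assumptions for illustration; the slides show only the labels. A null bound stands for an open end.

```javascript
// Hypothetical key ranges; each covers [min, max), null means unbounded.
const chunks = [
  { min: null, max: "NEXO 100", shard: "shard0" },
  { min: "NEXO 100", max: "NEXO 200", shard: "shard1" },
  { min: "NEXO 200", max: null, shard: "shard2" },
];

// Find the shard whose range contains the given shard key value.
function shardFor(key) {
  const chunk = chunks.find(
    (c) => (c.min === null || key >= c.min) && (c.max === null || key < c.max)
  );
  return chunk.shard;
}

console.log(shardFor("NEXO 042"));
console.log(shardFor("NEXO 109"));
console.log(shardFor("NEXO 999"));
```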
Why is MongoDB a good fit for IoT?
• IoT processes are real-time
• Relational technologies simply cannot compete
on cost, performance, scalability, and
manageability
• IoT data can come in any format, structured or
unstructured, ranging from text and numbers to
audio, pictures, and video
• Time series data is a natural fit
• IoT applications often require geographically
distributed systems
Thank you!