Hands-on lab for designing MongoDB solutions for the Internet of Things and Big Data - by...
INNOVATION COMPANY
Roberto Contiero [email protected] @contieroroberto
“Learn quickly and Think Well!”
Stefano Dindo [email protected] @stefanodindo
“We are an Innovation Company. We design and develop cutting edge software to drive
our customers’ digital transformation, through Agile Methodologies and continuous
delivery”
WE HELP OUR CUSTOMERS TO
DESIGN IDEA: We get powerful ideas to market fast
CREATE PRODUCTS: We design and develop innovative and better software solutions
EXTRACT VALUE FROM DATA: We collect and analyze data to help your decisions
OUR APPROACH
[Diagram. Phases: Discover, Experiment, Delivery. Labels: Idea; Customer & zero12 collaboration meeting (CanvUX); MVP (repeated); Continuous Design & Integration; End Users; Customer feedback.]
“Internet of Things is a neologism referring to the extension of the Internet to the world of objects
and concrete places.”
2020 IoT Market Share
4 Billion Connected People
$4 Trillion Business Opportunity
25+ Billion Integrated systems connected to the Web
50 Trillion GBs of data
Source: IDC
IoT Architecture:
[Diagram. Components: Things, Users, MQTT Broker, Authentication API, Business Logic API, Predictive Engine API, Application Frontend, MongoDB. Flows: Authentication, Data Operation, User Interaction, Predictive Algorithm.]
What’s a document?
{ "name": "John", "surname": "Doe", "email": "[email protected]", "cell": 3281432896, "sport": ["swimming", "football"]}
{
  _id: ObjectId("4b2b9…"),
  first_name: 'Paul',
  surname: 'Miller',
  city: 'London',
  location: [45.123, 47.232],
  cars: [
    { model: 'Bentley', year: 1973, value: 100000, … },
    { model: 'Rolls Royce', year: 1965, value: 330000, … }
  ]
}
Comparison Relational vs Document
Document
{
  _id: ObjectId("4b2b9…"),
  first_name: 'Paul',
  surname: 'Miller',
  city: 'London',
  location: [45.123, 47.232],
  cars: [
    { model: 'Bentley', year: 1973, value: 100000, … },
    { model: 'Rolls Royce', year: 1965, value: 330000, … }
  ]
}
Example Data Types
Null                 { x: null }
Boolean              { x: true }
Number               { x: 3.14 } { x: 3 }
String               { x: "zero12" }
Date                 { x: new Date() }
Array                { x: ["a", "b", "c"] }
Embedded documents   { x: { y: "a" } }
Aggregation
[Diagram: documents such as { "name": "John", "surname": "Doe", "email": "[email protected]" } flow through a sequence of pipeline stages op1, op2, ..., opn.]
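The pipeline idea can be illustrated without a database. The sketch below is plain Python, not MongoDB's API: the helpers `match`, `project` and `run_pipeline` are hypothetical names chosen for this example, and the second person document is invented sample data. It only shows how each stage receives the documents produced by the previous one.

```python
from functools import reduce

def match(predicate):
    """Stage that keeps only the documents satisfying the predicate."""
    return lambda docs: [d for d in docs if predicate(d)]

def project(*fields):
    """Stage that keeps only the named fields of each document."""
    return lambda docs: [{f: d[f] for f in fields if f in d} for d in docs]

def run_pipeline(docs, stages):
    """Feed the output of each stage into the next one, in order."""
    return reduce(lambda current, stage: stage(current), stages, docs)

# Sample documents (the second one is invented for illustration).
people = [
    {"name": "John", "surname": "Doe", "sport": ["swimming", "football"]},
    {"name": "Paul", "surname": "Miller", "sport": []},
]

result = run_pipeline(people, [
    match(lambda d: "swimming" in d["sport"]),  # op1: filter
    project("name", "surname"),                 # op2: reshape
])
print(result)
```

In MongoDB itself the stages would be expressed as operators such as `$match` and `$project` passed to the collection's aggregate call; the chaining principle is the same.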
MongoDB 3.2
MongoDB Compass
3 Storage Engines
• WiredTiger
• MMAPv1
• In-Memory (beta)
Data Encryption
Business Intelligence Connectors
$lookup operator
Document Validation
Time Series: Definition
A time series is the set of values of a variable measured at successive timestamps.
[Plot: values f(t0), f(t1), f(t2), f(t3) of a variable against Time at t0, t1, t2, t3.]
Time Series Data is Everywhere
1. Financial markets pricing
2. Sensors (temperature, pressure, proximity)
3. Industrial fleets (location, velocity, operational data)
4. Social networks (status updates)
5. Systems (server logs, application logs)
6. Mobile devices (calls, texts)
Time Series Data at a Higher Level
1. Widely applicable data model
2. Various schema and modeling options
3. Application requirements drive schema design
Designing for writing and reading
1. One document per event
2. One document per minute (average)
3. One document per minute (by second)
4. One document per hour
One document per event
{
  server: "server1",
  load: 92,
  ts: ISODate("2014-10-16T22:07:38.000-0500")
}
1. Relational-centric approach
2. Insert-driven workload
3. Aggregations computed at application-level
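A minimal sketch of this schema, assuming a plain Python list stands in for the collection. The helper names `record_event` and `average_load` and the second and third samples are hypothetical. Every sample becomes its own document, and the per-minute average has to be computed by the application at read time.

```python
from datetime import datetime
from statistics import mean

events = []  # stands in for a MongoDB collection (assumption for this sketch)

def record_event(server, load, ts):
    """Insert-driven workload: one new document for every sample."""
    events.append({"server": server, "load": load, "ts": ts})

def average_load(server, minute_start):
    """Application-level aggregation over all events of one minute."""
    loads = [
        e["load"]
        for e in events
        if e["server"] == server
        and e["ts"].replace(second=0, microsecond=0) == minute_start
    ]
    return mean(loads) if loads else None

record_event("server1", 92, datetime(2014, 10, 16, 22, 7, 38))
record_event("server1", 88, datetime(2014, 10, 16, 22, 7, 39))  # invented sample
record_event("server1", 50, datetime(2014, 10, 16, 22, 8, 0))   # invented sample

avg = average_load("server1", datetime(2014, 10, 16, 22, 7))
print(len(events), avg)
```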
One document per minute (average)
{
  server: "server1",
  load_num: 92,
  load_sum: 4500,
  ts: ISODate("2014-10-16T22:07:00.000-0500")
}
1. Pre-aggregation to compute average per minute more easily
2. Update-driven workload
3. Minute-level resolution
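The pre-aggregation can be sketched as follows, again with an in-memory dictionary standing in for the collection. The names `minutes`, `record_sample` and `average` and the sample values are hypothetical. Each sample increments the counters of the document for its minute, which in MongoDB would typically be an upsert using the `$inc` operator.

```python
from datetime import datetime

minutes = {}  # (server, minute) -> document; stands in for a collection

def record_sample(server, load, ts):
    """Update-driven workload: increment the counters of the minute document."""
    minute = ts.replace(second=0, microsecond=0)
    doc = minutes.setdefault(
        (server, minute),
        {"server": server, "ts": minute, "load_num": 0, "load_sum": 0},
    )
    doc["load_num"] += 1     # number of samples seen in this minute
    doc["load_sum"] += load  # running sum of their values
    return doc

def average(doc):
    """The per-minute average is a single division on the stored counters."""
    return doc["load_sum"] / doc["load_num"]

for second, load in enumerate([40, 50, 60]):  # invented sample values
    doc = record_sample("server1", load, datetime(2014, 10, 16, 22, 7, second))

print(doc["load_num"], doc["load_sum"], average(doc))
```

Applied to the document shown above, the same division gives 4500 / 92, roughly 48.9. The individual samples are no longer available, which is why the resolution is limited to the minute.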
One document per minute (by second)
{
  server: "server1",
  load: { 0: 15, 1: 20, ..., 58: 45, 59: 40 },
  ts: ISODate("2014-10-16T22:07:00.000-0500")
}
1. Store per second data at minute level
2. Update-driven workload
3. Pre-allocate structure to avoid document moves
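A sketch of the pre-allocation step, using a plain dictionary as the document. The helper names `preallocate_minute` and `set_second` are hypothetical. All 60 keys are created up front with a placeholder, so later updates overwrite existing fields instead of growing the document. In MongoDB the update would typically be a `$set` on a dotted path such as `load.59`.

```python
from datetime import datetime

def preallocate_minute(server, minute_start):
    """Create the minute document with all 60 second slots already present."""
    return {
        "server": server,
        "ts": minute_start,
        # Document keys are strings; None marks a slot not yet written.
        "load": {str(second): None for second in range(60)},
    }

def set_second(doc, ts, load):
    """Overwrite the slot of the sample's second; the structure does not grow."""
    doc["load"][str(ts.second)] = load

doc = preallocate_minute("server1", datetime(2014, 10, 16, 22, 7))
set_second(doc, datetime(2014, 10, 16, 22, 7, 0), 15)
set_second(doc, datetime(2014, 10, 16, 22, 7, 59), 40)

print(len(doc["load"]), doc["load"]["0"], doc["load"]["59"])
```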
One document per hour (by second)
{
  server: "server1",
  load: { 0: 15, 1: 20, ..., 3598: 45, 3599: 40 },
  ts: ISODate("2014-10-16T22:00:00.000-0500")
}
1. Store per second data at hourly level
2. Update-driven workload
3. Pre-allocate structure to avoid document moves
4. Updating the last second requires 3599 steps
One document per hour (by second)
{
  server: "server1",
  load: {
    0: { 0: 15, ..., 59: 45 },
    ....
    59: { 0: 25, ..., 59: 75 }
  },
  ts: ISODate("2014-10-16T22:00:00.000-0500")
}
1. Store per second data at hourly level with nesting
2. Update-driven workload
3. Pre-allocate structure to avoid document moves
4. Updating the last second requires 59+59 steps
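The step counts for the two hourly layouts can be checked with a small calculation. The sketch assumes, as the slides imply, that reaching a field means skipping every field stored before it at the same level. The function names are hypothetical.

```python
def flat_steps(second_of_hour):
    """Flat layout: keys 0..3599 in one sub-document.

    Reaching key k means skipping the k keys stored before it.
    """
    return second_of_hour

def nested_steps(second_of_hour):
    """Nested layout: 60 minute keys, each holding 60 second keys.

    Skip `minute` keys at the outer level, then `second` keys inside it.
    """
    minute, second = divmod(second_of_hour, 60)
    return minute + second

last_second = 3599
print(flat_steps(last_second), nested_steps(last_second))
```

For the last second of the hour this gives 3599 steps in the flat layout against 59 + 59 = 118 in the nested one, matching the figures above.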
Writing operation analysis
1. Example: data generated every second
2. Capturing data per minute requires:
- One document per event: 60 writes
- One document per minute: 1 write, 59 updates
3. Transition from “insert-driven” to “update-driven”
- Individual writes are smaller
- Performance and concurrency benefits
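The write counts can be reproduced with a small simulation of one minute of data. The function `simulate_minute`, the schema labels and the load values are hypothetical. No database is involved; the code only counts which kind of operation each schema would issue.

```python
from collections import Counter

def simulate_minute(schema):
    """Count the write operations needed to store 60 one-second samples."""
    ops = Counter()
    minute_doc = None
    for second in range(60):
        sample = 10 + second  # invented load value
        if schema == "per_event":
            # Every sample is a brand-new document.
            ops["insert"] += 1
        elif schema == "per_minute":
            if minute_doc is None:
                # The first sample creates the minute document.
                minute_doc = {"load": {str(second): sample}}
                ops["insert"] += 1
            else:
                # Later samples modify the existing document in place.
                minute_doc["load"][str(second)] = sample
                ops["update"] += 1
        else:
            raise ValueError(f"unknown schema: {schema}")
    return dict(ops)

per_event = simulate_minute("per_event")
per_minute = simulate_minute("per_minute")
print(per_event, per_minute)
```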
Read operation analysis
1. Example: data generated every second
2. Reading data for a single hour requires:
- One document per event: 3600 reads
- One document per minute: 60 reads
3. Read performance is greatly improved:
- Fewer disk seeks
- Optimization with tuned block sizes and read-ahead
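The read counts follow from how many seconds each document covers. The sketch below assumes one sample per second and a time range aligned to document boundaries. The dictionary and function names are hypothetical, and the hourly figure is derived from the schema rather than stated in the slides.

```python
import math

# Seconds of data held by a single document under each schema.
SECONDS_PER_DOCUMENT = {
    "one document per event": 1,
    "one document per minute": 60,
    "one document per hour": 3600,  # derived, not listed in the slides
}

def documents_to_read(schema, range_seconds):
    """Documents that must be fetched to cover a range of the given length."""
    return math.ceil(range_seconds / SECONDS_PER_DOCUMENT[schema])

one_hour = 3600
for schema in SECONDS_PER_DOCUMENT:
    print(schema, documents_to_read(schema, one_hour))
```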