Webinar Back to Basics - Session 5 - Reporting
Session 5 of the Back to Basics webinar series. In this episode we talk about reporting and look at the best schema design strategies for speeding up report execution.

"Building an Application" Series
Back to Basics: Reporting and Analytics
Massimo Brignoli
Senior Solutions Architect, MongoDB Inc.
#MongoDBBasics

Agenda
• Recap of the previous session
• Reporting options
• Map Reduce
• Introduction to the Aggregation Framework
– Aggregation explain
• The reports of the mycms application
• Geospatial with the Aggregation Framework
• Text Search with the Aggregation Framework

Q & A
• Virtual Genius Bar
– Use the chat to post questions
– The EMEA Solution Architecture / Support team are on hand
– Make use of them during the sessions!

Recap of the Previous Session…

Indexing
• Indexes
• Multikey, compound, 'dot.notation'
• Covered, sorting
• Text, GeoSpatial
• Btrees
> db.articles.ensureIndex( { author : 1, tags : 1 } )
> db.user.find( { user : "danr" }, { _id : 0, password : 1 } )
> db.articles.ensureIndex( { location : "2dsphere" } )
> db.articles.ensureIndex( { "$**" : "text" }, { name : "TextIndex" } )
Options: db.col.ensureIndex( { key : type } )

Index Performance / Efficiency
• Check the query plans
• Slow queries
• n / nscanned ratio
• Which indexes are used
Tools: .explain(), database profiler
> db.articles.find( { author : 'Dan Roberts' } ).sort( { date : -1 } ).explain()
> db.setProfilingLevel(1, 100)
{ "was" : 0, "slowms" : 100, "ok" : 1 }
> db.system.profile.find().pretty()

Reporting Options

Data Access Options
• Query Language
– Use pre-aggregated documents
• Aggregation Framework
– Compute new values from the data you have
– For example: average views, number of comments
• MapReduce
– Internal implementation based on JavaScript
– External with Hadoop, using the MongoDB connector
• A combination of the three options

Pre-Aggregated Reports
Instant results
– Simple from a query point of view
– Using the interactions collection
{
  "_id" : ObjectId(..),
  "article_id" : ObjectId(..),
  "section" : "schema",
  "date" : ISODate(..),
  "daily" : { "view" : 45, "comments" : 150 },
  "hourly" : {
    "0" : { "view" : 10 },
    "1" : { "view" : 2 },
    …
    "23" : { "view" : 14, "comments" : 10 }
  }
}
> db.interactions.find(
    { "article_id" : ObjectId("…..") },
    { _id : 0, hourly : 1 }
  )
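
The pre-aggregated document has to be kept up to date as interactions arrive. The following is a minimal sketch of one way to do that, assuming one document per article per day and the `daily.view` / `hourly.<hour>.view` field names used by the queries in this deck. The helper name `build_view_increment` and the placeholder article id are hypothetical, not part of the original application.

```python
from datetime import datetime, timezone


def build_view_increment(article_id, when):
    """Build the (filter, update) pair that records one view.

    Assumes one pre-aggregated document per article per day, identified
    by article_id and the timestamp of that day's midnight.
    """
    day = when.replace(hour=0, minute=0, second=0, microsecond=0)
    query = {"article_id": article_id, "date": day}
    update = {
        "$inc": {
            "daily.view": 1,                  # running total for the day
            "hourly.%d.view" % when.hour: 1,  # counter for this hour
        }
    }
    return query, update


query, update = build_view_increment(
    "article-1",  # placeholder; a real application would pass an ObjectId
    datetime(2014, 2, 20, 14, 35, tzinfo=timezone.utc),
)
print(query)
print(update)
```

With PyMongo, the pair could be passed to `db.interactions.update_one(query, update, upsert=True)`, so that the first view of the day creates the document and later views only increment its counters.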

Pre-Aggregated Reports
Use the query result to display it directly in the application
– Create a new REST API
– D3.js library or similar in the UI
{
  "hourly" : {
    "0" : {
      "view" : 1
    },
    "1" : {
      "view" : 1
    },
    ……
    "22" : {
      "view" : 5
    },
    "23" : {
      "view" : 3
    }
  }
}

Map Reduce

Map Reduce
– In MongoDB: JavaScript
– Incremental Map Reduce
// Map Reduce example
> db.articles.mapReduce(
    function() { emit(this.author, this.comment_count); },
    function(key, values) { return Array.sum(values); },
    {
      query : {},
      out : { merge : "comment_count" }
    }
  )
Output
{ "_id" : "Dan Roberts", "value" : 6 }
{ "_id" : "Jim Duffy", "value" : 1 }
{ "_id" : "Kunal Taneja", "value" : 2 }
{ "_id" : "Paul Done", "value" : 2 }
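
To make the roles of the two JavaScript functions easier to follow, here is a plain-Python sketch of the same map / group / reduce flow. It does not talk to MongoDB: the `map_reduce` helper and the sample documents are illustrative assumptions, and only the emitted key (`author`) and value (`comment_count`) come from the slide.

```python
from collections import defaultdict


def map_reduce(docs, map_fn, reduce_fn):
    """Run map_fn over every document, group emitted values by key,
    then collapse each group with reduce_fn."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    return {key: reduce_fn(key, values) for key, values in grouped.items()}


def map_article(doc):
    # Equivalent of emit(this.author, this.comment_count)
    yield doc["author"], doc["comment_count"]


def reduce_counts(key, values):
    # Equivalent of Array.sum(values)
    return sum(values)


# Hypothetical sample data, not taken from the webinar's database
articles = [
    {"author": "Dan Roberts", "comment_count": 4},
    {"author": "Dan Roberts", "comment_count": 2},
    {"author": "Jim Duffy", "comment_count": 1},
]

totals = map_reduce(articles, map_article, reduce_counts)
print(totals)
```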

MongoDB – Hadoop Connector
Integration with Hadoop
[Diagram: Application and Dashboards / Reporting tier, MongoS routers, shards each made of one Primary and two Secondaries, and MapReduce running over HDFS. Label: 1) Data flow, input / output via the application tier.]

Aggregation Framework

Aggregation Framework
Multi-stage pipeline
– Like a Unix pipe
• "ps -ef | grep mongod"
– Aggregates the data
– Transforms the documents
– Implemented in the core server
// Find out which are the most popular tags…
db.articles.aggregate([
  { $unwind : "$tags" },
  { $group : { _id : "$tags", number : { $sum : 1 } } },
  { $sort : { number : -1 } }
])
Output
{ "_id" : "mongodb", "number" : 6 }
{ "_id" : "nosql", "number" : 3 }
{ "_id" : "database", "number" : 1 }
{ "_id" : "aggregation", "number" : 1 }
{ "_id" : "node", "number" : 1 }
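
As a rough illustration of what each stage of this pipeline does, the sketch below reproduces the three stages in plain Python on an in-memory list. It is not how the server implements them; the function name `tag_counts_in_memory` and the sample documents are assumptions made for the example.

```python
from collections import Counter


def tag_counts_in_memory(docs):
    """Emulate $unwind -> $group ($sum: 1) -> $sort (descending)."""
    # $unwind: one item per array element; documents without tags vanish
    unwound = (tag for doc in docs for tag in doc.get("tags", []))
    # $group with { $sum : 1 }: count occurrences of each tag
    counts = Counter(unwound)
    # $sort : { number : -1 }: most frequent first
    return [{"_id": tag, "number": n} for tag, n in counts.most_common()]


# Hypothetical sample data
articles = [
    {"title": "a", "tags": ["mongodb", "nosql"]},
    {"title": "b", "tags": ["mongodb", "aggregation"]},
    {"title": "c", "tags": ["mongodb"]},
    {"title": "d"},
]

result = tag_counts_in_memory(articles)
print(result)
```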

In Our mycms Application
# Our new Python example (app and db are assumed to be defined elsewhere in the application)
import json
from bson import json_util
from flask import abort, jsonify

@app.route('/cms/api/v1.0/tag_counts', methods=['GET'])
def tag_counts():
    pipeline = [
        {"$unwind": "$tags"},
        {"$group": {"_id": "$tags", "number": {"$sum": 1}}},
        {"$sort": {"number": -1}},
    ]
    cur = db['articles'].aggregate(pipeline, cursor={})
    # Check everything is OK
    if not cur:
        abort(400)
    # Iterate the cursor and collect the documents in a list
    tags = [tag for tag in cur]
    return jsonify({'tags': json.dumps(tags, default=json_util.default)})

Aggregation Operators
Pipeline and expression operators
Pipeline: $match, $sort, $limit, $skip, $project, $unwind, $group, $geoNear, $text, $search
Expression: $addToSet, $first, $last, $max, $min, $avg, $push, $sum
Arithmetic: $add, $divide, $mod, $multiply, $subtract
Conditional: $cond, $ifNull
Variables: $let, $map
Tip: other operators are available for date, time, boolean and string manipulation

Reports in the Application
Which reports and analyses do we need in our application?
– Most popular tags
– Most popular articles
– Most popular locations (integration with geospatial)
– Average views per hour and per day

Popular Tags
• Unwind each 'tags' array
• Group and count them, then sort them
• Write the result to a new collection
– Query the new collection, so you do not need to recompute the result every time
db.articles.aggregate([
  { $unwind : "$tags" },
  { $group : { _id : "$tags", number : { $sum : 1 } } },
  { $sort : { number : -1 } },
  { $out : "tags" }
])

Popular Articles
• The top 5 articles by average views
– Use the $avg operator
– Use $match to restrict the data that is read
• Use it with the $gt and $lt operators
db.interactions.aggregate([
  { $match : { date : { $gt : ISODate("2014-02-20T00:00:00.000Z") } } },
  { $group : { _id : "$article_id", a : { $avg : "$daily.view" } } },
  { $sort : { a : -1 } },
  { $limit : 5 }
]);
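
In the Python application the same pipeline could be assembled by a small helper, which also makes the `$gt` / `$lt` date window explicit. This is a sketch under assumptions: the function name `top_articles_pipeline` and its parameters are hypothetical, while the field names (`date`, `article_id`, `daily.view`) are the ones used on the slide.

```python
from datetime import datetime, timezone


def top_articles_pipeline(since, until=None, limit=5):
    """Build the 'top articles by average daily views' pipeline."""
    date_range = {"$gt": since}
    if until is not None:
        date_range["$lt"] = until  # optional upper bound of the window
    return [
        # Restrict the documents read before grouping
        {"$match": {"date": date_range}},
        # Average the daily view counter per article
        {"$group": {"_id": "$article_id", "a": {"$avg": "$daily.view"}}},
        {"$sort": {"a": -1}},
        {"$limit": limit},
    ]


pipeline = top_articles_pipeline(
    since=datetime(2014, 2, 20, tzinfo=timezone.utc),
    until=datetime(2014, 3, 1, tzinfo=timezone.utc),
)
print(pipeline)
```

With PyMongo the list can be passed as-is to `db['interactions'].aggregate(pipeline)`, where `db` is the application's database handle.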

Aggregation Framework Explain
• Use explain to make sure you are using indexes efficiently
db.interactions.aggregate([
    { $group : { _id : "$article_id", a : { $avg : "$daily.view" } } },
    { $sort : { a : -1 } },
    { $limit : 5 }
  ],
  { explain : true }
);

Explain output…
{
  "stages" : [
    {
      "$cursor" : {
        "query" : { … },
        "fields" : { … },
        "plan" : {
          "cursor" : "BasicCursor",
          "isMultiKey" : false,
          "scanAndOrder" : false,
          "allPlans" : [
            {
              "cursor" : "BasicCursor",
              "isMultiKey" : false,
              "scanAndOrder" : false
            }
          ]
        }
      }
    },
    …
  ],
  "ok" : 1
}
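
In explain output of this shape, a `BasicCursor` plan means the whole collection is scanned, while a plan whose cursor starts with `BtreeCursor` uses an index. A small sketch of how an application might check for this automatically, assuming the layout shown on the slide (newer server versions use a different explain format); the helper names and the sample documents are hypothetical.

```python
def cursor_types(explain_doc):
    """Collect the plan's cursor type from every $cursor stage."""
    types = []
    for stage in explain_doc.get("stages", []):
        plan = stage.get("$cursor", {}).get("plan", {})
        if "cursor" in plan:
            types.append(plan["cursor"])
    return types


def scans_whole_collection(explain_doc):
    """True when at least one stage reads documents without an index."""
    return "BasicCursor" in cursor_types(explain_doc)


# Hand-written documents that mirror the slide's structure
unindexed = {
    "stages": [
        {"$cursor": {"query": {}, "fields": {},
                     "plan": {"cursor": "BasicCursor",
                              "isMultiKey": False,
                              "scanAndOrder": False}}},
        {"$group": {}},
    ],
    "ok": 1,
}
indexed = {
    "stages": [{"$cursor": {"plan": {"cursor": "BtreeCursor date_1"}}}],
    "ok": 1,
}

print(scans_whole_collection(unindexed))  # True
print(scans_whole_collection(indexed))    # False
```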

Geospatial & Text Search Aggregation

Text Search
• The $text operator with the aggregation framework
– All articles containing the word "MongoDB"
– Grouped by author, sorted by number of comments
db.articles.aggregate([
  { $match : { $text : { $search : "mongodb" } } },
  { $group : { _id : "$author", comments : { $sum : "$comment_count" } } },
  { $sort : { comments : -1 } }
])

Using It with Geospatial
• The $geoNear operator with the aggregation framework
– Use $geoNear as the first stage of the pipeline (it is a stage of its own, not a $match operator)
– Group by author and count the articles
db.articles.aggregate([
  { $geoNear : {
      near : { type : "Point", coordinates : [ -0.128, 51.507 ] },
      distanceField : "distance",
      maxDistance : 5000,
      spherical : true } },
  { $group : { _id : "$author", articleCount : { $sum : 1 } } }
])
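
`$geoNear` is an aggregation stage that has to come first in the pipeline, needs a geospatial index on the collection, and requires a `distanceField` to hold the computed distance. A hedged Python sketch of a builder for such a pipeline follows; the function name `articles_near_pipeline`, its parameters and the output field name `distance` are assumptions for the example.

```python
def articles_near_pipeline(longitude, latitude, max_distance_m):
    """Count articles per author within max_distance_m metres of a point."""
    return [
        {
            "$geoNear": {
                # GeoJSON points are written as [longitude, latitude]
                "near": {"type": "Point",
                         "coordinates": [longitude, latitude]},
                "distanceField": "distance",  # added to each output document
                "maxDistance": max_distance_m,
                "spherical": True,
            }
        },
        {"$group": {"_id": "$author", "articleCount": {"$sum": 1}}},
    ]


# Same point and radius as on the slide
pipeline = articles_near_pipeline(-0.128, 51.507, 5000)
print(pipeline)
```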

Summary

Summary
• To aggregate data:
– Map Reduce
– Hadoop
– Pre-aggregated reports
– Aggregation Framework
• Tune with the explain plan
• Compute on the fly or compute and store
• Geospatial
• Text Search

Next Session
– 20 May
– Managing your application
– Scalability
– High availability
– How to prepare for production
– Sizing