MongoDB and Spring Data (meetupfiles.meetup.com/4247302/ric mongo meetup.2014-02-20.pdf, 2014/02/20)
Who Am I
• Solutions Architect with CloudBees (www.cloudbees.com)
• Blog: www.techsand.com
• LinkedIn: www.linkedin.com/in/iamjimmyray
• I spend my time with Java, Jenkins, CI/CD, Cloud Computing and … MongoDB
© 2014 CloudBees, Inc. All Rights Reserved
Tonight’s Agenda
• Quick introduction to MongoDB and related tools
• Introduction to Spring Data
  – Configuration (Spring and MongoDB)
  – Templates and Repositories
    • Metadata Mapping
    • Finder Methods
    • Custom Repos
Tonight’s Agenda (continued)
• Spring Data
  – Fluent API
  – Aggregation Functions
• Indexes
• GridFS
• MongoDB in the cloud
Why MongoDB
• Multiple platforms (Linux, Win, Solaris, Apple)
• Language Drivers (C, C++, C#, Java, Erlang, JS, Ruby, etc.)
• Explicitly de-normalized (schema-less)
• Document-centric
• Easy for developers and admins to get started.
  – Because schema-less approach is more flexible, MongoDB is intrinsically ready for iterative (Agile) projects
Why MongoDB (continued)
• Ease of scalability (replica sets), auto-sharding
• Manages complex and polymorphic data
• Great for CDN and document-based SOA solutions
• Great for location-based and geospatial data solutions
• Fast
  – (low latency)
  – Fast access to data
  – Low CPU overhead
Schema-less, Schema-free, Flexible-schema
• It means that MongoDB does not enforce a column data type on the fields within your document, nor does it confine your document to specific columns defined in a table definition.
• The schema is actually controlled via the application API layers and is implied by the “shape” (content) of your documents.
• This means that different documents in the same collection can have different fields.
• Only the _id field is mandatory in all documents.
Is MongoDB Really Schema-free
• Technically no.
• There is the System Catalog of system collections
  – <database>.system.namespaces
  – <database>.system.indexes
  – <database>.system.profile
  – <database>.system.users
• And…because of the nature of how docs are stored in collections (JSON/BSON), field labels are stored in every doc*
MongoDB Schema Tips
• MongoDB has ObjectID, can be placed in _id
  – If you have a natural unique ID, use that instead
• De-normalize when needed
  – For example: Compound indexes cannot contain parallel arrays
• Create indexes that cover queries
  – Mongo only uses one index at a time for a query
  – Watch out for sorts
  – Watch out for field sequence in compound indexes.
• Reduce size of collections (watch out for label sizes)
MongoDB Data Modeling
• Understand your concerns
• Document embedding (fastest and atomic) vs. references (normalized)
• Atomicity – Document Level Only
• Data Durability
Why not MongoDB
• High speed and deterministic transactions (FIN)
• Where SQL or joins are absolutely required
• If your organization lacks the controls and rigor to place schema and document definition at the application level without compromising data integrity
My Favorite MongoDB Design Features
• Fast Querying (atomic operations, embedded data)
• In place updates (physical writes lag in-memory changes)
• Full Index support (including compound indexes)
• Replication/High Availability (see CAP Theorem)
• Auto Sharding (range-based partitioning, based on shard key) for scalability
• BSON
• GridFS
In-place Updates
• Physical disk writes lag in-memory changes.
• MongoDB uses an adaptive allocation algorithm for storing its objects.
“Keys” to Sharding
• Need to choose the right key
  – Easily divisible (“splittable”, see cardinality) so that Mongo can distribute data among shards
• Enable distributed write operations between cluster nodes
  – Prevents single-shard bottle-necking
Cardinality
• Higher cardinality is preferred (usually, except for range queries)
  – Example: Address data components
    • State – Low Cardinality
    • Zip Code – Potentially low or high, depending on population
    • Phone Number – High Cardinality
• High cardinality is a good start for sharding, but..
  – …it does not guarantee query isolation
  – …it does not guarantee write scaling
• Consider computed keys (MD5, etc.)
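The "computed keys (MD5, etc.)" idea can be sketched as follows. This is a minimal illustration, assuming the natural key is a string; the class name ShardKeySketch and both method names are hypothetical and are not part of MongoDB or its Java driver. The bucket calculation only shows how a hashed key spreads values evenly. A real cluster assigns chunk ranges to shards itself.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for illustration; not a MongoDB or driver API.
final class ShardKeySketch {

    // Returns the MD5 digest of the natural key as a 32-character hex string.
    static String md5Hex(String naturalKey) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(naturalKey.getBytes(StandardCharsets.UTF_8));
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 should be available on every Java platform", e);
        }
    }

    // Maps the computed key onto one of bucketCount buckets (illustration only).
    static int bucketFor(String naturalKey, int bucketCount) {
        if (bucketCount <= 0) {
            throw new IllegalArgumentException("bucketCount must be positive");
        }
        BigInteger value = new BigInteger(md5Hex(naturalKey), 16);
        return value.mod(BigInteger.valueOf(bucketCount)).intValue();
    }
}
```

A low-cardinality field such as a state code would be hashed together with a higher-cardinality field before being stored as the computed key; which fields to combine is a design decision this sketch does not make.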
Container Model (RDBMS vs. MongoDB)
• RDBMS: Servers > Databases > Schemas > Tables > Rows
  – Joins, Group By, ACID
• MongoDB: Servers > Databases > Collections > Documents
  – No Joins
  – Instead: Db References (Linking) and Nested Documents (Embedding)
CAP Theorem
• Consistency – all nodes see the same data at the same time
• Availability – all requests receive responses, guaranteed
• Partition Tolerance (network partition tolerance)
• The theorem states that you can never have all three, so you plan for two and make the best of the third.
MongoDB Collections
• Schema-less
• Can have up to 24000
  – 100 nesting levels (version 2.2)
• Are namespaces, like indexes
• Can be “Capped”
  – Limited in max size with rotating overwrites of oldest entries
• TTL Collections
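The "rotating overwrites of oldest entries" behavior of a capped collection can be pictured with a small in-memory model. This is only a sketch under a simplifying assumption: it caps by entry count, whereas the slide describes a cap on maximum size. CappedSketch is a hypothetical name and has nothing to do with the MongoDB driver.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical in-memory model of a capped collection (capped by entry count here).
final class CappedSketch<T> {
    private final int maxEntries;
    private final Deque<T> entries = new ArrayDeque<>();

    CappedSketch(int maxEntries) {
        if (maxEntries <= 0) {
            throw new IllegalArgumentException("maxEntries must be positive");
        }
        this.maxEntries = maxEntries;
    }

    // Inserts a document; when the cap is reached, the oldest entry is dropped first.
    void insert(T doc) {
        if (entries.size() == maxEntries) {
            entries.removeFirst();
        }
        entries.addLast(doc);
    }

    // Returns the current contents in insertion order, oldest first.
    List<T> contents() {
        return new ArrayList<>(entries);
    }
}
```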
MongoDB Documents
• JSON (what you see)
  – Actually BSON (Internal - Binary JSON - http://bsonspec.org/)
• Elements are name/value pairs
• 16 MB maximum size
• What you see is what is stored
  – No default fields (columns)
JSON Syntax
• Curly braces are used for documents/objects – {…}
• Square brackets are used for arrays – […]
• Colons are used to link keys to values – key:value
• Commas are used to separate multiple objects or elements or key/value pairs – {key1:value1, key2:value2…}
BSON
• Adds data types that JSON did not support
• Optimized for performance
• Adds compression
• http://bsonspec.org/#/specification
MongoDB Shell
• Interactive JavaScript shell to mongod
• Command-line interface to MongoDB (sort of like SQL*Plus for Oracle)
• JavaScript Interpreter, behaves like a read-eval-print loop
• Can be run without database connection (use --nodb)
• Uses a fluent API with lazy cursor evaluation
  – db.locations.find({state:'MN'},{city:1,state:1,_id:0}).sort({city:-1}).limit(5).toArray();
MongoDB and Mac OS X
• Installed/upgraded with HomeBrew
  – brew install/upgrade mongodb
  – http://docs.mongodb.org/manual/tutorial/install-mongodb-on-os-x/
• Run with shell command: exec mongod --port 29009 --rest
• Run MongoDB Shell: mongo --port 29009
MongoDB Tools for Mac OS X
• Install and run Genghis PHP application on Apache
• Install MongoHub for Mac
  – https://code.google.com/p/mongohub/
• Shutdown server from inside Mongo Shell
  – use admin
  – db.shutdownServer()
Other MongoDB Tools
• Edda
  – Log Visualizer
  – Requires Python
• MongoDB Monitoring Service
  – Cloud (or on premise) based service that monitors MongoDB instances via configured agents.
  – Requires Python
MongoImport
• Syntax: mongoimport --stopOnError --port 29009 --db geo --collection geos --file C:\UserData\Docs\JUGs\TwinCities\zips.json
• Don’t use for backup or restore in production
  – Use mongodump and mongorestore
MongoDB Web Admin Interface
• Enabled with REST switch in startup config.
• Port location is main mongod port + 1000
• Quick stats viewer
• Run commands
Project Configuration
• MongoDB 2.4.9*
• Java 1.6
• Maven 3
• Jackson JSON Processor 1.9.4
• Spring Framework 3.2.1.RELEASE
• Spring Data 1.3.2.RELEASE
• MongoDB Java API 2.11.3
Project Code Location
• GitHub
  – https://github.com/jimmyraywv/mongodb-spring-data
Spring Data
• Large Spring project with many subprojects
  – Category: Document Stores, Subproject MongoDB
• “…aims to provide a familiar and consistent Spring-based programming model…”
• Like other Spring projects, Data is POJO Oriented
  – BEANS
• Provides access to high-level and low-level APIs for managing MongoDB documents.
Spring Data (continued)
• Provides annotation-driven meta-mapping
• Will allow you into bowels of API if you choose to hang out there
Spring Framework Configuration Profiles
• Uses a system level property to choose the profile defined in the Spring Configuration
  – -Dspring.profiles.active=local
  – … <beans profile="local"> …
Spring Data Templates
• Main purpose is resource allocation and exception translation
• Implements MongoOperations (mongoOps) interface
  – mongoOps defines the basic set of MongoDB operations for the Spring Data API.
• Wraps the lower-level MongoDB API
  – Provides access to the lower-level API
  – Provides foundation for upper-level Repository API.
Spring Data Repositories
• Convenience for data access
• Spring does ALL the work (unless you customize)
• Convention over configuration
• Uses a method-naming convention that Spring interprets during implementation
• Hides complexities of Spring Data templates and underlying API
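To give a feel for the method-naming convention, the toy parser below derives property names from a finder-style method name. This is an assumption-laden sketch: the name findByLastNameAndTitle is a hypothetical example, FinderNameSketch is not a Spring class, and Spring Data's real parsing supports many more keywords than the single "And" handled here.

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration of a finder-name convention; not Spring Data's actual parser.
final class FinderNameSketch {

    // Turns "findByLastNameAndTitle" into ["lastName", "title"].
    static List<String> properties(String methodName) {
        String prefix = "findBy";
        if (!methodName.startsWith(prefix) || methodName.length() == prefix.length()) {
            throw new IllegalArgumentException("Expected a name like findByXxx: " + methodName);
        }
        List<String> result = new ArrayList<>();
        for (String part : methodName.substring(prefix.length()).split("And")) {
            if (part.isEmpty()) {
                throw new IllegalArgumentException("Empty property in: " + methodName);
            }
            // Lower-case the first letter to get the bean property name.
            result.add(Character.toLowerCase(part.charAt(0)) + part.substring(1));
        }
        return result;
    }
}
```

In a real repository you would only declare the method on the interface; the point of the sketch is that the name itself carries the query.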
Spring Data Repositories (continued)
• Builds implementation for you based on interface design
  – Implementation is built during Spring container load.
• Is typed (parameterized via generics) to the model objects you want to store.
  – When extending MongoRepository
    • Otherwise uses @RepositoryDefinition annotation
Custom Repositories
• Hooks into Spring Data bean type hierarchy that allows you to add functionality to repositories
• Important: You must write the implementation for part of this custom repository
• And…your Spring Data repository interface must extend this custom interface, along with the appropriate Spring Data repository
Spring Data Metadata Mapping
• Annotation-driven mapping of model object fields to Spring Data elements in specific database dialect.
• Maps Java POJOs to MongoDB documents
  – Controls how POJO fields are mapped to MongoDB document fields
  – Maps document index settings
  – Maps Java types to MongoDB collections
• Handy when you consider that MongoDB field labels are stored in each document.*
Bulk Inserts
• All things being equal, bulk inserts in MongoDB are faster than inserting one record at a time.
• As of MongoDB 1.8, the max BSON size of a batch insert was increased from 4MB to 16MB
  – You can check this with the shell command: db.isMaster() or mongo.getMaxBsonObjectSize() in the Java API
• Batch sizes can be tuned for performance
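Tuning the batch size usually starts with splitting the documents into fixed-size batches before handing each batch to an insert call. The sketch below shows only the splitting step, using plain collections; BatchSketch and partition are hypothetical names, and the actual insert call against the driver or template is deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits a list of documents into batches of at most batchSize.
final class BatchSketch {

    static <T> List<List<T>> partition(List<T> docs, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < docs.size(); start += batchSize) {
            int end = Math.min(start + batchSize, docs.size());
            // Copy the slice so each batch is independent of the source list.
            batches.add(new ArrayList<>(docs.subList(start, end)));
        }
        return batches;
    }
}
```

The batch size is a plain parameter here so that different values can be measured against each other.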
Transformers
• Does the “heavy lifting” by preparing MongoDB objects for insertion
• Transforms Java domain objects into MongoDB DBObjects.
Converters
• For read and write, overrides default mapping of Java objects to MongoDB documents
• Implements the Spring…Converter interface
• Registered with MongoDB configuration in Spring context
• Handy when integrating MongoDB to existing application.
• Can be used to manipulate fields inline with reads/writes
MongoDB DBRef
• Optional
• Instead of nesting documents
• Have to save the “referenced” document first, so that DBRef exists before adding it to the “parent” document
• Know the tradeoffs
MongoDB Queries
• In the mongo shell using JS: db.collection.find( <query>, <projection> )
• Use the projection to limit fields returned, and therefore network traffic
• Examples:
  – db["employees"].find({"title":"Senior Engineer"})
  – db.employees.find({"title":"Senior Engineer"},{"_id":0})
  – db.employees.find({"title":"Senior Engineer"},{"_id":0,"title":1})
MongoDB Queries (continued)
• In Java use DBObject or Spring Data Query for mapping queries.
• You can include and exclude fields in the projection argument.
  – You either include (1) or exclude (0)
  – You cannot include and exclude in the same projection, except for the “_id” field.
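The include/exclude rule for projections can be expressed as a small check. This is a sketch that models a projection as a map of field name to 1 or 0; ProjectionSketch is a hypothetical helper, and the driver and server perform their own validation.

```java
import java.util.Map;

// Hypothetical validator for the projection rule described above.
final class ProjectionSketch {

    // Returns true when the projection only includes or only excludes fields,
    // ignoring "_id", which may be excluded alongside included fields.
    static boolean isValid(Map<String, Integer> projection) {
        boolean sawInclude = false;
        boolean sawExclude = false;
        for (Map.Entry<String, Integer> entry : projection.entrySet()) {
            if ("_id".equals(entry.getKey())) {
                continue; // _id is the one field allowed to differ
            }
            if (entry.getValue() == 0) {
                sawExclude = true;
            } else {
                sawInclude = true;
            }
        }
        return !(sawInclude && sawExclude);
    }
}
```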
DBObject and BasicDBObject
• For the Mongo Java driver, DBObject is the Interface, BasicDBObject is the class
  – This is essentially a map with additional Mongo functionality
  – See partial objects when up-serting
• DBObject is used to build commands, queries, projections, and documents
• DBObjects are used to build out the JS queries that would normally run in the shell. Each {…} is a potential DBObject.
MongoDB Advanced Queries
• http://www.mongodb.org/display/DOCS/Advanced+Queries#AdvancedQueries-%24all
• May use Mongo Java driver and BasicDBObjectBuilder
• Spring Data fluent API is much easier
• Demo - $in, $nin, $gt ($gte), $lt ($lte), $all, ranges
Logical Queries Using and/or
• Comma denotes “and”, and you can use $and
  – db.employees.find({"title":"Senior Engineer","lastName":"Bashian"},{"_id":0,"title":1})
• For Or, you must use the $or operator
  – db.employees.find({$or:[{"lastName":"Bashian"},{"lastName":"Baik"}]}, {"_id":0,"title":1,"lastName":1})
• In Java, use DBObjects and ArrayLists…
  – Nest or/and ArrayLists for compound queries
  – Or use the Spring Data Query and Criteria classes with “or” criteria
• Also see QueryBuilder class
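The nesting of maps and lists behind an $or query can be sketched with plain collections. Here LinkedHashMap and ArrayList stand in for the driver's DBObject types (an earlier slide describes DBObject as essentially a map), so this shows the shape of the query only; OrQuerySketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Stand-in for building {"$or": [{field: v1}, {field: v2}, ...]} with plain collections.
final class OrQuerySketch {

    static Map<String, Object> orEquals(String field, String... values) {
        List<Map<String, Object>> clauses = new ArrayList<>();
        for (String value : values) {
            Map<String, Object> clause = new LinkedHashMap<>();
            clause.put(field, value); // one {field: value} document per alternative
            clauses.add(clause);
        }
        Map<String, Object> query = new LinkedHashMap<>();
        query.put("$or", clauses); // the list of alternatives hangs off the $or key
        return query;
    }
}
```

Calling OrQuerySketch.orEquals("lastName", "Bashian", "Baik") yields the same nesting as the shell query above: one outer document, one list, one inner document per alternative.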
Array Queries
db.misc.insert({users:["jimmy", "griffin"]})
db.misc.find({users:"griffin"})
{ "_id" : ObjectId("518a5b7e18aa54b5cf8fc333"), "users" : [ "jimmy", "griffin" ] }
db.misc.find({users:{$elemMatch:{name:"jimmy",gender:"male"}}})
{ "_id" : ObjectId("518a599818aa54b5cf8fc332"), "users" : [ { "name" : "jimmy", "gender" : "male" }, { "name" : "griffin", "gender" : "male" } ] }
Array Updates
db.misc.insert({"users":[{"name":"jimmy","gender":"male"},{"name":"griffin","gender":"male"}]})
db.misc.update({"_id":ObjectId("518276054e094734807395b6"),"users.name":"jimmy"}, {$set:{"users.$.name":"george"}})
db.employees.update({products:"Softball"}, {$pull:{products:"Softball" }},false,true)
db.employees.find({products:"Softball"}).count()
0
Does Field Exist
• $exists – db.locations.find({user:{$exists:false}})
• Type “it” for more – iterates over documents - paging
RegEx Queries
• In JS: db.employees.find({ "title" : { "$regex" : "seNior EngIneer" , "$options" : "i"}})
• In Java use java.util.regex.Pattern
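On the Java side, the case-insensitive "i" option corresponds to a flag on java.util.regex.Pattern. The sketch below only builds and exercises the Pattern; how it is passed into a query is not shown, and RegexSketch is a hypothetical helper name.

```java
import java.util.regex.Pattern;

// Hypothetical helper: builds the Java equivalent of {"$regex": expr, "$options": "i"}.
final class RegexSketch {

    static Pattern caseInsensitive(String expression) {
        return Pattern.compile(expression, Pattern.CASE_INSENSITIVE);
    }
}
```

Because the shell expression is not anchored, matcher(...).find() is the closest local equivalent: RegexSketch.caseInsensitive("seNior EngIneer").matcher("Senior Engineer").find() is true.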
Optimizing Queries
• Use $hint or hint() in JS to tell MongoDB to use specific index
• Use hint() in Java API with fluent API
• Use $explain or explain() to see MongoDB query explain plan
  – Number of scanned objects should be close to the number of returned objects
Aggregation Queries
• Aggregation Framework
• Map/Reduce - Demo
• Distinct - Demo
• Group - Demo
  – Similar to SQL Group By function
• Count
Collection Callbacks
• MongoDB Java API provides callback functionality
  – This is implemented in Java via anonymous inner classes
  – Accessible via the Spring Data Template (MongoOperations)
• Can be used in lieu of converters for inline DBObject conversion.
Unwind
• $unwind
• Useful command to convert arrays of objects, within documents, into sub-documents that are then searchable by query.
db.depts.aggregate({"$project":{"employees":"$employees"}},{"$unwind":"$employees"},{"$match":{"employees.lname":"Vural"}});
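What $unwind does to a single document can be modeled locally: one output document per array element, with the array field replaced by that element. This is a sketch over plain maps, not the aggregation framework itself; UnwindSketch is a hypothetical name, and the sketch simply emits nothing when the field is not a list.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Local model of $unwind applied to one document.
final class UnwindSketch {

    static List<Map<String, Object>> unwind(Map<String, Object> doc, String field) {
        List<Map<String, Object>> out = new ArrayList<>();
        Object value = doc.get(field);
        if (!(value instanceof List)) {
            return out; // nothing to unwind in this sketch
        }
        for (Object element : (List<?>) value) {
            Map<String, Object> copy = new LinkedHashMap<>(doc);
            copy.put(field, element); // the array is replaced by one of its elements
            out.add(copy);
        }
        return out;
    }
}
```

After this step each employee sits in its own document, which is why the $match on "employees.lname" in the pipeline above can select individual employees.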
GridFS
• “…specification for storing large files in MongoDB.”
• As the name implies, “Grid” allows the storage of very large files divided across multiple MongoDB documents.
• Uses native BSON binary formats
• 16MB per document
• Large files added to GridFS get chunked and spread across multiple documents.
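The chunking arithmetic is simple ceiling division. The sketch below takes the chunk size as a parameter rather than assuming a particular GridFS default; GridFsChunkSketch is a hypothetical name and does not touch the driver's GridFS classes.

```java
// Hypothetical helper: how many chunk documents a file of a given size would need.
final class GridFsChunkSketch {

    static long chunkCount(long fileSizeBytes, long chunkSizeBytes) {
        if (fileSizeBytes < 0 || chunkSizeBytes <= 0) {
            throw new IllegalArgumentException("sizes must be non-negative and chunk size positive");
        }
        // Ceiling division: a partial final chunk still needs its own document.
        return (fileSizeBytes + chunkSizeBytes - 1) / chunkSizeBytes;
    }
}
```

With an assumed chunk size of 262144 bytes, a file of 262145 bytes needs two chunks.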
Indexes
• Similar to RDBMS Indexes, Btree (support range queries)
• Can have many and can be compound
• Including indexes of array fields in document
• Makes searches, aggregates, and group functions faster
• Makes writes slower
  – Sparse = true
    • Only include documents in this index that actually contain a value in the indexed field.
Text Indexes
• Introduced in 2.4
• Must be enabled in mongod
  – --setParameter textSearchEnabled=true
• In mongo (shell)
  – db["employees"].ensureIndex({"title":"text"})
    • Index “title” field with text index
• At least 2x the storage space
GEO Spatial Ops
• One of MongoDB’s sweet spots
• Used to store, index, search on geo-spatial data for GIS operations.
• Requires special indexes, 2d and 2dsphere (new with 2.4)
• Requires Longitude and Latitude (in that order) coordinates contained in double precision array within documents.
Query Pagination
• Use Spring Data and QueryDSL - http://www.querydsl.com/
• Modify the Spring Data repo to extend QueryDslPredicateExecutor
• Add appropriate Maven POM entries for QueryDSL
• Use Page and PageRequest objects to page through result sets
• QueryDSL will create Q<MODEL> Java classes
• Saves developers from writing pagination code
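The arithmetic that Page and PageRequest take off your hands looks roughly like this. It is a sketch assuming zero-based page numbers; PageMathSketch and its methods are hypothetical and are not Spring Data classes.

```java
// Hypothetical helper showing the skip/limit arithmetic behind paging.
final class PageMathSketch {

    // Number of documents to skip before the requested zero-based page.
    static long skipFor(int pageNumber, int pageSize) {
        if (pageNumber < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("pageNumber must be >= 0 and pageSize > 0");
        }
        return (long) pageNumber * pageSize;
    }

    // Total number of pages needed for totalElements documents.
    static long totalPages(long totalElements, int pageSize) {
        if (totalElements < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("totalElements must be >= 0 and pageSize > 0");
        }
        return (totalElements + pageSize - 1) / pageSize;
    }
}
```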
Save vs. Update
• Java driver save() saves entire document.
• Use “update” to save time and bandwidth, and possibly indexing.
• Spring Data is slightly slower than lower level mongo Java driver
• Spring Data fluent API is very helpful.
MongoDB Security
• Default is trusted mode, no security
  – --auth
  – --keyFile
    • Replica sets require this option
• New with 2.4:
  – Kerberos Support
MongoDB Auth Security
• Use --auth switch to enable
• Create users with roles
• Use db.authenticate in the code (if need be)
MongoDB Write Concerns
• Describes quality of writes (or write assurances)
• Application (MongoDB client) is concerned with this quality
• Write concerns describe the durability of a write, and can be tuned based on application and data needs
• Adjusting write concerns can have an effect (maybe deleterious) on write performance.
Encryption
• MongoDB does not support data encryption, per se
• Or…use TDE (Transparent Data Encryption) from Gazzang
• Use application-level encryption and store encrypted data in BSON fields
  – **If you absolutely need encryption and you cannot get TDE**
New JavaScript Engine – V8
• MongoDB 2.4 uses the Google V8 JavaScript Engine
  – https://code.google.com/p/v8/
  – Open source, written in C++
  – High performance, with improved concurrency for multiple JavaScript operations in MongoDB at the same time.
Some Useful Commands
• use <db> - connects to a DB
• use admin; db.runCommand({top:1})
  – Returns info about collection activity
• db.currentOp() – returns info about operations currently running in MongoDB
• db.serverStatus()
• use admin; db.shutdownServer();
More Useful Commands
• db.hostInfo()
• db.isMaster()
• db.runCommand({"buildInfo":1})
• it
• db.runCommand({touch:"employees",data:true,index:true})
  { "ok" : 1 }
JS Benchmarking Harness
• Benchrun command
• http://www.mongodb.org/about/contributors/js-benchmarking-harness/
• “QA baseline perf measurement tool”
MongoDB in the Cloud
• Some of the top Service Providers
  – MongoHQ – Integrated Partner with CloudBees
  – Amazon (AWS) EC2
  – MongoLab
  – Object Rocket
• REST APIs
• http://www.mongodb.com/partners/cloud
MongoHQ Info
• Integrated Partner with CloudBees
• https://www.mongohq.com/home
• http://www.cloudbees.com/platform-service-mongohq.cb
• Works with CloudBees Weave@cloud services
• Access it with DB URI just like other mongod
• Has REST API
  – Requires API key from account
MongoLab Info
• https://mongolab.com/welcome/
• Access it with DB URI just like other mongod
• Has REST API
  – Requires API key from account