back to basics webinar 4: advanced indexing, text and geospatial indexes

Post on 10-Jan-2017

3.630 Views

Category:

Data & Analytics

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

MongoDB Europe 2016Old Billingsgate, London

15th November

Use my code JD20 for 20% off ticketsmongodb.com/europe

Back to Basics 2016 : Webinar 4

Advanced Indexing – Text and Geospatial Indexes

Joe DrumgooleDirector of Developer Advocacy, EMEA

@jdrumgoole

V1.1

3

Recap

• Webinar 1 – Introduction to NoSQL– The different types of NoSQL databases– What kind of database is MongoDB? A document database.

• Webinar 2 – My First Application– Creating databases and collections– CRUD operations– Indexes and Explain

• Webinar 3 – Schema Design– Dynamic schema– Embedding approaches– Examples

4

Indexing

• An efficient way to look up data by its value• Avoids table scans

5

Traditional Databases Use Btrees

• … and so does MongoDB

6

Queries, Inserts, Deletes O(Log(n) Time

7

Creating a Simple Index

db.coll.createIndex( { fieldName : <Direction> } )

Database Name

Collection Name

Command

Field Name to be indexed

Ascending : 1 Descending : -1

8

Two Other Kinds of Indexes

• Full Text Index– Allows searching inside the text of a field ( Lucene, Solr and Elastic

Search)• Geospatial Index

– Allows searching by location (e.g. people near me)• These indexes do not use Btrees

9

Full Text Indexes

• An “inverted index” on all the words inside a single field (only one text index per collection)

{ “comment” : “I think your blog post is very interesting and informative. I hope you will post more info like this in the future” }

>> db.posts.createIndex( { “comments” : “text” } )

MongoDB Enterprise > db.posts.find( { $text: { $search : "info" }} ){ "_id" : ObjectId(“…"), "comment" : "I think your blog post is very interesting and informative. I hope you will post more info like this in the future" }MongoDB Enterprise >

10

Results

MongoDB Enterprise > db.posts.getIndexes()...

{"v" : 1,"key" : {

"_fts" : "text","_ftsx" : 1

},"name" : "comment_text","ns" : "test.posts","weights" : {

"comment" : 1},"default_language" : "english","language_override" : "language","textIndexVersion" : 3

}

11

Dropping Text Indexes

• We drop text indexes by name rather than shapedb.posts.getIndexes()

{"v" : 1,"key" : {

"_fts" : "text","_ftsx" : 1

},"name" : "comment_text_text","ns" : "test.posts","weights" : {

"comment" : 5,"tags" : 10

},"default_language" : "english","language_override" : "language","textIndexVersion" : 3

}

12

Hence

MongoDB Enterprise > db.posts.dropIndex( "comment_text_tags_text" ){ "nIndexesWas" : 2, "ok" : 1 }MongoDB Enterprise >

• You can give an index an explict name to make this easier

MongoDB Enterprise > db.posts.createIndex( { "comments" : "text", "tags" : "text" }, { "name" : "text_index" } ){

"createdCollectionAutomatically" : false,"numIndexesBefore" : 1,"numIndexesAfter" : 2,"ok" : 1

}

13

On The Server

I INDEX [conn275] build index on: test.posts properties: { v: 1, key: { _fts: "text", _ftsx: 1 }, name: "comment_text", ns: "test.posts", weights: { comment: 1 }, default_language: "english", language_override: "language", textIndexVersion: 3 }}I INDEX [conn275] building index using bulk methodI INDEX [conn275] build index done. scanned 3 total records. 0 secs

14

More Detailed Example

>> db.posts.insert( { "comment" : "Red yellow orange green" } )>> db.posts.insert( { "comment" : "Pink purple blue" } )>> db.posts.insert( { "comment" : "Red Pink" } )

>> db.posts.find( { "$text" : { "$search" : "Red" }} ){ "_id" : ObjectId(“…”), "comment" : "Red yellow orange green" }{ "_id" : ObjectId(  »…"), "comment" : "Red Pink" }>> db.posts.find( { "$text" : { "$search" : "Red Green" }} ){ "_id" : ObjectId(« …"), "comment" : "Red Pink" }{ "_id" : ObjectId(« …"), "comment" : "Red yellow orange green" }>> db.posts.find( { "$text" : { "$search" : "red" }} ) # <- Case Insensitve{ "_id" : ObjectId(“…"), "comment" : "Red yellow orange green" }{ "_id" : ObjectId(«…”), "comment" : "Red Pink" }>>

15

Using Weights

• We can assign different weights to different fields in the text index• E.g. I want to favour tags over comments in searching• So I increase the weight for the the tags field

>> db.blog.createIndex( { comment: "text", tags : "text” }, { weights: { comment: 5, tags : 10 }} )• Now searches will favour tags

16

$textscore

• Weights impact $textscore:

>> db.posts.find( { "$text" : { "$search" : "Red" }}, { score: { $meta: "textScore" }} ).sort( { score: { $meta: "textScore" } } ){ "_id" : …, "comment" : "hello", "tags" : "Red green orange", "score" : 6.666666666666666 }{ "_id" : …, "comment" : "Red Pink", "score" : 3.75 }{ "_id" : …, "comment" : "Red yellow orange green", "score" : 3.125 }>>

17

Other Parameters

• Language : Pick the language you want to search in e.g. – $language : Spanish

• Support case sensitive searching– $caseSensitive : True (default false)

• Support accented characters (diacritic sensitive search e.g. café is distinguished from cafe )– $diacriticSensitive : True (default false)

Geospatial Indexes

19

Geospatial Indexes

• MongoDB supports 2D Sphere indexes• Allows a user to represent location on the earth (which is a sphere)• Coordinates are stored in GeoJSON format• The Geospatial index supports subset of the GeoJSON operations• The index is based on a QuadTree representation• Index is based on WGS 84 standard

20

Coordinates

• Coordinates are represented as longitude, latitude• longitude

– Measured from Greenwich meridian in London (0 degrees) locations east (up to 180 degrees)

– For locations west we specify as negative • Latitude

– Measured from equator north and south (0 to 90 north, 0 to -90 south)• Coordinates in MongoDB are stored on Longitude/Latitude order• Coordinates in Google are stored in Latitude/Longitude order

21

2DSphere Versions

• Three versions of 2dSphere index in MongoDB• Version 1 : Up to MongoDB 2.4• Version 2 : From MongoDB 2.6 onwards• Version 3 : From MongoDB 3.2 onwards• We will only be talking about Version 3 in this webinar

22

Creating a 2dSphere Index

db.collection.createIndex ( { <location field> : "2dsphere" } )

• Location field must be coordinate or GeoJSON data

23

Example

>> db.test.createIndex( { loc : "2dsphere" } ){

"createdCollectionAutomatically" : false,"numIndexesBefore" : 1,"numIndexesAfter" : 2,"ok" : 1

}

24

Output

>> db.test.getIndexes()[

{"v" : 1,"key" : {

"loc" : "2dsphere"},"name" : "loc_2dsphere","ns" : "geo.test","2dsphereIndexVersion" : 3

}]>>

25

Use a Simple Dataset to investigate Geo Queries

• Lets search for restaurants in Manhattan• Using two candidate collections

–  https://raw.githubusercontent.com/mongodb/docs-assets/geospatial/neighborhoods.json– https://raw.githubusercontent.com/mongodb/docs-assets/geospatial/restaurants.json

• Import them into MongoDB– mongoimport –c neighborhoods –d geo neighborhoods.json– mongoimport –c restaurants –d geo restaurants.json

26

Neighborhood Document

MongoDB Enterprise > db.neighborhoods.findOne(){

"_id" : ObjectId("55cb9c666c522cafdb053a1a"),"geometry" : {"coordinates" : [[[-73.94193078816193,40.70072523469547],

...[-73.94409591260093,40.69897295461309],

]

"type" : "Polygon"},"name" : "Bedford"

}

27

Restaurant Document

MongoDB Enterprise > db.restaurants.findOne(){

"_id" : ObjectId("55cba2476c522cafdb053adf"),"location" : {

"coordinates" : [-73.98241999999999,40.579505

],"type" : "Point"

},"name" : "Riviera Caterer"

}MongoDB Enterprise >

You can type this into google maps but

remember to reverse the coordinate order

28

Add Indexes

MongoDB Enterprise > db.restaurants.createIndex({ location: "2dsphere" }){

"createdCollectionAutomatically" : false,"numIndexesBefore" : 1,"numIndexesAfter" : 2,"ok" : 1

}MongoDB Enterprise > db.neighborhoods.createIndex({ geometry: "2dsphere" }){

"createdCollectionAutomatically" : false,"numIndexesBefore" : 1,"numIndexesAfter" : 2,"ok" : 1

}MongoDB Enterprise >

29

Use $geoIntersects to find our Neighborhood

• Assume we are at -73.93414657, 40.82302903• What neighborhood are we in? Use $geoIntersects

db.neighborhoods.findOne({ geometry: { $geoIntersects: { $geometry: { type: "Point", coordinates: [ -73.93414657, 40.82302903 ]}}}})

30

Results

{"geometry" : {

”coordinates" : [[

-73.9338307684026,40.81959665747723

], ...

[-73.93383000695911,40.81949109558767

] ]

"type" : "Polygon"},"name" : "Central Harlem North-Polo Grounds"

}

31

Find All Restaurants within 0.35 km

db.restaurants.find({ location: { $geoWithin: { $centerSphere: [ [ -73.93414657, 40.82302903 ], 5 / 6,378.1 ] } } })

Distance in km Divide by radius of earth to convert to radians

32

Results – (Projected)

{ "name" : "Gotham Stadium Tennis Center Cafe" }{ "name" : "Chuck E. Cheese'S" }{ "name" : "Red Star Chinese Restaurant" }{ "name" : "Tia Melli'S Latin Kitchen" }{ "name" : "Domino'S Pizza" }

• Without projection

{ "_id" : ObjectId("55cba2476c522cafdb0550aa"), "location" : { "coordinates" : [ -73.93795159999999, 40.823376 ], "type" : "Point" }, "name" : "Domino'S Pizza" }

33

Summary of Operators

• $geoIntersect: Find areas or points that overlap or are adjacent

• $geoWithin: Find areas on points that lie within a specific area• $geoNear: Returns locations in order from nearest to furthest

away

34

Summary

• Text Indexes : Full text searching of all the text items in a collection

• Geospatial Indexes : Search by location, by intersection or by distance from a point

35

Q & A

37

• This is slide content

41

42

LOREM IPSUM

LOREM IPSUM

LOREM IPSUM

LOREM IPSUM

Sollicitudin VenenatisLOREM IPSUM

LOREM IPSUM

LOREM IPSUM

LOREM IPSUM

Graphic Element Examples

Porta Ultricies

Commodo Porta

Graph Examples

Category 1 Category 2 Category 3 Category 40

0.5

1

1.5

2

2.5

3

3.5

4

4.5

5

Series 1Series 2

Category 1 Category 2 Category 3 Category 40

0.5

1

1.5

2

2.5

3

3.5

4

4.5

5

Series 1Series 2

{ _id : ObjectId("4c4ba5e5e8aabf3"), employee_name: "Dunham, Justin", department : "Marketing", title : "Product Manager, Web", report_up: "Neray, Graham", pay_band: “C", benefits : [ { type :  "Health", plan : "PPO Plus" }, { type :   "Dental", plan : "Standard" }

] }

Code/Highlight Example

Aggregation Framework Agility Backup Big Data Briefcase

Buildings Business Intelligence Camera Cash Register Catalog

Chat Checkmark Checkmark Cloud Commercial Contract

Computer Content Continuous Development Credit Card Customer Success

Data Center Data Variety Data Velocity Data Volume Data Warehouse Database

Dialogue Directory Documents Downloads Drivers Dynamic Schema

EDW Integration Faster Time to Market File Transfer Flexible Gear Hadoop

Health Check High Availability Horizontal Scaling Integrating into Infrastructure Internet of Things Iterative Development

Life Preserver Line Graph Lock Log Data Lower Cost Magnifying Glass

Man Mobile Phone Meter Monitoring Music New Apps

New Data Types Online Open Source Parachute Personalization Pin

Platform Certification Product Catalog Puzzle Pieces RDBMS Realtime Analytics Rich Querying

Life Preserver RSS Scalability Scale Secondary Indexing Steering Wheel

Stopwatch Text Search Tick Data Training Transmission Tower Trophy

Woman World

top related