2016-01 lucene solr spatial in 2015, nyc meetup

Lucene/Solr Spatial in 2015 David Smiley Search Engineer/Consultant (Freelance)

Upload: david-smiley

Post on 16-Apr-2017


TRANSCRIPT

Page 1: 2016-01 Lucene Solr spatial in 2015, NYC Meetup

Lucene/Solr Spatial in 2015

David Smiley

Search Engineer/Consultant (Freelance)

Page 2: 2016-01 Lucene Solr spatial in 2015, NYC Meetup

About David Smiley

Freelance Search Developer/Consultant
• Expert Lucene/Solr development skills, advice (consulting), training
• Java, spatial, and full-stack experience

Apache Lucene/Solr committer & PMC member

Primary author of “Apache Solr Enterprise Search Server”

Page 3: 2016-01 Lucene Solr spatial in 2015, NYC Meetup

More Spatial Contributors!

Spatial4j Lucene Solr

David Smiley ✔️ ✔️ ✔️

Ryan McKinley ✔️

Justin Deoliveira ✔️

Mike McCandless ✔️

Nick Knize ✔️

Karl Wright ✔️

Ishan Chattopadhyaya ✔️

Page 4: 2016-01 Lucene Solr spatial in 2015, NYC Meetup

Agenda

• New Features / Capabilities
• New Approaches
• Improvements
• Pending

Page 5: 2016-01 Lucene Solr spatial in 2015, NYC Meetup

Lucene’s Spatial Module

• Multiple approaches to index spatial data: abstract class SpatialStrategy (5+ concrete implementations)
• RecursivePrefixTreeStrategy (RPT) is the most prominent and versatile
• Grid based
• Uses Spatial4j lib for shapes, distance calculations, and WKT
• Uses JTS Topology Suite lib for polygons

[Diagram labels: Shape; SpatialPrefixTree / Cell (Geohash | Quad); PrefixTreeStrategy; IntersectsPrefixTreeFilter, Contains…, Within…]

Page 6: 2016-01 Lucene Solr spatial in 2015, NYC Meetup

Topic: New Features

• Heatmaps / grid faceting (Lucene, Solr)
• Surface-of-sphere shapes, Geo3d (Lucene)
• Accurate indexed geometries (Lucene, Solr)
• GeoJSON read/write (Spatial4j)

Page 7: 2016-01 Lucene Solr spatial in 2015, NYC Meetup

Heatmaps: Spatial Grid Faceting

• Spatial density summary grid faceting, also useful for point-plotting search results
• Usually rendered with a gradient radius
• Lucene & Solr APIs
• Scalable & fast usually…

v5.2

Page 8: 2016-01 Lucene Solr spatial in 2015, NYC Meetup

Heatmaps Under the Hood

• Requires a PrefixTreeStrategy Lucene field (grid based)
• Algorithm enumerates the underlying cells/terms and accumulates the counts in a corresponding grid
• Conceptually facet.method=enum for spatial
• Works on non-point indexed shapes too
• Complexity: O(cells * cellDepthFactor), not O(docs)
• No/low memory; mainly the grid of integers
• Solr will distribute to shards and merge
• Could be faster still; a BFS (vs DFS) layout would be perfect
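The accumulation step described on this slide can be sketched in plain Java. This is only an illustration, not Lucene's code: the class name, the quad-cell token scheme (A = north-west, B = north-east, C = south-west, D = south-east) and the input map of per-cell document counts are assumptions made for the example. The point it tries to show is that the loop runs over cells, not documents, and that a coarse cell (for instance from a non-point shape) simply adds its count to every grid square it covers.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

class HeatmapGridSketch {

    /**
     * Accumulates per-cell document counts into a square grid with
     * 2^gridLevel columns and rows. counts[row][col]; row 0 is the top row.
     */
    static int[][] accumulate(Map<String, Integer> cellCounts, int gridLevel) {
        int size = 1 << gridLevel;
        int[][] counts = new int[size][size];
        for (Map.Entry<String, Integer> entry : cellCounts.entrySet()) {
            String token = entry.getKey();
            if (token.length() > gridLevel) {
                continue; // finer than the grid: assumed to be counted via its ancestor cell
            }
            int col = 0;
            int row = 0;
            for (int i = 0; i < token.length(); i++) {
                int quadrant = token.charAt(i) - 'A'; // hypothetical: A=0 NW, B=1 NE, C=2 SW, D=3 SE
                col = (col << 1) | (quadrant & 1);
                row = (row << 1) | (quadrant >> 1);
            }
            // A cell shallower than the grid level covers a span x span block of grid squares.
            int span = 1 << (gridLevel - token.length());
            for (int r = row * span; r < (row + 1) * span; r++) {
                for (int c = col * span; c < (col + 1) * span; c++) {
                    counts[r][c] += entry.getValue();
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> cellCounts = new LinkedHashMap<>();
        cellCounts.put("AA", 3); // three documents in the top-left grid square
        cellCounts.put("D", 2);  // two documents indexed only at the coarser level
        for (int[] row : accumulate(cellCounts, 2)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Memory use is dominated by the integer grid itself, which matches the "No/low memory" remark above.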

Page 9: 2016-01 Lucene Solr spatial in 2015, NYC Meetup

Solr Heatmap Faceting

On an RPT field (SpatialRecursivePrefixTreeFieldType)

• prefixTree="packedQuad" (optional)
• Query: /select?facet=true&facet.heatmap=geo_rpt&facet.heatmap.geom=["-180 -90" TO "180 90"]
• facet.heatmap.format=ints2D or png

// Normal Solr response...
"facet_counts":{
  ... // facet response fields
  "facet_heatmaps":{
    "loc_srpt":[
      "gridLevel",2,
      "columns",32,
      "rows",32,
      "minX",-180.0,
      "maxX",180.0,
      "minY",-90.0,
      "maxY",90.0,
      "counts_ints2D", [null, null, [0, 0, ... ]]
...
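A client has to turn the counts_ints2D grid back into map coordinates. The following is a minimal sketch of that step, under two assumptions that should be checked against the Solr reference guide: a null row stands for a row of zeros, and rows are ordered top-down (row 0 is the northernmost). The class and method names are made up for the example, and the response is assumed to be already parsed into nested lists.

```java
import java.util.Arrays;
import java.util.List;

class HeatmapResponseSketch {

    /** Sums all cell counts; a null row is treated as a row of zeros. */
    static long total(List<List<Integer>> counts) {
        long sum = 0;
        for (List<Integer> row : counts) {
            if (row == null) {
                continue;
            }
            for (int count : row) {
                sum += count;
            }
        }
        return sum;
    }

    /** Returns {west, south, east, north} of one cell, assuming row 0 is the top row. */
    static double[] cellBounds(int row, int col, int rows, int columns,
                               double minX, double maxX, double minY, double maxY) {
        double cellWidth = (maxX - minX) / columns;
        double cellHeight = (maxY - minY) / rows;
        double west = minX + col * cellWidth;
        double north = maxY - row * cellHeight;
        return new double[] {west, north - cellHeight, west + cellWidth, north};
    }

    public static void main(String[] args) {
        List<List<Integer>> counts =
            Arrays.asList((List<Integer>) null, Arrays.asList(0, 2, 3));
        System.out.println("total = " + total(counts));
        System.out.println(Arrays.toString(cellBounds(0, 0, 32, 32, -180, 180, -90, 90)));
    }
}
```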

Page 10: 2016-01 Lucene Solr spatial in 2015, NYC Meetup

Solr Heatmap Resources

• Solr Ref guide: https://cwiki.apache.org/confluence/display/solr/Spatial+Search
• Jack Reed’s Tutorial: http://www.jack-reed.com/2015/06/29/visualizing-10-million-geonames-with-leaflet-solr-heatmap-facets.html
• Live Demo: http://worldwidegeoweb.com
• Open-source JavaScript Solr Heatmap Libraries:
  https://github.com/spacemansteve/SolrHeatmapLayer
  https://github.com/mejackreed/leaflet-solr-heatmap
  https://github.com/voyagersearch/leaflet-solr-heatmap

Page 11: 2016-01 Lucene Solr spatial in 2015, NYC Meetup

Geo3D: Shapes on the Surface of a Sphere

… or Ellipsoid of configurable axis

• Not a general 3D space geometry lib
• Internally uses geocentric X, Y, Z coordinates (hence 3D) with 3D planar geometry mathematics
• Shapes: Point, Lat-Lon Rect, Circle, Polygons, Path (LineString) with optional buffer
• Distance computations: Arc (angular or surface), Linear (straight-line), Normal
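To make the geocentric representation concrete, here is a small sketch on a unit sphere. It does not use the Geo3D API; the class and method names are invented, the city coordinates are rough values, and the mean Earth radius used in main is an assumption. It converts latitude/longitude to X, Y, Z and computes the arc and linear (chord) distances named above.

```java
class GeocentricSketch {

    /** Converts degrees of latitude/longitude to a unit vector {x, y, z}. */
    static double[] toUnitXYZ(double latDeg, double lonDeg) {
        double lat = Math.toRadians(latDeg);
        double lon = Math.toRadians(lonDeg);
        double cosLat = Math.cos(lat);
        return new double[] {cosLat * Math.cos(lon), cosLat * Math.sin(lon), Math.sin(lat)};
    }

    /** Arc (angular) distance in radians between two unit vectors. */
    static double arcDistance(double[] a, double[] b) {
        double dot = a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
        dot = Math.max(-1.0, Math.min(1.0, dot)); // guard against rounding outside [-1, 1]
        return Math.acos(dot);
    }

    /** Linear (straight-line, through the sphere) distance between two unit vectors. */
    static double linearDistance(double[] a, double[] b) {
        double dx = a[0] - b[0];
        double dy = a[1] - b[1];
        double dz = a[2] - b[2];
        return Math.sqrt(dx * dx + dy * dy + dz * dz);
    }

    public static void main(String[] args) {
        double[] anchorage = toUnitXYZ(61.22, -149.90); // approximate coordinates
        double[] miami = toUnitXYZ(25.76, -80.19);      // approximate coordinates
        double meanEarthRadiusKm = 6371.0;              // assumed spherical Earth
        System.out.println("surface distance (km): "
            + arcDistance(anchorage, miami) * meanEarthRadiusKm);
    }
}
```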

Page 12: 2016-01 Lucene Solr spatial in 2015, NYC Meetup

All 2D Maps of the Earth Distort Straight Lines

A straight bird-flies path from Anchorage to Miami doesn’t actually cross the ocean!

Page 13: 2016-01 Lucene Solr spatial in 2015, NYC Meetup

Geo3D, continued…

Benefits
• Inherently more accurate than 2D projected spatial, especially for big shapes or near poles
• Many computations are fast; no expensive trigonometry
• An alternative to JTS without the LGPL license (still)

Has own Lucene module (spatial3d), thus jar file
• Maven groupId: org.apache.lucene, artifact: lucene-spatial3d

No Solr integration yet; pending more Spatial4j integration
• In progress!
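One way to see why planar math avoids per-point trigonometry: on a unit sphere, a circle is the part of the sphere on one side of a plane, so a containment test is a single dot product compared against a precomputed constant. The sketch below illustrates that idea only; it is not how the spatial3d classes are written, and all names in it are hypothetical.

```java
class CapMembershipSketch {

    private final double cx;
    private final double cy;
    private final double cz;
    private final double cosRadius;

    /** Trigonometry happens once, when the circle is built. */
    CapMembershipSketch(double centerLatDeg, double centerLonDeg, double radiusRadians) {
        double lat = Math.toRadians(centerLatDeg);
        double lon = Math.toRadians(centerLonDeg);
        this.cx = Math.cos(lat) * Math.cos(lon);
        this.cy = Math.cos(lat) * Math.sin(lon);
        this.cz = Math.sin(lat);
        this.cosRadius = Math.cos(radiusRadians);
    }

    /** Tests a unit vector (x, y, z): three multiplications, two additions, one comparison. */
    boolean contains(double x, double y, double z) {
        return cx * x + cy * y + cz * z >= cosRadius;
    }

    public static void main(String[] args) {
        CapMembershipSketch circle = new CapMembershipSketch(0.0, 0.0, 0.1);
        System.out.println(circle.contains(1.0, 0.0, 0.0)); // the center itself
        System.out.println(circle.contains(0.0, 1.0, 0.0)); // a quarter turn away
    }
}
```

This pays off when points are stored as X, Y, Z already, since the per-document work then needs no sin, cos, or acos calls.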

Page 14: 2016-01 Lucene Solr spatial in 2015, NYC Meetup

Index & Search Geo3D Geometries

Spatial4j Geo3dShape wrapper with RPT
• In Lucene-spatial for now
• Index Geo3d shapes (limited to grid accuracy)
• Query by Geo3d shape
• Limited distance sort
• Heatmaps

Geo3DPointField & PointInGeo3DShapeQuery
• Based on a 3D BKD index
• In spatial3d module
• Index points-only
• Query by Geo3d shape
• No distance sort
• Leaner & faster than RPT?

v5.4

v5.2

Page 15: 2016-01 Lucene Solr spatial in 2015, NYC Meetup

RPT/SpatialPrefixTrees and Accuracy

RecursivePrefixTree (RPT) uses Lucene’s index as a PrefixTree

Thus represents shapes as grid cells of varying precision by prefix

Example, a point shape: D, DR, DRT, DRT2, DRT2Y
• More accuracy scales

Example, a polygon shape: too many to list… 508 cells
• More accuracy does NOT scale
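The point example above is a chain of geohash prefixes: each extra character is one more level of the grid, so a point costs one term per level. The sketch below uses the standard geohash bit-interleaving scheme, written out by hand rather than taken from Lucene or Spatial4j; the class and method names are invented, and it emits lower-case tokens.

```java
import java.util.ArrayList;
import java.util.List;

class GeohashPrefixSketch {

    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    /** Standard geohash encoding: bits alternate longitude, latitude; 5 bits per character. */
    static String encode(double lat, double lon, int precision) {
        double minLat = -90, maxLat = 90, minLon = -180, maxLon = 180;
        StringBuilder hash = new StringBuilder();
        boolean lonBit = true;
        int bits = 0;
        int value = 0;
        while (hash.length() < precision) {
            if (lonBit) {
                double mid = (minLon + maxLon) / 2;
                if (lon >= mid) { value = (value << 1) | 1; minLon = mid; }
                else { value = value << 1; maxLon = mid; }
            } else {
                double mid = (minLat + maxLat) / 2;
                if (lat >= mid) { value = (value << 1) | 1; minLat = mid; }
                else { value = value << 1; maxLat = mid; }
            }
            lonBit = !lonBit;
            if (++bits == 5) {
                hash.append(BASE32.charAt(value));
                bits = 0;
                value = 0;
            }
        }
        return hash.toString();
    }

    /** The cells a point occupies, coarsest first: one token per grid level. */
    static List<String> prefixes(double lat, double lon, int levels) {
        String full = encode(lat, lon, levels);
        List<String> cells = new ArrayList<>();
        for (int i = 1; i <= levels; i++) {
            cells.add(full.substring(0, i));
        }
        return cells;
    }

    public static void main(String[] args) {
        System.out.println(prefixes(57.64911, 10.40744, 5));
    }
}
```

A point adds one term per level, which is why more accuracy scales for points. A polygon's edge crosses more and more cells at every finer level, which is the growth behind the 508-cell example.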

Page 16: 2016-01 Lucene Solr spatial in 2015, NYC Meetup

Combining RPT with Serialized Geometry

• RPT (RecursivePrefixTreeStrategy) is the grid index (inaccurate)
• SDV (SerializedDVStrategy) stores serialized geometry (accurate)
• RPT + SDV → CompositeSpatialStrategy
• Accuracy & speed & smaller indexes
• Optimized intersects predicate avoids some geometry checks
• > 80% faster intersects queries, 75% smaller index
• Solr adapter: RptWithGeometrySpatialField
• Compatible with the Heatmaps feature
• Includes a shape cache (per-segment); configurable
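The idea of pairing a coarse index with exact geometry can be illustrated without Lucene. In this sketch (hypothetical names throughout) each indexed shape is a circle, its bounding box stands in for the grid approximation, and the query is a rectangle. The exact test runs only when the approximation cannot decide, which is the kind of saving the "avoids some geometry checks" bullet refers to; the real CompositeSpatialStrategy works on grid cells and serialized shapes, not bounding boxes.

```java
import java.util.ArrayList;
import java.util.List;

class TwoPhaseIntersectsSketch {

    /** An indexed circle; its bounding box plays the role of the coarse approximation. */
    static final class Circle {
        final String id;
        final double x, y, r;
        Circle(String id, double x, double y, double r) {
            this.id = id; this.x = x; this.y = y; this.r = r;
        }
    }

    static int exactChecks = 0; // how often the precise geometry had to be consulted

    /** Returns ids of circles intersecting the rectangle [minX, maxX] x [minY, maxY]. */
    static List<String> search(List<Circle> docs,
                               double minX, double minY, double maxX, double maxY) {
        List<String> hits = new ArrayList<>();
        for (Circle c : docs) {
            double bMinX = c.x - c.r, bMaxX = c.x + c.r;
            double bMinY = c.y - c.r, bMaxY = c.y + c.r;
            if (bMaxX < minX || bMinX > maxX || bMaxY < minY || bMinY > maxY) {
                continue; // phase 1: approximation is disjoint, reject cheaply
            }
            boolean boxInside = bMinX >= minX && bMaxX <= maxX && bMinY >= minY && bMaxY <= maxY;
            // phase 1 accept when fully inside; otherwise phase 2 checks the real shape
            if (boxInside || intersectsExactly(c, minX, minY, maxX, maxY)) {
                hits.add(c.id);
            }
        }
        return hits;
    }

    static boolean intersectsExactly(Circle c,
                                     double minX, double minY, double maxX, double maxY) {
        exactChecks++;
        double nearestX = Math.max(minX, Math.min(c.x, maxX));
        double nearestY = Math.max(minY, Math.min(c.y, maxY));
        double dx = c.x - nearestX;
        double dy = c.y - nearestY;
        return dx * dx + dy * dy <= c.r * c.r;
    }

    public static void main(String[] args) {
        List<Circle> docs = new ArrayList<>();
        docs.add(new Circle("A", 5, 5, 1));
        docs.add(new Circle("C", 11, 11, 1.2));
        System.out.println(search(docs, 0, 0, 10, 10) + ", exact checks: " + exactChecks);
    }
}
```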

v5.2

Page 17: 2016-01 Lucene Solr spatial in 2015, NYC Meetup

Topic: New Approaches

Lucene
• DimensionalValues (BKD Tree Indexes)
• GeoPointField

Page 18: 2016-01 Lucene Solr spatial in 2015, NYC Meetup

DimensionalValues (BKD Index)

v6.0

• New Lucene index type for numeric values
• Including multi-dimensional values!
• Old: IntField, FloatField etc.; trie indexing is now legacy
• New: DimensionalIntField, DimensionalFloatField, etc. with DimensionalRangeQuery, …
• Implemented using a BKD Index
• Paper: https://www.cs.duke.edu/~pankaj/publications/papers/bkd-sstd.pdf
• Much faster and more compact than trie/prefix-tree based indexes
• Whither term auto-prefixing? LUCENE-5879 Defunct?
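For readers new to k-d style indexes, here is a toy in-memory sketch of the core idea: points are split recursively into small leaf blocks, every node keeps the bounding box of what is below it, and a range query skips whole subtrees that are disjoint from the query and counts whole subtrees that lie inside it. It is not the BKD structure from the paper or from Lucene (no on-disk blocks, no bulk loading, and it alternates split dimensions for simplicity); all names are hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;

class BlockKdTreeSketch {

    static final int LEAF_SIZE = 4; // max points per leaf block (arbitrary for this sketch)

    static final class Node {
        final double[] min = {Double.POSITIVE_INFINITY, Double.POSITIVE_INFINITY};
        final double[] max = {Double.NEGATIVE_INFINITY, Double.NEGATIVE_INFINITY};
        int size;         // number of points below this node
        double[][] block; // the points themselves (leaves only)
        Node left, right;
    }

    /** Builds a tree over pts[from, to); reorders that slice of the array in place. */
    static Node build(double[][] pts, int from, int to, int depth) {
        Node node = new Node();
        node.size = to - from;
        for (int i = from; i < to; i++) {
            for (int d = 0; d < 2; d++) {
                node.min[d] = Math.min(node.min[d], pts[i][d]);
                node.max[d] = Math.max(node.max[d], pts[i][d]);
            }
        }
        if (node.size <= LEAF_SIZE) {
            node.block = Arrays.copyOfRange(pts, from, to);
            return node;
        }
        final int dim = depth % 2; // alternate x and y
        Arrays.sort(pts, from, to, Comparator.comparingDouble((double[] p) -> p[dim]));
        int mid = (from + to) >>> 1;
        node.left = build(pts, from, mid, depth + 1);
        node.right = build(pts, mid, to, depth + 1);
        return node;
    }

    /** Counts points inside the inclusive box [qMin, qMax]. */
    static int count(Node node, double[] qMin, double[] qMax) {
        boolean inside = true;
        for (int d = 0; d < 2; d++) {
            if (node.max[d] < qMin[d] || node.min[d] > qMax[d]) {
                return 0; // node is disjoint from the query: skip it entirely
            }
            if (node.min[d] < qMin[d] || node.max[d] > qMax[d]) {
                inside = false;
            }
        }
        if (inside) {
            return node.size; // node is fully inside: no per-point checks needed
        }
        if (node.block != null) {
            int hits = 0;
            for (double[] p : node.block) {
                if (p[0] >= qMin[0] && p[0] <= qMax[0] && p[1] >= qMin[1] && p[1] <= qMax[1]) {
                    hits++;
                }
            }
            return hits;
        }
        return count(node.left, qMin, qMax) + count(node.right, qMin, qMax);
    }

    public static void main(String[] args) {
        double[][] pts = new double[100][];
        for (int i = 0; i < 100; i++) {
            pts[i] = new double[] {i % 10, i / 10};
        }
        Node root = build(pts, 0, pts.length, 0);
        System.out.println(count(root, new double[] {2, 3}, new double[] {5, 4}));
    }
}
```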

Page 19: 2016-01 Lucene Solr spatial in 2015, NYC Meetup

Multiple Fields/Queries using this:
• (1D) DimensionalIntField
• (2D) DimensionalLatLonField
• (3D) Geo3DPointField (previously described)
• And you can write your own

…continued

Page 20: 2016-01 Lucene Solr spatial in 2015, NYC Meetup

DimensionalValues 1D

v5.3

• Efficient range search on single/multi-valued numbers or terms
• Could be used for numbers, dates, IPv6 bytes, …
• Alternatives: LegacyIntField etc. (trie), DateRangeField (RPT)
• Would love to see a benchmark!

How-To:
• Dimensional___Field: Int, Long, Float, Double, Binary
• DimensionalRangeQuery (or DimensionalQuery?)

Page 21: 2016-01 Lucene Solr spatial in 2015, NYC Meetup

DimensionalValues 2D: DimensionalLatLonField

v5.3

• Efficient 2D geospatial point index
• Alternative to RPT or GeoPointField
• In lucene-sandbox
• No Lucene-spatial module SpatialStrategy wrappers yet, thus no Spatial4j Shape integration nor Solr integration yet

How-To:
• Index: DimensionalLatLonField
• Query:
  DimensionalPointInBBoxQuery
  DimensionalPointInPolygonQuery
  point-radius (circle), in progress: LUCENE-6698

Cool video: https://www.youtube.com/watch?v=x9WnzOvsGKs

Page 22: 2016-01 Lucene Solr spatial in 2015, NYC Meetup

GeoPointField

• 2D geospatial point field
• Indexed point-only data, single/multi-valued
• Spatial 2D Trie/PrefixTree terms index
• But not affiliated with Lucene-spatial SpatialPrefixTree/RPT
• Configurable 2^x grid size (defaults to 512)
• Compact bit-interleaved Z-order encoding
• Re-uses much of Lucene’s numeric precisionStep & MultiTermQuery logic
• 2-phase grid/postings then doc-values algorithm
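Bit-interleaved Z-order encoding can be sketched as follows. This is an illustration of the general technique, not GeoPointField's actual encoding: the 32 bits per dimension, the scaling, and every name here are assumptions. Longitude and latitude are quantized to integers and their bits are interleaved into one long, so that points close together on the map tend to share a long common prefix, which is what lets a trie-style terms index work on a single value.

```java
class ZOrderSketch {

    private static final double SCALE = 0xFFFFFFFFL; // 2^32 - 1 steps per dimension (assumed)

    /** Spreads the low 32 bits of v so that they occupy the even bit positions. */
    static long spread(long v) {
        v &= 0xFFFFFFFFL;
        v = (v | (v << 16)) & 0x0000FFFF0000FFFFL;
        v = (v | (v << 8)) & 0x00FF00FF00FF00FFL;
        v = (v | (v << 4)) & 0x0F0F0F0F0F0F0F0FL;
        v = (v | (v << 2)) & 0x3333333333333333L;
        v = (v | (v << 1)) & 0x5555555555555555L;
        return v;
    }

    /** Inverse of spread: gathers the even bit positions back into the low 32 bits. */
    static long compact(long v) {
        v &= 0x5555555555555555L;
        v = (v | (v >>> 1)) & 0x3333333333333333L;
        v = (v | (v >>> 2)) & 0x0F0F0F0F0F0F0F0FL;
        v = (v | (v >>> 4)) & 0x00FF00FF00FF00FFL;
        v = (v | (v >>> 8)) & 0x0000FFFF0000FFFFL;
        v = (v | (v >>> 16)) & 0x00000000FFFFFFFFL;
        return v;
    }

    /** x goes to the even bits, y to the odd bits. */
    static long interleave(long x, long y) {
        return spread(x) | (spread(y) << 1);
    }

    static long encode(double lon, double lat) {
        long x = (long) ((lon + 180.0) / 360.0 * SCALE);
        long y = (long) ((lat + 90.0) / 180.0 * SCALE);
        return interleave(x, y);
    }

    static double decodeLon(long hash) {
        return compact(hash) / SCALE * 360.0 - 180.0;
    }

    static double decodeLat(long hash) {
        return compact(hash >>> 1) / SCALE * 180.0 - 90.0;
    }

    public static void main(String[] args) {
        long hash = encode(-71.06, 42.36);
        System.out.println(Long.toBinaryString(hash));
        System.out.println(decodeLon(hash) + ", " + decodeLat(hash));
    }
}
```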

v5.3

Page 23: 2016-01 Lucene Solr spatial in 2015, NYC Meetup

…continued

• Has no affiliation with Spatial4j, RPT, JTS, or SpatialStrategy
• No Heatmaps, no custom Shape implementations
• No Solr support yet
• No dependencies
• Easy to use compared to RPT; simpler internally too

How-To:
• doc.add(new GeoPointField(name, lon, lat, Store.YES))
• GeoPointDistanceQuery (sphere only) or GeoPointInBBoxQuery or GeoPointInPolygonQuery or GeoPointDistanceRangeQuery

Cool video: https://www.youtube.com/watch?v=l2zB9TDUAL4

Page 24: 2016-01 Lucene Solr spatial in 2015, NYC Meetup

Topic: Some Pending Spatial TODOs

Spatial4j
• JTS-free polygon API (in-progress)
• Geo3D adapter

Lucene
• FlexPrefixTree: LUCENE-4922
• Heatmap optimized FlexPrefixTree (Breadth First Search layout)
• SpatialStrategy adapters for GeoPointField, DimensionalLatLonField, Geo3DPointField

Solr
• Better spatial Solr QParsers: SOLR-4242
• GeoJSON parsing
• More FieldType adapters for latest Lucene spatial
• Nearest-neighbor search
• DateRangeField faceting

Page 25: 2016-01 Lucene Solr spatial in 2015, NYC Meetup

That’s all for now; thanks for coming!

Need Lucene/Solr guidance or custom development?

Contact me!
Email: [email protected]
LinkedIn: http://www.linkedin.com/in/davidwsmiley
G+: +DavidSmiley
Twitter: @DavidWSmiley