SQLBits X: SQL Server 2012 Spatial Indexing
SQLBits X Training Day Presentation on SQL Server 2012 Spatial Indexing
Copyright (c) Microsoft Corp.
Taking SQL Server Beyond Relational: Deep Dive into Spatial Performance and Spatial Indexing
Michael Rys
Principal Program Manager
@SQLServerMike
Q: Why is my Query so Slow?
A: Usually because the index isn’t being used.
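Q: How do I tell?
A: Look at the query plan for a query such as
SELECT * FROM T WHERE g.STIntersects(@x) = 1
[Figure: two query plans for this query, labeled "NO INDEX" and "INDEX!"]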
Hinting the Index
Spatial indexes can be forced if needed.
SELECT * FROM T WITH(INDEX(T_g_idx)) WHERE g.STIntersects(@x) = 1
Use SQL Server 2008 SP1 or later!
But Why Isn't My Index Used?
Plan choice is cost-based
The query optimizer (QO) uses various information, including cardinality
When can we estimate cardinality?
Variables: never
  DECLARE @x geometry = 'POINT (0 0)'
  SELECT * FROM T WHERE T.g.STIntersects(@x) = 1
Literals: not for spatial since they are not literals under the covers
  SELECT * FROM T WHERE T.g.STIntersects('POINT (0 0)') = 1
Parameters: yes, but cached, so first call matters
  EXEC sp_executesql N'SELECT * FROM T WHERE T.g.STIntersects(@x) = 1', N'@x geometry', N'POINT (0 0)'
Spatial Indexing Basics
In general, split predicates in two
  Primary filter finds all candidates, possibly with false positives (but never false negatives)
  Secondary filter removes false positives
The index provides our primary filter
Original predicate is our secondary filter
Some tweaks to this scheme
  Sometimes possible to skip secondary filter
[Figure: objects A to E passing through the Primary Filter (index lookup) and then the Secondary Filter (original predicate)]
Using B+-Trees for Spatial Index
SQL Server has B+-Trees
Spatial indexing is usually done through other structures
  Quad tree, R-Tree
Challenge: How do we repurpose the B+-Tree to handle spatial queries?
  Add a level of indirection!
Mapping to the B+-Tree
B+-Trees handle linearly ordered sets well
We need to somehow linearly order 2D space
  Either the plane or the globe
We want a locality-preserving mapping from the original space to the line
  i.e., close objects should be close in the index
  Can't be done, but we can approximate it
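One common way to approximate such a locality-preserving mapping is a space-filling curve. The deck later names "Hilbert numbering" for the grid, so the following is a minimal Python sketch of the textbook Hilbert-curve cell numbering. It is an illustration only, not SQL Server's internal code, and the function names `hilbert_index` and `hilbert_grid` are made up for this example.

```python
def hilbert_index(n, x, y):
    """Return the 0-based position of cell (x, y) along a Hilbert curve
    covering an n x n grid, where n is a power of two."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve is oriented correctly.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d


def hilbert_grid(n):
    """Number every cell of an n x n grid (1-based), row by row."""
    return [[hilbert_index(n, x, y) + 1 for x in range(n)] for y in range(n)]


for row in hilbert_grid(4):
    print(" ".join(f"{v:2d}" for v in row))
```

For a 4 x 4 grid this yields the same numbering as the grid shown on the indexing slides (1 2 15 16 in the first row). Consecutive numbers always land in neighbouring cells, which is the property a B+-Tree range scan benefits from.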
SQL Server Spatial Indexing Story
Planar Index
  Requires bounding box
  Only one grid
Geographic Index
  No bounding box
  Two top-level projection grids
Indexing Phase
  1. Overlay a grid on the spatial object
  2. Identify grids for spatial object to store in index
Primary Filter
  3. Identify grids for query object(s)
  4. Intersecting grids identifies candidates
Secondary Filter
  5. Apply actual CLR method on candidates to find matches
[Figure: the same 4 x 4 grid shown for steps 1, 2 and 3, with cells numbered as follows]
 1  2 15 16
 4  3 14 13
 5  8  9 12
 6  7 10 11
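The five steps above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the SQL Server implementation: objects are circles `(cx, cy, r)`, the grid is a single level with an arbitrary cell size `CELL`, and each object is indexed by the cells its bounding box touches. The helper names (`cells_for`, `build_index`, `exact_intersects`, `query`) are hypothetical.

```python
import math
from collections import defaultdict

CELL = 10.0  # grid cell size; an arbitrary choice for this sketch


def cells_for(circle):
    """Steps 1-3: grid cells touched by the circle's bounding box
    (an over-approximation of the cells the circle really touches)."""
    cx, cy, r = circle
    x0, x1 = math.floor((cx - r) / CELL), math.floor((cx + r) / CELL)
    y0, y1 = math.floor((cy - r) / CELL), math.floor((cy + r) / CELL)
    return {(i, j) for i in range(x0, x1 + 1) for j in range(y0, y1 + 1)}


def build_index(objects):
    """Indexing phase: map each grid cell to the keys of objects touching it."""
    index = defaultdict(set)
    for key, circle in objects.items():
        for cell in cells_for(circle):
            index[cell].add(key)
    return index


def exact_intersects(a, b):
    """Secondary filter: the exact predicate (circle/circle intersection)."""
    return math.hypot(a[0] - b[0], a[1] - b[1]) <= a[2] + b[2]


def query(objects, index, window):
    # Step 4, primary filter: any object sharing a cell with the window
    # is a candidate (false positives possible, false negatives not).
    candidates = set()
    for cell in cells_for(window):
        candidates |= index.get(cell, set())
    # Step 5, secondary filter: run the exact predicate on candidates only.
    matches = {k for k in candidates if exact_intersects(objects[k], window)}
    return candidates, matches


objects = {
    "A": (12.0, 12.0, 3.0),
    "B": (17.0, 14.0, 2.0),
    "C": (55.0, 55.0, 4.0),
    "D": (19.0, 19.0, 0.5),
}
index = build_index(objects)
candidates, matches = query(objects, index, window=(14.0, 13.0, 2.0))
print(sorted(candidates), sorted(matches))
```

In this run D shares a grid cell with the query window, so it survives the primary filter, and the exact test then discards it. C never becomes a candidate, so the expensive predicate is not evaluated for it.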
SQL Server Spatial Indexing Story
Multi-Level Grid
  Much more flexible than a simple grid
  Hilbert numbering
  Modified adaptable QuadTree
Grid index features
  4 levels
  Customizable grid subdivisions
  Customizable maximum number of cells per object (default 16)
  NEW IN SQL Server 2012: New default tessellation with 8 levels of cell nesting
Multi-Level Grid
Deepest-cell Optimization: Only keep the lowest level cell in index
Covering Optimization: Only record higher level cells when all lower cells are completely covered by the object
Cell-per-object Optimization: User restricts max number of cells per object
[Figure: cell hierarchy with the root cell / (“cell 0”) and the example cell path /4/2/3/1]
Implementation of the Index
Persist a table-valued function
Internally rewrite queries to use the table

Base Table T
  Prim_key  geography
  1         g1
  2         g2
  3         g3

CREATE SPATIAL INDEX sixd ON T(geography)

Internal Table for sixd
  Prim_key  cell_id  srid  cell_attr
  1         0x00007  42    0
  3         0x00007  42    1
  3         0x0000A  42    2
  3         0x0000B  42    0
  3         0x0000C  42    1
  1         0x0000D  42    0
  2         0x00014  42    1

Column notes:
  Prim_key: 15 columns and 895 byte limitation
  cell_id: Varbinary(5) encoding of grid cell id
  srid: Spatial Reference ID; has to be the same to produce a match
  cell_attr:
    0 = cell at least touches the object (but not 1 or 2)
    1 = guarantee that object partially covers cell
    2 = object covers cell
Auto Grid Spatial Index
New spatial index tessellations:
  geometry_auto_grid
  geography_auto_grid
Uses 8 grid levels instead of the previous 4
No GRIDS parameter needed (or available)
  Fixed at HLLLLLLL
Default number of cells per object:
  8 for geometry
  12 for geography
More stable performance
  for windows of different size
  for data with different spatial density
For default values:
  Up to 2x faster for longer queries (> 500 ms)
    More efficient primary filter
    Fewer rows returned
  10 ms slower for very fast queries (< 50 ms)
    Increased tessellation time, which is constant
Spatial Index Performance
New grid gives much more stable performance for query windows of different size
Better grid coverage gives fewer high peaks
Index Creation and Maintenance
Create index example GEOMETRY:
CREATE SPATIAL INDEX sixd ON spatial_table(geom_column)
WITH (BOUNDING_BOX = (0, 0, 500, 500),
      GRIDS = (LOW, LOW, MEDIUM, HIGH),
      CELLS_PER_OBJECT = 20)
Create index example GEOGRAPHY:
CREATE SPATIAL INDEX sixd ON spatial_table(geogr_column)
USING GEOGRAPHY_GRID
WITH (GRIDS = (LOW, LOW, MEDIUM, HIGH),
      CELLS_PER_OBJECT = 20)
NEW IN SQL Server 2012 (equivalent to default creation):
CREATE SPATIAL INDEX sixd ON spatial_table(geogr_column)
USING GEOGRAPHY_AUTO_GRID
WITH (CELLS_PER_OBJECT = 20)
Use ALTER and DROP INDEX for maintenance.
DEMO: Indexing and Performance
Spatial queries supported by index in SQL Server
Geometry:
• STIntersects() = 1
• STOverlaps() = 1
• STEquals() = 1
• STTouches() = 1
• STWithin() = 1
• STContains() = 1
• STDistance() < val
• STDistance() <= val
• Nearest Neighbor
• Filter() = 1
Geography:
• STIntersects() = 1
• STOverlaps() = 1
• STEquals() = 1
• STWithin() = 1
• STContains() = 1
• STDistance() < val
• STDistance() <= val
• Nearest Neighbor
• Filter() = 1
New in SQL Server 2012
How Costing is Done
• The stats on the index contain a trie constructed on the string form of the packed binary(5) typed Cell ID.
• When a window query is compiled with a sniffable window object, the tessellation function on the window object is run at compile time. The results are used to construct a trie for use during compilation.
  • May lead to wrong compilation for later objects
• No costing on: local variables, constants, results of expressions
• Use different indices and different stored procs to account for different query characteristics
Understanding the Index Query Plan
Seeking into a Spatial Index
Minimize I/O and random I/O
Intuition: small windows should touch small portions of the index
A cell 7.2.4 matches
  Itself
  Ancestors
  Descendants
[Figure: Spatial Index S containing cells 7, 7.2 and 7.2.4]
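The matching rule for a cell such as 7.2.4 (itself, its ancestors, its descendants) amounts to a prefix test on the cell path. A minimal Python sketch, using the dotted notation from the slide. The helper names `parse` and `related` and the sample cell list are assumptions for illustration, and the real index stores cell IDs in a packed binary form rather than as dotted strings.

```python
def parse(cell):
    """'7.2.4' -> (7, 2, 4): one grid-cell number per level."""
    return tuple(int(part) for part in cell.split("."))


def related(a, b):
    """True if one cell is the other cell itself, an ancestor of it,
    or a descendant of it, i.e. if one path is a prefix of the other."""
    pa, pb = parse(a), parse(b)
    n = min(len(pa), len(pb))
    return pa[:n] == pb[:n]


# Hypothetical cells stored in a spatial index.
index_cells = ["7", "7.2", "7.2.4", "7.2.4.1", "7.2.5", "7.3", "8"]
matches = [c for c in index_cells if related("7.2.4", c)]
print(matches)
```

Siblings such as 7.2.5 and unrelated cells such as 8 are skipped, which is why a small query window only needs to touch a small portion of the index.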
Understanding the Index Query Plan
[Figure: query plan with the operators T(@g), Optional Sort, Remove dup ranges, Ranges and Spatial Index Seek]
Spatial index tessellation
Better and more continuous coverage
[Figure: the same object tessellated with 64 cells, 128 cells and 256 cells; legend: fully contained cells, partially contained cells]
Query window number of cells
Typical spatial query performance
Optimal value (theoretical) is somewhere between two extremes
[Figure: performance chart; one curve is labeled "Time needed to process false positives"]
Default values:
  512 - Geometry AUTO grid
  768 - Geography AUTO grid
  1024 - MANUAL grids
SELECT * FROM table t WITH (SPATIAL_WINDOW_MAX_CELLS=256)
WHERE t.geom.STIntersects(@window)=1;
Query Window Hinting (SQL Server 2012)
• SELECT * FROM table t WITH (SPATIAL_WINDOW_MAX_CELLS=1024) WHERE t.geom.STIntersects(@window)=1
• Used if an index is chosen (does not force an index)
• Overrides the default (512 for geometry, 768 for geography)
• Rule of thumb:
  • Higher value makes primary filter phase longer but reduces work in secondary filter phase
  • Set higher for dense spatial data
  • Set lower for sparse spatial data
Index Hinting
• FROM T WITH (INDEX (<Spatial_idxname>))
• Spatial index is treated the same way a non-clustered index is
  • the order of the hint is reflected in the order of the indexes in the plan
  • multiple index hints are concatenated
  • no duplicates are allowed
• The following restrictions exist:
  • The spatial index must be either first in the first index hint or last in the last index hint for a given table.
  • Only one spatial index can be specified in any index hint for a given table.
Query Hinting
demo
Additional Query Processing Support
• Index intersection
  • Enables efficient mixing of spatial and non-spatial predicates
• Matching
  • New in SQL Server 2012: Nearest Neighbor query
  • Distance queries: convert to STIntersects
  • Commutativity: a.STIntersects(b) = b.STIntersects(a)
  • Dual: a.STContains(b) = b.STWithin(a)
• Multiple spatial indexes on the same column
  • Various bounding boxes, granularities
• Outer references as window objects
  • Enables spatial join to use one index
Other Spatial Performance Improvements in SQL Server 2012
• Spatial index build time for point data can be as much as four to five times faster
• Optimized spatial query plan for STDistance and STIntersects like queries
• Faster point data queries
• Optimized STBuffer, lower memory footprint
Spatial Nearest Neighbor
Main scenario
  Give me the closest 5 Italian restaurants
Execution plan
  SQL Server 2008/2008 R2: table scan
  SQL Server 2012: uses spatial index
Specific query pattern required
  SELECT TOP(5) *
  FROM Restaurants r
  WHERE r.type = 'Italian'
    AND r.pos.STDistance(@me) IS NOT NULL
  ORDER BY r.pos.STDistance(@me)
Spatial Performance in SQL Server 2012
demo
Nearest Neighbor Performance
NN query vs best current workaround (sort all points in 10km radius)
*Average time for NN query is ~236ms
Find the closest 50 business points to a specific location (out of 22 million in total)
Limitations of Spatial Plan Selection
• Off whenever window object is not a parameter:
  • Spatial join (window is an outer reference)
  • Local variable, string constant, or complex expression
• Has the classic SQL Server parameter-sensitivity problem
  • SQL compiles once for one parameter value and reuses the plan for all parameter values
  • Different plans for different sizes of window require application logic to bucketize the windows
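A minimal sketch of what such bucketizing application logic could look like, in Python. The area thresholds and the stored procedure names are hypothetical and do not come from the presentation. The idea is only that each size class calls its own procedure, so each procedure is compiled for, and keeps a cached plan suited to, windows of similar size.

```python
import bisect

# Hypothetical area thresholds (in squared coordinate units) and
# hypothetical stored procedure names, one per window-size bucket.
THRESHOLDS = [100.0, 10_000.0]
PROCS = [
    "usp_query_small_window",
    "usp_query_medium_window",
    "usp_query_large_window",
]


def pick_procedure(xmin, ymin, xmax, ymax):
    """Choose the stored procedure whose bucket contains the window's area."""
    area = (xmax - xmin) * (ymax - ymin)
    return PROCS[bisect.bisect_right(THRESHOLDS, area)]


print(pick_procedure(0, 0, 5, 5))
print(pick_procedure(0, 0, 50, 50))
print(pick_procedure(0, 0, 500, 500))
```

Suitable thresholds depend on the data and would have to be found by experiment. Area is only one possible bucketing key.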
Error 8635: Cannot find a plan
Error: The query processor could not produce a query plan for a query with a spatial index hint. Reason: %S_MSG. Try removing the index hints or removing SET FORCEPLAN.
Possible Reasons (%S_MSG):
• The spatial index is disabled or offline
• The spatial object is not defined in the scope of the predicate
• Spatial indexes do not support the comparand supplied in the predicate
• Spatial indexes do not support the comparator supplied in the predicate
• Spatial indexes do not support the method name supplied in the predicate
• The comparand references a column that is defined below the predicate
• The comparand in the comparison predicate is not deterministic
• The spatial parameter references a column that is defined below the predicate
• Could not find required binary spatial method in a condition
• Could not find required comparison predicate
Index Support
• Can be built in parallel
• Can be hinted
• File groups/Partitioning
  • Aligned to base table or separate file group
  • Full rebuild only
• New catalog views, DDL Events
• DBCC Checks
• Supportability stored procedures
• New in SQL Server 2012: Index Page and Row Compression
  • Ca. 50% smaller indices, 0-15% slower queries
• Not supported
  • Online rebuild
  • Database Tuning Advisor
SET Options
Spatial indexes require:
  ANSI_NULLS: ON
  ANSI_PADDING: ON
  ANSI_WARNINGS: ON
  CONCAT_NULL_YIELDS_NULL: ON
  NUMERIC_ROUNDABORT: OFF
  QUOTED_IDENTIFIER: ON
Spatial Indices and Partitions and Filegroups
By default, a spatial index is partitioned to the same filegroups as the base table. Override with: [ ON { filegroup_name | "default" } ]
If filegroup_name is specified, the index will be placed on the specified filegroup regardless of the table’s partitioning scheme. If “default” is specified, the base table’s default filegroup/partitioning scheme is applied.
Altering the base table’s partition scheme is not allowed unless the spatial index was created with the “ON filegroup” option (and is hence not aligned with the partitioning anyway). The index has to be dropped and then the base table repartitioned.
Spatial Catalog Views
• sys.spatial_indexes catalog view
• sys.spatial_index_tessellations catalog view
• Entries in sys.indexes for a spatial index:
  • A clustered index on the internal table of the spatial index
  • A spatial index (type = 4) for the spatial index
• An entry in sys.internal_tables
• An entry in sys.index_columns
New Spatial Histogram Helpers
sp_help_spatial_geometry_histogram
sp_help_spatial_geography_histogram
Used for spatial data and index analysis
[Figure: histogram of 22 million business points over the US. Left: SSMS view of a histogram. Right: custom drawing on top of Bing Maps]
Indexing Support Procedures
• sys.sp_help_spatial_geometry_index
• sys.sp_help_spatial_geometry_index_xml
• sys.sp_help_spatial_geography_index
• sys.sp_help_spatial_geography_index_xml
• Provide information about index:
  • 64 properties
  • 10 of which are considered core
sys.sp_help_spatial_geometry_index
Arguments
Parameter (Type): Description
@tabname (nvarchar(776)): the name of the table for which the index has been specified
@indexname (sysname): the index name to be investigated
@verboseoutput (tinyint): 0 = core set of properties is reported; 1 = all properties are reported
@query_sample (geometry): A representative query sample that will be used to test the usefulness of the index. It may be a representative object or a query window.
Results in property name/value pair table of the format:
PropName: nvarchar(256)
PropValue: sql_variant
sys.sp_help_spatial_geography_index_xml
Arguments
Parameter (Type): Description
@tabname (nvarchar(776)): the name of the table for which the index has been specified
@indexname (sysname): the index name to be investigated
@verboseoutput (tinyint): 0 = core set of properties is reported; 1 = all properties are reported
@query_sample (geography): A representative query sample that will be used to test the usefulness of the index. It may be a representative object or a query window.
@xml_output (xml): This is an output parameter that contains the returned properties in an XML fragment
Some of the returned Properties
Property (Type, Core/All): Description
Base_Table_Rows (bigint, All): Number of rows in the base table
Index properties (-, All): All index properties: bounding box, grid densities, cells per object
Total_Primary_Index_Rows (bigint, All): Number of rows in the index
Total_Primary_Index_Pages (bigint, All): Number of pages in the index
Total_Number_Of_ObjectCells_In_Level0_For_QuerySample (bigint, Core): Indicates whether the representative query sample falls outside of the bounding box of the geometry index and into the root cell (level 0 cell). This is either 0 (not in level 0 cell) or 1. If it is in the level 0 cell, then the investigated index is not an appropriate index for the query sample.
Total_Number_Of_ObjectCells_In_Level0_In_Index (bigint, Core): Number of cell instances of indexed objects that are tessellated in level 0. For geometry indexes, this will happen if the bounding box of the index is smaller than the data domain. A high number of objects in level 0 may require a costly application of secondary filters if the query window falls partially outside the bounding box. If the query window falls inside the bounding box, having a high number of objects in level 0 may actually improve the performance.
Some of the returned Properties
Property (Type, Core/All): Description
Number_Of_Rows_Selected_By_Primary_Filter (bigint, Core): P = Number of rows selected by the primary filter.
Number_Of_Rows_Selected_By_Internal_Filter (bigint, Core): S = Number of rows selected by the internal filter. For these rows, the secondary filter is not called.
Number_Of_Times_Secondary_Filter_Is_Called (bigint, Core): Number of times the secondary filter is called.
Percentage_Of_Rows_NotSelected_By_Primary_Filter (float, Core): Suppose there are N rows in the base table, suppose P are selected by the primary filter. This is (N-P)/N as a percentage.
Percentage_Of_Primary_Filter_Rows_Selected_By_Internal_Filter (float, Core): This is S/P as a percentage. The higher the percentage, the better the index is at avoiding the more expensive secondary filter.
Number_Of_Rows_Output (bigint, Core): O = Number of rows output by the query.
Internal_Filter_Efficiency (float, Core): This is S/O as a percentage.
Primary_Filter_Efficiency (float, Core): This is O/P as a percentage. The higher the efficiency is, the fewer false positives have to be processed by the secondary filter.
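The percentage properties above are simple ratios of N, P, S and O. A small Python sketch of the arithmetic, using made-up row counts. The function name `filter_metrics` is hypothetical, and returning 0.0 for a zero denominator is an assumption of this sketch, not documented behaviour of the stored procedures.

```python
def filter_metrics(n, p, s, o):
    """n: rows in the base table, p: rows selected by the primary filter,
    s: rows selected by the internal filter, o: rows output by the query."""
    def pct(num, den):
        # Assumption of this sketch: report 0.0 when the denominator is 0.
        return 100.0 * num / den if den else 0.0

    return {
        "Percentage_Of_Rows_NotSelected_By_Primary_Filter": pct(n - p, n),
        "Percentage_Of_Primary_Filter_Rows_Selected_By_Internal_Filter": pct(s, p),
        "Internal_Filter_Efficiency": pct(s, o),
        "Primary_Filter_Efficiency": pct(o, p),
    }


# Made-up example: 1000 rows, 200 pass the primary filter, 50 of those are
# accepted by the internal filter, and 150 rows are finally output.
metrics = filter_metrics(n=1000, p=200, s=50, o=150)
for name, value in metrics.items():
    print(f"{name}: {value:.1f}%")
```

With these numbers the primary filter discards 80% of the table, and 75% of its candidates are real matches, so a quarter of the candidate rows are false positives.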
Spatial Tips on index settings
Some best practice recommendations (YMMV):
• Start out with new default tessellation
• Point data: always use HIGH for all 4 levels. CELLS_PER_OBJECT is not relevant in that case.
• Simple, relatively consistent polygons: set all levels to LOW or MEDIUM, MEDIUM, LOW, LOW
• Very complex LineString or Polygon instances:
  • High number of CELLS_PER_OBJECT (often 8192 is best)
  • Setting all 4 levels to HIGH may be beneficial
• Polygons or line strings which have highly variable sizes: experimentation is needed.
• Rule of thumb for GEOGRAPHY: if MMMM is not working, try HHMM
What to do if my Spatial Query is slow?
• Make sure you are running SQL Server 2008 SP1, 2008 R2 or 2012
• Check query plan for use of index
• Make sure it is a supported operation
• Hint the index (and/or a different join type)
• Do not use a spatial index when there is a highly selective non-spatial predicate
• Run above index support procedure:
  • Assess effectiveness of primary filter (Primary_Filter_Efficiency)
  • Assess effectiveness of internal filter (Internal_Filter_Efficiency)
  • Redefine or define a new index with better characteristics
    • More appropriate bounding box for GEOMETRY
    • Better grid densities
Summary: Spatial Index Improvements in SQL Server 2012
Auto Grid Spatial Index
Spatial Index Hint
More supported Operations
Spatial Index Compression
Improved “Create Spatial Index” Time For Point Data
Related Content
SQL Server and SQL Azure Whitepapers and information:
http://www.sqlserverlaunch.com/
http://sqlcat.com/sqlCat/b/whitepapers/archive/2011/08/08/new-spatial-features-in-sql-server-code-named-denali-community-technology-preview-3.aspx
http://social.technet.microsoft.com/wiki/contents/articles/4136.aspx
http://social.technet.microsoft.com/wiki/contents/articles/updated-spatial-features-in-the-sql-azure-q4-2011-service-release.aspx
SIGMOD 2008 Paper: Spatial Indexing in Microsoft SQL Server 2008
Spatial Tools:
SQL Server Spatial Codeplex site: http://sqlspatialtools.codeplex.com/
http://www.sharpgis.net/page/SQL-Server-2008-Spatial-Tools.aspx
http://www.codeplex.com/ProjNET
http://www.geoquery2008.com/
Forum: http://forums.microsoft.com/MSDN/ShowForum.aspx?ForumID=1629&SiteID=1
Find Us Later At…
On Twitter: @SQLServerMike, @Spatial_Ed
Blogs: http://sqlblog.com/blogs/michael_rys, http://blogs.msdn.com/b/edkatibah/