© 2011 Microsoft. Last Updated: Monday, April 10, 2023
SQL SERVER 2012 MOMENTUM WORKSHOP: COLUMNSTORE INDEX DEEP DIVE
IS THIS YOUR DATA WAREHOUSE EXPERIENCE?
Waiting…
COLUMNSTORE INDEXES
• Interactive experiences with data
  • Near instant response times on large data sets
  • Ad hoc and reporting queries
• Easy to use
  • Reduced need for summary tables, indexed views, OLAP cubes
  • Fewer indexes to design and maintain
  • Reduced need to manually tune queries
• Lower TCO
  • Reduced people costs
  • Lower hardware costs
Waiting
AGENDA
• ColumnStore Definition
• How Does ColumnStore Accelerate Queries?
• What is a ColumnStore Index?
• How Do I Use It?
• When Should I Use It?
• When Should I Not Use It?
• Design Schemas and Queries to Take Best Advantage of the Power of ColumnStore Indexes
• How to Add Data to a Table with a ColumnStore Index
• Anticipate and Troubleshoot Challenges Related to Memory Usage and Query Plans when Using ColumnStore Indexes
• When to/When Not to Build ColumnStore Indexes
• Best Practices for Creating ColumnStore Indexes
COLUMNSTORE DEFINITION
ColumnStore offering in SQL Server
• New type of index
New query processing mode
• Processes data in batches
Accelerates targeted workloads
• Typical data warehouse queries
• SQL Server relational data warehouses, up to 10s of TB
COLUMNSTORE DEMO
HOW DOES COLUMNSTORE SPEED UP QUERIES? (1)
[Figure: columns C1 to C6 stored separately]
Heaps, B-trees store data row-wise
ColumnStore indexes store data column-wise
• Each page stores data from a single column
Highly compressed
• About 2x better than PAGE compression
• More data fits in memory
Each column can be accessed independently
• Fetch only needed columns
• Can dramatically decrease IO
HOW DOES COLUMNSTORE SPEED UP QUERIES? (1)
ColumnStore Index Structure
• Segment contains values from one column for a set of rows
• Segments for the same set of rows comprise a row group
• Segments are compressed
• Each segment stored in a separate LOB
• Segment is unit of transfer between disk and memory
[Figure: columns C1 to C6, with one row group and one segment labeled]
COLUMNSTORE INDEX EXAMPLE
OrderDateKey  ProductKey  StoreKey  RegionKey  Quantity  SalesAmount
20101107      106         01        1          6         30.00
20101107      103         04        2          1         17.00
20101107      109         04        2          2         20.00
20101107      103         03        2          1         17.00
20101107      106         05        3          4         20.00
20101108      106         02        1          5         25.00
20101108      102         02        1          1         14.00
20101108      106         03        2          5         25.00
20101108      109         01        1          1         10.00
20101109      106         04        2          4         20.00
20101109      106         04        2          5         25.00
20101109      103         01        1          1         17.00
HORIZONTALLY PARTITION (ROW GROUPS)
OrderDateKey  ProductKey  StoreKey  RegionKey  Quantity  SalesAmount
20101107      106         01        1          6         30.00
20101107      103         04        2          1         17.00
20101107      109         04        2          2         20.00
20101107      103         03        2          1         17.00
20101107      106         05        3          4         20.00
20101108      106         02        1          5         25.00
OrderDateKey  ProductKey  StoreKey  RegionKey  Quantity  SalesAmount
20101108      102         02        1          1         14.00
20101108      106         03        2          5         25.00
20101108      109         01        1          1         10.00
20101109      106         04        2          4         20.00
20101109      106         04        2          5         25.00
20101109      103         01        1          1         17.00
VERTICALLY PARTITION (SEGMENTS)
OrderDateKey: 20101107, 20101107, 20101107, 20101107, 20101107, 20101108
ProductKey: 106, 103, 109, 103, 106, 106
StoreKey: 01, 04, 04, 03, 05, 02
RegionKey: 1, 2, 2, 2, 3, 1
Quantity: 6, 1, 2, 1, 4, 5
SalesAmount: 30.00, 17.00, 20.00, 17.00, 20.00, 25.00
OrderDateKey: 20101108, 20101108, 20101108, 20101109, 20101109, 20101109
ProductKey: 102, 106, 109, 106, 106, 103
StoreKey: 02, 03, 01, 04, 04, 01
RegionKey: 1, 2, 1, 2, 2, 1
Quantity: 1, 5, 1, 4, 5, 1
SalesAmount: 14.00, 25.00, 10.00, 20.00, 25.00, 17.00
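The two partitioning steps above can be sketched in a few lines of Python. This is only a toy illustration of the idea, not how SQL Server implements it: the helper names `to_row_groups` and `to_segments` are hypothetical, and the group size of 6 is chosen to match the slides (real row groups are far larger).

```python
# Toy illustration of row groups and segments using the example table.
# The function names and the group size of 6 are assumptions for this sketch.
COLUMNS = ["OrderDateKey", "ProductKey", "StoreKey",
           "RegionKey", "Quantity", "SalesAmount"]

ROWS = [
    (20101107, 106, "01", 1, 6, 30.00),
    (20101107, 103, "04", 2, 1, 17.00),
    (20101107, 109, "04", 2, 2, 20.00),
    (20101107, 103, "03", 2, 1, 17.00),
    (20101107, 106, "05", 3, 4, 20.00),
    (20101108, 106, "02", 1, 5, 25.00),
    (20101108, 102, "02", 1, 1, 14.00),
    (20101108, 106, "03", 2, 5, 25.00),
    (20101108, 109, "01", 1, 1, 10.00),
    (20101109, 106, "04", 2, 4, 20.00),
    (20101109, 106, "04", 2, 5, 25.00),
    (20101109, 103, "01", 1, 1, 17.00),
]


def to_row_groups(rows, group_size):
    """Horizontal partitioning: cut the table into consecutive row groups."""
    return [rows[i:i + group_size] for i in range(0, len(rows), group_size)]


def to_segments(row_group, columns):
    """Vertical partitioning: one segment (list of values) per column."""
    return {name: [row[i] for row in row_group]
            for i, name in enumerate(columns)}


row_groups = to_row_groups(ROWS, group_size=6)
segments = [to_segments(group, COLUMNS) for group in row_groups]

for number, group_segments in enumerate(segments, start=1):
    print(f"Row group {number}")
    for name, values in group_segments.items():
        print(f"  {name}: {values}")
```

Each dictionary in `segments` corresponds to one row group, and each list inside it to one segment.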
COMPRESS EACH SEGMENT*
[Figure: the same twelve segments (six columns × two row groups), each compressed individually]
Some segments will compress more than others
*Encoding and reordering not shown
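To see why some segments compress more than others, here is a minimal sketch that uses run-length encoding as a stand-in. The slide does not show the encoding SQL Server actually applies, so treat the choice of run-length encoding and the helper `run_length_encode` as assumptions made purely for illustration.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


# Two segments from the first row group of the example table.
order_date_key = [20101107, 20101107, 20101107, 20101107, 20101107, 20101108]
product_key = [106, 103, 109, 103, 106, 106]

for name, segment in [("OrderDateKey", order_date_key),
                      ("ProductKey", product_key)]:
    encoded = run_length_encode(segment)
    print(f"{name}: {len(segment)} values -> {len(encoded)} runs: {encoded}")
```

Under this stand-in encoding, the OrderDateKey segment collapses to two runs while the ProductKey segment still needs five. A column with long stretches of repeated values therefore shrinks much more than one whose values change from row to row.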
HOW DOES COLUMNSTORE SPEED UP QUERIES? (2)
Fetches only needed columns from disk
• Less IO
• Better buffer hit rates
[Figure: columns C1 to C6 stored separately]
SELECT region, sum (sales) …
EXAMPLE OF FETCHING ONLY NEEDED COLUMNS
SELECT ProductKey, SUM(SalesAmount) FROM SalesTable WHERE OrderDateKey < 20101108 GROUP BY ProductKey
[Figure: the twelve segments of SalesTable (six columns × two row groups); the query references only OrderDateKey, ProductKey, and SalesAmount]
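The example query can be mimicked over a column-wise layout to show that only three of the six columns are ever touched. This is a hedged sketch: `COLUMN_STORE`, `fetch`, and `sales_by_product` are hypothetical names, the in-memory lists stand in for on-disk segments, and grouping by ProductKey is assumed so that the aggregate is well defined.

```python
from collections import defaultdict

# Column-wise copy of the example table: one list per column.
# A hypothetical in-memory stand-in for on-disk segments.
COLUMN_STORE = {
    "OrderDateKey": [20101107] * 5 + [20101108] * 4 + [20101109] * 3,
    "ProductKey": [106, 103, 109, 103, 106, 106, 102, 106, 109, 106, 106, 103],
    "StoreKey": ["01", "04", "04", "03", "05", "02",
                 "02", "03", "01", "04", "04", "01"],
    "RegionKey": [1, 2, 2, 2, 3, 1, 1, 2, 1, 2, 2, 1],
    "Quantity": [6, 1, 2, 1, 4, 5, 1, 5, 1, 4, 5, 1],
    "SalesAmount": [30.0, 17.0, 20.0, 17.0, 20.0, 25.0,
                    14.0, 25.0, 10.0, 20.0, 25.0, 17.0],
}

columns_read = []


def fetch(column_name):
    """Return one column and record that it was read."""
    columns_read.append(column_name)
    return COLUMN_STORE[column_name]


def sales_by_product(before_date_key):
    """SUM(SalesAmount) per ProductKey where OrderDateKey < before_date_key."""
    dates = fetch("OrderDateKey")
    products = fetch("ProductKey")
    amounts = fetch("SalesAmount")
    totals = defaultdict(float)
    for order_date, product, amount in zip(dates, products, amounts):
        if order_date < before_date_key:
            totals[product] += amount
    return dict(totals)


result = sales_by_product(20101108)
print(result)
print("columns read:", columns_read)
```

StoreKey, RegionKey, and Quantity never appear in `columns_read`, which is the IO saving the slide describes.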
HOW DOES COLUMNSTORE SPEED UP QUERIES? (3)
Advanced query-processing technology
• Batch-mode execution of some operations
• Processes column data in batches
• Groups of batch operations in query plan
• Compact data representation
• Highly efficient algorithms
• Better parallelism
WHAT IS COLUMNSTORE INDEX?
• ColumnStore (CS) index is nonclustered (secondary)
• Base table can be clustered index or heap
• One CS index per table
• Multiple other nonclustered (B-tree) indexes allowed
  • But may not be needed
• CS index must be partition-aligned if table is partitioned
• Indexed view: no CS index on indexed view
• Filtered index: no CS as filtered index
[Figure: base table (clustered index OR heap) with two nonclustered indexes and one nonclustered ColumnStore index]
WHAT IS COLUMNSTORE INDEX? RESTRICTIONS
Data Types
• Only on common business data types
• int, real, string, money, datetime, decimal <= 18 digits
Index Creation
• Up to 1024 columns
• NC index only
Maintain table
• Limited operations
• Can read but cannot update the data
• Can switch partitions in and out
Process queries
• All read-only T-SQL queries run
• Some queries are accelerated more than others
HOW DO I USE IT? INDEX CREATION
Create a ColumnStore index
• Create the table
• Load data into the table
• Create a non-clustered ColumnStore index on all, or some, columns
CREATE NONCLUSTERED COLUMNSTORE INDEX ncci ON myTable(OrderDate, ProductID, SaleAmount)
[Screenshot: Object Explorer]
HOW DO I USE IT? RUNNING QUERIES
Let the query optimizer do the work
Optimizer makes a cost-based decision
Data access method
• Columnstore index
• Clustered (row-based) index
• Nonclustered (row-based) index
• Heap
Processing mode
• Batch mode
• Row mode
HOW DO I USE IT? MEMORY MANAGEMENT
• Memory management is automatic
• ColumnStore is persisted on disk
• Needed columns fetched into memory
• ColumnStore segments flow between disk and memory
SELECT C2, SUM(C4) FROM T GROUP BY C2;
[Figure: segments of columns T.C1 to T.C4 moving between disk and memory]
HOW DO I USE IT? RUNNING QUERIES
Using Index Hints with ColumnStore Queries
Use the ColumnStore index
select distinct (SalesTerritoryKey) from dbo.FactResellerSales with (index (ncci))
Use a different index
select distinct (SalesTerritoryKey) from dbo.FactResellerSales with (index (ci))
Ignore ColumnStore index
select distinct (SalesTerritoryKey) from dbo.FactResellerSales option (IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX)
HOW DO I USE IT? RUNNING QUERIES
RUNNING QUERIES DEMO
HOW TO ADD DATA TO A TABLE WITH A COLUMNSTORE INDEX
Method 1: Disable the ColumnStore index
• Disable (or drop) the index
  ALTER INDEX my_index ON MyTable DISABLE
• Update the table
• Rebuild the columnstore index
  ALTER INDEX my_index ON MyTable REBUILD
HOW TO ADD DATA TO A TABLE WITH A COLUMNSTORE INDEX
Method 2: Use Partitioning
• Load new data into a staging table
• Build a ColumnStore index
  CREATE NONCLUSTERED COLUMNSTORE INDEX my_index ON StagingT(OrderDate, ProductID, SaleAmount)
• Switch the partition into the table
  ALTER TABLE StagingT SWITCH TO T PARTITION 5
HOW TO ADD DATA TO A TABLE WITH A COLUMNSTORE INDEX
Method 3: Union All
• Build ColumnStore index on partitioned primary table
• Create staging table with no ColumnStore index
• Insert new data into (row-based) staging table
• Query both tables; UNION ALL to combine results
• Periodically build CS index on staging table
  • Switch staging table into an empty partition of the primary table
TROUBLESHOOTING: CREATING THE COLUMNSTORE INDEX
• Memory
  • Ensure enough memory
  • Memory requirement related to number of columns, data, DOP
  • Memory available ≠ memory on the box with concurrent activity
  • By default, query is restricted to 25% even when Resource Governor (RG) is not enabled
  • Check Showplan XML for memory grant info
  • Rough estimate:
Memory grant request in MB = [(4.2 * Num of columns in the CS index) + 68] * DOP + (Num of string cols * 34)
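The rough estimate above is easy to turn into a small calculator. The sketch below simply evaluates the slide's rule of thumb. The function name `estimate_memory_grant_mb` and the sample index (20 columns, 5 of them strings, DOP 8) are hypothetical, and the result is only as accurate as the rule itself.

```python
def estimate_memory_grant_mb(num_columns, num_string_columns, dop):
    """Rough memory grant estimate (MB) for building a ColumnStore index.

    Evaluates the rule of thumb from the slide:
    [(4.2 * columns) + 68] * DOP + (string columns * 34)
    """
    return (4.2 * num_columns + 68) * dop + num_string_columns * 34


# Hypothetical index: 20 columns, 5 of them strings, built with DOP 8.
estimate = estimate_memory_grant_mb(num_columns=20, num_string_columns=5, dop=8)
print(f"Estimated memory grant: {estimate:.0f} MB")
```

For this hypothetical index the estimate comes to about 1386 MB. Because the first term is multiplied by DOP, lowering DOP is the quickest way to shrink the estimate when memory is tight.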
TROUBLESHOOTING: CREATING THE COLUMNSTORE INDEX
Parallelism
• Index build is parallel only if table has more than 1 million rows
Size and other info: check new catalog views
• sys.column_store_segments
• sys.column_store_dictionaries
TROUBLESHOOTING: QUERY PERFORMANCE
Is the ColumnStore index being used?
• All needed columns present?
• Cardinality estimate? If selective, optimizer will choose a B-Tree
Are other non-clustered indexes being used?
• Too many indexes + bad statistics can confuse the optimizer
• Consider using hints and/or disabling other indexes
Are there issues unrelated to the ColumnStore index?
• Sorts, spills, table spools?
• Is a lot of data being returned to the client?
• Not all bottlenecks are query processing
TROUBLESHOOTING: QUERY PERFORMANCE
• Is batch mode being used to process most of the data?
  • ColumnStore index?
  • Outer joins?
  • DOP?
  • Loop join? Check cardinality estimate
  • Other operators?
    o Batch-enabled:
      – Scan, filter, project
      – Local hash partial aggregation
      – Hash inner join, hash table build
TROUBLESHOOTING: QUERY PERFORMANCE
Filters or joins on strings?
• Filters on strings are not pushed into storage engine
• Joins on integers are more efficient
Filter with “OR”?
• IN-lists but not OR filters pushed down
Hash tables don’t fit into memory?
• Usually due to small memory grant based on CE error, not physical memory limitation
• Fall back to row mode processing
• Slower than a row mode join
WHEN TO BUILD A COLUMNSTORE INDEX
Read-mostly workload
Most updates are appending new data
• Typically a nightly load window
Your workflow permits using partitioning (or dropping and rebuilding the index) to handle new data
Most queries fit a star join pattern or entail scanning and aggregating large amounts of data
Build a ColumnStore index on your large fact tables
Consider a ColumnStore index if you have a very large dimension table
WHEN NOT TO BUILD A COLUMNSTORE INDEX
Frequent updates
You need to update data and partition switching or rebuilding index does not fit your workflow
Frequent small look up queries
• B-tree indexes may give better performance
Your workload does not benefit
BEST PRACTICES FOR CREATING A COLUMNSTORE INDEX
Use a star schema when possible
• Build CS index on fact tables
• Consider for large dimension tables
Include all the columns in the CS index
• Don’t use to seek into a row
• Order of listed columns not important
Convert decimal to precision <= 18 if possible
Use integer types whenever possible
BEST PRACTICES FOR CREATING A COLUMNSTORE INDEX
• Ensure enough memory to build the CS index
• Consider table partitioning to facilitate updates
• Consider modifying queries to hit the “sweet spot”
  • Star joins
  • Inner joins
  • Group By
• Keep statistics up to date
• Use MAXDOP > 1
• Check query plans for use of ColumnStore index and batch mode processing
• Consider creating the CS index from a clustered index
  • Better segment elimination when predicate on key
  • Slightly better compression (no RID)
CREATE THE CS INDEX FROM A CLUSTERED INDEX
CREATE TABLE T2 (TxDate DATE, CustId INT, ProdId INT, Amt FLOAT);
CREATE CLUSTERED INDEX ci ON T2 (TxDate, CustId);
CREATE NONCLUSTERED COLUMNSTORE INDEX ncci ON T2 (TxDate, CustId, ProdId, Amt);
SELECT CustId, sum(Amt) FROM T2 WHERE TxDate < '2011-01-15' GROUP BY CustId;
[Figure: min/max values of the 12 segments (3 row groups × 4 columns)]
TxDate segments: Min 2011-01-01, Max 2011-01-25; Min 2011-01-26, Max 2011-02-14; Min 2011-02-14, Max 2011-03-02
Other segments: Min 1, Max 415; Min 5, Max 378; Min 19, Max 392; Min 18, Max 230; Min 10.65, Max 88.62; Min 165, Max 400; Min 8, Max 258; Min 22.63, Max 120.41; Min 5.95, Max 96.25
CREATE THE CS INDEX FROM A CLUSTERED INDEX
CREATE TABLE T2 (TxDate DATE, CustId INT, ProdId INT, Amt FLOAT);
CREATE CLUSTERED INDEX ci ON T2 (TxDate, CustId);
CREATE NONCLUSTERED COLUMNSTORE INDEX ncci ON T2 (TxDate, CustId, ProdId, Amt);
SELECT CustId, sum(Amt) FROM T2 WHERE TxDate < '2011-01-15' GROUP BY CustId;
Only need to read 3 segments
[Figure: same segment min/max values as on the previous slide]
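The count of 3 segments can be reproduced with a minimal Python sketch of segment elimination. It assumes only what the slide shows: each row group records the minimum and maximum TxDate of its segment, and a row group whose minimum is not below the predicate's bound can be skipped. The names `TXDATE_RANGES` and `row_groups_to_scan` are hypothetical, and the actual storage engine logic is not described in the source.

```python
from datetime import date

# Min/max TxDate per row group, as shown on the slide.
TXDATE_RANGES = [
    (date(2011, 1, 1), date(2011, 1, 25)),
    (date(2011, 1, 26), date(2011, 2, 14)),
    (date(2011, 2, 14), date(2011, 3, 2)),
]


def row_groups_to_scan(ranges, upper_bound):
    """Indexes of row groups that may hold rows with TxDate < upper_bound."""
    return [index for index, (low, _high) in enumerate(ranges)
            if low < upper_bound]


# Columns referenced by: SELECT CustId, sum(Amt) ... WHERE TxDate < '2011-01-15'
needed_columns = ["TxDate", "CustId", "Amt"]

groups = row_groups_to_scan(TXDATE_RANGES, date(2011, 1, 15))
segments_read = len(groups) * len(needed_columns)
print(f"row groups scanned: {groups}, segments read: {segments_read}")
```

Only the first row group can contain dates before 2011-01-15, and the query references three columns, so three segments are read. ProdId is never referenced, so its segments are skipped in every row group.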
CREATE THE CS INDEX FROM A CLUSTERED INDEX
What if the query is:
SELECT CustId, sum(Amt) FROM T2 WHERE ProdId IN (365, 385, 391, 393) GROUP BY CustId;
Or
SELECT CustId, ProdId, sum(Amt) FROM T2 WHERE CustId BETWEEN 50 AND 100 GROUP BY CustId, ProdId;
[Figure: same segment min/max values as on the previous slide]
SUMMARY: COLUMNSTORE IN A NUTSHELL
ColumnStore technology + Advanced query processing
• Astonishing speedup for DW queries
• Great compression