SQL Server 2012 In Memory Column Store Index Deep Dive

© 2011 Microsoft
Last Updated: Monday, April 10, 2023

SQL SERVER 2012 MOMENTUM WORKSHOP: COLUMNSTORE INDEX DEEP DIVE

IS THIS YOUR DATA WAREHOUSE EXPERIENCE?

Waiting…

COLUMNSTORE INDEXES

• Interactive experiences with data
  - Near instant response times on large data sets
  - Ad hoc and reporting queries

• Easy to use
  - Reduced need for summary tables, indexed views, OLAP cubes
  - Fewer indexes to design and maintain
  - Reduced need to manually tune queries

• Lower TCO
  - Reduced people costs
  - Lower hardware costs

Waiting

AGENDA

• ColumnStore Definition
• How Does ColumnStore Accelerate Queries?
• What is a ColumnStore Index?
• How Do I Use It?
• When Should I Use It?
• When Should I Not Use It?
• Design Schemas and Queries to Take Best Advantage of the Power of ColumnStore Indexes
• How to Add Data to a Table with a ColumnStore Index
• Anticipate and Troubleshoot Challenges Related to Memory Usage and Query Plans when Using ColumnStore Indexes
• When to/When Not to Build ColumnStore Indexes
• Best Practices for Creating ColumnStore Indexes

COLUMNSTORE DEFINITION

ColumnStore offering in SQL Server
• New type of index

New query processing mode
• Processes data in batches

Accelerates targeted workloads
• Typical data warehouse queries
• SQL Server relational data warehouses, up to 10s of TB

COLUMNSTORE DEMO

HOW DOES COLUMNSTORE SPEED UP QUERIES? (1)

[Figure: table columns C1 to C6]

Heaps, B-trees store data row-wise

ColumnStore indexes store data column-wise
• Each page stores data from a single column

Highly compressed
• About 2x better than PAGE compression
• More data fits in memory

Each column can be accessed independently
• Fetch only needed columns
• Can dramatically decrease IO


ColumnStore Index Structure
• Segment contains values from one column for a set of rows
• Segments for the same set of rows comprise a row group
• Segments are compressed
• Each segment stored in a separate LOB
• Segment is unit of transfer between disk and memory

[Figure: columns C1 to C6, with one row group and one segment labeled]

COLUMNSTORE INDEX EXAMPLE

OrderDateKey  ProductKey  StoreKey  RegionKey  Quantity  SalesAmount
20101107      106         01        1          6         30.00
20101107      103         04        2          1         17.00
20101107      109         04        2          2         20.00
20101107      103         03        2          1         17.00
20101107      106         05        3          4         20.00
20101108      106         02        1          5         25.00
20101108      102         02        1          1         14.00
20101108      106         03        2          5         25.00
20101108      109         01        1          1         10.00
20101109      106         04        2          4         20.00
20101109      106         04        2          5         25.00
20101109      103         01        1          1         17.00

HORIZONTALLY PARTITION (ROW GROUPS)

OrderDateKey  ProductKey  StoreKey  RegionKey  Quantity  SalesAmount
20101107      106         01        1          6         30.00
20101107      103         04        2          1         17.00
20101107      109         04        2          2         20.00
20101107      103         03        2          1         17.00
20101107      106         05        3          4         20.00
20101108      106         02        1          5         25.00

OrderDateKey  ProductKey  StoreKey  RegionKey  Quantity  SalesAmount
20101108      102         02        1          1         14.00
20101108      106         03        2          5         25.00
20101108      109         01        1          1         10.00
20101109      106         04        2          4         20.00
20101109      106         04        2          5         25.00
20101109      103         01        1          1         17.00

VERTICALLY PARTITION (SEGMENTS)

OrderDateKey: 20101107, 20101107, 20101107, 20101107, 20101107, 20101108
ProductKey:   106, 103, 109, 103, 106, 106
StoreKey:     01, 04, 04, 03, 05, 02
RegionKey:    1, 2, 2, 2, 3, 1
Quantity:     6, 1, 2, 1, 4, 5
SalesAmount:  30.00, 17.00, 20.00, 17.00, 20.00, 25.00

OrderDateKey: 20101108, 20101108, 20101108, 20101109, 20101109, 20101109
ProductKey:   102, 106, 109, 106, 106, 103
StoreKey:     02, 03, 01, 04, 04, 01
RegionKey:    1, 2, 1, 2, 2, 1
Quantity:     1, 5, 1, 4, 5, 1
SalesAmount:  14.00, 25.00, 10.00, 20.00, 25.00, 17.00
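The two partitioning steps can be sketched in a few lines of Python. This is only a toy model of the idea, not SQL Server's storage format: the function names `to_row_groups` and `to_segments` are hypothetical, and the row-group size of 6 is chosen purely to match the example above.

```python
# Toy model of row groups and segments; names and sizes are illustrative assumptions.
COLUMNS = ["OrderDateKey", "ProductKey", "StoreKey",
           "RegionKey", "Quantity", "SalesAmount"]

ROWS = [
    (20101107, 106, "01", 1, 6, 30.00),
    (20101107, 103, "04", 2, 1, 17.00),
    (20101107, 109, "04", 2, 2, 20.00),
    (20101107, 103, "03", 2, 1, 17.00),
    (20101107, 106, "05", 3, 4, 20.00),
    (20101108, 106, "02", 1, 5, 25.00),
    (20101108, 102, "02", 1, 1, 14.00),
    (20101108, 106, "03", 2, 5, 25.00),
    (20101108, 109, "01", 1, 1, 10.00),
    (20101109, 106, "04", 2, 4, 20.00),
    (20101109, 106, "04", 2, 5, 25.00),
    (20101109, 103, "01", 1, 1, 17.00),
]


def to_row_groups(rows, rows_per_group):
    """Horizontal partitioning: cut the table into consecutive runs of rows."""
    return [rows[i:i + rows_per_group]
            for i in range(0, len(rows), rows_per_group)]


def to_segments(row_group):
    """Vertical partitioning: one segment (a list of values) per column."""
    return {name: [row[i] for row in row_group]
            for i, name in enumerate(COLUMNS)}


row_groups = to_row_groups(ROWS, rows_per_group=6)
segments = [to_segments(group) for group in row_groups]

for number, group_segments in enumerate(segments, start=1):
    print("Row group", number, "ProductKey segment:", group_segments["ProductKey"])
```

Each entry of `segments` corresponds to one row group, and each value in it to one segment, which is the unit the slides describe as being compressed and moved between disk and memory.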

COMPRESS EACH SEGMENT*

[Figure: the same twelve segments as on the previous slide, each compressed independently]

Some segments will compress more than others

*Encoding and reordering not shown
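To see why some segments compress more than others, here is a minimal sketch that uses run-length encoding as a stand-in. The slide does not show the actual encoding, so run-length encoding and the helper name `run_length_encode` are assumptions made only for illustration.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


# Two segments from the first row group of the example.
order_date_key = [20101107, 20101107, 20101107, 20101107, 20101107, 20101108]
product_key = [106, 103, 109, 103, 106, 106]

encoded_dates = run_length_encode(order_date_key)
encoded_products = run_length_encode(product_key)

print(len(order_date_key), "values ->", len(encoded_dates), "runs")
print(len(product_key), "values ->", len(encoded_products), "runs")
```

The OrderDateKey segment has long runs of repeated values and shrinks to far fewer entries than the ProductKey segment, whose values change on almost every row.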


Fetches only needed columns from disk
• Less IO
• Better buffer hit rates

SELECT region, sum (sales) …

[Figure: columns C1 to C6; only the columns the query needs are read]

EXAMPLE OF FETCHING ONLY NEEDED COLUMNS

SELECT ProductKey, SUM (SalesAmount) FROM SalesTable WHERE OrderDateKey < 20101108 GROUP BY ProductKey

[Figure: the twelve segments of the example table; the query reads only the OrderDateKey, ProductKey, and SalesAmount segments, not StoreKey, RegionKey, or Quantity]
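The effect of reading only the needed columns can be sketched as follows. This is a simplified model, not how SQL Server executes the query: the dictionary layout and the function `sales_by_product` are hypothetical, and the aggregation is assumed to be grouped by ProductKey.

```python
from collections import defaultdict

# Only the three columns the query references are loaded, as two row groups each.
# StoreKey, RegionKey, and Quantity are deliberately absent: the query never reads them.
needed_columns = {
    "OrderDateKey": [
        [20101107, 20101107, 20101107, 20101107, 20101107, 20101108],
        [20101108, 20101108, 20101108, 20101109, 20101109, 20101109],
    ],
    "ProductKey": [
        [106, 103, 109, 103, 106, 106],
        [102, 106, 109, 106, 106, 103],
    ],
    "SalesAmount": [
        [30.0, 17.0, 20.0, 17.0, 20.0, 25.0],
        [14.0, 25.0, 10.0, 20.0, 25.0, 17.0],
    ],
}


def sales_by_product(columns, date_limit):
    """Sum SalesAmount per ProductKey for rows with OrderDateKey < date_limit."""
    totals = defaultdict(float)
    row_groups = zip(columns["OrderDateKey"],
                     columns["ProductKey"],
                     columns["SalesAmount"])
    for dates, products, amounts in row_groups:
        for order_date, product, amount in zip(dates, products, amounts):
            if order_date < date_limit:
                totals[product] += amount
    return dict(totals)


result = sales_by_product(needed_columns, date_limit=20101108)
print(result)
```

Half of the table's columns are never touched, which is where the reduction in IO comes from.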


Advanced query-processing technology
• Batch-mode execution of some operations
• Processes column data in batches
• Groups of batch operations in query plan
• Compact data representation
• Highly efficient algorithms
• Better parallelism

WHAT IS COLUMNSTORE INDEX?

• ColumnStore (CS) index is nonclustered (secondary)
• Base table can be clustered index or heap
• One CS index per table
• Multiple other nonclustered (B-tree) indexes allowed
  - But may not be needed
• CS index must be partition-aligned if table is partitioned
• No CS index on indexed view
• No CS as filtered index

[Figure: base table (clustered index OR heap) with two nonclustered indexes and a nonclustered ColumnStore index]

WHAT IS COLUMNSTORE INDEX? RESTRICTIONS

Data types
• Only on common business data types
• int, real, string, money, datetime, decimal <= 18 digits

Index creation
• 1024 columns
• NC index only

Maintain table
• Limited operations
• Can read but cannot update the data
• Can switch partitions in and out

Process queries
• All read-only T-SQL queries run
• Some queries are accelerated more than others

HOW DO I USE IT? INDEX CREATION

Create a ColumnStore index
• Create the table
• Load data into the table
• Create a non-clustered ColumnStore index on all, or some, columns

CREATE NONCLUSTERED COLUMNSTORE INDEX ncci ON myTable(OrderDate, ProductID, SaleAmount)

[Screenshot: Object Explorer]

HOW DO I USE IT? RUNNING QUERIES

Let the query optimizer do the work

Optimizer makes a cost-based decision

Data access method
• Columnstore index
• Clustered (row-based) index
• Nonclustered (row-based) index
• Heap

Processing mode
• Batch mode
• Row mode

HOW DO I USE IT? MEMORY MANAGEMENT

• Memory management is automatic
• ColumnStore is persisted on disk
• Needed columns fetched into memory
• ColumnStore segments flow between disk and memory

SELECT C2, SUM(C4) FROM T GROUP BY C2;

[Figure: segments of T.C1 through T.C4 on disk; the T.C2 and T.C4 segments the query needs are fetched into memory]


Using Index Hints with ColumnStore Queries

Use the ColumnStore index
select distinct (SalesTerritoryKey) from dbo.FactResellerSales with (index (ncci))

Use a different index
select distinct (SalesTerritoryKey) from dbo.FactResellerSales with (index (ci))

Ignore ColumnStore index
select distinct (SalesTerritoryKey) from dbo.FactResellerSales option (IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX)

RUNNING QUERIES DEMO

HOW TO ADD DATA TO A TABLE WITH A COLUMNSTORE INDEX

Method 1: Disable the ColumnStore index

Disable (or drop) the index
• ALTER INDEX my_index ON MyTable DISABLE

Update the table

Rebuild the columnstore index
• ALTER INDEX my_index ON MyTable REBUILD


Method 2: Use Partitioning

Load new data into a staging table

Build a ColumnStore index
CREATE NONCLUSTERED COLUMNSTORE INDEX my_index ON StagingT(OrderDate, ProductID, SaleAmount)

Switch the partition into the table
ALTER TABLE StagingT SWITCH TO T PARTITION 5


Method 3: Union All

Build ColumnStore index on partitioned primary table

Create staging table with no ColumnStore index

Insert new data into (row-based) staging table

Query both tables; UNION ALL to combine results

Periodically build CS index on staging table
• Switch staging table into an empty partition of the primary table

TROUBLESHOOTING: CREATING THE COLUMNSTORE INDEX

• Memory
  - Ensure enough memory
  - Memory requirement related to number of columns, data, DOP
  - Memory available ≠ memory on the box with concurrent activity
  - By default, query is restricted to 25% even when Resource Governor (RG) is not enabled
  - Check Showplan XML for memory grant info
  - Rough estimate:

Memory grant request in MB = [(4.2 * Num of columns in the CS index) + 68] * DOP + (Num of string cols * 34)
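The rough estimate above can be worked through with a small helper. The function name `estimate_memory_grant_mb` and the sample inputs (10 columns, DOP 4, 2 string columns) are made up for illustration; only the formula itself comes from the slide.

```python
def estimate_memory_grant_mb(num_columns, dop, num_string_columns):
    """Rough estimate from the slide:
    [(4.2 * columns in the CS index) + 68] * DOP + (string columns * 34)
    """
    return (4.2 * num_columns + 68) * dop + num_string_columns * 34


# Hypothetical index: 10 columns, 2 of them strings, built with DOP 4.
estimate = estimate_memory_grant_mb(num_columns=10, dop=4, num_string_columns=2)
print(f"Estimated memory grant request: {estimate:.0f} MB")

# The same index built with DOP 1 asks for noticeably less memory.
serial_estimate = estimate_memory_grant_mb(num_columns=10, dop=1, num_string_columns=2)
print(f"With DOP 1: {serial_estimate:.0f} MB")
```

For these inputs the estimate is about 508 MB at DOP 4 and about 178 MB at DOP 1, which shows how strongly the degree of parallelism drives the request.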

TROUBLESHOOTING: CREATING THE COLUMNSTORE INDEX

Parallelism
• Index build is parallel only if table has > 1 M rows

Size and other info: check new catalog views
• sys.column_store_segments
• sys.column_store_dictionaries

TROUBLESHOOTING: QUERY PERFORMANCE

Is the ColumnStore index being used?
• All needed columns present?
• Cardinality estimate? If selective, optimizer will choose a B-Tree

Are other non-clustered indexes being used?
• Too many indexes + bad statistics can confuse the optimizer
• Consider using hints and/or disabling other indexes

Are there issues unrelated to the ColumnStore index?
• Sorts, spills, table spools?
• Is a lot of data being returned to the client?
• Not all bottlenecks are query processing


• Is batch mode being used to process most of the data?
• ColumnStore index?
• Outer joins?
• DOP?
• Loop join? Check cardinality estimate
• Other operators?
  o Batch-enabled:
    – Scan, filter, project
    – Local hash partial aggregation
    – Hash inner join, hash table build


Filters or joins on strings?
• Filters on strings are not pushed into storage engine
• Joins on integers are more efficient

Filter with “OR”?
• IN-lists but not OR filters pushed down

Hash tables don’t fit into memory?
• Usually due to small memory grant based on CE error, not physical memory limitation
• Fall back to row mode processing
• Slower than a row mode join

WHEN TO BUILD A COLUMNSTORE INDEX

Read-mostly workload

Most updates are appending new data

• Typically a nightly load window

Your workflow permits using partitioning (or drop rebuild index) to handle new data

Most queries fit a star join pattern or entail scanning and aggregating large amounts of data

Build a ColumnStore index on your large fact tables

Consider a ColumnStore index if you have a very large dimension table

WHEN NOT TO BUILD A COLUMNSTORE INDEX

Frequent updates

You need to update data and partition switching or rebuilding index does not fit your workflow

Frequent small lookup queries
• B-tree indexes may give better performance

Your workload does not benefit

BEST PRACTICES FOR CREATING A COLUMNSTORE INDEX

Use a star schema when possible
• Build CS index on fact tables
• Consider for large dimension tables

Include all the columns in the CS index
• Don’t use to seek into a row
• Order of listed columns not important

Convert decimal to precision <= 18 if possible

Use integer types whenever possible

BEST PRACTICES FOR CREATING A COLUMNSTORE INDEX

• Ensure enough memory to build the CS index
• Consider table partitioning to facilitate updates
• Consider modifying queries to hit the “sweet spot”
  - Star joins
  - Inner joins
  - Group By
• Keep statistics up to date
• Use MAXDOP > 1
• Check query plans for use of ColumnStore index and batch mode processing
• Consider creating the CS index from a clustered index
  - Better segment elimination when predicate on key
  - Slightly better compression (no RID)

CREATE THE CS INDEX FROM A CLUSTERED INDEX

CREATE TABLE T2 (TxDate DATE, CustId INT, ProdId INT, Amt FLOAT);

CREATE CLUSTERED INDEX ci ON T2 (TxDate, CustId);

CREATE NONCLUSTERED COLUMNSTORE INDEX ncci ON T2 (TxDate, CustId, ProdId, Amt);

SELECT CustId, sum(Amt) FROM T2 WHERE TxDate < '2011-01-15' GROUP BY CustId;

[Figure: min/max metadata of the segments in three row groups]
TxDate segments:
• Min 2011-01-01, Max 2011-01-25
• Min 2011-01-26, Max 2011-02-14
• Min 2011-02-14, Max 2011-03-02
Remaining segments (column assignment not recoverable from the figure):
• Min 1, Max 415
• Min 5, Max 378
• Min 19, Max 392
• Min 18, Max 230
• Min 10.65, Max 88.62
• Min 165, Max 400
• Min 8, Max 258
• Min 22.63, Max 120.41
• Min 5.95, Max 96.25


[Figure: same table, indexes, query, and segment min/max metadata as on the previous slide]

Only need to read 3 segments
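The "3 segments" figure can be reproduced with a small sketch of segment elimination. It uses only the TxDate min/max values shown on the slide; the function `row_groups_to_scan` is hypothetical and models the idea, not the storage engine's actual implementation.

```python
from datetime import date

# Min/max of the TxDate segment in each of the three row groups (from the slide).
TXDATE_RANGES = [
    (date(2011, 1, 1), date(2011, 1, 25)),
    (date(2011, 1, 26), date(2011, 2, 14)),
    (date(2011, 2, 14), date(2011, 3, 2)),
]

# Columns referenced by: SELECT CustId, sum(Amt) ... WHERE TxDate < '2011-01-15'
QUERY_COLUMNS = ["TxDate", "CustId", "Amt"]


def row_groups_to_scan(ranges, upper_bound):
    """Keep row groups whose minimum TxDate could satisfy TxDate < upper_bound.

    A row group whose smallest value is already >= upper_bound cannot
    contain a matching row, so all of its segments can be skipped.
    """
    return [index for index, (low, _high) in enumerate(ranges)
            if low < upper_bound]


surviving = row_groups_to_scan(TXDATE_RANGES, upper_bound=date(2011, 1, 15))
segments_read = len(surviving) * len(QUERY_COLUMNS)

print("Row groups scanned:", surviving)
print("Segments read:", segments_read)
```

Only the first row group survives the check, and the query touches three columns, so three segments are read. Because the clustered index orders rows by TxDate, the TxDate ranges barely overlap, which is what makes the elimination effective for predicates on the key.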


What if the query is:

SELECT CustId, sum(Amt) FROM T2 WHERE ProdId IN (365, 385, 391, 393) GROUP BY CustId;

Or

SELECT CustId, ProdId, sum(Amt) FROM T2 WHERE CustId BETWEEN 50 AND 100 GROUP BY CustId, ProdId;

[Figure: same segment min/max metadata as on the previous slides]

SUMMARY: COLUMNSTORE IN A NUTSHELL

ColumnStore technology + advanced query processing
• Astonishing speedup for DW queries
• Great compression
