1 2 3 the result is all sales territory country

Post on 04-Jan-2016

243 Views

Category:

Documents

5 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Querying in DAX

Marco Russo

DBI-B315

How Many of You…

Have used MDX on a solution?Love MDX? Well… at least like it! Understand cell-by-cell versus block mode evaluation?Can write a medium complex MDX query in a textbox?Understand the FREEZE statement?Believe that MDX is utterly complicated?

Why Another Language?SQL Server 2012 Analysis Services

Multidimensional: MDXTabular: new engineRequires a new languageDAX is the answer

DAX is designed to be SimpleFastNot necessarily easy

Evolution of DAXPowerPivot 2010

Introduction of DAX as a programming languageCalculated Columns, MeasuresFilter Context / Evaluation Context

Excel is the preferred clientMDX is the query language

PowerPivot 2012 / Analysis Services 2012DAX becomes a query languageSQL Server Profiler fires Query Plan eventsOptimization is now feasible, even for normal human beings!

Agenda• Basic DAX Query Language Features• Query Functions in DAX• Handling Complex Filters• Basic Information on the DAX Query Plan• SQL Server Profiler• Optimization Examples

DAX Query Syntax

DAX as a query language

Only one statement

EVALUATEFollowed by any DAX table expressionReturns the result as a dataset

Looks simple!

EVALUATE syntax[DEFINE

{ MEASURE <table>[<col>] = <expression> }]

EVALUATE <Table Expression>

[ORDER BY {<expression> [{ASC | DESC}]} [, …]

[START AT {<value>|<parameter>} [, …]] ]

EVALUATE exampleEVALUATE

'Sales Territory'ORDER BY 'Sales Territory'[Sales Territory Group] ASC, 'Sales Territory'[Sales Territory Country] ASC, 'Sales Territory'[Sales Territory Region]START AT

"North America", "United States"

EVALUATE and FILTEREVALUATE

FILTER ( FILTER ( 'Internet Sales', [Ship Date] > [Due Date] ), 'Internet Sales'[Unit Price] > 2 )

CALCULATETABLEEVALUATE

CALCULATETABLE ( 'Sales Territory', [Sales Territory Country] = "United States"

)

ORDER BY 'Sales Territory'[Sales Territory Country]

CALCULATE evaluation orderEVALUATE

CALCULATETABLE ( CALCULATETABLE ( 'Sales Territory', ALL ( [Sales Territory Country] ) ), [Sales Territory Country] = "United States" )

1

2

3

The result is ALL Sales Territory Country

ADDCOLUMNSEVALUATE

ADDCOLUMNS ( 'Internet Sales', "Margin", ([Sales Amount] - [Total Product Cost]))

SUMMARIZEEVALUATE

SUMMARIZE( 'Internet Sales', 'Internet Sales'[Order Date], "Sales Amount", SUM ( 'Internet Sales'[Sales Amount] ) )

Source Table

GROUP BY

Expression to evaluate

SUMMARIZE with ROLLUPEVALUATE

SUMMARIZE( 'Internet Sales', ROLLUP ( 'Date'[Calendar Year] ), "Sales Amount", SUM('Internet Sales'[Sales Amount]) )

Compute sub-totals at the Year

level

CROSSJOINEVALUATE

ADDCOLUMNS ( CROSSJOIN ( DISTINCT ( 'Product'[Color] ), DISTINCT ( 'Product'[Size] ) ), "Products", COUNTROWS ( RELATEDTABLE ( Product ) ) )

Query measuresDEFINE MEASURE 'Product Category'[Subcategories Count] = COUNTROWS ( RELATEDTABLE ('Product Subcategory' ) ) MEASURE 'Product Category'[Products Count] = COUNTROWS ( RELATEDTABLE ('Product' ) )EVALUATE ADDCOLUMNS ( 'Product Category', "SubCategories", [Subcategories Count], "Products Count", 'Product Category'[Products Count] )

Query measures are fast!MultidimensionalCalculated members defined in the query can be very slowLimitations in the cache usage

TabularNo performance lossCache always active, no query vs global cacheIf Formula Engine is involved, no parallelism happens

Very useful during development

GENERATE and a complex exampleEVALUATE GENERATE ( VALUES ( 'Product Category'[Product Category Name] ), FILTER ( CALCULATETABLE ( ADDCOLUMNS ( TOPN ( 3, VALUES ( 'Product'[Product Name] ), [Internet Total Sales] ), "Sales", [Internet Total Sales] ) ), [Sales] > 0 ) )

Querying Tabular – First recap

DAX as a query languageVery simple: one single instruction makes everything: EVALUATEA query is a function call that returns a tableAny DAX function can be used in the queryQuery-defined measures work very well

At this point, we know DAX as a query language…

Optimizing DAX

Optimizing DAX is important

Data ModelTabular is a simple data modelNot many options to optimize the modelDAX is the key to gain optimal performances

Query PlanMultidimensional has no query plansTabular makes the query plan visible

DAX query architecture

Analysis Services 2012Tabular Model

In-Memory Mode VertiPaq StorageQuery

Storage Engine Query

DAX / MDX query

DirectQuery Mode External Data Sources

Query SQL Query

Process = Read Data from External Sources

Query engines

FE - Formula EngineStill single threadedHandles Complex Expressions

SE - Storage EngineWhat?Storage Engine in an in-memory database?Yes, we like to call it this way! xVelocity in-memory Analytics Engineaka VertiPaq

Optimizing means using the SE and avoiding FE

Microsoft view of Tabular: Rich & Fast DAX VertiPaq Query

Rich

Single thread per query

Designed for expressivity

Simple

One core per segment

Optimized for speed

Formula engine

Formula engine Vertipaq

28

DAX query evaluation flowBuild DAX Expression Tree

Buld DAX Logical Plan

Simplify DAX Logical Plan

Build DAX Physical Plan

Execute DAX Physical Plan

Some Vertipaq queries can be executedin order to gather additional informationabout data that is to be queried

Fire Logical Plan Event

Fire Physical Plan Event

SQL Server Profiler

Catches events from SSASQueries Events Query Processing

DAX Studio – daxstudio.codeplex.comFuture release will capture Dax Query Plan

Monitoring a queryEVALUATE

CALCULATETABLE ( SUMMARIZE ( 'Internet Sales', Geography[State Province Code], "Sales", SUM ( 'Internet Sales'[Sales Amount] ) ), FILTER ( Customer, Customer[Last Name] = "Anand" ))

1° VertiPaq querySELECT [Customer].[CustomerKey]FROM [Customer]WHERE

( PFDATAID( [Customer].[LastName] ) = 81 )

2° VertiPaq querySELECT [Geography].[StateProvinceCode] FROM [Internet Sales]

LEFT OUTER JOIN [Customer] ON [Internet Sales].[CustomerKey] = [Customer].[CustomerKey]

LEFT OUTER JOIN [Geography] ON [Customer].[GeographyKey] = [Geography].[GeographyKey]WHERE

[Customer].[CustomerKey] IN (11096, 11989, 17005, 22513, 28899, 15054, 19626, 20344, 25918, 27141... [74 total values, not all displayed]);

DAX logical query planCalculateTable AddColumns Scan_VertiPaq GroupBy_VertiPaq Scan_VertiPaq Sum_VertiPaq Scan_VertiPaq 'Internet Sales'[Sales Amount] Filter_VertiPaq Scan_VertiPaq 'Customer'[Last Name] = Anand

Query plans side by side

CalculateTable AddColumns Scan_VertiPaq GroupBy_VertiPaq Scan_VertiPaq Sum_VertiPaq Scan_VertiPaq [Sales Amount] Filter_VertiPaq Scan_VertiPaq 'Customer'[Last Name] = Anand

EVALUATECALCULATETABLE( SUMMARIZE( 'Internet Sales', Geography[State Province Code], "Sales", SUM( [Sales Amount] ) ), FILTER( Customer, Customer[Last Name] = "Anand" ))

Query running – Step 1SELECT [Geography].[StateProvinceCode], SUM([Internet Sales].[SalesAmount])FROM [Internet Sales]

LEFT OUTER JOIN [Customer] ON [Internet Sales].[CustomerKey]=[Customer].[CustomerKey]

LEFT OUTER JOIN [Geography] ON [Customer].[GeographyKey]=[Geography].[GeographyKey]WHERE

[Customer].[CustomerKey] IN (11096, 11989, ...[74 total values, not all displayed]) VAND

[Geography].[StateProvinceCode] IN ('VIC', 'BC', ...[21 total values, not all displayed]);

Query running – Step 2SELECT [Geography].[StateProvinceCode] FROM [Internet Sales]

LEFT OUTER JOIN [Customer] ON [Internet Sales].[CustomerKey] = [Customer].[CustomerKey]

LEFT OUTER JOIN [Geography] ON [Customer].[GeographyKey] = [Geography].[GeographyKey]WHERE

[Customer].[CustomerKey] IN (11096, 11989, 17005, 22513, 28899, 15054, 19626, 20344, 25918, 27141... [74 total values, not all displayed]);

Simple query plan4 VertiPaq Queries

2 before execution2 during execution

1 performed GROUPBY and SUM in VertiPaq1 hit the cache

Intermediate results are spooledFinal JOIN performed by Formula Engine

This is a very good query plan

Place extended columns in ADDCOLUMNSSlower query

EVALUATESUMMARIZE ( Product, Product[Product Category Name], "Products", COUNTROWS ( Product ))

Optimized queryEVALUATEADDCOLUMNS ( SUMMARIZE ( Product, Product[Product Category Name] ), "Products", CALCULATE( COUNTROWS( Product ) ))

Always put extended columns in ADDCOLUMNS parameters.Just use SUMMARIZE for its group by and when the evaluation context ofSUMMARIZE is required (i.e. when a TOPN has to be executed for each group)http://www.sqlbi.com/articles/best-practices-using-summarize-and-addcolumns/

Currency conversion SUMX ( DimCurrency, SUMX ( DimDate, CALCULATE ( VALUES ( CurrencyRate[AverageRate] ) * SUM ( FactInternetSales[SalesAmount] ) ) ))

Currency conversion optimizedSUMX( CROSSJOIN( DimCurrency, DimDate ), CALCULATE( VALUES( CurrencyRate[AverageRate] ) * SUM( FactInternetSales[SalesAmount] ) ))

Using one SUMX, the optimizer pushes the

calculation in VertiPaq

Slow measures and filtersDEFINE MEASURE 'Internet Sales'[Sales] = CALCULATE (ROUND (SUM ('Internet Sales'[Sales Amount] ), 0 ) ) MEASURE 'Internet Sales'[YTD Sales] = TOTALYTD ( [Sales] , 'Date'[Date] ) MEASURE 'Internet Sales'[QTD Sales] = TOTALQTD ( [Sales] , 'Date'[Date] )EVALUATE

FILTER ( ADDCOLUMNS ( CROSSJOIN ( VALUES ( 'Date'[Calendar Year] ), VALUES ( 'Date'[Month] ), VALUES ( 'Date'[Month Name] ) ), "Sales", [Sales], "YTD Sales", [YTD Sales], "QTD Sales", [QTD Sales] ), NOT ISBLANK( [Sales] ))ORDER BY 'Date'[Calendar Year], 'Date'[Month]

These two measures are slow to compute

The filter is applied after computation

Filter as soon as you canDEFINE MEASURE 'Internet Sales'[Sales] = CALCULATE (ROUND (SUM ('Internet Sales'[Sales Amount] ), 0 ) ) MEASURE 'Internet Sales'[YTD Sales] = TOTALYTD ( [Sales] , 'Date'[Date] ) MEASURE 'Internet Sales'[QTD Sales] = TOTALQTD ( [Sales] , 'Date'[Date] )EVALUATE

ADDCOLUMNS ( FILTER ( CROSSJOIN ( VALUES ( 'Date'[Calendar Year] ), VALUES ( 'Date'[Month] ), VALUES ( 'Date'[Month Name] ) ), NOT ISBLANK ( [Sales] ) ), "Sales", [Sales], "YTD Sales", [YTD Sales], "QTD Sales", [QTD Sales])ORDER BY 'Date'[Calendar Year], 'Date'[Month]44

This query runs 7 times faster, even if it computes [Sales]

twice

Joins are expensiveEach join in the data model has a cost

But… can you remove joins?From snowflakesFrom lookup tablesAnd most important: from junk dimensions!

Don’t build junk dimensions, store strings in fact table!

Cache usageDAX Query PlansCache all VertiPaq Query Results

MDX QueriesIdentical to SSAS MultidimensionalCaches FE calculated members

Using MDX to query TabularBoth caches work

Using DAX to query TabularOnly VertiPaq results are cached

If you heavily hit Formula Engine, the

cache will not be really useful.

General rulesUse the simplest formulasPrefer ADDCOLUMNS to SUMMARIZEAvoid complex calculations in measuresDon’t use error-handling functions extensivelyThey are sloooooowGood only in measures

Build a correct data modelReducing distinct count of values for columns is the main targetKeep it simple, relational, clear

Push calculations down to the VertiPaq engineNot an easy task to do looking at query plansBut, after some experience, you will learn it

ConclusionsDAX is a simple languageVery effective in expressing complex queriesLooks strange, at the beginning, but it is easier than MDX

Optimizing Tabular means optimizing DAXOptimizing DAX means pushing calculations to VertiPaqSaid in other words… trust the Rain Man, he’s your best friend!

Sometimes, you will change the data model for this

DAX is simple, it is not easy…but this is the fun part of it!

msdn

Resources for Developers

http://microsoft.com/msdn

Learning

Microsoft Certification & Training Resources

www.microsoft.com/learning

TechNet

Resources

Sessions on Demand

http://channel9.msdn.com/Events/TechEd

Resources for IT Professionals

http://microsoft.com/technet

Complete an evaluation on CommNet and enter to win!

Evaluate this session

Scan this QR code to evaluate this session and be automatically entered in a drawing to win a prize

© 2013 Microsoft Corporation. All rights reserved. Microsoft, Windows and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

top related