microsoft sql server analysis services multidimensional

201

Upload: tiago-henrique-ribeiro-ferreira

Post on 26-Jan-2015

120 views

Category:

Technology


1 download

DESCRIPTION

Microsoft SQL Server Analysis Services Multidimensional

TRANSCRIPT

  • 1. Microsoft SQL Server Analysis Services Multidimensional Performance and Operations Guide Thomas Kejser and Denny Lee Contributors and Technical Reviewers: Peter Adshead (UBS), T.K. Anand, KaganArca, Andrew Calvett (UBS), Brad Daniels, John Desch, Marius Dumitru, WillfriedFrber (Trivadis), Alberto Ferrari (SQLBI), Marcel Franke (pmOne), Greg Galloway (Artis Consulting), Darren Gosbell (James & Monroe), DaeSeong Han, Siva Harinath, Thomas Ivarsson (Sigma AB), Alejandro Leguizamo (SolidQ), Alexei Khalyako, Edward Melomed, AkshaiMirchandani, Sanjay Nayyar (IM Group), TomislavPiasevoli, Carl Rabeler (SolidQ), Marco Russo (SQLBI), Ashvini Sharma, Didier Simon, John Sirmon, Richard Tkachuk, Andrea Uggetti, Elizabeth Vitt, Mike Vovchik, Christopher Webb (Crossjoin Consulting), SedatYogurtcuoglu, Anne Zorner Summary: Download this book to learn about Analysis Services Multidimensional performance tuning from an operational and development perspective. This book consolidates the previously published SQL Server 2008 R2 Analysis Services Operations Guide and SQL Server 2008 R2 Analysis Services Performance Guide into a single publication that you can view on portable devices. Category: Guide Applies to: SQL Server 2005, SQL Server 2008, SQL Server 2008 R2, SQL Server 2012 Source: White paper (link to source content, link to source content) E-book publication date: May 2012 200 pages

2. This page intentionally left blank 3. Copyright 2012 by Microsoft Corporation All rights reserved. No part of the contents of this book may be reproduced or transmitted in any form or by any means without the written permission of the publisher. Microsoft and the trademarks listed at http://www.microsoft.com/about/legal/en/us/IntellectualProperty/Trademarks/EN-US.aspx are trademarks of the Microsoft group of companies. All other marks are property of their respective owners. The example companies, organizations, products, domain names, email addresses, logos, people, places, and events depicted herein are fictitious. No association with any real company, organization, product, domain name, email address, logo, person, place, or event is intended or should be inferred. This book expresses the authors views and opinions. The information contained in this book is provided without any express, statutory, or implied warranties. Neither the authors, Microsoft Corporation, nor its resellers, or distributors will be held liable for any damages caused or alleged to be caused either directly or indirectly by this book. 4. 4 Contents 1 Introduction..........................................................................................................................................5 2 Part1:BuildingaHighPerformanceCube...........................................................................................6 2.1 DesignPatternsforScalableCubes...........................................................................................6 2.2 TestingAnalysisServicesCubes..............................................................................................32 2.3 TuningQueryPerformance.....................................................................................................39 2.4 TuningProcessingPerformance..............................................................................................76 2.5 SpecialConsiderations............................................................................................................93 3 Part2:RunningaCubeinProduction...............................................................................................105 3.1 ConfiguringtheServer...........................................................................................................106 3.2 MonitoringandTuningtheServer........................................................................................133 3.3 SecurityandAuditing............................................................................................................143 3.4 HighAvailabilityandDisasterRecovery................................................................................147 3.5 DiagnosingandOptimizing....................................................................................................150 3.6 ServerMaintenance..............................................................................................................189 3.7 SpecialConsiderations..........................................................................................................192 4 Conclusion.........................................................................................................................................200 Sendfeedback...........................................................................................................................................200 5. 5 1 Introduction ThisbookconsolidatestwopreviouslypublishedguidesintooneessentialresourceforAnalysisServices developersandoperationspersonnel.AlthoughthetitlesoftheoriginalpublicationsindicateSQLServer 2008R2,mostoftheknowledgethatyougainfromthisbookiseasilytransferredtootherversionsof AnalysisServices,includingmultidimensionalmodelsbuiltusingSQLServer2012. Part1isfromtheSQLServer2008R2AnalysisServicesPerformanceGuide.PublishedinOctober 2011,thisguidewascreatedfordevelopersandcubedesignerswhowanttobuildhighperformance cubesusingbestpracticesandinsightslearnedfromrealworlddevelopmentprojects.InPart1,youll learnproventechniquesforbuildingsolutionsthatarefastertoprocessandquery,minimizingtheneed forfurthertuningdowntheroad. Part2isfromtheSQLServer2008R2AnalysisServicesOperationsGuide.Thisguide,publishedinJune 2011,isintendedfordevelopersandoperationsspecialistswhomanagesolutionsthatarealreadyin production.Part2showsyouhowtoextractperformancegainsfromaproductioncube,including changingserverandsystemproperties,andperformingsystemmaintenancethathelpyouavoid problemsbeforetheystart. Whileeachguidetargetsadifferentpartofasolutionlifecycle,havingbothinasingleportableformat givesyouanintellectualtoolkitthatyoucanaccessonmobiledeviceswhereveryoumaybe.Wehope youfindthisbookhelpfulandeasytouse,butitisonlyoneofseveralformatsavailableforthiscontent. YoucanalsogetprintableversionsofbothguidesbydownloadingthemfromtheMicrosoftwebsite. 6. 6 2 Part 1: Building a High-Performance Cube ThissectionprovidesinformationaboutbuildingandtuningAnalysisServicescubesforthebestpossible performance.Itisprimarilyaimedatbusinessintelligence(BI)developerswhoarebuildinganewcube fromscratchoroptimizinganexistingcubeforbetterperformance. Thegoalofthissectionistoprovideyouwiththenecessarybackgroundtounderstanddesigntradeoffs andwithtechniquesanddesignpatternsthatwillhelpyouachievethebestpossibleperformanceof evenlargecubes. Cubeperformancecanbedividedintotwotypesofworkload:queryperformanceandprocessing performance.Becausetheseworkloadsareverydifferent,thissectionisorganizedintofourmain groups. DesignPatternsforScalableCubesNoamountofquerytuningandoptimizationcanbeatthebenefits ofawelldesigneddatamodel.Thissectioncontainsguidancetohelpyougetthedesignrightthefirst time.Ingeneral,goodcubedesignfollowsKimballmodelingtechniques,andifyouavoidsometypical designmistakes,youareinverygoodshape. TestingAnalysisServicesCubesIneveryITproject,preproductiontestingisacrucialpartofthe developmentanddeploymentcycle.Evenwiththemostcarefuldesign,testingwillstillbeabletoshake outerrorsandavoidproductionissues.Designingandrunningatestrunofanenterprisecubeistime wellinvested.Hence,thissectionincludesadescriptionofthetestmethodsavailabletoyou. TuningQueryPerformanceQueryperformancedirectlyimpactsthequalityoftheenduser experience.Assuch,itistheprimarybenchmarkusedtoevaluatethesuccessofanonlineanalytical processing(OLAP)implementation.AnalysisServicesprovidesavarietyofmechanismstoaccelerate queryperformance,includingaggregations,caching,andindexeddataretrieval.Thissectionalso providesguidanceonwritingefficientMultidimensionalExpressions(MDX)calculationscripts. TuningProcessingPerformanceProcessingistheoperationthatrefreshesdatainanAnalysisServices database.Thefastertheprocessingperformance,thesooneruserscanaccessrefresheddata.Analysis Servicesprovidesavarietyofmechanismsthatyoucanusetoinfluenceprocessingperformance, includingparallelizedprocessingdesigns,relationaltuning,andaneconomicalprocessingstrategy(for example,incrementalversusfullrefreshversusproactivecaching). SpecialConsiderationsSomefeaturesofAnalysisServicessuchasdistinctcountmeasuresandmany tomanydimensionsrequiremorecarefulattentiontothecubedesignthanothers.AttheendofPart1, youwillfindasectionthatdescribesthespecialtechniquesyoushouldapplywhenusingthesefeatures. 2.1 Design Patterns for Scalable Cubes CubespresentauniquechallengetotheBIdeveloper:theyareadhocdatabasesthatareexpectedto respondtomostqueriesinshorttime.Thefreedomoftheenduserislimitedonlybythedatamodel youimplement.Achievingabalancebetweenuserfreedomandscalabledesignwilldeterminethe 7. 7 successofacube.Eachindustryhasspecificdesignpatternsthatlendthemselveswelltovalueadding reportingandadetailedtreatmentofoptimal,industryspecificdatamodelisoutsidethescopeofthis book.However,therearealotofcommondesignpatternsyoucanapplyacrossallindustriesthis sectiondealswiththesepatternsandhowyoucanleveragethemforincreasedscalabilityinyourcube design. 2.1.1 Building Optimal Dimensions AwelltuneddimensiondesignisoneofthemostcriticalsuccessfactorsofahighperformingAnalysis Servicessolution.Thedimensionsofthecubearethefirststopfordataanalysisandtheirdesignhasa deepimpactontheperformanceofallmeasuresinthecube. Dimensionsarecomposedofattributes,whicharerelatedtoeachotherthroughhierarchies.Efficient useofattributesisakeydesignskilltomaster,andstudyingandimplementingtheattribute relationshipsavailableinthebusinessmodelcanhelpimprovecubeperformance. Inthissection,youwillfindguidanceonbuildingoptimizeddimensionsandproperlyusingboth attributesandhierarchies. 2.1.1.1 Using the KeyColumns, ValueColumn, and NameColumn Properties Effectively Whenyouaddanewattributetoadimension,threepropertiesareusedtodefinetheattribute.The KeyColumnspropertyspecifiesoneormoresourcefieldsthatuniquelyidentifyeachinstanceofthe attribute. TheNameColumnpropertyspecifiesthesourcefieldthatwillbedisplayedtoendusers.Ifyoudonot specifyavaluefortheNameColumnproperty,itisautomaticallysettothevalueoftheKeyColumns property. ValueColumnallowsyoutocarryfurtherinformationabouttheattributetypicallyusedfor calculations.Unlikememberproperties,thispropertyofanattributeisstronglytypedproviding increasedperformancewhenitisusedincalculations.Thecontentsofthispropertycanbeaccessed throughtheMemberValueMDXfunction. UsingbothValueColumnandNameColumntocarryinformationeliminatestheneedforextraneous attributes.Thisreducesthetotalnumberofattributesinyourdesign,makingitmoreefficient. Itisabestpracticetoassignanumericsourcefield,ifavailable,totheKeyColumnspropertyratherthan astringproperty.Furthermore,useasinglecolumnkeyinsteadofacomposite,multicolumnkey.Not onlydothesepracticesthisreduceprocessingtime,theyalsoreducethesizeofthedimensionandthe likelihoodofusererrors.Thisisespeciallytrueforattributesthathavealargenumberofmembers,that is,greaterthanonemillionmembers. 8. 8 2.1.1.2 Hiding Attribute Hierarchies Formanydimensions,youwillwanttheusertonavigatehierarchiescreatedforeaseofaccess.For example,acustomerdimensioncouldbenavigatedbydrillingintocountryandcitybeforereachingthe customername,orbydrillingthroughagegroupsorincomelevels.Suchhierarchies,coveredinmore detaillater,makenavigationofthecubeeasierandmakequeriesmoreefficient. Inadditiontouserhierarchies,AnalysisServicesbydefaultcreatesaflathierarchyforeveryattributein adimensiontheseareattributehierarchies.Hidingattributehierarchiesisoftenagoodidea,because alotofhierarchiesinasingledimensionwilltypicallyconfuseusersandmakeclientqueriesless efficient.ConsidersettingAttributeHierarchyVisible=falseformostattributehierarchiesanduseuser hierarchiesinstead. 2.1.1.2.1 Hiding the Surrogate Key Itisoftenagoodideatohidethesurrogatekeyattributeinthedimension.Ifyouexposethesurrogate keytotheclienttoolsasaValueColumn,thosetoolsmayrefertothekeyvaluesinreports.The surrogatekeyinaKimballstarschemadesignholdsnobusinessinformation,andmayevenchangeif youremodeltype2history.Afteryoucreateadependencytothekeyintheclienttools,youcannot changethekeywithoutbreakingreports.Becauseofthis,youdontwantenduserreportsreferringto thesurrogatekeydirectlyandthisiswhywerecommendhidingit. Thebestdesignforasurrogatekeyistohideitfromusersinthedimensiondesignbysettingthe AttributeHierarchyVisible=falseandbynotincludingtheattributeinanyuserhierarchies.This preventsendusertoolsfromreferencingthesurrogatekey,leavingyoufreetochangethekeyvalueif requirementschange. 2.1.1.3 Setting or Disabling Ordering of Attributes Inmostcases,youwantanattributetohaveanexplicitordering.Forexample,youwillwantaCity attributetobesortedalphabetically.YoushouldexplicitlysettheOrderByorOrderByAttribute propertyoftheattributetoexplicitlycontrolthisordering.Typically,thisorderingisbyattributename orkey,butitmayalsobeanotherattribute.Ifyouincludeanattributeonlyforthepurposeofordering anotherattribute,makesureyousetAttributeHierarchyEnabled=falseand AttributeHierarchyOptimizedState=NotOptimizedtosaveonprocessingoperations. Therearefewcaseswhereyoudontcareabouttheorderingofanattribute,yetthesurrogatekeyis onesuchcase.Forsuchhiddenattributethatyouusedonlyforimplementationpurposes,youcanset AttributeHierarchyOrdered=falsetosavetimeduringprocessingofthedimension. 2.1.1.4 Setting Default Attribute Members Anyquerythatdoesnotexplicitlyreferenceahierarchywillusethecurrentmemberofthat hierarchy.ThedefaultbehaviorofAnalysisServicesistoassigntheAllmemberofadimensionasthe defaultmember,whichisnormallythedesiredbehavior.Butforsomeattributes,suchasthecurrent 9. 9 dayinadatedimension,itsometimesmakessensetoexplicitlyassignadefaultmember.Forexample, youmaysetadefaultdateintheAdventureWorkscubelikethis. ALTERCUBE [Adventure Works]UPDATE DIMENSION [Date], DEFAULT_MEMBER='[Date].[Date].&[2000]' However,defaultmembersmaycauseissuesintheclienttool.Forexample,MicrosoftExcel2010will notprovideavisualindicationthatadefaultmemberiscurrentlyselectedandhenceimplicitlyinfluence thequeryresult.ThismayconfuseuserswhoexpecttheAllleveltobethecurrentmemberwhenno othermembersareimpliedbythequery.Also,ifyousetadefaultmemberinadimensionwithmultiple hierarchies,youwilltypicallygetresultsthatarehardforuserstointerpret. Ingeneral,preferexplicitlydefaultmembersonlyondimensionswithsinglehierarchiesorinhierarchies thatdonothaveanAlllevel. 2.1.1.5 Removing the All Level MostdimensionsrolluptoacommonAlllevel,whichistheaggregationofalldescendants.Butthere aresomeexceptionswhereisdoesnotmakesensetoqueryattheAlllevel.Forexample,youmayhave acurrencydimensioninthecubeandaskingforthesumofallcurrenciesisameaninglessquestion. ItcanevenbeexpensivetoaskfortheAlllevelofdimensionifthereisnotgoodaggregatetorespondto thequery.Forexample,ifyouhaveacubepartitionedbycurrency,askingfortheAlllevelofcurrency willcauseascanofallpartitions,whichcouldbeexpensiveandleadtoauselessresult. InordertopreventusersfromqueryingmeaninglessAlllevels,youcandisabletheAllmemberina hierarchy.YoudothisbysettingtheIsAggregateable=falseontheattributeatthetopofthehierarchy. NotethatifyoudisabletheAlllevel,youshouldalsosetadefaultmemberasdescribedintheprevious sectionifyoudont,AnalysisServiceswillchooseoneforyou. 2.1.1.6 Identifying Attribute Relationships Attributerelationshipsdefinehierarchicaldependenciesbetweenattributes.Inotherwords,ifAhasa relatedattributeB,writtenAB,thereisonememberinBforeverymemberinA,andmanymembers inAforagivenmemberinB.Forexample,givenanattributerelationshipCityState,ifthecurrent cityisSeattle,weknowtheStatemustbeWashington. Often,therearerelationshipsbetweenattributesthatmightormightnotbemanifestedintheoriginal dimensiontablethatcanbeusedbytheAnalysisServicesenginetooptimizeperformance.Bydefault, allattributesarerelatedtothekey,andtheattributerelationshipdiagramrepresentsabushwhere relationshipsallstemfromthekeyattributeandendateachothersattribute. 10. 10 Figure 1 Youcano amodeln otherwor relationsh Figure 2 Attribute C sa A re A Considert attribute 11: Bushy a optimizeperfo nameidentifie rds,asingles hipsintheatt 22: Redefin relationships rossproducts avesCPUtime ggregationsb esourcesduri utoExistcan thecrosspro relationships attribute r ormancebyd estheproduc subcategoryis tributerelatio ned attribu shelpperform sbetweenlev eduringquer builtonattrib ngprocessing moreefficie oductbetwee havebeene elationship defininghiera ctlineandsu snotfoundin onshipeditor, ute relation manceinthre velsinthehie ries. butescanber gandforque ntlyeliminate nSubcategor xplicitlydefin ps rchicalrelatio bcategory,an nmorethano ,therelation nships eesignificant erarchydono reusedforqu eries. eattributeco ryandCatego ned,theengin onshipssupp ndthesubcat onecategory shipsareclea ways: otneedtogo ueriesonrelat ombinationst oryinthetwo nemustfirstf ortedbythe tegoryidentif .Ifyouredefi arer. throughthe tedattributes thatdonotex ofigures.Int findwhichpr data.Inthisc fiesacategor inethe keyattribute s.Thissaves xistinthedat hefirst,wher roductsarein case, ry.In .This ta. reno n 11. 11 eachsubcategoryandthendeterminewhichcategorieseachoftheseproductsbelongsto.Forlarge dimensions,thiscantakealongtime.Iftheattributerelationshipisdefined,theAnalysisServices engineknowsbeforehandwhichcategoryeachsubcategorybelongstoviaindexesbuiltatprocesstime. 2.1.1.6.1 Flexible vs. Rigid Relationships Whenanattributerelationshipisdefined,therelationcaneitherbeflexibleorrigid.Aflexibleattribute relationshipisonewherememberscanmovearoundduringdimensionupdates,andarigidattribute relationshipisonewherethememberrelationshipsareguaranteedtobefixed.Forexample,the relationshipbetweenmonthandyearisfixedbecauseaparticularmonthisntgoingtochangeitsyear whenthedimensionisreprocessed.However,therelationshipbetweencustomerandcitymaybe flexibleascustomersmove. Whenachangeisdetectedduringprocessinaflexiblerelationship,allindexesforpartitionsreferencing theaffecteddimension(includingtheindexesforattributethatarenotaffected)mustbeinvalidated. ThisisanexpensiveoperationandmaycauseProcessUpdateoperationstotakeaverylongtime. IndexesinvalidatedbychangesinflexiblerelationshipsmustberebuiltafteraProcessUpdateoperation withaProcessIndexontheaffectedpartitions;thisaddsevenmoretimetocubeprocessing. Flexiblerelationshipsarethedefaultsetting.Carefullyconsidertheadvantagesofrigidrelationshipsand changethedefaultwherethedesignallowsit. 2.1.1.7 Using Hierarchies Effectively AnalysisServicesenablesyoutobuildtwotypesofuserhierarchies:naturalandunnaturalhierarchies. Eachtypehasdifferentdesignandperformancecharacteristics. Inanaturalhierarchy,allattributesparticipatingaslevelsinthehierarchyhavedirectorindirect attributerelationshipsfromthebottomofthehierarchytothetopofthehierarchy. Inanunnaturalhierarchy,thehierarchyconsistsofatleasttwoconsecutivelevelsthathavenoattribute relationships.Typicallythesehierarchiesareusedtocreatedrilldownpathsofcommonlyviewed attributesthatdonotfollowanynaturalhierarchy.Forexample,usersmaywanttoviewahierarchyof GenderandEducation. Figure 33: Natural and unnatural hierarchies 12. 12 Fromaperformanceperspective,naturalhierarchiesbehaveverydifferentlythanunnaturalhierarchies do.Innaturalhierarchies,thehierarchytreeismaterializedondiskinhierarchystores.Inaddition,all attributesparticipatinginnaturalhierarchiesareautomaticallyconsideredtobeaggregationcandidates. Unnaturalhierarchiesarenotmaterializedondisk,andtheattributesparticipatinginunnatural hierarchiesarenotautomaticallyconsideredasaggregationcandidates.Rather,theysimplyprovide userswitheasytousedrilldownpathsforcommonlyviewedattributesthatdonothavenatural relationships.Byassemblingtheseattributesintohierarchies,youcanalsouseavarietyofMDX navigationfunctionstoeasilyperformcalculationslikepercentofparent. Totakeadvantageofnaturalhierarchies,definecascadingattributerelationshipsforallattributesthat participateinthehierarchy. 2.1.1.8 Turning Off the Attribute Hierarchy Memberpropertiesprovideadifferentmechanismtoexposedimensioninformation.Foragiven attribute,memberpropertiesareautomaticallycreatedforeverydirectattributerelationship.Forthe primarykeyattribute,thismeansthateveryattributethatisdirectlyrelatedtotheprimarykeyis availableasamemberpropertyoftheprimarykeyattribute. Ifyouonlywanttoaccessanattributeasmemberproperty,afteryouverifythatthecorrectrelationship isinplace,youcandisabletheattributeshierarchybysettingtheAttributeHierarchyEnabledproperty toFalse.Fromaprocessingperspective,disablingtheattributehierarchycanimproveperformanceand decreasecubesizebecausetheattributewillnolongerbeindexedoraggregated.Thiscanbeespecially usefulforhighcardinalityattributesthathaveaonetoonerelationshipwiththeprimarykey.High cardinalityattributessuchasphonenumbersandaddressestypicallydonotrequiresliceanddice analysis.Bydisablingthehierarchiesfortheseattributesandaccessingthemviamemberproperties, youcansaveprocessingtimeandreducecubesize. Decidingwhethertodisabletheattributeshierarchyrequiresthatyouconsiderboththequeryingand processingimpactsofusingmemberproperties.Memberpropertiescannotbeplacedonaqueryaxisin anMDXqueryinthesamemannerasattributehierarchiesanduserhierarchies.Toqueryamember property,youmustquerytheattributethatcontainsthatmemberproperty. Forexample,ifyourequiretheworkphonenumberforacustomer,youmustquerythepropertiesof customerandthenrequestthephonenumberproperty.Asaconvenience,mostfrontendtoolseasily displaymemberpropertiesintheiruserinterfaces. Ingeneral,filteringmeasuresusingmemberpropertiesisslowerthanfilteringusingattribute hierarchies,becausememberpropertiesarenotindexedanddonotparticipateinaggregations.The actualimpacttoqueryperformancedependsonhowyouusetheattribute. Forexample,ifyouruserswanttosliceanddicedatabybothaccountnumberandaccountdescription, fromaqueryingperspectiveyoumaybebetteroffhavingtheattributehierarchiesinplaceand removingthebitmapindexesifprocessingperformanceisanissue. 13. 13 2.1.1.9 Reference Dimensions Referencedimensionsallowyoutobuildadimensionalmodelontopofasnowflakerelationaldesign. Whilethisisapowerfulfeature,youshouldunderstandtheimplicationsofusingit. Bydefault,areferencedimensionisnonmaterialized.Thismeansthatquerieshavetoperformthejoin betweenthereferenceandtheouterdimensiontableatquerytime.Also,filtersdefinedonattributesin theouterdimensiontablearenotdrivenintothemeasuregroupwhenthebitmapstherearescanned. Thismayresultinreadingtoomuchdatafromdisktoansweruserqueries.Leavingadimensionasnon materializedprioritizesmodelingflexibilityoverqueryperformance.Considercarefullywhetheryoucan affordthistradeoff:cubesaretypicallyintendedtobefastadhocstructures,andputtingthe performanceburdenontheenduserisrarelyagoodidea. AnalysisServiceshastheabilitytomaterializethereferencesdimension.Whenyouenablethisoption, memoryanddiskstructuresarecreatedthatmakethedimensionbehavejustlikeadenormalizedstar schema.Thismeansthatyouwillretainalltheperformancebenefitsofaregular,nonreference dimension.However,becarefulwithmaterializedreferencedimensionifyourunaprocessupdateon theintermediatedimension,anychangesintherelationshipsbetweentheouterdimensionandthe referencewillnotbereflectedinthecube.Instead,theoriginalrelationshipbetweentheouter dimensionandthemeasuregroupisretainedwhichismostlikelynotthedesiredresult.Inaway,you canconsiderthereferencetabletobearigidrelationshiptoattributesintheouterattributes.Theonly waytoreflectchangesinthereferencetableistofullyprocessthedimension. 2.1.1.10 Fast-Changing Attributes Somedatamodelscontainattributesthatchangeveryfast.Dependingonwhichtypeofhistorytracking youneed,youmayfacedifferentchallenges. Type2FastChangingAttributesIfyoutrackeverychangetoafastchangingattribute,thismaycause thedimensioncontainingtheattributetogrowverylarge.Type2attributesaretypicallyaddedtoa dimensionwithaProcessAddcommand.Atsomepoint,runningProcessAddonalargedimensionand runningalltheconsistencycheckswilltakealongtime.Also,havingahugedimensionisunwieldy becauseuserswillhavetroublequeryingitandtheserverwillhavetroublekeepingitinmemory.A goodexampleofsuchamodelingchallengeistheageofacustomerthiswillchangeeveryyearand causethecustomerdimensiontogrowdramatically. Type1FastChangingAttributesEvenifyoudonottrackeverychangetotheattribute,youmaystill runintoissueswithfastchangingattributes.Toreflectachangeinthedatasourcetothecube,you havetorunProcessUpdateonthechangeddimension.Asthecubeanddimensiongrowslarger, runningProcessUpdatebecomesexpensive.Anexampleofsuchamodelingchallengeistotrackthe statusattributeofaserverinahostingenvironment(Running,Shutdown,Overloadedandsoon). Astatusattributelikethismaychangeseveraltimesperdayorevenperhour.Runningfrequent ProcessUpdatesonsuchadimensiontoreflectchangescanbeanexpensiveoperation,anditmaynot befeasiblewiththelockingimplementationofAnalysisServicesinaproductionenvironment. 14. 14 Inthefollowingsections,wewilllookatsomemodelingoptionsyoucanusetoaddresstheseproblems. 2.1.1.10.1Type 2 Fast-Changing Attributes Ifhistorytrackingisarequirementofafastchangingattribute,thebestoptionisoftentousethefact tabletotrackhistory.Thisisbestillustratedwithanexample.Consideragainthecustomerdimension withtheageattribute.ModelingtheAgeattributedirectlyinthecustomerdimensionproducesadesign likethis. Figure 44: Age in customer dimension NoticethateverytimeThomashasabirthday,anewrowisaddedinthedimensiontable.The alternativedesignapproachsplitsthecustomerdimensionintotwodimensionslikethis. 15. 15 Figure 55: Age in its own dimension Notethattherearesomerestrictionsonthesituationwherethisdesigncanbeapplied.Itworksbest whenthechangingattributetakesonasmall,distinctsetofvalues.Italsoaddscomplexitytothe design;byaddingmoredimensionstothemodel,itcreatesmoreworkfortheETLdeveloperswhenthe facttableisloaded.Also,considerthestorageimpactonthefacttable:Withthealternativedesign,the facttablebecomeswider,andmorebyteshavetobestoredperrow. 2.1.1.10.2Type 1 Fast-Changing Attributes Yourbusinessrequirementmaybeupdatinganattributeofadimensionathighfrequency,daily,or evenhourly.Forasmallcube,runningProcessUpdatewillhelpyouaddressthisissue.Butasthecube growslarger,theruntimeofProcessUpdatecanbecometoolongforthebatchwindoworthereal timerequirementsofthecube(youcanreadmoreabouttuningprocessupdateintheprocessing section). Consideragaintheserverhostingexample:Youmaywanttotrackthestatus,whichchangesfrequently, ofallservers.Fortheexample,letussaythattheserverdimensionisusedbyafacttabletracking performancecounters.Assumeyouhavemodeledlikethis. 16. 16 Figure 66: Status column in server dimension TheproblemwiththismodelistheStatuscolumn.IftheFactCounterislargeandstatuschangesalot, ProcessUpdatewilltakeaverylongtimetorun.Tooptimize,considerthisdesigninstead. Figure 77: Status column in its own dimension IfyouimplementDimServerastheintermediatereferencetabletoDimServerStatus,AnalysisServices nolongerhastokeeptrackofthemetadataintheFactCounterwhenyourunProcessUpdateon DimServerStatus.Butasdescribedearlier,thismeansthatthejointoDimServerStatuswillhappenat runtime,increasingCPUcostandquerytimes.Italsomeansthatyoucannotindexattributesin DimServerbecausetheintermediatedimensionisnotmaterialized.Youhavetocarefullybalancethe tradeoffbetweenprocessingtimeandqueryspeeds. 17. 17 2.1.1.11 Large Dimensions InSQLServer2005,SQLServer2008,andSQLServer2008R2,AnalysisServiceshassomebuiltin limitationsthatlimitthesizeofthedimensionsyoucancreate.Firstofall,ittakestimetoupdatea dimensionthisisexpensivebecauseallindexesonfacttableshavetobeconsideredforinvalidation whenanattributechanges.Second,stringvaluesindimensionattributesarestoredonadiskstructure calledthestringstore.Thisstructurehasasizelimitationof4GB.Ifadimensioncontainsattributes wherethetotalsizeofthestringvalues(thisincludestranslations)exceeds4GB,youwillgetanerror duringprocessing.ThenextversionofSQLServerAnalysisServices,codenamedDenali,isexpectedto removethislimitation. Considerforamomentadimensionwithtensorevenhundredsofmillionsofmembers.Sucha dimensioncanbebuiltandaddedtoacube,evenonSQLServer2005,SQLServer2008,andSQLServer 2008R2.Butwhatdoessuchadimensionmeantoanadhocuser?Howwilltheusernavigateit?Which hierarchieswillgroupthemembersofthisdimensionintoreasonablesizesthatcanberenderedona screen?Whileitmaymakesenseforsomereportingpurposestosearchforindividualmembersinsuch adimension,itmaynotbetherightproblemtosolvewithacube. Whenyoubuildcubes,askyourself:isthisacubeproblem?Forexample,thinkofthistypicaltelco modelofcalldetailrecords. Figure 88: Call detail records (CDRs) Inthisparticularexample,thereare300millioncustomersinthedatamodel.Thereisnogoodwayto groupthesecustomersandallowadhocaccesstothecubeatreasonablespeeds.Evenifyoumanageto optimizethespaceusedtofitinthe4GBstringstore,howwouldusersbrowseacustomerdimension likethis? Ifyoufindyourselfinasituationwhereadimensionbecomestoolargeandunwieldy,considerbuilding thecubeontopofanaggregate.Forthetelcoexample,imagineatransformationlikethefollowing. 18. 18 Figure 99: Cube built on aggregate Usinganaggregatedfacttable,thisturnsa300millionrowdimensionprobleminto100,000row dimensionproblem.Youcanconsideraggregatingthefactstosavestoragetooalternatively,youcan addademographicskeydirectlytotheoriginalfacttable,processontopofthisdatasource,andrelyon MOLAPcompressiontoreducedatasizes. 2.1.2 Partitioning a Cube Partitionsseparatemeasuregroupdataintophysicalstorageunits.Effectiveuseofpartitionscan enhancequeryperformance,improveprocessingperformance,andfacilitatedatamanagement.This sectionspecificallyaddresseshowyoucanusepartitionstoimprovequeryperformance.Youmustoften makeatradeoffbetweenqueryandprocessingperformanceinyourpartitioningstrategy. Youcanusemultiplepartitionstobreakupyourmeasuregroupintoseparatephysicalcomponents.The advantagesofpartitioningforimprovingqueryperformancearepartitioneliminationandaggregation design. PartitioneliminationPartitionsthatdonotcontaindatainthesubcubearenotqueriedatall,thus avoidingthecostofreadingtheindex(orscanningatableiftheserverisinROLAPmode).Whilereading apartitionindexandfindingnoavailablerowsisacheapoperation,asthenumberofconcurrentusers grows,thesereadsbegintoputastraininthethreadpool.Also,forqueriesthatdonothaveindexesto supportthem,AnalysisServiceswillhavetoscanallpotentiallymatchingpartitionsfordata. AggregationdesignEachpartitioncanhaveitsownorsharedaggregationdesign.Therefore,partitions queriedmoreoftenordifferentlycanhavetheirowndesigns. 19. 19 Figure 1010: Intelligent querying by partitions Figure10displaystheprofilertraceofqueryrequestingResellerSalesAmountbyBusinessTypefrom AdventureWorks.TheResellerSalesmeasuregroupoftheAdventureWorkscubecontainsfour partitions:oneforeachyear.Becausethequerysliceson2003,thestorageenginecangodirectlytothe 2003ResellerSalespartitionandignoreotherpartitions. 2.1.2.1 Partition Slicing Partitionsareboundtoasourcetable,view,orsourcequery.Whentheformulaenginerequestsa subcube,thestorageenginelooksatthemetadataofpartitionfortherelevantmeasuregroup.Each partitionmaycontainaslicedefinition,ahighleveldescriptionoftheminimumandmaximumattribute DataIDsthatexistinthatdimension.Ifitcanbedeterminedfromtheslicedefinitionthattherequested subcubedataisnotpresentinthepartition,thatpartitionisignored.Iftheslicedefinitionismissingorif theinformationinthesliceindicatesthatrequireddataispresent,thepartitionisaccessedbyfirst lookingattheindexes(ifany)andthenscanningthepartitionsegments. Thesliceofapartitioncanbesetintwoways: AutoslicewhenAnalysisServicesreadsthedataduringprocessing,itkeepstrackofthe minimumandmaximumattributeDataIDreads.Thesevaluesareusedtosettheslicewhenthe indexesarebuiltonthepartition. ManualslicerTherearecaseswhereautoslicewillnotworkthesearedescribedinthenext section.Forthosesituations,youcanmanuallysettheslice.Manualslicesaretheonlyavailable sliceoptionforROLAPpartitionsandproactivecachingpartitions. 2.1.2.1.1 Auto Slice DuringprocessingofMOLAPpartitions,AnalysisServicesinternallyidentifiestherangeofdatathatis containedineachpartitionbyusingtheMinandMaxDataIDsofeachattributetocalculatetherangeof datathatiscontainedinthepartition.Thedatarangeforeachattributeisthencombinedtocreatethe slicedefinitionforthepartition. 20. 20 TheMinandMaxDataIDscanspecifyaeitherasinglememberorarangeofmembers.Forexample, partitioningbyyearresultsinthesameMinandMaxDataIDslicefortheyearattribute,andqueriestoa specificmomentintimeonlyresultinpartitionqueriestothatyearspartition. ItisimportanttorememberthatthepartitionsliceismaintainedasarangeofDataIDsthatyouhaveno explicitcontrolover.DataIDsareassignedduringdimensionprocessingasnewmembersare encountered.BecauseAnalysisServicesjustlooksattheminimumandmaximumvalueoftheDataID, youcanendupreadingpartitionsthatdontcontainrelevantdata. Forexample:ifyouhaveapartition,P2003_4,thatcontainsboth2003and2004data,youarenot guaranteedthattheminimumandmaximumDataIDintheslidecontainvaluesnexttoeachother(even thoughtheyearsareadjacent).Inourexample,letussaytheDataIDfor2003is42andtheDataIDfor 2004is45.BecauseyoucannotcontrolwhichDataIDgetsassignedtowhichmembers,youcouldbeina situationwheretheDataIDfor2005is44.Whenauserrequestsdatafor2005,AnalysisServiceslooksat thesliceforP2003_4,seesthatitcontainsdataintheinterval42to45andthereforeconcludesthatthis partitionhastobescannedtomakesureitdoesnotcontainthevaluesforDataID44(because44is between42and45). Becauseofthisbehavior,autoslicetypicallyworksbestifthedatacontainedinthepartitionmapstoa singleattributevalue.Whenthatisthecase,themaximumandminimumDataIDcontainedintheslice willbeequalandtheslicewillworkefficiently. Notethattheautosliceisnotdefinedandindexesarenotbuiltforpartitionswithfewerrowsthan IndexBuildThreshold(whichhasadefaultvalueof4096). 2.1.2.1.2 Manually Setting Slices NometadataisavailabletoAnalysisServicesaboutthecontentofROLAPandproactivecaching partitions.Becauseofthis,youmustmanuallyidentifythesliceinthepropertiesofthepartition.Itisa bestpracticetomanuallysetslicesinROLAPandproactivecachingpartitions. However,asshownintheprevioussection,therearecaseswhereautoslicewillnotgiveyouthe desiredpartitioneliminationbehavior.Inthesecasesyoucanbenefitfromdefiningthesliceyourselffor MOLAPpartitions.Forexample,ifyoupartitionbyyearwithsomepartitionscontainingarangeofyears, definingthesliceexplicitlyavoidstheproblemofoverlappingDataIDs.Thiscanonlybedonewith knowledgeofthedatawhichiswhereyoucanaddsomeoptimizationasaBIdeveloper. Itisgenerallynotabestpracticetocreatepartitionsbeforeyouarereadytofillthemwithdata.Butfor realtimecubes,itissometimesagoodideatocreatepartitionsinadvancetoavoidlockingissues. Whenyoutakethisapproach,itisalsoagoodideatosetamanualsliceonMOLAPpartitionstomake surethestorageenginedoesnotspendtimescanningemptypartitions. 21. 21 2.1.2.2 Partition Sizing Fornondistinctcountmeasuregroups,testswithpartitionsizesintherangeof200MBtoupto3GB indicatethatpartitionsizealonedoesnothaveasubstantialimpactonqueryspeeds.Infact,wehave successfullydeployedgoodqueryperformanceonpartitionslargerthan3GB. Thefollowinggraphshowsfourdifferentqueryrunswithdifferentpartitionsizes(theverticalaxisis totalruntimeinhours).Performanceiscomparablebetweenpartitionsizesandisonlyaffectedbythe designofthesecurityfeaturesinthisparticularcustomercube. Figure 1111: Throughput by partition size (higher is better) Thepartitioningstrategyshouldbebasedonthesefactors: Increasingprocessingspeedandflexibility Increasingmanageabilityofbringinginnewdata Increasingqueryperformancefrompartitioneliminationasdescribedearlier Supportfordifferentaggregationdesigns Asyouaddmorepartitions,themetadataoverheadofmanagingthecubegrowsexponentially.This affectsProcessUpdateandProcessAddoperationsondimensions,whichhavetotraversethemetadata dependenciestoupdatethecubewhendimensionschange.Asaruleofthumb,youshouldtherefore seektokeepthenumberofpartitionsinthecubeinthelowthousandswhileatthesametime balancingtherequirementsdiscussedhere. Forlargecubes,preferlargerpartitionsovercreatingtoomanypartitions.Thisalsomeansthatyoucan safelyignoretheAnalysisManagementObjects(AMO)warninginMicrosoftVisualStudiothatpartition sizesshouldnotexceed20millionrows. 2.1.2.3 Partition Strategy Fromguidanceonpartitionsizing,wecandevelopsomecommondesignpatternsforpartition strategies. 22. 22 2.1.2.3.1 Partition by Date Mostcubesarebuiltonatleastonecolumncontainingadate.Becausedataoftenarrivesinmonthly, weekly,daily,orevenhourlyslices,itmakessensetopartitionthecubeondate.Partitioningondate allowsyoutoreplaceafulldayincaseyouloadfaultydata.Itallowsyoutoselectivelyarchiveolddata bymovingthepartitiontocheapstorage.Andfinally,itallowsyoutoeasilygetridofdata,byremoving anentirepartition.Typically,adatepartitioningschemelookssomewhatlikethis. Figure 1212: Partitioning by Date Notethatinordertomovethepartitiontocheaperstorage,youwillhavetochangethedatalocation andreprocessesthepartition.Thisdesignworksverywellforsmalltomediumsizedcubes.Itis reasonablysimpletoimplementandthenumberofpartitionsiskeptlow.However,itdoessufferfroma fewdrawbacks: 1. Ifthegranularityofthepartitioningissmallenough(forexample,hourly),thenumberof partitionscanquicklybecomeunmanageable. 2. Assumingdataisaddedonlytothelatestpartition,partitionprocessingislimitedtooneTCP/IP connectionreadingfromthedatasource.Ifyouhavealotofdata,thiscanbeascalabilitylimit. Ad1)Ifyouhavealotofdatebasedpartitions,itisoftenagoodideatomergetheolderonesintolarge partitions.YoucandothiseitherbyusingtheAnalysisServicesmergefunctionalityorbydroppingthe oldpartitions,creatinganew,largerpartition,andthenreprocessingit.Reprocessingwilltypicallytake 23. 23 longerthanmerging,butwehavefoundthatcompressionofthepartitioncanoftenincreaseifyou reprocess.Amodified,datepartitioningschememaylooklikethis. Figure 1313: Modified Date Partitioning Thisdesignaddressesthemetadataoverheadofhavingtoomanypartitions.Butitisstillbottlenecked bythemaximumspeedoftheProcessAddorProcessFullforthelatestpartition.Ifyourdatasourceis SQLServer,thespeedofasingledatabaseconnectioncanbehundredsofthousandsofrowsevery secondwhichworkswellformostscenarios.Butifthecuberequiresevenfasterprocessingspeeds, considermatrixpartitioning. 2.1.2.3.2 Matrix Partitioning Forlargecubes,itisoftenagoodideatoimplementamatrixpartitioningscheme:partitiononboth dateandsomeotherkey.Thedatepartitioningisusedtoselectivelydeleteormergeoldpartitionsas describedearlier.Theotherkeycanbeusedtoachieveparallelismduringpartitionprocessingandto restrictcertainuserstoasubsetofthepartitions.Forexample,consideraretailerthatoperatesinUS, Europe,andAsia.Youmightdecidetopartitionlikethis. 24. 24 Figure 1414: Example of matrix partitioning Iftheretailergrows,theymaychoosetosplittheregionpartitionsintosmallerpartitionstoincrease parallelismofloadfurtherandtolimittheworstcasescansthatausercanperform.Forcubesthatare expectedtogrowdramatically,itisagoodideatochooseapartitionkeythatgrowswiththebusiness andgivesyouoptionsforextendingthematrixpartitioningstrategyappropriately.Thefollowingtable containsexamplesofsuchpartitioningkeys. Industry Examplepartitionkey Sourceofdataproliferation Webretail Customerkey Addingcustomersandtransactions Storeretail Storekey Addingnewstores Datahosting HostIDorracklocation Addinganewserver 25. 25 Telecommunications SwitchID,countrycode,orarea code Expandingintonewgeographical regionsoraddingnewservices Computerized manufacturing ProductionlineIDormachineID Addingproductionlinesor(for machines)sensors Investmentbanking Stockexchangeorfinancial instrument Addingnewfinancialinstruments, products,ormarkets Retailbanking Creditcardnumberorcustomer key Increasingcustomertransactions Onlinegaming Gamekeyorplayerkey Addingnewgamesorplayers Ifyouimplementamatrixpartitioningscheme,youshouldpayspecialattentiontouserqueries.Queries touchingseveralpartitionsforeverysubcuberequest,suchasaquerythatasksforahighlevel aggregateofthepartitionbusinesskey,resultinahighthreadusageinthestorageengine.Becauseof this,werecommendthatyoupartitionthebusinesskeysothatsinglequeriestouchnomorethanthe numberofcoresavailableonthetargetserver.Forexample,ifyoupartitionbyStoreKeyandyouhave 1,000stores,queriestouchingtheaggregationofallstoreswillhavetotouch1,000partitions.Insucha design,itisagoodideatogroupthestoresintoanumberofbuckets(thatis,groupthestoresoneach partition,ratherthanhavingindividualpartitionsforeachstore).Forexample,ifyourunona16core server,youcangroupthestoreintobucketsofaround62storesforeachpartition(1,000storesdivided into16buckets). 2.1.2.3.3 Hash Partitioning Sometimesitisnotpossibletocomeupwithagooddistributionofbusinesskeysforpartitioningthe cube.Perhapsyoujustdonthaveagoodkeycandidatethatfitsthedescriptionintheprevioussection, orperhapsthedistributionofthekeyisunknownatdesigntime.Insuchcases,abruteforceapproach canbeused:Partitiononthehashvalueofakeythathasahighenoughcardinalityandwherethereis littleskew.Ifyouexpecteveryquerytotouchmanypartitions,itisimportantthatyoupayspecial attentiontotheCoordinatorQueryBalancingFactorandtheCoordinatorQueryMaxThreadsettings, whicharedescribedinPart2. 2.1.3 Relational Data Source Design Cubesaretypicallybuiltontopofrelationaldatasourcestoserveasdatamarts.Throughthedesign surface,AnalysisServicesallowsyoutocreatepowerfulabstractionsontopoftherelationalsource. Computedcolumnsandnamedqueriesareexamplesofthis.Thisallowsfastprototypingandalso enabledyoutocorrectpoorrelationaldesignwhenyouarenotincontroloftheunderlyingdatasource. ButtheAnalysisServicesdesignsurfaceisnopanaceaawelldesignedrelationaldatasourcecanmake queriesandprocessingofacubefaster.Inthissection,weexploresomeoftheoptionsthatyoushould considerwhendesigningarelationaldatasource.Afulltreatmentofrelationaldatawarehousingisout ofscopeforthisdocument,butwewillprovidereferenceswhereappropriate. 26. 26 2.1.3.1 Use a Star Schema for Best Performance Itiswidelydebatedwhatthemostefficientadreportmodelingtechniqueis:starschema,snowflake schema,orevenathirdtofifthnormalformordatavaultmodels(inorderoftheincreased normalization).Allareconsideredbywarehousedesignersascandidatesforreporting. NotethattheAnalysisServicesUnifiedDimensionalModel(UDM)isadimensionalmodel,withsome additionalfeatures(referencedimensions)thatsupportsnowflakesandmanytomanydimensions.No matterwhichmodelyouchooseastheenduserreportingmodel,performanceoftherelationalmodel boilsdowntoonesimplefact:joinsareexpensive!ThisisalsopartiallytruefortheAnalysisServices engineitself.Forexample:Ifasnowflakeisimplementedasanonmaterializedreferencedimension, userswillwaitlongerforqueries,becausethejoinisdoneatruntimeinsidetheAnalysisServices engine. Thelargestimpactofsnowflakesoccursduringprocessingofthepartitiondata.Forexample:Ifyou implementafacttableasajoinoftwobigtables(forexample,separatingorderlinesandorderheaders insteadofstoringthemasprejoinedvalues),processingoffactswilltakelonger,becausetherelational enginehastocomputethejoin. ItispossibletobuildanAnalysisServicescubeontopofahighlynormalizedmodel,butbepreparedto paythepriceofjoinswhenaccessingtherelationalmodel.Inmostcases,thatpriceispaidatprocessing time.InMOLAPdatamodels,materializedreferencedimensionshelpyoustoretheresultofthejoined tablesondiskandgiveyouhighspeedqueriesevenonnormalizeddata.However,ifyouarerunning ROLAPpartitions,querieswillpaythepriceofthejoinatquerytime,andyouruserresponsetimesor yourhardwarebudgetwillsufferifyouareunabletoresistnormalization. 2.1.3.2 Consider Moving Calculations to the Relational Engine SometimescalculationscanbemovedtotheRelationalEngineandbeprocessedassimpleaggregates withmuchbetterperformance.Thereisnosinglesolutionhere;butifyoureencounteringperformance issues,considerwhetherthecalculationcanberesolvedinthesourcedatabaseordatasourceview (DSV)andprepopulated,ratherthanevaluatedatquerytime. Forexample,insteadofwritingexpressionslikeSum(Customer.City.Members, cint(Customer.City.Currentmember.properties(Population))),considerdefiningaseparatemeasure groupontheCitytable,withasummeasureonthePopulationcolumn. Asasecondexample,youcancomputetheproductofrevenue*ProductsSoldattheleavesinthecube andaggregatewithcalculations.Butcomputingthisresultinthesourcedatabaseinsteadcanprovide superiorperformance. 2.1.3.3 Use Views ItisgenerallyagoodideatobuildyourUDMontopofdatabaseviews.Amajoradvantageofviewsis thattheyprovideanabstractionlayerontopofthephysical,relationalmodel.Ifthecubeisbuiltontop ofviews,therelationaldatabasecan,tosomedegree,beremodeledwithoutbreakingthecube. 27. 27 Considerarelationalsourcethathaschosentonormalizetwotablesyouneedtojointoobtainafact tableforexample,adatamodelthatsplitsasalesfactintoorderlinesandorders.Ifyouimplement thefacttableusingquerybinding,yourUDMwillcontainthefollowing. Figure 1515: Using named queries in UDM Inthismodel,theUDMnowhasadependencyonthestructureoftheLineItemsandOrderstables alongwiththejoinbetweenthem.IfyouinsteadimplementaSalesviewinthedatabase,youcanmodel likethis. Figure 1616: Implementing UDM on top of views 28. 28 ThismodelgivestherelationaldatabasethefreedomtooptimizethejoinedresultsofLineItemsand Order(forexamplebystoringitdenormalized),withoutanyimpactonthecube.Itwouldbetransparent forthecubedeveloperiftheDBAoftherelationaldatabaseimplementedthischange. Figure 1717: Implementing UDM on top of pre-joined tables Viewsprovideencapsulation,anditisgoodpracticetousethem.Iftherelationaldatamodelersinsist onnormalization,givethemachancetochangetheirmindsanddenormalizewithoutbreakingthecube model. Viewsalsoprovideeasyofdebugging.YoucanissueSQLqueriesdirectlyonviewstocomparethe relationaldatawiththecube.Hence,viewsaregoodwaytoimplementbusinesslogicthatcouldyou couldmimicwithquerybindingintheUDM.WhiletheUDMsyntaxissimilartotheSQLviewsyntax,you cannotissueSQLstatementsagainsttheUDM. 2.1.3.3.1 Query Binding Dimensions QuerybindingfordimensionsdoesnotexistinSQLServer2008AnalysisServices,butyoucan implementitbyusingaview(insteadoftables)foryourunderlyingdimensiondatasource.Thatway, youcanusehints,indexedviews,orotherrelationaldatabasetuningtechniquestooptimizetheSQL statementthataccessesthedimensiontablesthroughyourview.Thisalsoallowsyoutoturna snowflakedesignintherelationalsourceintoaUDMthatisapurestarschema. 2.1.3.3.2 Processing Through Views Dependingontherelationalsource,viewscanoftenprovidemeanstooptimizethebehaviorofthe relationaldatabase.Forexample,inSQLServeryoucanusetheNOLOCKhintintheviewdefinitionto removetheoverheadoflockingrowsastheviewisscanned,balancingthiswiththepossibilityofgetting dirtyreads.ViewscanalsobeusedtopreaggregatelargefacttablesusingaGROUPBYstatement;the relationaldatabasemodelercanevenchoosetomaterializeviewsthatusealotofhardwareresources. 29. 29 2.1.4 Calculation Scripts Thecalculationscriptinthecubeallowsyoutoexpresscomplexfunctionalityofthecube,conferringthe abilitytodirectlymanipulatethemultidimensionalspace.Inafewlinesofcode,youcanelegantlybuild highlyvaluablebusinesslogic.Butconversely,ittakesonlyafewlinesofpoorlywrittencalculationcode tocreateabigperformanceimpactonusers.Ifyouplantodesignacubewithalargecalculationscript, wehighlyrecommendthatyoulearnthebasicsofwritinggoodMDXcodethelanguageusedfor calculations.Thereferencessectioncontainsresourcesthatwillgetyouofftoagoodstart. Thequerytuningsectioninthisbookprovideshighlevelguidanceontuningindividualqueries.Buteven atdesigntime,therearesomebestpracticesyoushouldapplytothecubethatavoidcommon performancemistakes.Thissectionprovidesyouwithsomebasicrules;thesearethebareminimum youshouldapplywhenbuildingthecubescript. References: MDXhasarichcommunityofcontributorsontheweb.Herearesomelinkstogetyoustarted: Pearson,Bill:StairwaytoMDX o http://www.sqlservercentral.com/stairway/72404/ Piasevoli,Tomislav:MDXwithMicrosoftSQLServer2008R2AnalysisServicesCookbook o http://www.packtpub.com/mdxwithmicrosoftsqlserver2008r2analysis services/book Russo,Marco:MDXBlog: o http://sqlblog.com/blogs/marco_russo/archive/tags/MDX/default.aspx Pasumansky,Mosha:Blog o http://sqlblog.com/blogs/mosha/ Piasevoli,Tomislav:Blog o http://tomislav.piasevoli.com Webb,Christopher:Blog o http://cwebbbi.wordpress.com/category/mdx/ Spofford,George,SivakumarHarinath,ChristopherWebb,DylanHaiHuang,andFrancesco Civardi,:MDXSolutions:WithMicrosoftSQLServerAnalysisServices2005andHyperionEssbase, ISBN:9780471748083 2.1.4.1 Use Attributes Instead of Sets Whenyouneedtorefertoafixedsubsetofdimensionmembersinacalculation,useanattribute insteadofaset.Attributesenableyoutotargetaggregationstothesubset.Attributesarealsoevaluated fasterthansetsbytheformulaengine.Usinganattributeforthispurposealsoallowsyoutochangethe setbyupdatingthedimensioninsteadofdeployinganewcalculationscripts. Example:Insteadofthis: 30. 30 CREATESET[CurrentDay]ASTAIL([Date].[Calendar].members,1) CREATESET[PreviousDay]ASHEAD(TAIL(Date].[Calendar].members),2),1) Dothis(assumingtodayis20110616): CalendarKeyAttribute DayTypeAttribute (Flexiblerelationshiptokey) 20110613 OldDates 20110614 OldDates 20110615 PreviousDay 20110616 CurrentDay ProcessUpdatethedimensionwhenthedaychanges.Userscannowrefertothecurrentdayby addressingtheDayTypeattributeinsteadoftheset. 2.1.4.2 Use SCOPE Instead of IIF When Addressing Cube Space Sometimes,youwantacalculationtoonlyapplyforaspecificsubsetofcubespace.SCOPEisabetter choicethanIIFinthiscase.Hereisanexampleofwhatnottodo. CREATEMEMBERCurrentCube.[Measures].[SixMonthRollingAverage]AS IIF([Date].[Calendar].CurrentMember.Level Is[Date].[Calendar].[Month] ,Sum([Date].[Calendar].CurrentMember.Lag(5) :[Date].[Calendar].CurrentMember ,[Measures].[InternetSalesAmount])/6 ,NULL) Instead,usetheAnalysisServicesSCOPEfunctionforthis. CREATEMEMBERCurrentCube.[Measures].[SixMonthRollingAverage] ASNULL,FORMAT_STRING="Currency",VISIBLE=1; SCOPE([Measures].[SixMonthRollingAverage],[Date].[Calendar].[Month].Members); THIS=Sum([Date].[Calendar].CurrentMember.Lag(5) :[Date].[Calendar].CurrentMember ,[Measures].[InternetSalesAmount])/6; 31. 31 ENDSCOPE; 2.1.4.3 Avoid Mimicking Engine Features with Expressions SeveralnativefeaturescanbemimickedwithMDX: Unaryoperators Calculatedcolumnsinthedatasourceview(DSV) Measureexpressions Semiadditivemeasures YoucanreproduceeachthesefeaturesinMDXscript(infact,sometimesyoumust,becausesomeare onlysupportedintheEnterpriseSKU),butdoingsooftenhurtsperformance. Forexample,usingdistributiveunaryoperators(thatis,thosewhosememberorderdoesnotmatter, suchas+,,and~)isgenerallytwiceasfastastryingtomimictheircapabilitieswithassignments. Therearerareexceptions.Forexample,youmightbeabletoimproveperformanceofnondistributive unaryoperators(thoseinvolving*,/,ornumericvalues)withMDX.Furthermore,youmayknowsome specialcharacteristicofyourdatathatallowsyoutotakeashortcutthatimprovesperformance.Such optimizationsrequireexpertleveltuningandingeneral,youcanrelyontheAnalysisServicesengine featurestodothebestjob. Measureexpressionsalsoprovideauniquechallenge,becausetheydisabletheuseofaggregates(data hastoberolledupfromtheleaflevel).Onewaytoworkaroundthisistouseahiddenmeasurethat containspreaggregatedvaluesintherelationalsource.Youcanthentargetthehiddenmeasuretothe aggregatevalueswithaSCOPEstatementinthecalculationscript. 2.1.4.4 Comparing Objects and Values Whendeterminingwhetherthecurrentmemberortupleisaspecificobject,useIS.Forexample,the followingqueryisnotonlynonperformant,butincorrect.Itforcesunnecessarycellevaluationand comparesvaluesinsteadofmembers. [Customer].[CustomerGeography].[Country].&[Australia]=[Customer].[Customer Geography].currentmember Furthermore,dontperformextrastepswhendeducingwhetherCurrentMemberisaparticular memberbyinvolvingIntersectandCounting. 32. 32 intersect({[Customer].[CustomerGeography].[Country].&[Australia]}, [Customer].[CustomerGeography].currentmember).count>0 UseISinstead. [Customer].[CustomerGeography].[Country].&[Australia]is[Customer].[Customer Geography].currentmember 2.1.4.5 Evaluating Set Membership DeterminingwhetheramemberortupleisinasetisbestaccomplishedwithIntersect.TheRank functiondoestheadditionaloperationofdeterminingwhereinthesetthatobjectlies.Ifyoudontneed it,dontuseit.Forexample,thefollowingstatementmaydomoreworkthanyouneedittodo. rank([Customer].[CustomerGeography].[Country].&[Australia], )>0 ThisstatementusesIntersecttodeterminewhetherthespecifiedinformationisintheset. intersect({[Customer].[CustomerGeography].[Country].&[Australia]},).count>0 2.2 Testing Analysis Services Cubes Asyouprepareforuseracceptanceandpreproductiontestingofacube,youshouldfirstconsiderwhat acubeisandwhatthatmeansforuserqueries.Dependingonyourbackgroundandroleinthe developmentanddeploymentcycle,therearedifferentwaystolookatthis. Asadatabaseadministrator,youcanthinkofacubeasadatabasethatcanacceptanyqueryfrom users,andwheretheresponsetimefromanysuchqueryisexpectedtobereasonableatermthatis oftenvaguelydefined.Inmanycases,youcanoptimizeresponsetimeforspecificqueriesusing aggregates(whichforrelationalDBAsissimilartoindextuning),andtestingshouldgiveyouanearly ideaofgoodaggregationcandidates.Butevenwithaggregates,youmustalsoconsidertheworstcase: Youshouldexpecttoseeleaflevelscanqueries.Suchqueries,whichcanbeeasilyexpressedbytools likeExcel,mayenduptouchingeverypieceofdatathecube.Dependingonyourdesign,thiscanbea 33. 33 significantamountofdata.Youshouldconsiderwhatyouwanttodowithsuchqueries.Thereare multipleoptions:Forexample,youmaychoosetoscaleyourhardwaretohandletheminadecent responsetime,oryoumaysimplychoosetocancelthem.Ineithercase,asyoupreparefortesting, makesuresuchqueriesarepartthetestsuiteandthatyouobservewhathappenstotheAnalysis Servicesinstancewhentheyrun.Youshouldalsounderstandwhatareasonableresponsetimeisfor yourenduserandmakethatpartofthetestsuite. AsaBIdeveloper,youcanlookatthecubeasyourdescriptionofthemultidimensionalspaceinwhich queriescanbeexpressed.Apartofthisspacewillbeinstantiatedwithdatastructuresondisk supportingit:dimensions,measuregroups,andtheirpartitions.However,someofthis multidimensionalspacewillbeservedbycalculationsoradhocmemorystructures,forexample:MDX calculations,manytomanydimensions,andcustomrollups.Wheneverqueriesarenotserveddirectly byinstantiateddata,thereisapotential,querytimecalculationpricetobepaid,thismayshowupas badresponsetime.Asthecubedeveloper,youshouldmakesurethatthetestingcoversthesecases. Hence,astheBIdeveloper,youshouldmakesurethetestqueriesalsostressnoninstantiateddata.This isavaluableexercise,becauseyoucanuseittomeasuretheimpactonusersofcomplexcalculations andthenadjustthedatamodelaccordingly. Yourapproachtotestingwilldependonwhichsituationyoufindyourselfin.Ifyouaredevelopinganew system,youcanworkdirectlytowardsthetestgoalsdrivenbybusinessrequirements.However,ifyou aretryingtoimproveanexistingsystem,itisanadvantagetoacquireatestbaselinefirst. 2.2.1 Testing Goals Beforeyoudesignatestharness,youshoulddecidewhatyourtestinggoalswillbe.Testingnotonly allowsyoutofindfunctionalbugsinthesystem,italsohelpsquantifythescalabilityandpotential bottlenecksthatmaybehardtodiagnoseandfixinabusyproductionenvironment.Ifyoudonotknow whatyourscalebarrierisorwheretheysystemmightbreak,itbecomeshardtoactwithconfidence whenyoutunethefinalproductionsystem. ConsiderwhatcharacteristicsarereasonabletoexpectfromtheBIsystemsanddocumentthese expectationsaspartofyourtestplan.Thedefinitionofreasonabledependstoalargeextentonyour familiaritywithsimilarsystems,theskillsofyourcubedesigners,andthehardwareyourunon.Itis oftenagoodideatogetasecondopiniononwhattestvaluesarereachableforexamplefroma neutralthirdpartythathasexperiencewithsimilarsystems.Thishelpsyousetexpectationsproperly withbothdevelopersandbusinessusers. BIsystemsvaryinthecharacteristicsorganizationsrequireofthemnoteveryoneneedsscalabilityto thousandsofusers,tensofterabytes,nearzerodowntime,andguaranteedsubsecondresponsetime. Whileallthesegoalscanbeachievedformostcases,itisnotalwayscheaptoacquiretheskillsrequired todesignasystemtosupportthem.Considerwhatyoursystemneedstodoforyourorganization,and avoidoverdesigningintheareaswhereyoudontneedthehighestrequirements.Forexample:youmay decidethatyouneedveryfastresponsetimes,butthatyoualsowantaverylowcostserverthatcan runinasharedstorageenvironment.Forsuchascenario,youmaywanttoreducethedatainthecube 34. 34 toasizethatwillfitinmemory,eliminatingtheneedforthemajorityofI/Ooperationsandproviding fastscantimesevenforpoorlyfilteredqueries. Hereisatableofpotentialtestgoalsyoushouldconsider.Iftheyarerelevantforyourorganizations requirements,youshouldtailorthemtoreflectthoserequirements. TestGoal Description Examplegoal Scalability Howmanyconcurrentusersshouldbe supportedbythesystem? Mustsupport10,000 concurrentlyconnectedusers,of which1,000runqueries simultaneously. Performance/ throughput Howfastshouldqueriesreturntotheclient? Thismayrequireyoutoclassifyqueriesinto differentcomplexities. Notallqueriescanbeansweredquicklyandit willoftenbewisetoconsultanexpertcube designertoliaisewithuserstounderstandwhat querypatternscanbeexpectedandwhatthe complexityofansweringthesequerieswillbe. Anotherwaytolookatthistestgoalisto measurethethroughputinqueriesanswered persecondinamixedworkload. Simplequeriesreturninga singleproductgroupforagiven yearshouldreturninlessthan1 secondevenatfulluser concurrency. Queriesthattouchnomore than20%ofthefactrowsshould runinlessthan30seconds. Mostotherqueriestouchinga smallpartofthecubeshould returninaround10seconds. Withourworkload,weexpect throughputtobearound50 queriesreturnedpersecond. Userqueriesrequestingthe endofmonthcurrencyrate conversionshouldreturninno morethan20seconds.Queries thatdonotrequirecurrency conversionshouldreturninless than5seconds. DataSizes Whatisthegranularityofeachdimension attribute?Howmuchdatawilleachmeasure groupcontain? Notethatthecubedesignerswilloftenhave beenconsideringthisandmayalreadyknowthe answer. Thelargestcustomerdimension willcontain30millionrowsand have10attributesandtwouser hierarchies.Thelargestnonkey attributewillhave1million members. Thelargestmeasuregroupis sales,with1billionrows.The secondlargestispurchases,with 100millionrows.Allother measuregroupsaretrivialin size. 35. 35 TargetServer Platforms Whichservermodeldoyouwanttorunon?Itis oftenagoodideatotestonboththatserver andanevenbiggerserverclass.Thisenables youtoquantifythebenefitsofupgrading. Mustrunon2socket6core Nehalemmachinewith32GBof RAM. Mustbeabletoscaleto4 socketNehalem8coremachine with256GBofRAM. TargetI/O system WhichI/Osystemdoyouwanttouse?What characteristicswillthatsystemhave? MustrunoncorporateSANand usenomorethan1,000random IOPSat32,000blocksizesat6ms latency. WillrunondedicatedNAND devicesthatsupport80,000IOPS at100slatency. Targetnetwork infrastructure Whichnetworkconnectivitywillbeavailable betweenusersandAnalysisServices,and betweenAnalysisServicesandthedatasources? Notethatyoumayhavetosimulatethese networkconditionsinalab. Intheworstcasescenario, userswillconnectovera100ms latencyWANlinkwitha maximumbandwidthof 10Mbit/sec. Therewillbea10Gbit dedicatednetworkavailable betweenthedatasourceandthe cube. Processing Speeds Howfastshouldrowsbebroughtintothecube andhowoften? Dimensionsshouldbefully processedeverynightwithin30 minutes. Twotimesduringtheday, 100,000,000rowsshouldbe addedtothesalesmeasure group.Thisshouldtakeno longerthan15minutes. 2.2.2 Test Scenarios Basedontheconsiderationsfromtheprevioussectionyoushouldbeabletocreateauserworkload thatrepresentstypicaluserbehaviorandthatenablesyoutomeasurewhetheryouaremeetingyour testinggoals. Typicaluserbehaviorandwellwrittenqueriesareunfortunatelynottheonlyqueriesyouwillreceivein mostsystems.Aspartofthetestphase,youshouldalsotrytoflushoutpotentialproductionissues beforetheyarise.Werecommendthatyoumakesureyourtestworkloadcontainsthefollowingtypes ofqueriesandteststhemthoroughly: 36. 36 Queriesthattouchseveraldimensionsatthesametime EnoughqueriestotesteveryMDXexpressioninthecalculationscript Queriesthatexercisemanytomanydimensions Queriesthatexercisecustomrollups Queriesthatexerciseparent/childdimensions Queriesthatexercisedistinctcountmeasuregroups Queriesthatcrossjoinattributesfromdimensionsthatcontainmorethan100,000members Queriesthattoucheverysinglepartitioninthedatabase Queriesthattouchalargesubsetofpartitionsinthedatabase(forexample,currentyear) Queriesthatreturnalotofdatatotheclient(forexample,morethan100,000rows) Queriesthatusecubesecurityversusqueriesthatdonotuseit Queriesexecutingconcurrentlywithprocessingoperationsifthisispartofyourdesign Youshouldtestonthefulldatasetfortheproductioncube.Ifyoudont,theresultswillnotbe representativeofreallifeoperations.Thisisespeciallytrueforcubesthatarelargerthanthememory onthemachinetheywilleventuallyrunon. Ofcourse,youshouldstillmakesurethatyouhaveplentyofqueriesinthetestscenariosthatrepresent typicaluserbehaviorsrunningonaworkloadthatonlyshowcasestheslowestperformingpartsofthe cubewillnotrepresentarealproductionenvironment(unlessofcourse,theentirecubeispoorly designed). Asyourunthetests,youwilldiscoverthatcertainqueriesaremoredisruptivethanothers.Onegoalof testingistodiscoverwhatsuchquerieslooklike,sothatyoucaneitherscalethesystemtodealwith themorprovideguidanceforuserssothattheycanavoidexercisingthecubeinthiswayifpossible. Partofyourtestscenariosshouldalsoaimtoobservethecubesbehaviorasuserconcurrencygrows. YoushouldworkwithBIdevelopersandbusinessuserstounderstandwhattheworstcasescenariofor userconcurrencyis.Testingatthatconcurrencywillshakeoutpoorlyscalabledesignsandhelpyou configurethecubeandhardwareforbestperformanceandstability. 2.2.3 Load Generation Itishardtocreatealoadthatactuallylooksliketheexpectedproductionloaditrequiressignificant experienceandcommunicationwithenduserstocomeupwithafullyrepresentativesetofqueries.But afteryouhaveasetofqueriesthatmatchuserbehavior,youcanfeedthemintoatestharness.Analysis Servicesdoesnotshipwithatestharnessoutofthebox,butthereareseveralsolutionsavailablethat helpyougetstarted: ascmdYoucanusethiscommandlinetooltorunasetofqueriesagainstAnalysisServices.Itships withtheAnalysisServicessamplesandismaintainedonCodePlex. VisualStudioYoucanconfigureMicrosoftVisualStudiotogenerateloadagainstAnalysisServices,and youcanalsouseVisualStudiotovisuallyanalyzethatload. 37. 37 ThirdpartytoolsYoucanusetoolssuchasHPLoadRunnertogeneratehighconcurrencyload.Note thatAnalysisServicesalsosupportsanHTTPbasedinterface,whichmeansitmaybepossibletouse webstresstoolstogenerateload. Rollyourown:Wehaveseencustomerswritetheirowntestharnessesusing.NETandtheADOMD.NET interfacetoAnalysisServices.Usingthe.NETthreadinglibraries,itispossibletogeneratealotofuser loadfromasingleloadclient. Nomatterwhichloadtoolyouuse,youshouldmakesureyoucollecttheruntimeofallqueriesandthe PerformanceMonitorcountersforallruns.Thisdataenablesyoutomeasuretheeffectofanychanges youmakeduringyourtestruns.Whenyougenerateuserworkloadtherearealsosomeotherfactorsto consider. Firstofall,youshouldtestbothasequentialrunandaparallelrunofqueries.Thesequentialrungives youthebestpossibleruntimeofthequerywhilenootherusersareonthesystem.Theparallelrun enablesyoutoshakeoutissueswiththecubethataretheresultofmanyusersrunningconcurrently. Second,youshouldmakesurethetestscenarioscontainasufficientnumberofqueriessothatyouwill beabletorunthetestscenarioforsometime.Tostresstheserverproperly,andtoavoidqueryingthe samehotspotvaluesoverandoveragain,queriesshouldtouchavarietyofdatainthecube.Ifallyour queriestouchasmallsetofdimensionvalues,itwillhardlyrepresentarealproductionrun.Onewayto spreadqueriesoverawidersetofcellsinthecubeistousequerytemplates.Eachtemplatecanbeused togenerateasetofqueriesthatareallvariantsofthesamegeneraluserbehavior. Third,yourtestharnessshouldbeabletocreatereproducibletests.Ifyouareusingcodethatgenerates manyqueriesfromasmallsetoftemplates,makesurethatitgeneratesthesamequeriesonevery testrun.Ifnot,youintroduceanelementofrandomnessinthetestthatmakesithardtocompare differentruns. References: Ascmd.exeonMSDNhttp://msdn.microsoft.com/en us/library/ms365187%28v=sql.100%29.aspx AnalysisServicesCommunitysampleshttp://sqlsrvanalysissrvcs.codeplex.com/ o Describeshowtouseascmdforloadgeneration o ContainsVisualStudiosamplecodethatachievesasimilareffect HPLoadRunner https://h10078.www1.hp.com/cda/hpms/display/main/hpms_content.jsp?zn=bto&cp=111 12617^8_4000_100 2.2.4 Clearing Caches Tomaketestrunsreproducible,itisimportantthateachrunstartwiththeserverinthesamestateas thepreviousrun.Todothis,youmustclearoutanycachescreatedbyearlierruns. 38. 38 TherearethreecachesinAnalysisServicesthatyoushouldbeawareof: Theformulaenginecache Thestorageenginecache Thefilesystemcache Clearingformulaengineandstorageenginecaches:ThefirsttwocachescanbeclearedwiththeXMLA ClearCachecommand.Thiscommandcanbeexecutedusingtheascmdcommandlineutility: Clearingfilesystemcaches:Thefilesystemcacheisabithardertogetridofbecauseitresidesinside Windowsitself. IfyouhavecreatedaseparateWindowsvolumeforthecubedatabase,youcandismountthevolume itselfusingthefollowingcommand: fsutil.exevolumedismount Thisclearsthefilesystemcacheforthisdriveletterormountpoint.Ifthecubedatabaseresidesonlyon thislocation,runningthiscommandresultsinacleanfilesystemcache. Alternatively,youcanusetheutilityRAMMapfromsysinternals.Thisutilitynotonlyallowsyoutoread thefilesystemcachecontent,italsoallowsyoutopurgeit.Ontheemptymenu,clickEmptySystem WorkingSet,andthenclickEmptyStandbyList.Thisclearsthefilesystemcachefortheentiresystem. NotethatwhenRAMMapstartsup,ittemporarilyfreezesthesystemwhileitreadsthememorycontent thiscantakesometimeonalargemachine.Hence,RAMMapshouldbeusedwithcare. ThereiscurrentlyaCodePlexprojectcalledASStoredProceduresfoundat: http://asstoredprocedures.codeplex.com/wikipage?title=FileSystemCache.Thisprojectcontainscode forautilitythatenablesyoutoclearthefilesystemcacheusingastoredprocedurethatyoucanrun directlyonAnalysisServices. NotethatneitherFSUTILnorRAMMapshouldbeusedinproductioncubesbothcausedisruptionto usersconnectedtothecube.AlsonotethatneitherRAMMaporASStoredProceduresissupportedby Microsoft. 2.2.5 Actions During Testing for Operations Management Whileyourtestteamiscreatingandrunningthetestharness,youroperationsteamcanalsotakesteps topreparefordeployment. 39. 39 Testyourdatacollectionsetup:Testrunsgiveyouauniquechancetotryoutyourdatacollection proceduresbeforeyougointoproduction.Thedatacollectionyouperformcanalsobeusedtodrive earlyfeedbacktothedevelopmentteam. Understandserverutilization:Whileyoutest,youcangetanearlyinsightintoserverutilization.You willbeabletomeasurethememoryusageofthecubeandthewaythenumberofusersmapstoI/O loadandCPUutilization.Ifthecubeislargerthanmemory,youcanalsomeasuretheeffectof concurrencyandleaflevelscanontheI/Osubsystem.Remembertomeasuretheworstcaseexamples describedearliertounderstandwhattheimpactonthesystemis. Earlythreadtuning:Duringtesting,youcandiscoverthreadingbottlenecks,asdescribedinPart1.This enablesyoutogointoproductionwithpretunedsettingsthatimproveuserexperience,scalability,and hardwareutilizationofthesolution. 2.2.6 Actions After Testing Whentestingiscomplete,youhavereportsthatdescribetheruntimeofeachquery,andyoualsohave agreaterunderstandingoftheserverutilization.Thisisagoodtimetoreviewthedesignofthecube withtheBIdevelopersusingthenumbersyoucollectedduringtesting.Often,someeasywinscanbe harvestedatthispoint. Whenyouputacubeintoproduction,itisimportanttounderstandthelongtermeffectsofusers buildingspreadsheetsandreportsreferencingit.Considerthedependenciesthataregeneratedasthe cubeissuccessfullydeployedinadhocdatastructuresacrosstheorganization.Thedatamodelcreated andexposedbythecubewillbelinkedintospreadsheetsandreports,anditbecomeshardtomakedata modelchangeswithoutdisturbingusers.Fromanoperationalperspective,preproductiontestingis typicallyyourlastchancetorequestcheapdatamodelchangesfromyourBIdevelopersbeforebusiness usersinevitablylockthemselvesintothedatastructuresthatunlocktheirdata.Noticethatthisis differentfromtypical,staticreportinganddevelopmentcycles.Withstaticreports,theBIdevelopers areincontrolofthedependencies;ifyougiveusersExcelorotheradhocaccesstocubes,thatcontrolis lost.Explorativedatapowercomesataprice. 2.3 Tuning Query Performance Toimprovequeryperformance,youshouldunderstandthecurrentsituation,diagnosethebottleneck, andthenapplyoneofseveraltechniquesincludingoptimizingdimensiondesign,designingandbuilding aggregations,partitioning,andapplyingbestpractices.Theseshouldbethefirststopsforoptimization, beforediggingintoqueriesingeneral. Muchtimecanbeexpendedpursuingdeadendsitisimportanttofirstunderstandthenatureofthe problembeforeapplyingspecifictechniques.Togainthisunderstanding,itisoftenusefultohavea mentalmodelofhowthequeryengineworks.Wewillthereforestartwithabriefintroductiontothe AnalysisServicesqueryprocessor. 40. 40 2.3.1 Query Processor Architecture Tomakethequeryingexperienceasfastaspossibleforendusers,theAnalysisServicesquerying architectureprovidesseveralcomponentsthatworktogethertoefficientlyretrieveandevaluatedata. Thefollowingfigureidentifiesthethreemajoroperationsthatoccurduringqueryingsession management,MDXqueryexecution,anddataretrievalaswellastheservercomponentsthat participateineachoperation. Figure 1818: Analysis Services query processor architecture 2.3.1.1 Session Management ClientapplicationscommunicatewithAnalysisServicesusingXMLforAnalysis(XMLA)overTCP/IPor HTTP.AnalysisServicesprovidesanXMLAlistenercomponentthathandlesallXMLAcommunications betweenAnalysisServicesanditsclients.TheAnalysisServicesSessionManagercontrolshowclients connecttoanAnalysisServicesinstance.UsersauthenticatedbytheWindowsoperatingsystemand whohaveaccesstoatleastonedatabasecanconnecttoAnalysisServices.Afterauserconnectsto SessionManagement XMLA Listener Session Manager SecurityManager QueryProcessing QueryProcessor QueryProcCache DataRetrieval Storage Engine SECache DimensionData AttributeStore HierarchyStore MeasureGroupData FactData Aggregations ClientApp(MDX) 41. 41 AnalysisServices,theSecurityManagerdeterminesuserpermissionsbasedonthecombinationof AnalysisServicesrolesthatapplytotheuser.Dependingontheclientapplicationarchitectureandthe securityprivilegesoftheconnection,theclientcreatesasessionwhentheapplicationstarts,andthenit reusesthesessionforalloftheusersrequests.Thesessionprovidesthecontextunderwhichclient queriesareexecutedbythequeryprocessor.Asessionexistsuntilitisclosedbytheclientapplicationor theserver. 2.3.1.2 Query Processing ThequeryprocessorexecutesMDXqueriesandgeneratesacellsetorrowsetinreturn.Thissection providesanoverviewofhowthequeryprocessorexecutesqueries.Formoreinformationabout optimizingMDX,seeOptimizingMDX. Toretrievethedatarequestedbyaquery,thequeryprocessorbuildsanexecutionplantogeneratethe requestedresultsfromthecubedataandcalculations.Therearetwomajordifferenttypesofquery executionplans:cellbycell(nave)evaluationorblockmode(subspace)computation.Whichoneis chosenbytheenginecanhaveasignificantimpactonperformance.Formoreinformation,seeSubspace Computation. Tocommunicatewiththestorageengine,thequeryprocessorusestheexecutionplantotranslatethe datarequestintooneormoresubcuberequeststhatthestorageenginecanunderstand.Asubcubeisa logicalunitofquerying,caching,anddataretrievalitisasubsetofcubedatadefinedbythecrossjoin ofoneormoremembersfromasinglelevelofeachattributehierarchy.AnMDXquerycanberesolved intomultiplesubcuberequests,dependingtheattributegranularitiesinvolvedandcalculation complexity;forexample,aqueryinvolvingeverymemberoftheCountryattributehierarchy(assuming itsnotaparentchildhierarchy)wouldbesplitintotwosubcuberequests:onefortheAllmemberand anotherforthecountries. Asthequeryprocessorevaluatescells,itusesthequeryprocessorcachetostorecalculationresults.The primarybenefitsofthecachearetooptimizetheevaluationofcalculationsandtosupportthereuseof calculationresultsacrossusers(withthesamesecurityroles).Tooptimizecachereuse,thequery processormanagesthreecachelayersthatdeterminethelevelofcachereusability:global,session,and query. 2.3.1.2.1 Query Processor Cache DuringtheexecutionofanMDXquery,thequeryprocessorstorescalculationresultsinthequery processorcache.Theprimarybenefitsofthecachearetooptimizetheevaluationofcalculationsandto supportreuseofcalculationresultsacrossusers.Tounderstandhowthequeryprocessorusescaching duringqueryexecution,considerthefollowingexample:YouhaveacalculatedmembercalledProfit Margin.WhenanMDXqueryrequestsProfitMarginbySalesTerritory,thequeryprocessorcachesthe nonnullProfitMarginvaluesforeachSalesTerritory.Tomanagethereuseofthecachedresultsacross users,thequeryprocessordistinguishesdifferentcontextsinthecache: 42. 42 QueryContextcontainstheresultofcalculationscreatedbyusingtheWITHkeywordwithina query.Thequerycontextiscreatedondemandandterminateswhenthequeryisover. Therefore,thecacheofthequerycontextisnotsharedacrossqueriesinasession. SessionContextcontainstheresultofcalculationscreatedbyusingtheCREATEstatement withinagivensession.Thecacheofthesessioncontextisreusedfromrequesttorequestinthe samesession,butitisnotsharedacrosssessions. GlobalContextcontainstheresultofcalculationsthataresharedamongusers.Thecacheof theglobalcontextcanbesharedacrosssessionsifthesessionssharethesamesecurityroles. Thecontextsaretieredintermsoftheirlevelofreuse.Atthetop,thequerycontextiscanbereused onlywithinthequery.Atthebottom,theglobalcontexthasthegreatestpotentialforreuseacross multiplesessionsandusersbecausethesessioncontextwillderivefromtheglobalcontextandthe querycontextwillderiveitselffromthesessioncontext. Figure 1919: Cache context layers Duringexecution,everyMDXquerymustreferenceallthreecontextstoidentifyallofthepotential calculationsandsecurityconditionsthatcanimpacttheevaluationofthequery.Forexample,toresolve aquerythatcontainsaquerycalculatedmember,thequeryprocessorcreatesaquerycontextto resolvethequerycalculatedmember,createsasessioncontexttoevaluatesessioncalculations,and createsaglobalcontexttoevaluatetheMDXscriptandretrievethesecuritypermissionsoftheuser whosubmittedthequery.Notethatthesecontextsarecreatedonlyiftheyarentalreadybuilt.After theyarebuilt,theyarereusedwherepossible. Eventhoughaqueryreferencesallthreecontexts,itwilltypicallyusethecacheofasinglecontext.This meansthatonaperquerybasis,thequeryprocessormustselectwhichcachetouse.Thequery processoralwaysattemptstousethebroadlyapplicablecachedependingonwhetherornotitdetects thepresenceofcalculationsatanarrowercontext. Ifthequeryprocessorencounterscalculationscreatedatquerytime,italwaysusesthequerycontext, evenifaqueryalsoreferencescalculationsfromtheglobalcontext(thereisanexceptiontothis 43. 43 querieswithquerycalculatedmembersoftheformAggregate()dosharethesessioncache).If therearenoquerycalculations,buttherearesessioncalculations,thequeryprocessorusesthesession cache.Thequeryprocessorselectsthecachebasedonthepresenceofanycalculationinthescope.This behaviorisespeciallyrelevanttouserswithMDXgeneratingfrontendtools.Ifthefrontendtool createsanysessioncalculationsorquerycalculations,theglobalcacheisnotused,evenifyoudonot specificallyusethesessionorquerycalculations. Thereareothercalculationscenariosthatimpacthowthequeryprocessorcachescalculations.When youcallastoredprocedurefromanMDXcalculation,theenginealwaysusesthequerycache.Thisis becausestoredproceduresarenondeterministic(meaningthatthereisnoguaranteewhatthestored procedurewillreturn).Asaresult,afteranondeterministiccalculationisencounteredduringthequery, nothingiscachedgloballyorinthesessioncache.Instead,theremainingcalculationsarestoredinthe querycache.Inaddition,thefollowingscenariosdeterminehowthequeryprocessorcachescalculation results: TheuseofMDXfunctionsthatarelocaledependent(suchasCaptionor.Properties)prevents theuseoftheglobalcache,becausedifferentsessionsmaybeconnectedwithdifferentlocales andcachedresultsforonelocalemaynotbecorrectforanotherlocale. Theuseofcellsecurity;functionssuchasUserName,StrToSet,StrToMember,andStrToTuple; orLookupCubefunctionsintheMDXscriptorinthedimensionorcellsecuritydefinitiondisable theglobalcache.Thatis,justoneexpressionthatusesanyofthesefunctionsorfeatures disablesglobalcachingfortheentirecube. IfvisualtotalsareenabledforthesessionbysettingthedefaultMDXVisualModepropertyin theAnalysisServicesconnectionstringto1,thequeryprocessorusesthequerycacheforall queriesissuedinthatsession. IfyouenablevisualtotalsforaquerybyusingtheMDXVisualTotalsfunction,thequery processorusesthequerycache. Queriesthatusethesubselectsyntax(SELECTFROMSELECT)orarebasedonasessionsubcube (CREATESUBCUBE)resultinthequeryor,respectively,sessioncachetobeused. Arbitraryshapescanonlyusethequerycacheiftheyareusedinasubselect,intheWHERE clause,orinacalculatedmember.Anarbitraryshapeisanysetthatcannotbeexpressedasa crossjoinofmembersfromthesamelevelofanattributehierarchy.Forexample,{(Food,USA), (Drink,Canada)}isanarbitraryset,asis{customer.geography.USA,customer.geography.[British Columbia]}.Notethatanarbitraryshapeonthequeryaxisdoesnotlimittheuseofanycache. Basedonthisbehavior,whenyourqueryingworkloadcanbenefitfromreusingdataacrossusers,itisa goodpracticetodefinecalculationsintheglobalscope.Anexampleofthisscenarioisastructured reportingworkloadwhereyouhavefewsecurityroles.Bycontrast,ifyouhaveaworkloadthatrequires individualdatasetsforeachuser,suchasinanHRcubewhereyouhavemanysecurityrolesoryouare 44. 44 usingdynamicsecurity,theopportunitytoreusecalculationresultsacrossusersislessenedor eliminated.Asaresult,theperformancebenefitsassociatedwithreusingthequeryprocessorcacheare notashigh. 2.3.1.3 Data Retrieval Whenyouqueryacube,thequeryprocessorbreaksthequeryintosubcuberequestsforthestorage engine.Foreachsubcuberequest,thestorageenginefirstattemptstoretrievedatafromthestorage enginecache.Ifnodataisavailableinthecache,itattemptstoretrievedatafromanaggregation.Ifno aggregationispresent,itmustretrievethedatafromthefactdatafromameasuregroupspartition data. RetrievingdatafromapartitionrequiresI/Oactivity.ThisI/Ocaneitherbeservedfromthefilesystem cacheorfromdisk.AdditionaldetailsoftheI/OsubsystemofAnalysisServicescanbefoundinPart2. Figure 2020: High-level overview of the data retrieval process 2.3.1.3.1 Storage Engine Cache Thestorageenginecacheisalsoknownasthedatacacheregistrybecauseitiscomposedofthe dimensionandmeasuregroupcachesthatarethesamestructurally.Whenarequestismadefromthe 45. 45 AnalysisServicesformulaenginetothestorageengine,itsendsarequestintheformofasubcube describingthestructureofthedatarequestandadatacachestructurethatwillcontaintheresultsof therequest.Usingthedatacacheregistryindexes,itattemptstofindacorrespondingsubcube: Ifthereisamatchingsubcube,thecorrespondingdatacacheisreturned. Ifasubcubesupersetisfound,anewdatacacheisgeneratedandtheresultsarefilteredtofit thesubcuberequest. Iflowergraindataexists,thedatacacheregistrycanaggregatethisdataandmakeitavailable aswellandthenewsubcubeanddatacachearealsoregisteredinthecacheregistry. Ifdatadoesnotexist,therequestgoestothestorageengineandtheresultsarecachedinthe cacheregistryforfuturequeries. AnalysisServicesallocatesmemoryviamemoryholdersthatcontainstatisticalinformationaboutthe amountofmemorybeingused.Memoryholdersareintheformofnonshrinkableandshrinkable memory;eachcombinationofasubcubeanddatacacheformsasingleshrinkablememoryholder. WhenAnalysisServicesisunderheavymemorypressure,cleanerthreadsremoveshrinkablememory. Therefore,ensureyoursystemhasenoughmemory;ifitdoesnot,yourdatacacheregistrywillbe clearedout(resultinginslowerqueryperformance)whenitisplacedundermemorypressure. 2.3.1.3.2 Aggressive Data Scanning Sometimes,intheevaluationofanexpression,moredataisrequestedthanrequiredtodeterminethe result. Ifyoususpectmoredataisbeingretrievedthanisrequired,youcanuseSQLServerProfilertodiagnose howaqueryintosubcubequeryeventsandpartitionscans.Forsubcubescans,checktheverbose subcubeeventandwhethermoremembersthanrequiredareretrievedfromthestorageengine.For smallcubes,thislikelyisntaproblem.Forlargercubeswithmultiplepartitions,itcangreatlyreduce queryperformance.Thefollowingfiguredemonstrateshowasinglequerysubcubeeventresultsin partitionscans. Therearetwopotentialsolutionstothis.Ifacalculationexpressioncontainsanarbitraryshape(thisis definedinthesectiononthequeryprocessorcache),thequeryprocessormaynotbeabletodetermine thatthedataislimitedtoasinglepartitionandrequestdatafromallpartitions.Trytoeliminatethe arbitraryshape. Figure2121:Aggressivepartitionscanning 46. 46 Othertimes,thequeryprocessorissimplyoverlyaggressiveinaskingfordata.Forsmallcubes,this doesntmatter,butforverylargecubes,itdoes.Ifyouobservethisbehavior,potentialsolutionsinclude thefollowing: ContactMicrosoftCustomerServiceandSupportforfurtheradvice. DisablePrefetch=1(thisisdoneintheconnectionstring):SometimesAnalysisServicesrequests additionaldatafromthesourcetoprepopulatethecache;itmayhelptoturnitoffsothat AnalysisServicesdoesnotrequesttoomuchdata. 2.3.2 Query Processor Internals ThereareseveralchangestoqueryprocessorinternalsinSQLServer2008AnalysisServicesthatare applicabletoday(comparedtoSQLServer2005AnalysisServices).Inthissection,thesechangesare discussedbeforespecificoptimizationtechniquesareintroduced. 2.3.2.1 Subspace Computation Thekeyideabehindsubspacecomputationisbestintroducedbycontrastingitwithacellbycell evaluationofacalculation.(Thisisalsoknownasanavecalculation.)Consideratrivialcalculation RollingSumthatsumsthesalesforthepreviousyearandthecurrentyear,andaquerythatrequeststhe RollingSumfor2005forallProducts. RollingSum=(Year.PrevMember,Sales)+Sales SELECT2005oncolumns,Product.MembersonrowsWHERERollingSum Acellbycellevaluationofthiscalculationproceedsasrepresentedinthefollowingfigure. 47. 47 Figure 2222: Cell-by-cell evaluation The10cellsfor[2005,AllProducts]areeachevaluatedinturn.Foreach,thepreviousyearislocated, andthenthesalesvalueisobtainedandthenaddedtothesalesforthecurrentyear.Therearetwo significantperformanceissueswiththisapproach. Firstly,ifthedataissparse(thatis,thinlypopulated),cellsarecalculatedeventhoughtheyareboundto returnanullvalue.Inthepreviousexample,calculatingthecellsforanythingbutProduct3andProduct 6isawasteofeffort.Theimpactofthiscanbeextremeinasparselypopulatedcube,thedifference canbeseveralordersofmagnitudeinthenumbersofcellsevaluated. Secondly,evenifthedataistotallydense,meaningthateverycellhasavalueandthereisno wastedeffortvisitingemptycells,thereismuchrepeatedeffort.Thesamework(forexample,getting thepreviousYearmember,settingupthenewcontextforthepreviousYearcell,checkingforrecursion) isredoneforeachProduct.Itwouldbemuchmoreefficienttomovethisworkoutoftheinnerloopof evaluatingeachcell. Nowconsiderthesameexampleperformedusingsubspacecomputation.Insubspacecomputation,the engineworksitswaydownanexecutiontreedeterminingwhatspacesneedtobefilled.Giventhe query,thefollowingspaceneedstobecomputed,where*meanseverymemberoftheattribute hierarchy. [Product.*,2005,RollingSum] Giventhecalculation,thismeansthatthefollowingspaceneedstobecomputedfirst. [Product.*,2004,Sales] 48. 48 Next,thefollowingspacemustbecomputed. [Product.*,2005,Sales] Finally,the+operatorneedstobeaddedtothosetwospaces. IfSaleswereitselfcoveredbycalculations,thespacesnecessarytocalculateSaleswouldbedetermined andthetreewouldbeexpanded.InthiscaseSalesisabasemeasure,sothestorageenginedataisused tofillthetwospacesattheleaves,andthen,workingupthetree,theoperatorisappliedtofillthe spaceattheroot.Hencetheonerow(Product3,2004,3)andthetworows{(Product3,2005,20), (Product6,2005,5)}areretrieved,andthe+operatorappliedtothemtoyieldsthefollowingresult. Figure 2323: Execution plan The+operatoroperatesonspaces,notsimplyscalarvalues.Itisresponsibleforcombiningthetwogiven spacestoproduceaspacethatcontainseachproductthatappearsineitherspacewiththesummed value.Thisisthequeryexecutionplan.Notethatitoperatesonlyondatathatcouldcontributetothe result.Thereisnonotionofthetheoreticalspaceoverwhichthecalculationmustbeperformed. Aqueryexecutionplanisnotoneortheotherbutcancontainbothsubspaceandcellbycellnodes. Somefunctionsarenotsupportedinsubspacemode,causingtheenginetofallbacktocellbycell mode.Butevenwhenevaluatinganexpressionincellbycellmode,theenginecanreturntosubspace mode. 2.3.2.2 Expensive vs. Inexpensive Query Plans Itcanbecostlytobuildaqueryplan.Infact,thecostofbuildinganexecutionplancanexceedthecost ofqueryexecution.TheAnalysisServicesenginehasacoarseclassificationschemeexpensiveversus inexpensive.Aplanisdeemedexpensiveifcellbycellmodeisusedorifcubedatamustbereadtobuild theplan.Otherwisetheexecutionplanisdeemedinexpensive. 49. 49 Cubedataisusedinqueryplansinseveralscenarios.Somequeryplansresultinthemappingofone membertoanotherbecauseofMDXfunctionssuchasPrevMemberandParent.Themappingsarebuilt fromcubedataandmaterializedduringtheconstructionofthequeryplans.TheIIf,CASE,andIF functionscangenerateexpensivequeryplansaswell,shoulditbenecessarytoreadcubedatainorder topartitioncubespaceforevaluationofoneofthebranches.Formoreinformation,seeIIfFunctionin SQLServer2008AnalysisServices. 2.3.2.3 Expression Sparsity Anexpressionssparsityreferstothenumberofcellswithnonnullvaluescomparedtothetotalnumber ofcellsintheresultoftheevaluationoftheexpression.Iftherearerelativelyfewnonnullvalues,the expressionistermedsparse.Iftherearemany,theexpressionisdense.Asweshallseelater,whether anexpressionissparseordensecaninfluencethequeryplan. Buthowcanyoutellwhetheranexpressionisdenseorsparse?Considerasimplenoncalculated measureisitdenseorsparse?InOLAP,basefactmeasuresareconsideredsparsebytheAnalysis Servicesengine.Thismeansthatthetypicalmeasuredoesnothavevaluesforeveryattributemember. Forexample,acustomerdoesnotpurchasemostproductsonmostdaysfrommoststores.Infactits thequitetheopposite.Atypicalcustomerpurchasesasmallpercentageofallproductsfromasmall numberofstoresonafewdays.Thefollowingtablelistssomeothersimplerulesforpopular expressions. Expression Sparse/dense Regularmeasure Sparse ConstantValue Dense(excludingconstantnullvalues, true/falsevalues) Scalarexpression;forexample,count, .properties Dense + Sparseifbothexp1andexp2aresparse; otherwisedense * Sparseifeitherexp1orexp2issparse; otherwisedense / Sparseifissparse;otherwisedense Sum(,) Aggregate(,) Inheritedfrom IIf(,,) Determinedbysparsityofdefaultbranch (refertoIIffunction) Formoreinformationaboutsparsityanddensity,seeGrossmargindensevs.sparseblockevaluation modeinMDX(http://sqlblog.com/blogs/mosha/archive/2008/11/01/grossmargindensevssparse blockevaluationmodeinmdx.aspx). 50. 50 2.3.2.4 Default Values Everyexpressionhasadefaultvaluethevaluetheexpressionassumesmostofthetime.Thequery processorcalculatesanexpressionsdefaultvalueandreusesacrossmostofitsspace.Mostofthetime thisisnullbecauseoftentimes(butnotalways)theresultofanexpressionwithnullinputvaluesisnull. Theenginecanthencomputethenullresultonce,andthenitneedstocomputeonlyvaluesforthe muchreducednonnullspace. AnotherimportantuseofthedefaultvaluesisintheconditionintheIIffunction.Knowingwhichbranch isevaluatedmoreoftendrivestheexecutionplan.Thedefaultvaluesofsomepopularexpressionsare listedinthefollowingtable. Expression Defaultvalue Comment Regularmeasure Null None. IsEmpty() True Themajorityoftheoreticalspaceis occupiedbynullvalues.Therefore, IsEmptywillreturnTruemostoften. = True Valuesforbothmeasuresareprincipally null,sothisevaluatestoTruemostofthe time. IS False Thisisdifferentthancomparingvalues theengineassumesthatdifferent membersarecomparedmostofthetime. 2.3.2.5 Varying Attributes Cellvaluesmostlydependonattributecoordinates.Butsomecalculationsdonotdependonevery attribute.Forexample,thefollowingexpressiondependsonlyontheCustomerattributeinthecustomer dimension. [Customer].[CustomerGeography].properties("PostalCode") Whenthisexpressionisevaluatedoverasubspaceinvolvingotherattributes,anyattributesthe expressiondoesntdependoncanbeeliminated,andthentheexpressioncanberesolvedandprojected backovertheoriginalsubspace.Theattributesanexpressiondependsonaretermeditsvarying attributes.Forexample,considerthefollowingquery. withmembermeasures.Zipas [Customer].[CustomerGeography].currentmember.properties("PostalCode") selectmeasures.zipon0, [Product].[Category].memberson1 from[AdventureWorks] 51. 51 where[Customer].[CustomerGeography].[Customer].&[25818] Theexpressiondependsonthecustomerattributeandnotthecategoryattribute;therefore,customer isavaryingattributeandcategoryisnot.Inthiscasetheexpressionisevaluatedonlyonceforthe customerandnotasmanytimesasthereareproductcategories. 2.3.3 Optimizing MDX Debuggingcalculationperformanceissuesacrossacubecanbedifficultiftherearemanycalculations. Thefirststepistotrytonarrowdownwheretheproblemexpressionisandthenapplybestpracticesto theMDX.Inordertonarrowdownaproblem,youwillfirstneedabaseline. 2.3.3.1 Baselining Query Speeds Beforebeginningoptimization,youneedreproduciblecoldcachebaselinemeasurements. Todothis,youshouldbeawareofthefollowingthreeAnalysisServicescaches: Theformulaenginecache Thestorageenginecache Thefilesystemcache BoththeAnalysisServicesandtheoperatingsystemcachesneedtobeclearedbeforeyoustarttaking measurements. 2.3.3.1.1 Clearing the Analysis Services Caches TheAnalysisServicesformulaengineandstorageenginecachescanbeclearedwiththeXMLA ClearCachecommand.YoucanuseSQLServerManagementStudiotorunClearCache. 2.3.3.1.2 Clearing the Operating System Caches ThefilesystemcacheisabithardertogetridofbecauseitresidesinsideWindowsitself.Youcanuse anyofthefollowingtoolstoperformthistask: Fsutil.exe:WindowsFileSystemUtility IfyouhavecreatedaseparateWindowsvolumeforthecubedatabase,youcandismountthe volumeitselfusingthefollowingcommand: fsutil.exevolumedismount Thisclearsthefilesystemcacheforthisdriveletterormountpoint.Ifthecubedatabaseresides onlyonthislocation,runningthiscommandresultsinacleanfilesystemcache. 52. 52 RAMMap:Sysinternalstool Alternatively,youcanuseRAMMapfromSysinternals(asofthiswriting,RAMMapv1.11is availableat:http://technet.microsoft.com/enus/sysinternals/ff700229.aspx).RAMMapcan helpyouunderstandhowWindowsmanagesmemory.Thistooln