Real-Life Data Mart Design Challenges Leslie Tierstein TopTier Consulting.

Post on 01-Jan-2016

TRANSCRIPT

Real-Life Data Mart Design Challenges
Leslie Tierstein, TopTier Consulting
newScale Company Confidential

Overview
- Some of our data mart design challenges
  - Dynamically defined dimensions
  - Potentially 100s of dimensions
  - Loading XML data into the data mart
  - An incremental refresh that includes updates
  - Supporting multiple databases with a standard toolset and code base
  - Designing the physical database to map to the BI tool's business view

The OLTP System Management
- An IT department defines a catalog of all the services it provides to users
  - End User Computing: Add Visio
  - Advanced Computing: Configure a server
- Users request a service
- System tracks the service request and all tasks required to complete it

The OLTP System: Requestor View (Upgrade Memory)
- User fills out and submits an order (requisition)

The OLTP System: IT View (Upgrade Memory)
- User request has criteria for completion
- Request is routed to appropriate people for authorization and task delivery

The OLTP System: Service Designer's View
- Standard information (duration, cost, delivery plan, authorizations) defined via UI
- Service-specific data entry requirements defined via dictionaries

The OLTP System
- Vastly simplified ERD of transactional data
- Metadata on service definitions is also maintained
- Where is form (dictionary-based) data?

Data Mart Requirements
- Data mart must include form data
- Refresh needs to run in nightly maintenance window
  - Minimize time to minimize impact on global accounts
- Data mart must include data on both open (in progress) and completed requisitions
  - Refresh must perform updates

Extracting Form (XML) Data
- Extract XML data from the requisition BLOB
- Cognos8 includes Composite, from IBM, which can extract data from XML
  - Performance is problematic
- Adding a custom plug-in in the ETL tool provided insufficient power/flexibility
- Custom Java program was needed
  - Would the custom program perform acceptably?

ETL Performance Studies
- 100,000 requisitions per month (an order of magnitude higher than current usage)

ETL Performance Studies
- Averages are good, but know your outliers
  - If only to be able to warn users about potential issues
- Hotel/Hospitality Company:
  - 30% of services have 100-160 fields, with a maximum of 250. About 60% have 70-100 fields
  - There are dictionaries with about 40 fields, with a maximum of 43
- Inserts are easy, updates NOT

Updating Data Mart Contents
- Data mart contents need to be updated
  - Tasks and requisitions are added to the data mart as soon as the requisition is submitted
  - Status of requisitions may change, as well as completion date
  - Task status and info is updated as the task is performed (completed)

Updating Data Mart Contents
- Argh! Updating data mart contents is expensive
  - Updates require indexes
  - Inserts perform much better without indexes
  - Deletes (from large tables) are a disaster
- Avoid updates by placing data to be updated in a partition that is truncated
  - Refreshed data is written to the same partition or to a permanent partition when complete

Updating Data Mart Contents
- Still an issue
- Partitioning not implemented because:
  - Small table size and growth rates
  - Lack of support in ETL tool
  - SQL Server (partitioned views only!)
  - What is partition key?
- Using insert/update model
  - Time will tell if this is robust enough

Dimensional Modeling
- Table, View or Metadata?
  - Design the dimensional model in conjunction with the business view of the model
- Date and People dimensions
- Dynamic Data Model to accommodate form data
  - Other companies have done this, but their scenarios are not as complex
    - Oracle
    - Noetix Views and Noetix data warehouse
    - Flex fields

Date Dimensions
- Date required as a dimensional key
  - Due Date for task and requisition
  - Closed Date for task and requisition
  - Start Date for task and requisition
- Complete date and time required as an attribute of the fact for calculations
  - Compliance with SLAs and OLAs

Date Dimensions
- How many database objects for the date dimensions?
  - 3 tables, 3 subject areas
  - 1 table, 3 views, 3 subject areas
  - 1 table, 3 subject areas
- Pros and cons of each approach
  - ETL, debugging, user interface in ReportNet
- Where are relationships defined?
  - In the BI tool
  - In the database

Date Dimensions
- Statically or dynamically populated
- Sparse or dense dimension?
  - Influences prompts in BI tool
  - Interacts with decision re: number of tables

People Dimensions
- Required dimensions
  - Customer for requisition
  - Requestor for requisition
  - Performer for task
- How many database objects?
- Do people fulfilling different roles intersect? And how much?
  - Implications for prompts/filters
- Diagram labels: Performer, Requestor

Dimensional Model
- How to model the dimensions which correspond to dynamically defined dictionaries
  - Each attribute in a dictionary is a dimension: rejected
    - W-A-A-Y too many dimensions
  - Junk, Degenerate or Demographic dimensions
  - Reference dimensions

Junk Dimensions
- Populate the dictionary dimension using a unique combination of values for each dictionary across all requisitions
- The fact table has a column for a relationship with each dimension; many will be N/A
- A complex fact/dimensional build would be required
- See Kimball, Design Tips 46, 48

Reference Dimensions
- Each dimension has a 1:1 relationship with the fact table
- Best to not have basic analytical metrics and frequent query topics in the reference dimension
- Greatly reduces the size of the fact table
- See Kimball, Design Tip 86

Requisition Star Schema

Task Star Schema

Tool Suite

Data Mart Refresh
- Custom Java program is used in conjunction with Data Manager ETL
- Diagram labels: Form data (BLOB/XML); Custom Java program; OLTP; Relational data; Metadata for mapping tables to dictionaries; Cognos ETL (Data Manager); Data mart data

Data Mart Refresh and Metadata Creation
- Dictionaries and their attributes must be mapped to tables/columns in the data mart
- This mapping is used in the BI business view

ETL/Metadata Process Flow
- Specify ETL for facts and traditional dimensions
  - Uses Cognos8 ETL tool, Data Manager
- Diagram labels: ETL definitions; Cognos8 Metadata Processor; ETL Metadata (Data Manager Project); static Data View; Data Mart Metadata (Framework Mgr Catalog); Framework Manager Script; Framework Manager Catalog; Business View

ETL/Metadata Process Flow
- Transform the Data Manager project into a Framework Manager catalog
  - (Sigh) two tools, two different sets of metadata
  - Data Manager's ETL destination tables become the basis of the BI tool's metadata
- Diagram: same flow as the previous slide, with the Cognos8 Metadata Processor step marked (*)

ETL/Metadata Process Flow
- For dictionaries (junk dimensions), use the Framework Manager API to create a business view
  - Appropriate names for dictionary dimensions and data elements

End User Business View
- Dimensions
  - Dynamic dimensions
    - Based on dictionaries designated as reportable
    - Different for each site
  - Universal/static dimensions
    - Found at all sites

ReportStudio
- Create a report listing tasks performed, the performer, the service in which they were performed, and the current status

Conclusions (1)
- XML extraction
  - Java programming was much more efficient than ETL tools available to us
- Dimensional modeling
  - Junk dimensions, reference dimensions, degenerate dimensions: who knew?
  - Subscribe to Kimball
  - Hire short-term consultants

Conclusions (2)
- ETL Performance
  - Satisfactory
  - Need sophisticated scheduler to track all jobs
- Ad hoc Report Performance
  - Insufficient testing on the largest user data volumes (IMHO)
  - The software went live on January 31, 2006
  - Ask me in a few months

About the Author
Leslie Tierstein is a Senior Technical Architect at newScale, Inc., in Foster City, California. She can be reached at ltierstein@earthlink.net or leslie.tierstein@newscale.com. This paper is available online at: http://home.earthlink.net/~ltierstein
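
Illustrative sketch: extracting form data. The "Extracting Form (XML) Data" slide says a custom Java program was needed to pull dictionary-based form data out of the requisition BLOB. The slides do not show that program or the XML layout, so the following is only a minimal sketch of the idea, written in Python rather than Java for brevity. The element names (requisition, dictionary, field), the attribute names, and the sample values are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical form payload: the real layout of the requisition XML is not
# given in the slides, so these element and attribute names are assumptions.
SAMPLE_BLOB = b"""<requisition id="1001">
  <dictionary name="MemoryUpgrade">
    <field name="MemorySizeGB">8</field>
    <field name="AssetTag">A-4471</field>
  </dictionary>
  <dictionary name="Delivery">
    <field name="Building">HQ</field>
  </dictionary>
</requisition>"""


def extract_form_rows(blob):
    """Flatten one BLOB into (requisition_id, dictionary, field, value) rows."""
    root = ET.fromstring(blob)
    requisition_id = int(root.get("id"))
    rows = []
    for dictionary in root.findall("dictionary"):
        for field in dictionary.findall("field"):
            # Empty elements have text None; store them as empty strings.
            value = (field.text or "").strip()
            rows.append(
                (requisition_id, dictionary.get("name"), field.get("name"), value)
            )
    return rows


rows = extract_form_rows(SAMPLE_BLOB)
```

Rows in this flat shape can be bulk-loaded into staging tables, which leaves the relational ETL tool to handle only relational data, as in the "Data Mart Refresh" slide.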
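
Illustrative sketch: the insert/update model. The "Updating Data Mart Contents" slides settle on an insert/update model for the incremental refresh, because requisitions enter the data mart when submitted and change status later. A minimal sketch of that pattern follows, assuming a hypothetical requisition_fact table and using an in-memory SQLite database only so the example is self-contained; it is not the Data Manager implementation. The primary key supplies the index that the slides note updates require.

```python
import sqlite3


def refresh_requisitions(conn, changed_rows):
    """Apply an incremental refresh: update rows that exist, insert the rest."""
    inserted = updated = 0
    for requisition_id, status, closed_date in changed_rows:
        cursor = conn.execute(
            "UPDATE requisition_fact SET status = ?, closed_date = ? "
            "WHERE requisition_id = ?",
            (status, closed_date, requisition_id),
        )
        if cursor.rowcount == 0:
            # No existing row matched, so this requisition is new to the mart.
            conn.execute(
                "INSERT INTO requisition_fact (requisition_id, status, closed_date) "
                "VALUES (?, ?, ?)",
                (requisition_id, status, closed_date),
            )
            inserted += 1
        else:
            updated += 1
    conn.commit()
    return inserted, updated


# Hypothetical table and sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE requisition_fact ("
    "requisition_id INTEGER PRIMARY KEY, status TEXT, closed_date TEXT)"
)
first_run = refresh_requisitions(
    conn, [(1001, "Ongoing", None), (1002, "Ongoing", None)]
)
second_run = refresh_requisitions(
    conn, [(1001, "Closed", "2006-01-31"), (1003, "Ongoing", None)]
)
final_rows = conn.execute(
    "SELECT requisition_id, status, closed_date "
    "FROM requisition_fact ORDER BY requisition_id"
).fetchall()
```

The first run only inserts; the second run updates requisition 1001 and inserts 1003. Row-at-a-time statements keep the sketch short; a production refresh would more likely stage the changed rows and apply them in bulk.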
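
Illustrative sketch: one date table, three views. One option on the "Date Dimensions" slides is 1 table, 3 views, 3 subject areas. The sketch below shows what that option could look like, again using in-memory SQLite for self-containment. The table, view, and column names (date_dim, due_date_dim, task_fact, and so on) are assumptions, and the slides do not say which of the three options was finally chosen.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE date_dim (date_key INTEGER PRIMARY KEY, calendar_date TEXT)"
)

# Dense, statically populated dimension: one row per day, whether or not
# any task or requisition refers to that day.
start = date(2006, 1, 1)
for offset in range(31):
    day = start + timedelta(days=offset)
    conn.execute(
        "INSERT INTO date_dim VALUES (?, ?)",
        (int(day.strftime("%Y%m%d")), day.isoformat()),
    )

# One view per role, so each role can be its own subject area in the BI tool.
for role in ("start", "due", "closed"):
    conn.execute(
        f"CREATE VIEW {role}_date_dim AS "
        f"SELECT date_key AS {role}_date_key, calendar_date AS {role}_date "
        "FROM date_dim"
    )

# Hypothetical task fact with one date key per role.
conn.execute(
    "CREATE TABLE task_fact (task_id INTEGER PRIMARY KEY, "
    "start_date_key INTEGER, due_date_key INTEGER, closed_date_key INTEGER)"
)
conn.execute("INSERT INTO task_fact VALUES (1, 20060102, 20060110, 20060112)")
conn.execute("INSERT INTO task_fact VALUES (2, 20060103, 20060120, 20060118)")

# Tasks closed after their due date, joining two roles of the same table.
late_tasks = conn.execute(
    "SELECT f.task_id, d.due_date, c.closed_date FROM task_fact f "
    "JOIN due_date_dim d ON f.due_date_key = d.due_date_key "
    "JOIN closed_date_dim c ON f.closed_date_key = c.closed_date_key "
    "WHERE c.closed_date > d.due_date"
).fetchall()
```

With views, the ETL maintains a single table while the role-specific column names keep prompts and joins unambiguous; the trade-off the slides raise is where those relationships are then defined, in the database or in the BI tool.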
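
Illustrative sketch: building a junk dimension. The "Junk Dimensions" slide describes populating each dictionary dimension with the unique combinations of values found for that dictionary across all requisitions, with the fact table pointing at the matching row or at N/A. The following is a minimal sketch of that build for a single dictionary. The function name, the surrogate key scheme, the use of key 0 for N/A, and the sample data are assumptions for illustration, not the build described in the slides.

```python
NOT_APPLICABLE_KEY = 0  # assumed surrogate key for "dictionary not used"


def build_junk_dimension(form_values, attributes):
    """Return (dimension_rows, fact_keys) for ONE dictionary.

    form_values maps requisition_id -> {attribute: value}, or None when the
    requisition does not use this dictionary.
    dimension_rows maps each distinct value combination to a surrogate key.
    fact_keys maps requisition_id -> the key to store in the fact table.
    """
    dimension_rows = {}
    fact_keys = {}
    for requisition_id, values in sorted(form_values.items()):
        if values is None:
            fact_keys[requisition_id] = NOT_APPLICABLE_KEY
            continue
        combo = tuple(values.get(name, "N/A") for name in attributes)
        if combo not in dimension_rows:
            # First time this combination is seen: assign the next key.
            dimension_rows[combo] = len(dimension_rows) + 1
        fact_keys[requisition_id] = dimension_rows[combo]
    return dimension_rows, fact_keys


# Hypothetical form data for a dictionary with two attributes.
form_values = {
    1001: {"MemorySizeGB": "8", "Building": "HQ"},
    1002: {"MemorySizeGB": "16", "Building": "HQ"},
    1003: {"MemorySizeGB": "8", "Building": "HQ"},
    1004: None,
}
dimension_rows, fact_keys = build_junk_dimension(
    form_values, ["MemorySizeGB", "Building"]
)
```

Requisitions 1001 and 1003 share one dimension row because their values match, which is what keeps the dimension smaller than the fact table. Repeating this per dictionary gives one fact column per dictionary dimension, many of them N/A, as the slide notes.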
