Real-Life Data Mart Design Challenges Leslie Tierstein TopTier Consulting.

Real-Life Data Mart Design Challenges Leslie Tierstein TopTier Consulting. slide 0
Download Real-Life Data Mart Design Challenges Leslie Tierstein TopTier Consulting.

Post on 01-Jan-2016

213 views

Category:

Documents

0 download

TRANSCRIPT

  • Real-Life Data MartDesign ChallengesLeslie Tierstein TopTier Consulting

    newScale Company ConfidentialnewScale

    OverviewSome of our data mart design challengesDynamically defined dimensionsPotentially 100s of dimensionsLoading XML data into the data martAn incremental refresh that includes updatesSupporting multiple databases with a standard toolset and code baseDesigning the physical database to map to the BI tools business view

    newScale Company ConfidentialnewScale

    The OLTP System ManagementAn IT department defines a catalog of all the services it provides to usersEnd User Computing: Add VisioAdvanced Computing: Configure a serverUsers request a serviceSystem tracks the service request, all tasks required to complete it

    newScale Company ConfidentialnewScale

    The OLTP SystemRequestor View Upgrade MemoryUser fills out and submits an order (requisition)

    newScale Company ConfidentialnewScale

    The OLTP SystemIT View Upgrade MemoryUser request has criteria for completionRequest is routed to appropriate people for authorization and task delivery

    newScale Company ConfidentialnewScale

    The OLTP SystemService Designers ViewStandard information (duration, cost, delivery plan, authorizations) defined via UIService-specific data entry requirements defined via dictionaries

    newScale Company ConfidentialnewScale

    The OLTP SystemVastly simplified ERD of transactional dataMetadata on service definitions is also maintainedWhere is form (dictionary-based) data?

    newScale Company ConfidentialnewScale

    Data Mart RequirementsData mart must include form dataRefresh needs to run in nightly maintenance windowMinimize time to minimize impact on global accountsData mart must include data on both open (in progress) and completed requisitionsRefresh must perform updates

    newScale Company ConfidentialnewScale

    Extracting Form (XML) DataExtract XML data from the requisition BLOBCognos8 includes Composite, from IBM, which can extract data from XMLPerformance is problematicAdding a custom plug-in in the ETL tool provided insufficient power/flexibility Custom Java program was neededWould the custom program perform acceptably?

    newScale Company ConfidentialnewScale

    ETL Performance Studies100,000 requisitions per month (an order of magnitude higher than current usage)

    newScale Company ConfidentialnewScale

    ETL Performance Studies

    newScale Company ConfidentialnewScale

    ETL Performance StudiesAverages are good, but know your outlyersIf only to be able to warn users about potential issuesHotel/Hospitality Company:30% of serviceshave100-160 fields, with a maximum of 250. About 60% have 70-100 fieldsThere are dictionaries withabout 40fields, with a maximum of 43.Inserts are easy, updates NOT

    newScale Company ConfidentialnewScale

    Updating Data Mart ContentsData Mart contents need to be updatedTasks and Requisitions are added to the data mart as soon as the requisition is submittedStatus of requisitions may change, as well as completion dateTask status and info is updated as the task is performed (completed)

    newScale Company ConfidentialnewScale

    Updating Data Mart ContentsArgh! Updating data mart contents is expensiveUpdates require indexesInserts perform much better without indexesDeletes (from large tables) are a disasterAvoid updates by placing data to be updated in a partition that is truncatedRefreshed data is written to the same partition or to a permanent partition when complete

    newScale Company ConfidentialnewScale

    Updating Data Mart ContentsStill an issuePartitioning not implemented because:Small table size and growth ratesLack of support in ETL toolSQLServer (partition views only!)What is partition key?Using insert/update modelTime will tell if this is robust enough

    newScale Company ConfidentialnewScale

    Dimensional ModelingTable, View or Metadata?Design the dimensional model in conjunction with the business view of the modelDate and People dimensionsDynamic Data Model to accommodate form dataOther companies have done this, but their scenarios are not as complex Oracle Noetix Views and Noetix data warehouse Flex fields

    newScale Company ConfidentialnewScale

    Date DimensionsDate required as a dimensional keyDue Date for task and requisitionClosed Date for task and requisitionStart Date for task and requisitionComplete date and time required as an attribute of the fact for calculationsCompliance with SLAs and OLAs

    newScale Company ConfidentialnewScale

    Date DimensionsHow many database objects for the date dimensions?3 tables, 3 subject areas1 table, 3 views, 3 subject areas1 table, 3 subject areasPros and cons of each approachETL, debugging, user interface in ReportNetWhere are relationships defined?In the BI toolIn the database

    newScale Company ConfidentialnewScale

    Date DimensionsStatically or dynamically populatedSparse or dense dimension?Influences prompts in BI toolInteracts with decision re: number of tables

    newScale Company ConfidentialnewScale

    People DimensionsRequired DimensionsCustomer for requisitionRequestor for requisitionPerformer for taskHow many database objects?Do people fulfilling different roles intersect? And how much?Implications for prompts/filters

    PerformerRequestor

    newScale Company ConfidentialnewScale

    Dimensional ModelHow to model the dimensions which correspond to dynamically defined dictionariesEach attribute in a dictionary is a dimension rejectedW-A-A-Y too many dimensionsJunk, Degenerate or Demographic dimensionsReference dimensions

    newScale Company ConfidentialnewScale

    Junk DimensionsPopulate the dictionary dimension using a unique combination of values for each dictionary across all requisitionsThe fact table has a column for a relationship with each dimension many will be N/AA complex fact/dimensional build would be requiredSee Kimball, Design Tips 46, 48

    newScale Company ConfidentialnewScale

    Reference DimensionsEach dimension has a 1:1 relationship with the fact tableBest to not have basic analytical metrics and frequent query topics in the reference dimensionGreatly reduces the size of the fact tableSee Kimball, Design Tip 86

    newScale Company ConfidentialnewScale

    Requisition Star Schema

    newScale Company ConfidentialnewScale

    Task Star Schema

    newScale Company ConfidentialnewScale

    Tool Suite

    newScale Company ConfidentialnewScale

    Data Mart RefreshCustom Java program is used in conjunction with Data Manager ETL

    Form data(BLOB/XML)

    CustomJava program

    OLTPRelational data

    Metadata for mapping tables to dictionaries

    Cognos ETL(Data Manager)

    Data mart data

    newScale Company ConfidentialnewScale

    Data Mart Refresh and metadata creationDictionaries and their attributes must be mapped to tables/columns in the data martThis mapping is used in the BI business view

    newScale Company ConfidentialnewScale

    ETL/Metadata Process FlowSpecify ETL for facts and traditional dimensions Uses Cognos8 ETL tool, Data Manager

    ETL definitions

    Cognos8 Metadata Processor

    ETL Metadata(Data Manager Project)

    static Data View

    Data Mart Metadata(Framework Mgr Catalog)

    Framework Manager Script

    Framework Manager CatalogBusiness View

    newScale Company ConfidentialnewScale

    ETL/Metadata Process FlowTransform the Data Manager project into a Framework Manager catalog(Sigh) two tools, two different sets of metadataData Managers ETL destination tables become the basis of the BI tools metadata

    ETL definitions

    Cognos8 Metadata Processor *

    ETL Metadata(Data Manager Project)

    static Data View

    Data Mart Metadata(Framework Mgr Catalog)

    Framework Manager Script

    Framework Manager CatalogBusiness View

    newScale Company ConfidentialnewScale

    ETL/Metadata Process FlowFor dictionaries (junk dimensions) use the Framework Manager API to create a business view Appropriate names for dictionary dimensions and data elements

    newScale Company ConfidentialnewScale

    End User Business ViewDimensionsDynamic dimensionsBased on dictionaries designated as reportableDifferent for each site Universal/static dimensionsFound at all sites

    newScale Company ConfidentialnewScale

    ReportStudioCreate a report listing tasks performed, the performer, the service in which they were performed, and the current status

    newScale Company ConfidentialnewScale

    Conclusions (1)XML extractionJava programming was much more efficient than ETL tools available to usDimensional modelingJunk dimensions, reference dimensions, degenerate dimensions who knew?Subscribe to KimballHire short-term consultants

    newScale Company ConfidentialnewScale

    Conclusions (2)ETL PerformanceSatisfactoryNeed sophisticated scheduler to track all jobsAdhoc Report PerformanceInsufficient testing on the largest user data volumes (IMHO)The software went live on January 31, 2006Ask me in a few months

    newScale Company ConfidentialnewScale

    About the AuthorLeslie Tierstein is a Senior Technical Architect at newScale, Inc, in Foster City California. She can be reached at ltierstein@earthlink.net or leslie.tierstein@newscale.comThis paper is available on line at: http://home.earthlink.net/~ltierstein

Recommended

View more >