the it perspective of business intelligence
DESCRIPTION
The IT Perspective of Business IntelligenceTRANSCRIPT
The IT Perspective: Data Warehousing, Management, and Analytical
Structures
• Explain the basics of:
Master Data Management
Data Warehousing
ETL
OLAP/Multidimensional Data
Objectives
• Install only the ones you need
• Which?
– Integration Services • Get your data from the world outside (ETL)
– Analysis Services • Cubes, Data Mining, support for PowerPivot on SharePoint
– Reporting Services • DIY Report Builder and traditional “big” reports
– Master Data Services • Quality of critical master data (cities, colours, customers)
– Database Engine • Data warehouse and OLTP relational storage
SQL Services – Why?
Master Data Management
• Ensures consistency of data across all organisational uses
• Impacts overall data quality
• Processes and tools for:
– Collection, aggregation, matching, distribution, and persistence of master data • Consistently
– Related to Federated Data Management
• Key to MDM: Modelling
MDM
Why MDM? It’s About Evolution of Enterprise Architecture
MDM Processes
• Batched Acquisition from Staging Tables
• Members, Attributes, Parent-Child Relationships
• SQL Integration Services
Import & Integration
• Versioning Changes
• Auditing
• Compliance
• Tracking of Instances
Modeling • Subscription Views
• Export to:
• Operational Systems
• Data Warehouses
• BI Analytics
• Reporting Tools
Export & Subscription
• Tools:
– Master Data Manager • Primary tool for managing
your master data
– MDS Configuration Manager • IT Pro tool
– MDS Web Service • For developers wanting to
extend MDS
• Concepts:
– Models
– Entities
– Attributes
– Members
– Hierarchies
– Collections
– Versions
– Database
Microsoft Master Data Services SQL 2008 R2 Enterprise, Datacenter, Developer
• Model organises data at highest level
– Allowing versioning of changes to data
• There are typically four categories of models:
– People (Customers, Staff)
– Places (Geographies, Cities, Countries)
– Things (Products)
– Concepts (Accounts, Behaviours, Transactions)
Modelling Master Data
Example: Product MDM Model
Product (model)
Product (entity)
Name (free-form attr)
Code (free-form attr)
Subcategory (domain-
based attr)
Name (free-form attr)
Code (free-form attr)
Category (domain-
based attr)
Name (free-form attr)
Code (free-form attr)
StandardCost (free-form
attr)
ListPrice (free-form
attr)
Photo (file attr)
Data Warehouse
OLE DB
ODBC
DB2 Oracle
XML
SQL Server
Analysis Services
SQL Server
Report Server Models
SQL Server
Data Mining Models
SQL Server
Integration Services
MySAP
Hyperion Essbase
SAP
NetWeaver BI SQL Server
Teradata
Rich Connectivity Data Providers
Star Schema
Star Schema Benefits
• Simple, not-so-normalised model
• High-performance queries
– Especially with Star Join Query Optimisation
• Mature and widely supported
• Low-maintenance
• Define hierarchies using multiple dimension tables
• Support fact tables with varying granularity
• Simplify consolidation of heterogeneous data
Snowflake Dimension Tables
Potential for slower query performance in relational reporting
No difference in performance in Analysis Services database
Fact Table Fundamentals
• Collection of measurements associated with a specific business process
• Specific column types
– Foreign keys to dimensions
– Measures – numeric and additive
– Metadata and lineage
• Consistent granularity – the most atomic level by which the facts can be defined
Fact Table Examples
Day Grain
Quarter Grain
Reseller sales data by:
•Product
•Order Date
•Reseller
•Employee
•Sales Territory
Sales quota data by:
•Employee
•Time
• Most common dimension used in analysis (aka Time dimension)
• Use consistently with all facts
• Useful common attributes – Year, Quarter, Month, Day
– Time series analysis support
– Navigation and summarisation enabled with hierarchies, such as calendar or fiscal
• Single table design (typically not snowflake design)
Date Dimension Table
Tip: Format the key of the dimension as yyyymmdd (e.g. 20100115) to make it readily understandable
Parent-Child Hierarchy
• A dimension that contains a parent attribute
• A parent attribute describes a self-referencing relationship, or a self-join, within a dimension table
• Common examples
– Organisational charts
– General Ledger structures
– Bill of Materials
Parent-Child Hierarchy Example
Brian
Amy
Stacia
Stephen
Shu Michael
Peter
José
Syed
Slowly Changing Dimensions
• Maintain historical context as dimension data changes
• Three common ways (there are more):
– Type 1: Overwrite the existing dimension record
– Type 2: Insert a new ‘versioned’ dimension record
– Type 3: Track limited history with attributes
Integration and ETL
Let’s do ETL with SSIS
• SQL Server Integration Services (SSIS) service
• SSIS object model
• Two distinct runtime engines:
– Control flow
– Data flow
• 32-bit and 64-bit editions
• The basic unit of work, deployment, and execution
• An organised collection of: – Connection managers
– Control flow components
– Data flow components
– Variables
– Event handlers
– Configurations
• Can be designed graphically or built programmatically
• Saved in XML format to the file system or SQL Server
The Package
Control Flow
• Control flow is a process-oriented workflow engine
• A package contains a single control flow
• Control flow elements
– Containers
– Tasks
– Precedence constraints
– Variables
• The Data Flow Task
– Performs traditional ETL and more
– Fast and scalable
• Data Flow Components
– Extract data from Sources
– Load data into Destinations
– Modify data with Transformations
• Service Paths
– Connect data flow components
– Create the pipeline
Data Flow
demos Using SQL Server Integration Services
Objectives:
Gain an overview understanding of UI and approach of Integration Services
Appreciate additional capability and functionality
demos What did we see?
How to build up SSIS tasks
How to deploy SSIS packages and deliver comprehensive ETL
OLAP/Multidimensional Data
• Multidimensional data
• Combination of measures and dimensions as one conceptual model
– Measures are sourced from fact tables
– Dimensions are sourced from dimension tables
Cube = Unified Dimensional Model
• Members from tables/views in a data source view (based on a Data Warehouse)
• Contain attributes matching dimension columns
• Organise attributes as hierarchies
– One All level and one leaf level
– User hierarchies are multi-level combinations of attributes
– Can be placed in display folders
• Used for slicing and dicing by attribute
Dimensions
• Benefits
– View of data at different levels of summarisation
– Path to drill down or drill up
• Implementation
– Denormalised star schema dimension
– Normalised snowflake dimension
– Self-referencing relationship
Hierarchies
• Defined in Analysis Services
• Ordered collection of attributes into levels
• Navigation path through dimensional space
• Very important to get right!
Hierarchy
Customers by Geography
Country
State
City
Customer
Customers by Demographics
Marital
Gender
Customer
• Group of measures with same dimensionality
• Analogous to a fact table
• Cube can contain more than one measure group
– E.g. Sales, Inventory, Finance
• Defined by dimension relationships
Measure Group
Sales Inventory Finance
Customers X
Products X X
Time X X X
Promotions X
Warehouse X
Department X
Account X
Scenario X
Measure Group
Measure Group D
ime
ns
ion
• Define interaction between dimensions and measure groups
• Relationship types
– Regular
– Reference
– Fact (Degenerate)
– Many-to-many
– Data mining
Dimension Relationships
• Expressions evaluated at query time for values that cannot be stored in fact table
• Types of calculations
– Calculated members
– Named sets
– Scoped assignments
• Calculations are defined using MDX
Calculations
MDX = MultiDimensional EXpressions
demos Using BIDS to Review Dimension Design
Cube Design and Functionality
Objectives:
Introduction to Business Intelligence Development Studio
Illustrate Cube concepts and design, editing and deployment of Analysis
Services
demos What did we see?
How to design and build SSAS databases
How to deploy SSAS databases
• As a platform for enterprise Business Intelligence you should consider four services:
– Data Warehouse (can be relational)
– Process for Data Management (MDS)
– Process for Data Integration (ETL)
– Analysis (OLAP, Data Mining, Columnar)
= SQL Server 2008 R2
Summary
© 2010 Microsoft Corporation & Content and Code Ltd. All rights reserved. The information herein is for informational purposes only and represents the opinions and views of Content and Code. The material presented is not certain and may vary based on several factors. Microsoft makes no warranties, express, implied or statutory, as to the information in this presentation. © 2010 Microsoft Corp. Some slides contain quotations from copyrighted materials by other authors, as individually attributed or as already covered by Microsoft Copyright ownerships. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Content and Code as of the date of this presentation. Because Content and Code & Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft and Content and Code cannot guarantee the accuracy of any information provided after the date of this presentation. Content and Code no warranties, express, implied or statutory, as to the information in this presentation. E&OE.