ISQS 3358, Business Intelligence
Dimensional Modeling
Zhangxi Lin, Texas Tech University

Post on 01-Jan-2016


Outline
Data Warehousing Approaches
Dimensional Modeling
Data Warehousing with Microsoft SQL Server 2005
Case: Adventure Works Cycles (AWC)
Supplemental Slides: Data Warehouse Design Phases

Data Warehousing Approaches

Data Warehouse Development Approaches
Data warehouse development approaches
◦ Inmon Model: EDW approach
◦ Kimball Model: Data mart approach
Which model is better?
◦ There is no one-size-fits-all strategy to data warehousing
◦ One alternative is the hosted warehouse

General Data Warehouse Development Approaches
“Big bang” approach
Incremental approach:
◦ Top-down incremental approach
◦ Bottom-up incremental approach
ISQS 6339, Data Mgmt & BI, Zhangxi Lin

“Big Bang” Approach
Analyze enterprise requirements
Build enterprise data warehouse
Report in subsets or store in data marts

Incremental Approach to Warehouse Development
Multiple iterations
Shorter implementations
Validation of each phase
[Figure: Increment 1 iterates through the phases Strategy, Definition, Analysis, Design, Build, Production]

Top-Down Approach
Analyze requirements at the enterprise level
Develop conceptual information model
Identify and prioritize subject areas
Complete a model of selected subject area
Map to available data
Perform a source system analysis
Implement base technical architecture
Establish metadata, extraction, and load processes for the initial subject area
Create and populate the initial subject area data mart within the overall warehouse framework

Bottom-Up Approach
Define the scope and coverage of the data warehouse and analyze the source systems within this scope
Define the initial increment based on political pressure, assumed business benefit, and data volume
Implement base technical architecture and establish metadata, extraction, and load processes as required by the increment
Create and populate the initial subject areas within the overall warehouse framework

Dimensional Modeling

Dimensional Model
Also called star schema
◦ The fact table is in the middle, with the dimensions serving as the points of the star.
◦ A normalized fact table plus denormalized dimension tables
Facts
◦ Measurements associated with a specific business process.
◦ Most facts are additive (summable across dimensions); others are semi-additive, non-additive, or descriptive (e.g. factless fact table).
◦ Many facts can be derived from other facts, so storing a non-additive fact can be avoided by calculating it from additive facts.
Grain
◦ The level of detail contained in the fact table
◦ A fact table at the lowest level of detail is called an atomic fact table
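The point about deriving non-additive facts from additive ones can be sketched in a few lines of Python. This is only an illustration: the rows, the column layout, and the function name are hypothetical and do not come from the slides. Unit price is non-additive, so the sketch sums the additive facts first and computes the ratio afterwards.

```python
from collections import defaultdict

# Hypothetical atomic fact rows: (product, day, sales_amount, sales_units).
# Both measures are additive, so they can be summed safely.
fact_rows = [
    ("bike", "2005-01-01", 1000.0, 2),
    ("bike", "2005-01-02", 1500.0, 3),
    ("helmet", "2005-01-01", 90.0, 3),
]


def average_unit_price_by_product(rows):
    """Derive a non-additive fact (average unit price) from additive facts."""
    totals = defaultdict(lambda: [0.0, 0])  # product -> [amount, units]
    for product, _day, amount, units in rows:
        totals[product][0] += amount
        totals[product][1] += units
    # Divide only after summing; averaging per-row prices would be wrong.
    return {p: amount / units for p, (amount, units) in totals.items() if units}


avg_price = average_unit_price_by_product(fact_rows)
```

Storing only the amount and the units keeps the fact table additive at every grain, while the ratio stays available on demand.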

Dimensions
The foundation of the dimensional model, describing the objects of the business
The nouns of the DW/BI system
◦ Business processes (facts) are the verbs of the business
Dimension tables link to all the business processes.
A dimension shared across business processes is called a conformed dimension.
Analysis involving data from more than one business process is called drill-across.
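A minimal sketch of drill-across, assuming two business processes (orders and returns) that have already been aggregated by the same conformed Product dimension. The data and the function name are hypothetical; the only idea taken from the slide is that a shared dimension lets results from separate processes be lined up side by side.

```python
# Hypothetical per-process aggregates, both keyed by the conformed Product dimension.
orders_by_product = {"bike": 120, "helmet": 300}
returns_by_product = {"bike": 6, "glove": 2}


def drill_across(orders, returns):
    """Line up two fact aggregates on their shared dimension key."""
    keys = sorted(set(orders) | set(returns))
    # A product missing from one process contributes 0 for that process.
    return [(key, orders.get(key, 0), returns.get(key, 0)) for key in keys]


report = drill_across(orders_by_product, returns_by_product)
```

The merge only works because both aggregates use identical product keys, which is what conforming the dimension guarantees.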

Data Cube
Data cubes are multidimensional extensions of 2-D tables, just as in geometry a cube is a three-dimensional extension of a square. The word cube brings to mind a 3-D object, and we can think of a 3-D data cube as being a set of similarly structured 2-D tables stacked on top of one another.
Data cubes aren't restricted to just three dimensions. Most OLAP systems can build data cubes with many more dimensions; some allow up to 64.
In practice, we often construct data cubes with many dimensions, but we tend to look at just three at a time. What makes data cubes so valuable is that we can index the cube on one or more of its dimensions.
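To make the "stack of 2-D tables" picture concrete, here is a small sketch that stores a 3-D cube as a Python dictionary keyed by (product, store, month). The cell values, dimension names, and helper functions are assumptions made for illustration; a real OLAP engine stores and indexes cubes very differently.

```python
from collections import defaultdict

# Hypothetical cells of a 3-D cube: (product, store, month) -> sales units.
DIMENSIONS = ("product", "store", "month")
cube = {
    ("bike", "north", "Jan"): 5,
    ("bike", "north", "Feb"): 7,
    ("bike", "south", "Jan"): 3,
    ("helmet", "north", "Jan"): 10,
}


def slice_cube(cells, dimension, value):
    """Fix one dimension to a value, leaving one 2-D table of the stack."""
    axis = DIMENSIONS.index(dimension)
    return {
        key[:axis] + key[axis + 1:]: units
        for key, units in cells.items()
        if key[axis] == value
    }


def roll_up(cells, dimension):
    """Sum out all other dimensions to get totals along one dimension."""
    axis = DIMENSIONS.index(dimension)
    totals = defaultdict(int)
    for key, units in cells.items():
        totals[key[axis]] += units
    return dict(totals)


january = slice_cube(cube, "month", "Jan")
units_by_product = roll_up(cube, "product")
```

Adding a fourth dimension only means a longer key tuple, which is why the cube idea extends beyond three dimensions even though we usually view a few at a time.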

Determining Granularity

YEAR?

QUARTER?

MONTH?

WEEK?

DAY?
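The choice of grain matters because detail can be rolled up but not recovered. The sketch below assumes hypothetical daily sales rows and rolls them up to each coarser grain named on the slide; the function names and key formats are illustrative choices, not part of the course material.

```python
from collections import defaultdict
from datetime import date

# Hypothetical facts stored at the day grain: (day, sales_amount).
daily_sales = [
    (date(2005, 1, 15), 100.0),
    (date(2005, 1, 20), 50.0),
    (date(2005, 2, 3), 75.0),
    (date(2005, 4, 1), 200.0),
]


def grain_key(day, grain):
    """Map a calendar day to its bucket at the requested grain."""
    if grain == "day":
        return day.isoformat()
    if grain == "week":
        iso = day.isocalendar()  # (ISO year, ISO week number, weekday)
        return f"{iso[0]}-W{iso[1]:02d}"
    if grain == "month":
        return f"{day.year}-{day.month:02d}"
    if grain == "quarter":
        return f"{day.year}-Q{(day.month - 1) // 3 + 1}"
    if grain == "year":
        return str(day.year)
    raise ValueError(f"unknown grain: {grain}")


def roll_up(rows, grain):
    """Aggregate day-grain facts to a coarser grain."""
    totals = defaultdict(float)
    for day, amount in rows:
        totals[grain_key(day, grain)] += amount
    return dict(totals)


monthly = roll_up(daily_sales, "month")
quarterly = roll_up(daily_sales, "quarter")
yearly = roll_up(daily_sales, "year")
```

Had the facts been stored at the month grain, the weekly and daily views could not be rebuilt, which is the usual argument for keeping the atomic grain when storage allows.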

Star Schema Model
[Figure: star schema with a central fact table surrounded by denormalized dimensions]
Sales Fact Table: Product_id, Store_id, Item_id, Day_id, Sales_amount, Sales_units, ...
Product Table: Product_id, Product_desc, ...
Time Table: Day_id, Month_id, Year_id, ...
Item Table: Item_id, Item_desc, ...
Store Table: Store_id, District_id, ...
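The star schema above can be tried out with Python's built-in sqlite3 module. This is a sketch under assumptions: the column types, the sample rows, and the table name time_dim are invented for illustration, and the Item table is left out to keep the example short. The query shows the typical star join, where the fact table joins directly to each dimension it needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Denormalized dimension tables (sample columns only).
CREATE TABLE product (product_id INTEGER PRIMARY KEY, product_desc TEXT);
CREATE TABLE store (store_id INTEGER PRIMARY KEY, district_id INTEGER);
CREATE TABLE time_dim (day_id INTEGER PRIMARY KEY, month_id INTEGER, year_id INTEGER);

-- Central fact table: foreign keys to each dimension plus the measures.
CREATE TABLE sales_fact (
    product_id INTEGER REFERENCES product(product_id),
    store_id INTEGER REFERENCES store(store_id),
    day_id INTEGER REFERENCES time_dim(day_id),
    sales_amount REAL,
    sales_units INTEGER
);

INSERT INTO product VALUES (1, 'Road bike'), (2, 'Helmet');
INSERT INTO store VALUES (1, 10), (2, 20);
INSERT INTO time_dim VALUES (20050101, 200501, 2005), (20060101, 200601, 2006);
INSERT INTO sales_fact VALUES
    (1, 1, 20050101, 1000.0, 2),
    (1, 2, 20050101, 500.0, 1),
    (2, 1, 20060101, 60.0, 2);
""")

# Star join: one join per dimension used, then aggregate the additive measure.
sales_by_product_year = conn.execute("""
    SELECT p.product_desc, t.year_id, SUM(f.sales_amount)
    FROM sales_fact AS f
    JOIN product AS p ON p.product_id = f.product_id
    JOIN time_dim AS t ON t.day_id = f.day_id
    GROUP BY p.product_desc, t.year_id
    ORDER BY p.product_desc, t.year_id
""").fetchall()
```

Every dimension is exactly one join away from the fact table, which keeps queries short and easy to read.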

Snowflake Schema Model
[Figure: snowflake schema in which dimension tables are normalized into further tables]
Sales Fact Table: Item_id, Store_id, Product_id, Week_id, Sales_amount, Sales_units
Time Table: Week_id, Period_id, Year_id
Product Table: Product_id, Product_desc
Item Table: Item_id, Item_desc, Dept_id
Dept Table: Dept_id, Dept_desc, Mgr_id
Mgr Table: Dept_id, Mgr_id, Mgr_name
Store Table: Store_id, Store_desc, District_id
District Table: District_id, District_desc

Snowflake Schema Model
◦ Direct use by some tools
◦ More flexible to change
◦ Provides for speedier data loading
◦ Can become large and unmanageable
◦ Degrades query performance
◦ More complex metadata
[Figure: normalized geography hierarchy: Country, State, County, City]
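The extra joins behind the "degrades query performance" point can be seen in a small sqlite3 sketch. It assumes a cut-down version of the snowflake schema in which the Store dimension is normalized into a separate District table; the sample rows and column types are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Snowflaked dimension: district attributes live in their own table.
CREATE TABLE district (district_id INTEGER PRIMARY KEY, district_desc TEXT);
CREATE TABLE store (
    store_id INTEGER PRIMARY KEY,
    store_desc TEXT,
    district_id INTEGER REFERENCES district(district_id)
);
CREATE TABLE sales_fact (
    store_id INTEGER REFERENCES store(store_id),
    week_id INTEGER,
    sales_amount REAL
);

INSERT INTO district VALUES (10, 'West'), (20, 'East');
INSERT INTO store VALUES (1, 'Lubbock', 10), (2, 'Austin', 10), (3, 'Boston', 20);
INSERT INTO sales_fact VALUES (1, 200501, 100.0), (2, 200501, 150.0), (3, 200501, 80.0);
""")

# Reaching the district description takes two joins (fact -> store -> district);
# in a star schema the description would sit directly on the store row.
sales_by_district = conn.execute("""
    SELECT d.district_desc, SUM(f.sales_amount)
    FROM sales_fact AS f
    JOIN store AS s ON s.store_id = f.store_id
    JOIN district AS d ON d.district_id = s.district_id
    GROUP BY d.district_desc
    ORDER BY d.district_desc
""").fetchall()
```

The trade-off is the one listed on the slide: a district name is stored once and is easy to change, at the price of a longer join path for every query that needs it.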

Dimensional Modeling Process
High level dimensional model design
◦ Choosing business model
◦ Declaring the grain
◦ Choosing dimensions
◦ Identifying the facts
Detailed dimensional model development
Dimensional model review and validation
◦ IS
◦ Core users
◦ Business community
Final design iteration

Example: Commrex Real Estate Data Warehousing
Analytic themes
◦ How to encourage realtors to use the online ASP services
Value Chain
◦ Listors create their account
◦ Listors post their real estate properties to the web-based database services and pay listing fees
◦ Property buyers search the website-based database and buy properties from listors. This is the incentive for listors to use the ASP services
Business Processes
◦ Listor sign up
◦ Listor account management
◦ Property data posting
◦ Property search
◦ Property database maintenance

[Figure: IMW’s Database ERD Model, covering the Property Listing Database and the Membership Database. Attributes shown include Property ID, Listor ID, Address, Property Type, City, Feature, Company ID, Chapter, Functions, Specializations, Comp Name, Telephone #, Listor Name, Type Name, and Subtype 1 through Subtype n. Relationships are marked M:1 and M:M; the legend distinguishes primary keys, secondary keys, and links to a table.]

[Figure: IMW’s Data Warehouse Dimensional Model, with a Property Listing Fact and the Membership, Company, Property SubType, and Property Type dimensions. Attributes shown include Property ID, Listor ID, Address, Prop SubType, City, Feature, Company ID, Chapter, Functions, Specializations, Comp Name, Telephone #, Listor Name, SubType Name, Property Type, and Type Name. The legend distinguishes primary keys, secondary keys, and links to a table.]

Data Warehousing with Microsoft SQL Server 2005

Unified Dimensional Model (UDM)
A SQL Server 2005 technology
A UDM is a structure that sits over the top of a data mart and looks exactly like an OLAP system to the end user.
Advantages
◦ No need for a data mart.
◦ Can be built over one or more OLTP systems.
◦ Can mix data mart and OLTP system data
◦ Can include data from databases from other vendors and XML-formatted data
◦ Allows OLAP cubes to be built directly on top of transactional data
◦ Low latency
◦ Ease of creation and maintenance

Microsoft BI Toolset
Relational engine (RDBMS)
◦ T-SQL
◦ .NET Framework Common Language Runtime (CLR)
SQL Server Integration Services (SSIS) – ETL
◦ Data Transformation Pipeline (DTP)
◦ Data Transformation Runtime (DTR)
SQL Server Analysis Services (SSAS) – queries, ad hoc use, OLAP, data mining
◦ Multi-Dimensional eXpressions (MDX) – a scripting language for data retrieval from dimensional databases
◦ Dimension design
◦ Cube design
◦ Data mining
SQL Server Reporting Services (SSRS) – ad hoc query, report building
Microsoft Visual Studio .NET is the fundamental tool for application development

Structure and Components of Business Intelligence
[Figure: MS SQL Server 2005 components SSMS, SSIS, SSAS, SSRS, and BIDS, shown alongside SAS EM and SAS EG]

Understanding the Cube Designer Tabs
Cube Structure: Use this tab to modify the architecture of a cube.
Dimension Usage: Use this tab to define the relationships between dimensions and measure groups, and the granularity of each dimension within each measure group.
Calculations: Use this tab to examine calculations that are defined for the cube, to define new calculations for the whole cube or for a subcube, to reorder existing calculations, and to debug calculations step by step by using breakpoints.
KPIs: Use this tab to create, edit, and modify the Key Performance Indicators (KPIs) in a cube.
Actions: Use this tab to create or modify drillthrough, reporting, and other actions for the selected cube.
Partitions: Use this tab to create and manage the partitions for a cube. Partitions let you store sections of a cube in different locations with different properties, such as aggregation definitions.
Perspectives: Use this tab to create and manage the perspectives in a cube. A perspective is a defined subset of a cube, and is used to reduce the perceived complexity of a cube to the business user.
Translations: Use this tab to create and manage translated names for cube objects, such as month or product names.
Browser: Use this tab to view data in the cube.

Case: Adventure Works Cycles (AWC)
A fictitious multinational manufacturer and seller of bicycles and accessories
Based in Bothell, Washington, USA, with regional sales offices in several countries
http://www.msftdwtoolkit.com/

Basic Business Information
Product Orders by Category
Product Orders by Country/Region
Product Orders by Sales Channel
Customers by Sales Channel
Snapshot

AWC Business Requirements - Interview summary
Interviewee: Brian Welker, VP of Sales
Sales to resellers: $37 million last year
17 people report to him including 3 regional sales managers
Previous problem: Hard to get information out of the company’s system
Major analytic areas:
◦ Sales planning
◦ Growth analysis
◦ Customer analysis
◦ Territory analysis
◦ Sales performance
◦ Basic sales reporting
◦ Price lists
◦ Special offers
◦ Customer satisfaction
◦ International support
Success criteria
◦ Easy data access
◦ Flexible reporting and analyzing
◦ All data in one place
What’s missing? – A lot – No indication of business value

Business Processes
Purchase Orders
Distribution Center Deliveries
Distribution Center Inventory
Store Deliveries
Store Inventory
Store Sales

Analytic Themes
See the Excel file AW_Analytic_Themes_List.xls

AWC’s Bus Matrix
Each row below is a business process; an X marks a dimension used by that process.
Dimensions (columns, left to right): Date, Product, Employee, Customer (Reseller), Customer (Internet), Sales Territory, Currency, Channel, Promotion, Call Reason, Facility

Sales Forecasting X X X X X X X        

Orders X X X X X X X X X    

Call Tracking X X X X X X       X  

Returns X X   X X X X X   X

Prioritization Grid
[Figure: prioritization grid placing Orders, Orders Forecast, Call Tracking, Exchange Rates, Returns, Manufacturing Costs, Customer Profitability, and Product Profitability by Feasibility (low to high) and Business Value / Impact (low to high)]

Exercise 2 – A quick walk through an SSAS application
Learning Objectives
◦ How to design a data source view with SSAS based on an existing data warehouse
◦ How to design and deploy a cube.
Tasks
◦ Analysis Services Tutorial Lesson 1: Defining a Data Source View within an Analysis Services Project
◦ Analysis Services Tutorial Lesson 2: Defining and Deploying a Cube
Deliverable:
◦ A Word file with the screenshot of the star schema emailed to [email protected]
◦ The subject of the email is: “ISQS 3358 Exercise 2”

Supplemental Slides: Data Warehouse Design Phases
Data Warehouse Database Design Phases
Phase 1: Defining the business model
Phase 2: Defining the dimensional model
Phase 3: Defining the physical model

Phase 1: Defining the Business Model
◦ Performing strategic analysis
◦ Creating the business model
◦ Documenting metadata

Performing Strategic Analysis
Identify crucial business processes
Understand business processes
Prioritize and select the business processes to implement
[Figure: grid of Business Benefit (low to high) against Feasibility (low to high)]

Creating the Business Model
Defining business requirements:
◦ Identifying the business measures
◦ Identifying the dimensions
◦ Identifying the grain
◦ Identifying the business definitions and rules
Verifying data sources

Business Requirements Drive the Design Process
◦ Primary input
◦ Secondary input
[Figure: inputs to the design process: Business Requirements, Existing Metadata, Production ERD Model, Research]

Identifying Measures and Dimensions
Measures: the attribute varies continuously:
◦ Balance
◦ Units Sold
◦ Cost
◦ Sales
Dimensions: the attribute is perceived as constant or discrete:
◦ Product
◦ Location
◦ Time
◦ Size

Using a Business Process Matrix
Sample of business process matrix
Business Processes (columns): Sales, Returns, Inventory
Business Dimensions (rows): Customer, Date, Product, Channel, Promotion

Determining Granularity

YEAR?

QUARTER?

MONTH?

WEEK?

DAY?

Identifying Business Rules
Store
◦ Store > District > Region
Location
◦ Geographic proximity: 0 - 1 miles, 1 - 5 miles, > 5 miles
Product
◦ Type: PC, Server
◦ Monitor: 15 inch, 17 inch, 19 inch, None
◦ Status: New, Rebuilt, Custom
Time
◦ Month > Quarter > Year

Documenting Metadata
Documenting metadata should include:
◦ Documenting the design process
◦ Documenting the development process
◦ Providing a record of changes
◦ Recording enhancements over time

Metadata Documentation Approaches
◦ Automated: data modeling tools, ETL tools, end-user tools
◦ Manual

Phase 2: Defining the Dimensional Model
◦ Identify fact tables:
  Translate business measures into fact tables
  Analyze source system information for additional measures
◦ Identify dimension tables
◦ Link fact tables to the dimension tables
◦ Model the time dimension

Star Dimensional Modeling
[Figure: star schema with the Sales fact table at the center]
Sales Fact Table: Product_id, Store_id, Item_id, Day_id, Sales_amount, Sales_units, ...
Store Table: Store_id, District_id, ...
Item Table: Item_id, Item_desc, ...
Product Table: Product_id, Product_desc, ...
Time Table: Day_id, Month_id, Period_id, Year_id

Fact Table Characteristics
◦ Contain numerical metrics of the business
◦ Can hold large volumes of data
◦ Can grow quickly
◦ Can contain base, derived, and summarized data
◦ Are typically additive
◦ Are joined to dimension tables through foreign keys that reference primary keys in the dimension tables
Sales Fact Table: Product_id, Store_id, Item_id, Day_id, Sales_amount, Sales_units, ...

Dimension Table Characteristics
Dimension tables have the following characteristics:
◦ Contain textual information that represents the attributes of the business
◦ Contain relatively static data
◦ Are joined to a fact table through a foreign key reference

Star Dimensional Model Characteristics
◦ The model is easy for users to understand.
◦ Primary keys represent a dimension.
◦ Nonforeign key columns are values.
◦ Facts are usually highly normalized.
◦ Dimensions are completely denormalized.
◦ Fast response to queries is provided.
◦ Performance is improved by reducing table joins.
◦ End users can express complex queries.
◦ Support is provided by many front-end tools.

Using Time in the Data Warehouse
◦ Defining standards for time is critical.
◦ Aggregation based on time is complex.

The Time Dimension
Time is critical to the data warehouse. A consistent representation of time is required for extensibility.
Where should the element of time be stored?
[Figure: Time dimension linked to the Sales fact]
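One common way to get a consistent representation of time is to generate the time dimension once, with one row per calendar day, and let every fact table reference it by key. The sketch below is an assumption-laden illustration: the column names follow the Day_id, Month_id, and Year_id pattern from the star schema slides, while the key formats and the extra attributes are hypothetical choices.

```python
from datetime import date, timedelta


def build_time_dimension(start, end):
    """Return one dimension row per calendar day from start to end inclusive."""
    rows = []
    day = start
    while day <= end:
        rows.append({
            "day_id": int(day.strftime("%Y%m%d")),   # e.g. 20050101
            "calendar_date": day.isoformat(),
            "month_id": day.year * 100 + day.month,  # e.g. 200501
            "quarter": (day.month - 1) // 3 + 1,
            "year_id": day.year,
            "is_weekend": day.weekday() >= 5,        # Saturday or Sunday
        })
        day += timedelta(days=1)
    return rows


time_dim = build_time_dimension(date(2005, 1, 1), date(2005, 12, 31))
```

Because month, quarter, and year are precomputed attributes of each day, fact rows need to carry only day_id, and every report groups time in the same way.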

Using Data Modeling Tools
◦ Tools with a GUI enable definition, modeling, and reporting.
◦ Avoid a mix of modeling techniques caused by:
  Development pressures
  Developers with lack of knowledge
  No strategy
◦ Determine a strategy.
◦ Write and publish formally.
◦ Make available electronically.

Phase 3: Defining the Physical Model
Why
◦ Huge amounts of data must be effectively processed and retrieved in real time.
How
◦ Translate the dimensional design to a physical model for implementation.
◦ Define storage strategy for tables and indexes.
◦ Perform database sizing.
◦ Define initial indexing strategy.
◦ Define partitioning strategy.
◦ Update metadata document with physical information.

Storage and Performance Considerations
Database sizing
Data partitioning
Indexing
Star query optimization

Database Sizing - Test Load Sampling
Analyze a representative sample of the data chosen using proven statistical methods.
Ensure that the sample reflects:
◦ Test loads for different periods
◦ Day-to-day operations
◦ Seasonal data and worst-case scenarios
◦ Indexes and summaries
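A back-of-the-envelope version of sizing from a test load might look like the sketch below. The formula is an assumption, not the course's method: it scales the average row size seen in the sample up to the expected row count and then adds a flat overhead ratio standing in for indexes and summaries. All numbers are hypothetical.

```python
def estimate_table_bytes(sample_rows, sample_bytes, expected_rows, overhead_ratio=0.3):
    """Extrapolate storage from a test load.

    overhead_ratio is an assumed allowance for indexes and summary tables,
    expressed as a fraction of the raw data size.
    """
    if sample_rows <= 0:
        raise ValueError("sample_rows must be positive")
    avg_row_bytes = sample_bytes / sample_rows
    data_bytes = avg_row_bytes * expected_rows
    return data_bytes * (1 + overhead_ratio)


# Hypothetical test load: 10,000 rows occupied 1,200,000 bytes (120 bytes per row).
estimated = estimate_table_bytes(
    sample_rows=10_000,
    sample_bytes=1_200_000,
    expected_rows=50_000_000,
    overhead_ratio=0.5,
)
```

The estimate is only as good as the sample, which is why the slide insists that the test load cover different periods, seasonal peaks, and worst-case scenarios.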

Data Partitioning
Breaking up of data into separate physical units that can be handled independently
Types of data partitioning
◦ Horizontal partitioning
◦ Vertical partitioning
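The two partitioning types can be illustrated on an in-memory list of rows. This is a conceptual sketch only: the rows, column names, and helper functions are hypothetical, and a real database partitions at the storage level rather than in application code.

```python
from collections import defaultdict

# Hypothetical fact rows.
rows = [
    {"sale_id": 1, "year": 2004, "amount": 100.0, "note": "gift wrap"},
    {"sale_id": 2, "year": 2005, "amount": 250.0, "note": ""},
    {"sale_id": 3, "year": 2005, "amount": 80.0, "note": "rush order"},
]


def partition_horizontally(table, column):
    """Split by rows: every partition keeps all columns for a subset of rows."""
    partitions = defaultdict(list)
    for row in table:
        partitions[row[column]].append(row)
    return dict(partitions)


def partition_vertically(table, key, hot_columns):
    """Split by columns: both halves keep the key so rows can be rejoined."""
    hot = [{key: row[key], **{c: row[c] for c in hot_columns}} for row in table]
    cold = [
        {c: v for c, v in row.items() if c == key or c not in hot_columns}
        for row in table
    ]
    return hot, cold


by_year = partition_horizontally(rows, "year")
hot, cold = partition_vertically(rows, "sale_id", ["year", "amount"])
```

Horizontal partitions by year can be loaded, backed up, or dropped independently, while the vertical split keeps frequently read columns apart from wide, rarely used ones.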

Indexing
Indexing is used for the following reasons:
◦ It is a huge cost saving, greatly improving performance and scalability.
◦ It can replace a full table scan with a quick read of the index followed by a read of only those disk blocks that contain the rows needed.
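The switch from a full table scan to an index read can be observed with SQLite's EXPLAIN QUERY PLAN, used here through Python's sqlite3 module as a stand-in for the warehouse database. The table, index name, and data are hypothetical, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_fact (store_id INTEGER, day_id INTEGER, sales_amount REAL)"
)
# Hypothetical load: 50 stores x 100 days = 5,000 fact rows.
conn.executemany(
    "INSERT INTO sales_fact VALUES (?, ?, ?)",
    [(store, day, 10.0 * store) for store in range(1, 51) for day in range(1, 101)],
)

QUERY = "SELECT sales_amount FROM sales_fact WHERE store_id = 7"


def query_plan(connection, sql):
    """Return the human-readable plan; the detail text is the last column."""
    plan_rows = connection.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in plan_rows)


plan_without_index = query_plan(conn, QUERY)  # expected to report a table scan

conn.execute("CREATE INDEX idx_sales_store ON sales_fact (store_id)")
plan_with_index = query_plan(conn, QUERY)     # expected to name the new index

matching_rows = conn.execute(QUERY).fetchall()
```

Both plans return the same 100 rows; only the access path changes, which is where the performance benefit described on the slide comes from.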

Parallelism
[Figure: parallel execution servers P1, P2, and P3 working on the Sales table and the Customers table]

Using Summary Data
Designing summary tables offers the following benefits:
◦ Provides fast access to precomputed data
◦ Reduces use of I/O, CPU, and memory
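A summary table is simply an aggregate computed once and stored, so that later queries read a few precomputed rows instead of scanning the detail. The sqlite3 sketch below assumes a hypothetical sales fact table and a summary named sales_month_summary; neither name comes from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales_fact (product_id INTEGER, month_id INTEGER, sales_amount REAL);
INSERT INTO sales_fact VALUES
    (1, 200501, 100.0),
    (1, 200501, 50.0),
    (1, 200502, 75.0),
    (2, 200501, 20.0);

-- Precompute the monthly totals once, at load time.
CREATE TABLE sales_month_summary AS
SELECT product_id, month_id, SUM(sales_amount) AS total_amount, COUNT(*) AS row_count
FROM sales_fact
GROUP BY product_id, month_id;
""")

# Reports read the small summary table instead of re-aggregating the detail.
summary = conn.execute("""
    SELECT product_id, month_id, total_amount
    FROM sales_month_summary
    ORDER BY product_id, month_id
""").fetchall()

# The same figures computed on the fly from the detail rows, for comparison.
on_the_fly = conn.execute("""
    SELECT product_id, month_id, SUM(sales_amount)
    FROM sales_fact
    GROUP BY product_id, month_id
    ORDER BY product_id, month_id
""").fetchall()
```

The cost of this approach is that the summary must be refreshed whenever new facts are loaded, so it suits measures that are queried far more often than they change.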