A Weapon in Your Competitive Arsenal: The Data Warehouse (ok-air.org/documents/2005...)



Page 1: A Weapon in Your Competitive Arsenal: The Data Warehouseok-air.org/documents/2005 Fall/F05_ROMEDataWarehouse.pdf · Snowflake Schema Operation Data Store (ODS) XML ... • Mission

1

A Weapon in Your Competitive Arsenal:The Data Warehouse

John Rome, Arizona State University

2005 Fall Conference

Agenda
• Quiz
• Background
• Define Data Warehousing
• Discuss Latest Buzzwords
• Demo of Actual Data Warehouse
• Lessons Learned and Some Advice
• Demo/Questions/Discussion
• Later Today…
  – Data Mining, Dashboard and Data Quality


Quiz--Truth or Urban Legend?
1. Pizza Hut knows your favorite toppings, what you ordered last and whether you like salad with your meat lover's pie?
2. Ekco sells more turkey basters during Christmas than Thanksgiving?
3. 4 wheel drive Green Subarus outsell Blue by a wide margin, except in Wisconsin?
4. Walmart increases sales by placing diapers and beer next to each other?

About Arizona State University

• Located in the Phoenix metropolitan area
• 61,033 Students
• 5,393 Full-Time Administrative Staff
• 2,165 Full-Time Faculty
• Awarded Research I Status in 1994
• “New American University”
• http://www.asu.edu


“One University, Many Places”


About ASU’s Data Administration
• Reports to the President’s Office
• 5 Professional Staff, 4 Support Staff
• Mission: Data Access, Data Quality, and Data Education
• Supports Centralized/Decentralized Initiatives
• Data Warehouse is “full-employment”
• In Preliminary ERP discussions
• Close ties with IR office
• http://www.asu.edu/data_admin


Warehousing Was and Still is Hot...
• $8B Industry
• 90% of CIOs claim to be developing (Meta Group, 1998) with 99.9% today
• Higher Education Institutions are building them
• Keynotes at IR conferences!!
• Chapters in college textbooks
• Amazon.com barometer (Over 100 books on warehousing)

So Hot...Even Dilbert is talking about them!!

apologies to Scott Adams!!



What is a Data Warehouse?

• SUBJECT-ORIENTED
• INTEGRATED
• TIME-VARIANT
• NON-VOLATILE

collection of data in support of management’s decision making process.
-Bill Inmon

Some More Definitions

A copy of transaction data specifically structured for query and analysis.
-Ralph Kimball

A single, integrated store of corporate data which provides the infrastructural basis for informational applications in the enterprise.
-S.G.Kelly


My Definition…

Age:        2        8         42          66
Weight:     35       85        205         190
Net Worth:  $0.00    $52.00    $X90,000    30 Million

“A Database with Snapshots of Data Dedicated for Reporting Purposes”


Why All the Fuss About Warehousing?
• Powerful Data Source for Reporting
• Fills in Gaps Left by Operational Systems
• Integrates Data from Silo Systems
• Both Strategic and Tactical
• Keeps Historical Data
• Assists Longitudinal Studies
• Helps Assessment and Retention
• Becoming Mission Critical to Organizations!

How Is a Warehouse Different?

Warehouse                     OLTP
• data is read-only           • data is updated
• managed redundancy          • minimal redundancy
• serves management           • serves operational users
• “time fixed” data           • “current value” data
• “what if” processing        • repetitive processing
• historical trends           • limited history
• response… minutes           • response… seconds
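The contrast between "current value" and "time fixed" data can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table names, columns, dates and values are hypothetical and do not come from ASU's warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# OLTP-style table (hypothetical): one row per student, updated in place.
cur.execute("CREATE TABLE student_oltp (student_id INTEGER PRIMARY KEY, major TEXT)")

# Warehouse-style table (hypothetical): one row per student per snapshot date.
cur.execute(
    "CREATE TABLE student_snapshot ("
    "snapshot_date TEXT, student_id INTEGER, major TEXT, "
    "PRIMARY KEY (snapshot_date, student_id))"
)

def take_snapshot(snapshot_date):
    """Copy the current operational rows into the warehouse, stamped with a date."""
    cur.execute(
        "INSERT INTO student_snapshot "
        "SELECT ?, student_id, major FROM student_oltp",
        (snapshot_date,),
    )

cur.execute("INSERT INTO student_oltp VALUES (1, 'Undeclared')")
take_snapshot("2004-09-01")

# The operational system overwrites the old value...
cur.execute("UPDATE student_oltp SET major = 'Biology' WHERE student_id = 1")
take_snapshot("2005-09-01")

# ...so OLTP only knows the current value, while the warehouse keeps the history.
current = cur.execute(
    "SELECT major FROM student_oltp WHERE student_id = 1"
).fetchall()
history = cur.execute(
    "SELECT snapshot_date, major FROM student_snapshot "
    "WHERE student_id = 1 ORDER BY snapshot_date"
).fetchall()
print(current)
print(history)
```

The warehouse table is only ever appended to, which is what makes historical trend reporting possible.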


[Figure: Sample Warehouse Architecture. Labels on the diagram: Other Sources; Mainframe, MVS/ESA, Legacy System (DB2/IDMS); SQL/ODBC; NT Web Server (ASP, Cold Fusion), SQL/ODBC; UNIX Web Server (Java), SQL/JDBC; UNIX, SQL/”Native”; Data Warehouse.]

Some “BI” Buzzwords
• OLAP
• MOLAP
• ROLAP
• Metadata
• Replication
• Aggregation
• Star Schema
• Multi-dimensional
• Facts/Dimensions
• Bit-Mapped Indexing
• Drill-Down
• Transformation Tools (ETL)
• De-Normalized
• Snowflake Schema
• Operational Data Store (ODS)
• XML
• Data Mining
• Data Quality
• Business Intelligence
• Dashboards
• SQL
• Data Mart


What is a Data Mart?
A data mart is often a very focused slice of a larger data warehouse.

Data Warehouse vs. Data Mart

                       Data Warehouse                        Data Mart
Scope                  Enterprise                            Specific business process
Data Perspective       Historical data                       Current (some history)
                       Some summary                          Highly denormalized
                       Lightly denormalized
Data Subjects          20-30 tables (each subject area)      5-10 tables
                       Multiple subjects                     Single subject area
Ability to Integrate   Highly integrated                     Some/little integration
Time to Build          12-18 months                          2-8 months
Characteristics        Flexible, Strategic, Durable          Restrictive, Tactical, Focused


What is SQL?

SELECT *
FROM CONFERENCE_ATTENDEES
WHERE LAST_NAME = 'HALE'
  AND FIRST_NAME = 'LESLIE'

SQL: Structured Query Language (pronounced "sequel"). The lingua franca of data access in relational databases. It is used to build queries that run against data warehouses.
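To try the slide's query without a warehouse at hand, it can be run against an in-memory SQLite database from Python. This is a minimal sketch: the CONFERENCE_ATTENDEES table layout, the INSTITUTION column and the sample rows are placeholders invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table layout; the slide only names the table and two columns.
conn.execute(
    "CREATE TABLE CONFERENCE_ATTENDEES ("
    "LAST_NAME TEXT, FIRST_NAME TEXT, INSTITUTION TEXT)"
)
conn.executemany(
    "INSERT INTO CONFERENCE_ATTENDEES VALUES (?, ?, ?)",
    [
        ("HALE", "LESLIE", "PLACEHOLDER UNIVERSITY"),  # placeholder row
        ("DOE", "JANE", "PLACEHOLDER COLLEGE"),        # placeholder row
    ],
)

# Same query as on the slide, with the literals passed as bound parameters.
rows = conn.execute(
    "SELECT * FROM CONFERENCE_ATTENDEES "
    "WHERE LAST_NAME = ? AND FIRST_NAME = ?",
    ("HALE", "LESLIE"),
).fetchall()
print(rows)
```

Bound parameters (the `?` markers) are used instead of pasting values into the SQL text, which is the usual way to avoid quoting problems.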

Tools Are Doing the Dirty Work


End User Access Tools

-Gartner Group

-Keith Gile, Forrester

What is ETL?
• Tool or process used to move data from one system/DB to another system/DB
• Over 100 ETL tools on the market, about 10 serious contenders
• Prices range from free to $750K
• Better ones may be cost-prohibitive
• Database often has bulk load utilities
• Sometimes it's E.L.T. (load the data first after extract, then transform it with programs or stored procedures after the load)
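The difference between ETL (extract, transform, load) and E.L.T. can be sketched in a few lines of Python with the standard csv and sqlite3 modules. Everything here is assumed for illustration: the source data, the column names and the enrollment and staging tables are hypothetical, and a real ETL tool does far more (scheduling, logging, error handling).

```python
import csv
import io
import sqlite3

# Hypothetical extract from an operational system, with untidy name values.
SOURCE_CSV = "student_id,last_name,credit_hours\n1, smith ,12\n2,jones,9\n"

def extract(text):
    """Read the source rows as dictionaries of strings."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Clean names and convert numeric fields before loading."""
    return [
        (int(r["student_id"]), r["last_name"].strip().upper(), int(r["credit_hours"]))
        for r in rows
    ]

# ETL: transform in the program, then load the cleaned rows.
etl_conn = sqlite3.connect(":memory:")
etl_conn.execute(
    "CREATE TABLE enrollment (student_id INTEGER, last_name TEXT, credit_hours INTEGER)"
)
etl_conn.executemany(
    "INSERT INTO enrollment VALUES (?, ?, ?)", transform(extract(SOURCE_CSV))
)
etl_rows = etl_conn.execute("SELECT * FROM enrollment ORDER BY student_id").fetchall()

# ELT: load the raw strings into a staging table, then transform with SQL.
elt_conn = sqlite3.connect(":memory:")
elt_conn.execute(
    "CREATE TABLE staging (student_id TEXT, last_name TEXT, credit_hours TEXT)"
)
elt_conn.executemany(
    "INSERT INTO staging VALUES (?, ?, ?)",
    [(r["student_id"], r["last_name"], r["credit_hours"]) for r in extract(SOURCE_CSV)],
)
elt_conn.execute(
    "CREATE TABLE enrollment AS "
    "SELECT CAST(student_id AS INTEGER) AS student_id, "
    "UPPER(TRIM(last_name)) AS last_name, "
    "CAST(credit_hours AS INTEGER) AS credit_hours "
    "FROM staging"
)
elt_rows = elt_conn.execute("SELECT * FROM enrollment ORDER BY student_id").fetchall()

print(etl_rows)
print(etl_rows == elt_rows)
```

Both routes end with the same cleaned table. The choice is mostly about where the transformation work runs: in a separate program or tool, or inside the database itself.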


ETL Example

What is A Data Model?
• A graphical representation that identifies the information needs of the business. A data-driven, versus function (or process) based, view of an organization.

[Figure: data model diagram. Entities: STUDENT, CLASS, CLASS MEETING TIME, CAMPUS, COURSE, COLLEGE, COURSE CATALOG. Relationship labels: takes, is offered by, offers, is identified by.]


Warehouse Modeling Techniques?
#1 Dimensional Modeling (Star Join Schema)
#2 Tabular Modeling (E/R Denormalized)
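A star join schema puts a central fact table of measures in the middle, joined to surrounding dimension tables that describe each measure. The sketch below builds a tiny one in SQLite from Python. The table names, columns and rows are hypothetical and are not taken from ASU's model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables: descriptive attributes (hypothetical).
    CREATE TABLE course_dim (course_key INTEGER PRIMARY KEY, course_name TEXT, college TEXT);
    CREATE TABLE term_dim   (term_key INTEGER PRIMARY KEY, term_name TEXT);

    -- Fact table: one row per enrollment, keyed to the dimensions.
    CREATE TABLE enrollment_fact (
        course_key   INTEGER REFERENCES course_dim(course_key),
        term_key     INTEGER REFERENCES term_dim(term_key),
        credit_hours INTEGER
    );

    INSERT INTO course_dim VALUES (1, 'BIO 101', 'Liberal Arts'), (2, 'ACC 201', 'Business');
    INSERT INTO term_dim   VALUES (1, 'Fall 2004'), (2, 'Fall 2005');
    INSERT INTO enrollment_fact VALUES (1, 1, 4), (1, 2, 4), (2, 2, 3), (2, 2, 3);
    """
)

# A typical star join: total the fact measure, grouped by dimension attributes.
rows = conn.execute(
    """
    SELECT c.college, t.term_name, SUM(f.credit_hours)
    FROM enrollment_fact f
    JOIN course_dim c ON c.course_key = f.course_key
    JOIN term_dim   t ON t.term_key   = f.term_key
    GROUP BY c.college, t.term_name
    ORDER BY c.college, t.term_name
    """
).fetchall()
print(rows)
```

Every reporting question follows the same pattern: join the fact table to the dimensions of interest, then group and aggregate.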



What Makes A Good Data Model?

• Completeness
• Simplicity
• No redundancy (OLTP)
• Enforcement of Business Rules
• Data Reusability
• Stability and Flexibility
• Communication Effectiveness

Some Design Guidelines
• Add element of time to the tables
• Appropriately name tables, attributes, views
• Add derived fields when necessary
• Make sure data integrates
• Consider security and privacy in design
• Consider performance (indexes, etc.)
• Make sure data model can answer the critical business questions


Display Your Model Proudly...


“Mona Lisa” “Wall Ware” “American Gothic”

Demo Time

• Ad Hoc Quer(ies) using BI Tool
• Retention Application using Web


About ASU’s Data Warehouse
• 10 years in the making
• Major subject areas (Student, HR, Financial)
• Supports over 1500 users
• “Poor Man’s Repository” for definitions
• Source of data – multiple operational systems
• Mission Critical to University

ASU’s Warehouse Vital Statistics
Users: 1900+ logins
Volume: 50+ gigabytes
Approach: Enterprise
Database: Sybase Adaptive Server
Server: Sun/UNIX
Desktop: Brio (now Hyperion), MS-Access
Web: ASP, Java, Cold Fusion


ASU’s Warehouse Subject Areas
[Figure: subject-area chart dated March 1, 2002, grouped under the headings PRIMARY, SUPPORT and SPECIAL. Labels on the chart: CENSUS, FINANCIAL, STUDENT FEES, HUMAN_RESOURCE, STUDENT, RESEARCH, COURSE, FINANCIAL AID, TRAINING(*), DICTIONARY, LOOKUP, PERSON, DIRECTORY_SERVICES, USER_TABLE, SRCDARS, FACILITY, STUDENT_RETENTION, WAREHOUSE_ADMIN, PROXY DBs, PROPERTY, ETC.]



Lessons Learned
From the Home Office in Tempe, Arizona

#10 Have a Historical Data Plan
• Need Ability to Compare Data Over Time
• Decide how Far Back or how many Years of Data to Keep
• “Census” Snapshots are a Must
• Fiscal Year, Calendar Year, Semester or Term, Pay-Period Data
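One way to picture "census" snapshots: freeze the enrolled population once per term, never overwrite it, and compare terms afterwards. The Python/SQLite sketch below assumes a hypothetical census table, made-up term labels and made-up student IDs. It is not ASU's actual design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical census table: one row per student per frozen term.
conn.execute(
    "CREATE TABLE census (term TEXT, student_id INTEGER, PRIMARY KEY (term, student_id))"
)

def freeze_census(term, enrolled_ids):
    """Store the census for a term once; refuse to overwrite an existing snapshot."""
    exists = conn.execute(
        "SELECT 1 FROM census WHERE term = ? LIMIT 1", (term,)
    ).fetchone()
    if exists:
        raise ValueError(f"census for {term} is already frozen")
    conn.executemany(
        "INSERT INTO census VALUES (?, ?)", [(term, sid) for sid in enrolled_ids]
    )

freeze_census("2004 Fall", [1, 2, 3])
freeze_census("2005 Fall", [1, 3, 4, 5])

# Headcount per term, taken from the frozen snapshots.
headcount = dict(conn.execute("SELECT term, COUNT(*) FROM census GROUP BY term"))

# Students in the first census who are still present in the second.
retained = conn.execute(
    "SELECT COUNT(*) FROM census a "
    "JOIN census b ON a.student_id = b.student_id "
    "WHERE a.term = ? AND b.term = ?",
    ("2004 Fall", "2005 Fall"),
).fetchone()[0]

print(headcount)
print(retained)
```

Because each census is frozen, the same comparison gives the same answer next year, which is what longitudinal and retention studies depend on.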


#9 DQ Isn’t as Good as You Think It Is

It’s good, it’s bad, and it’s ugly!

It’s good, it’s still bad, and it’s still ugly!

#8 Costs Shifting to the Customer
• Faster PCs
• Printers
• Ethernet/Web Connection
• Middleware?
• Data Access Software
  – Client/Server Application or Plug-in


#7 User Involvement Critical

Factors Making it Tough
Strike 1: Finding users with free time
Strike 2: Different business users may have conflicting ideas of what they want
Strike 3: Users often don’t know what they really want

#6 Data Definitions Are Important


#5 Security & Privacy Still a Big Issue (Even if the data is read-only)
• Careful design of the Data Warehouse helps security
• Variety of ways to implement security
• All users must take responsibility for security/privacy (train them!!!)
• Security costs money to implement

#4 Web Solution is a Must
• Internet has become more reliable
• Offers Quick delivery of vital information
• Reduction of access and communication cost (IT overhead)
• Ability to reach an expanded audience
• No software, just a browser in many cases

“Because it requires minimal training and reduces IT overhead, the Web is becoming the de facto warehouse access platform.”
-Wayne Eckerson


Web Will Win Out

#3 Training Investment Pays Dividends
• Recognized, but Often not Funded
• Rely on “Data Trustee”/expert for support
• Standardize on One Tool
• Tool Training is easy, Data Training is Tough!

“Be suspicious of your results!”


#2 Need Support Structure in Place
• Need to move from pilot phase to production
• Treat system as “mission critical”
• ASU Solutions
  – “ware-q” e-mail
  – Warehouse User’s Group (WUG)
  – 1-800-what-now (just kidding!)

#1 Users Do Amazing Things...
• Persistence Studies
• Faculty Workload
• Web-erize Reports
• Create Pseudo-operational systems to fill data gaps
• Create Personal Letters (encouraging Students to register)
• Get Data for Legislative support
• Find tutors


The Data Warehouse...Helps us do our business better

Inconvenience Store Problem Solved

Here’s the Data


Another Quiz
1. Phoenix isn't a good place for selling golf clubs, despite # of golf courses.
2. Motorcycle owners (picture Hells Angels riders) usually rank within the highest income bracket.
3. Best Selling Women’s Shoe Size in 1986 and 2003?
4. What are Walmart’s Top 5 “Affinity” Sales with a Miller Lite 6 PK?

Shoes Size


Affinity Sales
• Mar’s M&Ms Peanuts
• Mar’s M&Ms Plain
• Beefy Cigars 2 PK
• Marlboro Regular PK
• Cert’s Wintergreen Roll Candy

Questions