betsy blythe and lore balkan virginia tech enterprise data in jail a problem with a solution
TRANSCRIPT
Betsy Blythe and Lore BalkanVirginia Tech
Enterprise Data in Jail
A Problem with a Solution
The Problem
0010111001
– Data structures difficult to understand and inefficient to access for analysis and reports
– Data values change so point-in-time data lost
– Growing backlog of report requests
• Data in “ERP Jail”
The Solution
• Initial charge – Build a data warehouse
• Initial vision – Create business view of administrative data for Virginia Tech
Transactional ERP System
The Solution
Data Warehouse
User
A Data Access Architecture
The Solution
Laying the Foundation
• Staffing
– DBA– Data Administrator– Data Warehouse Architects– Training Coordinators– Web Application Developers
• Other Resources
– Hardware– Software
Laying the Foundation
• Planning
– Surveyed other institutions
– Did site visits and interviews
– Established scope
– Identified first subject area
– Drafted project plan
– Delivered management briefings
Laying the Foundation
• Staff Education and Training
– Data Warehouse Institute
– Ralph Kimball
Getting Started
• Focused on Finance
• Delivered Finance Reports for ERP
• Learned Finance data
• Built relationships and trust
• Evolved a shared vision for warehouse
Building the Data Warehouse
• Strategy
– Build by subject area
– Develop iteratively
– Design for enterprise
Building the Data Warehouse
• Design
– Star Schema
– Time Dimension
– Transaction Detail
– Surrogate Keys
– Conformed Dimensions
– Slowly Changing Dimensions
FactsDimensions
Org Fiscal Qtr Account Amount Encumbered BalanceDept Y 1 Travel $1,000 $0 $60,000Dept Y 1 Supplies $500 $0 $59,500Dept Y 2 Software $10,000 $10,000 $49,500Dept Y 2 Phones $250 $0 $49,250Dept Z 1 Travel $8,000 $0 $100,000Dept Z 1 Supplies $100 $0 $99,900Dept Z 2 Software $80,000 $80,000 $19,900Dept Z 2 Phones $500 $0 $19,400
The Design: Multidimensional
The Design:Dimensions
The Design:Facts
The Design:Star Schema
Conformed Dimensions
PayrollFacts
FinanceFacts
TimeDimension
OrgDimension
EarningsType
Dimension
EmployeeDimension
AccountDimension
FundDimension
3 Techniques
• overwrite changed attribute
• add new dimension record
• use field for ‘old’ value
The Design
Managing Change … Slowly Changing Dimensions
The Design
• Standards
– Names meaningful andstandardized
– Indicators simplify queries
– Code descriptions storedwith codes
– Business descriptions available with data
The Design
• Special Features
– External data may be included
– Derivations and calculations included
– Summary and aggregations may be included
– History is built by design
• Project Agreement
– Signed “blueprint” for the data mart
– Explains sponsorship and roles
– Details data requirements
– Identifies development team
– Identifies pilot users
– Lists key tasks and dependencies
Building the Data Warehouse
Building the Data Warehouse
• Data Mart Development Team *
– 2 IWA developers
– Functional area technical expert
– Functional area business/data expert
– Functional area key user
* Meets Weekly– Meeting Minutes document the process
• Development Process
– Data model design (ERwin)
– Source-to-target mapping
– Business definitions
– ETL development / testing (DataStage)
Building the Data Warehouse
• Development Process
– Data verification
– Process control checks
– Pilot user training
Building the Data Warehouse
• Data Access Strategy
– Stewardship same as ERP
– ERP security definitions leveraged
– Warehouse security built as part of ETL
– Training precedes access
Building the Data Warehouse
The Result
ERPData
Other DataSources
DataWarehouse
ReadyFor
Access &Query
ExtractTransform
Load
RunsEveryNight
ProcessChecks
The Result41 ERP Tables 1 Warehouse Table
EMPLOYEE_STATUS_DIMENSION
Provost’s Request:
Report showing employee id, name, current hire date, gender, ethnicity, rank and tenure for all full-time minority faculty
The Result:Query Example
select spriden_id, concat(spriden_last_name,concat(', ',concat(spriden_first_name,concat(' ', spriden_mi)))), to_char(pebempl_current_hire_date,'DD-MON-YYYY'), decode(spbpers_sex,'M','Male','F','Female'), stvethn_desc, ptrrank_desc, ptrtenr_desc from spriden, spbpers, pebempl, stvethn, perrank a, ptrrank, perappt c, ptrtenrwhere pebempl_empl_status = 'A' and pebempl_ecls_code in ('2A','2B','2C','2F','2G','2H','2K','2L', '3A','3B','3C','3D','3H','3I','3J','3M') and pebempl_pidm = spbpers_pidm and (spbpers_sex = 'F' or spbpers_ethn_code != '1') and pebempl_pidm = spriden_pidm and spriden_change_ind is null and spbpers_ethn_code = stvethn_code and pebempl_pidm = a.perrank_pidm and a.perrank_action_date = (select MAX(perrank_action_date) from perrank b where b.perrank_pidm = a.perrank_pidm) and a.perrank_rank_code = ptrrank_code and pebempl_pidm = c.perappt_pidm and c.perappt_action_date = (select max(perappt_action_date) from perappt d where c.perappt_pidm = d.perappt_pidm) and perappt_tenure_code = ptrtenr_code
The ERP Query
select ssn_fin_num, current_full_name, salary_hire_date, gender_desc,
ethnicity_desc, rank_desc, tenure_descfrom employee
where current_record_ind = 'Y' and active_employee_ind = 'Y' and faculty_ind = 'Y' and full_time_ind = 'Y' and (gender_code = 'F' or ethnicity_code != '1')
The Warehouse Query
The Result
Human ResourceHuman Resource
General PersonGeneral Person
FinanceFinance
StudentStudent AlumniDevelopment
AlumniDevelopment
Futuredata mart
Futuredata mart
Finance– Operating Ledger– General Ledger– Foundation– Accts Receivable
Human Resource– Employee– Job– Job Funding– Position– Position Allocation– Payroll
Alumni– Alumni Giving
The Result
• Metadata System– Business definitions maintained by Data Experts
– Business definitions stored with the data
– Data models and business definitions on the Web
The Result
• The VT Data Warehouse Users
– 900 Finance
– 400 VT Foundation
– 67 HR
– 10 Alumni
* See fact sheet handout
The Result
• A Data Architecture
– Structured for query
– Access by any ODBC or Oracle client
– Designed to include history
– Focus on the user
– Provides a stable business view of the data
The Result
• Query and Reporting Tools
– Web Enabled / Client Server
– Metadata stored with the data
– Appropriate to the user skill set
– Appropriate to the user need
Lessons Learned
• Functional area sponsorship is critical
• Analysis paralysis can be a problem
• The devil is in the details
• Let the ERP settle first
• Data verification is time consuming
• Canned reports sell the warehouse
• 24/7 availability is expected
• Success breeds demand
Lessons Learned
• Don’t lose sight of the reasons for creatingthe data warehouse
– Empower users to become self-sufficient
– Prevent users from impacting the production system
– Reduce interrupt-driven informationrequests to IS
– Summarize data for trend analysisand data retention
• Hardware – SUN E4500 w/ 8 CPUs, 8 GB RAM, 480 GB Disk Space
• Software – Solaris 2.7 Oracle 8.1.7 Erwin – data models Ascential DataStage – ETL Brio – web reports, ad hoc query SQR – reporting Perl – metadata interface
Resources