Corporate Data Vault Data Warehousing Workshop Sept. 23 2015

Upload: lee-phillips

Post on 19-Jan-2016



Corporate Data Vault

Data Warehousing Workshop, Sept. 23 2015


Background to CDV Project

• Feb 2012 – Review of Corporate Data Model published

• Apr 2012 – Technical group set up

• Dec 2012 – Proposal for CDV sent to SMC


The Proposal

Option 1: File Store

• Data stored in the same format as lodged by the data custodian;

• Data retrieved only through the front-end application and copied to local work space.

Option 2: File Store with Direct Access

• Data stored in the same format as lodged by the data custodian;

• Data can be accessed directly by third party products (e.g. SAS).

Option 3: Database tables

• Data converted and stored in database tables with a similar structure to the source;

• Database tables can be accessed directly by third party products (e.g. SAS).

Option 4: Data warehouse

• Data converted and stored in standardised relational database tables;

• Database tables can be accessed directly by third party products (e.g. SAS).


Pros and Cons

Option 1: File Store

Pros:

• Simplest concept

• Lowest development effort

Cons:

• No direct access with 3rd party products

• Possible proliferation of copies of files in local work areas

• Long term usability of data more difficult to manage

Option 2: File Store Direct Access

Pros:

• Simple concept

• Provides direct access to data

Cons:

• Security more difficult to manage than for database options

• Long term usability of data more difficult to manage

Option 3: Database tables

Pros:

• Provides direct access to data

• Data stored in single platform

• Easier to manage long term usability issues

Cons:

• Data transformed from original format – transformed data may need validation

Option 4: Data warehouse

Pros:

• Provides direct access to data

• Standardized data in relational databases

• Enables easier linkages between data

• Opportunities to build other applications on the warehouse

Cons:

• Data transformed from original format – transformed data may need validation

• Difficult to design and build

• Business effort high as data standardization required


Project Stage 1: Two Prototypes

• In early 2013, the SMC requested that working prototypes of both Option 2 and Option 3 be developed

• Prototypes were designed, built & tested between June and Oct 2013

• A recommendation on the optimal solution was submitted to the SMC in Nov 2013.


Design, Build and Assessment: In Scope and Out of Scope

Focus of system development

• In scope: Produce a working system

• Out of scope: Final screen designs

Functions of the system

• In scope: (1) Lodging data & metadata (2) Storing data & metadata (3) Viewing of catalogue

• Out of scope: (1) Security (2) Reports

Testing of system

• In scope: Testing to focus primarily on the “happy path”. Only major bugs and issues to be addressed.

• Out of scope: Robust testing of the system

File Types

• In scope: SAS files only, as (1) High risk (2) Benefit of variable metadata available within the file (3) Structured nature provided suitable test for both prototypes

• Out of scope: All other file types


Issues with Database Prototype

Issue: Unable to distinguish between a date and a date/time variable in a SAS dataset

Impact on database prototype: The SAS dataset is rejected because the date/time column is created as a date, and a date/time variable cannot be loaded into a date column.

Issue: Maximum length of a character variable can be 16384

Impact on database prototype: Character variables longer than 16384 will be truncated.

Issue: Maximum number of columns currently allowed is 254

Impact on database prototype: The SAS dataset is rejected if the number of variables exceeds 254.

Issue: There are 995 different formats available in SAS

Impact on database prototype: Data integrity may be compromised, or the dataset may be rejected, if an unknown format is encountered. Each format would need to be coded for individually in the conversion program.
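The issues listed above lend themselves to a check that runs before any conversion is attempted. The following is a minimal Python sketch under stated assumptions, not the prototype's actual code: the `Variable` record, the two tiny format lists and all function names are hypothetical, and only the 254-column and 16384-character limits come from the slide. It relies on the fact that SAS stores both dates and date/times as plain numbers (days and seconds since 1 January 1960 respectively), so the attached format, where one exists, is the only hint as to which a variable holds.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Limits quoted on the slide for the database prototype.
MAX_COLUMNS = 254
MAX_CHAR_LENGTH = 16384

# SAS counts dates in days and date/times in seconds from this point.
SAS_EPOCH = datetime(1960, 1, 1)

# Hypothetical, deliberately tiny subsets of the SAS format catalogue.
DATE_FORMATS = {"DATE", "DDMMYY", "MMDDYY", "YYMMDD"}
DATETIME_FORMATS = {"DATETIME"}


@dataclass
class Variable:
    """Hypothetical description of one variable in a lodged dataset."""
    name: str
    sas_type: str          # "num" or "char"
    length: int
    sas_format: str = ""   # e.g. "DATE9."; empty when no format is attached


def format_family(sas_format: str) -> str:
    """Strip the width and decimals from a format: 'DATETIME20.' -> 'DATETIME'."""
    return sas_format.upper().rstrip(".0123456789")


def classify(var: Variable) -> str:
    """Guess a target column type, or 'unknown' if the format is not handled."""
    if var.sas_type == "char":
        return "char"
    family = format_family(var.sas_format)
    if family in DATETIME_FORMATS:
        return "datetime"
    if family in DATE_FORMATS:
        return "date"
    return "number" if family == "" else "unknown"


def preload_issues(variables: list[Variable]) -> list[str]:
    """List the problems that would arise when loading these variables."""
    issues = []
    if len(variables) > MAX_COLUMNS:
        issues.append(f"reject: {len(variables)} variables exceeds {MAX_COLUMNS}")
    for var in variables:
        if var.sas_type == "char" and var.length > MAX_CHAR_LENGTH:
            issues.append(f"truncate: {var.name} is longer than {MAX_CHAR_LENGTH}")
        if classify(var) == "unknown":
            issues.append(f"review: {var.name} has unhandled format {var.sas_format}")
    return issues


def sas_to_python(value: float, kind: str):
    """Interpret a stored SAS number as a date or a date/time."""
    if kind == "date":
        return (SAS_EPOCH + timedelta(days=value)).date()
    if kind == "datetime":
        return SAS_EPOCH + timedelta(seconds=value)
    return value
```

The same stored number means very different things depending on the type chosen: `sas_to_python(20354, "date")` is 23 September 2015, while `sas_to_python(20354, "datetime")` is 05:39:14 on 1 January 1960. A variable with no format attached, or with one outside the handled list, cannot be classified this way, which is consistent with the slide's point that every format would need its own handling.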


Project Stage 2: CDV v1 Build & Design

• The second stage of this project involved the further design, build and testing of the file store solution.

• It also included information sessions for users and the initial “Go Live” of the CDV.

• This second project ran from Jan 2014 until Dec 2014.


Project Stage 3: CDV v1 Implementation

• The third stage of this project has been ongoing since Jan 2015

• Roll-out of the system across the office

• Requirements gathering and specifications for CDV v2.


About the CDV

• Independent of production processes

• Data stored in the same format as lodged

• Access data through a third party product

• CDV v1 accepts SAS datasets only


Technical Specs

• Three tier application

• Client tier: Java

• Business Logic tier: WebLogic

• Data Tier: Sybase database.


Functionality

• Lodge Data and Metadata

• Browse/Search the Catalogue

• Reports

• Security


Lodge Data and Metadata: Step 1


Lodge Data and Metadata: Step 2


Variable Details Screen

Link Classification from CARS


Metadata Stored

File Level

• Survey Name

• Periodicity

• Time Period

• Version No.

• Linked Themes

• Micro/Macro Data

• Reference Documentation

• Description

• Reason for Version

• Date Lodged

• Lodged By

Variable Level

• Name

• Description

• Primary Key

• Unit Type

• Length

• Data Type

• Linked Classification Details
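The two levels of metadata listed above map naturally onto a pair of record types. The following Python sketch is illustrative only: the field names are taken from the slide, but every type, the nesting of variables inside a file record, and the `search_catalogue` helper are assumptions and do not describe the CDV's actual Sybase schema or its catalogue search.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class VariableMetadata:
    """Variable-level metadata; types are assumed, names follow the slide."""
    name: str
    description: str
    primary_key: bool
    unit_type: str
    length: int
    data_type: str
    linked_classification: Optional[str] = None  # e.g. a classification code


@dataclass
class FileMetadata:
    """File-level metadata for one lodged dataset."""
    survey_name: str
    periodicity: str          # e.g. "Annual"
    time_period: str          # e.g. "2014"
    version_no: int
    micro_or_macro: str       # "Micro" or "Macro"
    description: str
    reason_for_version: str
    date_lodged: date
    lodged_by: str
    linked_themes: list[str] = field(default_factory=list)
    reference_documentation: list[str] = field(default_factory=list)
    variables: list[VariableMetadata] = field(default_factory=list)


def search_catalogue(catalogue: list[FileMetadata], term: str) -> list[FileMetadata]:
    """Hypothetical search: match the term against survey name,
    file description and variable names, ignoring case."""
    needle = term.lower()
    return [
        entry for entry in catalogue
        if needle in entry.survey_name.lower()
        or needle in entry.description.lower()
        or any(needle in var.name.lower() for var in entry.variables)
    ]
```

Keeping the variable records inside the file record means a single lodgement carries everything needed to browse or search it, at the cost of repeating a variable's description in every version of a file that contains it.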


Lodgement Summary


Access To Data


The End

Any Questions?