All Powder Board and Ski

Oracle 9i Workbook
Chapter 8: Data Warehouses and Data Mining
Jerry Post
Copyright © 2003


Oracle Relational Approach

[Diagram labels: Relational Tables (Customer, Sale, SaleItem, Item); Star Design (a Fact Measure surrounded by four Dimensions); Meta-Data; Materialized Views (Sale + Customer)]


Desired Sales Cube Dimensions

Sales Dimensions

State (ship)
Month
Category
Style
SkillLevel
Size
Color
Manufacturer
BindingStyle
WeightMax?
ItemMaterial?
WaistWidth?


Early Data: Spreadsheets


External Tables: Attach to CSV

create or replace directory csv_dir as 'D:\students\BuildAllPowder\AllPowderSampleDataCSV';

create table OldSale_Ext
(
  SaleID INTEGER,
  SaleDate DATE,
  ShipState VARCHAR2(50),
  ShipZIP VARCHAR2(50),
  PaymentMethod VARCHAR2(50),
  SKU VARCHAR2(50),
  QuantitySold INTEGER,
  SalePrice NUMBER(10,2),
  ModelID VARCHAR2(250),
  ItemSize NUMBER,
  ManufacturerID INTEGER,
  Category VARCHAR2(50),
  Color VARCHAR2(50),
  ModelYear INTEGER,
  Graphics VARCHAR2(50),
  ItemMaterial VARCHAR2(50),
  ListPrice NUMBER(10,2),
  Style VARCHAR2(50),
  SkillLevel INTEGER,
  WeightMax NUMBER,
  WaistWidth NUMBER,
  BindingStyle VARCHAR2(50)
)

Continued on next slide

Warning: currency columns cannot have $ symbols or commas


External File Definition

organization external (
  type oracle_loader
  default directory csv_dir
  access parameters (
    records delimited by newline
    fields terminated by ',' optionally enclosed by '"' lrtrim
    missing field values are null
    (
      SaleID,
      SaleDate char date_format date mask "mm/dd/yyyy",
      ShipState, ShipZIP, PaymentMethod, SKU, QuantitySold, SalePrice,
      ModelID, ItemSize, ManufacturerID, Category, Color, ModelYear,
      Graphics, ItemMaterial, ListPrice, Style, SkillLevel, WeightMax,
      WaistWidth, BindingStyle
    )
  )
  location ('Lab 08-01 Early Sales.csv')
)
reject limit unlimited;
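The warning about currency columns can be handled before the CSV file is attached. The following is a minimal sketch in Python, assuming the spreadsheet export is available as CSV text and that the positions of the currency columns are known. The names `clean_currency` and `clean_csv` and the sample row are hypothetical illustrations, not part of the workbook.

```python
import csv
import io


def clean_currency(text):
    """Strip dollar signs and thousands separators from one cell."""
    return text.replace("$", "").replace(",", "").strip()


def clean_csv(source, currency_columns):
    """Return CSV text with the given column indexes cleaned.

    source is the raw CSV text; currency_columns is a list of
    zero-based column positions that hold currency values.
    """
    reader = csv.reader(io.StringIO(source))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        for i in currency_columns:
            if i < len(row):
                row[i] = clean_currency(row[i])
        writer.writerow(row)
    return out.getvalue()


# Hypothetical row: SaleID, SaleDate, ShipState, SalePrice
sample = '1,01/15/2003,CO,"$1,250.00"\n'
cleaned = clean_csv(sample, [3])
```

The sketch does not handle other spreadsheet currency formats, such as negative amounts shown in parentheses; those would need an extra rule.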


Create Customer and Employee

CustomerID and EmployeeID are missing from the old data.

Instead of relying on blank cell values, create a new customer called “Walk-in” and a new employee called “Employee”.

Write down the ID numbers generated for these anonymous entries.

If you use SQL, you can assign a value of zero to these entries.

INSERT INTO Customer (CustomerID, LastName)
VALUES (0, 'Walk-in');

INSERT INTO Employee (EmployeeID, LastName)
VALUES (0, 'Staff');


Extract Model Data

SELECT DISTINCT OldSale_ext.ModelID, OldSale_ext.ManufacturerID, OldSale_ext.Category,
  OldSale_ext.Color, OldSale_ext.ModelYear, OldSale_ext.Graphics, OldSale_ext.ItemMaterial,
  OldSale_ext.ListPrice, OldSale_ext.Style, OldSale_ext.SkillLevel, OldSale_ext.WeightMax,
  OldSale_ext.WaistWidth, OldSale_ext.BindingStyle
FROM OldSale_ext;


UNION Query for Models

SELECT DISTINCT ModelID, ManufacturerID, Category, …
FROM OldSale_ext
UNION
SELECT DISTINCT ModelID, ManufacturerID, Category, …
FROM OldRental_ext


Insert Model Data into ItemModel

INSERT INTO ItemModel (ModelID, ManufacturerID, Category, Color, ModelYear, Graphics,
  ItemMaterial, ListPrice, Style, SkillLevel, WeightMax, WaistWidth, BindingStyle)
SELECT DISTINCT qryOldModels.ModelID, qryOldModels.ManufacturerID, qryOldModels.Category,
  qryOldModels.Color, qryOldModels.ModelYear, qryOldModels.Graphics, qryOldModels.ItemMaterial,
  qryOldModels.ListPrice, qryOldModels.Style, qryOldModels.SkillLevel, qryOldModels.WeightMax,
  qryOldModels.WaistWidth, qryOldModels.BindingStyle
FROM qryOldModels;


Insert SKU Data into Inventory

INSERT INTO Inventory (ModelID, SKU, ItemSize, QuantityOnHand)
SELECT DISTINCT qryOldInventory.ModelID, qryOldInventory.SKU, qryOldInventory.ItemSize,
  0 As QuantityOnHand
FROM qryOldInventory;

Note the constant 0 with a column alias, which forces a zero value for QuantityOnHand in each row.

CREATE VIEW qryOldInventory AS
SELECT DISTINCT ModelID, SKU, ItemSize
FROM OldSale_ext
UNION
SELECT DISTINCT ModelID, SKU, ItemSize
FROM OldRental_ext;


Copy Sales Data

INSERT INTO Sale (SaleID, SaleDate, ShipState, ShipZIP, PaymentMethod)
SELECT DISTINCT OldSale_ext.SaleID, OldSale_ext.SaleDate, OldSale_ext.ShipState,
  OldSale_ext.ShipZIP, OldSale_ext.PaymentMethod
FROM OldSale_ext;

Note that if you have added data to your Sale table, your existing SaleID values might conflict with these.

You can solve the problem by adding a number to these values so they are all larger than your highest ID.

INSERT INTO Sale (SaleID, SaleDate, ShipState, ShipZIP, PaymentMethod)
SELECT DISTINCT OldSale_ext.SaleID+5000, OldSale_ext.SaleDate, OldSale_ext.ShipState,
  OldSale_ext.ShipZIP, OldSale_ext.PaymentMethod
FROM OldSale_ext;
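The choice of offset can be checked with a little arithmetic: the smallest imported ID plus the offset must exceed the largest ID already in the table. The following Python sketch illustrates that reasoning. The function name `safe_offset`, the rounding step, and the ID values are hypothetical, chosen only for illustration.

```python
def safe_offset(existing_ids, incoming_ids, round_to=1000):
    """Smallest multiple of round_to that moves every incoming ID
    above the highest existing ID."""
    if not existing_ids or not incoming_ids:
        return 0
    # min(incoming) + offset must be greater than max(existing)
    needed = max(existing_ids) - min(incoming_ids) + 1
    if needed <= 0:
        return 0
    # Round up to a convenient number
    return ((needed + round_to - 1) // round_to) * round_to


# Hypothetical IDs: the current table holds 1..4200, the old data 1..900
existing = range(1, 4201)
incoming = range(1, 901)
offset = safe_offset(existing, incoming)
shifted = [sale_id + offset for sale_id in incoming]
```

Whatever offset is chosen, the same value has to be applied to every table that references the shifted key.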


Copy SaleItem Rows

INSERT INTO SaleItem (SaleID, SKU, QuantitySold, SalePrice)
SELECT DISTINCT OldSale_ext.SaleID+5000, OldSale_ext.SKU, OldSale_ext.QuantitySold,
  OldSale_ext.SalePrice
FROM OldSale_ext;

If you transformed the SaleID in the prior step for the Sale data, you must do the exact same calculation for SaleID in the SaleItem table.


Copy Rental Data

INSERT INTO Rental (RentID, RentDate, ExpectedReturn, PaymentMethod)
SELECT DISTINCT OldRental_ext.RentID+5000, OldRental_ext.RentDate,
  OldRental_ext.ExpectedReturn, OldRental_ext.PaymentMethod
FROM OldRental_ext;

INSERT INTO RentItem (RentID, SKU, RentFee, ReturnDate)
SELECT DISTINCT OldRental_ext.RentID+5000, OldRental_ext.SKU, OldRental_ext.RentFee,
  OldRental_ext.ReturnDate
FROM OldRental_ext;


Discoverer Administrator: Load Business Area

Schema

Select tables

Tables and views


Load Wizard Options: LOV

Most options are selected by default

Select the LOV option to have Discoverer build lookup lists


Discoverer: Business Area

Tables shown as folders and named so managers understand them

Columns shown as items

Add a calculated item


Create a Data Hierarchy

Select Category and Style from the SkiBoardStyle lookup table


Discoverer Desktop: New Workbook

Select the dimensions and the fact item


Initial Crosstab Layout

Row area

Column area

Page area


Discoverer Crosstab Browser

Select all items

Format options

Totals


Time Series Analysis: Moving Average
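As a rough illustration of the technique named on this slide, a trailing moving average replaces each point with the mean of the most recent values. The Python sketch below is an assumption about the general method, not a reproduction of the slide's spreadsheet. The function name and the sales figures are hypothetical.

```python
def moving_average(values, window):
    """Trailing moving average: one output per full window."""
    if window < 1:
        raise ValueError("window must be at least 1")
    result = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            # Drop the value that just left the window
            running -= values[i - window]
        if i >= window - 1:
            result.append(running / window)
    return result


# Hypothetical monthly sales totals
monthly_sales = [10, 20, 30, 40, 50]
smoothed = moving_average(monthly_sales, 3)
```

A longer window smooths more but reacts more slowly to real changes in the trend.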


Time Series Analysis: Discoverer


Sales by State for Regression

Note that some states are missing from the list.


Regression Data Query

CREATE VIEW StateSales2004 AS
SELECT StateName, Income2001, Pop2002,
  Sum(SalePrice*QuantitySold) AS Sales2004
FROM Sale
INNER JOIN StateDemographics
  ON Sale.ShipState = StateDemographics.StateCode
INNER JOIN SaleItem
  ON Sale.SaleID = SaleItem.SaleID
WHERE ShipState IS NOT NULL
  AND SaleDate Between '01-Jan-2004' And '31-Dec-2004'
GROUP BY StateName, Income2001, Pop2002
ORDER BY StateName;


Regression Setup

Include the label row in the selected data, but be sure to check the box that indicates labels are included.


Regression Results

Relatively high R-square

Population is a significant predictor, Income is not


Association Rules/Market Basket

Locate folders

Item to find          Possible location
Data mining samples   D:\Oracle\ora92\dm\demo\sample
ORACLE_HOME           D:\Oracle\ora92
JAVA_HOME             C:\OracleData\Ora92DS\jdk


Copy Files to Protect Original

compileSampleCode.bat

executeSampleCode.bat

Sample_AssociationRules.java

Sample_AssociationRules_Transactional.property

Sample_Global.property


Edit Sample_Global.property File

miningServer.url=jdbc:oracle:thin:@YourServerName:1521:DBName

miningServer.userName=odm

miningServer.password=password

inputDataSchemaName=powder

outputSchemaName=powder

timeout=120

If necessary, use Enterprise Manager to unlock and assign new passwords to the accounts odm and odm_mtr.


Create New Table To Hold Transaction Basket Data

CREATE TABLE MARKET_BASKET_TX_BINNED

( SEQUENCE_ID INTEGER,

ATTRIBUTE_NAME VARCHAR2(35),

VALUE NUMBER

);

GRANT SELECT ON MARKET_BASKET_TX_BINNED TO odm;

commit;

If you use these names, you do not have to edit the Transactional.property file


Copy SaleItem Data

INSERT INTO MARKET_BASKET_TX_BINNED (SEQUENCE_ID, ATTRIBUTE_NAME, VALUE)
SELECT SaleID,
  ItemModel.Category || '_' || ItemModel.Style AS AName,
  1 As Value
FROM SaleItem
Inner Join Inventory
  ON SaleItem.SKU = Inventory.SKU
Inner Join ItemModel
  ON Inventory.ModelID = ItemModel.ModelID
GROUP BY SaleID, ItemModel.Category || '_' || ItemModel.Style;
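The effect of this query can be sketched in Python: each sale contributes one row per distinct Category_Style combination, with a value of 1. The function name and input rows below are hypothetical. The sketch also assumes a missing Style behaves like an empty string, as Oracle's || operator treats NULL, which would presumably explain attribute names that end in an underscore.

```python
def to_basket_rows(sale_items):
    """Build (SEQUENCE_ID, ATTRIBUTE_NAME, VALUE) rows.

    sale_items is an iterable of (sale_id, category, style) tuples,
    already joined from SaleItem, Inventory, and ItemModel.
    """
    rows = set()  # a set removes duplicates, like the GROUP BY
    for sale_id, category, style in sale_items:
        # Assumption: a missing style concatenates as an empty string
        name = f"{category}_{style or ''}"
        rows.add((sale_id, name, 1))
    return sorted(rows)


# Hypothetical joined rows; sale 1 holds two items of the same kind
items = [
    (1, "Ski", "Freestyle"),
    (1, "Ski", "Freestyle"),
    (1, "Boots", None),
    (2, "Clothes", None),
]
basket_rows = to_basket_rows(items)
```

Buying two items of the same kind in one sale still yields a single row, because the mining step only needs to know whether a kind of item was present in the basket.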


Copy Sale Data

INSERT INTO MARKET_BASKET_TX_BINNED

(SEQUENCE_ID, ATTRIBUTE_NAME, VALUE)

SELECT SaleID, 'ID', SaleID

FROM Sale;

commit;


Remove Dashes from Attribute

UPDATE MARKET_BASKET_TX_BINNED

SET ATTRIBUTE_NAME = substr(ATTRIBUTE_NAME,1,instr(ATTRIBUTE_NAME,'-')-1)

|| '_' || substr(ATTRIBUTE_NAME,instr(ATTRIBUTE_NAME,'-')+1)

WHERE instr(ATTRIBUTE_NAME,'-') > 0;

commit;

Run at least twice, until you get zero changes, because a row might have more than one dash.
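The need for repeated runs follows from the fact that each pass replaces only the first dash in a value. The Python sketch below mimics that behavior with the same substring logic. The function names and sample attribute values are hypothetical.

```python
def replace_first_dash(name):
    """Mimic one UPDATE pass: only the first dash is replaced."""
    pos = name.find("-")
    if pos < 0:
        return name  # the WHERE clause would skip this row
    return name[:pos] + "_" + name[pos + 1:]


def run_until_stable(names):
    """Repeat the pass until no value contains a dash."""
    passes = 0
    while any("-" in n for n in names):
        names = [replace_first_dash(n) for n in names]
        passes += 1
    return names, passes


# Hypothetical attribute names; the second one has two dashes
cleaned, passes = run_until_stable(
    ["Ski_All-Mountain", "Board_Half-Pipe-Pro", "Boots_"]
)
```

Here two passes make changes, so a third run of the UPDATE would report zero rows and confirm that the data is clean.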


Limit Size of Attribute_Name

UPDATE MARKET_BASKET_TX_BINNED

SET ATTRIBUTE_NAME = substr(ATTRIBUTE_NAME,1,20);

commit;

This step is critical, although the need for it is probably due to a bug in Oracle’s code. There is a slight chance it arises because of the 30-character name limitation in Oracle.


Compile and Run the Code

SET ORACLE_HOME=D:\Oracle\ora92

SET JAVA_HOME=C:\OracleData\ora92DS\jdk

compileSampleCode.bat Sample_AssociationRules.java

executeSampleCode.bat Sample_AssociationRules Sample_AssociationRules_Transactional.property

Type each command all on one line; do not hit <Enter> until the end.

To redirect the output to a file, add at the end: >myfile.txt


Sample Results

Getting top 5 rules for model: Sample_AR_Model_tx sorted by support.

Rule 124: If Boots_=1 then Clothes_=1 [support: 0.17285714, confidence: 0.44814816]

Rule 38: If Clothes_=1 then Boots_=1 [support: 0.17285714, confidence: 0.35276967]

Rule 101: If Board_Half_Pipe=1 then Clothes_=1 [support: 0.11357143, confidence: 0.4622093]

Rule 9: If Clothes_=1 then Board_Half_Pipe=1 [support: 0.11357143, confidence: 0.23177843]

Rule 100: If Ski_Freestyle=1 then Clothes_=1 [support: 0.09785714, confidence: 0.48070174]

Get rules by support: Sample_AR_Model_tx, with minimum support of 0.16.

Rule 124: If Boots_=1 then Clothes_=1 [support: 0.17285714, confidence: 0.44814816]

Rule 38: If Clothes_=1 then Boots_=1 [support: 0.17285714, confidence: 0.35276967]

Get rules by confidence: Sample_AR_Model_tx, with confidence of 0.56 or more.

Investigate and think about the results.

Do you have too many clothes targeted to half-pipe boards and freestyle skiers, or not enough?
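When reading these results, it helps to recall the usual definitions: support is the share of all baskets that contain both sides of a rule, and confidence is that share divided by the share of baskets containing the left side. That is why the two Boots and Clothes rules have the same support but different confidence. The Python sketch below assumes these standard definitions, which may differ in detail from Oracle's implementation. The baskets are hypothetical toy data.

```python
def support(baskets, items):
    """Fraction of baskets that contain every item in items."""
    hits = sum(1 for basket in baskets if items <= basket)
    return hits / len(baskets)


def confidence(baskets, antecedent, consequent):
    """Support of the whole rule divided by support of the 'if' side."""
    both = support(baskets, antecedent | consequent)
    return both / support(baskets, antecedent)


# Hypothetical toy baskets, one set of attribute names per sale
baskets = [
    {"Boots_", "Clothes_"},
    {"Boots_"},
    {"Clothes_"},
    {"Boots_", "Clothes_", "Ski_Freestyle"},
    {"Clothes_", "Board_Half_Pipe"},
]

rule_support = support(baskets, {"Boots_", "Clothes_"})
boots_to_clothes = confidence(baskets, {"Boots_"}, {"Clothes_"})
clothes_to_boots = confidence(baskets, {"Clothes_"}, {"Boots_"})
```

In the toy data, clothes appear in more baskets than boots, so the rule starting from clothes has the lower confidence, which is the same pattern as Rule 124 and Rule 38 above.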


GIS: Microsoft MapPoint

The Discoverer worksheet places the data into rows and columns

A dynamic copy of this sheet is used to remove the top rows


MapPoint Data Wizard


GIS Analysis of Sales