1 all powder board and ski oracle 9i workbook chapter 8: data warehouses and data mining jerry post...
TRANSCRIPT
1
All Powder Board and Ski
Oracle 9i WorkbookChapter 8: Data Warehouses and Data MiningJerry PostCopyright © 2003
2
Oracle Relational Approach
Customer
Sale
SaleItem
Item
Relational Tables
Fact Measure
DimensionDimension
Dimension Dimension
Star Design
Meta-Data
Sale +
Customer
Materialized Views
3
Desired Sales Cube Dimensions
Sales Dimensions
State (ship)MonthCategoryStyleSkillLevelSizeColorManufacturerBindingStyleWeightMax?ItemMaterial?WaistWidth?
4
Early Data: Spreadsheets
5
External Tables: Attach to CSV
create table OldSale_Ext( SaleID INTEGER, SaleDate DATE, ShipState VARCHAR2(50), ShipZIP VARCHAR2(50), PaymentMethod VARCHAR2(50), SKU VARCHAR2(50), QuantitySold INTEGER, SalePrice NUMBER(10,2) ModelID VARCHAR2(250), ItemSize NUMBER, ManufacturerID INTEGER,
create or replace directory csv_dir as ‘D:\students\BuildAllPowder\AllPowderSampleDataCSV';
Category VARCHAR2(50), Color VARCHAR2(50), ModelYear INTEGER, Graphics VARCHAR2(50), ItemMaterial VARCHAR2(50), ListPrice NUMBER(10,2), Style VARCHAR2(50), SkillLevel INTEGER, WeightMax NUMBER, WaistWidth NUMBER, BindingStyle VARCHAR2(50) )
Continued on next slide
Warning: currency columns cannot have $ symbols or commas
6
External File Definitionorganization external (
type oracle_loaderdefault directory csv_dir
access parameters ( records delimited by newline fields terminated by ',' optionally enclosed by '"' lrtrim missing field values are null
(SaleID, SaleDate char date_format date mask "mm/dd/yyyy",ShipState, ShipZIP, PaymentMethod, SKU, QuantitySold, SalePrice,odelID, ItemSize, ManufacturerID, Category, Color, ModelYear,Graphics, ItemMaterial, ListPrice, Style, SkillLevel, WeightMax,WaistWidth, BindingStyle
) ) location ('Lab 08-01 Early Sales.csv') )
reject limit unlimited;
7
Create Customer and Employee
CustomerID and EmployeeID are missing from the old data.Instead of relying on blank cell values, create a new customer called “Walk-in” and a new employee called “Employee”Write down the ID numbers generated for these anonymous entries.If you use SQL, you can assign a value of zero to these entries.
INSERT INTO Customer (CustomerID, LastName)Values (0,'Walk-in')
INSERT INTO Employee (EmployeeID, LastName)Values (0,'Staff')
8
Extract Model Data
SELECT DISTINCT OldSale_ext.ModelID, OldSale_ext.ManufacturerID, OldSale_ext.Category, OldSale_ext.Color, OldSale_ext.ModelYear, OldSale_ext.Graphics, OldSale_ext.ItemMaterial, OldSale_ext.ListPrice, OldSale_ext.Style, OldSale_ext.SkillLevel, OldSale_ext.WeightMax, OldSale_ext.WaistWidth, OldSale_ext.BindingStyleFROM OldSale_ext;
9
UNION Query for Models
SELECT DISTINCT ModelID, ManufacturerID, Category, …
FROM OldSales_ext
UNION
SELECT DISTINCT ModelID, ManufacturerID, Category, …
FROM OldRentals_ext
10
Insert Model Data into ItemModel
INSERT INTO ItemModel (ModelID, ManufacturerID, Category, Color, ModelYear, Graphics, ItemMaterial, ListPrice, Style, SkillLevel, WeightMax, WaistWidth, BindingStyle)SELECT DISTINCT qryOldModels.ModelID, qryOldModels.ManufacturerID, qryOldModels.Category, qryOldModels.Color, qryOldModels.ModelYear, qryOldModels.Graphics, qryOldModels.ItemMaterial, qryOldModels.ListPrice, qryOldModels.Style, qryOldModels.SkillLevel, qryOldModels.WeightMax, qryOldModels.WaistWidth, qryOldModels.BindingStyle FROM qryOldModels;
11
Insert SKU Data into Inventory
INSERT INTO Inventory (ModelID, SKU, ItemSize, QuantityOnHand)SELECT DISTINCT qryOldInventory.ModelID, qryOldInventory.SKU, qryOldInventory.ItemSize, 0 As QuantityOnHand FROM qryOldInventory;
Note the use of the column alias to force a zero value for QuantityOnHand for each row
CREATE VIEW qryOldInventory ASSELECT DISTINCT ModelID, SKU, ItemSizeFROM OldSale_extUNIONSELECT DISTINCT ModelID, SKU, ItemSizeFROM OldRental_ext;
12
Copy Sales Data
INSERT INTO Sale (SaleID, SaleDate, ShipState, ShipZIP, PaymentMethod)SELECT DISTINCT OldSales_ext.SaleID, OldSales_ext.SaleDate, OldSales_ext.ShipState, OldSales_ext.ShipZIP, OldSales_ext.PaymentMethodFROM OldSales_ext;
Note that if you have added data to your Sales table, your existing SaleID values might conflict with these
You can solve the problem by adding a number to these values so they are all larger than your highest ID
INSERT INTO Sale (SaleID, SaleDate, ShipState, ShipZIP, PaymentMethod)SELECT DISTINCT OldSales_ext.SaleID+5000, OldSales_ext.SaleDate, OldSales_ext.ShipState, OldSales_ext.ShipZIP, OldSales_ext.PaymentMethodFROM OldSales_ext;
13
Copy SaleItem Rows
INSERT INTO SaleItem (SaleID, SKU, QuantitySold, SalePrice)SELECT DISTINCT OldSale_ext.SaleID+5000, OldSale_ext.SKU, OldSale_ext.QuantitySold, OldSale_ext.SalePrice FROM OldSale_ext;
If you transformed the SaleID in the prior step for the Sale data, you must do the exact same calculation for SaleID in the SaleItem table
14
Copy Rental Data
INSERT INTO Rental (RentID, RentDate, ExpectedReturn, PaymentMethod)SELECT DISTINCT OldRental_ext.RentID+5000, OldRental_ext.RentDate, OldRental_ext.ExpectedReturn, OldRental_ext.PaymentMethod FROM OldRental_ext;
INSERT INTO RentItem (RentID, SKU, RentFee, ReturnDate)SELECT DISTINCT OldRental_ext.RentID+5000, OldRental_ext.SKU, OldRental_ext.RentFee, OldRental_ext.ReturnDate FROM OldRental_ext;
15
Discoverer Administrator: Load Business Area
Schema
Select tables
Tables and views
16
Load Wizard Options: LOV
Most options are selected by default
Select the LOV option to have Discoverer build lookup lists
17
Discoverer: Business Area
Tables shown as folders and named so managers understand them
Columns shown as items
Add a calculated item
18
Create a Data Hierarchy
Select Category and Style from the SkiBoardStyle lookup table
19
Discoverer Desktop: New Workbook
Select the dimensions and the fact item
20
Initial Crosstab Layout
Row area
Column areaPage area
21
Discoverer Crosstab Browser
Select all items
Format options
Totals
22
Time Series Analysis: Moving Average
23
Time Series Analysis: Discoverer
24
Sales by State for Regression
Note that some states are missing from the list.
25
Regression Data Query
CREATE VIEW StateSales2004 ASSELECT StateName, Income2001, Pop2002, Sum(SalePrice*QuantitySold) AS Sales2004FROM Sale INNER JOIN StateDemographicsON Sale.ShipState = StateDemographics.StateCodeINNER JOIN SaleItem ON Sale.SaleID = SaleItem.SaleIDWHERE ShipState IS NOT NULL AND SaleDate Between '01-Jan-2004' And '31-Dec-2004'GROUP BY StateName, Income2001, Pop2002ORDER BY StateName;
26
Regression Setup
You should include the label row but be sure to check the box to show you included it
27
Regression Results
Relatively high R-square
Population is a significant predictor, Income is not
28
Association Rules/Market Basket
Item to find Possible location
Data mining samples D:\Oracle\ora92\dm\demo\sample
ORACLE_HOME D:\Oracle\ora92
JAVA_HOME C:\OracleData\Ora92DS\jdk
Locate folders
29
Copy Files to Protect Original
compileSampleCode.bat
executeSampleCode.bat
Sample_AssociationRules.java
Sample_AssociationRules_Transactional.property
Sample_Global.property
30
Edit Sample_Global.property File
miningServer.url=jdbc:oracle:thin:@YourServerName:1521:DBName
miningServer.userName=odm
miningServer.password=password
inputDataSchemaName=powder
outputSchemaName=powder
timeout=120
If necessary, use enterprise manager to unlock and assign new passwords to accounts: odm and odm_mtr
31
Create New Table To Hold Transaction Basket Data
CREATE TABLE MARKET_BASKET_TX_BINNED
( SEQUENCE_ID INTEGER,
ATTRIBUTE_NAME VARCHAR2(35),
VALUE NUMBER
);
GRANT SELECT ON MARKET_BASKET_TX_BINNED TO odm;
commit;
If you use these names, you do not have to edit the Transactional.property file
32
Copy SaleItem Data
INSERT INTO MARKET_BASKET_TX_BINNED (SEQUENCE_ID, ATTRIBUTE_NAME, VALUE)SELECT SaleID,
ItemModel.Category || '_' || ItemModel.Style AS AName, 1 As Value
FROM SaleItem Inner Join InventoryON SaleItem.SKU = Inventory.SKUInner Join ItemModelON Inventory.ModelID = ItemModel.ModelIDGROUP BY SaleID, ItemModel.Category || '_' || ItemModel.Style;
33
Copy Sale Data
INSERT INTO MARKET_BASKET_TX_BINNED
(SEQUENCE_ID, ATTRIBUTE_NAME, VALUE)
SELECT SaleID, 'ID', SaleID
FROM Sale;
commit;
34
Remove Dashes from Attribute
UPDATE MARKET_BASKET_TX_BINNED
SET ATTRIBUTE_NAME = substr(ATTRIBUTE_NAME,1,instr(ATTRIBUTE_NAME,'-')-1)
|| '_' || substr(ATTRIBUTE_NAME,instr(ATTRIBUTE_NAME,'-')+1)
WHERE instr(ATTRIBUTE_NAME,'-') > 0;
commit;
Run at least twice—until you get zero changes.Because a row might have more than one dash.
35
Limit Size of Attribute_Name
UPDATE MARKET_BASKET_TX_BINNED
SET ATTRIBUTE_NAME = substr(ATTRIBUTE_NAME,1,20);
commit;
This is critical—but is probably due to a bug in Oracle’s code. There is a slight chance it arises because of the 30 character name limitation in Oracle.
36
Compile and Run the Code
SET ORACLE_HOME = D:\Oracle\ora92
SET JAVA_HOME = C:\OracleData\ora92DS\jdk
compileSampleCode.bat Sample_AssociationRules.java
executeSampleCode.bat Sample_AssociationRules Sample_AssociationRules_Transactional.property
Type as all one line—do not hit <Enter> until the end
To redirect the output to a file, at the end, add:>myfile.txt
37
Sample Results
Getting top 5 rules for model: Sample_AR_Model_tx sorted by support.
Rule 124: If Boots_=1 then Clothes_=1 [support: 0.17285714, confidence: 0.44814816]
Rule 38: If Clothes_=1 then Boots_=1 [support: 0.17285714, confidence: 0.35276967]
Rule 101: If Board_Half_Pipe=1 then Clothes_=1 [support: 0.11357143, confidence: 0.4622093]
Rule 9: If Clothes_=1 then Board_Half_Pipe=1 [support: 0.11357143, confidence: 0.23177843]
Rule 100: If Ski_Freestyle=1 then Clothes_=1 [support: 0.09785714, confidence: 0.48070174]
Get rules by support: Sample_AR_Model_tx, with minimum support of 0.16.
Rule 124: If Boots_=1 then Clothes_=1 [support: 0.17285714, confidence: 0.44814816]
Rule 38: If Clothes_=1 then Boots_=1 [support: 0.17285714, confidence: 0.35276967]
Get rules by confidence: Sample_AR_Model_tx, with confidence of 0.56 or more.
Investigate and think about the results.Do you have too many clothes targeted to half-pipe boards and freestyle skiers, or not enough?
38
GIS: Microsoft MapPoint
The Discoverer worksheet places the data into rows and columns
A dynamic copy of this sheet is used to remove the top rows
39
MapPoint Data Wizard
40
GIS Analysis of Sales