micro oledb

22
June 10, 2022 Data Mining: Concepts and Tec hniques 1 Data Mining: Concepts and Techniques

Upload: yogesh-bansal

Post on 27-Sep-2015

261 views

Category:

Documents


2 download

TRANSCRIPT

  • Data Mining:
    Concepts and Techniques

    Data Mining: Concepts and Techniques

  • Appendix A: An Introduction to Microsofts OLE OLDB for Data Mining

    IntroductionOverview and design philosophyBasic componentsData set componentsData mining modelsOperations on data modelConcluding remarks

    Data Mining: Concepts and Techniques

  • Why OLE DB for Data Mining?

    Industry standard is critical for data mining development, usage, interoperability, and exchangeOLEDB for DM is a natural evolution from OLEDB and OLDB for OLAPBuilding mining applications over relational databases is nontrivialNeed different customized data mining algorithms and methodsSignificant work on the part of application buildersGoal: ease the burden of developing mining applications in large relational databases

    Data Mining: Concepts and Techniques

  • Motivation of OLE DB for DM

    Facilitate deployment of data mining modelsGenerating data mining modelsStore, maintain and refresh models as data is updatedProgrammatically use the model on other data setBrowse modelsEnable enterprise application developers to participate in building data mining solutions

    Data Mining: Concepts and Techniques

  • Features of OLE DB for DM

    Independent of provider or softwareNot specialized to any specific mining modelStructured to cater to all well-known mining modelsPart of upcoming release of Microsoft SQL Server 2000

    Data Mining: Concepts and Techniques

  • Overview

    Core relational engine exposes OLE DB in a language-based APIAnalysis server exposes OLE DB OLAP and OLE DB DMMaintain SQL metaphorReuse existing notions

    RDB engine

    OLE DB

    Analysis Server

    OLE DB OLAP/DM

    Data mining

    applications

    Data Mining: Concepts and Techniques

  • Key Operations to Support Data Mining Models

    Define a mining modelAttributes to be predictedAttributes to be used for predictionAlgorithm used to build the modelPopulate a mining model from training dataPredict attributes for new dataBrowse a mining model fro reporting and visualization

    Data Mining: Concepts and Techniques

  • DMM As Analogous to A Table in SQL

    Create a data mining module objectCREATE MINING MODEL [model_name] Insert training data into the model and train itINSERT INTO [model_name]Use the data mining modelSELECT relation_name.[id], [model_name].[predict_attr]consult DMM content in order to make predictions and browse statistics obtained by the model Using DELETE to empty/resetPredictions on datasets: prediction join between a model and a data set (tables)Deploy DMM by just writing SQL queries!

    Data Mining: Concepts and Techniques

  • Two Basic Components

    Cases/caseset: input dataA table or nested tables (for hierarchical data)Data mining model (DMM): a special type of tableA caseset is associated with a DMM and meta-info while creating a DMMSave mining algorithm and resulting abstraction instead of data itselfFundamental operations: CREATE, INSERT INTO, PREDICTION JOIN, SELECT, DELETE FROM, and DROP

    Data Mining: Concepts and Techniques

  • Flatterned Representation of Caseset

    Problem: Lots of replication!

    CustomersCustomer IDGenderHair ColorAgeAge ProbCar OwernershipCustomer IDCarCar ProbProduct PurchasesCustomer IDProduct NameQuantityProduct TypeCIDGendHairAgeAge probProdQuanTypeCarCar prob1MaleBlack35100%TV1ElecCar100%1MaleBlack35100%VCR1ElecCar100%1MaleBlack35100%Ham6FoodCar100%1MaleBlack35100%TV1ElecVan50%1MaleBlack35100%VCR1ElecVan50%1MaleBlack35100%Ham6FoodVan50%

    Data Mining: Concepts and Techniques

  • Logical Nested Table Representation of Caseset

    Use Data Shaping Service to generate a hierarchical rowsetPart of Microsoft Data Access Components (MDAC) productsCIDGendHairAgeAge probProduct PurchasesCar OwnershipProdQuanTypeCarCar prob1MaleBlack35100%TV1ElecCar100%VCR1ElecVan50%Ham6Food

    Data Mining: Concepts and Techniques

  • More About Nested Table

    Not necessary for the storage subsystem to support nested recordsCases are only instantiated as nested rowsets prior to training/predicting data mining modelsSame physical data may be used to generate different casesets

    Data Mining: Concepts and Techniques

  • Defining A Data Mining Model

    The name of the modelThe algorithm and parametersThe columns of caseset and the relationships among columnsSource columns and prediction columns

    Data Mining: Concepts and Techniques

  • Example

    CREATE MINING MODEL [Age Prediction]%Name of Model

    (

    [Customer ID]LONGKEY,%source column

    [Gender]TEXTDISCRETE,%source column

    [Age]DoubleDISCRETIZED() PREDICT,%prediction column

    [Product Purchases]TABLE%source column

    (

    [Product Name]TEXTKEY,%source column

    [Quantity]DOUBLENORMAL CONTINUOUS,%source column

    [Product Type]TEXTDISCRETE RELATED TO [Product Name]

    %source column

    ))

    USING [Decision_Trees_101]%Mining algorithm used

    Data Mining: Concepts and Techniques

  • Column Specifiers

    KEYATTRIBUTERELATION (RELATED TO clause)QUALIFIER (OF clause)PROBABILITY: [0, 1]VARIANCESUPPORTPROBABILITY-VARIANCEORDERTABLE

    Data Mining: Concepts and Techniques

  • Attribute Types

    DISCRETEORDEREDCYCLICALCONTINOUSDISCRETIZEDSEQUENCE_TIME

    Data Mining: Concepts and Techniques

  • Populating A DMM

    Use INSERT INTO statementConsuming a case using the data mining modelUse SHAPE statement to create the nested table from the input data

    Data Mining: Concepts and Techniques

  • Example: Populating a DMM

    INSERT INTO [Age Prediction]

    (

    [Customer ID], [Gender], [Age],

    [Product Purchases](SKIP, [Product Name], [Quantity], [Product Type])

    )

    SHAPE

    {SELECT [Customer ID], [Gender], [Age] FROM Customers ORDER BY [Customer ID]}

    APPEND

    {SELECT [CustID], {product Name], [Quantity], [Product Type] FROM Sales

    ORDER BY [CustID]}

    RELATE [Customer ID] TO [CustID]

    )

    AS [Product Purchases]

    Data Mining: Concepts and Techniques

  • Using Data Model to Predict

    Prediction joinPrediction on dataset D using DMM MDifferent to equi-joinDMM: a truth tableSELECT statement associated with PREDICTION JOIN specifies values extracted from DMM

    Data Mining: Concepts and Techniques

  • Example: Using a DMM in Prediction

    SELECT t.[Customer ID], [Age Prediction].[Age]

    FROM [Age Prediction]

    PRECTION JOIN

    (SHAPE

    {SELECT [Customer ID], [Gender] FROM Customers ORDER BY [Customer ID]}

    APPEND

    (

    {SELECT [CustID], [Product Name], [Quantity] FROM Sales ORDER BY [CustID]}

    RELATE [Customer ID] TO [CustID]

    )

    AS [Product Purchases]

    )

    AS t

    ON [Age Prediction].[Gender]=t.[Gender] AND

    [Age Prediction].[Product Purchases].[Product Name]=t.[Product Purchases].[Product Name] AND

    [Age Prediction].[Product Purchases].[Quantity]=t.[Product Purchases].[Quantity]

    Data Mining: Concepts and Techniques

  • Browsing DMM

    What is in a DMM?Rules, formulas, trees, , etcBrowsing DMMVisualization

    Data Mining: Concepts and Techniques

  • Concluding Remarks

    OLE DB for DM integrates data mining and database systemsA good standard for mining application builders How can we be involved?Provide association/sequential pattern mining modules for OLE DB for DM?Design more concrete language primitives?Referenceshttp://www.microsoft.com/data.oledb/dm.html

    Data Mining: Concepts and Techniques