sql server 2008 data mining

31
SSAS 2008 Data Mining Lynn Langit MSDN Developer Evangelist – Southern California http://blogs.msdn.com/SoCalDevGal

Upload: llangit

Post on 31-Oct-2014

20 views

Category:

Business


2 download

DESCRIPTION

Presentation given to Monrovia UG by Lynn Langit

TRANSCRIPT

Page 1: SQL Server 2008 Data Mining

SSAS 2008 Data Mining

Lynn LangitMSDN Developer Evangelist – Southern California

http://blogs.msdn.com/SoCalDevGal

Page 2: SQL Server 2008 Data Mining

Session Prerequisites

• Working SQL Server 2005 Developer• Understanding of OLAP concepts• Working SQL Server Analysis Server 2005

Developer• Interest in or basic knowledge of Data Mining

concepts

Page 3: SQL Server 2008 Data Mining

Session Objectives and Agenda• Understand what, why, when & how of SQL

Server 2008 Data Mining• Examine the core functionality of the Data

Mining Extensions• Hear about the new and/or advanced

functionality of Data Mining

Page 4: SQL Server 2008 Data Mining

Predictive AnalyticsPredictive AnalyticsPredictive AnalyticsPredictive Analytics

PresentatioPresentationn

ExplorationExploration DiscoveryDiscovery

PassivePassive

InteractiInteractiveve

ProactivProactivee

Role of SoftwareRole of Software

Business Business InsightInsight

Canned reportingCanned reporting

Ad-hoc reportingAd-hoc reporting

OLAPOLAP

Data miningData mining

What and Why Data Mining?

Page 5: SQL Server 2008 Data Mining

OLAP vs. Data Mining

Page 6: SQL Server 2008 Data Mining

DM - From Scenarios to Tasks

Page 7: SQL Server 2008 Data Mining

From Tasks to Techniques

Page 8: SQL Server 2008 Data Mining

BI for Everyone

Enterprise – Performance PointEnterprise – Performance Point

Individual – Excel Individual – Excel

Project – Share PointProject – Share Point

Page 9: SQL Server 2008 Data Mining

Microsoft’s Predictive Analytics

Data Mining SQL extensionsData Mining SQL extensions(DMX)(DMX)

Application Application DeveloperDeveloper

Data Mining Data Mining SpecialistSpecialist

Microsoft Dynamics CRMMicrosoft Dynamics CRMAnalytics FoundationAnalytics Foundation

SQL Server 2005 SQL Server 2005 Business Intelligence Development StudioBusiness Intelligence Development Studio

Microsoft SQL Server 2008 Analysis ServicesMicrosoft SQL Server 2008 Analysis Services

Information Information WorkerWorker

Data Mining Add-ins for Data Mining Add-ins for the 2007 Microsoft Office systemthe 2007 Microsoft Office system

Microsoft SQL Server 2008 Data MiningMicrosoft SQL Server 2008 Data Mining

BI AnalystBI Analyst

Custom Custom AlgorithmsAlgorithms

Page 10: SQL Server 2008 Data Mining

Data Mining Add-ins for Office 2007Table Analysis Tools for Excel 2007Table Analysis Tools for Excel 2007

Data Mining Template for Visio 2007Data Mining Template for Visio 2007

Data Mining Client for Excel 2007Data Mining Client for Excel 2007

Information Information WorkerWorker

BI AnalystBI Analyst

Data Mining Data Mining SpecialistSpecialist

Page 11: SQL Server 2008 Data Mining

SSASSSAS(Data(Data

Mining)Mining)ExcelExcel

SSAS SSAS (DSV)(DSV)QueryQueryExcelExcel

SSISSSISSSASSSASSSRSSSRSExcelExcelYour AppsYour Apps

SSISSSISSSASSSASExcelExcel

Business Business UnderstandiUnderstandi

ngng

Data Data UnderstandiUnderstandi

ngng

Data Data PreparationPreparation

ModelingModeling

EvaluationEvaluation

DeploymentDeployment

DataData

Microsoft Data Mining Lifecycle CRISP-DM

www.crisp-dm.org

Page 12: SQL Server 2008 Data Mining

Understand & Prepare specifics

Page 13: SQL Server 2008 Data Mining

Demo 1 – Explore / Clean / Partition Data

Page 14: SQL Server 2008 Data Mining

Modeling Specifics

Page 15: SQL Server 2008 Data Mining

Demo 3 – Modeling

Page 16: SQL Server 2008 Data Mining

Evaluation Specifics

Page 17: SQL Server 2008 Data Mining

Demo 4 – Evaluation

Page 18: SQL Server 2008 Data Mining

Data Mining – Logical Model

Mining ModelMining Model Mining ModelMining Model

Training DataTraining Data

DB dataDB dataClient dataClient dataApplication dataApplication data

Data MiningData MiningEngineEngine

Data To Data To PredictPredict

Predicted DataPredicted Data

Mining ModelMining ModelDB dataDB dataClient dataClient dataApplication dataApplication data““Just one rowJust one row””

Data MiningData MiningEngineEngine

Page 19: SQL Server 2008 Data Mining

Analysis ServicesAnalysis ServicesServerServer

Mining ModelMining Model

Data Mining AlgorithmData Mining Algorithm DataDataSourceSource

Data Mining - Physical Model

Your ApplicationYour Application

OLE DB/ ADOMD/ XMLAOLE DB/ ADOMD/ XMLA

DeploDeployy

BI Dev BI Dev StudioStudio (Visual (Visual Studio)Studio)

App DataApp Data

Page 20: SQL Server 2008 Data Mining

Data Mining Interfaces – APIs

Analysis Server (msmdsrv.exe)

OLAP Data Mining

Server ADOMD.NET

.Net Stored Procedures Microsoft Algorithms Third Party Algorithms

XMLAXMLAOver TCP/IPOver TCP/IP

OLEDB for OLAP/DM ADO/DSO

XMLAXMLAOver HTTPOver HTTP

Any Platform, Any Device

C++ App VB App .Net App

AMO

Any App

ADOMD.NET

WANWAN

DM Interfaces

Page 21: SQL Server 2008 Data Mining

Configuration

• Model Creation/Management• Database Administrators• Session Mining Models

• Model Application• Permissions on models• Permissions on data sources

Page 22: SQL Server 2008 Data Mining

Deployment• Browse

• Copy to Excel• Drillthrough

• Query• Default• Advanced

• Excel Services• Manage models and structures

• Export/Import• Rename

• Connection• Database• Trace

Page 23: SQL Server 2008 Data Mining

Excel Functions*

DMPREDICTTABLEROW ( Connection, ModelName, PredictionResult, TableRowRange[, string CommaSeparatedColumnNames])

DMPREDICT ( Connection, Model, PredictionResult,

Value1, Name1, [...,Value32, Name32])

DMCONTENTQUERY (Connection, Model, PredictionResult[, WhereClause])

Page 24: SQL Server 2008 Data Mining

Data Mining Extensions (DMX) CREATE MINING MODELCREATE MINING MODEL CreditRiskCreditRisk

(CustID(CustID LONG KEY, LONG KEY,

Gender TEXT DISCRETE,Gender TEXT DISCRETE,

Income Income LONG LONG CONTINUOUS,CONTINUOUS,

Profession TEXT DISCRETE,Profession TEXT DISCRETE,

RiskRisk TEXT DISCRETE PREDICT) TEXT DISCRETE PREDICT)

USINGUSING Microsoft_Decision_Trees Microsoft_Decision_Trees

INSERT INTOINSERT INTO CreditRisk CreditRisk

(CustId, Gender, Income, (CustId, Gender, Income, Profession, Risk)Profession, Risk)

Select Select

CustomerID, Gender, Income, CustomerID, Gender, Income, Profession,RiskProfession,Risk

From CustomersFrom Customers

SelectSelect NewCustomers.CustomerID, NewCustomers.CustomerID, CreditRisk.Risk, CreditRisk.Risk, PredictProbability(CreditRisk.Risk)PredictProbability(CreditRisk.Risk)

FROMFROM CreditRisk CreditRisk PREDICTION JOINPREDICTION JOIN NewCustomersNewCustomers

ONON CreditRisk.Gender=NewCustomer.GenderCreditRisk.Gender=NewCustomer.Gender

ANDAND CreditRisk.Income=NewCustomer.Income CreditRisk.Income=NewCustomer.Income

AND AND CreditRisk.Profession=NewCustomer.ProfessionCreditRisk.Profession=NewCustomer.Profession

Page 25: SQL Server 2008 Data Mining

DMX Column Expressions

• Predictable Columns• Source Data Columns• Functions

Predict“Workhorse”Discrete scalar valuesContinuous scalar valuesAssociative nested tablesSequence nested tablesTime SeriesOverloaded to

PredictAssociationPredictSequencePredictTimeSeries

• PredictProbability• PredictSupport• PredictHistogram• Cluster• ClusterProbability• GetNodeId• IsInNode

• Arithmetic operators• Stored Procedure• Subselect

• Select from nested tables

Page 26: SQL Server 2008 Data Mining

Data Mining Interfaces – XMLA ++

Analysis Server (msmdsrv.exe)

OLAP Data Mining

Server ADOMD.NET

.Net Stored Procedures Microsoft Algorithms Third Party Algorithms

XMLAXMLAOver TCP/IPOver TCP/IP

OLEDB for OLAP/DM ADO/DSO

XMLAXMLAOver HTTPOver HTTP

Any Platform, Any Device

C++ App VB App .Net App

AMO

Any App

ADOMD.NET

WANWAN

DM Interfaces

Page 27: SQL Server 2008 Data Mining

New to SQL Server 2008

• Microsoft_Time_Series algorithm improved • now includes ARIMA in addition to the existing ARTxp method, and a blending algorithm is now used to deliver

more accurate and stable predictions, both short and long term, from a hybrid model. In addition, a new prediction mode allows you to add new data to time series models.

• Holdout Support added• You can easily partition your data into training and test sets that are stored in the mining structure and are

available to query after processing.

• Ability to build mining models based on filtered subsets added• (e.g. just male customers), this means that you no longer have to create multiple mining structures and re-read

the source data for such variations over a dataset

• Drillthrough functionality extended • makes all mining structure columns available, not just columns included in the model• allows you to build more compact models without sacrificing the ability to producing actionable output reports

like targeted mailing lists.

• Cross-validation added• allows users to quickly validate their modeling approach by automatically building temporary models and

evaluating accuracy measures across K folds. The feature is available through a new cross-validation tab under Accuracy Charts in Business Intelligence Development Studio, in addition to being accessible programmatically via a stored procedure call.

Page 28: SQL Server 2008 Data Mining

Summary

• Data Mining in SQL Server 2008 is mature, powerful and accessible

• Can use Excel 2007• Familiar client for BI – OLAP cubes AND Data Mining

models• Model Creators / Users• Excel Data or Server Data

• SSAS and Excel both support the full DM Cycle• Data Understanding• Data Preparation• Modeling• Validation• Deployment

Page 29: SQL Server 2008 Data Mining

DM WebcastsFri, 02 Nov 2007MSDN Webcast: Build Smart Web Applications with SQL Server Data Mining (Level 200)Thu, 08 Nov 2007MSDN Webcast: Building Adaptive Applications with SQL Server Data Mining (Level 300)Mon, 19 Nov 2007MSDN Webcast: Extending and Customizing SQL Server Data Mining (Level 300)Fri, 30 Nov 2007MSDN Webcast: Creating Visualizations for SQL Server Data Mining (Level 300)Thu, 01 Nov 2007TechNet Webcast: Deliver Actionable Insight Throughout Your Organization with Data Mining (Part 1 of 3): Your First Project with SQL Server Data Mining (Level 200)Thu, 15 Nov 2007TechNet Webcast: Deliver Actionable Insight Throughout Your Organization with Data Mining (Part 2 of 3): Understand SQL Server Data Mining Add-ins for the 2007 Office System (Level 200)Thu, 29 Nov 2007TechNet Webcast: Deliver Actionable Insight Throughout Your Organization with Data Mining (Part 3 of 3): Use Predictive Intelligence to Create Smarter KPIs (Level 200)

Page 30: SQL Server 2008 Data Mining

BI Resources from Lynn Langit

1. “Foundations of SQL Server 2005 Business Intelligence”(published by APress in April 2007)

2. 2. http://blogs.msdn.com/SoCalDevGal

3. “Building Business Intelligence 3. “Building Business Intelligence Solutions with SQL Server 2008” Solutions with SQL Server 2008” (MSPress Fall 2008)(MSPress Fall 2008)

Page 31: SQL Server 2008 Data Mining

DM Resources

Technical Communities, Webcasts, Blogs, Chats & User Groupshttp://www.microsoft.com/communities/default.mspx

Microsoft Developer Network (MSDN) & TechNet http://microsoft.com/msdn http://microsoft.com/technet

Trial Software and Virtual Labshttp://www.microsoft.com/technet/downloads/trials/default.mspx

Microsoft Learning and Certificationhttp://www.microsoft.com/learning/default.mspx

SQL Server Data Mininghttp://www.sqlserverdatamining.comhttp://www.microsoft.com/bi/bicapabilities/data-mining.aspxhttp://www.microsoft.com/bi/bicapabilities/data-mining.aspx