Download - SQL Server 2008 Data Mining
SSAS 2008 Data Mining
Lynn LangitMSDN Developer Evangelist – Southern California
http://blogs.msdn.com/SoCalDevGal
Session Prerequisites
• Working SQL Server 2005 Developer• Understanding of OLAP concepts• Working SQL Server Analysis Server 2005
Developer• Interest in or basic knowledge of Data Mining
concepts
Session Objectives and Agenda• Understand what, why, when & how of SQL
Server 2008 Data Mining• Examine the core functionality of the Data
Mining Extensions• Hear about the new and/or advanced
functionality of Data Mining
Predictive AnalyticsPredictive AnalyticsPredictive AnalyticsPredictive Analytics
PresentatioPresentationn
ExplorationExploration DiscoveryDiscovery
PassivePassive
InteractiInteractiveve
ProactivProactivee
Role of SoftwareRole of Software
Business Business InsightInsight
Canned reportingCanned reporting
Ad-hoc reportingAd-hoc reporting
OLAPOLAP
Data miningData mining
What and Why Data Mining?
OLAP vs. Data Mining
DM - From Scenarios to Tasks
From Tasks to Techniques
BI for Everyone
Enterprise – Performance PointEnterprise – Performance Point
Individual – Excel Individual – Excel
Project – Share PointProject – Share Point
Microsoft’s Predictive Analytics
Data Mining SQL extensionsData Mining SQL extensions(DMX)(DMX)
Application Application DeveloperDeveloper
Data Mining Data Mining SpecialistSpecialist
Microsoft Dynamics CRMMicrosoft Dynamics CRMAnalytics FoundationAnalytics Foundation
SQL Server 2005 SQL Server 2005 Business Intelligence Development StudioBusiness Intelligence Development Studio
Microsoft SQL Server 2008 Analysis ServicesMicrosoft SQL Server 2008 Analysis Services
Information Information WorkerWorker
Data Mining Add-ins for Data Mining Add-ins for the 2007 Microsoft Office systemthe 2007 Microsoft Office system
Microsoft SQL Server 2008 Data MiningMicrosoft SQL Server 2008 Data Mining
BI AnalystBI Analyst
Custom Custom AlgorithmsAlgorithms
Data Mining Add-ins for Office 2007Table Analysis Tools for Excel 2007Table Analysis Tools for Excel 2007
Data Mining Template for Visio 2007Data Mining Template for Visio 2007
Data Mining Client for Excel 2007Data Mining Client for Excel 2007
Information Information WorkerWorker
BI AnalystBI Analyst
Data Mining Data Mining SpecialistSpecialist
SSASSSAS(Data(Data
Mining)Mining)ExcelExcel
SSAS SSAS (DSV)(DSV)QueryQueryExcelExcel
SSISSSISSSASSSASSSRSSSRSExcelExcelYour AppsYour Apps
SSISSSISSSASSSASExcelExcel
Business Business UnderstandiUnderstandi
ngng
Data Data UnderstandiUnderstandi
ngng
Data Data PreparationPreparation
ModelingModeling
EvaluationEvaluation
DeploymentDeployment
DataData
Microsoft Data Mining Lifecycle CRISP-DM
www.crisp-dm.org
Understand & Prepare specifics
Demo 1 – Explore / Clean / Partition Data
Modeling Specifics
Demo 3 – Modeling
Evaluation Specifics
Demo 4 – Evaluation
Data Mining – Logical Model
Mining ModelMining Model Mining ModelMining Model
Training DataTraining Data
DB dataDB dataClient dataClient dataApplication dataApplication data
Data MiningData MiningEngineEngine
Data To Data To PredictPredict
Predicted DataPredicted Data
Mining ModelMining ModelDB dataDB dataClient dataClient dataApplication dataApplication data““Just one rowJust one row””
Data MiningData MiningEngineEngine
Analysis ServicesAnalysis ServicesServerServer
Mining ModelMining Model
Data Mining AlgorithmData Mining Algorithm DataDataSourceSource
Data Mining - Physical Model
Your ApplicationYour Application
OLE DB/ ADOMD/ XMLAOLE DB/ ADOMD/ XMLA
DeploDeployy
BI Dev BI Dev StudioStudio (Visual (Visual Studio)Studio)
App DataApp Data
Data Mining Interfaces – APIs
Analysis Server (msmdsrv.exe)
OLAP Data Mining
Server ADOMD.NET
.Net Stored Procedures Microsoft Algorithms Third Party Algorithms
XMLAXMLAOver TCP/IPOver TCP/IP
OLEDB for OLAP/DM ADO/DSO
XMLAXMLAOver HTTPOver HTTP
Any Platform, Any Device
C++ App VB App .Net App
AMO
Any App
ADOMD.NET
WANWAN
DM Interfaces
Configuration
• Model Creation/Management• Database Administrators• Session Mining Models
• Model Application• Permissions on models• Permissions on data sources
Deployment• Browse
• Copy to Excel• Drillthrough
• Query• Default• Advanced
• Excel Services• Manage models and structures
• Export/Import• Rename
• Connection• Database• Trace
Excel Functions*
DMPREDICTTABLEROW ( Connection, ModelName, PredictionResult, TableRowRange[, string CommaSeparatedColumnNames])
DMPREDICT ( Connection, Model, PredictionResult,
Value1, Name1, [...,Value32, Name32])
DMCONTENTQUERY (Connection, Model, PredictionResult[, WhereClause])
Data Mining Extensions (DMX) CREATE MINING MODELCREATE MINING MODEL CreditRiskCreditRisk
(CustID(CustID LONG KEY, LONG KEY,
Gender TEXT DISCRETE,Gender TEXT DISCRETE,
Income Income LONG LONG CONTINUOUS,CONTINUOUS,
Profession TEXT DISCRETE,Profession TEXT DISCRETE,
RiskRisk TEXT DISCRETE PREDICT) TEXT DISCRETE PREDICT)
USINGUSING Microsoft_Decision_Trees Microsoft_Decision_Trees
INSERT INTOINSERT INTO CreditRisk CreditRisk
(CustId, Gender, Income, (CustId, Gender, Income, Profession, Risk)Profession, Risk)
Select Select
CustomerID, Gender, Income, CustomerID, Gender, Income, Profession,RiskProfession,Risk
From CustomersFrom Customers
SelectSelect NewCustomers.CustomerID, NewCustomers.CustomerID, CreditRisk.Risk, CreditRisk.Risk, PredictProbability(CreditRisk.Risk)PredictProbability(CreditRisk.Risk)
FROMFROM CreditRisk CreditRisk PREDICTION JOINPREDICTION JOIN NewCustomersNewCustomers
ONON CreditRisk.Gender=NewCustomer.GenderCreditRisk.Gender=NewCustomer.Gender
ANDAND CreditRisk.Income=NewCustomer.Income CreditRisk.Income=NewCustomer.Income
AND AND CreditRisk.Profession=NewCustomer.ProfessionCreditRisk.Profession=NewCustomer.Profession
DMX Column Expressions
• Predictable Columns• Source Data Columns• Functions
Predict“Workhorse”Discrete scalar valuesContinuous scalar valuesAssociative nested tablesSequence nested tablesTime SeriesOverloaded to
PredictAssociationPredictSequencePredictTimeSeries
• PredictProbability• PredictSupport• PredictHistogram• Cluster• ClusterProbability• GetNodeId• IsInNode
• Arithmetic operators• Stored Procedure• Subselect
• Select from nested tables
Data Mining Interfaces – XMLA ++
Analysis Server (msmdsrv.exe)
OLAP Data Mining
Server ADOMD.NET
.Net Stored Procedures Microsoft Algorithms Third Party Algorithms
XMLAXMLAOver TCP/IPOver TCP/IP
OLEDB for OLAP/DM ADO/DSO
XMLAXMLAOver HTTPOver HTTP
Any Platform, Any Device
C++ App VB App .Net App
AMO
Any App
ADOMD.NET
WANWAN
DM Interfaces
New to SQL Server 2008
• Microsoft_Time_Series algorithm improved • now includes ARIMA in addition to the existing ARTxp method, and a blending algorithm is now used to deliver
more accurate and stable predictions, both short and long term, from a hybrid model. In addition, a new prediction mode allows you to add new data to time series models.
• Holdout Support added• You can easily partition your data into training and test sets that are stored in the mining structure and are
available to query after processing.
• Ability to build mining models based on filtered subsets added• (e.g. just male customers), this means that you no longer have to create multiple mining structures and re-read
the source data for such variations over a dataset
• Drillthrough functionality extended • makes all mining structure columns available, not just columns included in the model• allows you to build more compact models without sacrificing the ability to producing actionable output reports
like targeted mailing lists.
• Cross-validation added• allows users to quickly validate their modeling approach by automatically building temporary models and
evaluating accuracy measures across K folds. The feature is available through a new cross-validation tab under Accuracy Charts in Business Intelligence Development Studio, in addition to being accessible programmatically via a stored procedure call.
Summary
• Data Mining in SQL Server 2008 is mature, powerful and accessible
• Can use Excel 2007• Familiar client for BI – OLAP cubes AND Data Mining
models• Model Creators / Users• Excel Data or Server Data
• SSAS and Excel both support the full DM Cycle• Data Understanding• Data Preparation• Modeling• Validation• Deployment
DM WebcastsFri, 02 Nov 2007MSDN Webcast: Build Smart Web Applications with SQL Server Data Mining (Level 200)Thu, 08 Nov 2007MSDN Webcast: Building Adaptive Applications with SQL Server Data Mining (Level 300)Mon, 19 Nov 2007MSDN Webcast: Extending and Customizing SQL Server Data Mining (Level 300)Fri, 30 Nov 2007MSDN Webcast: Creating Visualizations for SQL Server Data Mining (Level 300)Thu, 01 Nov 2007TechNet Webcast: Deliver Actionable Insight Throughout Your Organization with Data Mining (Part 1 of 3): Your First Project with SQL Server Data Mining (Level 200)Thu, 15 Nov 2007TechNet Webcast: Deliver Actionable Insight Throughout Your Organization with Data Mining (Part 2 of 3): Understand SQL Server Data Mining Add-ins for the 2007 Office System (Level 200)Thu, 29 Nov 2007TechNet Webcast: Deliver Actionable Insight Throughout Your Organization with Data Mining (Part 3 of 3): Use Predictive Intelligence to Create Smarter KPIs (Level 200)
BI Resources from Lynn Langit
1. “Foundations of SQL Server 2005 Business Intelligence”(published by APress in April 2007)
2. 2. http://blogs.msdn.com/SoCalDevGal
3. “Building Business Intelligence 3. “Building Business Intelligence Solutions with SQL Server 2008” Solutions with SQL Server 2008” (MSPress Fall 2008)(MSPress Fall 2008)
DM Resources
Technical Communities, Webcasts, Blogs, Chats & User Groupshttp://www.microsoft.com/communities/default.mspx
Microsoft Developer Network (MSDN) & TechNet http://microsoft.com/msdn http://microsoft.com/technet
Trial Software and Virtual Labshttp://www.microsoft.com/technet/downloads/trials/default.mspx
Microsoft Learning and Certificationhttp://www.microsoft.com/learning/default.mspx
SQL Server Data Mininghttp://www.sqlserverdatamining.comhttp://www.microsoft.com/bi/bicapabilities/data-mining.aspxhttp://www.microsoft.com/bi/bicapabilities/data-mining.aspx