![Page 1: Intro to Data Mining/Machine Learning Algorithms for Business Intelligence Dr. Bambang Parmanto](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649d195503460f949eebcb/html5/thumbnails/1.jpg)
Intro to Data Mining/Machine Learning Algorithms for Business Intelligence
Dr. Bambang Parmanto
![Page 2: Intro to Data Mining/Machine Learning Algorithms for Business Intelligence Dr. Bambang Parmanto](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649d195503460f949eebcb/html5/thumbnails/2.jpg)
Extraction Of Knowledge From Data
![Page 3: Intro to Data Mining/Machine Learning Algorithms for Business Intelligence Dr. Bambang Parmanto](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649d195503460f949eebcb/html5/thumbnails/3.jpg)
DSS Architecture: Learning and Predicting
Courtesy: Tim Graettinger
![Page 4: Intro to Data Mining/Machine Learning Algorithms for Business Intelligence Dr. Bambang Parmanto](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649d195503460f949eebcb/html5/thumbnails/4.jpg)
Data Mining: Definitions
Data mining = the process of discovering and modeling hidden patterns in a large volume of data
Related terms = knowledge discovery in databases (KDD), intelligent data analysis (IDA), decision support system (DSS).
The pattern should be novel and useful. Example of a trivial (not useful) pattern: “unemployed people don’t earn income from work”
The data mining process is data-driven and must be automatic or semi-automatic.
![Page 5: Intro to Data Mining/Machine Learning Algorithms for Business Intelligence Dr. Bambang Parmanto](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649d195503460f949eebcb/html5/thumbnails/5.jpg)
Example: Nonlinear Model
![Page 6: Intro to Data Mining/Machine Learning Algorithms for Business Intelligence Dr. Bambang Parmanto](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649d195503460f949eebcb/html5/thumbnails/6.jpg)
Basic Fields of Data Mining
Machine Learning
Databases
Statistics
![Page 7: Intro to Data Mining/Machine Learning Algorithms for Business Intelligence Dr. Bambang Parmanto](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649d195503460f949eebcb/html5/thumbnails/7.jpg)
Human-Centered Process
![Page 9: Intro to Data Mining/Machine Learning Algorithms for Business Intelligence Dr. Bambang Parmanto](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649d195503460f949eebcb/html5/thumbnails/9.jpg)
Core Algorithms in Data Mining
Supervised Learning:
◦ Classification
◦ Prediction
Unsupervised Learning:
◦ Association Rules
◦ Clustering
◦ Data Reduction (Principal Component Analysis)
◦ Data Exploration and Visualization
![Page 10: Intro to Data Mining/Machine Learning Algorithms for Business Intelligence Dr. Bambang Parmanto](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649d195503460f949eebcb/html5/thumbnails/10.jpg)
Supervised Learning
Supervised: there are clear examples from past cases that can be used to train (supervise) the machine.
Goal: predict a single “target” or “outcome” variable
Train on data where the target value is known
Score data where the target value is not known
Methods: Classification and Prediction
![Page 11: Intro to Data Mining/Machine Learning Algorithms for Business Intelligence Dr. Bambang Parmanto](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649d195503460f949eebcb/html5/thumbnails/11.jpg)
Unsupervised Learning
Unsupervised: there are no clear examples to supervise the machine
Goal: segment data into meaningful segments; detect patterns
There is no target (outcome) variable to predict or classify
Methods: Association rules, clustering, data reduction & exploration, visualization
![Page 12: Intro to Data Mining/Machine Learning Algorithms for Business Intelligence Dr. Bambang Parmanto](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649d195503460f949eebcb/html5/thumbnails/12.jpg)
Example of Supervised Learning: Classification
Goal: predict categorical target (outcome) variable
Examples: Purchase/no purchase, fraud/no fraud, creditworthy/not creditworthy…
Each row is a case (customer, tax return, applicant)
Each column is a variable
Target variable is often binary (yes/no)
![Page 13: Intro to Data Mining/Machine Learning Algorithms for Business Intelligence Dr. Bambang Parmanto](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649d195503460f949eebcb/html5/thumbnails/13.jpg)
Example of Supervised Learning: Prediction
Goal: predict numerical target (outcome) variable
Examples: sales, revenue, performance
As in classification:
◦ Each row is a case (customer, tax return, applicant)
◦ Each column is a variable
Taken together, classification and prediction constitute “predictive analytics”
![Page 14: Intro to Data Mining/Machine Learning Algorithms for Business Intelligence Dr. Bambang Parmanto](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649d195503460f949eebcb/html5/thumbnails/14.jpg)
Example of Unsupervised Learning: Association Rules
Goal: produce rules that define “what goes with what”
Example: “If X was purchased, Y was also purchased”
Rows are transactions
Used in recommender systems: “Our records show you bought X, you may also like Y”
Also called “affinity analysis”
![Page 15: Intro to Data Mining/Machine Learning Algorithms for Business Intelligence Dr. Bambang Parmanto](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649d195503460f949eebcb/html5/thumbnails/15.jpg)
The Process of Data Mining
![Page 16: Intro to Data Mining/Machine Learning Algorithms for Business Intelligence Dr. Bambang Parmanto](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649d195503460f949eebcb/html5/thumbnails/16.jpg)
Steps in Data Mining
1. Define/understand purpose
2. Obtain data (may involve random sampling)
3. Explore, clean, pre-process data
4. Reduce the data; if supervised DM, partition it
5. Specify task (classification, clustering, etc.)
6. Choose the techniques (regression, CART, neural networks, etc.)
7. Iterative implementation and “tuning”
8. Assess results – compare models
9. Deploy best model
![Page 17: Intro to Data Mining/Machine Learning Algorithms for Business Intelligence Dr. Bambang Parmanto](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649d195503460f949eebcb/html5/thumbnails/17.jpg)
Preprocessing Data: Eliminating Outliers
![Page 18: Intro to Data Mining/Machine Learning Algorithms for Business Intelligence Dr. Bambang Parmanto](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649d195503460f949eebcb/html5/thumbnails/18.jpg)
Handling Missing Data
Most algorithms will not process records with missing values. Default is to drop those records.
Solution 1: Omission
◦ If a small number of records have missing values, can omit them
◦ If many records are missing values on a small set of variables, can drop those variables (or use proxies)
◦ If many records have missing values, omission is not practical
Solution 2: Imputation
◦ Replace missing values with reasonable substitutes
◦ Lets you keep the record and use the rest of its (non-missing) information
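The two solutions can be sketched in a few lines of Python. This is a minimal illustration, not a recommended pipeline: the records are hypothetical, `None` marks a missing value, and the column mean is only one possible "reasonable substitute".

```python
from statistics import mean

# Hypothetical records; None marks a missing value.
records = [
    {"income": 52000, "age": 34},
    {"income": None, "age": 45},
    {"income": 61000, "age": None},
    {"income": 48000, "age": 29},
]

def omit_incomplete(rows):
    """Solution 1 (omission): keep only records with no missing values."""
    return [r for r in rows if all(v is not None for v in r.values())]

def impute_mean(rows):
    """Solution 2 (imputation): fill each gap with the mean of the observed values."""
    columns = list(rows[0].keys())
    col_means = {
        c: mean(r[c] for r in rows if r[c] is not None) for c in columns
    }
    return [
        {c: (r[c] if r[c] is not None else col_means[c]) for c in columns}
        for r in rows
    ]

complete = omit_incomplete(records)
imputed = impute_mean(records)
print(len(complete), "records survive omission")
print(len(imputed), "records survive imputation")
```

Omission discards half of this tiny data set, while imputation keeps every record and the non-missing information in it.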
![Page 19: Intro to Data Mining/Machine Learning Algorithms for Business Intelligence Dr. Bambang Parmanto](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649d195503460f949eebcb/html5/thumbnails/19.jpg)
Common Problem: Overfitting
Statistical models can produce highly complex explanations of relationships between variables
The “fit” may be excellent
When used with new data, models of great complexity do not do so well.
![Page 20: Intro to Data Mining/Machine Learning Algorithms for Business Intelligence Dr. Bambang Parmanto](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649d195503460f949eebcb/html5/thumbnails/20.jpg)
100% fit – not useful for new data
[Figure: Revenue (y-axis) plotted against Expenditure (x-axis)]
Consequence: Deployed model will not work as well as expected with completely new data.
![Page 21: Intro to Data Mining/Machine Learning Algorithms for Business Intelligence Dr. Bambang Parmanto](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649d195503460f949eebcb/html5/thumbnails/21.jpg)
Learning and Testing
Problem: How well will our model perform with new data?
Solution: Separate data into two parts
◦ Training partition to develop the model
◦ Validation partition to apply the model and evaluate its performance on “new” data
Addresses the issue of overfitting
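A minimal sketch of the partitioning step, using only the Python standard library. The 60/40 ratio, the fixed seed, and the function name `partition` are assumptions made for illustration; the slides do not prescribe them.

```python
import random

def partition(rows, train_fraction=0.6, seed=42):
    """Shuffle the records, then cut them into training and validation partitions."""
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = rows[:]             # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

rows = list(range(10))             # stand-ins for ten records
train, valid = partition(rows)
print(len(train), "training records,", len(valid), "validation records")
```

The model is fitted on `train` only; its error on `valid` is the estimate of how it will behave on new data.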
![Page 22: Intro to Data Mining/Machine Learning Algorithms for Business Intelligence Dr. Bambang Parmanto](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649d195503460f949eebcb/html5/thumbnails/22.jpg)
Algorithms:
For Classification/Prediction tasks
◦ k-Nearest Neighbor
◦ Naïve Bayes
◦ CART
◦ Discriminant Analysis
◦ Neural Networks
Unsupervised learning
◦ Association Rules
◦ Cluster Analysis
![Page 23: Intro to Data Mining/Machine Learning Algorithms for Business Intelligence Dr. Bambang Parmanto](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649d195503460f949eebcb/html5/thumbnails/23.jpg)
K-Nearest Neighbor: The idea
How to classify: Find the k closest records to the one to be classified, and let them “vote”.
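The voting idea can be sketched as follows. The training records are hypothetical (age in years, income in thousands of dollars), and plain Euclidean distance is an assumption; in practice the predictors are usually rescaled first so that no single variable dominates the distance.

```python
from collections import Counter
from math import dist

# Hypothetical training records: ((age, income in $000s), beer preference)
training = [
    ((25, 30), "Light beer"),
    ((30, 35), "Light beer"),
    ((28, 40), "Light beer"),
    ((50, 60), "Regular beer"),
    ((55, 65), "Regular beer"),
    ((60, 70), "Regular beer"),
]

def knn_classify(training, point, k=3):
    """Find the k closest training records and return the majority label."""
    neighbors = sorted(training, key=lambda rec: dist(rec[0], point))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

print(knn_classify(training, (27, 33)))
print(knn_classify(training, (58, 66)))
```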
[Figure: scatter plot of Age (y-axis) against Income (x-axis); points are labeled Regular beer or Light beer]
![Page 24: Intro to Data Mining/Machine Learning Algorithms for Business Intelligence Dr. Bambang Parmanto](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649d195503460f949eebcb/html5/thumbnails/24.jpg)
Example
![Page 25: Intro to Data Mining/Machine Learning Algorithms for Business Intelligence Dr. Bambang Parmanto](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649d195503460f949eebcb/html5/thumbnails/25.jpg)
Naïve Bayes: Basic Idea
Basic idea similar to k-nearest neighbor: To classify an observation, find all similar observations (in terms of predictors) in the training set
Uses only categorical predictors (numerical predictors can be binned)
Basic idea equivalent to looking at pivot tables
![Page 26: Intro to Data Mining/Machine Learning Algorithms for Business Intelligence Dr. Bambang Parmanto](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649d195503460f949eebcb/html5/thumbnails/26.jpg)
The “Primitive” Idea: Example
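Y = personal loan acceptance (0/1)
Two predictors: CreditCard (0/1), Online (0/1)
What is the probability of acceptance for customers with CreditCard=1, Online=1?

Count of Personal Loan, by CreditCard and Personal Loan (rows) and Online (columns):

| CreditCard | Personal Loan | Online = 0 | Online = 1 | Grand Total |
|---|---|---|---|---|
| 0 | 0 | 769 | 1163 | 1932 |
| 0 | 1 | 71 | 129 | 200 |
| 0 Total | | 840 | 1292 | 2132 |
| 1 | 0 | 321 | 461 | 782 |
| 1 | 1 | 36 | 50 | 86 |
| 1 Total | | 357 | 511 | 868 |
| Grand Total | | 1197 | 1803 | 3000 |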
50/(461+50) = .0978
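The calculation above can be reproduced from the pivot-table counts. The sketch below also adds the standard naive Bayes estimate, which treats the two predictors as independent within each class; that second calculation does not appear on the slide and is included only for comparison. The helper names `total` and `naive_score` are made up for this example.

```python
# Counts copied from the pivot table, keyed by (credit_card, loan, online).
counts = {
    (0, 0, 0): 769, (0, 0, 1): 1163,
    (0, 1, 0): 71,  (0, 1, 1): 129,
    (1, 0, 0): 321, (1, 0, 1): 461,
    (1, 1, 0): 36,  (1, 1, 1): 50,
}

def total(cc=None, loan=None, online=None):
    """Sum the cells matching the given values (None matches any value)."""
    return sum(
        n for (c, ln, o), n in counts.items()
        if (cc is None or c == cc)
        and (loan is None or ln == loan)
        and (online is None or o == online)
    )

# "Primitive" estimate: share of acceptors among customers with CreditCard=1, Online=1.
exact = total(cc=1, loan=1, online=1) / total(cc=1, online=1)

def naive_score(loan):
    """P(loan) * P(CreditCard=1 | loan) * P(Online=1 | loan), from marginal counts."""
    class_size = total(loan=loan)
    prior = class_size / total()
    p_cc = total(cc=1, loan=loan) / class_size
    p_online = total(online=1, loan=loan) / class_size
    return prior * p_cc * p_online

# Normalize the two class scores so they sum to 1.
naive = naive_score(1) / (naive_score(0) + naive_score(1))

print(round(exact, 4))
print(round(naive, 4))
```

The two estimates are close but not identical, because the naive version uses only the marginal counts for each predictor rather than the single cell that matches both.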
![Page 27: Intro to Data Mining/Machine Learning Algorithms for Business Intelligence Dr. Bambang Parmanto](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649d195503460f949eebcb/html5/thumbnails/27.jpg)
Conditional Probability - Refresher
A = the event “customer accepts loan” (Loan=1)
B = the event “customer has credit card” (CC=1)
P(A | B) = probability of A given B (the conditional probability that A occurs given that B occurred)
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$
If P(B)>0
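As a worked example, the formula can be applied to the loan pivot table from the earlier slide, with A = "Loan=1" and B = "CC=1". The counts (86 customers with both, 868 credit card holders, 3000 customers in total) are read from that table.

```latex
P(B) = \frac{868}{3000}, \qquad
P(A \cap B) = \frac{86}{3000}, \qquad
P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{86/3000}{868/3000} = \frac{86}{868} \approx 0.0991
```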
![Page 28: Intro to Data Mining/Machine Learning Algorithms for Business Intelligence Dr. Bambang Parmanto](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649d195503460f949eebcb/html5/thumbnails/28.jpg)
A classic: Microsoft’s Paperclip
![Page 29: Intro to Data Mining/Machine Learning Algorithms for Business Intelligence Dr. Bambang Parmanto](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649d195503460f949eebcb/html5/thumbnails/29.jpg)
Classification and Regression Trees (CART)
Trees and Rules
Goal: Classify or predict an outcome based on a set of predictors
The output is a set of rules
Example:
◦ Goal: classify a record as “will accept credit card offer” or “will not accept”
◦ Rule might be “IF (Income > 92.5) AND (Education < 1.5) AND (Family <= 2.5) THEN Class = 0 (nonacceptor)”
Also called CART, Decision Trees, or just Trees
Rules are represented by tree diagrams
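A tree rule translates directly into code. The sketch below encodes only the one example rule quoted above; the function name and the choice to return `None` when the rule does not apply are assumptions, since a full tree would have a rule for every leaf.

```python
def classify_offer(income, education, family):
    """Apply the single example rule; return 0 (nonacceptor) or None if it does not fire."""
    if income > 92.5 and education < 1.5 and family <= 2.5:
        return 0
    return None  # some other branch of the tree would handle this record

print(classify_offer(income=100, education=1, family=2))
print(classify_offer(income=80, education=1, family=2))
```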
![Page 30: Intro to Data Mining/Machine Learning Algorithms for Business Intelligence Dr. Bambang Parmanto](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649d195503460f949eebcb/html5/thumbnails/30.jpg)
![Page 31: Intro to Data Mining/Machine Learning Algorithms for Business Intelligence Dr. Bambang Parmanto](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649d195503460f949eebcb/html5/thumbnails/31.jpg)
Key Ideas
Recursive partitioning: Repeatedly split the records into two parts so as to achieve maximum homogeneity within the new parts
Pruning the tree: Simplify the tree by pruning peripheral branches to avoid overfitting
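One step of recursive partitioning can be sketched as a search for the split point that leaves the two parts as homogeneous as possible. The slide does not name a homogeneity measure; the Gini impurity used here is a common choice and is an assumption of this sketch, as are the hypothetical income and ownership values.

```python
def gini(labels):
    """Gini impurity: 0.0 when every label in the group is the same."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(values, labels):
    """Return (threshold, weighted impurity) of the best binary split on one predictor."""
    pairs = sorted(zip(values, labels))
    best = None
    for i in range(1, len(pairs)):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue                      # no threshold fits between equal values
        threshold = (lo + hi) / 2         # candidate split: midpoint of neighbors
        left = [lab for v, lab in pairs if v <= threshold]
        right = [lab for v, lab in pairs if v > threshold]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if best is None or impurity < best[1]:
            best = (threshold, impurity)
    return best

# Hypothetical data: income in $000s and whether the household is an owner (1) or not (0).
incomes = [40, 50, 60, 85, 95, 110]
owner = [0, 0, 0, 1, 1, 1]
threshold, impurity = best_split(incomes, owner)
print(threshold, impurity)
```

A full tree repeats this search over every predictor and then again inside each resulting part, which is what makes the partitioning recursive.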
![Page 32: Intro to Data Mining/Machine Learning Algorithms for Business Intelligence Dr. Bambang Parmanto](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649d195503460f949eebcb/html5/thumbnails/32.jpg)
The first split: Lot Size = 19,000
Second split: Income = $84,000
![Page 33: Intro to Data Mining/Machine Learning Algorithms for Business Intelligence Dr. Bambang Parmanto](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649d195503460f949eebcb/html5/thumbnails/33.jpg)
After All Splits
![Page 34: Intro to Data Mining/Machine Learning Algorithms for Business Intelligence Dr. Bambang Parmanto](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649d195503460f949eebcb/html5/thumbnails/34.jpg)
Neural Networks: Basic Idea
Combine input information in a complex & flexible neural net “model”
Model “coefficients” are continually tweaked in an iterative process
The network’s interim performance in classification and prediction informs successive tweaks
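The tweak-and-check loop can be illustrated with the smallest possible "network": a single unit with two inputs and a sigmoid output. This is a deliberately reduced sketch. Real networks have hidden layers and propagate the error back through them, and the training data (logical OR), learning rate, and epoch count here are all assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical training data: the logical OR of two binary inputs.
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

weights = [0.0, 0.0]   # the model "coefficients"
bias = 0.0
rate = 0.5             # size of each tweak

for epoch in range(2000):
    for (x1, x2), target in data:
        output = sigmoid(weights[0] * x1 + weights[1] * x2 + bias)
        error = target - output            # interim performance on this record
        weights[0] += rate * error * x1    # nudge each coefficient to shrink the error
        weights[1] += rate * error * x2
        bias += rate * error

predictions = [
    round(sigmoid(weights[0] * x1 + weights[1] * x2 + bias))
    for (x1, x2), _ in data
]
print(predictions)
```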
![Page 35: Intro to Data Mining/Machine Learning Algorithms for Business Intelligence Dr. Bambang Parmanto](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649d195503460f949eebcb/html5/thumbnails/35.jpg)
Architecture
![Page 36: Intro to Data Mining/Machine Learning Algorithms for Business Intelligence Dr. Bambang Parmanto](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649d195503460f949eebcb/html5/thumbnails/36.jpg)
![Page 37: Intro to Data Mining/Machine Learning Algorithms for Business Intelligence Dr. Bambang Parmanto](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649d195503460f949eebcb/html5/thumbnails/37.jpg)
Discriminant Analysis
A classical statistical technique
Used for classification long before data mining
◦ Classifying organisms into species
◦ Classifying skulls
◦ Fingerprint analysis
And also used for business data mining (loans, customer types, etc.)
Can also be used to highlight aspects that distinguish classes (profiling)
![Page 38: Intro to Data Mining/Machine Learning Algorithms for Business Intelligence Dr. Bambang Parmanto](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649d195503460f949eebcb/html5/thumbnails/38.jpg)
Can we manually draw a line that separates owners from non-owners?
LDA: To classify a new record, measure its distance from the center of each class
Then, classify the record to the closest class
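The distance-to-center idea can be sketched as a nearest-centroid classifier. Note the simplification: full LDA measures a statistical distance that accounts for the variances and correlations of the predictors, whereas this sketch uses plain Euclidean distance. The owner and non-owner records are hypothetical.

```python
from math import dist

def centroid(points):
    """Center of a class: the mean of each predictor."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

# Hypothetical records: (income in $000s, lot size in 000s of sq ft)
owners = [(85, 22), (95, 24), (110, 20)]
non_owners = [(45, 16), (55, 18), (50, 14)]
centers = {"owner": centroid(owners), "non-owner": centroid(non_owners)}

def classify(record):
    """Assign the record to the class whose center is closest."""
    return min(centers, key=lambda label: dist(record, centers[label]))

print(classify((90, 21)))
print(classify((52, 17)))
```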
![Page 39: Intro to Data Mining/Machine Learning Algorithms for Business Intelligence Dr. Bambang Parmanto](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649d195503460f949eebcb/html5/thumbnails/39.jpg)
Loan Acceptance
In the real world, there will be more records, more predictors, and less clear separation
![Page 40: Intro to Data Mining/Machine Learning Algorithms for Business Intelligence Dr. Bambang Parmanto](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649d195503460f949eebcb/html5/thumbnails/40.jpg)
Association Rules (market basket analysis)
Study of “what goes with what”
◦ “Customers who bought X also bought Y”
◦ What symptoms go with what diagnosis
Transaction-based or event-based
Also called “market basket analysis” and “affinity analysis”
Originated with study of customer transactions databases to determine associations among items purchased
![Page 41: Intro to Data Mining/Machine Learning Algorithms for Business Intelligence Dr. Bambang Parmanto](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649d195503460f949eebcb/html5/thumbnails/41.jpg)
Lore
A famous story about association rule mining is the "beer and diaper" story.
{diaper} => {beer}
An example of how unexpected association rules might be found from everyday data.
In 1992, Thomas Blischok of Teradata analyzed 1.2 million market baskets of 25 Osco Drug stores. The analysis "did discover that between 5:00 and 7:00 p.m. that consumers bought beer and diapers". Osco managers did NOT exploit the beer and diapers relationship by moving the products closer together on the shelves.
![Page 42: Intro to Data Mining/Machine Learning Algorithms for Business Intelligence Dr. Bambang Parmanto](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649d195503460f949eebcb/html5/thumbnails/42.jpg)
Used in many recommender systems
![Page 43: Intro to Data Mining/Machine Learning Algorithms for Business Intelligence Dr. Bambang Parmanto](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649d195503460f949eebcb/html5/thumbnails/43.jpg)
Terms
“IF” part = antecedent (item 1)
“THEN” part = consequent (item 2)
“Item set” = the items (e.g., products) comprising the antecedent or consequent
Antecedent and consequent are disjoint (i.e., have no items in common)
Confidence: of the transactions that contain Item 1, the percentage that also contain Item 2
Support: the percentage of all transactions that contain both Item 1 and Item 2
![Page 44: Intro to Data Mining/Machine Learning Algorithms for Business Intelligence Dr. Bambang Parmanto](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649d195503460f949eebcb/html5/thumbnails/44.jpg)
Plate color purchase
![Page 45: Intro to Data Mining/Machine Learning Algorithms for Business Intelligence Dr. Bambang Parmanto](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649d195503460f949eebcb/html5/thumbnails/45.jpg)
Lift ratio shows how important the rule is
◦ Lift = Support (a U c) / (Support (a) x Support (c)), with each support expressed as a proportion of all transactions
Confidence shows the rate at which consequents will be found (useful in learning costs of promotion)
Support measures overall impact

| Rule # | Conf. % | Antecedent (a) | Consequent (c) | Support(a) | Support(c) | Support(a U c) | Lift Ratio |
|---|---|---|---|---|---|---|---|
| 1 | 100 | Green | Red, White | 2 | 4 | 2 | 2.5 |
| 2 | 100 | Green | Red | 2 | 6 | 2 | 1.666667 |
| 3 | 100 | Green, White | Red | 2 | 6 | 2 | 1.666667 |
| 4 | 100 | Green | White | 2 | 7 | 2 | 1.428571 |
| 5 | 100 | Green, Red | White | 2 | 7 | 2 | 1.428571 |
| 6 | 100 | Orange | White | 2 | 7 | 2 | 1.428571 |
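The three measures can be computed from transaction counts as sketched below. Two assumptions are worth flagging: the supports in the table are read as transaction counts, and the total of 10 transactions is inferred from the table's lift ratios rather than stated on the slide. The `baskets` list is hypothetical and is there only to show how the counts would be obtained from raw transactions.

```python
def count_containing(transactions, items):
    """Number of transactions that contain every item in `items`."""
    return sum(1 for t in transactions if items <= t)

def rule_metrics(n, count_a, count_c, count_both):
    """Return (support, confidence, lift) for the rule a => c from raw counts."""
    support = count_both / n            # share of all transactions with a and c
    confidence = count_both / count_a   # share of a-transactions that also have c
    lift = confidence / (count_c / n)   # confidence relative to c's baseline rate
    return support, confidence, lift

# Rule 1 from the table (Green => Red, White), assuming n = 10 transactions.
support, confidence, lift = rule_metrics(n=10, count_a=2, count_c=4, count_both=2)
print(round(support, 3), round(confidence, 3), round(lift, 3))

# Hypothetical baskets, to show the counting step.
baskets = [
    {"red", "white", "green"},
    {"white", "orange"},
    {"white", "blue"},
    {"red", "white", "orange"},
    {"red", "blue"},
]
a, c = {"red"}, {"white"}
basket_metrics = rule_metrics(
    len(baskets),
    count_containing(baskets, a),
    count_containing(baskets, c),
    count_containing(baskets, a | c),
)
print([round(m, 3) for m in basket_metrics])
```

A lift above 1 means the consequent appears with the antecedent more often than its overall frequency would suggest; the hypothetical red => white rule comes out below 1, so it would not be an interesting rule.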
![Page 46: Intro to Data Mining/Machine Learning Algorithms for Business Intelligence Dr. Bambang Parmanto](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649d195503460f949eebcb/html5/thumbnails/46.jpg)
Application is not always easy
Wal-Mart knows that customers who buy Barbie dolls have a 60% likelihood of buying one of three types of candy bars.
What does Wal-Mart do with information like that? 'I don't have a clue,' says Wal-Mart's chief of merchandising, Lee Scott
![Page 47: Intro to Data Mining/Machine Learning Algorithms for Business Intelligence Dr. Bambang Parmanto](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649d195503460f949eebcb/html5/thumbnails/47.jpg)
Cluster Analysis
• Goal: Form groups (clusters) of similar records
• Used for segmenting markets into groups of similar customers
• Example: Claritas segmented US neighborhoods based on demographics & income: “Furs & station wagons,” “Money & Brains”, …
![Page 48: Intro to Data Mining/Machine Learning Algorithms for Business Intelligence Dr. Bambang Parmanto](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649d195503460f949eebcb/html5/thumbnails/48.jpg)
Example: Public Utilities
Goal: find clusters of similar utilities
Example of 3 rough clusters using 2 variables
Low fuel cost, low sales
High fuel cost, low sales
Low fuel cost, high sales
![Page 49: Intro to Data Mining/Machine Learning Algorithms for Business Intelligence Dr. Bambang Parmanto](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649d195503460f949eebcb/html5/thumbnails/49.jpg)
Hierarchical Cluster
![Page 50: Intro to Data Mining/Machine Learning Algorithms for Business Intelligence Dr. Bambang Parmanto](https://reader036.vdocuments.site/reader036/viewer/2022062421/56649d195503460f949eebcb/html5/thumbnails/50.jpg)
Clustering
Cluster analysis is an exploratory tool. Useful only when it produces meaningful clusters
Hierarchical clustering gives visual representation of different levels of clustering
◦ On the other hand, due to its non-iterative nature, it can be unstable, can vary highly depending on settings, and is computationally expensive
Non-hierarchical clustering is computationally cheap and more stable; requires user to set k
Can use both methods
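As an illustration of the non-hierarchical approach, here is a minimal k-means sketch in plain Python. The slides do not name a specific algorithm, so k-means is an assumption, as are the hypothetical utility records (fuel cost, sales, in comparable units) and the choice of starting centers. The user supplies k implicitly through the number of starting centers.

```python
from math import dist

def kmeans(points, centers, iterations=10):
    """Alternate between assigning points to the nearest center and recomputing centers."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
            clusters[nearest].append(p)
        # New center = mean of the cluster's points; an empty cluster keeps its old center.
        centers = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else center
            for cluster, center in zip(clusters, centers)
        ]
    return centers, clusters

# Hypothetical utilities: (fuel cost, sales)
points = [
    (0.6, 6), (0.7, 7),      # low fuel cost, low sales
    (1.9, 6), (2.0, 7),      # high fuel cost, low sales
    (0.6, 16), (0.7, 17),    # low fuel cost, high sales
]
start = [(0.6, 6), (1.9, 6), (0.6, 16)]   # k = 3 starting centers chosen by the user
centers, clusters = kmeans(points, start)
print([len(c) for c in clusters])
```

With a different k or different starting centers the clusters can change, which is why the result should be checked for whether the groups are meaningful.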