A Framework for Scalable Cost-sensitive Learning Based on Combining Probabilities and Benefits
Wei Fan, Haixun Wang, and Philip S. Yu (IBM T.J. Watson)
Salvatore J. Stolfo (Columbia University)
Scalable Issues of Data Mining
ƒ Two aspects: the data and the algorithm.
ƒ Dataset:
too big to fit into memory.
inherently distributed across the network.
incremental data available periodically.
Scalable Issues of Data Mining
ƒ Learning algorithm:
non-linear complexity in the size of the dataset n.
memory-based, due to the random access pattern over records in the dataset.
significantly slower if the dataset is not held entirely in memory.
ƒ State of the art:
many scalable solutions are algorithm-specific.
decision trees: SPRINT, RainForest, and BOAT.
general algorithms, such as meta-learning, are not very scalable and only work for cost-insensitive problems.
ƒ Question: can we find an approach that is general and works for both cost-sensitive and cost-insensitive problems?
Cost-sensitive Problems
ƒ Charity donation:
Solicit people who will donate a large amount.
It costs $0.68 to send a letter.
E(x): expected donation amount.
Only solicit if E(x) > $0.68; otherwise we lose money.
ƒ Credit card fraud detection:
Detect frauds with high transaction amounts.
It costs $90 to challenge a potential fraud.
E(x): expected fraudulent transaction amount.
Only challenge if E(x) > $90; otherwise we lose money.
ƒ Question: how do we estimate E(x) efficiently?
Basic Framework
[Diagram] A large dataset D is partitioned into k disjoint subsets D1, D2, ..., Dk.
Learning algorithms ML1, ML2, ..., MLk are applied to the subsets to generate k models C1, C2, ..., Ck.
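The partition-and-train step of the framework can be sketched as follows. This is a minimal illustration, not the paper's implementation: the paper uses C4.5 as the base learner, so a trivial class-frequency "model" stands in for it here, and the round-robin split is just one way to form disjoint subsets.

```python
# Sketch of the basic framework: split a large dataset into k disjoint
# subsets and train one model per subset. A trivial class-frequency
# "model" stands in for the C4.5 base learner used in the paper.

def partition(dataset, k):
    """Split `dataset` into k disjoint subsets (round-robin)."""
    subsets = [[] for _ in range(k)]
    for i, example in enumerate(dataset):
        subsets[i % k].append(example)
    return subsets

def train_frequency_model(subset):
    """Stand-in base learner: estimate p(label) from label frequencies."""
    counts = {}
    for _, label in subset:
        counts[label] = counts.get(label, 0) + 1
    total = len(subset)
    return {label: c / total for label, c in counts.items()}

# Illustrative toy data: (feature vector, label) pairs.
data = [([0.1], "neg"), ([0.9], "pos"), ([0.4], "neg"), ([0.8], "pos")]
models = [train_frequency_model(s) for s in partition(data, 2)]
```

Because the subsets are disjoint, the k models can be trained independently, which is what makes the framework parallelizable.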
Basic Framework
[Diagram] The test set is sent to the k models C1, C2, ..., Ck.
Each model computes a prediction, giving P1, P2, ..., Pk.
The k predictions are combined into one prediction P.
Cost-sensitive Decision Making
ƒ Assume that b[i, j] records the benefit received by predicting an example of class i to be an instance of class j.
ƒ The expected benefit received by predicting an example x to be an instance of class j (regardless of its true label) is E[b(j) | x] = Σ_i p(i | x) · b[i, j].
ƒ The optimal decision-making policy chooses the label that maximizes the expected benefit, i.e., j* = argmax_j E[b(j) | x].
ƒ When b[i, i] = 1 and b[i, j] = 0 for i ≠ j, this is a traditional accuracy-based problem.
ƒ Total benefits: the sum of the benefits received over all examples.
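The decision rule above can be sketched directly in code. This is a minimal illustration with a made-up probability estimate; the benefit matrix shown is the 0-1 case, which reduces to ordinary most-probable-label prediction.

```python
# Minimal sketch of the expected-benefit decision rule.
# b[i][j] is the benefit of predicting class j for an example whose true
# class is i; p[i] is the model's estimate of P(true class = i | x).
# Both the benefit matrix and the probabilities here are illustrative.

def expected_benefit(p, b, j):
    """Expected benefit of predicting class j: sum_i p(i|x) * b[i][j]."""
    return sum(p[i] * b[i][j] for i in p)

def optimal_label(p, b, labels):
    """Choose the label that maximizes the expected benefit."""
    return max(labels, key=lambda j: expected_benefit(p, b, j))

# A 0-1 benefit matrix (b[i][i] = 1, else 0) reduces to accuracy-based
# prediction: pick the most probable label.
b01 = {"pos": {"pos": 1, "neg": 0}, "neg": {"pos": 0, "neg": 1}}
p = {"pos": 0.3, "neg": 0.7}
print(optimal_label(p, b01, ["pos", "neg"]))  # → neg
```

With a non-uniform benefit matrix, the same rule can pick a label that is not the most probable one, which is the whole point of cost-sensitive decision making.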
Charity Donation Example
ƒ It costs $0.68 to send a solicitation.
ƒ Assume that y(x) is the best estimate of the donation amount and p(donate | x) is the estimated probability that x donates.
ƒ The cost-sensitive decision-making policy will solicit an individual x if and only if p(donate | x) · y(x) > 0.68.
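The donation rule is a one-line threshold test; the sketch below makes it concrete. The probability and donation estimate are made-up inputs for illustration.

```python
# Sketch of the donation decision rule: solicit only when the expected
# donation p(donate|x) * y(x) exceeds the $0.68 mailing cost.
# The inputs below are illustrative, not from the KDD'98 data.

MAILING_COST = 0.68

def should_solicit(p_donate, donation_estimate):
    """Solicit iff the expected donation exceeds the cost of the letter."""
    return p_donate * donation_estimate > MAILING_COST

print(should_solicit(0.05, 20.0))  # expected donation $1.00 > $0.68
```

Note that even a very unlikely donor is worth soliciting if the estimated amount is large enough, which is why the policy multiplies probability by amount rather than thresholding either alone.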
Credit Card Fraud Detection Example
ƒ It costs $90 to challenge a potential fraud.
ƒ Assume that y(x) is the transaction amount and p(fraud | x) is the estimated probability that x is fraudulent.
ƒ The cost-sensitive decision-making policy will predict a transaction x to be fraudulent if and only if p(fraud | x) · y(x) > 90.
Adult Dataset
ƒ Downloaded from the UCI database.
ƒ Associate a benefit factor of 2 with positives and a benefit factor of 1 with negatives.
ƒ The decision is to predict positive if and only if 2 · p(+ | x) > 1 · p(− | x), i.e., p(+ | x) > 1/3.
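The Adult-dataset rule can be checked with a couple of lines. The benefit factors of 2 and 1 are from the slide; the probability values are illustrative.

```python
# Sketch of the Adult-dataset decision rule with benefit factor 2 on
# positives and 1 on negatives: predict positive iff
# 2 * p(+|x) > 1 * p(-|x), which is equivalent to p(+|x) > 1/3.

def predict_positive(p_pos, pos_benefit=2.0, neg_benefit=1.0):
    return pos_benefit * p_pos > neg_benefit * (1.0 - p_pos)

print(predict_positive(0.4))  # 0.8 > 0.6, so positive is predicted
```

The benefit factors shift the decision threshold from 1/2 down to 1/3, so some examples that a plain accuracy-based classifier would call negative are predicted positive.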
Calculating probabilities
ƒ For decision trees: if n is the number of examples in a node and k is the number of examples with class label l, then the probability is p(l | x) = k / n.
More sophisticated methods: smoothing, early stopping, and early stopping plus smoothing.
ƒ For rules, the probability is calculated in the same way as for decision trees.
ƒ For naive Bayes: if s(l) is the score for class label l, then p(l | x) = s(l) / Σ_l' s(l').
Binning can also be used.
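The estimates above can be sketched as small helper functions. Laplace smoothing is shown as one common smoothing choice; the slide does not specify which smoothing constant the paper uses, so treat that detail as an assumption.

```python
# Sketches of the probability estimates described above. For a decision
# tree leaf with n examples, k of which carry label l, the raw estimate is
# k/n; Laplace smoothing (one common choice, assumed here) adds
# pseudo-counts. For naive Bayes, raw per-class scores are normalized to
# sum to one.

def leaf_probability(k, n):
    """Raw leaf estimate: fraction of the node's examples with the label."""
    return k / n

def smoothed_probability(k, n, num_classes):
    """Laplace smoothing: (k + 1) / (n + number of classes)."""
    return (k + 1) / (n + num_classes)

def normalize_scores(scores):
    """Turn naive-Bayes scores into probabilities: s(l) / sum of scores."""
    total = sum(scores.values())
    return {label: s / total for label, s in scores.items()}

print(leaf_probability(3, 4))                       # 0.75
print(smoothed_probability(3, 4, 2))                # 4/6 ≈ 0.667
print(normalize_scores({"pos": 2.0, "neg": 6.0}))   # pos 0.25, neg 0.75
```

Smoothing matters here because cost-sensitive decisions multiply probabilities by benefits, so overconfident 0 or 1 estimates from small leaves distort the expected benefit.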
Combining Technique - Averaging
ƒ Each model C_j computes an expected benefit e_j(x, i) for example x over every class label i.
ƒ The individual expected benefits are combined by averaging: E(x, i) = (1/k) Σ_j e_j(x, i).
ƒ We choose the label with the highest combined expected benefit, i.e., argmax_i E(x, i).
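The averaging combiner is straightforward to sketch. The per-model numbers below are illustrative, loosely in the spirit of the credit-card example.

```python
# Sketch of the averaging combiner: each of the k base models reports an
# expected benefit per label; the ensemble averages them and picks the
# label with the highest combined value. The numbers are illustrative.

def combine_by_averaging(per_model_benefits, labels):
    """per_model_benefits: list of {label: expected benefit} dicts,
    one dict per base model."""
    k = len(per_model_benefits)
    combined = {
        label: sum(m[label] for m in per_model_benefits) / k
        for label in labels
    }
    best = max(labels, key=lambda l: combined[l])
    return best, combined

benefits = [{"fraud": 120.0, "ok": 0.0},
            {"fraud": 60.0, "ok": 0.0},
            {"fraud": 90.0, "ok": 0.0}]
label, combined = combine_by_averaging(benefits, ["fraud", "ok"])
print(label, combined["fraud"])  # fraud 90.0
```

Averaging only exchanges one expected-benefit value per label per model, which is why its communication overhead is low in the distributed setting discussed later.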
Why accuracy is higher?
1. The decision threshold line.
2. Examples on its left are more profitable than those on its right.
3. "Evening effect": averaging biases decisions towards the big fish.
More sophisticated combining approaches
ƒ Regression: treat the base classifiers' outputs as the independent variables of a regression and the true label as the dependent variable.
ƒ Modified meta-learning: learn a classifier that maps the base classifiers' class-label predictions to the true class label. For cost-sensitive learning, the top-level classifier outputs a probability instead of just a label.
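The meta-learning variant can be roughly sketched as follows. This is a deliberate simplification: the "meta-model" here is just a lookup of the most frequent true label seen for each combination of base predictions, whereas the paper trains a real top-level learner (and, for cost-sensitive problems, one that outputs a probability).

```python
# Rough sketch of the meta-learning combiner: a top-level model is trained
# to map the base classifiers' predictions to the true label. Here the
# "meta-model" is a simple most-frequent-label lookup per prediction
# combination; the paper uses an actual learner at the top level.

from collections import Counter, defaultdict

def train_meta(base_predictions, true_labels):
    """base_predictions: one tuple of base-classifier labels per example."""
    table = defaultdict(Counter)
    for preds, truth in zip(base_predictions, true_labels):
        table[tuple(preds)][truth] += 1
    # For each prediction combination, remember the most frequent truth.
    return {key: counts.most_common(1)[0][0] for key, counts in table.items()}

base = [("pos", "neg"), ("pos", "pos"), ("pos", "neg"), ("neg", "neg")]
truth = ["neg", "pos", "neg", "neg"]
meta = train_meta(base, truth)
print(meta[("pos", "neg")])  # → neg
```

Unlike averaging, this scheme needs a second training pass over base-model outputs, which is part of the extra overhead noted in the summary.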
Experiments
ƒ Learner: C4.5 version 8.
ƒ Datasets: Donation (KDD'98), Credit Card, and Adult.
ƒ Number of partitions: 8, 16, 32, 64, 128, and 256.
Accuracy comparison
Accuracy comparison
Accuracy comparison
Detailed Spread
Credit Card Fraud Dataset
Adult Dataset
Why accuracy is higher?
Scalability Analysis of Averaging Method
ƒ Baseline: a single model computed from the entire dataset as a whole.
ƒ Our approach: an ensemble of multiple models, each computed from one of the disjoint subsets.
Scalability Analysis
ƒ Serial improvement
ƒ Parallel improvement
ƒ Speedup
ƒ Scaled speedup
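The slide lists the metric names without their definitions. One common way to formalize them, assuming T_1 is the time to train a single model on the whole dataset, T_k the time to train all k partition models one after another on one machine, and T_k^p the time when the k models are trained in parallel on k machines, is the following sketch (the paper's exact definitions may differ):

```latex
% Assumed formalization; the paper's exact definitions may differ.
\text{Serial improvement} = \frac{T_1}{T_k}, \qquad
\text{Parallel improvement} = \frac{T_1}{T_k^{p}}, \qquad
\text{Speedup} = \frac{T_k}{T_k^{p}}
```

Scaled speedup is typically measured by growing the dataset in proportion to the number of machines, so that the data per site stays fixed.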
Scalability Results - Serial Improvement
Scalability Results - Parallel Improvement
Scalability Results - Speedup
Fully distributed learning framework
[Diagram] The data resides at k sites as D1, D2, ..., Dk.
Learning algorithms ML1, ML2, ..., MLk run locally at each site to generate k models C1, C2, ..., Ck.
Communication overhead
Overhead analysis
Summary and Future Work
ƒ Evaluated a wide range of combining techniques, including variations of averaging, regression, and meta-learning, for scalable cost-sensitive (and cost-insensitive) learning.
ƒ Averaging, although simple, has the highest accuracy.
ƒ Previously proposed approaches have significantly more overhead and only work well for traditional accuracy-based problems.
ƒ Future work: ensemble pruning and performance estimation.
Accuracy-based Problems (0-1 loss)
ƒ Suppose that p(l | x) is the probability that x is an instance of class label l.
ƒ An inductive model will always predict the label with the highest probability, i.e., l* = argmax_l p(l | x).
ƒ The accuracy of a model on a dataset is the fraction of examples whose predicted label matches the true label.
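The 0-1 loss setting above can be sketched in a few lines. The probability estimates and labels are illustrative.

```python
# Sketch of accuracy-based (0-1 loss) prediction: predict the label with
# the highest estimated probability, and score a model by the fraction of
# test examples it labels correctly. Inputs are illustrative.

def predict(p):
    """p: {label: P(label | x)}; return the most probable label."""
    return max(p, key=p.get)

def accuracy(prob_estimates, true_labels):
    """Fraction of examples whose predicted label matches the truth."""
    correct = sum(predict(p) == t for p, t in zip(prob_estimates, true_labels))
    return correct / len(true_labels)

probs = [{"pos": 0.8, "neg": 0.2}, {"pos": 0.4, "neg": 0.6}]
print(accuracy(probs, ["pos", "pos"]))  # 0.5
```

This is exactly the special case of the expected-benefit framework with b[i, i] = 1 and b[i, j] = 0 for i ≠ j.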