Cooley KXEN April 04
TRANSCRIPT
8/3/2019
-
Automation through Structured Risk Minimization
Robert Cooley, Ph.D.
VP Technical Operations, Knowledge Extraction Engines (KXEN), Inc.
-
Personal Motivation & Background
"When the solution is simple, God is answering." - Albert Einstein
1991 - Navy Nuclear Reactor Design
  Emphasis on simple but highly functional design
"The key is to automate data mining as much as possible, but not more so." - Gregory Piatetsky-Shapiro
1996 - Data mining research
  Emphasis on automation of knowledge discovery
-
Personal Motivation & Background
"Solving a problem of interest, do not solve a more general problem as an intermediate step." - Vladimir Vapnik
1997 - Structured Risk Minimization (SRM)
  Emphasis on simplicity to manage generalization
1998 - Support Vector Machines (SVM) for Text Classification
  Hoping for better generalization, ended up with increased automation
"When all you have is a hammer, every problem looks like a nail." - Abraham Maslow
2001 - Joined KXEN
  Automation through SRM
-
Learning / Generalization
[Diagram: three fits to the same training data]
  Over-fit model / low robustness (no training error, high test error)
  Under-fit model / high robustness (training error = test error)
  Robust model (low training error, low test error)
-
What is Robustness?
Statistical: ability to generalize from a set of training examples
  Are the available examples sufficient?
Training: ability to handle a wide range of situations
  Any number of variables, cardinality, missing values, etc.
Deployment: ability to resist degradation under changing conditions
  Values not seen in training
Engineering: ability to avoid catastrophic failure
  Return an answer without crashing under challenging conditions
-
From W. of Ockham to Vapnik
If two models explain the data equally well, then the simpler model is to be preferred.
1st discovery: The VC (Vapnik-Chervonenkis) dimension measures the complexity of a set of mappings.
-
From W. of Ockham to Vapnik
If two models explain the data equally well, then the model with the smallest VC dimension has better generalization performance.
1st discovery: The VC (Vapnik-Chervonenkis) dimension measures the complexity of a set of mappings.
2nd discovery: The VC dimension can be linked to generalization results (results on new data).
3rd discovery: Don't just observe differences between models; control them.
-
From W. of Ockham to Vapnik
To build the best model, try to optimize the performance on the training data set AND minimize the VC dimension.
1st discovery: The VC (Vapnik-Chervonenkis) dimension measures the complexity of a set of mappings.
2nd discovery: The VC dimension can be linked to generalization results (results on new data).
3rd discovery: Don't just observe differences between models; control them.
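The second discovery can be made concrete with Vapnik's standard generalization bound, which the slides paraphrase but do not write out (this is the textbook form, not a formula from the deck): with probability at least 1 - η, for a model class of VC dimension h trained on l examples,

```latex
R(\alpha) \;\le\; R_{\mathrm{emp}}(\alpha) \;+\;
\sqrt{\frac{h\left(\ln\frac{2l}{h} + 1\right) - \ln\frac{\eta}{4}}{l}}
```

The empirical risk R_emp falls as complexity h grows, while the square-root confidence term grows; minimizing the right-hand side over nested model classes of increasing h is the SRM strategy.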
-
Structured Risk Minimization (SRM)
Quality: How well does a model describe your existing data?
  Achieved by minimizing Error.
Reliability: How well will a model predict future data?
  Achieved by minimizing Confidence Interval.
[Chart: Risk vs. Model Complexity. Error falls and Confidence Interval grows with complexity; the best model sits at the minimum of Total Risk.]
-
SRM is a Principle, NOT an Algorithm
Robust Regression | Smart Segmenter | Consistent Coder | Support Vector Machine
-
Features of SRM
Adding Variables is Low Cost, Potentially High Benefit
  Does not cause over-fitting
  More variables can only add to the model quality
  Random variables do not harm quality or reliability
  Highly correlated variables do not harm the modeling process
  Efficient scaling with additional variables
Free of Distribution Assumptions
  Normal distributions aren't necessary
  Skewed distributions do not harm quality
  Resistant to outliers
  Independence of inputs isn't necessary
Indicates Generalization Capacity of any Model
  Insufficient training examples to get a robust model
  Choose between several models with similar training errors
-
SRM enables Automation without sacrificing Quality & Reliability
Process: Create Analytic Dataset -> Prepare Data -> Train Model -> Validate & Understand -> Select Variables
Where SRM applies:
  Analytic Dataset Creation
  (Lack of) Variable Selection
  Data Preparation
  Model Selection
  Model Testing
-
Analytic Data Set Creation
Adding Variables is Low Cost, Potentially High Benefit
  "Kitchen Sink" philosophy for variable creation
  Don't shoe-horn several pieces of information into a single variable (e.g. RFM)
  Break events out into several different time periods
  A single analytic data set with hundreds or thousands of columns can serve as input for hundreds of different models
-
Event Aggregation Example
[Figure: purchase events for products A, B, and C by two customers, Jan-Jun, at prices from $10 to $50, aggregated three ways:
  RFM Aggregation - one recency/frequency/monetary summary row per customer
  Simple Aggregation - total purchase amount per month (e.g. $35, $45, $25 spread across Month1-Month6 columns)
  Relative Aggregation - 0/1 indicator columns for transitions between products (e.g. B->A, C->B, A->C)]
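The two richer schemes from the figure can be sketched in a few lines. The event list, customer names, and amounts below are made up for illustration; the point is the shape of the output columns, not KXEN's actual implementation.

```python
from collections import defaultdict

# Hypothetical purchase events: (customer, month_index, product, amount).
events = [
    ("cust1", 1, "A", 35.0),
    ("cust1", 4, "C", 45.0),
    ("cust1", 5, "B", 25.0),
    ("cust2", 2, "B", 40.0),
    ("cust2", 3, "A", 30.0),
]

def simple_aggregation(events, n_months=6):
    """One column per month: total purchase amount in that month."""
    rows = defaultdict(lambda: [0.0] * n_months)
    for cust, month, _product, amount in events:
        rows[cust][month - 1] += amount
    return dict(rows)

def relative_aggregation(events):
    """Indicator columns for product-to-product transitions, in time order."""
    by_cust = defaultdict(list)
    for cust, _month, product, _amount in sorted(events, key=lambda e: e[1]):
        by_cust[cust].append(product)
    return {cust: {f"{a}->{b}": 1 for a, b in zip(seq, seq[1:])}
            for cust, seq in by_cust.items()}

print(simple_aggregation(events)["cust1"])    # [35.0, 0.0, 0.0, 45.0, 25.0, 0.0]
print(relative_aggregation(events)["cust1"])  # {'A->C': 1, 'C->B': 1}
```

Each customer becomes one wide row, so hundreds of such columns can feed many different models from a single analytic data set, as the previous slide argues.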
-
Data Preparation
Free of Distribution Assumptions
  Different encodings of the same information generally lead to the same predictive quality
  No need to check for co-linearities
  No need to check for random variables
  SRM can be used within the data preparation process to automate binning of values
-
Binning
Nominal - group values
  TV, Chair, VCR, Lamp, Sofa -> [TV, VCR], [Chair, Sofa], [Lamp]
Ordinal - group values, preserving order
  1, 2, 3, 4, 5 -> [1, 2, 3], [4, 5]
Continuous - group values into ranges
  1.5, 2.6, 5.2, 12.0, 13.1 -> [1.5, 5.2], [12.0, 13.1]
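The three binning types above reduce to simple lookups once the groups are chosen. A minimal sketch, with illustrative group labels and cut points (not output of KXEN's algorithm):

```python
# Nominal: an explicit value-to-group mapping.
nominal_bins = {"TV": "electronics", "VCR": "electronics",
                "Chair": "furniture", "Sofa": "furniture",
                "Lamp": "lighting"}

def bin_ordinal(value, cuts=(3,)):
    # Groups {1, 2, 3} -> bin 0 and {4, 5} -> bin 1, preserving order.
    return sum(value > c for c in cuts)

def bin_continuous(value, edges=(8.0,)):
    # Ranges [1.5, 5.2] -> bin 0 and [12.0, 13.1] -> bin 1
    # (8.0 is an arbitrary cut between the two clusters).
    return sum(value > e for e in edges)

print(nominal_bins["VCR"], bin_ordinal(4), bin_continuous(2.6))
# electronics 1 0
```

The counting trick (`sum(value > c ...)`) returns the index of the bin a value falls into, so the same function handles any number of ordered cut points.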
-
Why Bin Variables?
Performance
  Captures non-linear behavior of continuous variables
  Minimizes impact of outliers
  Removes noise from large numbers of distinct values
Explainability
  Grouped values are easier to display and understand
Speed
  Predictive algorithms get faster as the number of distinct values decreases
-
Binning Strategies
None - leave each distinct value as a separate input
Standardized - group values according to preset ranges
  Age: [10 - 19], [20 - 29], [30 - 39], ...
  Zip Code: [94100 - 94199], [94200 - 94299], ...
Target Based - use advanced techniques to find the right bins for a given question. DEPENDS ON THE QUESTION!
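The slides do not say which "advanced techniques" KXEN uses for target-based binning, but the idea can be sketched with one simple stand-in: pick the cut point on a continuous input that minimizes the summed squared deviation of the target within each resulting bin. Ages and customer values below are invented to echo the chart slides.

```python
def sse(ys):
    """Sum of squared deviations from the mean (0 for an empty bin)."""
    if not ys:
        return 0.0
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)

def best_cut(xs, ys):
    """Cut point on x that best separates the target y into two bins."""
    pairs = sorted(zip(xs, ys))
    best = (float("inf"), None)
    for i in range(1, len(pairs)):
        cost = sse([y for _, y in pairs[:i]]) + sse([y for _, y in pairs[i:]])
        best = min(best, (cost, pairs[i][0]))  # cut = first value of right bin
    return best[1]

ages  = [18, 22, 27, 29, 33, 39, 40, 43, 45]
value = [ 2,  2,  3,  7,  8,  7,  4,  4,  3]
print(best_cut(ages, value))  # -> 29: the cut tracks where value shifts
```

Because the cut is chosen against the target, a different question (a different y column) over the same ages would yield different bins, which is exactly the "DEPENDS ON THE QUESTION" point. Applying the same search recursively inside each bin gives multiple bins.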
-
No Binning
[Chart: Customer Value by Age, one value per distinct age from 18 through 45]
-
Standardized Binning
[Chart: Customer Value by Age, preset bins 18 - 25, 26 - 35, 36 - 45]
-
Target Based Binning
[Chart: Customer Value by Age, target-driven bins 18 - 28, 29 - 39, 40 - 45]
-
Predictive Benefit of Target Based Binning
[Charts: a sine-wave training dataset (Y from -2 to 2, X from 0 to 1000) and the KXEN approximation with 20 bins, which recovers the non-linear curve as a piecewise-constant fit]
-
Model Testing
Indicates Generalization Capacity of any Model
  It is not always possible to get a robust model from a limited set of data
  Determine if a given number of training examples is sufficient
  Determine amount of model degradation over time
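One common way to check whether a given number of training examples is sufficient is a learning curve: fit on growing subsets and watch whether held-out error is still improving. The model below (a least-squares slope through the origin) and the synthetic data are illustrative only, not KXEN's test.

```python
import random

random.seed(0)
xs = [random.random() for _ in range(500)]
data = [(x, 2.0 * x + random.gauss(0.0, 1.0)) for x in xs]  # noisy y = 2x
train, test = data[:400], data[400:]

def fit_slope(points):
    # Least-squares slope through the origin: beta = sum(x*y) / sum(x*x).
    sxy = sum(x * y for x, y in points)
    sxx = sum(x * x for x, _ in points)
    return sxy / sxx

def holdout_error(n):
    """Mean squared error on the test set after training on n examples."""
    beta = fit_slope(train[:n])
    return sum((y - beta * x) ** 2 for x, y in test) / len(test)

for n in (25, 100, 400):
    print(n, round(holdout_error(n), 3))
```

If the error curve has flattened, extra examples add little; if it is still falling, the available examples are not yet sufficient. Re-running `holdout_error` on fresh data collected later gives a matching check for degradation over time.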
-
Traditional Modeling Methodology
CRISP-DM: Business Understanding -> Data Understanding -> Data Preparation -> Modeling -> Evaluation -> Deployment
SEMMA: Sample -> Explore -> Modify -> Model -> Assess
-
SRM Modeling Methodology
1. Business Understanding
2. Create Analytic Dataset
3. Modeling
4. Data Understanding
5. Deployment
6. Maintenance
-
Conclusions
SRM is a general principle that can be applied to all phases of data mining.
The main benefit of applying SRM is increased automation, not increased predictive quality.
In order to take full advantage of the benefits, the overall modeling methodology must be modified.
-
Thank You
Questions?
Contact Information: [email protected]
651-338-3880
Additional KXEN Information:
www.kxen.com
5/25 NYC Discovery Workshop, 9am to 1pm