Treatment Learning: Implementation and Application
Treatment Learning: Implementation and Application
Ying Hu
Electrical & Computer Engineering
University of British Columbia
Ying Hu http://www.ece.ubc.ca/~yingh 2
Outline
1. An example
2. Background review
3. TAR2 treatment learner
   • TARZAN: Tim Menzies
   • TAR2: Ying Hu & Tim Menzies
4. TAR3: improved TAR2
   • TAR3: Ying Hu
5. Evaluation of treatment learning
6. Application of treatment learning
7. Conclusion
First Impression
• Boston Housing Dataset (506 examples, 4 ordered classes, low … high)
• C4.5's decision tree vs. a treatment learner's output:
  – 6.7 <= rooms < 9.8 and 12.6 <= parent teacher ratio < 15.9
  – 0.6 <= nitric oxide < 1.9 and 17.16 <= living standard < 39
Review: Background
• What is KDD?
  – KDD = Knowledge Discovery in Databases [fayyad96]
  – Data mining: one step in the KDD process
  – Machine learning: learning algorithms
• Common data mining tasks
  – Classification
    • Decision tree induction (C4.5) [quinlan86]
    • Nearest neighbors [cover67]
    • Neural networks [rosenblatt62]
    • Naive Bayes classifier [duda73]
  – Association rule mining
    • APRIORI algorithm [agrawal93]
    • Variants of APRIORI
Treatment Learning: Definition
• Input: a classified dataset
  – Assume: classes are ordered
• Output: Rx = a conjunction of attribute-value pairs
  – Size of Rx = # of pairs in the Rx
  – confidence(Rx w.r.t. Class) = P(Class|Rx)
• Goal: find Rx that have different levels of confidence across classes
• Evaluate Rx by its lift
• Output is shown in a visualization form
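These definitions can be made concrete with a small sketch. The toy data, attribute names, and the particular lift formulation used here (mean class weight under Rx over the baseline mean, with higher weights for better classes) are illustrative assumptions, not TAR2's exact implementation:

```python
def matches(row, rx):
    """True if the row satisfies every attribute-value pair in the treatment."""
    return all(row.get(attr) == val for attr, val in rx.items())

def confidence(rows, rx, cls):
    """confidence(Rx w.r.t. Class) = P(Class | Rx)."""
    selected = [r for r in rows if matches(r, rx)]
    if not selected:
        return 0.0
    return sum(r["class"] == cls for r in selected) / len(selected)

def lift(rows, rx, weight):
    """Mean class weight of Rx-selected rows over the baseline mean weight."""
    selected = [r for r in rows if matches(r, rx)]
    if not selected:
        return 0.0
    base = sum(weight[r["class"]] for r in rows) / len(rows)
    treated = sum(weight[r["class"]] for r in selected) / len(selected)
    return treated / base

rows = [
    {"rooms": "many", "nox": "low", "class": "high"},
    {"rooms": "many", "nox": "high", "class": "high"},
    {"rooms": "few", "nox": "high", "class": "low"},
    {"rooms": "few", "nox": "low", "class": "low"},
]
rx = {"rooms": "many"}                        # a size-1 treatment
print(confidence(rows, rx, "high"))           # 1.0
print(lift(rows, rx, {"low": 1, "high": 2}))  # 2.0 / 1.5 = 1.333...
```

An Rx with lift well above 1 selects a subset of the data whose class distribution is markedly better than the baseline, which is the "different level of confidence across classes" goal stated above.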
Motivation: Narrow Funnel Effect
• When is enough learning enough?
  – Using < 50% of attributes decreases accuracy by only 3-5% [shavlik91]
  – A 1-level decision tree is comparable to C4 [holte93]
  – Data engineering: ignoring 81% of features results in a 2% increase in accuracy [kohavi97]
  – Scheduling: random sampling outperforms complete (depth-first) search [crawford94]
• Narrow funnel effect
  – Control variables vs. derived variables
  – Treatment learning: finding the funnel variables
TAR2: The Algorithm
• Search + attribute utility estimation
  – Estimation heuristic: confidence1
  – Search: depth-first search
    • Search space: confidence1 > threshold
• Discretization: equal-width interval binning
• Reporting Rx:
  – lift(Rx) > threshold
• Software package and online distribution
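Equal-width interval binning is simple to sketch. This minimal version (illustrative, not TAR2's code) splits a numeric column's range into n equally sized intervals and maps each value to its bin index:

```python
def equal_width_bins(values, n_bins):
    """Equal-width interval binning: split [min, max] into n_bins
    equally sized intervals and return each value's bin index."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # guard against constant columns
    # Clamp so the maximum value falls in the last bin, not one past it.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

values = [1.0, 2.5, 4.0, 5.5, 7.0, 8.5, 10.0]
print(equal_width_bins(values, 3))  # [0, 0, 1, 1, 2, 2, 2]
```

After binning, each (attribute, bin) pair becomes a discrete candidate item that a treatment can include.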
The Pilot Case Study
• Requirement optimization
  – Goal: find an optimal set of mitigations in a cost-effective manner
• [Model diagram: mitigations reduce risks and incur cost; risks relate to requirements; requirements achieve benefit]
• Iterative learning cycle
The Pilot Study (continued)
• Cost-benefit distribution (30/99 mitigations)
• Compared to simulated annealing
Problem of TAR2
• Runtime grows quickly with Rx size:
  – cost to generate all Rx of size r
  – cost to generate all Rx of sizes [1..N]
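The formulas elided above are presumably combinatorial; assuming A candidate attribute-value pairs, a standard upper bound (an overcount, since two pairs from the same attribute cannot co-occur in one conjunction) is:

```latex
\#\{Rx : |Rx| = r\} \le \binom{A}{r},
\qquad
\sum_{r=1}^{N} \binom{A}{r} = O(A^{N})
```

so exhaustive depth-first enumeration becomes infeasible as the maximum treatment size N grows, which motivates TAR3's sampling approach.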
TAR3: The Improvement
• Random sampling
  – Key idea:
    • Treat the confidence1 distribution as a probability distribution
    • Sample Rx items from the confidence1 distribution
  – Steps:
    • Place items (ai) in increasing order of confidence1 value
    • Compute the CDF of each ai
    • Sample a uniform value u in [0..1]
    • The sample is the least ai whose CDF > u
    • Repeat until an Rx of the given size is obtained
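The steps above can be sketched as inverse-CDF sampling. The item names, scores, and normalization below are illustrative assumptions, not TAR3's exact code:

```python
import random

def sample_rx(items, scores, size, rng=random):
    """Build a treatment by sampling items in proportion to their
    confidence1 scores, via inverse-CDF sampling."""
    order = sorted(range(len(items)), key=lambda i: scores[i])
    total = sum(scores)
    rx = set()
    while len(rx) < size:
        u = rng.random()
        acc = 0.0
        # Walk items in increasing-confidence1 order, accumulating the CDF.
        for i in order:
            acc += scores[i] / total
            if acc > u:          # least item whose CDF exceeds u
                rx.add(items[i])
                break
        else:
            rx.add(items[order[-1]])  # guard against float rounding at u ~ 1
    return rx

items = ["rooms=many", "nox=low", "ptratio=mid", "age=old"]
scores = [0.9, 0.7, 0.2, 0.1]   # illustrative confidence1 values
rx = sample_rx(items, scores, 2)
print(rx)  # high-confidence1 items are favored
```

Because high-confidence1 items occupy most of the CDF's mass, they are drawn far more often, so promising treatments are found without enumerating the whole search space.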
Comparison of Efficiency
• [Plot: runtime vs. data size]
• [Plot: runtime (sec) vs. attribute # (10-99); R² = 0.9436]
• [Plot: runtime (sec) vs. treatment (Rx) size (1-8); R² = 0.8836]
• [Plot: runtime vs. TAR2]
Comparison of Results
• Mean and STD in each round
• Final Rx: TAR2 = 19, TAR3 = 20
• 10 UCI domains: identical best Rx
• pilot2 dataset (58 * 30k)
External Evaluation
• FSS framework (10 UCI datasets):
  – all attributes → learning
  – feature subset selector (TAR2) → fewer attributes → learning
  – compare accuracy using C4.5 and Naive Bayes
The Results
• Accuracy using Naive Bayes (avg increase = 0.8%)
• Accuracy using C4.5 (avg decrease = 0.9%)
• Number of attributes
Compare to Other FSS Methods
• # of attributes selected (C4.5)
• # of attributes selected (Naive Bayes)
• In 17/20 cases, fewest attributes selected
• Further evidence for funnels
Applications of Treatment Learning
• Download site: http://www.ece.ubc.ca/~yingh/
• Collaborators: JPL, WV, Portland, Miami
• Application examples:
  – pair programming vs. conventional programming
  – identify software metrics that are superior error indicators
  – identify attributes that make FSMs easy to test
  – find the best software inspection policy for a particular software development organization
• Other applications: 1 journal, 4 conference, 6 workshop papers
Main Contributions
• A new learning approach
• A novel mining algorithm
• Algorithm optimization
• Complete package and online distribution
• Narrow funnel effect
• Treatment learner as FSS
• Applications in various research domains
======================
Some notes follow
Rx Definition Example
• Input example: a classified dataset
• Output example:
  – Rx = conjunction of attribute-value pairs
  – confidence(Rx w.r.t. C) = P(C|Rx)
TAR2 in Practice
• Domains containing narrow funnels show:
  – A tail in the confidence1 distribution
  – A small number of variables with disproportionately large confidence1 values
  – Satisfactory Rx of small size (< 6)
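Reading "confidence1" as the confidence of size-1 treatments (per the earlier definition confidence(Rx w.r.t. Class) = P(Class|Rx)), a funnel shows up as a few attribute-value pairs scoring far above the rest. A minimal sketch, with illustrative data and a hypothetical "best" class label:

```python
from collections import Counter

def confidence1(rows, best="best"):
    """P(best class | attribute-value pair) for every pair in the data."""
    hits, totals = Counter(), Counter()
    for r in rows:
        for attr, val in r.items():
            if attr == "class":
                continue
            totals[(attr, val)] += 1
            hits[(attr, val)] += r["class"] == best
    return {pair: hits[pair] / totals[pair] for pair in totals}

rows = [
    {"a": 1, "b": 0, "class": "best"}, {"a": 1, "b": 1, "class": "best"},
    {"a": 1, "b": 0, "class": "rest"}, {"a": 0, "b": 1, "class": "rest"},
    {"a": 0, "b": 0, "class": "rest"}, {"a": 0, "b": 1, "class": "rest"},
]
scores = confidence1(rows)
# In a funnel domain, the sorted scores have a short high-value tail:
print(sorted(scores.items(), key=lambda kv: -kv[1]))
```

Here a=1 dominates the tail, which is the "small number of variables with disproportionately large confidence1 values" pattern described above.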
Background: Classification
• 2-step procedure:
  – The learning phase
  – The testing phase
• Strategies employed:
  – Eager learning
    • Decision tree induction (e.g., C4.5)
    • Neural networks (e.g., backpropagation)
  – Lazy learning
    • Nearest neighbor classifiers (e.g., k-nearest neighbor)
Background: Association Rules
• Example transactions:

  ID | Transactions
  ---|--------------
  1  | A, B, C, E, F
  2  | B, C, E
  3  | B, C, D, E
  4  | … …

• Possible rule: B => C, E [support = 2%, confidence = 80%]
  – where support(X -> Y) = P(X), confidence(X -> Y) = P(Y|X)
• Representative algorithms:
  – APRIORI: uses the Apriori property of large itemsets
  – Max-Miner: more concise representation of the discovered rules; different pruning strategies
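These measures can be computed directly on the example transactions. The code follows the slide's definitions (support(X -> Y) = P(X); note that many texts instead define support as P(X and Y)); names here are illustrative:

```python
# Transactions from the slide's table (row 4 is elided there).
transactions = [
    {"A", "B", "C", "E", "F"},
    {"B", "C", "E"},
    {"B", "C", "D", "E"},
]

def support(x, data):
    """support(X -> Y) = P(X), per the slide's definition."""
    return sum(x <= t for t in data) / len(data)

def confidence(x, y, data):
    """confidence(X -> Y) = P(Y | X)."""
    nx = sum(x <= t for t in data)
    return sum((x | y) <= t for t in data) / nx if nx else 0.0

x, y = {"B"}, {"C", "E"}
print(support(x, transactions))        # 1.0 - B appears in all 3 transactions
print(confidence(x, y, transactions))  # 1.0 - C,E appear whenever B does
```

On these three rows the rule B => C,E has both support and confidence 1.0; the slide's 2%/80% figures describe a larger, hypothetical dataset.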
Background: Extensions
• CBA classifier
  – CBA = Classification Based on Association
  – Rules X => Y, where Y is a class label
  – More accurate than C4.5 (16/26 datasets)
• JEP classifier
  – JEP = Jumping Emerging Patterns
    • support(X w.r.t. D1) = 0, support(X w.r.t. D2) > 0
    • Model: a collection of JEPs
    • Classify: maximum collective impact
  – More accurate than both C4.5 and CBA (15/25 datasets)
Background: Standard FSS Methods
• Information gain attribute ranking
• Relief
• Principal Component Analysis (PCA)
• Correlation-based feature selection
• Consistency-based subset evaluation
• Wrapper subset evaluation
Comparison
• Relation to classification
  – Class boundary / class density
  – Class weighting
• Relation to association rule mining
  – Multiple classes / no class
  – Confidence-based pruning
• Relation to change-detection algorithms
  – support: |P(X|y=c1) - P(X|y=c2)|
  – confidence: |P(y=c1|X) - P(y=c2|X)|
  – Bayes' rule
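The two change-detection scores above can be computed directly; the toy rows and names below are illustrative, not from any cited system:

```python
# |P(X|c1) - P(X|c2)| and |P(c1|X) - P(c2|X)| on a tiny labeled dataset,
# where X is "the row has feature x".
rows = [
    {"x": True,  "y": "c1"}, {"x": True,  "y": "c1"}, {"x": False, "y": "c1"},
    {"x": True,  "y": "c2"}, {"x": False, "y": "c2"}, {"x": False, "y": "c2"},
]

def p_cond(pred, cond, rows):
    """P(pred | cond), estimated from the rows."""
    sel = [r for r in rows if cond(r)]
    return sum(pred(r) for r in sel) / len(sel) if sel else 0.0

has_x = lambda r: r["x"]
support_diff = abs(p_cond(has_x, lambda r: r["y"] == "c1", rows)
                   - p_cond(has_x, lambda r: r["y"] == "c2", rows))
confidence_diff = abs(p_cond(lambda r: r["y"] == "c1", has_x, rows)
                      - p_cond(lambda r: r["y"] == "c2", has_x, rows))
print(support_diff)     # |2/3 - 1/3| = 0.333...
print(confidence_diff)  # |2/3 - 1/3| = 0.333...
```

Bayes' rule connects the two views: P(y=c|X) is proportional to P(X|y=c)P(y=c), so a large gap in one measure generally implies a gap in the other once class priors are accounted for.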
Confidence Property
• Universal-existential upward closure
  – R1: Age.young -> Salary.low
  – R2: Age.young, Gender.m -> Salary.low
  – R3: Age.young, Gender.f -> Salary.low
• Long rules tend to have high confidence
• Large Rx tend to have high lift values
TAR3: Usability
• More user-friendly
  – Intuitive default settings