Treatment Learning: Implementation and Application
Treatment Learning: Implementation and Application
Ying Hu
Electrical & Computer Engineering
University of British Columbia
Ying Hu http://www.ece.ubc.ca/~yingh 2
Outline
1. An example
2. Background review
3. TAR2 treatment learner
   • TARZAN: Tim Menzies
   • TAR2: Ying Hu & Tim Menzies
4. TAR3: improved TAR2
   • TAR3: Ying Hu
5. Evaluation of treatment learning
6. Application of treatment learning
7. Conclusion
First Impression
• Boston Housing Dataset (506 examples, 4 ordered classes, low … high)
• C4.5's decision tree vs. a treatment learner's output:
  – 6.7 <= rooms < 9.8 and 12.6 <= parent teacher ratio < 15.9
  – 0.6 <= nitric oxide < 1.9 and 17.16 <= living standard < 39
Review: Background
• What is KDD?
  – KDD = Knowledge Discovery in Databases [fayyad96]
  – Data mining: one step in the KDD process
  – Machine learning: learning algorithms
• Common data mining tasks
  – Classification
    • Decision tree induction (C4.5) [quinlan86]
    • Nearest neighbors [cover67]
    • Neural networks [rosenblatt62]
    • Naive Bayes classifier [duda73]
  – Association rule mining
    • APRIORI algorithm [agrawal93]
    • Variants of APRIORI
Treatment Learning: Definition
• Input: a classified dataset
  – Assume: classes are ordered
• Output: Rx = a conjunction of attribute-value pairs
  – Size of Rx = # of pairs in the Rx
  – confidence(Rx w.r.t. Class) = P(Class|Rx)
• Goal: find Rx that have different levels of confidence across classes
• Evaluate Rx by its lift
• Output is shown in a visualization form
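These definitions can be made concrete with a small sketch. The toy data, attribute names, and the particular lift formulation used here (mean class weight under Rx over the baseline mean, with higher weights for better classes) are illustrative assumptions, not TAR2's exact implementation:

```python
def matches(row, rx):
    """True if the row satisfies every attribute-value pair in the treatment."""
    return all(row.get(attr) == val for attr, val in rx.items())

def confidence(rows, rx, cls):
    """confidence(Rx w.r.t. Class) = P(Class | Rx)."""
    selected = [r for r in rows if matches(r, rx)]
    if not selected:
        return 0.0
    return sum(r["class"] == cls for r in selected) / len(selected)

def lift(rows, rx, weight):
    """Mean class weight of Rx-selected rows over the baseline mean weight."""
    selected = [r for r in rows if matches(r, rx)]
    if not selected:
        return 0.0
    base = sum(weight[r["class"]] for r in rows) / len(rows)
    treated = sum(weight[r["class"]] for r in selected) / len(selected)
    return treated / base

rows = [
    {"rooms": "many", "nox": "low", "class": "high"},
    {"rooms": "many", "nox": "high", "class": "high"},
    {"rooms": "few", "nox": "high", "class": "low"},
    {"rooms": "few", "nox": "low", "class": "low"},
]
rx = {"rooms": "many"}                        # a size-1 treatment
print(confidence(rows, rx, "high"))           # 1.0
print(lift(rows, rx, {"low": 1, "high": 2}))  # 2.0 / 1.5 = 1.333...
```

An Rx with lift well above 1 selects a subset of the data whose class distribution is markedly better than the baseline, which is the "different level of confidence across classes" goal stated above.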
Motivation: Narrow Funnel Effect
• When is enough learning enough?
  – Using < 50% of attributes decreases accuracy by only 3-5% [shavlik91]
  – A 1-level decision tree is comparable to C4 [holte93]
  – Data engineering: ignoring 81% of features results in a 2% increase in accuracy [kohavi97]
  – Scheduling: random sampling outperforms complete (depth-first) search [crawford94]
• Narrow funnel effect
  – Control variables vs. derived variables
  – Treatment learning: finding the funnel variables
TAR2: The Algorithm
• Search + attribute utility estimation
  – Estimation heuristic: confidence1
  – Search: depth-first search
    • Search space: confidence1 > threshold
• Discretization: equal-width interval binning
• Reporting Rx:
  – lift(Rx) > threshold
• Software package and online distribution
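Equal-width interval binning is simple to sketch. This minimal version (illustrative, not TAR2's code) splits a numeric column's range into n equally sized intervals and maps each value to its bin index:

```python
def equal_width_bins(values, n_bins):
    """Equal-width interval binning: split [min, max] into n_bins
    equally sized intervals and return each value's bin index."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # guard against constant columns
    # Clamp so the maximum value falls in the last bin, not one past it.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

values = [1.0, 2.5, 4.0, 5.5, 7.0, 8.5, 10.0]
print(equal_width_bins(values, 3))  # [0, 0, 1, 1, 2, 2, 2]
```

After binning, each (attribute, bin) pair becomes a discrete candidate item that a treatment can include.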
The Pilot Case Study
• Requirement optimization
  – Goal: find an optimal set of mitigations in a cost-effective manner
• [Model diagram: mitigations reduce risks and incur cost; risks relate to requirements; requirements achieve benefit]
• Iterative learning cycle
The Pilot Study (continued)
• Cost-benefit distribution (30/99 mitigations)
• Compared to simulated annealing
Problem of TAR2
• Runtime grows quickly with Rx size:
  – cost to generate all Rx of size r
  – cost to generate all Rx of sizes [1..N]
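The formulas elided above are presumably combinatorial; assuming A candidate attribute-value pairs, a standard upper bound (an overcount, since two pairs from the same attribute cannot co-occur in one conjunction) is:

```latex
\#\{Rx : |Rx| = r\} \le \binom{A}{r},
\qquad
\sum_{r=1}^{N} \binom{A}{r} = O(A^{N})
```

so exhaustive depth-first enumeration becomes infeasible as the maximum treatment size N grows, which motivates TAR3's sampling approach.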
TAR3: The Improvement
• Random sampling
  – Key idea:
    • Treat the confidence1 distribution as a probability distribution
    • Sample Rx items from the confidence1 distribution
  – Steps:
    • Place items (ai) in increasing order of confidence1 value
    • Compute the CDF of each ai
    • Sample a uniform value u in [0..1]
    • The sample is the least ai whose CDF > u
    • Repeat until an Rx of the given size is obtained
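The steps above can be sketched as inverse-CDF sampling. The item names, scores, and normalization below are illustrative assumptions, not TAR3's exact code:

```python
import random

def sample_rx(items, scores, size, rng=random):
    """Build a treatment by sampling items in proportion to their
    confidence1 scores, via inverse-CDF sampling."""
    order = sorted(range(len(items)), key=lambda i: scores[i])
    total = sum(scores)
    rx = set()
    while len(rx) < size:
        u = rng.random()
        acc = 0.0
        # Walk items in increasing-confidence1 order, accumulating the CDF.
        for i in order:
            acc += scores[i] / total
            if acc > u:          # least item whose CDF exceeds u
                rx.add(items[i])
                break
        else:
            rx.add(items[order[-1]])  # guard against float rounding at u ~ 1
    return rx

items = ["rooms=many", "nox=low", "ptratio=mid", "age=old"]
scores = [0.9, 0.7, 0.2, 0.1]   # illustrative confidence1 values
rx = sample_rx(items, scores, 2)
print(rx)  # high-confidence1 items are favored
```

Because high-confidence1 items occupy most of the CDF's mass, they are drawn far more often, so promising treatments are found without enumerating the whole search space.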
Comparison of Efficiency
• [Plot: runtime vs. data size]
• [Plot: runtime (sec) vs. attribute # (10-99); R² = 0.9436]
• [Plot: runtime (sec) vs. treatment (Rx) size (1-8); R² = 0.8836]
• [Plot: runtime vs. TAR2]
Comparison of Results
• Mean and STD in each round
• Final Rx: TAR2 = 19, TAR3 = 20
• 10 UCI domains: identical best Rx
• pilot2 dataset (58 * 30k)
External Evaluation
• FSS framework (10 UCI datasets):
  – all attributes → learning
  – feature subset selector (TAR2) → fewer attributes → learning
  – compare accuracy using C4.5 and Naive Bayes
The Results
• Accuracy using Naive Bayes (avg increase = 0.8%)
• Accuracy using C4.5 (avg decrease = 0.9%)
• Number of attributes
Compare to Other FSS Methods
• # of attributes selected (C4.5)
• # of attributes selected (Naive Bayes)
• In 17/20 cases, fewest attributes selected
• Further evidence for funnels
Applications of Treatment Learning
• Download site: http://www.ece.ubc.ca/~yingh/
• Collaborators: JPL, WV, Portland, Miami
• Application examples:
  – pair programming vs. conventional programming
  – identify software metrics that are superior error indicators
  – identify attributes that make FSMs easy to test
  – find the best software inspection policy for a particular software development organization
• Other applications: 1 journal, 4 conference, 6 workshop papers
Main Contributions
• A new learning approach
• A novel mining algorithm
• Algorithm optimization
• Complete package and online distribution
• Narrow funnel effect
• Treatment learner as FSS
• Applications in various research domains
======================
Some notes follow
Rx Definition Example
• Input example: a classified dataset
• Output example:
  – Rx = conjunction of attribute-value pairs
  – confidence(Rx w.r.t. C) = P(C|Rx)
TAR2 in Practice
• Domains containing narrow funnels show:
  – A tail in the confidence1 distribution
  – A small number of variables with disproportionately large confidence1 values
  – Satisfactory Rx of small size (< 6)
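Reading "confidence1" as the confidence of size-1 treatments (per the earlier definition confidence(Rx w.r.t. Class) = P(Class|Rx)), a funnel shows up as a few attribute-value pairs scoring far above the rest. A minimal sketch, with illustrative data and a hypothetical "best" class label:

```python
from collections import Counter

def confidence1(rows, best="best"):
    """P(best class | attribute-value pair) for every pair in the data."""
    hits, totals = Counter(), Counter()
    for r in rows:
        for attr, val in r.items():
            if attr == "class":
                continue
            totals[(attr, val)] += 1
            hits[(attr, val)] += r["class"] == best
    return {pair: hits[pair] / totals[pair] for pair in totals}

rows = [
    {"a": 1, "b": 0, "class": "best"}, {"a": 1, "b": 1, "class": "best"},
    {"a": 1, "b": 0, "class": "rest"}, {"a": 0, "b": 1, "class": "rest"},
    {"a": 0, "b": 0, "class": "rest"}, {"a": 0, "b": 1, "class": "rest"},
]
scores = confidence1(rows)
# In a funnel domain, the sorted scores have a short high-value tail:
print(sorted(scores.items(), key=lambda kv: -kv[1]))
```

Here a=1 dominates the tail, which is the "small number of variables with disproportionately large confidence1 values" pattern described above.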
Background: Classification
• 2-step procedure:
  – The learning phase
  – The testing phase
• Strategies employed:
  – Eager learning
    • Decision tree induction (e.g., C4.5)
    • Neural networks (e.g., backpropagation)
  – Lazy learning
    • Nearest neighbor classifiers (e.g., k-nearest neighbor)
Background: Association Rules
• Example transactions:

  ID | Transactions
  ---|--------------
  1  | A, B, C, E, F
  2  | B, C, E
  3  | B, C, D, E
  4  | … …

• Possible rule: B => C, E [support = 2%, confidence = 80%]
  – where support(X -> Y) = P(X), confidence(X -> Y) = P(Y|X)
• Representative algorithms:
  – APRIORI: uses the Apriori property of large itemsets
  – Max-Miner: more concise representation of the discovered rules; different pruning strategies
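These measures can be computed directly on the example transactions. The code follows the slide's definitions (support(X -> Y) = P(X); note that many texts instead define support as P(X and Y)); names here are illustrative:

```python
# Transactions from the slide's table (row 4 is elided there).
transactions = [
    {"A", "B", "C", "E", "F"},
    {"B", "C", "E"},
    {"B", "C", "D", "E"},
]

def support(x, data):
    """support(X -> Y) = P(X), per the slide's definition."""
    return sum(x <= t for t in data) / len(data)

def confidence(x, y, data):
    """confidence(X -> Y) = P(Y | X)."""
    nx = sum(x <= t for t in data)
    return sum((x | y) <= t for t in data) / nx if nx else 0.0

x, y = {"B"}, {"C", "E"}
print(support(x, transactions))        # 1.0 - B appears in all 3 transactions
print(confidence(x, y, transactions))  # 1.0 - C,E appear whenever B does
```

On these three rows the rule B => C,E has both support and confidence 1.0; the slide's 2%/80% figures describe a larger, hypothetical dataset.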
Background: Extensions
• CBA classifier
  – CBA = Classification Based on Association
  – Rules X => Y, where Y is a class label
  – More accurate than C4.5 (16/26 datasets)
• JEP classifier
  – JEP = Jumping Emerging Patterns
    • support(X w.r.t. D1) = 0, support(X w.r.t. D2) > 0
    • Model: a collection of JEPs
    • Classify: maximum collective impact
  – More accurate than both C4.5 and CBA (15/25 datasets)
Background: Standard FSS Methods
• Information gain attribute ranking
• Relief
• Principal Component Analysis (PCA)
• Correlation-based feature selection
• Consistency-based subset evaluation
• Wrapper subset evaluation
Comparison
• Relation to classification
  – Class boundary / class density
  – Class weighting
• Relation to association rule mining
  – Multiple classes / no class
  – Confidence-based pruning
• Relation to change-detection algorithms
  – support: |P(X|y=c1) - P(X|y=c2)|
  – confidence: |P(y=c1|X) - P(y=c2|X)|
  – Bayes' rule
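The two change-detection scores above can be computed directly; the toy rows and names below are illustrative, not from any cited system:

```python
# |P(X|c1) - P(X|c2)| and |P(c1|X) - P(c2|X)| on a tiny labeled dataset,
# where X is "the row has feature x".
rows = [
    {"x": True,  "y": "c1"}, {"x": True,  "y": "c1"}, {"x": False, "y": "c1"},
    {"x": True,  "y": "c2"}, {"x": False, "y": "c2"}, {"x": False, "y": "c2"},
]

def p_cond(pred, cond, rows):
    """P(pred | cond), estimated from the rows."""
    sel = [r for r in rows if cond(r)]
    return sum(pred(r) for r in sel) / len(sel) if sel else 0.0

has_x = lambda r: r["x"]
support_diff = abs(p_cond(has_x, lambda r: r["y"] == "c1", rows)
                   - p_cond(has_x, lambda r: r["y"] == "c2", rows))
confidence_diff = abs(p_cond(lambda r: r["y"] == "c1", has_x, rows)
                      - p_cond(lambda r: r["y"] == "c2", has_x, rows))
print(support_diff)     # |2/3 - 1/3| = 0.333...
print(confidence_diff)  # |2/3 - 1/3| = 0.333...
```

Bayes' rule connects the two views: P(y=c|X) is proportional to P(X|y=c)P(y=c), so a large gap in one measure generally implies a gap in the other once class priors are accounted for.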
Confidence Property
• Universal-existential upward closure
  – R1: Age.young -> Salary.low
  – R2: Age.young, Gender.m -> Salary.low
  – R3: Age.young, Gender.f -> Salary.low
• Long rules tend to have high confidence
• Large Rx tend to have high lift values
TAR3: Usability
• More user-friendly
  – Intuitive default settings