Outline
• Measures for evaluation
• Experimental design
• Estimating the generalization performance
• Hypothesis testing
• Interval estimation
• Confidence intervals
Design And Analysis of Experiments CS503 - Machine Learning 2
Confusion Matrix (1)
• 2-Class Scenario
                     Predicted positive     Predicted negative     Total
Actual positive      true positive (tp)     false negative (fn)    p
Actual negative      false positive (fp)    true negative (tn)     n
Total                p′                     n′                     N
Confusion Matrix (2)
• K-Class Scenario
Performance Measures
• Error: (fp + fn)/N
• Accuracy: (tp + tn)/N
• tp-rate: tp/p
• fp-rate: fp/n
• Precision: tp/p′
• Recall: tp/p
• Sensitivity: tp/p
• Specificity: tn/n
• F measure: (2 × precision × recall)/(precision + recall)
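As a concrete check of the definitions above, here is a small sketch (not from the slides) that computes each measure from the four confusion-matrix cells; the counts passed in at the end are made-up illustrative numbers.

```python
# Hypothetical sketch: the measures above, computed from the four
# cells of a 2-class confusion matrix (tp, fn, fp, tn).
def measures(tp, fn, fp, tn):
    p, n = tp + fn, fp + tn            # actual positives / negatives
    p_prime = tp + fp                  # predicted positives (p')
    N = p + n
    precision = tp / p_prime           # tp / p'
    recall = tp / p                    # = tp-rate = sensitivity
    return {
        "error": (fp + fn) / N,
        "accuracy": (tp + tn) / N,
        "tp_rate": tp / p,
        "fp_rate": fp / n,
        "precision": precision,
        "recall": recall,
        "specificity": tn / n,
        "f_measure": 2 * precision * recall / (precision + recall),
    }

m = measures(tp=40, fn=10, fp=5, tn=45)   # illustrative counts
```

Note that error + accuracy = 1 by construction, and the F measure is the harmonic mean of precision and recall.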
Receiver Operating Characteristic
• Classification error alone is insufficient when
  • costs are associated with false positive and false negative errors
  • class distributions are skewed
• ROC analysis assesses predictive behavior independently of error costs and class distributions
  • Originates from signal detection theory
• Assume a classifier that uses a threshold θ to determine the class label
  • Classify x as positive if P(y|x) ≥ θ
  • The number of true and false positives depends on θ
Example (1)

i   y_i   P(y|x)   θ=0   θ=0.5   θ=1
2   1     0.9      1     1       0
5   1     0.9      1     1       0
3   1     0.7      1     1       0
8   1     0.6      1     1       0
1   1     0.5      1     1       0
4   0     0.4      1     0       0
9   0     0.3      1     0       0
6   0     0.2      1     0       0
7   0     0.1      1     0       0

[ROC plot: tp-rate vs fp-rate for the thresholds above]
Example (2)

i   y_i   P(y|x)   θ=0   θ=0.5   θ=1
2   1     0.9      1     1       0
5   1     0.9      1     1       0
3   1     0.7      1     1       0
8   1     0.6      1     1       0
1   1     0.2      1     0       0
4   0     0.6      1     1       0
9   0     0.3      1     0       0
6   0     0.2      1     0       0
7   0     0.1      1     0       0

[ROC plot: tp-rate vs fp-rate for the thresholds above]
Receiver Operating Characteristic Curve
Domination in ROC Space
• Learner L1 dominates L2 if L1’s ROC curve is always above L2’s curve• If L1 dominates L2, then L1 is better than L2 for all possible error
costs and class distributions• If neither dominates (L2 and L3), then different classifiers are better
under different conditions
Quantitative Measure from ROC Curve
• Area Under the (ROC) Curve
Generating ROC Curve (1)
• Assume the classifier outputs P(y|x) instead of just y (the predicted class for instance x)
• Let θ be a threshold such that if P(y|x) > θ, then x is classified as y, else not y
• Compute fp-rate and tp-rate for different values of θ from 0 to 1
• Plot each (fp-rate, tp-rate) point and interpolate (or take the convex hull)
• If multiple points have the same fp-rate, average the tp-rates (k-fold cross-validation)
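The procedure above can be sketched as follows (an illustration, not the lecturer's code). The scores are the toy values from Example (1), where every positive is ranked above every negative, so the AUC should come out to 1.

```python
# Sweep θ over the observed scores, compute (fp-rate, tp-rate) at each
# θ, and integrate the AUC with the trapezoid rule.
def roc_points(y, scores):
    p = sum(y)                       # number of actual positives
    n = len(y) - p                   # number of actual negatives
    pts = []
    # Thresholds from above the max score down to 0, so the points
    # run from (0, 0) toward (1, 1).
    for theta in sorted(set(scores) | {0.0, 1.1}, reverse=True):
        tp = sum(1 for yi, s in zip(y, scores) if s >= theta and yi == 1)
        fp = sum(1 for yi, s in zip(y, scores) if s >= theta and yi == 0)
        pts.append((fp / n, tp / p))
    return pts

def auc(pts):
    # Trapezoid rule over consecutive ROC points.
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(pts, pts[1:]))

y      = [1, 1, 1, 1, 1, 0, 0, 0, 0]            # Example (1) labels
scores = [0.9, 0.9, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
area = auc(roc_points(y, scores))
```

Because these scores separate the classes perfectly, the curve hugs the top-left corner and `area` is 1.0; any ranking mistake (as in Example (2)) would pull it below 1.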
Other Performance Measures
• Training time and space complexity
• Testing time and space complexity
• Interpretability of the model
Evaluating the Hypothesis (1)
• Can we draw conclusions about the generalization performance of the classifier from the training set?
• How about the validation set?
• The estimate could be biased if the validation set is used for
  • choosing the classifier (over another)
  • parameter tuning
• We need another test set that is ‘truly’ unseen during training/tuning
• Options are limited when the amount of training data is small
Evaluating the Hypothesis (2)
• Two main difficulties
  • Bias in the estimate: performance of the learned hypothesis on the training set is optimistically biased
  • Variance in the estimate
    • Performance estimated on an unseen test set is unbiased
    • However, the estimate can still vary from the true performance depending on the makeup of the test set
• We are interested in the minimum-variance unbiased estimate of the generalization performance
Experimental Design (1)
• Train/Test Split
  • Given dataset I = {(x_i, y_i)}_{i=1}^{N}
  • Perform K random trials, where for each trial
    • randomly split I into a training set (2/3) and a test set (1/3)
    • learn a classifier on the training set
    • compute the performance (error) on the test set
  • Compute the average performance (error) over the K trials
• Problem?
Experimental Design (2)
• Train/Test Split
  • Given dataset I = {(x_i, y_i)}_{i=1}^{N}
  • Perform K random trials, where for each trial
    • randomly split I into a training set (2/3) and a test set (1/3)
    • learn a classifier on the training set
    • compute the performance (error) on the test set
  • Compute the average performance (error) over the K trials
• Problem?
  • The training and test sets overlap between trials, which biases the result
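The protocol above can be sketched in a few lines. `learn` and `error` below are hypothetical placeholders (not from the slides) standing in for any learner and its test-error measure; the toy data and majority-class learner are made up for illustration.

```python
# Sketch of the repeated train/test-split protocol: K random 2/3-1/3
# splits, average the test error over the K trials.
import random

def split_trials(data, K, learn, error, seed=0):
    rng = random.Random(seed)
    errs = []
    for _ in range(K):
        d = list(data)
        rng.shuffle(d)                      # fresh random split per trial
        cut = (2 * len(d)) // 3             # 2/3 train, 1/3 test
        h = learn(d[:cut])
        errs.append(error(h, d[cut:]))
    return sum(errs) / K                    # average error over K trials

# Toy learner: always predict the majority class of the training set.
data = [(x, int(x > 0)) for x in range(-10, 10)]
learn = lambda train: max({y for _, y in train},
                          key=[y for _, y in train].count)
error = lambda h, test: sum(y != h for _, y in test) / len(test)
avg = split_trials(data, K=10, learn=learn, error=error)
```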
Experimental Design (3)
• K-Fold Cross Validation
  • Given dataset I = {(x_i, y_i)}_{i=1}^{N}
  • Partition I into K disjoint subsets I_1, I_2, …, I_K
  • For k = 1:K trials
    • learn the classifier on the training set I − I_k
    • compute the performance on the test set I_k
  • Compute the average performance over the K trials
• A better estimate of generalization performance
  • Test sets do not overlap
• Stratification
  • The distribution of classes in the training and test sets should be the same as in the original dataset
• When the size of I is very small
  • Leave-one-out cross validation: K = N
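The K-fold procedure above can be sketched as follows; as before, `learn` and `error` are hypothetical placeholders for any learner and error measure (this simple sketch does not stratify).

```python
# Sketch of K-fold cross validation: partition I into K disjoint
# folds, train on I - I_k, test on I_k, average over the K trials.
def k_fold_cv(data, K, learn, error):
    folds = [data[k::K] for k in range(K)]     # K disjoint subsets of I
    errs = []
    for k in range(K):
        test = folds[k]                        # I_k
        train = [x for j in range(K) if j != k for x in folds[j]]  # I - I_k
        errs.append(error(learn(train), test))
    return sum(errs) / K                       # average over the K trials

# Toy data and majority-class learner, made up for illustration.
data = [(x, int(x > 0)) for x in range(-10, 10)]
learn = lambda tr: max({y for _, y in tr}, key=[y for _, y in tr].count)
error = lambda h, te: sum(y != h for _, y in te) / len(te)
avg = k_fold_cv(data, K=5, learn=learn, error=error)
```

Unlike the repeated train/test split, every example appears in exactly one test fold, so the test sets never overlap.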
Experimental Design (4)
• Bootstrapping
  • If there is not enough data for K-fold cross validation
  • Generate multiple sets of size N from I by sampling with replacement
  • Each set contains approximately 63% of the distinct examples in I
  • Compute the average error over all samples
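The sampling step above, as a small sketch: the expected fraction of distinct examples in a bootstrap sample is 1 − (1 − 1/N)^N ≈ 1 − 1/e ≈ 0.632, which is where the ~63% figure comes from. The data here are just the integers 0…999.

```python
# Sketch of bootstrap resampling: draw N examples from I uniformly
# with replacement, repeated n_sets times.
import random

def bootstrap_samples(data, n_sets, seed=0):
    rng = random.Random(seed)
    N = len(data)
    return [[data[rng.randrange(N)] for _ in range(N)]
            for _ in range(n_sets)]

sets = bootstrap_samples(list(range(1000)), n_sets=100)
# Average fraction of distinct examples per bootstrap sample (~0.632).
frac = sum(len(set(s)) for s in sets) / (100 * 1000)
```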
Interval Estimation (1)
• Estimate the mean μ of a normal distribution 𝒩(μ, σ²)
• Given a set I = {x_i}_{i=1}^{N}
• Estimate

  m = (1/N) Σ_{i=1}^{N} x_i

  where m ~ 𝒩(μ, σ²/N)
• Define a statistic Z with a unit normal distribution 𝒩(0, 1):

  √N (m − μ)/σ ~ Z
Unit Normal Distribution
• 95% of Z lies in (−1.96, 1.96)
• 99% of Z lies in (−2.58, 2.58)
• …
• Therefore, P(−1.96 < Z < 1.96) = 0.95
  • Two-sided confidence interval
Interval Estimation (2)

P(−1.96 < Z < 1.96) = 0.95

P(−1.96 < √N (m − μ)/σ < 1.96) = 0.95

P(m − 1.96 σ/√N < μ < m + 1.96 σ/√N) = 0.95

P(m − z_{α/2} σ/√N < μ < m + z_{α/2} σ/√N) = 1 − α
z_{α/2}   1 − α
2.58      0.99
2.33      0.98
1.96      0.95
1.64      0.90
1.28      0.80
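The two-sided interval above, m ± z_{α/2}·σ/√N, can be sketched directly; the default z value is the 95% entry from the table, and the m, σ, N passed in at the end are made-up illustrative numbers.

```python
# Two-sided z confidence interval for the mean, with known sigma:
# (m - z_{alpha/2} * sigma / sqrt(N), m + z_{alpha/2} * sigma / sqrt(N))
import math

def z_interval(m, sigma, N, z_half_alpha=1.96):   # default: 95% interval
    half = z_half_alpha * sigma / math.sqrt(N)
    return (m - half, m + half)

lo, hi = z_interval(m=3.0, sigma=0.15, N=10)      # illustrative inputs
```

Note how the interval narrows as N grows (the √N in the denominator) and widens as the confidence level 1 − α increases (larger z_{α/2} from the table).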
Two-Sided Vs One-Sided Confidence Interval
z_α     2.33   1.64   1.28
1 − α   0.99   0.95   0.90

P(m − 1.64 σ/√N < μ) = 0.95

P(m − z_α σ/√N < μ) = 1 − α
Interval Estimation (3)
• The previous analysis required us to know σ²
  • But typically it is unknown
• Instead, we can use the sample variance S²:

  S² = (1/(N − 1)) Σ_{i=1}^{N} (x_i − m)²

• When x_i ~ 𝒩(μ, σ²), then (N − 1)S²/σ² is chi-squared with N − 1 degrees of freedom
• √N (m − μ)/S is t-distributed with N − 1 degrees of freedom
Student’s t-distribution
• Similar to the normal distribution, but with a larger spread (heavier tails)
• It accounts for the additional uncertainty from using the sample variance
• As df → ∞, it approaches the normal distribution
Interval Estimation (4)
• When the population variance σ² is unknown, we can use the Student’s t distribution to obtain the interval:

  S² = (1/(N − 1)) Σ_{i=1}^{N} (x_i − m)²,   √N (m − μ)/S ~ t_{N−1}

• So a two-sided confidence interval estimate takes the form

  P(m − t_{α/2,N−1} S/√N < μ < m + t_{α/2,N−1} S/√N) = 1 − α
Interval Estimation (5)

• m = 3, S² = 0.022, S = 0.149
• α = 0.05, df = N − 1 = 9
• t_{0.025,9} = 2.262
• P(3 − 0.107 < μ < 3 + 0.107) = 0.95
• P(2.893 < μ < 3.107) = 0.95
i     1    2    3    4    5    6    7    8    9    10
x_i   3.0  3.1  3.2  2.8  2.9  3.1  3.2  2.8  2.9  3.0
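The example can be checked numerically. This sketch recomputes m and S from the ten observations and forms the t interval, assuming t_{0.025,9} = 2.262 taken from a standard t table.

```python
# Verifying the interval-estimation example: m +/- t_{alpha/2,N-1}*S/sqrt(N)
import math

xs = [3.0, 3.1, 3.2, 2.8, 2.9, 3.1, 3.2, 2.8, 2.9, 3.0]
N = len(xs)
m = sum(xs) / N                                   # sample mean
S2 = sum((x - m) ** 2 for x in xs) / (N - 1)      # sample variance
S = math.sqrt(S2)
half = 2.262 * S / math.sqrt(N)                   # t_{0.025,9} = 2.262
lo, hi = m - half, m + half                       # 95% interval for mu
```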
Hypothesis Testing (1)
• We want to support a hypothesis H1
  • e.g., H1: error_D(h) < 0.10
• Define the opposite of H1 to be the null hypothesis H0
  • e.g., H0: error_D(h) ≥ 0.10
• Perform an experiment collecting data about error_D(h)
• With what probability can we reject H0?
Hypothesis Testing (2)
• Sample I = {x_i}_{i=1}^{N} ~ 𝒩(μ, σ²)
• Estimate the sample mean m = (1/N) Σ_{i=1}^{N} x_i
• We want to test whether μ equals some constant μ0
  • Null hypothesis H0: μ = μ0
  • Alternative hypothesis H1: μ ≠ μ0
  • Reject H0 if m is too far from μ0
• We fail to reject H0 with level of significance α if the statistic lies in the 1 − α confidence interval:

  √N (m − μ0)/σ ∈ (−z_{α/2}, z_{α/2})

• We reject H0 if it falls outside this interval on either side (two-sided test)
Hypothesis Testing (3)
• Sample I = {x_i}_{i=1}^{N} ~ 𝒩(μ, σ²)
• Estimate the sample mean m = (1/N) Σ_{i=1}^{N} x_i
• Null hypothesis H0: μ ≤ μ0
• Alternative hypothesis H1: μ > μ0
• Reject H0 if m is too far above μ0
• We fail to reject H0 with level of significance α if

  √N (m − μ0)/σ ∈ (−∞, z_α)

• We reject H0 if the statistic falls outside this interval (one-sided test)
Hypothesis Testing (4)
• Sample I = {x_i}_{i=1}^{N} ~ 𝒩(μ, σ²)
• Estimate the sample mean m = (1/N) Σ_{i=1}^{N} x_i
• We want to test whether μ exceeds some constant μ0
  • The variance σ² is unknown, so use the sample variance S²
• Null hypothesis H0: μ ≤ μ0
• Alternative hypothesis H1: μ > μ0
• Reject H0 if m is too far above μ0
• We fail to reject H0 with level of significance α if

  √N (m − μ0)/S ∈ (−∞, t_{α,N−1})

• We reject H0 if the statistic falls outside this interval (one-sided test)
Hypothesis Testing (5)

• m = 3, S² = 0.022, S = 0.149
• μ0 = 2.9
• H1: μ > 2.9, H0: μ ≤ 2.9
• α = 0.05, df = N − 1 = 9
• t_{0.05,9} = 1.833
• √N (m − μ0)/S = √10 (3 − 2.9)/0.149 = 2.12 ∉ (−∞, 1.833)
• Therefore, reject the null hypothesis
• Accept the alternative hypothesis
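This one-sided t test can be reproduced numerically from the same ten observations; t_{0.05,9} = 1.833 is taken from a standard t table.

```python
# One-sided t test of H0: mu <= 2.9 -- reject when the statistic
# sqrt(N)*(m - mu0)/S exceeds t_{alpha,N-1} = t_{0.05,9} = 1.833.
import math

xs = [3.0, 3.1, 3.2, 2.8, 2.9, 3.1, 3.2, 2.8, 2.9, 3.0]
N = len(xs)
m = sum(xs) / N
S = math.sqrt(sum((x - m) ** 2 for x in xs) / (N - 1))
t_stat = math.sqrt(N) * (m - 2.9) / S   # ~2.12 > 1.833, so reject H0
```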
Estimating Classifier Error
• Learn the classifier on the training set
• Test the classifier on a test set V of size N
• Assume probability p of error by the classifier
• X = number of errors made by the classifier on V
• X follows a binomial distribution:

  P(X = j) = C(N, j) p^j (1 − p)^{N−j}
Binomial Test
• Test whether the error probability p is less than or equal to some value p0
• Null hypothesis H0: p ≤ p0
• Alternative hypothesis H1: p > p0
• Reject H0 with significance α if

  P(X ≥ e) = Σ_{r=e}^{N} C(N, r) p0^r (1 − p0)^{N−r} < α

  where e is the number of errors observed on the test set
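The test above can be sketched directly; the tail sum is the exact binomial p-value under H0, and the e, N, p0 used at the end are the numbers from the normal-approximation example that follows.

```python
# Exact binomial test: under H0 the error count X ~ Binomial(N, p0);
# reject H0: p <= p0 if P(X >= e) < alpha for the observed e errors.
from math import comb

def binomial_test(e, N, p0, alpha=0.05):
    tail = sum(comb(N, r) * p0**r * (1 - p0)**(N - r)
               for r in range(e, N + 1))
    return tail, tail < alpha          # (p-value, reject H0?)

pval, reject = binomial_test(e=12, N=40, p0=0.2)
```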
Approximate Normal Test
• Approximate X with a normal distribution
  • X is a sum of N independent random variables from the same distribution
  • X/N is approximately normal for large N, with mean p0 and variance p0(1 − p0)/N (central limit theorem):

  √N (X/N − p0)/√(p0(1 − p0)) ~ Z

• Fail to reject H0: p ≤ p0 with significance α if

  √N (X/N − p0)/√(p0(1 − p0)) ∈ (−∞, z_α)

• Reject H0 if the statistic falls outside this interval
• Works well when N is not too small and p is not very close to 0 or 1
Example (1)
• Let N = 40, X = 12, X/N = 0.3
• Set p0 = 0.2, α = 0.05
• Alternative hypothesis H1: p > p0
• Null hypothesis H0: p ≤ p0
• Compute

  √N (X/N − p0)/√(p0(1 − p0)) = 1.58,   z_{0.05} = 1.64

• 1.58 ∈ (−∞, 1.64)
• Therefore, fail to reject H0
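A quick numerical check of Example (1): the statistic comes out just under the z_{0.05} = 1.64 cutoff, so H0 is not rejected.

```python
# Normal-approximation test statistic for the example:
# sqrt(N)*(X/N - p0)/sqrt(p0*(1 - p0)), compared against z_{0.05} = 1.64.
import math

N, X, p0 = 40, 12, 0.2
z_stat = math.sqrt(N) * (X / N - p0) / math.sqrt(p0 * (1 - p0))
# z_stat ~= 1.58 <= 1.64, so we fail to reject H0: p <= p0.
```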
t-Test
• So far we have looked at a single validation set
• Suppose we do a K-fold cross validation
• This gives K error percentages p_i, 1 ≤ i ≤ K

  m = (1/K) Σ_{i=1}^{K} p_i,   S² = (1/(K − 1)) Σ_{i=1}^{K} (p_i − m)²

• Hence

  √K (m − p0)/S ~ t_{K−1}

• Reject the null hypothesis (p ≤ p0) with significance α if this value is greater than t_{α,K−1}
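The statistic above, as a sketch; the fold error rates and p0 below are made-up illustrative numbers, not from the slides.

```python
# Cross-validation t test: from K fold error rates p_i, compute
# sqrt(K)*(m - p0)/S, which is ~ t_{K-1} under H0: p <= p0.
import math

def cv_t_statistic(p, p0):
    K = len(p)
    m = sum(p) / K                                  # mean fold error
    S = math.sqrt(sum((pi - m) ** 2 for pi in p) / (K - 1))
    return math.sqrt(K) * (m - p0) / S

fold_errors = [0.12, 0.15, 0.10, 0.13, 0.14]        # hypothetical folds
t_stat = cv_t_statistic(fold_errors, p0=0.10)
```

The result is then compared against t_{α,K−1} from a t table with K − 1 degrees of freedom.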
Comparing Two Learners
• K-fold cross-validated paired t test
• Paired test: both learners get the same train/test sets
• Use K-fold cross validation to get the K training/test sets
• Errors of learners 1 and 2 on fold i: p_i¹, p_i²
• Paired difference on fold i: p_i = p_i¹ − p_i²
• The null hypothesis is that p_i has mean 0

  m = (1/K) Σ_{i=1}^{K} p_i,   S² = (1/(K − 1)) Σ_{i=1}^{K} (p_i − m)²

• Hence

  √K (m − 0)/S ~ t_{K−1}
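The paired test above can be sketched as follows; the per-fold error rates for the two learners are made-up illustrative numbers.

```python
# K-fold cross-validated paired t test: both learners are scored on
# the same folds; under H0 the paired differences have mean 0 and
# sqrt(K)*m/S ~ t_{K-1}.
import math

def paired_t_statistic(p1, p2):
    d = [a - b for a, b in zip(p1, p2)]             # paired differences
    K = len(d)
    m = sum(d) / K
    S = math.sqrt(sum((di - m) ** 2 for di in d) / (K - 1))
    return math.sqrt(K) * m / S

errs1 = [0.12, 0.15, 0.10, 0.13, 0.14]              # learner 1 (made up)
errs2 = [0.10, 0.14, 0.08, 0.12, 0.11]              # learner 2 (made up)
t_stat = paired_t_statistic(errs1, errs2)
```

Pairing matters: differencing on the same folds cancels fold-to-fold variation, so the test is more sensitive than comparing two independent averages.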
Summary
• Measures for evaluation
• Experimental design
• Estimating the generalization performance
• Hypothesis testing
• Interval estimation
• Confidence intervals