Sensitivity and Specificity in Predictive Modeling
Sarajit Poddar, 7 June 2015
Solving Workforce Problems using Analytics
Sensitivity
1. When a predictive model is applied to real-life data, Sensitivity is the probability of correctly “selecting” the positive outcome.
2. For instance, if a predictive model is developed to identify high-performing employees who are likely to leave within 6 months, Sensitivity is the probability of identifying someone who will actually leave.
3. Sensitivity is also called the “True Positive Rate”.
Specificity
1. When applying the predictive model to real-life data, Specificity is the probability of correctly “rejecting” the negative outcome.
2. For instance, in the predictive model for identifying high-performer attrition, Specificity is the probability of not flagging someone who will not leave.
3. Specificity is also called the “True Negative Rate”. A short computational sketch of both rates follows below.
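To make both definitions concrete, here is a minimal Python sketch that computes the two rates from the four cells of a confusion matrix. The counts are the ones used in the worked scenario later in this deck; the variable names are mine.

# Cells of a confusion matrix for an attrition model
# (counts taken from the worked scenario later in this deck)
true_positive = 10     # flagged as leavers, and actually left
false_negative = 90    # not flagged, but actually left
true_negative = 600    # not flagged, and actually stayed
false_positive = 200   # flagged as leavers, but stayed

# Sensitivity (True Positive Rate): share of actual leavers the model catches
sensitivity = true_positive / (true_positive + false_negative)

# Specificity (True Negative Rate): share of actual stayers the model rejects
specificity = true_negative / (true_negative + false_positive)

print(f"Sensitivity = {sensitivity:.2f}")   # 0.10
print(f"Specificity = {specificity:.2f}")   # 0.75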
Trade-off between Sensitivity and Specificity
1. Sensitivity: When we are too cautious about identifying the potential leavers, we may end up including in our pool someone who will not leave. We will thus end up with a bigger pool of identified employees than it should be. If the organisation is devising initiatives for preventing attrition, it may have to allocate more funds than required to address this. Thus, while sensitivity is high, specificity is low.
2. Specificity: When the organisation wants to restrict the pool size, it may apply a more stringent selection condition. While this will not select the “non-leavers”, it may also miss out on “potential leavers”. Thus, while specificity is high, sensitivity is low. A small sketch after the quote below illustrates this trade-off.
“Thus one needs to judge what is more important for addressing the issue at hand. If losing high-performing Sales Employees is going to cost the company more (opportunity cost), perhaps increasing sensitivity is going to be more effective.”
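One way to see the trade-off is to sweep the cut-off of a flight-risk score: lowering the cut-off flags more employees (sensitivity rises, specificity falls), and raising it does the reverse. The scores and outcomes below are hypothetical, invented purely for illustration.

# Hypothetical (flight-risk score, actually left?) pairs; 1 = left, 0 = stayed
employees = [(0.90, 1), (0.80, 1), (0.70, 0), (0.60, 1), (0.50, 0),
             (0.40, 0), (0.30, 1), (0.20, 0), (0.10, 0), (0.05, 0)]

def rates(cutoff):
    # Everyone whose score meets the cut-off is flagged as a potential leaver
    tp = sum(1 for score, left in employees if score >= cutoff and left)
    fn = sum(1 for score, left in employees if score < cutoff and left)
    tn = sum(1 for score, left in employees if score < cutoff and not left)
    fp = sum(1 for score, left in employees if score >= cutoff and not left)
    return tp / (tp + fn), tn / (tn + fp)

for cutoff in (0.3, 0.5, 0.7):
    sens, spec = rates(cutoff)
    print(f"cut-off {cutoff}: sensitivity = {sens:.2f}, specificity = {spec:.2f}")
# cut-off 0.3: sensitivity = 1.00, specificity = 0.50
# cut-off 0.5: sensitivity = 0.75, specificity = 0.67
# cut-off 0.7: sensitivity = 0.50, specificity = 0.83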
False Positive (Type 1 Error)
If the predictive algorithm ends up selecting a high performer who has no “flight risk”, this is called a “False Positive”. It is “Positive” because the selection action has happened. It is “False” because the employee selected does not belong to the Target group.
Target group = High performers having high “flight-risk”.
False Negative (Type 2 Error)
If the predictive algorithm fails to select a high performer who has significant “flight risk”, this is called a “False Negative”. It is “Negative” because someone from the Target group is not selected. It is “False” because the employee not selected belongs to the Target group.
                        Actual Positive    Actual Negative
Test Outcome Positive   True Positive      False Positive
Test Outcome Negative   False Negative     True Negative
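A minimal sketch of how those four cells are tallied from paired (predicted, actual) labels, using hypothetical label pairs:

# (predicted, actual) label pairs; 1 = positive, 0 = negative (hypothetical data)
pairs = [(1, 1), (1, 0), (0, 1), (0, 0), (1, 1), (0, 0)]

counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
for predicted, actual in pairs:
    if predicted and actual:
        counts["TP"] += 1    # flagged, and in the target group
    elif predicted and not actual:
        counts["FP"] += 1    # flagged, but outside the target group (Type 1 error)
    elif not predicted and actual:
        counts["FN"] += 1    # missed a member of the target group (Type 2 error)
    else:
        counts["TN"] += 1    # correctly not flagged

print(counts)    # {'TP': 2, 'FP': 1, 'FN': 1, 'TN': 2}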
Re-visiting the Errors
Type 1 Error (False Positive)
Selecting a member outside the Target group.
Relaxed selection algorithm, with wide filters that allow in someone outside the target group.
Type 2 Error (False Negative)
Failure to select a member within the Target group.
Stringent selection algorithm, with narrow filters that exclude even someone within the target group.
Applying the Concept to Talent Acquisition
Sensitivity & Specificity
Sensitivity
Probability of “selecting” high-quality candidates.
Increasing Sensitivity can mean relaxing the selection parameters, thus allowing selection of “poor-quality candidates”.
Decreasing Sensitivity can mean putting in stringent selection parameters, potentially losing out on “good-quality candidates”.
Specificity
Probability of “not selecting” poor-quality candidates.
Increasing Specificity can mean putting in stringent selection parameters, thus increasing the chance of rejecting “poor-quality candidates”.
Decreasing Specificity can mean relaxing the selection parameters, thus failing to reject “poor-quality candidates”.
Type 1 and Type 2 Errors
Type 1 Error (False Positive)
Selecting “Poor quality candidates”.
Relaxed selection algorithm.
Type 2 Error (False Negative)
Rejecting “High Quality Candidates”.
Stringent selection algorithm.
Important Ratios
1. True positive rate (TPR), Sensitivity = Σ True positive / Σ Condition positive
2. True negative rate (TNR), Specificity = Σ True negative / Σ Condition negative
3. False positive rate (FPR), Fall-out = Σ False positive / Σ Condition negative
4. False negative rate (FNR), Miss rate = Σ False negative / Σ Condition positive
5. Accuracy (ACC) = (Σ True positive + Σ True negative) / Σ Total population
6. Prevalence = Σ Condition positive / Σ Total population
7. Positive predictive value (PPV), Precision = Σ True positive / Σ Test Outcome Positive
8. False discovery rate (FDR) = Σ False positive / Σ Test Outcome Positive
9. False omission rate (FOR) = Σ False negative / Σ Test Outcome Negative
10. Negative predictive value (NPV) = Σ True negative / Σ Test Outcome Negative
11.Positive likelihood ratio (LR+) = TPR / FPR
12.Negative likelihood ratio (LR−) = FNR / TNR
13.Diagnostic odds ratio (DOR) = LR+ / LR−
Source: Wikipedia
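All thirteen ratios can be computed with one small Python function; a minimal sketch, with the function and key names my own rather than any standard library's:

def confusion_ratios(tp, fp, fn, tn):
    """Standard confusion-matrix ratios from the four cell counts."""
    condition_pos = tp + fn       # actual positives
    condition_neg = fp + tn       # actual negatives
    outcome_pos = tp + fp         # test outcome positive
    outcome_neg = fn + tn         # test outcome negative
    total = condition_pos + condition_neg

    tpr = tp / condition_pos      # sensitivity
    tnr = tn / condition_neg      # specificity
    fpr = fp / condition_neg      # fall-out
    fnr = fn / condition_pos      # miss rate
    lr_pos = tpr / fpr            # positive likelihood ratio
    lr_neg = fnr / tnr            # negative likelihood ratio

    return {
        "TPR": tpr, "TNR": tnr, "FPR": fpr, "FNR": fnr,
        "ACC": (tp + tn) / total,
        "Prevalence": condition_pos / total,
        "PPV": tp / outcome_pos, "FDR": fp / outcome_pos,
        "FOR": fn / outcome_neg, "NPV": tn / outcome_neg,
        "LR+": lr_pos, "LR-": lr_neg, "DOR": lr_pos / lr_neg,
    }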
Scenario: Suppose, out of 900 sales employees, 100 are high performers. 10 of the high performers have left the company in the last 6 months, while 200 of the remaining 800 employees have left. If a predictive algorithm is built that can predict this, what are the various ratios?

                                                High Performers        Not High Performers
                                                (Condition Positive)   (Condition Negative)
Left the Company (Test Outcome Positive)        10                     200
Stayed in the Company (Test Outcome Negative)   90                     600
True positive rate (TPR), Sensitivity = Σ True positive / Σ Condition positive
= 10 / 100 = 0.1
True negative rate (TNR), Specificity = Σ True negative / Σ Condition negative
= 600 / 800 = 0.75
False positive rate (FPR), Fall-out = Σ False positive / Σ Condition negative
= 200 / 800 = 0.25
False negative rate (FNR), Miss rate = Σ False negative / Σ Condition positive
= 90 / 100 = 0.9
Accuracy (ACC) = (Σ True positive + Σ True negative) / Σ Total population
= 610 / 900 ≈ 0.68
Prevalence = Σ Condition positive / Σ Total population
= 100 / 900 ≈ 0.11
Positive predictive value (PPV), Precision = Σ True positive / Σ Test Outcome Positive
= 10 / 210 ≈ 0.048
False discovery rate (FDR) = Σ False positive / Σ Test Outcome Positive
= 200 / 210 ≈ 0.95
False omission rate (FOR) = Σ False negative / Σ Test Outcome Negative
= 90 / 690 = 0.13
Negative predictive value (NPV) = Σ True negative / Σ Test Outcome Negative
= 600 / 690 = 0.87
Positive likelihood ratio (LR+) = TPR / FPR
= 0.1 / 0.25 = 0.4
Negative likelihood ratio (LR−) = FNR / TNR
= 0.9 / 0.75 = 1.2
Diagnostic odds ratio (DOR) = LR+ / LR−
= 0.4 / 1.2 ≈ 0.33
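As a cross-check, plugging the scenario's four cells into the confusion_ratios sketch from the Important Ratios section reproduces all of the above:

# Reuses confusion_ratios() defined earlier in this deck
for name, value in confusion_ratios(tp=10, fp=200, fn=90, tn=600).items():
    print(f"{name} = {value:.3f}")
# TPR = 0.100, TNR = 0.750, FPR = 0.250, FNR = 0.900, ACC = 0.678,
# Prevalence = 0.111, PPV = 0.048, FDR = 0.952, FOR = 0.130,
# NPV = 0.870, LR+ = 0.400, LR- = 1.200, DOR = 0.333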
Illustration
[Figure: confusion-matrix quadrants showing True Positive, False Positive, False Negative, and True Negative.]
Thank you