
Page 1: Sensitivity and Specificity in Predictive Modeling

Sensitivity and Specificity in Predictive Modeling

Sarajit Poddar

7 June 2015

Solving Workforce Problems using Analytics

Page 2: Sensitivity and Specificity in Predictive Modeling

Sensitivity

1. When a Predictive Model is applied to real-life data, Sensitivity is the probability of selecting the correct (positive) outcome.

2. For instance, if a Predictive Model is developed to identify High Performer employees who are likely to leave within 6 months, Sensitivity is the probability of identifying someone who will actually leave.

3. Sensitivity is also called the “True Positive Rate”.

Page 3: Sensitivity and Specificity in Predictive Modeling

Specificity

1. When applying the Predictive Model to real-life data, Specificity is the probability of correctly rejecting the incorrect (negative) outcome.

2. For instance, in the Predictive Model for identifying High Performer attrition, Specificity is the probability of not flagging someone who will not actually leave.

3. Specificity is also called the “True Negative Rate”.

Both rates can be computed directly from the four outcome counts, as in the sketch below.
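A minimal sketch in Python (the counts are illustrative assumptions, not from this deck):

    # Hypothetical outcome counts from applying an attrition model to past data
    tp = 40   # flagged as leavers, actually left      (True Positives)
    fn = 10   # not flagged, actually left             (False Negatives)
    tn = 80   # not flagged, actually stayed           (True Negatives)
    fp = 20   # flagged as leavers, actually stayed    (False Positives)

    sensitivity = tp / (tp + fn)   # True Positive Rate: 40 / 50 = 0.8
    specificity = tn / (tn + fp)   # True Negative Rate: 80 / 100 = 0.8

    print(f"Sensitivity = {sensitivity:.2f}, Specificity = {specificity:.2f}")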

Page 4: Sensitivity and Specificity in Predictive Modeling

Trade-off between Sensitivity and Specificity

1. Sensitivity: When we are too cautious in identifying the potential leavers, we may end up including in our pool someone who will not leave. Thus we end up with a bigger pool of identified employees than we should. If the organisation is devising initiatives for preventing attrition, it may have to allocate more funds than required. Thus, while sensitivity is high, specificity is low.

2. Specificity: When the organisation wants to restrict the pool size, it may set more stringent selection conditions. While this will screen out the “non-leavers”, it may also miss “potential leavers”. Thus, while specificity is high, sensitivity is low.

The sketch below illustrates how moving a single risk cut-off trades one rate against the other.
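A rough sketch in Python, assuming a model that outputs an attrition-risk score between 0 and 1 (the scores and outcomes are illustrative assumptions):

    # Each pair: (risk score from a hypothetical model, did the employee actually leave?)
    scored = [(0.9, True), (0.8, True), (0.7, False), (0.6, True), (0.5, False),
              (0.4, False), (0.35, True), (0.2, False), (0.1, False), (0.05, False)]

    def rates(cutoff):
        # Everyone at or above the cutoff is flagged as a potential leaver.
        tp = sum(1 for s, left in scored if s >= cutoff and left)
        fn = sum(1 for s, left in scored if s < cutoff and left)
        tn = sum(1 for s, left in scored if s < cutoff and not left)
        fp = sum(1 for s, left in scored if s >= cutoff and not left)
        return tp / (tp + fn), tn / (tn + fp)

    for cutoff in (0.7, 0.5, 0.3):
        sens, spec = rates(cutoff)
        print(f"cutoff={cutoff}: sensitivity={sens:.2f}, specificity={spec:.2f}")

Lowering the cutoff flags more people: sensitivity climbs from 0.50 to 1.00 while specificity falls from 0.83 to 0.50.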

Page 5: Sensitivity and Specificity in Predictive Modeling

“Thus one needs to judge what is more important for addressing the issue at hand. If losing high-performing Sales employees is going to cost the company more (opportunity cost), perhaps increasing sensitivity is going to be more effective.”

Page 6: Sensitivity and Specificity in Predictive Modeling

False Positive (Type 1 Error)

If the predictive algorithm ends up selecting a high performer who has no “flight risk”, this is called a “False Positive”. It is “Positive” because the selection action has happened. It is “False” because the employee selected does not belong to the Target group.

Target group = High performers having high “flight risk”.

Page 7: Sensitivity and Specificity in Predictive Modeling

False Negative (Type 2 Error)

If the predictive algorithm fails to select a high performer who has significant “flight risk”, this is called a “False Negative”. It is “Negative” because someone from the Target group is not selected. It is “False” because the employee not selected belongs to the Target group.

Target group = High performers having high “flight risk”.

Page 8: Sensitivity and Specificity in Predictive Modeling

                          Actual Positive     Actual Negative

Test Outcome Positive     True Positive       False Positive

Test Outcome Negative     False Negative      True Negative
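The same layout can be tallied from paired labels. A minimal sketch in Python (the label pairs are illustrative assumptions):

    # (actual, predicted) pairs for a hypothetical model;
    # True = positive outcome (e.g. "will leave"), False = negative.
    pairs = [(True, True), (True, False), (False, True), (False, False),
             (True, True), (False, False), (False, True), (False, False)]

    cells = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for actual, predicted in pairs:
        if predicted and actual:
            cells["TP"] += 1   # Test positive, condition positive
        elif predicted:
            cells["FP"] += 1   # Test positive, condition negative
        elif actual:
            cells["FN"] += 1   # Test negative, condition positive
        else:
            cells["TN"] += 1   # Test negative, condition negative

    print("             Actual +  Actual -")
    print(f"Predicted +  {cells['TP']:^8}  {cells['FP']:^8}")
    print(f"Predicted -  {cells['FN']:^8}  {cells['TN']:^8}")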

Page 9: Sensitivity and Specificity in Predictive Modeling

Re-visiting the Errors

Type 1 Error (False Positive)

Selecting a member outside the Target group

A relaxed selection algorithm, with loose filters, can admit someone outside the target group.

Type 2 Error (False Negative)

Failure to select a member within the Target group.

A stringent selection algorithm, with tight filters, can exclude someone even within the target group.

Page 10: Sensitivity and Specificity in Predictive Modeling

Applying the Concept to Talent Acquisition

Page 11: Sensitivity and Specificity in Predictive Modeling

Sensitivity & Specificity

Sensitivity

Probability of selecting high-quality candidates.

Increasing Sensitivity can mean relaxing the selection parameters, thus also allowing selection of “poor-quality candidates”.

Decreasing Sensitivity can mean setting stringent selection parameters, potentially losing out on “good-quality candidates”.

Specificity

Probability of not selecting poor-quality candidates.

Increasing Specificity can mean setting stringent selection parameters, thus increasing the chance of rejecting “poor-quality candidates”.

Decreasing Specificity can mean relaxing the selection parameters, thus failing to reject “poor-quality candidates”.

The sketch below makes this concrete with a pass-mark example.
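A small sketch of this mapping in Python (the assessment scores and hire quality are illustrative assumptions, with quality known in hindsight):

    # (assessment score, turned out to be a high-quality hire?)
    candidates = [(92, True), (85, True), (81, False), (74, True),
                  (70, False), (66, True), (60, False), (55, False)]

    def screen(pass_mark):
        # Everyone at or above the pass mark is selected.
        tp = sum(1 for s, good in candidates if s >= pass_mark and good)
        fn = sum(1 for s, good in candidates if s < pass_mark and good)
        tn = sum(1 for s, good in candidates if s < pass_mark and not good)
        fp = sum(1 for s, good in candidates if s >= pass_mark and not good)
        return tp / (tp + fn), tn / (tn + fp)

    for pass_mark in (80, 65):
        sens, spec = screen(pass_mark)
        print(f"pass mark {pass_mark}: sensitivity={sens:.2f}, specificity={spec:.2f}")

Relaxing the pass mark from 80 to 65 lifts sensitivity from 0.50 to 1.00 but drops specificity from 0.75 to 0.50, mirroring the bullets above.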

Page 12: Sensitivity and Specificity in Predictive Modeling

Type 1 and Type 2 Errors

Type 1 Error (False Positive)

Selecting “poor-quality candidates”.

Relaxed selection algorithm.

Type 2 Error (False Negative)

Rejecting “high-quality candidates”.

Stringent selection algorithm.

Page 13: Sensitivity and Specificity in Predictive Modeling

Important Ratios

Page 14: Sensitivity and Specificity in Predictive Modeling

Important Ratios

1. True positive rate (TPR), Sensitivity = Σ True positive / Σ Condition positive

2. True negative rate (TNR), Specificity = Σ True negative / Σ Condition negative

3. False positive rate (FPR), Fall-out = Σ False positive / Σ Condition negative

4. False negative rate (FNR), Miss rate = Σ False negative / Σ Condition positive

5. Accuracy (ACC) = (Σ True positive + Σ True negative) / Σ Total population

6. Prevalence = Σ Condition positive / Σ Total population

7. Positive predictive value (PPV), Precision = Σ True positive / Σ Test outcome positive

8. False discovery rate (FDR) = Σ False positive / Σ Test outcome positive

9. False omission rate (FOR) = Σ False negative / Σ Test outcome negative

10. Negative predictive value (NPV) = Σ True negative / Σ Test outcome negative

11. Positive likelihood ratio (LR+) = TPR / FPR

12. Negative likelihood ratio (LR−) = FNR / TNR

13. Diagnostic odds ratio (DOR) = LR+ / LR−

Source: Wikipedia
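As a minimal sketch (not part of the original deck), all thirteen ratios follow mechanically from the four confusion-matrix cell counts; in Python:

    def confusion_ratios(tp, fp, fn, tn):
        # Derived totals
        cond_pos = tp + fn            # Condition positive
        cond_neg = fp + tn            # Condition negative
        test_pos = tp + fp            # Test outcome positive
        test_neg = fn + tn            # Test outcome negative
        total = cond_pos + cond_neg
        tpr, tnr = tp / cond_pos, tn / cond_neg
        fpr, fnr = fp / cond_neg, fn / cond_pos
        return {
            "TPR/Sensitivity": tpr,
            "TNR/Specificity": tnr,
            "FPR/Fall-out": fpr,
            "FNR/Miss rate": fnr,
            "Accuracy": (tp + tn) / total,
            "Prevalence": cond_pos / total,
            "PPV/Precision": tp / test_pos,
            "FDR": fp / test_pos,
            "FOR": fn / test_neg,
            "NPV": tn / test_neg,
            "LR+": tpr / fpr,
            "LR-": fnr / tnr,
            "DOR": (tpr / fpr) / (fnr / tnr),
        }

For example, confusion_ratios(10, 200, 90, 600) reproduces the worked example on the next slide.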

Page 15: Sensitivity and Specificity in Predictive Modeling

Illustration

Scenario: Suppose, out of 900 sales employees, 100 are high performers. 10 among the high performers have left the company in the last 6 months, while 200 among the remaining 800 employees have left (and 600 have stayed). If a predictive algorithm is built that can predict this, what are the various ratios?

                                  Condition Positive     Condition Negative
                                  (High Performers)      (Not High Performers)

Test Outcome Positive (Left)           10 (TP)                200 (FP)

Test Outcome Negative (Stayed)         90 (FN)                600 (TN)

True positive rate (TPR), Sensitivity = Σ True positive / Σ Condition positive = 10 / 100 = 0.1

True negative rate (TNR), Specificity = Σ True negative / Σ Condition negative = 600 / 800 = 0.75

False positive rate (FPR), Fall-out = Σ False positive / Σ Condition negative = 200 / 800 = 0.25

False negative rate (FNR), Miss rate = Σ False negative / Σ Condition positive = 90 / 100 = 0.9

Accuracy (ACC) = (Σ True positive + Σ True negative) / Σ Total population = 610 / 900 ≈ 0.68

Prevalence = Σ Condition positive / Σ Total population = 100 / 900 ≈ 0.11

Positive predictive value (PPV), Precision = Σ True positive / Σ Test outcome positive = 10 / 210 ≈ 0.05

False discovery rate (FDR) = Σ False positive / Σ Test outcome positive = 200 / 210 ≈ 0.95

False omission rate (FOR) = Σ False negative / Σ Test outcome negative = 90 / 690 ≈ 0.13

Negative predictive value (NPV) = Σ True negative / Σ Test outcome negative = 600 / 690 ≈ 0.87

Positive likelihood ratio (LR+) = TPR / FPR = 0.1 / 0.25 = 0.4

Negative likelihood ratio (LR−) = FNR / TNR = 0.9 / 0.75 = 1.2

Diagnostic odds ratio (DOR) = LR+ / LR− = 0.4 / 1.2 ≈ 0.33

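To double-check the arithmetic above, a short standalone sketch in Python using the scenario's four cell counts:

    # Cell counts from the scenario (900 employees, 100 high performers)
    tp, fp, fn, tn = 10, 200, 90, 600

    tpr = tp / (tp + fn)                   # 10 / 100  = 0.10
    tnr = tn / (tn + fp)                   # 600 / 800 = 0.75
    fpr = fp / (tn + fp)                   # 200 / 800 = 0.25
    fnr = fn / (tp + fn)                   # 90 / 100  = 0.90
    acc = (tp + tn) / (tp + fp + fn + tn)  # 610 / 900 ≈ 0.68
    ppv = tp / (tp + fp)                   # 10 / 210  ≈ 0.05
    fdr = fp / (tp + fp)                   # 200 / 210 ≈ 0.95
    npv = tn / (fn + tn)                   # 600 / 690 ≈ 0.87
    print(tpr, tnr, acc, ppv, fdr, npv)

The likelihood ratios follow directly: LR+ = 0.10 / 0.25 = 0.4, LR− = 0.90 / 0.75 = 1.2, so DOR = 0.4 / 1.2 ≈ 0.33.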

Page 16: Sensitivity and Specificity in Predictive Modeling

Thank you