Stepwise Logistic Regression - Lecture for Students /Faculty of Mathematics and Informatics

© Experian Limited 2007. All rights reserved. Experian and the marks used herein are service marks or registered trademarks of Experian Limited. Other product and company names mentioned herein may be the trademarks of their respective owners. No part of this copyrighted work may be reproduced, modified, or distributed in any form or manner without the prior written permission of Experian Limited. Confidential and proprietary. Stepwise Logistic Regression Lecture for FMI Students 27.05.2010 Alexander Efremov

Upload: aefremov

Post on 14-Jun-2015

TRANSCRIPT

Page 1: Stepwise Logistic Regression - Lecture for Students /Faculty of Mathematics and Informatics

Stepwise Logistic Regression: Lecture for FMI Students, 27.05.2010

Alexander Efremov

Page 2: Stepwise Logistic Regression - Lecture for Students /Faculty of Mathematics and Informatics

Agenda

Introduction
• Applications of the Logistic Regression
• System Identification & Stepwise Regression

Part I. Logistic Regression Model Development
• Logistic Model
• Maximum Likelihood Estimator
• Potential Problems
• Model Analysis and Validation

Part II. Stepwise Logistic Regression (SWR)
• Basic Idea
• SWR Algorithm
• Potential Problems

Summary

Page 4: Stepwise Logistic Regression - Lecture for Students /Faculty of Mathematics and Informatics

Introduction: Applications of the Logistic Regression

Medicine – diagnostics, modeling of disease growth, treatment effect

Psychology – learning process modeling, evaluation of psychological tests

Economics – risk analysis, investigation of countries' debt, occupational choices

Marketing – product consumption, effect of retailers' actions

Criminology – risk factors for committing a criminal act

Sociology – employment, graduation, vote analysis

Ecology – modeling population growth

Linguistics – language changes

Chemistry – reaction models

Media – news effects, copycat reaction

Finance – credit scoring, fraud detection

Physics, Biology, etc.

The Logistic Model

Page 5: Stepwise Logistic Regression - Lecture for Students /Faculty of Mathematics and Informatics

Introduction: System Under Investigation

Individuals (raw data) => System => Model

Page 6: Stepwise Logistic Regression - Lecture for Students /Faculty of Mathematics and Informatics

Introduction: System Identification Stages

Page 9: Stepwise Logistic Regression - Lecture for Students /Faculty of Mathematics and Informatics

Part I. Logistic Regression Model Development: Logistic Model

[Figure: two panels, "Linear relation" and "Logistic relation"]

Page 10: Stepwise Logistic Regression - Lecture for Students /Faculty of Mathematics and Informatics

Part I. Logistic Regression Model Development: Logistic Model

• $k$ – index of the current individual, $k = 1, \dots, N$
• $N$ – number of observations
• $y_k$ – dependent variable (probability of good)
• $\hat{y}_k$ – model output (predicted probability of good)
• $\theta_0$ – intercept
• $\theta_i$ – the (i+1)-th model parameter
• $x_{i,k}$ – the i-th independent variable, $i = 1, \dots, n$

Logistic Relation – General Form

$$y_k = \frac{e^{M_k}}{1 + e^{M_k}} = \frac{1}{1 + e^{-M_k}}$$

"Linear" Logistic Regression Model

$$M_k = \theta_0 + \theta_1 x_{1,k} + \dots + \theta_n x_{n,k}$$

$$y_k = \frac{1}{1 + e^{-(\theta_0 + \theta_1 x_{1,k} + \dots + \theta_n x_{n,k})}}$$

$$\ln\frac{\hat{y}_k}{1 - \hat{y}_k} = \theta_0 + \theta_1 x_{1,k} + \dots + \theta_n x_{n,k}$$
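As a minimal sketch of the logistic relation on this slide, the Python below maps the linear predictor M_k to a probability and inverts it with the logit. The function names and the numeric values of `theta` and `x` are illustrative assumptions, not taken from the lecture.

```python
import math

def logistic(m):
    """Map the linear predictor M_k to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-m))

def logit(p):
    """Inverse of the logistic function: ln(p / (1 - p))."""
    return math.log(p / (1.0 - p))

def predict(theta, x):
    """Model output for one individual: theta[0] is the intercept."""
    m = theta[0] + sum(t * xi for t, xi in zip(theta[1:], x))
    return logistic(m)

# Hypothetical parameters and one hypothetical individual with n = 2 variables.
theta = [-1.0, 0.5, 2.0]
x = [2.0, 0.0]
y_hat = predict(theta, x)   # M_k = -1 + 0.5*2 + 2*0 = 0
print(y_hat, logit(y_hat))
```

Applying `logit` to the model output recovers the linear predictor, which is the sense in which the model is "linear".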

Page 11: Stepwise Logistic Regression - Lecture for Students /Faculty of Mathematics and Informatics

Part I. Logistic Regression Model Development: Logistic Model

Notation

• Parameters vector

$$\theta = [\theta_0 \;\; \theta_1 \;\; \dots \;\; \theta_n]^T \in \mathbb{R}^{n+1}$$

• Regression vector

$$\varphi_k = [1 \;\; x_{1,k} \;\; \dots \;\; x_{n,k}]^T \in \mathbb{R}^{n+1}$$

• Logistic model

$$y_k = \frac{1}{1 + e^{-(\theta_0 + \theta_1 x_{1,k} + \dots + \theta_n x_{n,k})}} = \frac{1}{1 + e^{-\varphi_k^T \theta}}$$

Page 12: Stepwise Logistic Regression - Lecture for Students /Faculty of Mathematics and Informatics

Part I. Logistic Regression Model Development: Residual

The Residual

$$y_k = \frac{1}{1 + e^{-\varphi_k^T \theta}} + e_k = \hat{y}_k + e_k$$

$$e_k = y_k - \hat{y}_k = \begin{cases} 1 - \hat{y}_k, & \text{for } y_k = 1 \\ -\hat{y}_k, & \text{for } y_k = 0 \end{cases}$$

Sources of Uncertainty

• Unavailable significant factors
• Simplified relations
• Time-varying performance
• Database errors
• Fraud

Page 14: Stepwise Logistic Regression - Lecture for Students /Faculty of Mathematics and Informatics

Part I. Logistic Regression Model Development: Maximum Likelihood Estimator

Cost Function

• Model output

$$\hat{y}_k = P(y_k = 1 \mid \varphi_k)$$

• Likelihood contribution

$$l_{\theta,k} = \hat{y}_k^{\,y_k} (1 - \hat{y}_k)^{1 - y_k}$$

• Likelihood function

$$L_\theta = \prod_{k=1}^{N} l_{\theta,k}$$

• Log-likelihood function

$$\ln L_\theta = \sum_{k=1}^{N} \big( y_k \ln \hat{y}_k + (1 - y_k) \ln(1 - \hat{y}_k) \big)$$

Maximum Likelihood Criterion

$$\max_\theta \ln L_\theta \;\Leftrightarrow\; \min_\theta \, -2 \ln L_\theta$$
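The cost function above can be evaluated directly from the observed outcomes and the model outputs. The following is a small sketch; the function name and the sample values are assumptions made for illustration.

```python
import math

def neg2_log_likelihood(y, y_hat):
    """Return -2 ln L for binary outcomes y and predicted probabilities y_hat."""
    log_l = 0.0
    for yk, pk in zip(y, y_hat):
        # Log of the likelihood contribution  pk**yk * (1 - pk)**(1 - yk)
        log_l += yk * math.log(pk) + (1 - yk) * math.log(1.0 - pk)
    return -2.0 * log_l

# Hypothetical outcomes and two hypothetical sets of model outputs.
y = [1, 0, 1, 1]
uninformative = [0.5, 0.5, 0.5, 0.5]
better = [0.9, 0.2, 0.8, 0.7]
print(neg2_log_likelihood(y, uninformative))
print(neg2_log_likelihood(y, better))  # smaller value: better fit to y
```

Predicted probabilities must lie strictly between 0 and 1, otherwise the logarithm is undefined.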

Page 15: Stepwise Logistic Regression - Lecture for Students /Faculty of Mathematics and Informatics

Part I. Logistic Regression Model Development: Maximum Likelihood Estimator

[Figure: Cost Function (-2 Log L) for a Real-Life Case]

Page 16: Stepwise Logistic Regression - Lecture for Students /Faculty of Mathematics and Informatics

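Part I. Logistic Regression Model Development: Maximum Likelihood Estimator

Estimates Update

$$\hat\theta^{(i+1)} = \hat\theta^{(i)} + \Delta\theta^{(i)}, \qquad \Delta\theta^{(i)} = \;?$$

Taylor Series Expansion

$$f_{\hat\theta^{(i)} + \Delta\theta^{(i)}} = f_{\hat\theta^{(i)}} + g^{(i)T} \Delta\theta^{(i)} + \tfrac{1}{2} \Delta\theta^{(i)T} H^{(i)} \Delta\theta^{(i)} + O_3$$

• Cost function: $f_{\hat\theta^{(i)}} = -\ln L_{\hat\theta^{(i)}}$
• Gradient: $g^{(i)T} = \nabla f_{\hat\theta^{(i)}}$
• Hessian: $H^{(i)} = \nabla^2 f_{\hat\theta^{(i)}}$

Cost Function Models

• Linear model

$$M^{(i)}(\Delta\theta^{(i)}) = f_{\hat\theta^{(i)}} + g^{(i)T} \Delta\theta^{(i)}$$

• Quadratic model

$$M^{(i)}(\Delta\theta^{(i)}) = f_{\hat\theta^{(i)}} + g^{(i)T} \Delta\theta^{(i)} + \tfrac{1}{2} \Delta\theta^{(i)T} H^{(i)} \Delta\theta^{(i)}$$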

Page 17: Stepwise Logistic Regression - Lecture for Students /Faculty of Mathematics and Informatics

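Part I. Logistic Regression Model Development: Maximum Likelihood Estimator

Gradient

$$g = \left[ \frac{\partial f}{\partial\theta_0} \;\; \frac{\partial f}{\partial\theta_1} \;\; \cdots \;\; \frac{\partial f}{\partial\theta_n} \right]^T \in \mathbb{R}^{n+1}$$

Hessian

$$H = \begin{bmatrix}
\frac{\partial^2 f}{\partial\theta_0^2} & \frac{\partial^2 f}{\partial\theta_0 \partial\theta_1} & \cdots & \frac{\partial^2 f}{\partial\theta_0 \partial\theta_n} \\
\frac{\partial^2 f}{\partial\theta_1 \partial\theta_0} & \frac{\partial^2 f}{\partial\theta_1^2} & \cdots & \frac{\partial^2 f}{\partial\theta_1 \partial\theta_n} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial^2 f}{\partial\theta_n \partial\theta_0} & \frac{\partial^2 f}{\partial\theta_n \partial\theta_1} & \cdots & \frac{\partial^2 f}{\partial\theta_n^2}
\end{bmatrix} \in \mathbb{R}^{(n+1) \times (n+1)}$$

First-Order Methods (e.g. Steepest Descent)

$$\Delta\theta = -\alpha g$$

Second-Order Methods (e.g. Newton-Raphson)

$$\Delta\theta = -\alpha H^{-1} g$$

[Figure: two panels, one per method, with labels θ(0), 1, 2, θ*, θopt]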
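The second-order update can be sketched for the logistic model as follows. This is an illustrative implementation under stated assumptions, not the lecture's code: it uses the gradient and Hessian of f = -ln L for the logistic model (sums of (ŷ_k - y_k)φ_k and ŷ_k(1 - ŷ_k)φ_kφ_k^T), a fixed step α = 1, a zero starting point, a hand-written linear solver, and a small made-up data set that is not linearly separable.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(M[r][j] * x[j] for j in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def grad_hess(theta, Phi, y):
    """Gradient and Hessian of f = -ln L for the logistic model."""
    n = len(theta)
    g = [0.0] * n
    H = [[0.0] * n for _ in range(n)]
    for phi, yk in zip(Phi, y):
        m = sum(t * v for t, v in zip(theta, phi))
        p = 1.0 / (1.0 + math.exp(-m))
        for i in range(n):
            g[i] += (p - yk) * phi[i]
            for j in range(n):
                H[i][j] += p * (1.0 - p) * phi[i] * phi[j]
    return g, H

def newton_raphson(Phi, y, alpha=1.0, iters=25, tol=1e-10):
    """Iterate theta <- theta - alpha * H^{-1} g until the step is tiny."""
    theta = [0.0] * len(Phi[0])
    for _ in range(iters):
        g, H = grad_hess(theta, Phi, y)
        step = solve(H, g)  # H^{-1} g without forming the inverse explicitly
        theta = [t - alpha * s for t, s in zip(theta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return theta

# Hypothetical data: one variable, six individuals, overlapping classes.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [0, 0, 1, 0, 1, 1]
Phi = [[1.0, v] for v in x]  # regression vectors [1, x_k]
theta_hat = newton_raphson(Phi, y)
print(theta_hat)
```

Solving the linear system instead of inverting H is one common way to limit the numerical problems that come with matrix inversion.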

Page 18: Stepwise Logistic Regression - Lecture for Students /Faculty of Mathematics and Informatics

Part I. Logistic Regression Model Development: Maximum Likelihood Estimator

• Steepest Descent: $\Delta\theta = -\alpha g$
• Newton-Raphson (NR): $\Delta\theta = -\alpha H^{-1} g$
• NR with Line Search: $\Delta\theta = -\alpha^* H^{-1} g$
• NR with Quadratic Interpolation: $\Delta\theta = -\alpha^* H^{-1} g$

[Figure: four panels, one per method, each with labels θ(0), 1, 2, θ*, θopt]

Page 20: Stepwise Logistic Regression - Lecture for Students /Faculty of Mathematics and Informatics

Part I. Logistic Regression Model Development: Potential Problems

Numerical Problems

Matrix inversion in the update $\hat\theta^{(i+1)} = \hat\theta^{(i)} - \alpha H^{-1} g$, hence SVD, EVD, QR, etc.

Local Minima

[Figure: cost function -2 ln L]

Model Overfitting

[Figure: curves labeled $y_k$, $y_{1,k}$ and $y_{2,k}$ plotted against $k$]

Page 22: Stepwise Logistic Regression - Lecture for Students /Faculty of Mathematics and Informatics

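Part I. Logistic Regression Model Development: Frequently Used Statistics for Model Analysis

Individual Estimate Measures

• Standard error: $\sigma_{\hat\theta_i} = \sqrt{[\operatorname{diag}(H^{-1})]_i}$
• Wald statistic: $W_i = \hat\theta_i \, \sigma_{\hat\theta_i}^{-2} \, \hat\theta_i = \hat\theta_i^2 / \sigma_{\hat\theta_i}^2 \sim \chi^2_1$
• p-value: $\Pr(\chi^2_1 > W_i)$

[Figure: $\chi^2$ distribution with $W_i$ and the p-value marked]

Overall Model Measures

• Coefficient of determination ($R^2$)
  • generalized $R^2$: $R^2 = 1 - e^{\frac{2}{N} (\ln L_{\hat\theta_0} - \ln L_{\hat\theta})}$
  • generalized max-rescaled $R^2$: $R_m^2 = R^2 / R_s^2$, with $R_s^2 = 1 - e^{\frac{2}{N} \ln L_{\hat\theta_0}}$
• Cost function: $f_{\hat\theta^{(i)}} = -2 \ln L_{\hat\theta^{(i)}}$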
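A short sketch of the Wald statistic and its p-value follows. It assumes the statistic is referred to a chi-square distribution with one degree of freedom, for which the upper tail probability equals erfc(sqrt(W/2)); the function name and the sample estimate are illustrative.

```python
import math

def wald_test(theta_i, std_err_i):
    """Return (W_i, p-value) for one estimate and its standard error."""
    w = (theta_i / std_err_i) ** 2
    # Upper tail of the chi-square distribution with 1 degree of freedom:
    # P(chi2_1 > w) = P(|Z| > sqrt(w)) = erfc(sqrt(w / 2))
    p_value = math.erfc(math.sqrt(w / 2.0))
    return w, p_value

# Hypothetical estimate and standard error.
w, p = wald_test(0.8, 0.3)
print(w, p)
```

A small p-value suggests the parameter differs from zero, so the corresponding variable contributes to the model.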

Page 23: Stepwise Logistic Regression - Lecture for Students /Faculty of Mathematics and Informatics

Part I. Logistic Regression Model Development: Frequently Used Statistics for Model Analysis

Modified criteria

• Akaike Information Criterion (AIC): $AIC_{\hat\theta} = -2 \ln L_{\hat\theta} + 2n$
• Schwarz Criterion (SC): $SC_{\hat\theta} = -2 \ln L_{\hat\theta} + n \ln(N - 1)$
• Minimum Description Length (MDL), Final Prediction Error (FPE), etc.

[Figure: AIC and -2 ln L]

Model Validation

• Data split into development and validation samples
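The modified criteria add a complexity penalty to the cost function, which a few lines of Python can illustrate. This is a sketch under assumptions: `n` is the complexity count used in the penalty, and `n_eff` is the sample-size term inside the logarithm (the slide's formula appears to use N - 1, while many references use N). All numeric values are made up.

```python
import math

def aic(log_l, n):
    """Akaike Information Criterion: -2 ln L + 2 n."""
    return -2.0 * log_l + 2.0 * n

def sc(log_l, n, n_eff):
    """Schwarz Criterion: -2 ln L + n ln(n_eff)."""
    return -2.0 * log_l + n * math.log(n_eff)

# Hypothetical log-likelihoods of two nested models fitted to N = 200 individuals.
N = 200
small = {"log_l": -105.0, "n": 2}
large = {"log_l": -103.5, "n": 5}
for name, m in (("small", small), ("large", large)):
    print(name, aic(m["log_l"], m["n"]), sc(m["log_l"], m["n"], N - 1))
```

In this made-up case the larger model has the lower -2 ln L but the higher AIC and SC, so both criteria would prefer the smaller model.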

Page 26: Stepwise Logistic Regression - Lecture for Students /Faculty of Mathematics and Informatics

Part II. Stepwise Logistic Regression: Basic Idea

x_o, x_e – sets of all variables outside / entered in the model

x_oi, x_ei – the most significant variable outside the model / the least significant variable in the model

SLE – Significance Level to Enter

SLS – Significance Level to Stay

[Figure: SWR]

Page 27: Stepwise Logistic Regression - Lecture for Students /Faculty of Mathematics and Informatics

Part II. Stepwise Logistic Regression: Basic Idea

Available information

Page 28: Stepwise Logistic Regression - Lecture for Students /Faculty of Mathematics and Informatics

Part II. Stepwise Logistic Regression: Basic Idea

[Figure: Initialization; diagram label 1]

Page 29: Stepwise Logistic Regression - Lecture for Students /Faculty of Mathematics and Informatics

Part II. Stepwise Logistic Regression: Basic Idea

[Figure: Forward Selection; diagram labels 1, 2]

Page 30: Stepwise Logistic Regression - Lecture for Students /Faculty of Mathematics and Informatics

Part II. Stepwise Logistic Regression: Basic Idea

[Figure: Forward Selection; diagram labels 1, 2, 3]

Page 31: Stepwise Logistic Regression - Lecture for Students /Faculty of Mathematics and Informatics

Part II. Stepwise Logistic Regression: Basic Idea

[Figure: Backward Elimination; diagram labels 2, 3]

Page 33: Stepwise Logistic Regression - Lecture for Students /Faculty of Mathematics and Informatics

Part II. Stepwise Logistic Regression: Step 0. Initialization

Logistic model

$$y_k = \frac{1}{1 + e^{-\varphi_k^T \theta}}$$

1. Intercept Model: $\varphi_k = 1$, $\theta \in \mathbb{R}$

2. Full model: $\varphi_k = [1 \;\; x_{1,k} \;\; \dots \;\; x_{n,k}]^T$, $\theta \in \mathbb{R}^{n+1}$

3. One Factor Model

• Check for Enter
  • Score Chi-Square for all potential models: $S_i = g_i^T H_i^{-1} g_i$
  • Maximum Score Chi-Square: $\ell_1 = \arg\max_i S_i$
  • p-value & threshold: $\text{p-value}_{\ell_1} < \mathrm{SLE}$
• Model Determination (Optimization): $\varphi_k = [1 \;\; x_{\ell_1,k}]^T$, $\theta \in \mathbb{R}^2$
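The check for enter from the intercept model can be sketched in closed form. The code assumes the intercept-only fit ŷ_k = mean(y), evaluates the gradient and Hessian of -ln L for the model extended by one candidate (with the new coefficient set to zero), and refers S to a chi-square distribution with one degree of freedom. The function names, candidate variables and data are hypothetical.

```python
import math

def score_chi_square(x, y):
    """Score statistic S = g^T H^{-1} g for adding x to the intercept-only model.

    Requires 0 < mean(y) < 1 and a candidate x that is not constant.
    """
    N = len(y)
    p = sum(y) / N  # intercept-only fit: every y_hat equals the mean of y
    # Gradient of -ln L: the intercept component is zero at the fitted
    # intercept, so only the component for the candidate variable remains.
    g1 = sum((p - yk) * xk for xk, yk in zip(x, y))
    sx = sum(x)
    sxx = sum(v * v for v in x)
    w = p * (1.0 - p)
    # Diagonal element of H^{-1} for the candidate, H = w * [[N, sx], [sx, sxx]]
    h_inv = N / (w * (N * sxx - sx * sx))
    return g1 * g1 * h_inv

def p_value_chi2_1(s):
    """Upper tail probability of the chi-square distribution with 1 d.o.f."""
    return math.erfc(math.sqrt(s / 2.0))

def best_candidate(candidates, y, sle=0.05):
    """Pick the candidate with the maximum score and compare its p-value to SLE."""
    scores = {name: score_chi_square(x, y) for name, x in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores[best], p_value_chi2_1(scores[best]) < sle

# Hypothetical outcome and two hypothetical candidate variables.
y = [0, 0, 1, 0, 1, 1]
candidates = {
    "x1": [0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
    "x2": [1.0, 0.0, 1.0, 0.0, 1.0, 0.0],
}
print(best_candidate(candidates, y, sle=0.15))
```

With only six individuals the evidence is weak, so the example uses a loose SLE of 0.15; at SLE = 0.05 no variable would enter.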

Page 34: Stepwise Logistic Regression - Lecture for Students /Faculty of Mathematics and Informatics

Part II. Stepwise Logistic Regression: Step 1. Forward Selection

1. Check for Enter
  • Score Chi-Square of all potential models: $S_i = g_i^T H_i^{-1} g_i$
  • Maximum Score Chi-Square: $\ell_i = \arg\max_i S_i$
  • p-value & threshold: $\text{p-value}_{\ell_i} < \mathrm{SLE}$

2. Model Determination (Optimization): $\varphi_k = [1 \;\; x_{\ell_1,k} \;\; \dots \;\; x_{\ell_i,k}]^T$, $\theta \in \mathbb{R}^{i+1}$

3. Statistics for Model Analysis
  • Individual Estimate Measures
    • standard error
    • Wald statistic & p-value

Page 35: Stepwise Logistic Regression - Lecture for Students /Faculty of Mathematics and Informatics

Part II. Stepwise Logistic Regression: Step 1. Forward Selection

3. Statistics for Model Analysis (part 2)
  • Overall Model Measures
    • Coefficients of determination
    • Cost function
  • Modified criteria
    • Akaike Information Criterion (AIC)
    • Schwarz Criterion (SC)

Page 36: Stepwise Logistic Regression - Lecture for Students /Faculty of Mathematics and Informatics

Part II. Stepwise Logistic Regression

[Figure: SWR]

Page 37: Stepwise Logistic Regression - Lecture for Students /Faculty of Mathematics and Informatics

Part II. Stepwise Logistic Regression: Step 2. Backward Elimination

1. Check for Leave
  • Wald statistic & p-value of all potential models
  • p-value & threshold: $\max \text{p-value}_{\ell_i} > \mathrm{SLL}$ (SLL, the significance level to leave, plays the role of SLS)

2. Model Determination (Optimization): $\varphi_k = [1 \;\; x_{\ell_1,k} \;\; \dots \;\; x_{\ell_{j-1},k} \;\; x_{\ell_{j+1},k} \;\; \dots \;\; x_{\ell_i,k}]^T$, $\theta \in \mathbb{R}^{i}$

3. Statistics for Model Analysis
  • Individual Estimate Measures
    • standard error
    • Wald statistic & p-value
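The check for leave can be sketched as a search for the entered variable with the largest Wald p-value. The code assumes the estimates and standard errors of the current model are already available, that the intercept is never a candidate for removal, and that each Wald statistic has one degree of freedom. The function name, variable names and numbers are hypothetical.

```python
import math

def variable_to_remove(estimates, std_errors, sls=0.05):
    """Return the least significant variable if its p-value exceeds sls, else None."""
    worst, worst_p = None, -1.0
    for name, theta in estimates.items():
        w = (theta / std_errors[name]) ** 2       # Wald statistic
        p = math.erfc(math.sqrt(w / 2.0))         # P(chi2_1 > w)
        if p > worst_p:
            worst, worst_p = name, p
    return worst if worst_p > sls else None

# Hypothetical estimates for two entered variables (intercept excluded).
estimates = {"x1": 2.0, "x2": 0.3}
std_errors = {"x1": 0.5, "x2": 0.4}
print(variable_to_remove(estimates, std_errors))  # x2 is the weakest variable
```

After a removal the reduced model would be re-estimated before the next check, as the model determination step indicates.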

Page 38: Stepwise Logistic Regression - Lecture for Students /Faculty of Mathematics and Informatics

Part II. Stepwise Logistic Regression: Step 2. Backward Elimination

3. Statistics for Model Analysis (part 2)
  • Overall Model Measures
    • Coefficients of determination
    • Cost function
  • Modified criteria
    • Akaike Information Criterion (AIC)
    • Schwarz Criterion (SC)

Page 40: Stepwise Logistic Regression - Lecture for Students /Faculty of Mathematics and Informatics

Part II. Stepwise Logistic Regression: Potential Problems in the Stepwise Regression

Local Minima & Initial Conditions

Numerical Problems (SVD, EVD, QR, etc.)

Model Overfitting

Page 41: Stepwise Logistic Regression - Lecture for Students /Faculty of Mathematics and Informatics

Summary

Introduction
• Applications of the Logistic Regression
• System Identification & Stepwise Regression

Part I. Logistic Regression Model Development
• Logistic Model
• Maximum Likelihood Estimator
• Potential Problems
• Model Analysis and Validation

Part II. Stepwise Logistic Regression (SWR)
• Basic Idea
• SWR Algorithm
• Potential Problems

Page 42: Stepwise Logistic Regression - Lecture for Students /Faculty of Mathematics and Informatics

Stepwise Logistic Regression: Lecture for FMI Students, 27.05.2010

Alexander Efremov

Thank You!

http://anp.tu-sofia.bg/aefremov/index.htm