CS 59000 Statistical Machine Learning
Lecture 10
Yuan (Alan) Qi
Purdue CS
Sept. 25, 2008
Outline
• Review of Fisher’s linear discriminant, perceptron, and probabilistic generative models
• Probabilistic discriminative models: logistic regression, probit regression
Fisher Linear Discriminant
Within Class and Between Class Scatter Matrices
Generalized eigenvalue problem
Fisher’s Linear Discriminant
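A minimal two-class sketch of the ideas on these slides, assuming the standard definitions: within-class scatter S_W as the sum of per-class scatter matrices, between-class scatter S_B = (m2 - m1)(m2 - m1)^T, and the Fisher direction w proportional to S_W^{-1}(m2 - m1). The function names and the synthetic data are illustrative, not from the lecture.

```python
import numpy as np

def scatter_matrices(X1, X2):
    """Return within-class scatter S_W, between-class scatter S_B, and class means."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
    d = (m2 - m1)[:, None]
    S_B = d @ d.T
    return S_W, S_B, m1, m2

def fisher_direction(X1, X2):
    """Unit-norm direction w proportional to S_W^{-1} (m2 - m1)."""
    S_W, _, m1, m2 = scatter_matrices(X1, X2)
    w = np.linalg.solve(S_W, m2 - m1)
    return w / np.linalg.norm(w)

def fisher_criterion(w, X1, X2):
    """J(w) = (w^T S_B w) / (w^T S_W w)."""
    S_W, S_B, _, _ = scatter_matrices(X1, X2)
    return (w @ S_B @ w) / (w @ S_W @ w)

# Synthetic, correlated two-class data (illustrative only).
rng = np.random.default_rng(0)
cov = np.array([[1.0, 0.8], [0.8, 1.0]])
X1 = rng.multivariate_normal([0.0, 0.0], cov, size=200)
X2 = rng.multivariate_normal([2.0, 0.0], cov, size=200)

w = fisher_direction(X1, X2)
J = fisher_criterion(w, X1, X2)
```

In this sketch w satisfies S_B w = J(w) S_W w, so it is a solution of the generalized eigenvalue problem with eigenvalue J(w).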
Generalized Linear Model
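Minimize the perceptron criterion $E_P(\mathbf{w}) = -\sum_{n \in \mathcal{M}} \mathbf{w}^T \phi_n t_n$, with targets $t_n \in \{-1, +1\}$,
where $\mathcal{M}$ denotes the set of all misclassified patterns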
Stochastic Gradient Descent
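A short sketch of stochastic gradient descent on the perceptron criterion, assuming targets coded as -1/+1 and the standard update w <- w + eta * phi_n * t_n applied to each misclassified pattern. The toy data, the learning rate `eta`, and the epoch cap are illustrative choices.

```python
import numpy as np

def perceptron_sgd(Phi, t, eta=1.0, max_epochs=100):
    """Perceptron learning. Phi: (N, D) features incl. bias column; t: labels in {-1, +1}."""
    w = np.zeros(Phi.shape[1])
    for _ in range(max_epochs):
        mistakes = 0
        for phi_n, t_n in zip(Phi, t):
            if t_n * (w @ phi_n) <= 0:       # pattern n is misclassified
                w += eta * phi_n * t_n       # gradient step on -w^T phi_n t_n
                mistakes += 1
        if mistakes == 0:                    # every pattern is on the correct side
            break
    return w

# Linearly separable toy data (illustrative only).
X = np.array([[2.0, 1.0], [3.0, 2.0], [2.5, 3.0],
              [-1.0, -2.0], [-2.0, -1.0], [-3.0, -2.5]])
t = np.array([1, 1, 1, -1, -1, -1])
Phi = np.hstack([np.ones((len(X), 1)), X])   # prepend a bias feature

w = perceptron_sgd(Phi, t)
```

The loop is only guaranteed to stop early when the data are linearly separable; otherwise it runs until `max_epochs`.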
Probabilistic Generative Models
Gaussian Class-Conditional Densities
Conditional densities of data:
The posterior distribution for label/class:
Maximum Likelihood Estimation
Linked to Fisher’s linear discriminant
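A sketch of the two-class generative model, assuming Gaussian class-conditional densities with a shared covariance matrix. Under that assumption the maximum likelihood estimates are the class fraction, the class means, and the weighted average of the per-class covariances, and the posterior is a logistic sigmoid of a linear function of x. Function names and data are illustrative.

```python
import numpy as np

def fit_shared_cov_gaussians(X, t):
    """ML estimates for two Gaussians with a shared covariance; t == 1 marks class C1."""
    X1, X2 = X[t == 1], X[t == 0]
    N1, N2 = len(X1), len(X2)
    pi = N1 / (N1 + N2)                       # prior p(C1)
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    S1 = (X1 - mu1).T @ (X1 - mu1) / N1
    S2 = (X2 - mu2).T @ (X2 - mu2) / N2
    Sigma = (N1 * S1 + N2 * S2) / (N1 + N2)   # weighted average of class covariances
    return pi, mu1, mu2, Sigma

def posterior_c1(x, pi, mu1, mu2, Sigma):
    """p(C1 | x) = sigmoid(w^T x + w0) for the shared-covariance model."""
    w = np.linalg.solve(Sigma, mu1 - mu2)
    w0 = (-0.5 * mu1 @ np.linalg.solve(Sigma, mu1)
          + 0.5 * mu2 @ np.linalg.solve(Sigma, mu2)
          + np.log(pi / (1.0 - pi)))
    return 1.0 / (1.0 + np.exp(-(x @ w + w0)))

# Synthetic data (illustrative only): 150 points in C1, 100 in C2.
rng = np.random.default_rng(0)
cov = np.array([[1.0, 0.3], [0.3, 1.0]])
X = np.vstack([rng.multivariate_normal([1.0, 1.0], cov, size=150),
               rng.multivariate_normal([-1.0, 0.0], cov, size=100)])
t = np.concatenate([np.ones(150, dtype=int), np.zeros(100, dtype=int)])

pi, mu1, mu2, Sigma = fit_shared_cov_gaussians(X, t)
p = posterior_c1(np.array([1.0, 0.5]), pi, mu1, mu2, Sigma)
```

Since N * Sigma equals the within-class scatter matrix, the weight vector here is parallel to the Fisher direction S_W^{-1}(m1 - m2), which is the link to Fisher's linear discriminant.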
Discrete features
Naïve Bayes classification:
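A small sketch of naive Bayes for binary features, assuming each feature is an independent Bernoulli variable given the class, so that ln p(x, C_k) is linear in x. The add-alpha smoothing and the tiny data set are assumptions added for illustration.

```python
import numpy as np

def fit_bernoulli_nb(X, y, n_classes, alpha=1.0):
    """Estimate class priors and mu[k, i] = p(x_i = 1 | C_k) with add-alpha smoothing."""
    priors = np.array([(y == k).mean() for k in range(n_classes)])
    mu = np.array([(X[y == k].sum(axis=0) + alpha) / ((y == k).sum() + 2.0 * alpha)
                   for k in range(n_classes)])
    return priors, mu

def log_joint(x, priors, mu):
    """ln p(x, C_k) = sum_i [x_i ln mu_ki + (1 - x_i) ln(1 - mu_ki)] + ln p(C_k)."""
    return x @ np.log(mu).T + (1 - x) @ np.log(1.0 - mu).T + np.log(priors)

# Tiny binary data set (illustrative only).
X = np.array([[1, 1, 0], [1, 0, 0], [1, 1, 1],
              [0, 0, 1], [0, 1, 1], [0, 0, 1]])
y = np.array([0, 0, 0, 1, 1, 1])

priors, mu = fit_bernoulli_nb(X, y, n_classes=2)
pred_a = int(np.argmax(log_joint(np.array([1, 1, 0]), priors, mu)))
pred_b = int(np.argmax(log_joint(np.array([0, 0, 1]), priors, mu)))
```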
Probabilistic Discriminative Models
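Instead of modeling the class-conditional densities $p(\mathbf{x}|C_k)$ and priors $p(C_k)$,
model the posterior $p(C_k|\mathbf{x})$ directly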
Generative vs. Conditional Models
Discussion
Logistic Regression
Let
Likelihood function
Maximum Likelihood Estimation
Note that
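A sketch of the maximum likelihood setup, assuming the usual formulation: y_n = sigmoid(w^T phi_n), the cross-entropy error E(w) = -sum_n [t_n ln y_n + (1 - t_n) ln(1 - y_n)], and, using d(sigmoid)/da = sigmoid (1 - sigmoid), the gradient sum_n (y_n - t_n) phi_n. The code checks the analytic gradient against finite differences on random data; all names are illustrative.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def cross_entropy(w, Phi, t):
    """Negative log likelihood of targets t in {0, 1} under the logistic model."""
    y = sigmoid(Phi @ w)
    return -np.sum(t * np.log(y) + (1 - t) * np.log(1 - y))

def cross_entropy_grad(w, Phi, t):
    """Analytic gradient: sum_n (y_n - t_n) phi_n = Phi^T (y - t)."""
    return Phi.T @ (sigmoid(Phi @ w) - t)

# Random data (illustrative only).
rng = np.random.default_rng(0)
Phi = np.hstack([np.ones((20, 1)), rng.normal(size=(20, 2))])
t = rng.integers(0, 2, size=20).astype(float)
w = rng.normal(size=3)

# Central finite differences as an independent check of the analytic gradient.
h = 1e-6
numeric = np.array([
    (cross_entropy(w + h * e, Phi, t) - cross_entropy(w - h * e, Phi, t)) / (2 * h)
    for e in np.eye(3)
])
analytic = cross_entropy_grad(w, Phi, t)
```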
Newton-Raphson Optimization for Linear Regression
Let H denote the Hessian matrix
It converges in one iteration for linear regression.
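A quick numerical check of the one-iteration claim, assuming the sum-of-squares error E(w) = 0.5 * ||Phi w - t||^2, whose gradient is Phi^T (Phi w - t) and whose Hessian is Phi^T Phi. Because the error is quadratic, a single Newton-Raphson step from any starting point lands on the least-squares solution. The data and starting point are illustrative.

```python
import numpy as np

# Synthetic regression data (illustrative only).
rng = np.random.default_rng(1)
Phi = np.hstack([np.ones((50, 1)), rng.normal(size=(50, 2))])
t = Phi @ np.array([0.5, -1.0, 2.0]) + 0.1 * rng.normal(size=50)

w_old = rng.normal(size=3)                   # arbitrary starting point

grad = Phi.T @ (Phi @ w_old - t)             # gradient of the sum-of-squares error
H = Phi.T @ Phi                              # Hessian, independent of w
w_new = w_old - np.linalg.solve(H, grad)     # one Newton-Raphson step

# Reference: the ordinary least-squares solution.
w_ls = np.linalg.lstsq(Phi, t, rcond=None)[0]
```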
Newton-Raphson Optimization for Logistic Regression
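Gradient and Hessian of the error function: $\nabla E(\mathbf{w}) = \Phi^T(\mathbf{y} - \mathbf{t})$ and $\mathbf{H} = \Phi^T \mathbf{R} \Phi$, where $\mathbf{R}$ is diagonal with $R_{nn} = y_n(1 - y_n)$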
Iterative reweighted least squares (IRLS): solving a series of weighted least-squares problems
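A minimal IRLS sketch for logistic regression, assuming the standard form of the update: with y = sigmoid(Phi w) and weights R_nn = y_n (1 - y_n), each iteration solves the weighted least-squares problem (Phi^T R Phi) w = Phi^T R z with working targets z = Phi w - R^{-1}(y - t). This is algebraically the same as a Newton-Raphson step. Data, iteration cap, and tolerance are illustrative, and the sketch assumes the classes overlap (for separable data the weights diverge).

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def irls_logistic(Phi, t, n_iter=25, tol=1e-10):
    """Fit logistic regression by iterative reweighted least squares."""
    w = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        y = sigmoid(Phi @ w)
        R = y * (1.0 - y)                         # diagonal of the weighting matrix
        z = Phi @ w - (y - t) / R                 # working targets
        w_new = np.linalg.solve(Phi.T @ (Phi * R[:, None]), Phi.T @ (R * z))
        converged = np.linalg.norm(w_new - w) < tol
        w = w_new
        if converged:
            break
    return w

# Synthetic overlapping classes drawn from a logistic model (illustrative only).
rng = np.random.default_rng(0)
Phi = np.hstack([np.ones((200, 1)), rng.normal(size=(200, 2))])
w_true = np.array([-0.5, 1.0, -1.5])
t = (rng.random(200) < sigmoid(Phi @ w_true)).astype(float)

w_hat = irls_logistic(Phi, t)
grad_at_solution = Phi.T @ (sigmoid(Phi @ w_hat) - t)
```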
Other discriminative models
Generative models <-> Logistic regression
How about other discriminative functions?
Probit Regression
Probit function: $\Phi(a) = \int_{-\infty}^{a} \mathcal{N}(\theta|0,1)\,d\theta$
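A short sketch of the probit activation, assuming it is the standard normal cumulative distribution function, which can be written with the error function as 0.5 * (1 + erf(a / sqrt(2))). The comparison with the logistic sigmoid illustrates that the probit tails decay much faster. Function names are illustrative.

```python
import math

def probit(a):
    """Standard normal CDF evaluated at a."""
    return 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))

def logistic(a):
    """Logistic sigmoid evaluated at a."""
    return 1.0 / (1.0 + math.exp(-a))

# Both activations pass through 0.5 at a = 0, but their tails differ:
# the probit tail decays like exp(-a^2 / 2), the logistic tail like exp(-|a|).
tail_probit = probit(-4.0)
tail_logistic = logistic(-4.0)
```

The faster tail decay means a point far on the wrong side of the decision boundary is penalized more heavily under the probit model than under the logistic model.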
Labeling Noise Model
Robust to outliers and labeling errors
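A sketch of one common labeling noise model, assuming each target is flipped with probability eps, so that p(t = 1 | x) = (1 - eps) * s + eps * (1 - s) = eps + (1 - 2 eps) * s, where s is the noise-free activation output. The parameter name `eps` and the default activation are illustrative assumptions.

```python
import math

def logistic(a):
    return 1.0 / (1.0 + math.exp(-a))

def noisy_target_prob(a, eps, activation=logistic):
    """p(t = 1 | x) when the true label is flipped with probability eps."""
    s = activation(a)
    return (1.0 - eps) * s + eps * (1.0 - s)

# Far on the wrong side of the boundary, the noise-free likelihood of t = 1
# is tiny, while the noisy model never assigns less than eps.
clean = noisy_target_prob(-50.0, 0.0)
robust = noisy_target_prob(-50.0, 0.1)
```

Because the likelihood of any label is bounded below by eps, a single mislabeled point cannot dominate the log likelihood.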
Generalized Linear Models
Generalized linear model: $y = f(\mathbf{w}^T \phi)$
Activation function: $f(\cdot)$
Link function: $f^{-1}(\cdot)$
Canonical Link Function
If we choose the canonical link function:
Gradient of the error function:
Laplace Approximation for Posterior
Gaussian approximation around mode:
Illustration of Laplace Approximation
Evidence Approximation
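A one-dimensional sketch of the Laplace approximation, assuming the usual recipe: find the mode z0 of an unnormalized density f(z), set the precision A to the negative second derivative of ln f at z0, approximate the density by a Gaussian with mean z0 and variance 1/A, and approximate the normalizer by Z ~= f(z0) * sqrt(2 pi / A). The density f(z) = exp(-z^2 / 2) * sigmoid(20 z + 4) is an illustrative choice.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def f(z):
    """Unnormalized density exp(-z^2 / 2) * sigmoid(20 z + 4)."""
    return math.exp(-0.5 * z * z) * sigmoid(20.0 * z + 4.0)

def dlogf(z):
    """First derivative of ln f(z)."""
    return -z + 20.0 * (1.0 - sigmoid(20.0 * z + 4.0))

def neg_d2logf(z):
    """Negative second derivative of ln f(z)."""
    s = sigmoid(20.0 * z + 4.0)
    return 1.0 + 400.0 * s * (1.0 - s)

# dlogf is strictly decreasing, positive at 0 and negative at 1,
# so bisection on [0, 1] finds the unique mode.
lo, hi = 0.0, 1.0
for _ in range(100):
    mid = 0.5 * (lo + hi)
    if dlogf(mid) > 0.0:
        lo = mid
    else:
        hi = mid
z0 = 0.5 * (lo + hi)

A = neg_d2logf(z0)                                # precision of the Gaussian
Z_laplace = f(z0) * math.sqrt(2.0 * math.pi / A)  # approximate normalizer

def q(z):
    """Laplace (Gaussian) approximation N(z | z0, 1 / A)."""
    return math.sqrt(A / (2.0 * math.pi)) * math.exp(-0.5 * A * (z - z0) ** 2)
```

The approximation matches the density only near the mode, so for a skewed f(z) like this one the estimate of Z can be noticeably off.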
Bayesian Information Criterion
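A further approximation of the Laplace approximation: $\ln p(D) \simeq \ln p(D|\theta_{MAP}) - \frac{1}{2} M \ln N$, where $M$ is the number of parameters and $N$ is the number of data points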
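A tiny sketch of the BIC score, assuming the form ln p(D) ~= ln p(D | theta_MAP) - 0.5 * M * ln N with M parameters and N data points. The function name and the log-likelihood values in the example are hypothetical, chosen only to show how the penalty trades off against fit.

```python
import math

def bic_log_evidence(log_lik_map, n_params, n_data):
    """BIC approximation to the log evidence: log likelihood minus a complexity penalty."""
    return log_lik_map - 0.5 * n_params * math.log(n_data)

# Hypothetical comparison: the larger model fits slightly better
# but pays a larger penalty for its extra parameters.
small_model = bic_log_evidence(log_lik_map=-100.0, n_params=3, n_data=100)
large_model = bic_log_evidence(log_lik_map=-98.0, n_params=10, n_data=100)
```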
More accurate evidence approximation needed
Bayesian Logistic Regression