RECITATION 2 - APRIL 28

• Spline and Kernel Methods
• Gaussian Processes
• Mixture Modeling for Density Estimation
Penalized Cubic Regression Splines
• gam() in library "mgcv"
• gam(y ~ s(x, bs="cr", k=n.knots), knots=list(x=c(…)), data=dataset)
• By default, the optimal smoothing parameter is selected by GCV
• R Demo 1
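The original R demo is not reproduced in the transcript; a minimal sketch of the gam() call above, using simulated data (the data and k = 10 are illustrative choices, not from the slides):

```r
library(mgcv)

set.seed(1)
x <- seq(0, 1, length.out = 200)
y <- sin(2 * pi * x) + rnorm(200, sd = 0.3)
dataset <- data.frame(x = x, y = y)

# Penalized cubic regression spline; the smoothing parameter is chosen
# by GCV by default (mgcv's "GCV.Cp" method for gam()).
fit <- gam(y ~ s(x, bs = "cr", k = 10), data = dataset)
summary(fit)
plot(fit)  # fitted smooth with a confidence band
```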
Kernel Method
• Nadaraya-Watson: locally constant model
• Locally linear polynomial model
• How to define "local"? By a kernel function, e.g. the Gaussian kernel
• R Demo 1
• R package: "locfit"
• Function: locfit(y ~ x, kern="gauss", deg= , alpha= )
• Bandwidth selected by GCV: gcvplot(y ~ x, kern="gauss", deg= , alpha= bandwidth range)
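A sketch of the locfit usage above on simulated data (the data and bandwidth grid are illustrative; the slides' argument style is the older one — newer locfit versions prefer wrapping the smoothing arguments in lp()):

```r
library(locfit)

set.seed(1)
x <- runif(200)
y <- sin(2 * pi * x) + rnorm(200, sd = 0.3)

# deg = 0 gives the Nadaraya-Watson (locally constant) fit;
# deg = 1 gives the locally linear fit.
fit <- locfit(y ~ x, kern = "gauss", deg = 1, alpha = 0.5)
plot(fit)

# GCV over a grid of bandwidths (alpha = nearest-neighbor fraction)
gcvplot(y ~ x, kern = "gauss", deg = 1,
        alpha = seq(0.2, 0.8, by = 0.05))
```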
Gaussian Processes
• Distribution over functions
• f ~ GP(m, κ)
• m: mean function
• κ: covariance function
• (f(x1), . . . , f(xn)) ~ N_n(μ, K)
• μ = [m(x1), ..., m(xn)]
• K_ij = κ(xi, xj)
• Idea: if xi and xj are similar according to the kernel, then f(xi) is similar to f(xj)
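The "distribution over functions" idea can be sketched by drawing sample paths from a GP prior; the squared-exponential kernel and the length-scale value here are illustrative choices, not taken from the slides:

```r
set.seed(1)
x <- seq(0, 5, length.out = 100)

# squared-exponential kernel: nearby x's get highly correlated f values
sqexp <- function(x1, x2, ell = 1) exp(-(outer(x1, x2, "-")^2) / (2 * ell^2))

K <- sqexp(x, x) + 1e-8 * diag(length(x))  # small jitter for stability
L <- chol(K)                               # R's chol: K = t(L) %*% L
f1 <- t(L) %*% rnorm(length(x))            # one draw: f ~ N(0, K)
f2 <- t(L) %*% rnorm(length(x))            # another draw

matplot(x, cbind(f1, f2), type = "l", ylab = "f(x)",
        main = "Draws from a GP prior")
```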
Gaussian Processes – Noise-free observations
• Example task: learn a function f(x) to estimate y from data (x, y)
• A function can be viewed as a random variable of infinite dimension
• A GP provides a distribution over functions
Gaussian Processes – Noise-free observations
• Model
• (x, f) are the observed locations and values (training data)
• (x*, f*) are the test or prediction data locations and values
• After observing noise-free data (x, f), the conditional distribution of f* given f is again Gaussian
• Length-scale: kernel parameter controlling how quickly correlation decays with distance
• R Demo 2
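The slide's conditional-distribution formula did not survive extraction; the standard noise-free GP predictive (as in Chapter 2 of the Rasmussen and Williams reference cited below) is:

```latex
f_* \mid x_*, x, f \;\sim\; \mathcal{N}\!\Big(
  K(x_*, x)\,K(x, x)^{-1} f,\;
  K(x_*, x_*) - K(x_*, x)\,K(x, x)^{-1} K(x, x_*)
\Big)
```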
Gaussian Processes – Noisy observations (GP for Regression)
• Model
• (x, y) are the observed locations and noisy values (training data)
• (x*, f*) are the test or prediction data locations and values
• After observing noisy data (x, y), the conditional distribution of f* given y is again Gaussian
• R Demo 3
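The original R Demo 3 is not in the transcript; a self-contained sketch of GP regression with noise, y = f(x) + ε, ε ~ N(0, σ²) — the data, length-scale, and noise level are illustrative assumptions:

```r
set.seed(2)
sqexp <- function(x1, x2, ell = 1) exp(-(outer(x1, x2, "-")^2) / (2 * ell^2))

x  <- runif(30, 0, 5)                 # training locations
y  <- sin(x) + rnorm(30, sd = 0.2)    # noisy observations
xs <- seq(0, 5, length.out = 200)     # test locations x*

sigma2 <- 0.2^2
Kxx <- sqexp(x, x) + sigma2 * diag(length(x))   # K(x,x) + sigma^2 I
Ksx <- sqexp(xs, x)                             # K(x*, x)
Kss <- sqexp(xs, xs)                            # K(x*, x*)

# posterior mean and variance of f* given the noisy data
A <- solve(Kxx)
post.mean <- Ksx %*% A %*% y
post.var  <- diag(Kss - Ksx %*% A %*% t(Ksx))

plot(x, y)
lines(xs, post.mean, col = "blue")
lines(xs, post.mean + 2 * sqrt(post.var), lty = 2)  # ~95% band
lines(xs, post.mean - 2 * sqrt(post.var), lty = 2)
```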
Reference
• Chapter 2 of Gaussian Processes for Machine Learning, Carl Edward Rasmussen and Christopher K. I. Williams
• 527 lecture notes by Emily Fox
Mixture Models – Density Estimation
• EM algorithm vs. Bayesian Markov chain Monte Carlo (MCMC)
• Remember:
• EM algorithm = iterative algorithm that MAXIMIZES the LIKELIHOOD
• MCMC DRAWS FROM the POSTERIOR (i.e. likelihood × prior, up to normalization)
EM algorithm
• Iterative procedure that attempts to maximize the log-likelihood ---> MLE estimates of the mixture model parameters
• I.e. one final density estimate
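A minimal EM sketch for a two-component Gaussian mixture (the data and starting values are illustrative); it alternates an E-step (responsibilities) and an M-step (weighted MLE updates) and returns one final set of parameter estimates:

```r
set.seed(3)
y <- c(rnorm(150, -2, 1), rnorm(100, 3, 0.8))

pi1 <- 0.5; mu <- c(-1, 1); s <- c(1, 1)   # starting values
for (iter in 1:100) {
  # E-step: posterior responsibility of component 1 for each point
  d1 <- pi1 * dnorm(y, mu[1], s[1])
  d2 <- (1 - pi1) * dnorm(y, mu[2], s[2])
  g  <- d1 / (d1 + d2)
  # M-step: weighted maximum-likelihood updates
  pi1   <- mean(g)
  mu[1] <- sum(g * y) / sum(g)
  mu[2] <- sum((1 - g) * y) / sum(1 - g)
  s[1]  <- sqrt(sum(g * (y - mu[1])^2) / sum(g))
  s[2]  <- sqrt(sum((1 - g) * (y - mu[2])^2) / sum(1 - g))
}

# one final density estimate from the converged parameters
dens <- function(t) pi1 * dnorm(t, mu[1], s[1]) + (1 - pi1) * dnorm(t, mu[2], s[2])
curve(dens, -6, 7); rug(y)
```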
Bayesian Mixture Modeling (MCMC)
• Uses an iterative procedure to DRAW SAMPLES from the posterior (then you can average the draws, etc.)
• You don't need to understand the fine details, but know that at every iteration you get a set of parameter estimates drawn from your posterior distribution
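To make "one posterior draw per iteration" concrete, here is a Gibbs-sampling sketch for the same two-component mixture, with component variances fixed at 1 to keep it short; the priors (Beta(1,1) on the weight, N(0, 10²) on the means) and all settings are illustrative assumptions, not from the slides:

```r
set.seed(4)
y <- c(rnorm(150, -2, 1), rnorm(100, 3, 1)); n <- length(y)

mu <- c(-1, 1); p <- 0.5
draws <- matrix(NA, 500, 3, dimnames = list(NULL, c("p", "mu1", "mu2")))
for (iter in 1:500) {
  # 1. sample each point's component label given current parameters
  w1 <- p * dnorm(y, mu[1], 1)
  w2 <- (1 - p) * dnorm(y, mu[2], 1)
  z  <- rbinom(n, 1, w1 / (w1 + w2))          # 1 = component 1
  # 2. sample the mixture weight: Beta(1,1) prior -> Beta posterior
  p  <- rbeta(1, 1 + sum(z), 1 + n - sum(z))
  # 3. sample each mean: N(0, 10^2) prior -> normal posterior
  for (k in 1:2) {
    yk <- if (k == 1) y[z == 1] else y[z == 0]
    v  <- 1 / (length(yk) + 1 / 100)
    mu[k] <- rnorm(1, v * sum(yk), sqrt(v))
  }
  draws[iter, ] <- c(p, mu)                   # one posterior draw per iteration
}
colMeans(draws[-(1:100), ])                   # posterior means after burn-in
```

Unlike EM, the output is a collection of draws rather than a single estimate; averaging them (after burn-in) gives posterior means, and averaging the implied densities gives a posterior density estimate.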