RECITATION 2 - APRIL 28

• Spline and Kernel Methods
• Gaussian Processes
• Mixture Modeling for Density Estimation
Penalized Cubic Regression Splines
• gam() in library "mgcv"
• gam(y ~ s(x, bs="cr", k=n.knots), knots=list(x=c(…)), data=dataset)
• By default, the optimal smoothing parameter is selected by GCV
• R Demo 1
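The original R demo is not reproduced in the transcript; a minimal sketch of the gam() call above, using simulated data (the data and k = 10 are illustrative choices, not from the slides):

```r
library(mgcv)

set.seed(1)
x <- seq(0, 1, length.out = 200)
y <- sin(2 * pi * x) + rnorm(200, sd = 0.3)
dataset <- data.frame(x = x, y = y)

# Penalized cubic regression spline; the smoothing parameter is chosen
# by GCV by default (mgcv's "GCV.Cp" method for gam()).
fit <- gam(y ~ s(x, bs = "cr", k = 10), data = dataset)
summary(fit)
plot(fit)  # fitted smooth with a confidence band
```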
Kernel Method
• Nadaraya-Watson: locally constant model
• Locally linear polynomial model
• How to define "local"? By a kernel function, e.g. the Gaussian kernel
• R Demo 1
• R package: "locfit"
• Function: locfit(y ~ x, kern="gauss", deg= , alpha= )
• Bandwidth selected by GCV: gcvplot(y ~ x, kern="gauss", deg= , alpha= bandwidth range)
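A sketch of the locfit usage above on simulated data (the data and bandwidth grid are illustrative; the slides' argument style is the older one — newer locfit versions prefer wrapping the smoothing arguments in lp()):

```r
library(locfit)

set.seed(1)
x <- runif(200)
y <- sin(2 * pi * x) + rnorm(200, sd = 0.3)

# deg = 0 gives the Nadaraya-Watson (locally constant) fit;
# deg = 1 gives the locally linear fit.
fit <- locfit(y ~ x, kern = "gauss", deg = 1, alpha = 0.5)
plot(fit)

# GCV over a grid of bandwidths (alpha = nearest-neighbor fraction)
gcvplot(y ~ x, kern = "gauss", deg = 1,
        alpha = seq(0.2, 0.8, by = 0.05))
```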
Gaussian Processes
• Distribution over functions
• f ~ GP(m, κ)
• m: mean function
• κ: covariance function
• (f(x1), . . . , f(xn)) ~ N_n(μ, K)
• μ = [m(x1), ..., m(xn)]
• K_ij = κ(xi, xj)
• Idea: if xi and xj are similar according to the kernel, then f(xi) is similar to f(xj)
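The "distribution over functions" idea can be sketched by drawing sample paths from a GP prior; the squared-exponential kernel and the length-scale value here are illustrative choices, not taken from the slides:

```r
set.seed(1)
x <- seq(0, 5, length.out = 100)

# squared-exponential kernel: nearby x's get highly correlated f values
sqexp <- function(x1, x2, ell = 1) exp(-(outer(x1, x2, "-")^2) / (2 * ell^2))

K <- sqexp(x, x) + 1e-8 * diag(length(x))  # small jitter for stability
L <- chol(K)                               # R's chol: K = t(L) %*% L
f1 <- t(L) %*% rnorm(length(x))            # one draw: f ~ N(0, K)
f2 <- t(L) %*% rnorm(length(x))            # another draw

matplot(x, cbind(f1, f2), type = "l", ylab = "f(x)",
        main = "Draws from a GP prior")
```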
Gaussian Processes – Noise-free observations
• Example task: learn a function f(x) to estimate y from data (x, y)
• A function can be viewed as a random variable of infinite dimension
• A GP provides a distribution over functions
Gaussian Processes – Noise-free observations
• Model
• (x, f) are the observed locations and values (training data)
• (x*, f*) are the test or prediction data locations and values
• After observing noise-free data (x, f), the conditional distribution of f* given f is again Gaussian
• Length-scale: kernel parameter controlling how quickly correlation decays with distance
• R Demo 2
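The slide's conditional-distribution formula did not survive extraction; the standard noise-free GP predictive (as in Chapter 2 of the Rasmussen and Williams reference cited below) is:

```latex
f_* \mid x_*, x, f \;\sim\; \mathcal{N}\!\Big(
  K(x_*, x)\,K(x, x)^{-1} f,\;
  K(x_*, x_*) - K(x_*, x)\,K(x, x)^{-1} K(x, x_*)
\Big)
```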
Gaussian Processes – Noisy observations (GP for Regression)
• Model
• (x, y) are the observed locations and noisy values (training data)
• (x*, f*) are the test or prediction data locations and values
• After observing noisy data (x, y), the conditional distribution of f* given y is again Gaussian
• R Demo 3
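The original R Demo 3 is not in the transcript; a self-contained sketch of GP regression with noise, y = f(x) + ε, ε ~ N(0, σ²) — the data, length-scale, and noise level are illustrative assumptions:

```r
set.seed(2)
sqexp <- function(x1, x2, ell = 1) exp(-(outer(x1, x2, "-")^2) / (2 * ell^2))

x  <- runif(30, 0, 5)                 # training locations
y  <- sin(x) + rnorm(30, sd = 0.2)    # noisy observations
xs <- seq(0, 5, length.out = 200)     # test locations x*

sigma2 <- 0.2^2
Kxx <- sqexp(x, x) + sigma2 * diag(length(x))   # K(x,x) + sigma^2 I
Ksx <- sqexp(xs, x)                             # K(x*, x)
Kss <- sqexp(xs, xs)                            # K(x*, x*)

# posterior mean and variance of f* given the noisy data
A <- solve(Kxx)
post.mean <- Ksx %*% A %*% y
post.var  <- diag(Kss - Ksx %*% A %*% t(Ksx))

plot(x, y)
lines(xs, post.mean, col = "blue")
lines(xs, post.mean + 2 * sqrt(post.var), lty = 2)  # ~95% band
lines(xs, post.mean - 2 * sqrt(post.var), lty = 2)
```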
Reference
• Chapter 2 of Gaussian Processes for Machine Learning, Carl Edward Rasmussen and Christopher K. I. Williams
• 527 lecture notes by Emily Fox
Mixture Models – Density Estimation
• EM algorithm vs. Bayesian Markov chain Monte Carlo (MCMC)
• Remember:
• EM algorithm = iterative algorithm that MAXIMIZES the LIKELIHOOD
• MCMC DRAWS FROM the POSTERIOR (i.e. likelihood × prior, up to normalization)
EM algorithm
• Iterative procedure that attempts to maximize the log-likelihood ---> MLE estimates of the mixture model parameters
• I.e. one final density estimate
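A minimal EM sketch for a two-component Gaussian mixture (the data and starting values are illustrative); it alternates an E-step (responsibilities) and an M-step (weighted MLE updates) and returns one final set of parameter estimates:

```r
set.seed(3)
y <- c(rnorm(150, -2, 1), rnorm(100, 3, 0.8))

pi1 <- 0.5; mu <- c(-1, 1); s <- c(1, 1)   # starting values
for (iter in 1:100) {
  # E-step: posterior responsibility of component 1 for each point
  d1 <- pi1 * dnorm(y, mu[1], s[1])
  d2 <- (1 - pi1) * dnorm(y, mu[2], s[2])
  g  <- d1 / (d1 + d2)
  # M-step: weighted maximum-likelihood updates
  pi1   <- mean(g)
  mu[1] <- sum(g * y) / sum(g)
  mu[2] <- sum((1 - g) * y) / sum(1 - g)
  s[1]  <- sqrt(sum(g * (y - mu[1])^2) / sum(g))
  s[2]  <- sqrt(sum((1 - g) * (y - mu[2])^2) / sum(1 - g))
}

# one final density estimate from the converged parameters
dens <- function(t) pi1 * dnorm(t, mu[1], s[1]) + (1 - pi1) * dnorm(t, mu[2], s[2])
curve(dens, -6, 7); rug(y)
```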
Bayesian Mixture Modeling (MCMC)
• Uses an iterative procedure to DRAW SAMPLES from the posterior (then you can average the draws, etc.)
• You don't need to understand the fine details, but know that at every iteration you get a set of parameter estimates drawn from your posterior distribution
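To make "one posterior draw per iteration" concrete, here is a Gibbs-sampling sketch for the same two-component mixture, with component variances fixed at 1 to keep it short; the priors (Beta(1,1) on the weight, N(0, 10²) on the means) and all settings are illustrative assumptions, not from the slides:

```r
set.seed(4)
y <- c(rnorm(150, -2, 1), rnorm(100, 3, 1)); n <- length(y)

mu <- c(-1, 1); p <- 0.5
draws <- matrix(NA, 500, 3, dimnames = list(NULL, c("p", "mu1", "mu2")))
for (iter in 1:500) {
  # 1. sample each point's component label given current parameters
  w1 <- p * dnorm(y, mu[1], 1)
  w2 <- (1 - p) * dnorm(y, mu[2], 1)
  z  <- rbinom(n, 1, w1 / (w1 + w2))          # 1 = component 1
  # 2. sample the mixture weight: Beta(1,1) prior -> Beta posterior
  p  <- rbeta(1, 1 + sum(z), 1 + n - sum(z))
  # 3. sample each mean: N(0, 10^2) prior -> normal posterior
  for (k in 1:2) {
    yk <- if (k == 1) y[z == 1] else y[z == 0]
    v  <- 1 / (length(yk) + 1 / 100)
    mu[k] <- rnorm(1, v * sum(yk), sqrt(v))
  }
  draws[iter, ] <- c(p, mu)                   # one posterior draw per iteration
}
colMeans(draws[-(1:100), ])                   # posterior means after burn-in
```

Unlike EM, the output is a collection of draws rather than a single estimate; averaging them (after burn-in) gives posterior means, and averaging the implied densities gives a posterior density estimate.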