Uploaded by martha-burns, posted 14-Dec-2015

TRANSCRIPT

Page 1: Announcements Spring Courses Somewhat Relevant to Machine Learning 5314: Algorithms for molecular bio (who’s teaching?) 5446: Chaotic dynamics (Bradley)

Announcements

Spring Courses Somewhat Relevant to Machine Learning

5314: Algorithms for molecular bio (who’s teaching?)

5446: Chaotic dynamics (Bradley)

5454: Algorithms (Frongillo)

5502: Data mining (Lv)

5753: Computer performance modeling (Grunwald)

7000-006: Geospatial data analysis (Caleb Phillips)

7000-008: Human-robot interaction (Dan Szafir)

7000-009: Data analytics: systems, algorithms, and applications (Lv)

7000-021: Bioinformatics (Robin Dowell-Dean)

Homework

Importance sampling via likelihood weighting

Page 2:

Learning In Bayesian Networks: Missing Data And Hidden Variables

Page 3:

Missing Vs. Hidden Variables

Missing

often known but absent for certain data points

missing at random, or missing based on value (e.g., Netflix ratings)

Hidden

never observed but essential for predicting visible variables (e.g., human memory state)

a.k.a. latent variables

Page 4:

Quiz

“Semisupervised learning” concerns learning where additional input examples are available, but labels are not. According to the model below, will partial data (either X or Y) inform the model parameters?

[Diagram: Bayes net X -> Y with parameters θx, θy|x, θy|~x, and a table of the cases by whether X is known and whether Y is known]

Page 5:

[Diagram from the quiz repeated: Bayes net X -> Y with parameters θx, θy|x, θy|~x]

Page 6:

Missing Data: Exact Inference In Bayes Net

Y: observed variables

Z: unobserved variables

How do we do parameter updates for θi in this case?

If Xi and its parents Pai are observed, then the situation is straightforward (e.g., like the single-coin-toss case).

If Xi or any Pai are missing, need to marginalize over Z

E.g., Xi | Pai = j ~ Categorical(θij), where θij is the parameter vector for Xi with parent configuration j (one component per value of Xi), with a Dirichlet prior on each θij

With X = {Y, Z}, marginalize over the unobserved Z:

p(θij | Y) = Σz p(Z = z | Y) p(θij | Y, z)

Note: the posterior is a Dirichlet mixture: one Dirichlet term per completion z of the missing values, weighted by that completion's probability
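To make the mixture concrete, here is a toy two-outcome (Beta) analogue with made-up numbers: a coin with a Beta(1,1) prior, observed Y = 3 heads and 1 tail, plus one flip whose outcome z is missing.

```python
# Two-outcome (Beta) analogue of the Dirichlet mixture, with made-up numbers:
# Beta(1,1) prior on a coin's bias θ; observed Y = 3 heads, 1 tail, plus one
# flip whose outcome z is missing.
h, t = 3, 1
p_z_heads = (h + 1) / (h + t + 2)             # p(z = heads | Y) = 4/6
mixture = [
    (p_z_heads,     (h + 2, t + 1)),          # z = heads -> Beta(5, 2)
    (1 - p_z_heads, (h + 1, t + 2)),          # z = tails -> Beta(4, 3)
]
# posterior mean of θ under the mixture: sum of weight * a/(a+b)
post_mean = sum(w * a / (a + b) for w, (a, b) in mixture)
```

Since the flip is missing at random, this mixture is exactly the Beta(4, 2) posterior you would get by simply dropping the incomplete case (mean 2/3); in a multi-node network the mixture generally does not collapse like this.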

Page 7:

Missing Data: Gibbs Sampling

Given a set of observed incomplete data, D = {y1, ..., yN}

1. Fill in arbitrary values for unobserved variables for each case Dc

2. For each unobserved variable zi in case n, sample zi from p(zi | all other variables in the case)

3. evaluate posterior density on complete data Dc’

4. repeat steps 2 and 3, and compute mean of posterior density
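A sketch of steps 1 and 2 with assumed numbers: a two-node net X -> Y with known parameters, Y observed, X missing. With one unobserved variable per case, the Gibbs conditional is the exact posterior, so the sweep reduces to repeatedly sampling x from p(x | y).

```python
import random

# Sketch of steps 1-2 (toy numbers, not from the slides): net X -> Y with
# known parameters; Y = 1 is observed, X is the unobserved variable to fill in.
p_x1 = 0.3                                    # P(X = 1)
p_y1_given_x = {0: 0.2, 1: 0.9}               # P(Y = 1 | X = x)

def gibbs_fill_in(y_obs, n_iter=20000, seed=0):
    rng = random.Random(seed)
    n_x1 = 0
    x = 0                                     # step 1: arbitrary initial value
    for _ in range(n_iter):
        # step 2: sample x from p(x | y), proportional to p(x) p(y | x)
        w1 = p_x1 * (p_y1_given_x[1] if y_obs else 1 - p_y1_given_x[1])
        w0 = (1 - p_x1) * (p_y1_given_x[0] if y_obs else 1 - p_y1_given_x[0])
        x = 1 if rng.random() < w1 / (w0 + w1) else 0
        n_x1 += x
    return n_x1 / n_iter                      # estimate of P(X = 1 | Y = 1)

est = gibbs_fill_in(y_obs=1)
exact = 0.3 * 0.9 / (0.3 * 0.9 + 0.7 * 0.2)   # about 0.659
```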

Page 8:

Missing Data: Gaussian Approximation

Approximate the posterior p(θ | D) as a multivariate Gaussian.

 Appropriate if sample size |D| is large, which is also the case when Monte Carlo is inefficient

1. find the MAP configuration θMAP by maximizing g(θ) ≡ ln[ p(D | θ) p(θ) ]

2. approximate g by a 2nd-degree Taylor polynomial around θMAP:

g(θ) ≈ g(θMAP) − ½ (θ − θMAP)ᵀ A (θ − θMAP)

where A is the negative Hessian of g(.) evaluated at θMAP (the gradient term vanishes at the maximum)

3. this leads to an approximate posterior that is Gaussian: p(θ | D) ≈ N(θMAP, A⁻¹)
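The three steps in the simplest setting: a single coin-bias parameter with a uniform prior and hypothetical counts, where the MAP and the negative Hessian have closed forms.

```python
# The three steps for the simplest case: a coin bias θ with a uniform
# Beta(1,1) prior and hypothetical counts, so g(θ) = 60 ln θ + 40 ln(1−θ) + const.
h, t = 60, 40
theta_map = h / (h + t)                         # step 1: maximize g; MAP = 0.6
A = h / theta_map**2 + t / (1 - theta_map)**2   # negative Hessian of g at the MAP
var = 1 / A                                     # steps 2-3: p(θ|D) ≈ N(θMAP, 1/A)
# var equals θ(1−θ)/n = 0.0024, the familiar binomial-variance result
```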

Page 9:

Missing Data: Further Approximations

As the data sample size increases, the Gaussian peak becomes sharper, so predictions can be made based on the MAP configuration

and priors can be ignored (their importance diminishes) -> maximum likelihood

How to do ML estimation:

Expectation Maximization

Gradient methods

Page 10:

Expectation Maximization

Scheme for picking values of missing data and hidden variables that maximize the data likelihood

E.g., clientele of the Laughing Goat (a Boulder coffeehouse); each customer is described by observed items:

baby stroller, diapers, lycra pants

backpack, saggy pants

baby stroller, diapers

backpack, computer, saggy pants

diapers, lycra

computer, saggy pants

backpack, saggy pants

Page 11:

Expectation Maximization Formally

V: visible variables

H: hidden variables

θ: model parameters

Model

P(V,H|θ)

Goal

Learn model parameters θ in the absence of H

Approach

Find θ that maximizes P(V|θ)

Page 12:

EM Algorithm (Barber, Chapter 11)

Page 13:

EM Algorithm

Guaranteed to find local optimum of θ

Sketch of proof

Bound on the marginal log likelihood, for any distribution q over the hidden variables:

ln p(v|θ) ≥ Σh q(h|v) ln [ p(v,h|θ) / q(h|v) ]

equality only when q(h|v) = p(h|v,θ)

E-step: for fixed θ, find the q(h|v) that maximizes the RHS

M-step: for fixed q, find the θ that maximizes the RHS

If each step maximizes the RHS, it also improves the LHS (technically: it never lowers the LHS)
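The bound is the standard Jensen's-inequality argument; written out:

```latex
\ln p(v \mid \theta)
  = \ln \sum_h q(h \mid v)\,\frac{p(v,h \mid \theta)}{q(h \mid v)}
  \;\ge\; \sum_h q(h \mid v)\,\ln \frac{p(v,h \mid \theta)}{q(h \mid v)}
  \qquad \text{(Jensen's inequality)}
```

The gap between the two sides is KL(q(h|v) || p(h|v,θ)), which is zero exactly when q(h|v) = p(h|v,θ); this gives the equality condition.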

Page 14:

Barber Example

Contours are of the lower bound

Note alternating steps along θ and q axes

note that steps are not gradient steps and can be large

Choice of initial θ determines local likelihood optimum

Page 15:

Clustering: K-Means Vs. EM

K means

1. choose some initial values of μk

2. assign each data point to the closest cluster

3. recalculate the μk to be the means of the set of points assigned to cluster k

4. iterate to step 2
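The four steps above, as a minimal 1-D sketch with toy data (K = 2):

```python
# A minimal 1-D sketch of the four k-means steps (toy data, K = 2).
def kmeans(xs, mu, n_iter=20):
    for _ in range(n_iter):
        # step 2: assign each point to the closest cluster mean
        assign = [min(range(len(mu)), key=lambda k: (x - mu[k]) ** 2)
                  for x in xs]
        # step 3: recalculate each mean from its assigned points
        for k in range(len(mu)):
            pts = [x for x, a in zip(xs, assign) if a == k]
            if pts:
                mu[k] = sum(pts) / len(pts)
    return mu                                 # step 4 is the loop itself

data = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7]
mu = kmeans(data, mu=[0.0, 6.0])              # converges to approx. [1.0, 5.0]
```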

Page 16:

K-means Clustering

From C. Bishop, Pattern Recognition and Machine Learning

Page 17:

K-means Clustering

Page 18:

K-means Clustering

Page 19:

K-means Clustering

Page 20:

Clustering: K-Means Vs. EM

K means

1. choose some initial values of μk

2. assign each data point to the closest cluster

3. recalculate the μk to be the means of the set of points assigned to cluster k

4. iterate to step 2

Page 21:

Clustering: K-Means Vs. EM

EM

1. choose some initial values of μk

2. probabilistically assign each data point x to the clusters: compute P(Z = k | x, μ)

3. recalculate the μk to be the weighted mean of the set of points

weight each point by P(Z = k | x, μ)

4. return to step 2 and iterate
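The same toy data under the EM variant: soft assignments replace the hard ones. For simplicity this sketch fixes unit variances and equal mixing weights (assumptions not in the slides), so only the means μk are learned.

```python
import math

# EM counterpart of the k-means loop: soft (probabilistic) assignments and
# weighted means. Simplifying assumptions: two components with fixed unit
# variance and equal mixing weights, so only the μk move.
def em_means(xs, mu, n_iter=50):
    K = len(mu)
    for _ in range(n_iter):
        # E-step (step 2): responsibilities P(Z = k | x, μ)
        resp = []
        for x in xs:
            w = [math.exp(-0.5 * (x - mu[k]) ** 2) for k in range(K)]
            s = sum(w)
            resp.append([wk / s for wk in w])
        # M-step (step 3): each μk becomes the responsibility-weighted mean
        for k in range(K):
            num = sum(r[k] * x for r, x in zip(resp, xs))
            den = sum(r[k] for r in resp)
            mu[k] = num / den
    return mu

data = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7]
mu = em_means(data, mu=[0.0, 6.0])            # converges near [1.0, 5.0]
```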

Page 22:

EM for Gaussian Mixtures

Page 23:

EM for Gaussian Mixtures

Page 24:

EM for Gaussian Mixtures

Page 25:

Variational Bayes

Generalization of EM

also deals with missing data and hidden variables

Produces posterior on parameters

not just ML solution

Basic (0th order) idea

do EM to obtain estimates of p(θ) rather than θ directly

Page 26:

Variational Bayes

Assume a factorized approximation of the joint posterior over hidden variables and parameters:

p(H, θ | V) ≈ q(H) q(θ)

Find the marginals q(H) and q(θ) that make this approximation as close as possible (e.g., in KL divergence).

Advantage?

Bayesian Occam’s razor: a vaguely specified parameter amounts to a simpler model -> reduces overfitting

Page 27:

Gradient Methods

Useful for continuous parameters θ

Make small incremental steps to maximize the likelihood

Gradient update: θijk <- θijk + η ∂ ln p(D | θ) / ∂θijk

(“swap”: the derivative is evaluated by exchanging the order of differentiation and the sum over cases)
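A sketch of the gradient update for the simplest continuous parameter, a coin bias θ with hypothetical counts; the learning rate η and the iteration count are arbitrary choices.

```python
# Gradient ascent on the log likelihood for a single continuous parameter:
# a coin bias θ with hypothetical counts (60 heads, 40 tails).
h, t = 60, 40
theta, eta = 0.5, 0.001
for _ in range(2000):
    grad = h / theta - t / (1 - theta)        # d/dθ [ h ln θ + t ln(1−θ) ]
    theta += eta * grad                       # θ <- θ + η ∂ ln p(D|θ)/∂θ
    theta = min(max(theta, 1e-6), 1 - 1e-6)   # keep θ inside (0, 1)
# theta converges to the ML solution h / (h + t) = 0.6
```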

Page 28:

All Learning Methods Apply To Arbitrary Local Distribution Functions

The local distribution function performs either

probabilistic classification (discrete RVs)

probabilistic regression (continuous RVs)

Complete flexibility in specifying the local distribution fn:

analytical function (e.g., homework 5)

look-up table

logistic regression

neural net

etc.

[Diagram: a node's LOCAL DISTRIBUTION FUNCTION highlighted within the network]

Page 29:

Summary Of Learning Section

Given model structure and probabilities, inferring latent variables

Given model structure, learning model probabilities

Complete data

Missing data

Learning model structure

Page 30:

Learning Model Structure

Page 31:

Learning Structure and Parameters

The principle: treat network structure, Sh, as a discrete RV

Calculate structure posterior

Integrate over uncertainty in structure to predict

The practice: computing the marginal likelihood, p(D | Sh), can be difficult.

Learning structure can be impractical due to the large number of hypotheses (more than exponential in the # of nodes)

Page 32:

source: www.bayesnets.com

Page 33:

Approach to Structure Learning

model selection: find a good model, and treat it as the correct model

selective model averaging: select a manageable number of candidate models and pretend that these models are exhaustive

Experimentally, both of these approaches produce good results, i.e., good generalization.

Page 34:
Page 35:

SLIDES STOLEN FROM DAVID HECKERMAN

Page 36:
Page 37:

Interpretation of Marginal Likelihood

Using the chain rule for probabilities:

p(D | Sh) = Πn p(xn | x1, ..., xn−1, Sh)

Maximizing marginal likelihood also maximizes sequential prediction ability!

Relation to leave-one-out cross validation

Problems with cross validation:

can overfit the data, possibly because of interchanges (each item is used for training and for testing each other item)

has a hard time dealing with temporal sequence data
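The sequential-prediction identity can be checked numerically for a Beta(1,1) coin, where both sides have closed forms; the data sequence below is made up.

```python
from math import factorial

# Chain-rule (sequential prediction) view of the marginal likelihood for a
# coin with a uniform Beta(1,1) prior.
# Predictive after h heads, t tails: p(heads) = (h + 1) / (h + t + 2).
def marginal_likelihood(seq):
    p, h, t = 1.0, 0, 0
    for x in seq:
        p *= (h + 1) / (h + t + 2) if x else (t + 1) / (h + t + 2)
        h, t = h + x, t + 1 - x
    return p

D = [1, 1, 0, 1, 0, 1]                        # 4 heads, 2 tails
seq_p = marginal_likelihood(D)
closed = factorial(4) * factorial(2) / factorial(7)   # h! t! / (n + 1)!
```

Both equal 1/105 here, and the product is the same for any ordering of the data: the exchangeability behind the sequential-prediction reading.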

Page 38:

Coin Example

Page 39:

αh, αt, #h, and #t all indexed by these conditions

Page 40:

[Equation: general marginal likelihood, a product over i = 1..n (# nodes), j = 1..qi (# parent configurations of node i), and k = 1..ri (# node states)]

Page 41:

Computation of Marginal Likelihood

Efficient closed form solution if

no missing data (including no hidden variables)

mutual independence of parameters θ

local distribution functions from the exponential family (binomial, Poisson, gamma, Gaussian, etc.)

conjugate priors

Page 42:

Computation of Marginal Likelihood

Approximation techniques must be used otherwise.

E.g., for missing data, can use the Gibbs sampling or Gaussian approximation described earlier.

Bayes theorem (holds for any value of θ):

p(D | Sh) = p(D | θ, Sh) p(θ | Sh) / p(θ | D, Sh)

1. Evaluate the numerator directly; estimate the denominator using Gibbs sampling

2. For large amounts of data, the denominator, p(θ | D, Sh), can be approximated by a multivariate Gaussian (as on the earlier slide)

Page 43:

Structure Priors

Hypothesis equivalence: identify the equivalence class of a given network structure

All possible structures equally likely

Partial specification: required and prohibited arcs (based on causal knowledge)

Ordering of variables + independence assumptions

ordering based on, e.g., temporal precedence

presence or absence of each arc mutually independent -> n(n−1)/2 priors

p(m) ~ similarity(m, prior belief net)

Page 44:

Parameter Priors

all uniform: Beta(1,1)

use a prior Belief Net

parameters depend only on local structure

Page 45:

Model Search

Finding the belief net structure with highest score among those structures with at most k parents is NP-hard for k > 1 (Chickering, 1995)

Sequential search: add, remove, reverse arcs

ensure no directed cycles

efficient in that changes to arcs affect only some components of p(D | M)

Heuristic methods: greedy

greedy with restarts

MCMC / simulated annealing
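A self-contained sketch of greedy sequential search on three binary variables, scoring candidate structures with the closed-form marginal likelihood under uniform Dirichlet priors; arc reversal and restarts are omitted for brevity, and the data are synthetic.

```python
import math
import random

# Greedy sequential search over DAGs on three binary variables (toy sketch).
# Score: closed-form log marginal likelihood with uniform Dirichlet priors;
# moves: toggle (add/remove) one arc per step, rejecting directed cycles.
def log_ml(data, arcs, n_vars=3):
    total = 0.0
    for i in range(n_vars):
        pa = sorted(p for (p, c) in arcs if c == i)
        counts = {}
        for row in data:
            key = tuple(row[p] for p in pa)   # parent configuration j
            counts.setdefault(key, [0, 0])[row[i]] += 1
        for n0, n1 in counts.values():        # Beta/Dirichlet closed form
            total += (math.lgamma(2) - math.lgamma(2 + n0 + n1)
                      + math.lgamma(1 + n0) + math.lgamma(1 + n1))
    return total

def acyclic(arcs):
    def reaches(a, b, seen):
        return a == b or any(reaches(c2, b, seen | {c2})
                             for (p2, c2) in arcs
                             if p2 == a and c2 not in seen)
    return not any(reaches(c, p, {c}) for (p, c) in arcs)

def greedy_search(data, n_vars=3):
    arcs, best = set(), log_ml(data, set(), n_vars)
    improved = True
    while improved:                           # stop at a local optimum
        improved = False
        for p in range(n_vars):
            for c in range(n_vars):
                if p == c:
                    continue
                cand = arcs ^ {(p, c)}        # toggle one arc
                if not acyclic(cand):         # ensure no directed cycles
                    continue
                s = log_ml(data, cand, n_vars)
                if s > best:
                    arcs, best, improved = cand, s, True
    return arcs, best

# synthetic data (hypothetical): X0 strongly drives X1; X2 is independent
rng = random.Random(0)
data = []
for _ in range(300):
    x0 = int(rng.random() < 0.5)
    x1 = int(rng.random() < (0.9 if x0 else 0.1))
    x2 = int(rng.random() < 0.5)
    data.append((x0, x1, x2))
arcs, score = greedy_search(data)             # expect an arc between X0 and X1
```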

Page 46:
Page 47:
Page 48:

two most likely structures

Page 49:
Page 50:

2 × 10^10

Page 51:
Page 52:
Page 53:
Page 54:
Page 55: