
Page 1: CS 540: Machine Learning, Lecture 1: Introduction (arnaud/cs540/lec1_CS540_handouts.pdf, 2008-01-24)

CS 540: Machine Learning
Lecture 1: Introduction

AD

January 2008

AD () January 2008 1 / 41

Page 2:

Acknowledgments

Thanks to

Nando de Freitas

Kevin Murphy

Page 3:

Administrivia & Announcement

Webpage: http://www.cs.ubc.ca/~arnaud/cs540_2008.html

Office hours: Tuesday & Thursday, 11:30-12:30.

Introductory Matlab tutorial session by Ian Mitchell TODAY, 5pm-7pm, DMP 110.

Matlab resources: http://www.cs.ubc.ca/~mitchell/matlabResources.html

Page 4:

Pre-requisites

Maths: multivariate calculus, linear algebra, probability.

CS: programming skills; knowledge of data structures and algorithms is helpful but not crucial.

Prior exposure to statistics/machine learning is highly desirable, but you will be OK as long as your math is good.

I will discuss Bayesian statistics at length.

Page 5:

(Tentative) grading

The midterm will be sometime around the end of February and is worth 30%.

The final project will be in April (exact date TBD) and is worth 40%.

There will be three (long) assignments, together worth 30%.

Additional optional exercises will be given.

Page 6:

Textbook

C.M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006.

Check the webpage: http://research.microsoft.com/~cmbishop/PRML/

Solutions to the exercises are available on this webpage.

Each week, there will be required reading, optional reading and background reading.

Optional textbook: Hastie, Tibshirani and Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer-Verlag, 2001.

Page 7:

Related classes

CPSC 532L - Multiagent Systems (Leyton-Brown).

CPSC 532M - Dynamic Programming (Ian Mitchell)

Statistics courses.

Page 8:

What is Machine Learning?

"Learning denotes changes in the system that are adaptive in the sense that they enable the system to do the task or tasks drawn from the same population more efficiently and more effectively the next time" (Herb Simon).

Machine Learning is an interdisciplinary field at the intersection of Statistics, CS, EE, etc.

At the beginning, Machine Learning was fairly heuristic, but it has now evolved and become (in my opinion) Applied Statistics with a CS flavour.

Aim of this course: to introduce basic models, concepts and algorithms.

Page 9:

Main Learning Tasks

Supervised learning: given a training set of N input-output pairs {x_n, t_n} ∈ X × T, construct a function f : X → T to predict the output t̂ = f(x) associated with a new input x.
- Each input x_n is a p-dimensional feature vector (covariates, explanatory variables).
- Each output t_n is a target variable (response).

Regression corresponds to T = R^d.

Classification corresponds to T = {1, ..., K}.
The aim is to produce the correct output given a new input.

Page 10:

Polynomial regression

Figure: Polynomial regression where x ∈ R and t ∈ R.
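A polynomial fit of this kind takes only a few lines. A minimal NumPy sketch (the synthetic cubic and all values are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data for the x in R, t in R setting: noisy samples from a cubic.
x = np.linspace(-1.0, 1.0, 50)
t = 1.0 - 2.0 * x + 0.5 * x**3 + 0.05 * rng.standard_normal(x.size)

# Least-squares fit of a degree-3 polynomial; w holds the 4 coefficients.
w = np.polyfit(x, t, deg=3)

# Predict the output t_hat = f(x) for a new input x = 0.5.
t_hat = np.polyval(w, 0.5)
```

The polynomial degree is itself a modeling choice; too high a degree overfits the training data.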

Page 11:

Linear regression

Figure: Linear least squares fitting for x ∈ R² and t ∈ R. We seek the linear function of x that minimizes the sum of squared residuals for y.

Page 12:

Piecewise linear regression

Figure: Piecewise linear regression function

Page 13:

Figure: Linear classifier where x_n ∈ R² and t_n ∈ {0, 1}.
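A linear classifier of this kind can be sketched with the classic perceptron rule. A minimal NumPy example (synthetic, well-separated data; all values illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

# Two linearly separable classes in R^2, with targets t_n in {0, 1}.
X0 = rng.normal(loc=[-2.0, -2.0], scale=0.5, size=(50, 2))
X1 = rng.normal(loc=[2.0, 2.0], scale=0.5, size=(50, 2))
X = np.vstack([X0, X1])
t = np.concatenate([np.zeros(50), np.ones(50)])

# Perceptron: learn weights w and bias b so that (w . x + b > 0) predicts class 1.
w, b = np.zeros(2), 0.0
for _ in range(100):                      # passes over the data
    for xi, ti in zip(X, t):
        pred = 1.0 if xi @ w + b > 0 else 0.0
        w += (ti - pred) * xi             # update only on mistakes
        b += (ti - pred)

accuracy = ((X @ w + b > 0).astype(float) == t).mean()
```

On separable data like this the perceptron converges to a perfect separator; later lectures treat probabilistic classifiers that also quantify uncertainty.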

Page 14:

Handwritten digit recognition

Figure: Examples of handwritten digits from US postal employees.

Page 15:

Figure: Can you pick out the tufas?

Page 16:

Practical examples of classification

Email spam filtering (feature vector = "bag of words")

Webpage classi�cation

Detecting credit card fraud

Credit scoring

Face detection in images

Page 17:

Unsupervised learning: we are given training data {x_n}. The aim is to produce a model of, or build useful representations for, x:
- modeling the distribution of the data,
- clustering,
- data association,
- dimensionality reduction,
- structure learning.

Page 18:

Hard and Soft Clustering

Figure: Data (left), hard clustering (middle), soft clustering (right)
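The hard/soft distinction can be illustrated in one dimension: k-means makes hard assignments, while Gaussian responsibilities give soft ones. A minimal sketch (NumPy only; the data, k = 2, and the common unit variance are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)

# 1-D data from two well-separated groups.
X = np.concatenate([rng.normal(-3.0, 0.5, 100), rng.normal(3.0, 0.5, 100)])
centers = np.array([-1.0, 1.0])           # crude initial centers

for _ in range(20):                       # k-means with k = 2
    # Hard clustering: each point belongs to exactly one cluster.
    hard = np.abs(X[:, None] - centers[None, :]).argmin(axis=1)
    centers = np.array([X[hard == k].mean() for k in range(2)])

# Soft clustering: each point gets a responsibility for every cluster,
# here from Gaussian likelihoods with a common (illustrative) unit variance.
lik = np.exp(-0.5 * (X[:, None] - centers[None, :]) ** 2)
soft = lik / lik.sum(axis=1, keepdims=True)
```

Each row of `soft` sums to one; points near a cluster center get responsibilities close to 0 or 1, points between clusters get intermediate values.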

Page 19:

Finite Mixture Models

Finite Mixture of Gaussians

f(x | θ) = Σ_{i=1}^{k} p_i N(x; μ_i, σ_i²)

where θ = {μ_i, σ_i², p_i}_{i=1,...,k} is estimated from some data (x_1, ..., x_n).

How to estimate the parameters?

How to estimate the number of components k?
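The standard answer to the first question is the EM algorithm, covered later in the course. A minimal EM sketch for a one-dimensional, two-component mixture (NumPy only; the data and initial values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)

# Data from a two-component Gaussian mixture (true means -2 and 2).
x = np.concatenate([rng.normal(-2.0, 0.7, 200), rng.normal(2.0, 0.7, 300)])

# Parameters theta = (mu_i, sigma2_i, p_i), i = 1, 2, with rough initial values.
mu = np.array([-1.0, 1.0])
sigma2 = np.array([1.0, 1.0])
p = np.array([0.5, 0.5])

def normal_pdf(v, m, s2):
    return np.exp(-0.5 * (v - m) ** 2 / s2) / np.sqrt(2.0 * np.pi * s2)

for _ in range(100):
    # E-step: responsibilities r_ni proportional to p_i N(x_n; mu_i, sigma2_i).
    r = p * normal_pdf(x[:, None], mu, sigma2)
    r /= r.sum(axis=1, keepdims=True)
    # M-step: re-estimate p_i, mu_i and sigma2_i from the weighted data.
    nk = r.sum(axis=0)
    p = nk / x.size
    mu = (r * x[:, None]).sum(axis=0) / nk
    sigma2 = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
```

Choosing the number of components k is harder; cross-validation and Bayesian model selection are common approaches.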

Page 20: (figure-only slide; figure not extracted)

Page 21: (figure-only slide; figure not extracted)

Page 22:

Modelling dependent data

Page 23: (figure-only slide; figure not extracted)

Page 24:

Hidden Markov Models

In this case, we have

x_n | x_{n-1} ~ f(· | x_{n-1}),
y_n | x_n ~ g(· | x_n).

For example, the stochastic volatility model:

x_n = α x_{n-1} + σ v_n, where v_n ~ N(0, 1),
y_n = β exp(x_n / 2) w_n, where w_n ~ N(0, 1),

where the process (y_n) is observed but (x_n, α, σ, β) are unknown.
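Simulating this stochastic volatility model makes the setup concrete. A NumPy sketch (the parameter values alpha = 0.95, sigma = 0.3, beta = 0.6 are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(4)

# x_n = alpha * x_{n-1} + sigma * v_n,  v_n ~ N(0, 1)   (hidden log-volatility)
# y_n = beta * exp(x_n / 2) * w_n,      w_n ~ N(0, 1)   (observed series)
alpha, sigma, beta = 0.95, 0.3, 0.6
N = 5000

x = np.zeros(N)
y = np.zeros(N)
for n in range(1, N):
    x[n] = alpha * x[n - 1] + sigma * rng.standard_normal()
    y[n] = beta * np.exp(x[n] / 2.0) * rng.standard_normal()

# In the inference problem only y is observed; x, alpha, sigma and beta
# would have to be estimated (e.g. with the particle filters covered later).
```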

Page 25:

Tracking Application

Page 26:

Hierarchical Clustering

Figure: Dendrogram from agglomerative hierarchical clustering with average linkage applied to the human microarray data.

Page 27:

Dimensionality reduction

Figure: Simulated data in three classes, near the surface of a half-sphere

Page 28:

Figure: The best rank-two linear approximation to the half-sphere data. The right panel shows the projected points with coordinates given by the first two principal components of the data.
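A rank-two approximation of this kind is computed with PCA, i.e. an SVD of the centered data. A minimal NumPy sketch (synthetic plane-plus-noise data standing in for the half-sphere; all values illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)

# 3-D data lying near a 2-D plane, plus small noise.
basis = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
coords = rng.standard_normal((200, 2))
X = coords @ basis + 0.05 * rng.standard_normal((200, 3))

# PCA via SVD of the centered data; the top two right singular vectors
# span the best rank-two linear approximation.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
projected = Xc @ Vt[:2].T                 # coordinates in the first two PCs
X_approx = X.mean(axis=0) + projected @ Vt[:2]

reconstruction_error = np.abs(X - X_approx).max()
```

The reconstruction error is small here because the data really are near a two-dimensional subspace; for curved structure like the half-sphere, a linear projection is only an approximation.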

Page 29:

Manifold learning

Figure: Data lying on a low-dimensional manifold

Page 30:

Structure learning

Figure: Graphical model

Page 31:

Ockham's razor

How can we predict the test set if it may be arbitrarily different from the training set?
We need to assume that the future will resemble the present in statistical terms.
Any hypothesis that agrees with all the training data is as good as any other, but we generally prefer "simpler" ones.
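The preference for simpler hypotheses can be made concrete with held-out data: a high-degree polynomial can match the training set at least as well as a straight line, yet predict worse. A NumPy sketch (linear ground truth and all values illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)

# Noisy samples from a linear truth; separate training and held-out sets.
x_train = np.linspace(0.0, 1.0, 10)
t_train = 2.0 * x_train + 0.3 * rng.standard_normal(10)
x_test = np.linspace(0.05, 0.95, 50)
t_test = 2.0 * x_test + 0.3 * rng.standard_normal(50)

def mse(w, x, t):
    return np.mean((np.polyval(w, x) - t) ** 2)

w1 = np.polyfit(x_train, t_train, 1)      # "simple" hypothesis
w9 = np.polyfit(x_train, t_train, 9)      # "complex" hypothesis

# The degree-9 fit matches the training data at least as well as the line...
train1, train9 = mse(w1, x_train, t_train), mse(w9, x_train, t_train)
# ...but that says nothing about its error on the held-out data.
test1, test9 = mse(w1, x_test, t_test), mse(w9, x_test, t_test)
```

On a typical draw the degree-9 test error is far larger than the degree-1 one, which is Ockham's razor in statistical form.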

Page 32:

Active learning

Active learning is a principled way of integrating decision theory withtraditional statistical methods for learning models from data

In active learning, the machine can query the environment. That is, itcan ask questions.

Decision theory leads to optimal strategies for choosing when andwhat questions to ask in order to gather the best possible data.

"Good" data is often better than a lot of data.

Page 33:

Active learning and surveillance

In a network with thousands of cameras, which camera views should be presented to the human operator?

Page 34:

Active learning and sensor networks

How do we optimally choose a subset of sensors in order to obtain the best understanding of the world while minimizing resource expenditure (power, bandwidth, distractions)?

Page 35:

Active learning example

Page 36:

Decision theoretic foundations of machine learning

Agents (humans, animals, machines) always have to make decisionsunder uncertainty

Uncertainty arises because we do not know the "true" state of the world, e.g. because of a limited field of view, because of "noise", etc.

The optimal way to behave when faced with uncertainty is as follows:

- Estimate the state of the world given the evidence you have seen up until now (this is called your belief state).
- Given your preferences over possible future states (expressed as a utility function), and your beliefs about the effects of your actions, pick the action that you think will be best.

Page 37:

Decision making under uncertainty

You should choose the action with maximum expected utility

a*_{t+1} = argmax_{a_{t+1} ∈ A} Σ_x P_θ(X_{t+1} = x | a_{1:t+1}, y_{1:t}) U_φ(x)

where

- X_t represents the "true" (unknown) state of the world at time t,
- y_{1:t} = y_1, ..., y_t represents the data I have seen up until now,
- a_{1:t} are the actions that I have taken up until now,
- U(x) is a utility function on state x,
- θ are the parameters of your world model,
- φ are the parameters of your utility function.
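A toy discrete instance of this rule, with two actions and three world states (all probabilities and utilities invented for illustration):

```python
# P[a] lists the predictive probabilities P(X_{t+1} = x | a) over three states;
# U[x] is the utility of state x.
P = {
    "act_safe":  [0.1, 0.8, 0.1],   # mostly lands in the middle state
    "act_risky": [0.5, 0.0, 0.5],   # all or nothing
}
U = [0.0, 6.0, 10.0]

def expected_utility(a):
    # Sum over states x of P(x | a) * U(x).
    return sum(p * u for p, u in zip(P[a], U))

# Maximum-expected-utility action.
best = max(P, key=expected_utility)
```

Here the safe action's expected utility (5.8) beats the risky action's (5.0), so the safe action is chosen despite the risky action's larger best-case payoff.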

Page 38:

Decision making under uncertainty

We will later justify why your beliefs should be represented usingprobability theory.

It can also be shown that all your preferences can be expressed usinga scalar utility function (shocking, but true).

Machine learning is concerned with estimating models of the world(θ) given data, and using them to compute belief states (i.e.,probabilistic inference).

Preference elicitation is concerned with estimating models (φ) ofutility functions (we will not consider this issue).

Page 39:

Decision making under uncertainty: examples

Robot navigation: Xt=robot location, Yt = video image, At =direction to move, U(X ) = distance to goal.

Medical diagnosis: Xt = whether the patient has cancer, Yt =symptoms, At = perform clinical test, U(X ) = health of patient.

Financial prediction: Xt = true state of economy, Yt = observedstock prices, At = invest in stock i , U(X ) = wealth.

Page 40:

Tentative topics

Review of probability and Bayesian statistics

Linear and nonlinear regression and classification

Nonparametric and model-based clustering

Finite mixture and hidden Markov models, EM algorithms.

Graphical models.

Variational methods.

MCMC and particle filters.

PCA, Factor analysis

Boosting.

Page 41:

Project

A list of potential projects will be posted shortly.

You can propose your own idea.

I expect the final project to be written like a conference paper.
