Machine Learning 10-701, Carnegie Mellon School of Computer Science (Tom Mitchell): PAC Learning slides
TRANSCRIPT
Slide 1
Machine Learning 10-701 Tom M. Mitchell
Machine Learning Department Carnegie Mellon University
February 24, 2011
Today:
• Computational Learning Theory
• PAC learning theorem
• VC dimension
Recommended reading:
• Mitchell: Ch. 7
• suggested exercises: 7.1, 7.2, 7.7
* see Annual Conference on Learning Theory (COLT)
Slide 2
Function Approximation: The Big Picture
Target concept is the boolean-valued function to be learned:
c: X → {0,1}
Slide 3
Slide 4
Training examples D: instances drawn at random from probability distribution P(x).
Can we bound the true error of h, error_true(h), in terms of its training error, error_train(h)?
Slide 5
Can we bound error_true(h) in terms of error_train(h)?
If D were a set of examples drawn from P(x) and independent of h, then we could use standard statistical confidence intervals to determine that, with 95% probability, error_true(h) lies in the interval:
error_train(h) ± 1.96 · sqrt( error_train(h) (1 − error_train(h)) / |D| )
but D is the training data for h ….
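The interval mentioned above can be computed directly. This is a sketch of the standard normal approximation to the binomial (the 1.96 factor gives roughly 95% coverage); the numbers below are made up for illustration, and the whole calculation assumes D really is independent of h, which is exactly what fails for training data:

```python
import math

def error_confidence_interval(error_hat, m, z=1.96):
    """~95% confidence interval for error_true(h), given the observed
    error rate of h on m examples drawn independently of h."""
    half_width = z * math.sqrt(error_hat * (1 - error_hat) / m)
    return max(0.0, error_hat - half_width), min(1.0, error_hat + half_width)

# e.g. h makes 40 mistakes on 400 held-out examples
lo, hi = error_confidence_interval(0.10, m=400)
print(f"error_true(h) in [{lo:.3f}, {hi:.3f}] with ~95% probability")
```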
c: X → {0,1}
Slide 6
Function Approximation: The Big Picture
Goal: output a hypothesis whose true error is less than ε.
Slide 7
Any(!) learner that outputs a hypothesis consistent with all training examples (i.e., an h contained in VS_{H,D})
Slide 8
What it means
[Haussler, 1988]: the probability that the version space is not ε-exhausted after m training examples is at most |H| e^{−εm}.
1. How many training examples suffice? Suppose we want this probability to be at most δ:
   |H| e^{−εm} ≤ δ   ⇒   m ≥ (1/ε)(ln |H| + ln(1/δ))
2. If m ≥ (1/ε)(ln |H| + ln(1/δ)), then with probability at least (1 − δ), every consistent h satisfies error_true(h) ≤ ε.
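The algebra behind step 1, solving the Haussler bound for m, spelled out:

```latex
|H|\, e^{-\epsilon m} \le \delta
\;\iff\; e^{-\epsilon m} \le \frac{\delta}{|H|}
\;\iff\; -\epsilon m \le \ln\delta - \ln|H|
\;\iff\; m \ge \frac{1}{\epsilon}\left(\ln|H| + \ln\frac{1}{\delta}\right)
```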
Slide 9
Example: H is Conjunction of Boolean Literals
Consider classification problem f: X → Y:
• instances: X = <X1, X2, X3, X4> where each Xi is boolean
• learned hypotheses are rules of the form:
  – IF <X1, X2, X3, X4> = <0, ?, 1, ?> THEN Y = 1, ELSE Y = 0
  – i.e., rules constrain any subset of the Xi
How many training examples m suffice to assure that with probability at least 0.99, any consistent learner will output a hypothesis with true error at most 0.05?
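A worked answer for the conjunction example, as a sketch: each of the 4 boolean attributes can be required to be 0, required to be 1, or left unconstrained ("?"), which gives |H| = 3^4 = 81; "probability at least 0.99" means δ = 0.01 and "true error at most 0.05" means ε = 0.05.

```python
import math

# |H| = 3^4: each attribute constrained to 0, to 1, or left free ("?")
h_size = 3 ** 4

# m >= (1/epsilon) * (ln|H| + ln(1/delta))
epsilon, delta = 0.05, 0.01
m = math.ceil((math.log(h_size) + math.log(1 / delta)) / epsilon)
print(m)  # 180
```

So roughly 180 examples suffice for this (small) hypothesis space.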
Example: H is Decision Tree with depth=2
Consider classification problem f: X → Y:
• instances: X = <X1 … XN> where each Xi is boolean
• learned hypotheses are decision trees of depth 2, using only two variables
How many training examples m suffice to assure that with probability at least 0.99, any consistent learner will output a hypothesis with true error at most 0.05?
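The same bound applies once |H| is counted. One plausible count, stated here as an assumption for illustration (the slide leaves the count to be worked out): choose which 2 of the N variables the tree uses, then label each of the 2×2 = 4 leaves with 0 or 1, giving |H| = C(N,2) · 2^4. The value N = 100 below is hypothetical.

```python
import math

N = 100  # hypothetical number of boolean attributes
# assumed count: choose 2 of N variables, then 2^4 leaf labelings
h_size = math.comb(N, 2) * 2 ** 4

epsilon, delta = 0.05, 0.01
m = math.ceil((math.log(h_size) + math.log(1 / delta)) / epsilon)
print(h_size, m)  # 79200 318
```

Note how weakly m depends on |H|: the hypothesis space grew from 81 to 79200, yet the ln|H| term keeps the required sample size modest.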
Slide 10
Sufficient condition for PAC-learnability: holds if learner L requires only a polynomial number of training examples, and the processing time per example is polynomial.
Slide 11
true error ≤ training error + degree of overfitting
note: ε here is the difference between the training error and the true error
Additive Hoeffding Bounds – Agnostic Learning
• Given m independent flips of a coin with true Pr(heads) = θ, we can bound the error in the maximum likelihood estimate θ̂:
  Pr(θ > θ̂ + ε) ≤ e^{−2mε²}
• Relevance to agnostic learning: for any single hypothesis h,
  Pr( error_true(h) > error_train(h) + ε ) ≤ e^{−2mε²}
• But we must consider all hypotheses in H:
  Pr( (∃h ∈ H) error_true(h) > error_train(h) + ε ) ≤ |H| e^{−2mε²}
• So, with probability at least (1 − δ), every h satisfies
  error_true(h) ≤ error_train(h) + sqrt( (ln |H| + ln(1/δ)) / (2m) )
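The resulting uniform gap is easy to evaluate. A sketch, with made-up numbers (|H| = 81 as in the conjunction example, m = 1000 training examples, δ = 0.01):

```python
import math

def agnostic_gap(h_size, m, delta):
    """With probability >= 1 - delta, every h in H satisfies
    error_true(h) <= error_train(h) + this gap (union bound + Hoeffding)."""
    return math.sqrt((math.log(h_size) + math.log(1 / delta)) / (2 * m))

gap = agnostic_gap(h_size=81, m=1000, delta=0.01)
print(f"{gap:.4f}")
```

The gap shrinks as 1/sqrt(m): quadrupling the data halves the bound on overfitting.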
Slide 12
General Hoeffding Bounds
• When estimating a parameter θ inside [a, b] from m examples:
  Pr( |θ̂ − θ| ≥ ε ) ≤ 2 e^{−2mε² / (b−a)²}
• When estimating a probability, θ is inside [0, 1], so:
  Pr( |θ̂ − θ| ≥ ε ) ≤ 2 e^{−2mε²}
• And if we're interested in only one-sided error, then:
  Pr( θ > θ̂ + ε ) ≤ e^{−2mε²}
Question: If H = {h | h: X → Y} is infinite, what measure of complexity should we use in place of |H|?
Slide 13
Answer: the VC dimension of H: the size of the largest subset of X that H can shatter, i.e., for which H can guarantee zero training error regardless of the target function c.
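The definition above can be checked by brute force on a tiny hypothesis space. A sketch: take H to be the conjunction rules from the earlier example, over n = 2 boolean attributes (each attribute required to be 0, required to be 1, or left free, so h(x) = 1 iff x matches the pattern), and find the largest subset of X on which H realizes every possible labeling. The VC dimension of boolean conjunctions is n, so we expect 2 here.

```python
from itertools import combinations, product

n = 2
patterns = list(product('01?', repeat=n))      # H: 3^n conjunction rules
points = list(product((0, 1), repeat=n))       # X: all 2^n instances

def h(pattern, x):
    # h(x) = 1 iff x satisfies every non-'?' constraint of the pattern
    return all(p == '?' or p == str(xi) for p, xi in zip(pattern, x))

def shattered(subset):
    # H shatters the subset iff all 2^|subset| labelings are realized
    labelings = {tuple(h(p, x) for x in subset) for p in patterns}
    return len(labelings) == 2 ** len(subset)

vc = max(k for k in range(len(points) + 1)
         if any(shattered(s) for s in combinations(points, k)))
print(vc)  # 2
```

The same brute-force check works for any finite H over a finite X, though it is exponential and only meant to illustrate the definition.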