Supervised Learning & Classification, part I
Reading: DH&S, Ch 1
Posted 21-Dec-2015
Administrivia...
•Pretest answers back today
•Today’s lecture notes online after class
•Apple Keynote, PDF, PowerPoint
•PDF & PPT auto-converted; may be flakey
Your place in history
•Yesterday:
•Course administrivia
•Fun & fluffy philosophy
•Today:
•The basic ML problem
•Branches of ML: the 20,000-foot view
•Intro to supervised learning
•Definitions and stuff
Pretest results: trends
•Courses dominated by math, stat; followed by algorithms; followed by CS530; followed by AI & CS500
•Proficiencies: probability > algorithms > linear algebra
•μ=56%
•σ=28%
The basic ML problem
[Diagram: World → f(⋅) → “Emphysema” (supervised)]
The basic ML problem
•Our job: Reconstruct f() from observations
•Knowing f() tells us:
•Can recognize new (previously unseen) instances
•Classification or discrimination
[Diagram: new instance → f(⋅) → ??? (e.g., “Hashimoto-Pritzker”)]
The basic ML problem
•Our job: Reconstruct f() from observations
•Knowing f() tells us:
•Can synthesize new data (e.g., speech or images)
•Generation
[Diagram: Random source → f(⋅) → “Emphysema”]
The basic ML problem
•Our job: Reconstruct f() from observations
•Knowing f() tells us:
•Can help us understand the process that generated data
•Description or analysis
•Can tell us/find things we never knew
•Discovery or data mining
[Diagram: f(⋅) over unlabeled data]
How many clusters (“blobs”) are there? Taxonomy of data? Networks of relationships? Unusual/unexpected things? Most important characteristics?
The basic ML problem
•Our job: Reconstruct f() from observations
•Knowing f() tells us:
•Can help us act or perform better
•Control
Turn left? Turn right? Accelerate? Brake? Don’t ride in the rain?
A brief taxonomy (highly abbreviated)
All ML:
•Supervised: have “inputs”, have “outputs”; find “best” f()
•Unsupervised: have “inputs”, no “outputs”; find “best” f()
•Reinforcement Learning: have “inputs”, have “controls”, have “reward”; find “best” f()
A brief taxonomy (highly abbreviated)
All ML: Supervised, Unsupervised, Reinforcement Learning
Supervised learning splits into:
•Classification: discrete outputs
•Regression: continuous outputs
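The discrete-vs-continuous split can be made concrete with two toy learned functions (the names, threshold, and coefficients below are invented for illustration, not from the lecture): a classifier returns one of a finite set of labels, while a regressor returns a real number.

```python
def classify(x: float) -> str:
    """Classification: output is one of a finite set of labels."""
    return "big" if x >= 5.0 else "small"

def regress(x: float) -> float:
    """Regression: output is a continuous value (a toy linear fit)."""
    return 2.0 * x + 1.0

print(classify(6.3))  # a discrete label: "big"
print(regress(6.3))   # a continuous number: 13.6
```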
A classic example: digits
The post office wants to be able to auto-scan envelopes, recognize addresses, etc.
87131
???
Digits to bits
Digitize (sensors) → feature vector:
255, 255, 127, 35, 0, 0 ...
255, 0, 93, 11, 45, 6 ...
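The digitization step can be sketched as flattening a 2-D grid of grayscale pixel intensities (0–255) into a single feature vector; the tiny 2×3 "image" below is made up for illustration, reusing the pixel values from the slide.

```python
# A tiny fake "scanned digit": rows of grayscale pixel values in 0..255.
image = [
    [255, 255, 127],
    [35, 0, 0],
]

# Flatten row-by-row into a single feature vector.
feature_vector = [pixel for row in image for pixel in row]

print(feature_vector)       # [255, 255, 127, 35, 0, 0]
print(len(feature_vector))  # dimension d = 6
```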
Measurements & features
•The collection of numbers from the sensors:
•... is called a feature vector, a.k.a.,
•attribute vector
•measurement vector
•instance
255, 0, 93, 11, 45, 6 ...
Measurements & features
•Written x = ⟨x₁, x₂, …, x_d⟩
•where
•d is the dimension of the vector
•Each xᵢ is drawn from some range
•E.g., xᵢ ∈ ℝ, or xᵢ ∈ {0, 1}, or xᵢ ∈ {0, …, 255}
More on features
•Features (attributes, independent variables) can come in different flavors:
•Continuous
•Discrete
•Categorical or nominal
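The three flavors can be sketched with a made-up record (the feature names and values here are invented, not from the lecture): one continuous, one discrete, one categorical feature in a single instance.

```python
# Hypothetical instance mixing the three feature flavors:
instance = {
    "temperature_c": 38.6,   # continuous: any real value in a range
    "num_prior_visits": 3,   # discrete: integer-valued
    "blood_type": "AB",      # categorical/nominal: unordered symbols
}

# Categorical features have no numeric order; only equality tests
# (and set membership) are meaningful on them.
print(instance["blood_type"] in {"A", "B", "AB", "O"})  # True
```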
More on features
•We (almost always) assume that the set of features is fixed & of finite dimension, d
•Sometimes quite large, though (d ≥ 100,000 not uncommon)
•The set of all possible instances is the instance space or feature space, 𝒳
•E.g., 𝒳 = ℝᵈ
•or 𝒳 = {0, 1}ᵈ
•or 𝒳 = {0, …, 255}ᵈ
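For finite feature ranges, the size of the instance space is the product of the per-feature range sizes, so it grows exponentially in d. A quick sketch (the particular range sizes are invented for illustration):

```python
from math import prod

# Hypothetical per-feature ranges: size of each feature's value set.
range_sizes = [2, 2, 256]  # two binary features, one 8-bit pixel

num_instances = prod(range_sizes)
print(num_instances)  # 2 * 2 * 256 = 1024

# With d binary features, |X| = 2**d: exponential growth in dimension.
print(2 ** 20)  # 1048576 instances for just 20 binary features
```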
Classes
•Every example comes w/ a class
•A.k.a., label, prediction, dependent variable, etc.
•For classification problems, class label is categorical
•For regression problems, it’s continuous
•Usually called dependent or regressed variable
•We’ll write y
•E.g., y = “7” or y = “8”
255, 255, 127, 35, 0, 0 ... → “7”
255, 0, 93, 11, 45, 6 ... → “8”
Classes, cont’d
•The set of possible values of the class variable is called the class set, class space, or range
•Book writes indiv classes as ω₁, ω₂, …
•Presumably whole class set is: Ω = {ω₁, ω₂, …, ω_c}
•So y ∈ Ω
A very simple example
I. setosa | I. versicolor | I. virginica
Features: sepal length, sepal width, petal length, petal width
Feature space: 𝒳 = ℝ⁴
A very simple example
I. setosa | I. versicolor | I. virginica
Class space: Ω = {I. setosa, I. versicolor, I. virginica}
Training data
•Set of all available data for learning == training data
•A.k.a., parameterization set, fitting set, etc.
•Denoted 𝒟 = {⟨x₁, y₁⟩, …, ⟨x_N, y_N⟩}
•Can write as a matrix X (one row per instance), w/ a corresponding class vector y
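The matrix-plus-class-vector layout can be sketched as a list of rows X with a parallel label vector y; the numbers below reuse the digit feature vectors from the earlier slides.

```python
# Training data as a feature matrix X (one row per instance)
# and a parallel class vector y (one label per row).
X = [
    [255, 255, 127, 35, 0, 0],
    [255, 0, 93, 11, 45, 6],
]
y = ["7", "8"]

N, d = len(X), len(X[0])  # N instances, each of dimension d
print(N, d)  # 2 6
print(len(X) == len(y))  # True: one label per instance
```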
Finally, goals
•Now that we have X and y, we have a (mostly) well-defined job:
The supervised learning problem: Find the function f̂ that most closely approximates the “true” function f
Goals?
•Key Questions:
•What candidate functions do we consider?
•What does “most closely approximates” mean?
•How do you find the one you’re looking for?
•How do you know you’ve found the “right” one?
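One concrete answer to "what candidate functions do we consider" is the nearest-neighbor rule (not introduced in this lecture, used here only as a minimal illustration): predict the label of the closest training instance. A sketch on made-up 2-D data:

```python
def nearest_neighbor_predict(X, y, query):
    """1-NN: return the label of the training point closest to `query`."""
    def sq_dist(a, b):
        # Squared Euclidean distance; monotone in true distance,
        # so it gives the same nearest neighbor without a sqrt.
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    best = min(range(len(X)), key=lambda i: sq_dist(X[i], query))
    return y[best]

# Made-up training set: two well-separated classes in a 2-D feature space.
X = [[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.8]]
y = ["A", "A", "B", "B"]

print(nearest_neighbor_predict(X, y, [0.1, 0.2]))  # "A"
print(nearest_neighbor_predict(X, y, [4.9, 5.2]))  # "B"
```

Here "most closely approximates" is implicit: 1-NN simply memorizes 𝒟 and interpolates by proximity, which already raises the lecture's last question of how to tell whether the recovered function is the "right" one.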