
Tying up loose ends

Understand your data

Unsupervised learning: no answers available, only data
Clustering, SOM, Hebbian learning, PCA…

Supervised learning: training includes inputs and correct answers
Perceptron, Backprop, POS tagging
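A minimal sketch of the contrast above, assuming scikit-learn is available (the iris data, KMeans, and Perceptron are illustrative choices, not part of the slides): the clustering step sees only the inputs X, while the supervised learner also sees the correct answers y.

from sklearn.datasets import load_iris
from sklearn.cluster import KMeans
from sklearn.linear_model import Perceptron

X, y = load_iris(return_X_y=True)

# Unsupervised: only data, no answers -- the algorithm forms its own groups
clusters = KMeans(n_clusters=3, n_init=10).fit_predict(X)

# Supervised: training pairs each input with its correct label
clf = Perceptron().fit(X, y)

print("cluster ids: ", clusters[:5])
print("predictions: ", clf.predict(X[:5]))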

Association

Probability of Y given X, or the most likely Y given X

Collaborative Filtering – people who like X probably like Y

Neural Networks – input X triggers Y output (behaviorism)

Input retrieves similarities or correlations as output
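A toy sketch of the collaborative-filtering idea, assuming only numpy and an invented user-by-item rating matrix: a user's likely rating for an unseen item is a similarity-weighted average of what similar users gave it.

import numpy as np

# rows = users, columns = items; 0 means "not rated yet" (made-up data)
R = np.array([[5, 4, 0, 1],
              [4, 5, 1, 0],
              [1, 0, 5, 4],
              [0, 1, 4, 5]], dtype=float)

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9)

target = 0                                        # recommend for user 0
sims = np.array([cosine(R[target], R[u]) for u in range(len(R))])
sims[target] = 0.0                                # ignore self-similarity
scores = sims @ R / sims.sum()                    # similarity-weighted ratings
unseen = np.where(R[target] == 0)[0]
print({int(i): round(float(scores[i]), 2) for i in unseen})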

Classification

X is a…
X is A or B or C or D
X is 1 or 0
X is face or not-face

Goal is prediction

Classification is a type of association

Includes pattern recognition: OCR, faces, diagnosis, speech, NLP…

Includes compression
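A minimal sketch of the "X is 1 or 0" framing, assuming scikit-learn (the digits dataset stands in for a face/not-face problem, and logistic regression is just one possible classifier): each input is assigned exactly one of two discrete labels.

from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
y = (y == 0).astype(int)                 # 1 = "this image is a zero", 0 = "it is not"

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))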

Regression

If the output is a continuous number

Ex. Automatic steering
inputs: sensors (video, GPS, proximity…)
output: degree of rotation of the wheel
Ex. ALVINN

Backprop Neural Nets work for both classification and regression
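A small sketch of a backprop net producing a continuous output, assuming scikit-learn's MLPRegressor and synthetic data (a noisy sine wave stands in for sensor-to-steering-angle data):

import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.1, size=500)    # continuous target

net = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000, random_state=0)
net.fit(X, y)
print("prediction at x = 1.0:", net.predict([[1.0]])[0])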

Different algorithms use different error calculations

Simplest: # wrong / # total, i.e. 2/5 = .4 or 40%

Other examples: Word Error Rate (WER), Mean Squared Error
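A minimal sketch of two of the measures named above, in plain Python with made-up labels:

def misclassification_rate(y_true, y_pred):
    # "# wrong / # total"
    wrong = sum(t != p for t, p in zip(y_true, y_pred))
    return wrong / len(y_true)

def mean_squared_error(y_true, y_pred):
    # average squared difference between prediction and truth
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

print(misclassification_rate([1, 0, 1, 1, 0], [1, 1, 0, 1, 0]))   # 2/5 = 0.4
print(mean_squared_error([1.0, 2.0, 3.0], [1.1, 1.9, 2.5]))       # 0.09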

[Figure: a table of Input/Output pairs split into a Training set and a Validation set]

[Figure: the same Input/Output pairs divided into Fold 1 through Fold 5]

[Figure: train on Folds 1-4, test on Fold 5 -> Learner 1 error = .01]

[Figure: train on Folds 1-3 and 5, test on Fold 4 -> Learner 2 error = .012]

[Figure: train on Folds 1, 2, 4, 5, test on Fold 3 -> Learner 3 error = .011]
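A minimal sketch of the 5-fold procedure those figures describe, assuming scikit-learn and a synthetic dataset: each learner trains on four folds and reports its error on the held-out fold.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import Perceptron
from sklearn.model_selection import KFold

X, y = make_classification(n_samples=500, random_state=0)
kf = KFold(n_splits=5, shuffle=True, random_state=0)

errors = []
for i, (train_idx, test_idx) in enumerate(kf.split(X), start=1):
    clf = Perceptron().fit(X[train_idx], y[train_idx])
    err = 1 - clf.score(X[test_idx], y[test_idx])   # # wrong / # total on the held-out fold
    errors.append(err)
    print(f"Learner {i} error = {err:.3f}")

print("mean error across folds:", round(float(np.mean(errors)), 3))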

If errors between folds vary greatly, this indicates bias in the training data

Over-fitting – too much training

Misrepresentative data
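A small illustration of over-fitting, assuming only numpy and an invented noisy sine dataset: as model capacity (polynomial degree here) grows, training error keeps falling while validation error typically starts to climb.

import numpy as np

rng = np.random.default_rng(0)
truth = lambda x: np.sin(2 * np.pi * x)
x_train = np.linspace(0, 1, 15)
x_val = np.linspace(0.03, 0.97, 15)
y_train = truth(x_train) + rng.normal(0, 0.2, x_train.size)
y_val = truth(x_val) + rng.normal(0, 0.2, x_val.size)

for degree in (1, 3, 12):
    p = np.polynomial.Polynomial.fit(x_train, y_train, degree)
    train_mse = np.mean((p(x_train) - y_train) ** 2)
    val_mse = np.mean((p(x_val) - y_val) ** 2)
    print(f"degree {degree:2d}: train MSE = {train_mse:.3f}, validation MSE = {val_mse:.3f}")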