instructor : omid sojoodi faculty of electrical, computer and...

343
Instructor : Omid Sojoodi Faculty of Electrical, Computer and IT Engineering Qazvin Azad University Contact Info: o_sojoodi@{ieee.org, m.ieice.org}

Upload: others

Post on 22-Jan-2021

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Instructor : Omid Sojoodi

Faculty of Electrical, Computer and IT Engineering Qazvin Azad University

Contact Info: o_sojoodi@{ieee.org, m.ieice.org}

Page 2: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Machine learning: the study of self-modifying computer systems that can acquire new knowledge and improve their own performance

Survey on machine learning techniques:

induction from examples Bayesian learning artificial neural networks instance-based learning genetic algorithms reinforcement learning unsupervised learning and biologically motivated learning algorithms

2

Page 3: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Text Book: Ethem Alpaydin (2010) “Introduction to Machine Learning”, 2nd edition. MIT Press. Tom Mitchell (1997) “Machine Learning”, McGraw-Hill

Grading: Assignments (20%) Presentation (30%) Final Examination (50%)

3

Page 4: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Machine Learning at AAAI Journal of Machine Learning Research Journal of Machine Learning Gossip (ML humor) mdl-research.org Machine Learning Database Repository at UC Irvine David Aha's list of machine learning resources Avrim Blum's Machine Learning Page UCI - Machine Learning Repository UTCS Machine Learning Research Group Microsoft Bayesian Network Editor (MSBNx) Weka 3 -- Machine Learning Software in Java Journal of AI Research (online text) C5/See5 MLC++, A Machine Learning Library in C++ Web->KB project DELVE-Data for Evaluating Learning

4

Page 5: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Introduction (A1, M1) (A: Alpaydin, M: Mitchell)

Supervised Learning (A2, M7) Bayesian Learning (A3, A16, M6) Parametric Methods (A4) Dimensionality Reduction (A6) Decision Tree (A9, M3) Multilayer Perceptron (A11, M4) Kernel Machine (A13) Reinforcement Learning (A18, M13) Clustering (A7) Machine Learning Experiments (A19) Genetic Algorithms (M9)

5

Page 6: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Neural Networks: perceptrons, backpropagation, etc.

Pattern analysis: Bayesian learning, instance-based learning Artificial intelligence:

decision trees (in some courses) Statistics:

hypothesis testing Data Mining:

associations rule and classification (Relatively) unique to this course:

concept learning, computational learning theory, genetic algorithms, reinforcement learning, decision trees (in depth treatment)

6

Page 7: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Theory-bounded Research: Networked Data Classification Classification of Uncertain Data Similarity-based Dimension Reduction (NCA, NDA, DNDA, SDA, …) Nearest Neighbor Classification in Non-stationary environments Learning Distance Metric Data Stream Classification Ensemble Classifiers …

7

Page 8: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Application-based Research: ML Methods in Sensor Network ML Methods in Control and Robotics ML Methods in Weather Forecasting ML Methods in Financial Problems ML Methods in Filtering (Spam Filtering/Document) ML Methods in Machine Vision ML Methods in Biomedical Engineering (Biomedical Image/Signal Processing) …

8

Page 9: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

How can machines (computers) learn? How can machines improve automatically with experience? Benefits:

Improved performance Automated optimization New uses of computers Reduced programming Insights into human learning and learning disabilities

9

Page 10: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Current status: Yet unsolved problem. Theoretical insights emerging. Practical applications. Huge data volume demands ML, and provides opportunity to ML (data mining).

State of the art: speech recognition medical predictions fraud detection drive autonomous vehicles (highway and desert) board games (backgammon, chess) theoretical bounds on error, number of inputs needed, etc.

10

Page 11: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

A program is said to learn from experience E with respect to task T and performance measure P, P in T increase with E.

Examples: Playing checkers, Handwriting recognition, Robot driving, etc.

Goal of ML: “define precisely a class of problems that encompasses interesting forms of learning, to explore algorithms that solve such problems, and to understand the fundamental structure of learning problems and processes” (Mitchell, 1997)

11

Page 12: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Training experience: direct vs. indirect (learning to play checkers)

problem of credit assignment degree of control over training examples (teacher-dependent or learner-generated) closeness of training example distribution to true distribution over which P is measured: in many cases, ML algorithms assume that both distributions are similar, which may not be the case in practice.

12

Page 13: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Remaining design choices:

The exact type of knowledge to be learned. A representation for this target knowledge. A learning mechanism.

13

Page 14: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Type of knowledge to be learned: for example, we want to learn the best move in a board game. Can represent as a function (B: board states, M: moves):

ChooseMove : B M, but it is hard to learn directly.

14

Page 15: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Another function (B: board states, R: real numbers):

V : B R, which gives the evaluation of each board state.

V (b = win) = 100 V (b = lose) = −100 V (b = draw) = 0 V (b = otherwise) = V (b0), where b0 is the best final

board state that can be reached from b. However, this is not efficiently computable, i.e., it is a

nonoperational definition. Goal of ML is to find an operational description of V ,

however, in practice, an approximation is all we can get.

15

Page 16: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Given an ideal target function V , we want to learn an approximate function :

Trade-off between rich and parsimonious representation. Example: as a linear combination of number of pieces, number of particular relational situations in the board (e.g., threatened), etc. (represented as xi) in board configuration b:

where wi are the weight values to be learned.

Advantage of the above representation: reduction of scope (or dimensionality) from the original problem.

16

n

iii xwwbV

10)(ˆ

Page 17: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Given board state and true V , we want to learn the weights wi that specify .

Start with a set of a large number of input-target pairs < b, Vtrain(b) >. Problem: cannot come up with a full set of

< b, Vtrain(b) > pairs. Solution: If Vtrain(b) is unknown, set it to the estimated of its successor board state:

Vtrain(b) = train(Successor(b)).

17

Page 18: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Last component in defining a learning algorithm: adjustment of weights.

Want to learn weights wi that best fit the set of training samples {< b, Vtrain(b) >}.

How to define best fit? Once we have we can calculate all (b) for all b in the training set, and calculate the error (here MSE)

How to reduce E?

18

||

))(ˆ)(()(,

2

settraining

bVbVE settrainingbVb

traintrain

Page 19: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Least Mean Squares (LMS) learning rule to minimize MSE: Until weights converge : For each training example < b, Vtrain(b) > 1) Use the current weights to calculate (b) 2) For each weight wi, update it as wi wi + η(Vtrain(b) − (b))xi, where η is a small learning rate constant

The error Vtrain(b) − (b) and the input xi both contribute to the weight update.

19

Page 20: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Training experience: against experts, against self, table of correct moves, ... Target function: board move, board value, ... Representation of target function: polynomial, linear function of small number of features, artificial neural network, … Learning algorithm: gradient descent, linear programming, Genetic Algorithm, ...

20

Page 21: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Useful to think of ML as searching a very large space of possible hypotheses to best fit the data and the learner’s prior knowledge. For example, the hypothesis space for would be all possible s with different weight assignment. Useful concepts regarding hypothesis space search:

Size of hypothesis space Number of training examples available/needed. Confidence in generalizing to new unseen data.

21

Page 22: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

What algorithms exist for generalizable learners given specific training set? Requirements for convergence? Which algorithms are best for a particular domain? How much training data needed? Bounds on confidence, based on data size? How long to train? Use of prior knowledge? How to choose best training experience? Impact of the choice? How to reduce ML problem to function approximation? How can learner alter the representation itself?

22

Page 23: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Supervised learning: input-target pairs given. Unsupervised learning: only input distribution is given. Reinforcement learning: sparse reward signal is given for action based on sensory input; environment-altering actions.

23

Page 24: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Can machines themselves formulate their own learning tasks?

Can they come up with their own representations? Can they come up with their own learning strategy? Can they come up with their own motivation?Can they come up with their own questions/problems?

What if the machines are faced with multiple, possibly conflicting tasks? Can there be a meta learning algorithm? What if performance is hard to measure (i.e., hard to quantify, or even worse, subjective)? 24

Page 25: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Instructor : Omid Sojoodi

Faculty of Electrical, Computer and IT Engineering Qazvin Azad University

Contact Info: o_sojoodi@{ieee.org, m.ieice.org}

Page 26: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Machine learning is programming computers to optimize a performance criterion using example data or past experience. There is no need to “learn” to calculate payroll Learning is used when:

Human expertise does not exist (navigating on Mars), Humans are unable to explain their expertise (speech recognition) Solution changes in time (routing on a computer network) Solution needs to be adapted to particular cases (user biometrics)

2

Page 27: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Learning general models from a data of particular examples Data is cheap and abundant (data warehouses, data marts); knowledge is expensive and scarce. Example in retail: Customer transactions to consumer behavior: People who bought “chips” also bought “yogurt” Build a model that is a good and useful approximation to the data.

3

Page 28: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

There is a connection between machine learning and data mining Data mining:

“…the analysis of (often large) observational data sets to find unsuspected relationships and to summarize the data in novel ways that are both understandable and useful to the data owner.” (Hand et al., 2001)

4

Page 29: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

5

Define Problem

Data Collection

Data Preparation

Data Modeling

Interpretation/Evaluation

Implement /Deploy Model

Machine Learning

Page 30: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Retail: Market basket analysis, Customer relationship management (CRM) Finance: Credit scoring, fraud detection Manufacturing: Control, robotics, troubleshooting Medicine: Medical diagnosis Telecommunications: Spam filters, intrusion detection Bioinformatics: Motifs, alignment Web mining: Search engines ...

6

Page 31: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Optimize a performance criterion using example data or past experience. Role of Statistics: Inference from a sample Role of Computer science: Efficient algorithms to

Solve the optimization problem Representing and evaluating the model for inference

7

Page 32: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Association Supervised Learning

Classification Regression

Unsupervised Learning Reinforcement Learning

8

Page 33: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Basket analysis: P (Y | X ) probability that somebody who buys

X also buys Y where X and Y are products/services.

Example: P ( chips | yogurt ) = 0.7

9

Page 34: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

10

Example: Credit scoring Differentiating between low-risk and high-risk customers from their income and savings

Discriminant: IF income > θ1 AND savings > θ2 THEN low-risk ELSE high-risk

Page 35: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Aka Pattern recognition Face recognition: Pose, lighting, occlusion (glasses, beard), make-up, hair style Character recognition: Different handwriting styles. Speech recognition: Temporal dependency. Medical diagnosis: From symptoms to illnesses Biometrics: Recognition/authentication using physical and/or behavioral characteristics: Face, iris, signature, etc ...

11

Page 36: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

12

Training examples of a person

Test images

ORL dataset, AT&T Laboratories, Cambridge UK

Page 37: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Example: Price of a used car x : car attributes

y : price y = g (x | ) g ( ) model, parameters

13

y = wx+w0

Page 38: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Navigating a car: Angle of the steering wheel Kinematics of a robot arm

14

α1= g1(x,y) α2= g2(x,y)

α1

α2

(x,y)

Response surface design

Page 39: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Prediction of future cases: Use the rule to predict the output for future inputs Knowledge extraction: The rule is easy to understand Compression: The rule is simpler than the data it explains Outlier detection: Exceptions that are not covered by the rule, e.g., fraud

15

Page 40: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Learning “what normally happens” No output Clustering: Grouping similar instances Example applications

Customer segmentation in CRM Image compression: Color quantization Bioinformatics: Learning motifs (sequence of amino acid) Document clustering

16

Page 41: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Given: a set (sample) of data (observations). Task: build a model of the process that generated the data. This time however, we’re not trying to learn about the

relationship between inputs and outputs, we just want to find (and/or take advantage of) structure in the data.

Sometimes called descriptive (rather than predictive) modeling. Problems that can be framed as unsupervised learning: dimensionality reduction, compression, probability density estimation.

17

Page 42: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Learning a policy: A sequence of outputs No supervised output but delayed reward Credit assignment problem Game playing Robot in a maze Multiple agents, partial observability, ...

18

Page 43: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

UCI Repository: http://www.ics.uci.edu/~mlearn/MLRepository.html

UCI KDD Archive: http://kdd.ics.uci.edu/summary.data.application.html

Statlib: http://lib.stat.cmu.edu/

Delve: http://www.cs.utoronto.ca/~delve/

19

Page 44: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Journal of Machine Learning Research www.jmlr.org Machine Learning Neural Computation Neural Networks IEEE Transactions on Neural Networks IEEE Transactions on Pattern Analysis and Machine Intelligence Annals of Statistics Journal of the American Statistical Association ...

20

Page 45: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

International Conference on Machine Learning (ICML) European Conference on Machine Learning (ECML) Neural Information Processing Systems (NIPS) Uncertainty in Artificial Intelligence (UAI) Computational Learning Theory (COLT) International Conference on Artificial Neural Networks (ICANN) International Conference on AI & Statistics (AISTATS) International Conference on Pattern Recognition (ICPR) ...

21

Page 46: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Instructor : Omid Sojoodi

Faculty of Electrical, Computer and IT Engineering Qazvin Azad University

Contact Info: o_sojoodi@{ieee.org, m.ieice.org}

Page 47: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Class C of a “family car” Prediction: Is car x a family car? Knowledge extraction: What do people expect from a family car?

Output: Positive (+) and negative (–) examples

Input representation: x1: price, x2 : engine power

2

Page 48: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Nt

tt ,r 1}{xX

negative is if positive is if

xx

01

r

3

2

1

xx

x

Page 49: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

2121 power engine AND price eepp

4

Page 50: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

negative is says if positive is says if

)(xx

xhh

h01

N

t

tt rhhE11 x)|( X

5

Error of h on H

Page 51: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

6

most specific hypothesis, S

most general hypothesis, G

h H, between S and G is consistent and make up the version space (Mitchell, 1997)

Page 52: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Choose h with largest margin

7

Page 53: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

N points can be labeled in 2N ways as +/– H shatters N if there

exists h H consistent for any of these: VC(H ) = N

8

An axis-aligned rectangle shatters 4 points only !

Page 54: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

How many training examples N should we have, such that with probability at least 1 ‒ δ, h has error at most ε ?

(Blumer et al., 1989)

Each strip is at most ε/4 Pr that we miss a strip 1‒ ε/4 Pr that N instances miss a strip (1 ‒ ε/4)N

Pr that N instances miss 4 strips 4(1 ‒ ε/4)N

4(1 ‒ ε/4)N ≤ δ and (1 ‒ x)≤exp( ‒ x)

4exp(‒ εN/4) ≤ δ and N ≥ (4/ε)log(4/δ)

9

Page 55: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Use the simpler one becauseSimpler to use

(lower computational complexity)

Easier to train (lower space complexity)

Easier to explain (more interpretable)

Generalizes better (lower variance - Occam’s razor)

10

Page 56: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Nt

tt ,r 1}{xX

, i f i f

ijr

jt

it

ti C

Cxx

01

, i f i f

ijh

jt

it

ti C

Cxx

x01

11

Train hypotheses hi(x), i =1,...,K:

Page 57: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

01 wxwxg

012

2 wxwxwxg

12

N

t

tt xgrN

gE1

21X|

N

t

tt wxwrN

wwE1

20101

1X|,

tt

t

Nt

tt

xfr

r

rx 1,X

Page 58: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Learning is an ill-posed problem; data is not sufficient to find a unique solution The need for inductive bias, assumptions about H Generalization: How well a model performs on new data Overfitting: H more complex than C or f Underfitting: H less complex than C or f

13

Page 59: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

There is a trade-off between three factors (Dietterich, 2003):

1. Complexity of H, c (H), 2. Training set size, N, 3. Generalization error, E, on new data As N↑ E As c (H)↑ first E and then E ↑

14

Page 60: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

To estimate generalization error, we need data unseen during training. We split the data as

Training set (50%) Validation set (25%) Test (publication) set (25%)

Resampling when there is few data

15

Page 61: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

1. Model:

2. Loss function:

3. Optimization procedure:

|xg

t

tt grLE |,| xX

16

X|min arg* E

Page 62: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Instructor : Omid Sojoodi

Faculty of Electrical, Computer and IT Engineering Qazvin Azad University

Contact Info: o_sojoodi@{ieee.org, m.ieice.org}

Page 63: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

The world →unknown process →data.

Because of our lack of knowledge about the process, we model it as a random process and use probability theory to analyse it.

Result of tossing a coin is ϵ {Heads,Tails} Random var D ϵ{1,0}

Bernoulli: P {D=1} = poD (1 ‒ po)(1 ‒ D)

Sample: D = {Dt } Nt =1

Estimation: po = # {Heads}/#{Tosses} = ∑t Dt / N

Prediction of next toss: Heads if po > ½, Tails otherwise

2

Page 64: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Credit scoring: Inputs are income and savings. Output is low-risk vs high-risk

Input: D = [D1,D2]T ,Output: h ϵ {0,1} Prediction:

otherwise 0 )|0()|1( if 1

choose

or otherwise 0

50)|1( if 1 choose

2121

21

hh

hh

,ddhP ,ddhP

. ,ddhP

3

Page 65: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

DpDpPDP hhh | |

1|1|000|11|

110

DPDpPDpPDpDp

PP

hhhhhh

hh

4

posterior

likelihood prior

evidence

Page 66: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

5 )()|(maxarg

)()()|(maxarg

)|(maxarg

hPhDPDP

hPhDP

DhPh

Hh

Hh

HhMAP

Generally want the most probable hypothesis given the training data Maximum a posteriori hypothesis hMAP :

DpDpPDP hhh | |

Page 67: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

6

If all hypotheses are equally probable a priori:

jij hhhPP ,,)(ih

then, hMAP reduces to:

)|(maxarg hDPhHh

ML

Maximum Likelihood hypothesis

Page 68: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

7

Does patient have cancer or not? A patient takes a lab test and the result comes back positive. The test returns a correct positive result in only 98%, of the cases in which the disease is actually present, and a correct negative result in only 97% of the cases in which the disease is not present. Furthermore, .008 of the entire population have this cancer. P(cancer)=0.008, P(+|cancer)=0.98, P(+|~cancer)=0.03, P(~cancer)=0.992, P(-|cancer)=0.02, P(-|~cancer)=0.97 How does P(cancer|+) compare to P(~cancer|+)? What is hMAP ?

Page 69: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

8

P(cancer|+) = P(+|cancer) P(cancer) / P(+) = (0.98)(0.008) / P(+) = 0.0078 / P(+) P(~cancer|+) = P(+|~cancer) P(~cancer) / P(+) = (0.03)(0.992) / P(+) = 0.0298 / P(+)

hMAP= ~cancer

Page 70: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

9

What is the most probable hypothesis given the training data, vs.

What is the most probable classification?

Example: P(h1|D)= 0.4 P(-|h1)=0 P(+|h1)=1 P(h2|D)= 0.3 P(-|h2)=1 P(+|h2)=0 P(h3|D)= 0.3 P(-|h3)=1 P(+|h3)=0 - New instance x is classified as - or + ? hMAP= h1 + But P(-|x)= .6 -

Page 71: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

10

If a new instance can take classification vj ϵ V , then the probability P(vj |D) of correct classification of new instance being vj is:

Hihiijj DhPhvPDvP )|()|()|(

Thus, the optimal classification is:

Hihiij

VjvDhPhvP )|()|(maxarg

Page 72: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

11

Example: P(h1|D)= 0.4 P(-|h1)=0 P(+|h1)=1 P(h2|D)= 0.3 P(-|h2)=1 P(+|h2)=0 P(h3|D)= 0.3 P(-|h3)=1 P(+|h3)=0

6.0)|()|(

4.0)|()|(

i

i

hii

hii

DhPhP

DhPhPx is classified as -

Page 73: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

12

Given attribute values <a1, a2, ..., an>, give the classification vϵV :

),,|(maxarg 1 njVjv

MAP aavPv

)()|,,(maxarg),,(

)()|,,(maxarg

1

1

1

jjnVjv

n

jjn

VjvMAP

vPvaaPaaP

vPvaaPv

Want to estimate P(a1, a2, ..., an|vj ) and P(vj ) from training data.

Page 74: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

13

• P(vj ) is easy to calculate: Just count the frequency. • The naive Bayes classifier simply assumes that the

attribute values are conditionally independent given the target value, thus

n

ijij

VvNB vaPvP

jv

1

)|()(maxarg

Page 75: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

14

Naive Bayes Learn(examples) For each target value vj Pe(vj ) estimate P(vj ) For each attribute value ai of each attribute a Pe(ai|vj ) estimate P(ai|vj ) Classify New Instance(x)

n

ijieje

VvNB vxPvP

jv

1

)|()(maxarg

Page 76: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

15

Day Outlook Temperature Humidity Wind Play

Tennis

Day1 Sunny Hot High Weak No Day2 Sunny Hot High Strong No

Day3 Overcast Hot High Weak Yes

Day4 Rain Mild High Weak Yes

Day5 Rain Cool Normal Weak Yes

Day6 Rain Cool Normal Strong No Day7 Overcast Cool Normal Strong Yes

Day8 Sunny Mild High Weak No

Day9 Sunny Cool Normal Weak Yes

Day10 Rain Mild Normal Weak Yes

Day11 Sunny Mild Normal Strong Yes

Day12 Overcast Mild High Strong Yes

Day13 Overcast Hot Normal Weak Yes

Day14 Rain Mild High Strong No

Page 77: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

16

Consider Play Tennis again, and new instance: x = <Outlk = sunny, Temp = cool, Humid = high, Wind = stron> V = {Yes,No}

noxPlayTennisanswernostrongPnohighPnocoolPnosunnyPnoP

yesstrongPyeshighPyescoolPyessunnyPyesP

vaPvPvi

kiknoyesv

NBk

)(:1)|()|()|()|()(

005.0)|()|()|()|()(

)|()(maxarg],[

0.02

9/14 3/9

3/5

Page 78: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

In above example: P(Wind = strong| Play Tennis = no) = nc /n = 3/5 If nc ≈ 0 then P=1/k , k is the number of possible value for the attribute

K= 2 for wind attribute ( weak, strong) p=.5 m is a constant ( equivalent sample size)

17

0)|(i

ki vaP

mnmpnc m-estimate of probability

Page 79: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

General ways of measuring performance/error Actions: αi (e.g. assign class Ci to some input) Loss of αi when the state is Ck : λik Expected risk (Duda and Hart, 1973) (for taking action αi)

xx

xx

|min| if choose

||

kkii

k

K

kiki

RR

CPR1

18

Page 80: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

kiki

ik if if

10

x

x

xx

|

|

||

i

ikk

K

kkiki

CP

CP

CPR

1

1

19

For minimum risk, choose the most probable class

Page 81: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

101

10

otherwise

if if

,Kiki

ik

xxx

xx

|||

||

iik

ki

K

kkK

CPCPR

CPR

11

1

otherwise reject 1| and || if choose xxx ikii CPikCPCPC

20

Page 82: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Kigi ,, , 1xxx kkii ggC max if choose

xxx kkii gg max|R

ii

i

i

i

CPCpCPR

g||

|

xx

xx

21

K decision regions R1,...,RK

(ok for 0/1 loss)

Page 83: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Dichotomizer (K=2) vs Polychotomizer (K>2) g(x) = g1(x) – g2(x) Log odds:

otherwise if

choose2

1 0C

gC x

xx

||log

2

1

CPCP

22

Page 84: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

(given) Prob of state k given evidence x: P (Sk|x) (and) Utility of αi when state is k: Uik (then) Expected utility: i.e a rational choice –maximize expected utility (usually equivalent to minimizing expected risk).

xx

xx

| max| if Choose

||

jjii

kkiki

EUEUα

SPUEU

23

Page 85: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Association rule: X Y People who buy/click/visit/enjoy X are also likely to buy/click/visit/enjoy Y. A rule implies association, not necessarily causation.

24

Page 86: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Support (X Y):

Confidence (X Y): Lift (X Y):

25

customers and bought whocustomers

##, YXYXP

XYX

XPYXPXYP

bought whocustomers and bought whocustomers

|

##

)(,

)()|(

)()(,

YPXYP

YPXPYXP

Page 87: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

For (X,Y,Z), a 3-item set, to be frequent (have enough support), (X,Y), (X,Z), and (Y,Z) should be frequent. If (X,Y) is not frequent, none of its supersets can be frequent.Once we find the frequent k-item sets, we convert them to rules: X, Y Z, ...

and X Y, Z, ...

26

Page 88: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Instructor : Omid Sojoodi

Faculty of Electrical, Computer and IT Engineering Qazvin Azad University

Contact Info: o_sojoodi@{ieee.org, m.ieice.org}

Page 89: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

X = { xt }t where xt ~ p (x) Our model is a specified probability distribution

Learning = estimating its parameters Parametric estimation: Assume a form for p (x |q ) and estimate q , its

sufficient statistics, using X e.g., N ( μ, σ2) where q = { μ, σ2} Density estimation (can then be used for

classification, etc.)

2

Page 90: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Likelihood of given the sample X l (θ|X) = p (X |θ) = ∏t p (xt|θ)

Log likelihood L(θ|X) = log l (θ|X) = ∑t log p (xt|θ)

Why? Small numbers converts product to sum, removes exp(?)

Maximum likelihood estimator (MLE) θ* = argmaxθ L(θ|X)

3

Page 91: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Bernoulli: Two states, failure/success, x in {0,1} P (x) = po

x (1 – po ) (1 – x)

L (po|X) = log ∏t poxt (1 – po )

(1 – xt) MLE: po = ∑t x

t / N

Multinomial: K>2 states, xi in {0,1} P (x1,x2,...,xK) = ∏i pi

xi L(p1,p2,...,pK|X) = log ∏t ∏i pi

xit MLE: pi = ∑t xi

t / N

4

Page 92: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

2

2

2exp

2

1 x-xp

N

mxs

N

xm

t

t

t

t

2

2

p(x) = N ( μ, σ2)

MLE for μ and σ2:

5

μ σ

2

2

221 xxp exp

estimated values

True parameters

Page 93: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

6

Unknown parameter Estimator (of di = d (Xi) on sample Xi Bias: b (d) = E [d] – Variance: E [(d–E [d])2] Mean square error: r (d, ) = E [(d– )2] = (E [d] – )2 + E [(d–E [d])2] = Bias2 + Variance

Page 94: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

7

• Bias: how much the expected value of the estimator varies from the correct

• Variance: variation around the expected value.

• Examples: • Sample mean, m, is an unbiased estimator of the true mean.

It’s also a consistent estimator, since Var(m) tends to zero as N tends to infinity.

• Sample variance (as it turns out) is a biased estimator of the

true variance.

Page 95: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Treat θ as a random var with prior p (θ) Bayes’ rule: p (θ|X) = p(X|θ) p(θ) / p(X) Full: p(x|X) = ∫ p(x|θ) p(θ|X) dθ

i.e an average over predictions using all values of θ, weighted by the probability of each θ value.

Maximum a Posteriori (MAP): θMAP = argmaxθ p(θ|X) This uses a priori

Maximum Likelihood (ML): θML = argmaxθ p(X|θ) This doesn’t have a prior. If prior is flat, MAP == ML.

Bayes’: θBayes’ = E[θ|X] = ∫ θ p(θ|X) dθ

8

Page 96: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

xt ~ N (θ, σo2) and θ ~ N ( μ, σ2)

θML = m θMAP = θBayes’ =

220

2

220

20

11

1 ///

///|

Nm

NNE X

9

Page 97: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

iii

iii

CPCxpxg

CPCxpxg

log| loglyequivalentor

|

ii

iii

i

i

ii

CPxxg

xCxp

log log log

exp|

2

2

2

2

22

21

221

10

Page 98: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

11

Given the sample ML estimates are Discriminant becomes

Nt

tt ,rx 1}{X

x , i f

i f ijx

xr

jt

it

ti C

C01

t

ti

t

tii

t

i

t

ti

t

ti

t

it

ti

i r

rmxs

r

rxm

N

rCP

2

2 ˆ

ii

iii CP

smxsxg ˆ log log log 2

2

22

21

Page 99: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

12

Equal variances

Single boundary at halfway between means

Page 100: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

13

Variances are different

Two boundaries

Page 101: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

2

20,N

,N

|~|

~

|:estimator

xgxrp

xgxfr

N

t

tN

t

tt

N

t

tt

xpxrp

rxp

11

1

log| log

, log|XL

14

We want to learn the parameters of our model via Max. Likelihood

Page 102: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

2

1

2

12

2

2

1

|21|

|2

12log

2|exp

21 log|

N

t

tt

N

t

tt

ttN

t

xgrE

xgrN

xgr

X

XL

15

Important: so maximizing L is equivalent to minimizing MSE (assuming Gaussian noise)

Page 103: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

0101 wxwwwxg tt ,|

t

t

t

tt

t

t

t

t

t

t

xwxwxr

xwNwr

210

10

t

t

tt

t

t

t

t

tt

t

xr

r

ww

xx

xNyw

1

02A

yw 1A

16

Page 104: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

012

2012 wxwxwxwwwwwxg ttktkk

t ,,,,|

NNNN

k

k

r

rr

xxx

xxxxxx

2

1

22

2222

1211

1

11

r D

rw TT DDD1

17

Page 105: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

2

121 N

t

tt xgrE ||X

18

Square Error: Relative Square Error: Absolute Error: E (θ |X) = ∑t

|rt – g(xt| θ)| ε-sensitive Error: E (θ |X) = ∑ t 1(|rt – g(xt| θ)|>ε) (|rt – g(xt|θ)| – ε)

2

1

2

1N

t

t

N

t

tt

rr

xgrE

||X

Page 106: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

19

222 ||| xgExgExgExrExxgxrEE XXXX

bias variance

222 xgxrExxrErExxgrE ||||

noise squared error

Page 107: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

ti

t i

tti

t

tt

xgM

xg

xgxgNM

g

xfxgN

g

1

1

1

2

22

Variance

Bias

20

M samples Xi={xti , rt

i}, i=1,...,M

are used to fit gi (x), i =1,...,M

Page 108: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Example: gi(x)=2 has no variance and high bias

gi(x)= ∑t rt

i/N has lower bias with variance As we increase complexity,

bias decreases (a better fit to data) and variance increases (fit varies more with

data) Bias/Variance dilemma: (Geman et al., 1992)

21

Page 109: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

22

bias

variance

f

gi g

f

Page 110: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

23

Best fit “min error”

Page 111: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

24

Best fit, “elbow”

Page 112: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Cross-validation: Measure generalization accuracy by testing on data unused during training Regularization: Penalize complex models

E’=error on data + λ model complexity Akaike’s information criterion (AIC),

Bayesian information criterion (BIC) Minimum description length (MDL): Kolmogorov complexity, shortest description of data Structural risk minimization (SRM)

25

Page 113: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

datamodel model|datadata|model

pppp

26

Prior on models, p(model)

Regularization, when prior favors simpler models Bayes, MAP of the posterior, p(model|data) Average over a number of models with high posterior (voting, ensembles: Chapter 17)

Page 114: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

27

Coefficients increase in magnitude as order increases: 1: [-0.0769, 0.0016] 2: [0.1682, -0.6657, 0.0080] 3: [0.4238, -2.5778, 3.4675, -0.0002] 4: [-0.1093, 1.4356, -5.5007, 6.0454, -0.0019]

i i

N

t

tt wxgrEtionregulariza 22

121 ww ||: X

Page 115: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Instructor : Omid Sojoodi

Faculty of Electrical, Computer and IT Engineering Qazvin Azad University

Contact Info: o_sojoodi@{ieee.org, m.ieice.org}

Page 116: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Reduces time complexity: Less computation Reduces space complexity: Less parameters Saves the cost of observing the feature Simpler models are more robust on small datasets More interpretable; simpler explanation Data visualization (structure, groups, outliers, etc) if plotted in 2 or 3 dimensions

2

Page 117: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Feature selection: Choosing k<d important features, ignoring the remaining d – k

Subset selection algorithms Feature extraction: Project the

original xi , i =1,...,d dimensions to new k<d dimensions, zj , j =1,...,k Principal components analysis (PCA), linear

discriminant analysis (LDA), factor analysis (FA)

3

Page 118: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

There are 2d subsets of d features Forward search: Add the best feature at each step

Set of features F initially Ø. At each iteration, find the best new feature j = argmini E ( F xi ) Add xj to F if E ( F xj ) < E ( F )

Hill-climbing O(d2) algorithm Backward search: Start with all features and remove one at a time, if possible. Floating search (Add k, remove l)

4

Page 119: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Find a low-dimensional space such that when x is projected there, information loss is minimized. The projection of x on the direction of w is: z = wTx Find w such that Var(z) is maximized

Var(z) = Var(wTx) = E[(wTx – wTμ)2] = E[(wTx – wTμ)(wTx – wTμ)] = E[wT(x – μ)(x – μ)Tw] = wT E[(x – μ)(x –μ)T]w = wT ∑ w where Var(x)= E[(x – μ)(x –μ)T] = ∑

5

Page 120: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

01max 1222222

wwwwwww

TTT

6

Maximize Var(z) subject to ||w||=1

∑w1 = αw1 that is, w1 is an eigenvector of ∑ Choose the one with the largest eigenvalue for Var(z) to

be max Second principal component: Max Var(z2), s.t., ||w2||=1 and orthogonal to w1

∑ w2 = α w2 that is, w2 is another eigenvector of ∑ and so on.

111111

wwwww

TTmax

Page 121: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

z = WT(x – m)

where the columns of W are the eigenvectors of ∑, and m is sample mean

Centers the data at the origin and rotates the axes

7

Page 122: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

dk

k

21

21

8

Proportion of Variance (PoV) explained

when λi are sorted in descending order

Typically, stop at PoV>0.9 Scree graph plots of PoV vs k, stop at “elbow”

Page 123: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

9

Page 124: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

10

Page 125: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Find a small number of factors z, which when combined generate x :

xi – μi = vi1z1 + vi2z2 + ... + vikzk + εi

where zj, j =1,...,k are the latent factors with E[ zj ]=0, Var(zj)=1, Cov(zi ,, zj)=0, i ≠ j ,

εi are the noise sources E[ εi ]= ψi, Cov(εi , εj) =0, i ≠ j, Cov(εi , zj) =0 , and vij are the factor loadings

11

Page 126: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

PCA From x to z z = WT(x – μ) FA From z to x x – μ = Vz + ε

12

x z

z x

Page 127: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

In FA, factors zj are stretched, rotated and translated to generate x

13

Page 128: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Find a low-dimensional space such that when x is projected, classes are well-separated. Find w that maximizes

tttT

tt

tttT

rmsr

rm

ssmmJ

21

211

22

21

221

xwxw

w

14

Page 129: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

TBB

T

TT

TTmm

2121

2121

221

221

mmmmww

wmmmmw

mwmw

SS where

15

Between-class scatter: Within-class scatter:

2121

21

111

111

21

21

SSSS

S

S

WWT

tt

Ttt

Ttt

TttT

tt

tT

ss

r

r

rms

where

where

ww

mxmx

wwwmxmxw

xw

Page 130: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

211 mmw Wc S

16

Find w that max LDA soln: Parametric soln:

wwmmw

wwwww

WT

T

WT

BT

JSS

S2

21

,~| when iiCp μμμ 21

1

Nxw

Page 131: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Ti

ti

tt

tii

K

iiW r mxmxSSS

1

17

Within-class scatter: Between-class scatter: Find W that max

K

ii

K

i

TiiiB K

N11

1 mmmmmm S

WSW

WSWW

WT

BT

J The largest eigenvectors of SW-1SB

Maximum rank of K-1

Page 132: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

18

Page 133: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Instructor : Omid Sojoodi

Faculty of Electrical, Computer and IT Engineering Qazvin Azad University

Contact Info: o_sojoodi@{ieee.org, m.ieice.org}

Page 134: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

2

Learn to approximate discrete-valued target functions. Step-by-step decision making: It can learn disjunctive

expressions: Hypothesis space is completely expressive, avoiding problems with restricted hypothesis spaces.

Inductive bias: small trees over large trees.

Page 135: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Each instance holds attribute values. Instances are classified by filtering the attribute values down the decision tree, down to a leaf which gives the final answer.Internal nodes: attribute names or attribute values. Branching occurs at attribute nodes.

3

Page 136: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

4

Page 137: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Internal decision nodes Univariate: Uses a single attribute, xi

Numeric xi : Binary split : xi > wm Discrete xi : n-way split for n possible values

Multivariate: Uses all attributes, x

Leaves Classification: Class labels, or proportions Regression: Numeric; r average, or local fit

Learning is greedy; find the best split recursively (Breiman et al, 1984; Quinlan, 1986, 1993)

5

Page 138: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

6

• Each path from root to leaf is a conjunctions of constraints on the attribute values. (Outlook = Sunny ∧ Humidity = Normal) ∨ (Outlook = Overcast) ∨ (Outlook = Rain ∧ Wind = Weak)

Page 139: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Good at classification problems where:

Instances are represented by attribute-value pairs. The target function has discrete output values. Disjunctive descriptions may be required. The training data may contain errors. The training data may contain missing attribute values.

7

Page 140: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Given a set of examples (training set), both positive and negative, the task is to construct a decision tree that describes a concise decision path. Using the resulting decision tree, we want to classify new instances of examples (either as yes or no).

8

Page 141: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

A trivial solution is to explicitly construct paths for each given example. In this case, you will get a tree where the number of leaves is the same as the number of training examples. The problem with this approach is that it is not able to deal with situations where, some attribute values are missing or new kinds of situations arise. Consider that some attributes may not count much toward thefinal classification.

9

Page 142: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Memorizing all cases may not be the best way. We want to extract a decision pattern that can describe a large number of cases in a concise way. In terms of a decision tree, we want to make as few tests as possible before reaching a decision, i.e. the depth of the tree should be shallow.

10

Page 143: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Basic idea: pick up attributes that can clearly separate positive and negative cases. These attributes are more important than others: the final classification heavily depend on the value of these attributes.

11

Page 144: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Main loop: 1. A the “best” decision attribute for next node 2. Assign A as decision attribute for node 3. For each value of A, create new descendant of node 4. Sort training examples to leaf nodes 5. If training examples perfectly classified, Then STOP, Else iterate over new leaf nodes

12

Page 145: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

13

A1 or A2? • With initial and final number of positive and negative examples based on the attribute just tested, we want to decide which attribute is better. • How to quantitatively measure which one is better?

Page 146: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Use Shannon’s information theory to choose the attribute that give the maximum information gain.

Pick an attribute such that the information gain (or entropy reduction) is maximized.

Entropy measures the average surprisal of events. Less probable events are more surprising.

14

Page 147: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Given two events, H and T (Head and Tail): Rare (uncertain) events give more surprise:

H more surprising than T if P(H) < P(T) H more uncertain than T if P(H) < P(T) How to represent “more surprising”, or “more uncertain”?

Surprise(H) > Surprise(T) if P(H) < P(T) 1/ P(H) > 1/ P(T) log(1/ P(H))> log(1/ P(T)) - log(P(H)) > - log(P(T))

15

Page 148: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

16

• S is a sample of training examples • p+ is the proportion of positive examples in S • p- is the proportion of negative examples in S • Entropy measures the average uncertainty in S Entropy(S) ≡ −p+ log2 p+ − p- log2 p-

Page 149: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

By performing some query, if you go from state S1 with entropy E(S1) to state S2 with entropy E(S2), where E(S1) > E(S2), your uncertainty has decreased. The amount by which uncertainty decreased, i.e., E(S1) − E(S2), can be thought of as information you gained (information gain) through getting answers to your query.

17

Page 150: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

18

Ciii ppSEntropy )(log)( 2

)(||||)(),(

)(v

Avaluesv

v SEntropySSSEntropyASGain

• C: categories (classifications) • S: set of examples • A: a single attribute • Sv: set of examples where attribute A = v.

Page 151: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

19

Day Outlook Temperature Humidity Wind Play

Tennis

Day1 Sunny Hot High Weak No Day2 Sunny Hot High Strong No

Day3 Overcast Hot High Weak Yes

Day4 Rain Mild High Weak Yes

Day5 Rain Cool Normal Weak Yes

Day6 Rain Cool Normal Strong No Day7 Overcast Cool Normal Strong Yes

Day8 Sunny Mild High Weak No

Day9 Sunny Cool Normal Weak Yes

Day10 Rain Mild Normal Weak Yes

Day11 Sunny Mild Normal Strong Yes

Day12 Overcast Mild High Strong Yes

Day13 Overcast Hot Normal Weak Yes

Day14 Rain Mild High Strong No

• Which attribute to test first?

Page 152: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Which attribute is the best classifier?

20

• +: # of positive examples; −: # of negative examples • Initial entropy = − (9/14) log (9/14) – (5/14) log (5/14) = 0.94. • You can calculate the rest. • Note: 0.0 × log 0.0 ≡ 0.0 even though log 0.0 is not defined.

Page 153: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

tt

m

ttt

mm

tmt m

t

mm

mm

brb

gbgrN

E

mb

xx

x

xxx

otherwise node reaches :i f

21

01 X

21

Error at node m: After splitting:

tt

mj

ttt

mjmjj

tmjt mj

t

mm

mjmj

brb

gbgrN

E

jmb

xx

x

xxx

'

otherwise branch and node reaches :i f

21

01 X

Page 154: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

22

Model Selection in Trees

Page 155: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Remove subtrees for better generalization (decrease variance)

Prepruning: Early stopping Postpruning: Grow the whole tree then prune subtrees which overfit on the pruning set

Prepruning is faster, postpruning is more accurate (requires a separate pruning set)

23

Page 156: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

24

C4.5Rules (Quinlan, 1993)

Page 157: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Rule induction is similar to tree induction but tree induction is breadth-first, rule induction is depth-first; one rule at a time

Rule set contains rules; rules are conjunctions of terms Rule covers an example if all terms of the rule evaluate to true for the example Sequential covering: Generate rules one at a time until all positive examples are covered IREP (Fürnkrantz and Widmer, 1994), Ripper (Cohen, 1995)

25

Page 158: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

26

Page 159: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

27

Page 160: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

28

Page 161: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Instructor : Omid Sojoodi

Faculty of Electrical, Computer and IT Engineering Qazvin Azad University

Contact Info: o_sojoodi@{ieee.org, m.ieice.org}

Page 162: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Networks of processing units (neurons) with connections (synapses) between them Large number of neurons: 1010

Large connectitivity: 105

Parallel processing Distributed computation/memory Robust to noise, failures

2

Page 163: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Levels of analysis (Marr, 1982) 1. Computational theory 2. Representation and algorithm 3. Hardware implementation Reverse engineering: From hardware to theory Parallel processing: SIMD vs MIMD

Neural net: SIMD with modifiable local memory

Learning: Update by training/experience

3

Page 164: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Neuron switching time ~.001 second (1 ms) Number of neurons ~1010

Connections per neuron ~104−5

Scene recognition time ~.1 second (100 ms) 100 processing steps doesn’t seem like enough

[ ] much parallel computation

4

Page 165: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Many neuron-like threshold switching units (real-valued) Many weighted interconnections among units Highly parallel, distributed process Emphasis on tuning weights automatically: New learning algorithms, new optimization techniques, new learning principles.

5

Page 166: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Input is high-dimensional discrete or real-valued (e.g. raw sensor input) Output is discrete or real valued Output is a vector of values Possibly noisy data Long training time (may need occasional, extensive retraining) Form of target function is unknown Fast evaluation of learned target function Human readability of result is unimportant

6

Page 167: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Speech synthesis Handwritten character recognition Financial prediction, Transaction fraud detection Driving a car on the highway

7

Page 168: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

8

otherwisexwif

xo1

0.1)(

Sometimes we’ll use simpler vector notation:

otherwisexwxwwif

xxo nnn 1

0...1),...,( 110

1

Page 169: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

9

The tunable parameters are the weights w0,w1, ...,wn, so the space H of candidate hypotheses is the set of all possible combination of real-valued weight vectors:

}|{ )1(nRwwH

Page 170: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

10

Perceptrons can represent basic Boolean functions. Thus, a network of perceptron units can compute any Boolean function.

What about XOR or EQUIV?

Page 171: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

11

Perceptrons can only represent linearly separable functions. • Output of the perceptron:

1isoutputthen,01isoutputthen,0

1100

1100

tIWIWtIWIW

The hypothesis space is a collection of separating lines.

Page 172: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

12

Rearranging: 1isoutputthen,01100 tIWIW

We get (if W1>0) ,

10

1

01 W

tIWWI

where points above the line, the output is 1, and -1 for those below the line. Compare with ,

11

0

Wtx

WWy

Page 173: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

13

Without the bias (t = 0), learning is limited to adjustment of the slope of the separating line passing through the origin. Three example lines with different weights are shown.

Page 174: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

14

Only functions where the -1 points and 1 points are clearly separable can be represented by perceptrons.

The geometric interpretation is generalizable to functions of n arguments, i.e. perceptron with n inputs plus one threshold (or bias) unit.

Page 175: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

15

For functions that take integer or real values as arguments and output either -1 or 1. Left: linearly separable (i.e., can draw a straight line between the classes). Right: not linearly separable (i.e., perceptrons cannot represent such a function)

Page 176: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

16

Perceptrons cannot represent XOR!

Page 177: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

17

:1isoutputthen,01100 tIWIW1 -t ≤ 0 t ≥ 0 2 W1 - t > 0 W1 > t 3 W0 - t > 0 W2 > t 4 W0 + W1 – t ≤ 0 W0 + W1 ≤ t

2t < W0 + W1 < t (from 2,3 and 4), but t ≥ 0 (from 1), a contradiction.

Page 178: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

The weights do not have to be calculated manually. We can train the network with (input,output) pair according to the following weight update rule:

wi wi +η ( t – o ) xi

where η is the learning rate parameter. Proven to converge if input set is linearly separable and η is small.

18

Page 179: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

wi wi +η ( t – o ) xi

When t = o, weight stays. When t = 1 and o = −1, change in weight is:

η ( 1 – (-1) ) xi > 0 if xi are all positive. Thus will increase, thus eventually, output o will turn to 1.

When t = -1 and o = 1, change in weight is: η ( 1 – 1 ) xi < 0 if xi are all positive. Thus will increase, thus eventually, output o will turn to -1. 19

xw.

xw.

Page 180: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

The perceptron rule cannot deal with noisy data. The delta rule will find an approximate solution even when input set is not linearly separable. Use linear unit without the step function: Want to reduce the error by adjusting

20

xwxo .)(w

Dddd otwE 2)(

21)(

Page 181: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Want to minimize by adjusting Note: the error surface is defined by the training data D. A different data set will give a different surface. E(w0,w1) is the error function above, and we want to change (w0,w1) to position under a low E.

21

Dd dd otwEw 2)(21)(:

Page 182: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Gradient Training rule:

22

nwE

wE

wEwE ,...,][

10

][wEw

ii w

Ew

Page 183: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

23

ddidd

i

ddd

idd

ddd

idd

ddd

i

ddd

ii

xotwE

xwtw

ot

otw

ot

otw

otww

E

))((

).()(

)()(221

)(21

)(21

,

2

2

Since we want d diddii

i xotwwEw ,)(,

Page 184: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Gradient-Descent (training_examples, η ) Each training example is a pair of the form < , t>, where is the vector of input values, and t is the target output value. η is the learning rate (e.g., .05).

Initialize each wi to some small random value Until the termination condition is met, Do – Initialize each wi to zero. – for each < , t> in training_examples, Do * Input the instance to the unit and compute o * for each linear unit weight wi , Do Δwi Δwi + η (t – o ) xi

– for each linear unit weight wi , Do wi wi + Δwi

24

Page 185: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Gradient descent is effective in searching through a large or infinite H:

H contains continuously parameterized hypotheses, and the error can be differentiated w.r.t the parameters.

Limitations: convergence can be slow, and finds local minima (global minimum not guaranteed).

25

Page 186: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Avoiding local minima: Incremental gradient descent, or stochastic gradient descent.

Instead of weight update based on all input in D, immediately update weights after each input example:

Δwi = η ( t – o ) xi,Instead of

Can be seen as minimizing error function

26

,)(Dd

iddi xotw

.)(21)( 2

ddd otwE

Page 187: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

In the standard version, error is defined over entire D. In the standard version, more computation is needed per weight update, but η can be larger. Stochastic version can sometimes avoid local minima.

27

Page 188: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Perceptron training rule guaranteed to succeed if Training examples are linearly separable Sufficiently small learning rate η

Linear unit training rule using gradient descent Asymptotic convergence to hypothesis with minimum squared error Given sufficiently small learning rate η Even when training data contains noise Even when training data not separable by H

28

Page 189: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Differentiable threshold unit: sigmoid Interesting property: Output: Other function:

29

)exp(11)(

yy

))(1)(()( yydy

yd

).( xwo

1)2exp(1)2exp()tanh(

yyy

Page 190: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

30

Nonlinear decision surface

Another example: XOR

Page 191: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

31

Dddidddd

i

ddi

d dd

Dd ddii

xoootwE

otw

ot

otww

E

,

2

)1(

.

.

.

)()(221

)(21

Page 192: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Initialize all weights to small random numbers. Until satisfied, Do

For each training example, Do 1. Input the training example to the network and compute

the network outputs 2. For each output unit k 3. For each hidden unit h

4. Update each network weight wi,j

Note: wji is the weight from i to j 32

))(1( kkkkk otoo

outputsk kkhhhh woo )1(

ijjijijiji xwwherewww

Page 193: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

For output unit For hidden unit In sum, is the derivate times the error.

33

Error

kk

net

kkk otook

)()1()('

erroratedBackpropag

outputskkkh

net

hhh wooh )('

)1(

Page 194: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Different formula for output and hidden For output unit For hidden unit

34

ji

dji w

Ew

inputi

neterror

jjjjji

d xoootwE

j )('

)1()(

inputi

error

jDownstreamkkjk

net

jjji

d xwoowE

j

)()('

)1(

Page 195: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Gradient descent over entire network weight vector. Easily generalized to arbitrary directed graphs. Will find a local, not necessarily global error minimum:

In practice, often works well (can run multiple times with different initial weights).

Often include weight momentum Minimizes error over training examples:

Will it generalize well to subsequent examples?

Training can take thousands of iterations slow! Using the network after training is very fast

35

)1()( ,,, nwxnw jijijji

Page 196: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Boolean functions: every Boolean function representable with two layers (hidden unit size can grow exponentially in the worst case:

one hidden unit per input example, and “OR” them).

Continuous functions: Every bounded continuous function can be approximated with an arbitrarily small error (output units are linear). Arbitrary functions: with three layers (output units are linear).

36

Page 197: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

H-space = n-D weight space (when there are n weights). The space is continuous, unlike decision tree or

general-to-specific concept learning algorithms.

Inductive bias: Smooth interpolation between data points.

37

Page 198: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

38

Page 199: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

39

Page 200: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Learned encoding is similar to standard 3-bit binary code. Automatic discovery of useful hidden layer representations is a key feature of ANN. Note: The hidden layer representation is compressed.

40

Page 201: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Error in two different robot perception tasks. Training set and validation set error. Early stopping ensures good performance on unobserved samples, but must be careful. Weight decay, use of validation sets, use of k-fold cross-validation, etc. to overcome the problem.

41

Page 202: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Applications:

Sequence recognition: Speech recognition Sequence reproduction: Time-series prediction Sequence association

Network architectures Time-delay networks (Waibel et al., 1989) Recurrent networks (Rumelhart et al., 1986)

42

Page 203: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

43

Page 204: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

44

Page 205: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

45

Page 206: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

46 (Le Cun et al, 1989)

Page 207: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

47

Page 208: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Destructive Weight decay:

48

ii

ii

i

wEE

wwEw

2

2'

Constructive Growing networks

(Ash, 1989) (Fahlman and Lebiere, 1989)

Page 209: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Consider weights wi as random vars, prior p(wi) Weight decay, ridge regression, regularization

cost=data-misfit + λ complexity 49

2

2

212

w

w

www

ww

wwww

EE

wcwpwpp

Cppp

pp

ppp

ii

ii

MAP

'

)/(

ˆ

exp where

log| log| log

| log max arg ||

XX

XX

XX

Page 210: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

50

Page 211: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

ANN learning provides general method for learning real-valued functions over continuous or discrete-valued attributed. ANNs are robust to noise. H is the space of all functions parameterized by the weights. H space search is through gradient descent: convergence to local minima. Backpropagation gives novel hidden layer representations. Overfitting is an issue. More advanced algorithms exist.

51

Page 212: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Instructor : Omid Sojoodi

Faculty of Electrical, Computer and IT Engineering Qazvin Azad University

Contact Info: o_sojoodi@{ieee.org, m.ieice.org}

Page 213: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Discriminant-based: No need to estimate densities first Define the discriminant in terms of support vectors The use of kernel functions, application-specific measures of similarity No need to represent instances as vectors Convex optimization problems with a unique solution

2

Page 214: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

3

1

1111

11

0

0

0

0

2

1

wr

rw

rw

wCC

rr

tTt

ttT

ttT

t

tt

ttt

xw

xwxw

wxx

x

as rewritten be can which for for

that such and find if if

where,X

(Cortes and Vapnik, 1995; Vapnik, 1995)

Page 215: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

4

Distance from the discriminant to the closest instances on either side Distance of x to the hyperplane is We require For a unique sol’n, fix ρ||w||=1, and to max margin

wxw 0wtT

twr tTt

,wxw 0

twr tTt ,121

02 xww to subject min

Page 216: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

5

Page 217: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

6

00

0

21

121

121

10

1

110

2

10

2

02

N

t

ttp

N

t

tttp

N

t

tN

t

tTtt

N

t

tTttp

tTt

rwL

rL

wr

wrL

twr

xww

xww

xww

xww

to subject min ,

Page 218: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

7

t and to subject

tr

rr

rwrL

tttt

tsTtst

t s

st

t

tT

t t

ttt

t

tttTTd

,002121

21

0

xx

ww

xwww

Most αt are 0 and only a small number have αt >0; they are the support vectors

Page 219: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Not linearly separable Soft error New primal is

8

ttTt wxr 10w

t

t

ttt

tttTtt

tt

p wxrCL 121

02 ww

Page 220: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

9

Page 221: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

10

otherwise if

tt

tttt

hinge ryry

ryL1

10),(

Page 222: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

11

t

tttt

N

t s

sTtststd

tttTt

t

t

Nr

xxrrL

wr

N

,,

,,

100

21

00

21

1

0

2

t

to subject

to subject

1 - min

xw

w

n controls the fraction of support vectors

Page 223: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Preprocess input x by basis functions z = φ(x) g(z)=wTz g(x)=wT φ(x)

The SVM solution

12

t

tttt

TtttT

t

ttt

t

ttt

Krg

rg

rr

xxx

xφxφxφwx

xφzw

,

Page 224: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Polynomials of degree q:

13

qtTtK 1xxxx ,

T

T

xxxxxx

yxyxyyxxyxyx

yxyx

K

22

212121

22

22

21

2121212211

22211

2

2221

22211

1

,,,,,

,

x

yxyx

Page 225: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Radial-basis functions:

14

2

2

2sK

tt

xxxx exp,

Page 226: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Kernel “engineering” Defining good measures of similarity String kernels, graph kernels, image kernels, ... Empirical kernel map: Define a set of templates mi and score function s(x,mi)

(xt)=[s(xt,m1), s(xt,m2),..., s(xt,mM)] and K(x,xt)= (x)T (xt)

15

Page 227: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Fixed kernel combination Adaptive kernel combination

Localized kernel combination

16

yxyxyxyx

yxyx

,,,,

,,

21

21

KKKK

cKK

i

tii

t

ttt s i

stii

stst

t

td

i

m

ii

Krg

KrrL

KK

xxx

xx

yxyx

,)(

,

,,

21

1

i

tii

t

tt Krg xxxx ,|)(

Page 228: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

1-vs-all Pairwise separation Error-Correcting Output Codes (section 17.5) Single multiclass optimization

17

02

21

00

1

2

ti

ttii

tTiz

tTz

i t

ti

K

ii

ziww

C

tt

to subject

min

,,xwxw

w

Page 229: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Use a linear model (possibly kernelized) f(x)=wTx+w0 Use the є-sensitive error function

18

otherwisei f

, tt

tttt

frfr

frex

xx

0

t

ttC2

21 wmin

00

0

tt

ttT

tTt

rw

wr

,

xwxw

Page 230: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

19

Page 231: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

20

Polynomial Kernel Gaussian Kernel

Page 232: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Consider a sphere with center a and radius R

21

t

tt

N

t s

sTtstst

t

sTttd

ttt

t

t

C

xxrrxxL

Ra

R

10

0

1

2

2

,

,

to subject

to subject

C min

x

Page 233: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

22

Page 234: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Kernel PCA does PCA on the kernel matrix (equal to canonical PCA with a linear kernel) Kernel LDA

23

Page 235: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Instructor : Omid Sojoodi

Faculty of Electrical, Computer and IT Engineering Qazvin Azad University

Contact Info: o_sojoodi@{ieee.org, m.ieice.org}

Page 236: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

How an autonomous agent that sense and act in the environment can learn to choose optimal actions to achieve its goals. Examples: mobile robot, optimization in process control, board games, etc. Ingredients: reward/penalty for each action, where the reinforcement signal can be significantly delayed. One approach: Q learning

2

Page 237: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Terminology: State: state of the environment, obtained through sensors Action: alter the state Policy: choosing actions that achieve a particular goal, based on the current state. Goal: desired configuration (or state).

Desired policy: From any initial state, choose actions that maximize the reward accumulated over time by the agent.

3

Page 238: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Agent has a state in an environment, takes an action and sometimes receives reward and the state changes Goal: learn to choose actions that maximize discounted, cumulative award: That is, we want to learn a policy : S A that maximizes the above, where S is the set of states, and A that of actions.

4

Agent

Environment

Action Reward State

...)/(2

)/(1

)/(0

221100 rarara SSS

.10...,22

10 whererrr

Page 239: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

5

Among K levers, choose the one that pays best Q(a): value of action a Reward is ra Set Q(a) = ra Choose a* if Q(a*)=maxa Q(a)

Rewards stochastic (keep an expected reward) aQaraQaQ tttt 11

Page 240: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Deterministic vs. nondeterministic action outcomes. With or without prior knowledge about the effect of action on environmental state. Partially or fully known environmental state (e.g., Partially Observable Markov Decision Process [POMDP]).

6

Page 241: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

st : State of agent at time t at: Action taken at time t In st, action at is taken, clock ticks and reward rt+1 is received and state changes to st+1 Next state prob: P (st+1 | st , at ) Reward prob: p (rt+1 | st , at ) Initial state(s), goal state(s) Episode (trial) of actions from initial state to goal (Sutton and Barto, 1998; Kaelbling et al., 1996)

7

Page 242: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Policy, Value of a policy, Finite-horizon: Infinite horizon:

8

tt sa : AS

tsV

T

iitTtttt rErrrEsV

121

rate discount the is 101

13

221

iit

itttt rErrrEsV

Page 243: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

9

1111

111

11

11

11

1

1

11

1

ttastttttt

ttttat

ts

ttttat

tta

iit

ita

iit

i

a

ttt

asQassPrEasQ

saasQsV

sVassPrEsV

sVrE

rrE

rE

ssVsV

tt

t

tt

t

t

t

,,,

,

,

,

**

**

**

*

*

max|

in of Valuemax

|max

max

max

max

max

Bellman’s equation

Page 244: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

10

• Immediate reward given only when entering the goal state G. • Given any initial state, we want to generate an action sequence to maximize V .

Page 245: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Discount rate: = 0.9 Top middle: 100 + 0 + 0 + ... = 100 Top left: 0 + 100 + 0 + ... = 90 Bottom left: 0 + 0 + 100 + ... = 81 Note that these values are supposed to be obtained using the optimal policy .

11

r(s,a) V*(s) values

Page 246: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Environment, P (st+1 | st , at ), p (rt+1 | st , at ), is known There is no need for exploration Can be solved using dynamic programming Solve for Optimal policy

12

1111

ts

ttttat sVassPrEsVt

t

** ,|max

1111

ts

tttttta

t sVassPasrEstt

*,|,|max arg*

Page 247: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

13

Page 248: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

14

Page 249: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Environment, P (st+1 | st , at ), p (rt+1 | st , at ), is not known; model-free learning There is need for exploration to sample from

P (st+1 | st , at ) and p (rt+1 | st , at )

Use the reward received in the next time step to update the value of current state (action) The temporal difference between the value of the current action and the value discounted from the next state 15

Page 250: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

ε-greedy: With pr ε,choose one action at random uniformly; and choose the best action with pr 1-ε Probabilistic: Move smoothly from exploration/exploitation. Decrease ε Annealing

16

A

1bbsQ

asQsaP,exp

,exp|

A

1bTbsQ

TasQsaP/,exp

/,exp|

Page 251: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Deterministic: single possible reward and next state

used as an update rule (backup) Starting at zero, Q values increase, never

decrease

17

11111

1

ttastttttt asQassPrEasQ

tt

,max,|, **

1111

ttattt asQrasQt

,max,

1111

ttattt asQrasQt

,ˆmax,ˆ

Page 252: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Consider the value of action marked by ‘*’: If path A is seen first, Q(*)=0.9*max(0,81)=73 Then B is seen, Q(*)=0.9*max(100,81)=90

Or, If path B is seen first, Q(*)=0.9*max(100,0)=90 Then A is seen, Q(*)=0.9*max(100,81)=90

Q values increase but never decrease

18

γ=0.9

Page 253: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

When next states and rewards are nondeterministic (there is an opponent or randomness in the environment), we keep averages (expected values) instead as assignments Q-learning (Watkins and Dayan, 1992): Off-policy vs on-policy (Sarsa) Learning V (TD-learning: Sutton, 1988)

19

ttttattttt asQasQrasQasQt

,ˆ,ˆmax,ˆ,ˆ111

1

ttttt sVsVrsVsV 11

Page 254: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

20

Page 255: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

21

Page 256: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Keep a record of previously visited states (actions)

22

asaseasQasQasQasQr

aseaass

ase

tttttt

tttttt

t

ttt

,,,,,,,

,,

111

1

1otherwise

and if

Page 257: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

23

Page 258: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Tabular: Q (s , a) or V (s) stored in a table Regressor: Use a learner to estimate Q (s , a) or V (s)

24

zeros all with

yEligibilit

0θ1

111

111

2111

eee

θθ

θ

tttt

tttttt

tt

ttttttt

tttttt

asQasQasQr

asQasQasQrasQasQrE

t

t

,,,

,,,,,

Page 259: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

The agent does not know its state but receives an observation p(ot+1|st,at) which can be used to infer a belief about states Partially observable MDP

25

Page 260: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Two doors, behind one of which there is a tiger p: prob that tiger is behind the left door R(aL)=-100p+80(1-p), R(aR)=90p-100(1-p) We can sense with a reward of R(aS)=-1 We have unreliable sensors

26

Page 261: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

If we sense oL , our belief in tiger’s position changes

27

1

1301007090

110090

1308070100

180100

1307070

)|()(

)(.)(

.)'('

)|(),()|(),()|()(

)(.)(

.)'('

)|(),()|(),()|()(..

.)(

)()|()|('

LS

LL

LRRRLLLRLR

LL

LRRLLLLLLL

L

LLLLL

oaRoP

poPp

ppozPzarozPzaroaR

oPp

oPp

ppozPzarozPzaroaR

ppp

oPzPzoPozPp

Page 262: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

28

)()()()(

max

)())|(),|(),|(max()())|(),|(),|(max(

)()|(max'

pppppppp

oPoaRoaRoaRoPoaRoaRoaR

oPoaRV

RRSRRRLLLSLRLL

jj

jii

1100901263314643180100

Page 263: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

29

Page 264: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Let us say the tiger can move from one room to the other with prob 0.8

30

)'()'()'('

max'

)(..'

pppppp

V

ppp

11009012633180100

18020

Page 265: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

When planning for episodes of two, we can take aL, aR, or sense and wait:

31

1110090180100

2

'max)()(

maxV

pppp

V

Page 266: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Instructor : Omid Sojoodi

Faculty of Electrical, Computer and IT Engineering Qazvin Azad University

Contact Info: o_sojoodi@{ieee.org, m.ieice.org}

Page 267: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Parametric: Assume a single model for p (x | Ci) Semiparametric: p (x | Ci) is a mixture of densities

Multiple possible explanations/prototypes: Different handwriting styles, accents in speech

Nonparametric: No model; data speaks for itself

2

Page 268: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

where Gi the components/groups/clusters, P ( Gi ) mixture proportions (priors), p ( x | Gi) component densities Gaussian mixture where p(x|Gi) ~ N ( μi , ∑i ) parameters

Φ = {P ( Gi ), μi , ∑i }ki=1

unlabeled sample X={xt}t (unsupervised learning)

3

k

iii GPGpp

1|xx

Page 269: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Supervised: X = { xt ,rt }t Classes Ci i=1,...,K

where p ( x | Ci) ~ N ( μi , ∑i )

Φ = {P (Ci ), μi , ∑i }Ki=1

4

K

iii Ppp

1CC|xx

tti

Ti

tt i

tti

i

tti

ttt

ii

tti

i

rr

rr

Nr

CP

mxmx

xm

S

ˆ

Unsupervised : X = { xt }t Clusters Gi i=1,...,k

where p ( x | Gi) ~ N ( μi , ∑i )

Φ = {P ( Gi ), μi , ∑i }ki=1

Labels, r ti ?

k

iii GPGpp

1|xx

Page 270: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Find k reference vectors (prototypes/codebook vectors/codewords) which best represent data Reference vectors, mj, j =1,...,k Use nearest (most similar) reference: Reconstruction error

5

jt

jit mxmx min

otherwisemini f

01

1

jt

jit

ti

t i itt

ikii

b

bE

mxmx

mxm X

Page 271: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

6

otherwise0minif1 j

t

jit

ti

mxmxb

Page 272: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

7

Page 273: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

8

Page 274: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Log likelihood with a mixture model Assume hidden variables z, which when known, make optimization much simplerComplete likelihood, Lc(Φ |X,Z), in terms of x and z Incomplete likelihood, L(Φ |X), in terms of x

9

t

k

iii

t

t

t

GPGp

p

1|log

|log|

x

xXL

Page 275: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Iterate the two steps 1. E-step: Estimate z given X and current Φ 2. M-step: Find new Φ’ given z, X, and old Φ.

An increase in Q increases incomplete

likelihood

10

ll

lC

l E

|maxarg:step-M

|||:step-E

Q

X,ZX,LQ1

XLXL || ll 1

Page 276: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

zti = 1 if xt belongs to Gi, 0 otherwise (labels r ti of

supervised learning); assume p(x|Gi)~N(μi,∑i) E-step: M-step:

11

ti

lti

j jl

jt

il

it

lti

hGP

GPGpGPGpzE

,

,,X

x

xx

|

||,

tti

Tli

tt

li

ttil

i

tti

ttt

ili

tti

i

hh

hh

Nh

P

111

1

mxmx

xm

S

GUse estimated labels in place of unknown labels

Page 277: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

12

5.0)|( 11 hxGP

Page 278: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Regularize clusters 1. Assume shared/diagonal covariance matrices 2. Use PCA/FA to decrease dimensionality: Mixtures of

PCA/FA

Can use EM to learn Vi (Ghahramani and Hinton, 1997;

Tipping and Bishop, 1999)

13

iTiiiit Gp ψmx VV,| N

Page 279: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Dimensionality reduction methods find correlations between features and group features Clustering methods find similarities between instances and group instances Allows knowledge extraction through number of clusters, prior probabilities, cluster parameters, i.e., center, range of features.

Example: CRM, customer segmentation

14

Page 280: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Estimated group labels hj (soft) or bj (hard) may be seen as the dimensions of a new k dimensional space, where we can then learn our discriminant or regressor. Local representation (only one bj is 1, all others are 0; only few hj are nonzero) vs

Distributed representation (After PCA; all zj are nonzero)

15

Page 281: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

In classification, the input comes from a mixture of classes (supervised). If each class is also a mixture, e.g., of Gaussians, (unsupervised), we have a mixture of mixtures:

16

K

iii

k

jijiji

Ppp

GPGppi

1

1

CC

C

|

||

xx

xx

Page 282: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Cluster based on similarities/distances Distance measure between instances xr and xs

Minkowski (Lp) (Euclidean for p = 2) City-block distance

17

pd

j

psj

rj

srm xxd

/,

1

1xx

d

jsj

rj

srcb xxd

1xx ,

Page 283: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Start with N groups each with one instance and merge two closest groups at each iteration Distance between two groups Gi and Gj:

Single-link:

Complete-link:

Average-link, centroid

18

srji dGGd

js

ir

xxxx

,min,, GG

srji dGGd

js

ir

xxxx

,max,, GG

Page 284: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

19

Dendrogram

Page 285: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Defined by the application, e.g., image quantization Plot data (after PCA) and check for clusters Incremental (leader-cluster) algorithm: Add one at a time until “elbow” (reconstruction error/log likelihood/intergroup distances) Manually check for meaning

20

Page 286: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Instructor : Omid Sojoodi

Faculty of Electrical, Computer and IT Engineering Qazvin Azad University

Contact Info: o_sojoodi@{ieee.org, m.ieice.org}

Page 287: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Questions: Assessment of the expected error of a learning algorithm: Is the error rate of 1-NN less than 2%? Comparing the expected errors of two algorithms: Is k-NN more accurate than MLP ?

Training/validation/test sets Resampling methods: K-fold cross-validation

2

Page 288: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Criteria (Application-dependent): Misclassification error, or risk (loss functions) Training time/space complexity Testing time/space complexity Interpretability Easy programmability

Cost-sensitive learning

3

Page 289: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

4

Page 290: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Response surface design for approximating and maximizing the response function in terms of the controllable factors 5

Page 291: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

A. Aim of the study B. Selection of the response variable C. Choice of factors and levels D. Choice of experimental design E. Performing the experiment F. Statistical Analysis of the Data G. Conclusions and Recommendations

6

Page 292: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

The need for multiple training/validation sets {Xi,Vi}i: Training/validation sets of fold i

K-fold cross-validation: Divide X into k, Xi,i=1,...,K Ti share K-2 parts

7

121

3122

3211

KKKK

K

K

XXXTXV

XXXTXV

XXXTXV

2

1

Page 293: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

5 times 2 fold cross-validation (Dietterich, 1998)

8

1510

2510

259

159

124

224

223

123

112

212

211

111

XVXT

XVXT

XVXT

XVXT

XVXT

XVXT

Page 294: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Draw instances from a dataset with replacement Prob that we do not pick an instance after N draws

that is, only 36.8% is new!

9

368011 1 .eN

N

Page 295: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Error rate = # of errors / # of instances = (FN+FP) / N Recall = # of found positives / # of positives

= TP / (TP+FN) = sensitivity = hit rate Precision = # of found positives / # of found

= TP / (TP+FP) Specificity = TN / (TN+FP) False alarm rate = FP / (FP+TN) = 1 - Specificity

10

Page 296: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

11

Page 297: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

12

Page 298: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

13

Page 299: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

X = { xt }t where xt ~ N ( μ, σ2) m ~ N ( μ, σ2/N)

14

1

950961961

950961961

22 Nzm

NzmP

Nm

NmP

mNP

mN

//

...

...

~ Z

100(1- α) percent confidence interval

Page 300: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

When σ2 is not known:

15

1

950641

950641

NzmP

NmP

mNP

..

..

1

1

1212

122

NStm

NStmP

tSmNNmxS

NN

Nt

t

,/,/

~ /

Page 301: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Reject a null hypothesis if not supported by the sample with enough confidence X = { xt }t where xt ~ N ( μ, σ2)

H0: μ = μ0 vs. H1: μ ≠ μ0 Accept H0 with level of significance α if μ0 is in the 100(1- α) confidence interval Two-sided test

16

220

// ,zzmN

Page 302: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

One-sided test: H0: μ ≤ μ0 vs. H1: μ > μ0 Accept if

Variance unknown: Use t, instead of z Accept H0: μ = μ0 if

17

zmN ,0

12120

NN ttS

mN,/,/ ,

Page 303: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Single training/validation set: Binomial Test If error prob is p0, prob that there are e errors or less in

N validation trials is

18

Accept if this prob is less than 1- α

jNjje

j

ppjN

eXP 001

1

1- α N=100, e=20

Page 304: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Number of errors X is approx N with mean Np0 and var Np0(1-p0)

19

Z~00

0

1 pNpNpX

Accept if this prob for X = e is less than z1-α

1- α

Page 305: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Multiple training/validation sets xt

i = 1 if instance t misclassified on fold i Error rate of fold i: With m and s2 average and var of pi , we accept p0 or less error if

is less than tα,K-1

20

10 ~ KtS

pmK

Nx

pN

tti

i1

Page 306: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Single training/validation set: McNemar’s Test Under H0, we expect e01= e10=(e01+ e10)/2

21

21

1001

21001 1

X~ee

ee

Accept if < X2α,1

Page 307: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Use K-fold cv to get K training/validation folds pi

1, pi2: Errors of classifiers 1 and 2 on fold i

pi = pi1 – pi

2 : Paired difference on fold i The null hypothesis is whether pi has mean 0

22

12121

12

21

00

01

00

KKK

K

i iK

i i

ttts

mKsmK

Kmp

sK

pm

HH

,/,/ ,~

::

in if Accept

vs.

Page 308: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Use 5×2 cv to get 2 folds of 5 tra/val replications (Dietterich, 1998) pi

(j) : difference btw errors of 1 and 2 on fold j=1, 2 of replication i=1,...,5

23

55

12

11

2221221

5

2

ts

p

ppppsppp

i i

iiiiiiii

~/

/

Two-sided test: Accept H0: μ0 = μ1 if in (-tα/2,5,tα/2,5) One-sided test: Accept H0: μ0 ≤ μ1 if < tα,5

Page 309: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Two-sided test: Accept H0: μ0 = μ1 if < Fα,10,5

24

5105

12

5

1

2

1

2

2,~ F

s

p

i i

i jj

i

Page 310: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Errors of L algorithms on K folds We construct two estimators to σ2 .

One is valid if H0 is true, the other is always valid. We reject H0 if the two estimators disagree.

25

LH 210 :

KiLjX jij ,..., ,,...,,,~ 112N

Page 311: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

26

212

0

2212

2

1

22

2

221

2

1

0

1

1

L

jjL

j

j

L

j

j

j jL

j j

K

i

ijj

SSbH

mmKSSbKmm

Lmm

K

SKL

mmS

L

mm

KKX

m

H

X

X

N

~

~/

ˆ

/,~

:

have wetrue, is whenSo

namely, , is of estimator an Thus

true is If

2

Page 312: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

27

11210

11

22

212

212

2

2

2

1

21

22

20

11

11

1

11

KLLL

KLL

KLKj

j ijij

j i

jijL

j

jK

i jijj

j

FH

FKLSSwLSSb

KLSSw

LSSb

SSwSK

mXSSw

KLmX

LS

KmX

S

S

H

,,

,

:

~/

////

~~

ˆ

if

: variances group of averagethe is to estimator second our of Regardless

2

2

XX

Page 313: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

28

If ANOVA rejects, we do pairwise posthoc tests

)(~

: vs :

1

10

2 KLw

ji

jiji

tmm

t

HH

Page 314: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Comparing two algorithms: Sign test: Count how many times A beats B over N

datasets, and check if this could have been by chance if A and B did have the same error rate

Comparing multiple algorithms

Kruskal-Wallis test: Calculate the average rank of all algorithms on N datasets, and check if these could have been by chance if they all had equal error

If KW rejects, we do pairwise posthoc tests to find which ones have significant rank difference

29

Page 315: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Instructor : Omid Sojoodi

Faculty of Electrical, Computer and IT Engineering Qazvin Azad University

Contact Info: o_sojoodi@{ieee.org, m.ieice.org}

Page 316: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Based loosely on simulated evolution. Hypotheses: described in bit strings (subject to interpretation in specific domains). Search: population of hypotheses, refined through mutation and crossover to increase fitness. Applications: optimization problems, learning the topology and parameters in neural networks, and many more.

2

Page 317: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Lamarck and others: Species “transmute” over time (inheritance of acquired trait)

Darwin and Wallace: Consistent, heritable variation among individuals in population Natural selection of the fittest

Mendel and genetics: A mechanism for inheriting traits Genotype phenotype mapping

3

Page 318: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Mutation and crossover of hypotheses in the current population. Basically a generate-and-test beam search. Motivating factors:

Evolution is known to be successful. GAs can search hypotheses containing complex interacting parts. Easily parallelizable

4

Page 319: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Population: set of current hypotheses Fitness: predefined measure of success Elements of GA:

fitness test selection reproduction (mutation, crossover)

5

Page 320: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

GA(Fitness, Fitness threshold, p, r, m) Initialize: P p random hypotheses Evaluate: for each h in P, compute Fitness(h) While [maxh Fitness(h)] < Fitness threshold

1. Select: Probabilistically select (1−r)p members of P to add to Ps.

2. Crossover: Probabilistically select pairs of hypotheses from P. For each pair, <h1, h2>, produce two offspring by applying the Crossover operator. Add all offspring to Ps. 3. Mutate: Invert a randomly selected bit in m. p random members of Ps. 4. Update: P Ps 5. Evaluate: for each h in P, compute Fitness(h)

Return the hypothesis from P that has the highest fitness.

6

Page 321: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Represent (Outlook = Overcast _ Rain) ^ (Wind = Strong) by Outlook Wind 011 10 Represent IF Wind = Strong THEN PlayTennis = yes by Outlook Wind PlayTennis 111 10 10

7

Page 322: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

8

Page 323: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Fitness proportionate selection:

Tournament selection: Pick h1, h2 at random with uniform prob. With probability p, select the more fit.

Rank selection: Sort all hypotheses by fitness Probability of selection is proportional to rank

9

Page 324: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Learn disjunctive set of propositional rules, competitive with C4.5 Fitness: Fitness(h) = (percent_correct(h))2

Representation: IF a1 = T ^ a2 = F THEN c = T; IF a2 = T THEN c = F represented by a1 a2 c a1 a2 c 10 01 1 11 10 0 Genetic operators:

want variable length rule sets (as number of attributes can change) want only well-formed bitstring hypotheses

10

Page 325: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Start with a1 a2 c a1 a2 c

h1 : 10 01 1 11 10 0 h2 : 01 11 0 10 01 0

1. choose crossover points for h1, e.g., after bits 1, 8 2. now restrict points in h2 to those that produce bitstrings with well-defined semantics, e.g., <1, 3>, <1, 8>, <6, 8>.

if we choose <1, 3>, result is a1 a2 c

h3 : 11 10 0 a1 a2 c a1 a2 c a1 a2 c

h4 : 00 01 1 11 11 0 10 01 0 11

Page 326: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Add new genetic operators, also applied probabilistically: 1. AddAlternative: generalize constraint on ai by changing a 0 to 1 2. DropCondition: generalize constraint on ai by changing every 0 to 1

And, add new field to bitstring to determine whether to allow these

a1 a2 c a1 a2 c AA DC 01 11 0 10 01 0 1 0

So now the learning strategy also evolves. (Allowing this increased accuracy.)

12

Page 327: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Performance of GABIL comparable to symbolic rule/tree learning methods C4.5, ID5R, AQ14 Average performance on a set of 12 synthetic problems:

GABIL without AA and DC operators: 92.1% accuracy GABIL with AA and DC operators: 95.2% accuracy symbolic learning methods ranged from 91.2 to 96.6

13

Page 328: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

How to characterize evolution of population in GA? Schema = string containing 0, 1, * (“don’t care”)

Typical schema: 10**0* Instances of above schema: 101101, 100000, ... An instance of length 4, say 0010, can have 24 matching

schemas.

Characterize population by number of instances representing each possible schema:

m(s, t) = number of instances of schema s in pop at time t Want to estimate m(s, t + 1) given m(s, t) and other factors.

14

Page 329: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

m(s, t) can change as t changes, due to the following factors:

Selection: if individuals representing s get selected more often, m(s, ·) will increase. Crossover Mutation

Schema theorem: gives E[m(s, t + 1)].

15

Page 330: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

= average fitness of pop. at time t m(s, t) = instances of schema s in pop at time t

= average fitness of instances of s at time t : instances of schema s in the population at time t

Probability of selecting h in one selection step

Mean fitness of instances of s at time t:

16

Page 331: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Probability of selecting an instance of s in one step

Expected number of instances of s after n selections

17

Page 332: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

m(s, t) = instances of schema s in pop at time t = average fitness of pop. at time t

= ave. fitness of instances of s at time t pc = probability of single point crossover operator pm = probability of mutation operator l = length of single bit strings o(s) = number of defined (non “*”) bits in s d(s) = distance between leftmost, rightmost defined bits in s

18

Page 333: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Population of programs represented by tree

19

Page 334: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

20

Page 335: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Goal: spell UNIVERSAL Terminals:

CS (“current stack”) = name of the top block on stack, or F. TB (“top correct block”) = name of topmost correct block on stack NN (“next necessary”) = name of the next block needed above TB in the stack

21

Page 336: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

(MS x): (“move to stack”), if block x is on the table, moves x to the top of the stack and returns the value T. Otherwise, does nothing and returns the value F.

(MT x): (“move to table”), if block x is somewhere in the stack, moves the block at the top of the stack to the table and returns the value T. Otherwise, returns F.

(EQ x y): (“equal”), returns T if x equals y, and returns F otherwise. (NOT x): returns T if x = F, else returns F (DU x y): (“do until”) executes the expression x repeatedly until

expression y returns the value T

22

Page 337: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Trained to fit 166 test problems Using population of 300 programs, found this after 10 generations: (EQ (DU (MT CS)(NOT CS)) (DU (MS NN)(NOT NN)) )

23

Page 338: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Lamarck (19th century) Believed individual genetic makeup was altered by lifetime experience But current evidence contradicts this view

What is the impact of individual learning on population evolution?

24

Page 339: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Assume Individual learning has no direct influence on individual DNA But ability to learn reduces need to “hard wire” traits in DNA

Then

Ability of individuals to learn will support more diverse gene pool – Because learning allows individuals with various “hard wired” traits to be successful

More diverse gene pool will support faster evolution of gene pool

individual learning (indirectly) increases rate of evolution

25

Page 340: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Plausible example: 1. New predator appears in environment 2. Individuals who can learn (to avoid it) will be selected 3. Increase in learning individuals will support more diverse gene pool 4. resulting in faster evolution 5. possibly resulting in new non-learned traits such as instinctive fear of predator

26

Page 341: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Evolve simple neural networks: Some network weights fixed during lifetime, others trainable Genetic makeup determines which are fixed, and their weight values

Results: With no individual learning, population failed to improve over time When individual learning allowed

Early generations: population contained many individuals with many trainable weights Later generations: higher fitness, while number of trainable weights decreased

27

Page 342: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Coevolution: escalating effect or complementary dependence (insects and flowering plants) between two or more species. Cultural transmission: memes vs. genes.

28

Page 343: Instructor : Omid Sojoodi Faculty of Electrical, Computer and ...bayanbox.ir/view/3610404832254313105/Binder1e2.pdfRobot driving, etc. Goal of ML: “define precisely a class of problems

Conduct randomized, parallel, hill-climbing search through H Approach learning as optimization problem (optimize fitness) Nice feature: evaluation of Fitness can be very indirect

consider learning rule set for multistep decision making no issue of assigning credit/blame to individual steps

29