
1

Some rules

No make-up exams! If you miss an exam with an official excuse, you get the average of your scores in the other exams – at most once.

WP only if you get at least 40% in the exams taken before you withdraw.

Grades (roughly): D: 45-52, D+: 53-60, C: 61-65, C+: 66-70, B: 71-75, B+: 76-80, A: 81-85, A+: > 85

Attendance: more than 9 absences -> DN.

You get a bonus of up to 2 marks (to push up a grade) if you have fewer than 4 absences and are well-behaved.

2

Some rules

Never believe in anything but "I can!" It always leads to "I will", "I did", and "I'm glad!"

3

Ch 1: Introduction to ML - Outline

What is machine learning?
Why machine learning?
Well-defined learning problems
Example: checkers
Questions that should be asked about ML

4

What is Machine Learning?

Definition: A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.

Machine learning is the study of how to make computers learn; the goal is to make computers improve their performance through experience.

5

Successful Applications of ML

Learning to recognize spoken words: SPHINX (Lee, 1989)

Learning to drive an autonomous vehicle: ALVINN (Pomerleau, 1989)

Learning to classify celestial objects (Fayyad et al., 1995)

Learning to play world-class backgammon: TD-GAMMON (Tesauro, 1992)

Designing the morphology and control structure of electro-mechanical artefacts: GOLEM (Lipson & Pollack, 2000)

6

Why Machine Learning?

Some tasks cannot be defined well, except by examples (e.g., recognizing people).

Relationships and correlations can be hidden within large amounts of data. Machine Learning/Data Mining may be able to find these relationships.

The amount of knowledge available about certain tasks might be too large for explicit encoding by humans (e.g., medical diagnosis).

7

Why Machine Learning?

Environments change over time. New knowledge about tasks is constantly being discovered by humans. It may be difficult to continuously re-design systems “by hand”.

The time is right:
progress in algorithms & theory
a growing flood of online data
available computational power
a budding industry

8

Why ML – 3 niches

Data mining: medical records --> medical knowledge

Self-customizing programs: learning newsreader

Applications we can’t program: autonomous driving, speech recognition

9

Multidisciplinary Field

(Diagram: Machine Learning at the intersection of Probability & Statistics, Computational Complexity Theory, Information Theory, Philosophy, Neurobiology, and Artificial Intelligence.)

10

Learning Problem

Improving with experience at some task:
task T
performance measure P
experience E

Example: handwriting recognition
T: recognize & classify handwritten words in images
P: % of words correctly classified
E: a database of words with given classifications

11

More examples...

Checkers
T: play checkers
P: % of games won
E: play against self

Robot driving
T: drive on a public highway using vision sensors
P: average distance traveled before making an error
E: sequence of images and steering commands (observed from a human driver)

12

Designing a Learning System:

Problem description – e.g., checkers:
What experience? What exactly should be learned? How to represent it? What algorithm to learn it?

1. Choosing the Training Experience
2. Choosing the Target Function
3. Choosing a Representation for the Target Function
4. Choosing a Function Approximation Algorithm
5. Final Design

13

Type of Training Experience

Direct or indirect?
Direct: board state -> correct move
Indirect: outcome of a complete game
move sequences & final outcomes
credit assignment problem
thus, more difficult to learn from

What degree of control over the examples? Next slide.

Is the training experience representative of the performance goal? Next slide.

14

Training experience - control

Degree of control over examples:
rely on the teacher (who selects informative board states & correct moves)
ask the teacher (the learner proposes difficult board states and asks for the move)
complete control (play games against itself & check the outcome)
variations: experiment with new states, or play minor variations of a promising sequence

15

Training experience - training data

How well does it represent the problem? Is the training experience representative of the task the system will actually have to solve? It is best if it is, but such a situation cannot systematically be achieved!

Distribution of examples: same as future test examples? Most theory makes this assumption.

Checkers: training by playing against itself, performance evaluated playing against the world champion.

16

Choosing the Target function

Determine the type of knowledge to be learned and how it will be used.

Checkers: legal and best moves
legal moves are easy, best moves are hard
a large class of tasks is like this

17

Target function

A program choosing the best move:
ChooseMove: Board -> Move

“Improve P in T” reduces to finding a function; this choice is a key design decision. ChooseMove is difficult to learn given (only) indirect examples.

Alternative: assign a numerical score
V: Board -> R
Assign a numerical score to each board; select the best move by evaluating all successor states of legal moves, as in the sketch below.
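A minimal sketch of this selection step in Python (legal_moves, apply_move, and v_hat are hypothetical stand-ins for a board representation and a learned scoring function, not names from the slides):

def choose_move(board, legal_moves, apply_move, v_hat):
    # Evaluate every successor state reachable by a legal move and
    # return the move whose resulting board scores highest under v_hat.
    return max(legal_moves(board), key=lambda m: v_hat(apply_move(board, m)))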

18

Definitions for V

Final board states:
V(b) = 100 if winning, -100 if losing, and 0 if a draw

Intermediate states?
V(b) = V(b'), where b' is the best final state reachable from b when playing an optimal game.
Correct, but not effectively computable.
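The final-state case is direct to write down; a trivial sketch, assuming hypothetical win/loss predicates on boards:

def v_final(board, is_win, is_loss):
    # Value of a finished game: +100 for a win, -100 for a loss, 0 for a draw.
    if is_win(board):
        return 100
    if is_loss(board):
        return -100
    return 0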

19

The real target function

Operational: V can be used & computed.
Goal: an operational description of the ideal target function.
The ideal target function can often not be learned and must be approximated.

Notation:
^V: the function actually learned
V: the ideal target function

20

Choosing a representation for V

Many possibilities: a collection of rules, a neural network, an arithmetic function on board features, etc.

Usual tradeoff: the more expressive the representation, the more training examples are necessary to choose among the large number of “representable” possibilities.

21

Simple representation

Linear function of board features:
x1: number of black pieces, x2: number of red pieces
x3: number of black kings, x4: number of red kings
x5: number of black pieces threatened by red, x6: number of red pieces threatened by black

^V(b) = w0 + w1·x1 + … + w6·x6
wi: weights to be learned
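A sketch of this evaluation in Python; how the six features are extracted from a concrete board representation is assumed and left abstract:

def v_hat(weights, x):
    # weights = [w0, w1, ..., w6]; x = [x1, ..., x6], the board features.
    # Returns w0 + w1*x1 + ... + w6*x6.
    return weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))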

22

Note

T, P & E are part of the specification; V and ^V are design choices. The effect of these choices here is to reduce the learning problem to finding the numbers w0, …, w6.

23

Approximation Algorithm

Obtaining training examples:
Vt(b): training value
examples: <b, Vt(b)>

The procedure that follows:
derives <b, Vt(b)> from indirect experience
uses a weight-adjusting procedure to fit ^V to the examples

24

Estimating Vt(b)

That a game was won or lost does not mean each state in it was good or bad: early play may be good, a late disaster -> loss.

Simple approach: Vt(b) = ^V(b'), where b' is the next state at which the player is allowed to move.
Surprisingly successful.
Intuition: ^V is more accurate at states close to the game's end.
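A sketch of how a critic might derive the <b, Vt(b)> pairs this way (in Python, reusing the v_hat sketched earlier; trace is assumed to be the list of boards at which it was the program's turn to move, and features maps a board to (x1, …, x6)):

def training_examples(trace, weights, features, final_score):
    # For each non-final board b, Vt(b) = ^V(successor of b); the final
    # board gets the true outcome (100 win / -100 loss / 0 draw).
    examples = [(b, v_hat(weights, features(trace[i + 1])))
                for i, b in enumerate(trace[:-1])]
    examples.append((trace[-1], final_score))
    return examples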

25

Adjusting weights

What is the best fit to the training data? One common approach: minimize the squared error E

E = Σ (Vt(b) - ^V(b))^2, summed over the training examples <b, Vt(b)>

Several algorithms are known. Properties we want:
incremental – refines the weights as examples arrive
robust to errors in Vt(b)

26

LMS update rule

“Least mean squares”

REPEAT:
select a random training example b
compute error(b) = Vt(b) - ^V(b)
for each board feature fi, update the weight:
wi <- wi + η · fi · error(b)

η: learning rate constant, approx. 0.1
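One LMS step as a Python sketch (reusing the earlier v_hat; the features function is assumed, and the bias weight w0 is treated as having a constant feature of 1 – an assumption, since the slide lists only f1, …, f6):

import random

def lms_step(weights, examples, features, lr=0.1):
    # Pick a random training example <b, Vt(b)>, compute the error,
    # and adjust each weight in proportion to its feature value.
    board, v_train = random.choice(examples)
    x = features(board)                     # (x1, ..., x6)
    error = v_train - v_hat(weights, x)     # Vt(b) - ^V(b)
    weights[0] += lr * error                # bias w0 (constant feature 1)
    for i, xi in enumerate(x, start=1):
        weights[i] += lr * xi * error       # wi <- wi + η·fi·error(b)
    return weights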

27

Notes about LMS

Actually performs a stochastic gradient descent search in the weight space to minimize E – see Ch. 4.

Why it works:
no error: no adjustment
positive/negative error: weight increased/decreased
if a feature fi does not occur (fi = 0), no adjustment to its weight wi is made

28

Final Design

Four distinct modules:
1. performance system: produces a game trace for the given board state (using ^V)
2. critic: produces training examples <b, Vt(b)> from the trace
3. generalizer: produces ^V from the training data
4. experiment generator: generates new problems (initial board states) for the performance system

(Diagram: Experiment Generator -> Performance System -> Critic -> Generalizer, and back to the Experiment Generator.)
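Purely as an illustration of how the four modules cooperate in one training cycle (every name below is a stand-in for the corresponding module, not an API from the slides):

def training_iteration(weights, experiment_generator, performance_system,
                       critic, generalizer):
    # One cycle of the final design.
    problem = experiment_generator()               # new initial board state
    trace = performance_system(problem, weights)   # game played using ^V
    examples = critic(trace)                       # <b, Vt(b)> pairs
    return generalizer(weights, examples)          # improved weights for ^V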

29

Sequence of Design Choices

Determine Type of Training Experience:
games against experts / games against self / table of correct moves

Determine Target Function:
Board -> Move / Board -> Value

Determine Representation of the Learned Function:
polynomial / linear function of six features / artificial neural network

Determine Learning Algorithm:
gradient descent / linear programming

30

Useful perspective of ML

Search in a space of hypotheses – usually a large space: all weight tuples (w0, …, w6) for checkers!
Find the one best fitting the examples and prior knowledge.

Different spaces arise depending on the target function and its representation.

The space view gives a basis for formal analysis – size of the space, number of examples, confidence in the hypothesis, …

31

Issues in ML

Algorithms
What generalization methods exist?
When (if ever) will they converge?
Which are best for which types of problems and representations?

Amount of training data
How much is sufficient? Confidence as a function of the data and the size of the space.

Reducing problems
learning task --> function approximation

32

Issues in ML

Prior knowledge
When & how can it help?
Is it helpful even when only approximate?

Choosing experiments
Are there good strategies? How do the choices affect complexity?

Flexible representations
Can the learner automatically modify its representation for better performance?