F. Kaplan, P. Oudeyer, E. Kubinyi and A. Miklosi Mart van de Sanden

Post on 20-Jan-2016

Page 1:

F. Kaplan, P. Oudeyer, E. Kubinyi and A. Miklosi

Mart van de Sanden

Page 2:

AIBO as a Digital Creature

Animal-like entertainment robot.

A companion.

How to teach it to do new things?

Train it like a real pet?

Through interaction!

Page 3:

How Does It Work for Real Pets?

How to teach a dolphin to jump?

Show it to him? Explain it to him?

It needs to discover it on its own!

But what if the action is rare or complex?

We need to guide it!

The same goes for robots!

Page 4:

How Not To Do It

Chanting the command while pushing the dog into a sit.

The dog's attention is split between learning a new move and listening to the trainer.

Which part of the behavior is “sit”?

Often the command is given while the dog is still standing.

Page 5:

Then How?

First teach the behavior. Then add the command!

Page 6:

Modelling (or molding)

Physically manipulating the animal into the desired position.

Then give positive feedback.

Never used by professional trainers.

The dog is not actively involved.

Learning performance is poor.

Used for teaching industrial robots!

Not convenient for autonomous robots.

Not good for teaching complex movements.

Page 7:

Luring (or “magnet method”)

Same as modelling, but with the use of a lure.

Gives satisfactory results for real dogs.

Can only teach positions or simple movements.

Not really used with robots.

Page 8:

Capturing

Exploits behavior that the animal performs spontaneously.

Wait for the correct behavior and give a positive reinforcement.

Takes too much time when multiple commands need to be learned.

Page 9:

The use of imitation?

Animal anatomy mostly does not resemble ours.

Only higher animals (e.g. primates) are able to imitate.

Has been done with robotics.

It can handle the learning of sequences of actions and rare behaviors.

Requires elaborate vision techniques.

Page 10:

Shaping

Breaks behaviors down into small steps.

Each step can be trained using any of the techniques mentioned above.

Clicker training!

Page 11:

Clicker Training

B.F. Skinner: Operant conditioning.

A clicker emits a brief, sharp sound.

The sound is associated with a primary reinforcer (food, toys, etc.).

It becomes a secondary reinforcer.

It will act as a positive cue.

Page 12:

Clicker Training

The clicker can be used to guide animals in the right direction, by sounding it only when the animal performs the desired behavior.

Page 13:

Clicker Training

Four steps:

1. Charging the clicker.

2. Getting the behavior.

3. Adding the command word.

4. Testing the behavior.

It can be used to learn rare behaviors.

It can be used to learn sequences of behaviors.

Page 14:

Discussion!

Would you want to train your robot this way, or would you rather program it with a computer?

Or build in another way of training? Clicker training does not exactly come naturally.

Page 15:

Page 16:

Robotic Clicker Training

Robot: hierarchical, schema-based behavior model.

Behavior selection according to:

Opportunities in the environment

Natural instincts

Emotion of the robot

User expectation model (associative memory)

Page 17:

Charging the Clicker

Primary reinforcer -> event within 5 seconds.

After 30 times it becomes a secondary reinforcer.

TRAINER scratches the robot’s head and says “Good”.

ROBOT learns the association in the user’s expectation module.

TRAINER scratches the robot’s head and says “Good”.

ROBOT learns the association in the user’s expectation module.

Etc.
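The charging rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 5-second window and the count of 30 come from the slide, while the class name `ClickerCharger`, its methods, and the choice to accept either ordering of cue and reinforcer inside the window are hypothetical and not taken from the original system.

```python
from dataclasses import dataclass

PAIRING_WINDOW_S = 5.0  # from the slide: reinforcer and event within 5 seconds
PAIRINGS_NEEDED = 30    # from the slide: after 30 times the cue is "charged"


@dataclass
class ClickerCharger:
    """Hypothetical counter for cue/primary-reinforcer pairings."""

    pairings: int = 0

    def observe(self, cue_time: float, reinforcer_time: float) -> bool:
        """Record one cue and one primary reinforcer (timestamps in seconds).

        Assumption: the pairing counts if the two are at most 5 s apart,
        regardless of which one came first.
        """
        if abs(reinforcer_time - cue_time) <= PAIRING_WINDOW_S:
            self.pairings += 1
        return self.is_secondary_reinforcer

    @property
    def is_secondary_reinforcer(self) -> bool:
        """True once the cue has been paired often enough."""
        return self.pairings >= PAIRINGS_NEEDED
```

In this sketch, a call such as `charger.observe(cue_time=12.0, reinforcer_time=13.5)` would count as one pairing, while a gap of more than 5 seconds would be ignored.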

Page 18:

Guiding the Robot

The robot starts out just doing what it wants to do.

When the trainer says “good”, the training module reinforces the current top-level schema.

This means that the robot performs the underlying behaviors more often.
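One way to picture "reinforcing the current top-level schema" is as a weight increase in a weighted random choice. The following is a minimal sketch, not the robot's actual selection mechanism: the class `SchemaSelector`, the multiplicative `boost` factor, and the schema names used in the note below are all assumptions made for illustration.

```python
import random


class SchemaSelector:
    """Hypothetical weighted chooser over top-level schemas."""

    def __init__(self, schemas, boost=1.5):
        # Every schema starts equally likely (an assumption).
        self.weights = {name: 1.0 for name in schemas}
        self.boost = boost  # assumed multiplicative reinforcement factor

    def choose(self, rng=random):
        """Pick a schema with probability proportional to its weight."""
        names = list(self.weights)
        return rng.choices(names, weights=[self.weights[n] for n in names], k=1)[0]

    def reinforce(self, current):
        """Called when the trainer says "good" while `current` is running."""
        self.weights[current] *= self.boost

    def probability(self, name):
        """Current chance that `name` is selected."""
        return self.weights[name] / sum(self.weights.values())
```

With three schemas such as `"sit"`, `"walk"` and `"bark"`, each reinforcement of `"sit"` raises `probability("sit")` and lowers the others, which matches the slide's point that reinforced behaviors occur more often.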

Page 19:

Adding the Command Word

When a word is heard, the expectation module associates it with all the reinforced actions in the training session.

It creates a new schema for them.

A new schema has a confidence level.

After reaching a certain level it becomes permanent.
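The word-to-schema association can be sketched as a small lookup table with a confidence counter. The slides do not say how confidence grows or what the threshold is, so everything specific here is an assumption: the class `ExpectationModule`, the rule that hearing the same word with the same reinforced actions adds one to the confidence, and the `permanent_at` threshold are hypothetical.

```python
class ExpectationModule:
    """Hypothetical store linking command words to reinforced action sequences."""

    def __init__(self, permanent_at=3):
        self.permanent_at = permanent_at  # assumed confidence threshold
        self.schemas = {}  # word -> {"actions": tuple, "confidence": int}

    def hear_word(self, word, reinforced_actions):
        """Associate `word` with the actions reinforced in this session."""
        actions = tuple(reinforced_actions)
        entry = self.schemas.get(word)
        if entry is not None and entry["actions"] == actions:
            # Same association seen again: confidence grows (an assumption).
            entry["confidence"] += 1
        elif entry is None or not self.is_permanent(word):
            # New word, or a tentative schema that gets overwritten.
            self.schemas[word] = {"actions": actions, "confidence": 1}
        # A permanent schema is left unchanged by conflicting sessions.

    def is_permanent(self, word):
        """True once the schema's confidence has reached the threshold."""
        entry = self.schemas.get(word)
        return entry is not None and entry["confidence"] >= self.permanent_at
```

Under these assumptions, three sessions that end with the same word and the same reinforced actions would make the schema permanent; a later session with different actions would no longer replace it.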