More AI Topics: Reinforcement Learning, Semi-supervised Learning, and Active Learning

Lecturer: Ji Liu

Some slides for active learning are from Yi Zhang

Outline

● Reinforcement Learning
● Semi-supervised Learning
● Active Learning

Robotics

Reinforcement Learning (RL)

Your actions influence the state of the world, which determines the reward

Everybody is doing reinforcement learning in the real world

RL: Learning from rewards

● pole-balancing
● walking robot (applet)
● TD-Gammon
● helicopter

Complications

Uncertainties: random outcomes of your actions, random rewards, delayed rewards, environments that may change.

What is RL?

● Learning from the interaction with the environment

● Goal-oriented learning

● Learning the optimal strategy: how to map a state to an action in order to maximize the average return in the long run

A Simplified RL Example

[Figure: grid world with a reward in each block (-0.01, +1, or -1)]

actions: UP, DOWN, LEFT, RIGHT

UP: 80% move UP, 10% move LEFT, 10% move RIGHT

• Reward +1 at [4,3], -1 at [4,2]
• Reward -0.01 on other blocks (states)
• Goal: maximize Expectation[R(s0) + r*R(s1) + r^2*R(s2) + … + r^n*R(sn) + ...], where r is the discount factor
• What is the optimal strategy to achieve the maximum reward?


A Simplified RL Example: Optimal Strategy

[Figure: grid world showing the optimal strategy]

Why?

• Reward +1 at [4,3], -1 at [4,2]
• Reward -0.01 on other blocks (states)
• Goal: maximize Expectation[R(s0) + r*R(s1) + r^2*R(s2) + … + r^n*R(sn) + ...], where r is the discount factor
• What is the optimal strategy to achieve the maximum reward?
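One way to compute an optimal strategy for this grid world is value iteration, a standard dynamic-programming method that these slides do not name. The sketch below is a minimal illustration under stated assumptions: the grid has 4 columns and 3 rows with a blocked cell at [2,2], the +1 and -1 blocks are terminal, bumping into the blocked cell or the grid edge leaves the agent in place, the 80%/10%/10% noise shown for UP applies to every action with the two perpendicular directions as slips, and the discount factor r is 0.99. Only the rewards and the noise for UP come from the slides; all names (WALL, DISCOUNT, value_iteration, and so on) are hypothetical.

```python
# Value iteration on the grid world (a sketch under the assumptions listed above).
ACTIONS = {"UP": (0, 1), "DOWN": (0, -1), "LEFT": (-1, 0), "RIGHT": (1, 0)}
# Assumption: each action slips to one of its two perpendicular directions, 10% each.
SLIPS = {"UP": ("LEFT", "RIGHT"), "DOWN": ("LEFT", "RIGHT"),
         "LEFT": ("UP", "DOWN"), "RIGHT": ("UP", "DOWN")}

WALL = {(2, 2)}                          # assumption: one blocked cell
TERMINAL = {(4, 3): 1.0, (4, 2): -1.0}   # rewards from the slide, treated as terminal
STEP_REWARD = -0.01                      # reward on all other blocks (from the slide)
DISCOUNT = 0.99                          # assumption: the slide does not give r

STATES = [(x, y) for x in range(1, 5) for y in range(1, 4) if (x, y) not in WALL]


def reward(state):
    return TERMINAL.get(state, STEP_REWARD)


def move(state, action):
    """Deterministic move; stay in place when the target is blocked or off-grid."""
    dx, dy = ACTIONS[action]
    target = (state[0] + dx, state[1] + dy)
    return target if target in STATES else state


def transitions(state, action):
    """List of (probability, next_state) pairs for the noisy action."""
    slip_a, slip_b = SLIPS[action]
    return [(0.8, move(state, action)),
            (0.1, move(state, slip_a)),
            (0.1, move(state, slip_b))]


def q_value(state, action, values):
    return sum(p * values[nxt] for p, nxt in transitions(state, action))


def value_iteration(tol=1e-9):
    values = {s: 0.0 for s in STATES}
    while True:
        updated = {}
        for s in STATES:
            if s in TERMINAL:
                updated[s] = reward(s)
            else:
                best = max(q_value(s, a, values) for a in ACTIONS)
                updated[s] = reward(s) + DISCOUNT * best
        if max(abs(updated[s] - values[s]) for s in STATES) < tol:
            return updated
        values = updated


def greedy_policy(values):
    """Pick, in every non-terminal state, the action with the best expected value."""
    return {s: max(ACTIONS, key=lambda a: q_value(s, a, values))
            for s in STATES if s not in TERMINAL}


V = value_iteration()
policy = greedy_policy(V)
for y in (3, 2, 1):  # print the top row first; '.' marks terminal or blocked cells
    print(" ".join(f"{policy.get((x, y), '.'):>5}" for x in range(1, 5)))
```

The printed grid gives one action per block. Changing STEP_REWARD or DISCOUNT may change how strongly the strategy avoids the blocks next to the -1 state.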

Video: RL really works

pole-balancing: https://www.youtube.com/watch?v=Lt-KLtkDlh8

RL in real life

● Video you may also want to watch: https://www.facebook.com/willyfoo/videos/10152801946794245/

Scenarios: Short of Labels

● Time-consuming, e.g., document classification
● Expensive, e.g., medical decisions (need doctors)
● Sometimes dangerous, e.g., landmine detection

Supervised Learning

Motivation for Semi-supervised Learning (SSL)

● Semi-Supervised = Supervised + Unsupervised

● Are unlabeled data helpful for obtaining a robust model?

Why are unlabeled data helpful?

A simple SSL algorithm: label propagation

[Figure: points labeled + and -]

Twitter data analysis: e.g., age prediction, gender prediction
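The slides do not spell out the algorithm, so the following is only a minimal sketch of one common form of label propagation. It assumes the data are given as a graph (a dict mapping each node to its neighbors), that labels are +1 and -1, and that each unlabeled node repeatedly takes the average score of its neighbors while labeled nodes stay fixed. The function name, the graph, and the node names are hypothetical.

```python
def propagate_labels(neighbors, seed_labels, n_iters=100):
    """Spread +1/-1 labels from labeled nodes to unlabeled ones.

    neighbors:   dict mapping each node to a list of neighboring nodes
    seed_labels: dict mapping the few labeled nodes to +1 or -1
    """
    # Unlabeled nodes start with a neutral score of 0.
    scores = {node: float(seed_labels.get(node, 0.0)) for node in neighbors}
    for _ in range(n_iters):
        updated = {}
        for node, nbrs in neighbors.items():
            if node in seed_labels:
                updated[node] = float(seed_labels[node])  # labeled nodes stay fixed
            elif nbrs:
                updated[node] = sum(scores[n] for n in nbrs) / len(nbrs)
            else:
                updated[node] = 0.0                       # isolated node: no evidence
        scores = updated
    # The sign of the final score is the predicted label (a score of 0 maps to +1).
    return {node: (1 if s >= 0 else -1) for node, s in scores.items()}


# Hypothetical chain graph a - b - c - d - e - f with only the two ends labeled.
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"],
         "d": ["c", "e"], "e": ["d", "f"], "f": ["e"]}
labels = propagate_labels(graph, {"a": 1, "f": -1})
print(labels)
```

In this toy chain, the nodes nearer to the + end receive +1 and those nearer to the - end receive -1, which is the intuition for why unlabeled data can help: they reveal which points are close to which labeled examples.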

Motivation of AL

● Also addresses the scenario where labeled data are scarce

● But uses a different strategy from SSL

SSL: directly use unlabeled data

AL: iteratively select a sample and query its label

Procedure of AL

● Repeat
– Use all labeled data to train the model (e.g., a classifier)
– Select an unlabeled sample and query its label (it becomes a labeled sample in the next iteration)
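The procedure above can be sketched as a short loop. This is only an illustration under assumptions that are not in the slides: samples are integers on a line, the model is a 1-D threshold classifier placed midway between the largest negative and the smallest positive labeled sample, the oracle function stands in for a human annotator, and the query rule picks the unlabeled sample closest to the current threshold. All names (train_threshold, active_learning, oracle) are hypothetical.

```python
def train_threshold(labeled):
    """Fit a 1-D threshold: midpoint between the closest opposite-label samples.

    Requires at least one positive and one negative labeled sample.
    """
    largest_negative = max(x for x, y in labeled if y == -1)
    smallest_positive = min(x for x, y in labeled if y == 1)
    return (largest_negative + smallest_positive) / 2


def active_learning(pool, oracle, seed, n_queries):
    """Repeat: train on all labeled data, then query one unlabeled sample."""
    labeled = [(x, oracle(x)) for x in seed]
    unlabeled = [x for x in pool if x not in seed]
    for _ in range(n_queries):
        if not unlabeled:
            break
        threshold = train_threshold(labeled)                       # train the model
        query = min(unlabeled, key=lambda x: abs(x - threshold))   # select a sample
        unlabeled.remove(query)
        labeled.append((query, oracle(query)))                     # query its label
    return train_threshold(labeled), labeled


def oracle(x):
    """Hypothetical annotator: the true boundary is unknown to the learner."""
    return 1 if x >= 62 else -1


pool = list(range(101))
final_threshold, labeled = active_learning(pool, oracle, seed=[0, 100], n_queries=7)
print(final_threshold, len(labeled))
```

With this toy setup the loop behaves like a binary search: a handful of queries is enough for the learned threshold to classify the whole pool correctly, whereas randomly chosen queries would typically need many more labels.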

Why not randomly select samples to query?

● Samples have different importance

How to decide the importance?

● The uncertainty based on the current model
● Select the sample with the most uncertainty
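The slides do not say how uncertainty is measured. One common choice, used here purely as an assumption, is the entropy of the class probabilities predicted by the current model: the flatter the probabilities, the more uncertain the model. The sample names and probabilities below are made up for illustration, and the function names are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy of a probability vector (higher means more uncertain)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def most_uncertain(predictions):
    """Return the sample whose predicted class probabilities have the highest entropy.

    predictions: dict mapping a sample id to its list of class probabilities
    """
    return max(predictions, key=lambda sample: entropy(predictions[sample]))


# Hypothetical outputs of the current model on three unlabeled samples.
predictions = {
    "doc_a": [0.95, 0.05],   # model is confident
    "doc_b": [0.55, 0.45],   # model is nearly undecided
    "doc_c": [0.80, 0.20],
}
query = most_uncertain(predictions)
print(query)
```

Here the sample with probabilities closest to 50/50 is selected for labeling, since a label for it is expected to tell the model the most.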

Another perspective: reduce version space

How many labels are enough?

Label Complexity
