More AI Topics: Reinforcement Learning, Semi-supervised Learning, and Active Learning
Lecturer: Ji Liu
Some slides for active learning are from Yi Zhang
Reinforcement Learning (RL)
Your action influences the state of the world, which determines your reward
Everybody is doing reinforcement learning in the real world
Complications
Uncertainties: random outcomes of your actions, random reward, delayed reward, environments may change.
What is RL?
● Learning from the interaction with the environment
● Goal-oriented learning
● Learning the optimal strategy – how to map a state to an action in order to maximize the average return in the long run
A Simplified RL Example
[Figure: a grid world with columns 1-4 and rows 1-3; each block is marked with its reward: +1, -1, or -0.01]
actions: UP, DOWN, LEFT, RIGHT
UP: 80% move UP, 10% move LEFT, 10% move RIGHT
• reward +1 at [4,3], -1 at [4,2]
• reward -0.01 on other blocks (states)
• Goal: max Expectation[R(s0) + r*R(s1) + r^2*R(s2) + … + r^n*R(sn) + …], where r is the discount factor
• What's the optimal strategy to achieve max reward?
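To make the goal concrete, here is a minimal sketch that evaluates the discounted sum R(s0) + r*R(s1) + r^2*R(s2) + … for one fixed sequence of rewards. The function name, the sample reward sequence, and the discount value 0.9 are assumptions chosen for illustration; the slides do not give a value for r.

```python
def discounted_return(rewards, discount):
    """Sum rewards weighted by discount**t, where t is the step index."""
    total = 0.0
    for t, reward in enumerate(rewards):
        total += (discount ** t) * reward
    return total


# Hypothetical trajectory: three ordinary blocks (-0.01 each), then the +1 block.
sample_rewards = [-0.01, -0.01, -0.01, 1.0]
sample_discount = 0.9  # assumed value of r, not taken from the slides

sample_return = discounted_return(sample_rewards, sample_discount)
print(round(sample_return, 4))
```

The goal in RL is the expectation of this quantity over the random outcomes of the actions, so a longer path to +1 is worth less both because of the extra -0.01 steps and because the +1 is discounted more.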
A Simplified RL Example: Optimal Strategy
[Figure: the same grid world, marked with the optimal action for each block]
Why?
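One way to work out an optimal strategy for this grid world is value iteration, a standard method that the slides do not name. The sketch below rests on several assumptions that are not stated in the slide text: a discount of 0.99, a blocked cell at [2,2] (as in the classic textbook version of this example), the +1 and -1 blocks ending the episode, the 10% sideways slips applying to every action in the same way as for UP, and a move into a border or the blocked cell leaving the agent where it is.

```python
# Value iteration on the 4 x 3 grid world; states are (column, row) pairs.
GAMMA = 0.99                              # assumed discount (the "r" in the goal)
WALL = {(2, 2)}                           # assumed blocked cell
TERMINAL = {(4, 3): 1.0, (4, 2): -1.0}    # assumed to end the episode
STEP_REWARD = -0.01

MOVES = {"UP": (0, 1), "DOWN": (0, -1), "LEFT": (-1, 0), "RIGHT": (1, 0)}
# Sideways slips (10% each) for every intended action, generalizing the UP case.
SLIPS = {"UP": ("LEFT", "RIGHT"), "DOWN": ("LEFT", "RIGHT"),
         "LEFT": ("UP", "DOWN"), "RIGHT": ("UP", "DOWN")}

STATES = [(c, r) for c in range(1, 5) for r in range(1, 4) if (c, r) not in WALL]


def reward(state):
    return TERMINAL.get(state, STEP_REWARD)


def step(state, direction):
    """Move one cell; stay put when hitting the border or the blocked cell."""
    dc, dr = MOVES[direction]
    nxt = (state[0] + dc, state[1] + dr)
    return nxt if nxt in STATES else state


def transitions(state, action):
    """Return (probability, next_state) pairs for an intended action."""
    side_a, side_b = SLIPS[action]
    return [(0.8, step(state, action)),
            (0.1, step(state, side_a)),
            (0.1, step(state, side_b))]


def q_value(values, state, action):
    """Expected value of the next state when intending `action`."""
    return sum(p * values[s2] for p, s2 in transitions(state, action))


def value_iteration(tol=1e-9):
    values = {s: 0.0 for s in STATES}
    while True:
        new_values = {}
        for s in STATES:
            if s in TERMINAL:
                new_values[s] = reward(s)
            else:
                best = max(q_value(values, s, a) for a in MOVES)
                new_values[s] = reward(s) + GAMMA * best
        delta = max(abs(new_values[s] - values[s]) for s in STATES)
        values = new_values
        if delta < tol:
            return values


def greedy_policy(values):
    """Pick, in each non-terminal state, the action with the best expected value."""
    return {s: max(MOVES, key=lambda a: q_value(values, s, a))
            for s in STATES if s not in TERMINAL}


values = value_iteration()
policy = greedy_policy(values)
for row in (3, 2, 1):  # print the top row first, like the figure
    print(" ".join(f"{policy.get((c, row), '.'):>5}" for c in range(1, 5)))
```

The printout marks terminal and blocked cells with a dot. Changing STEP_REWARD or GAMMA and rerunning is a quick way to see how the strategy trades off reaching +1 quickly against the risk of slipping into -1.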
RL in real life
● Video you may want to watch: https://www.facebook.com/willyfoo/videos/10152801946794245/
Scenarios: Short of Labels
● Time consuming, e.g., document classification
● Expensive, e.g., medical decision (need doctors)
● Sometimes dangerous, e.g., landmine detection
Motivation for Semi-supervised Learning (SSL)
● Semi-Supervised = Supervised + Unsupervised
● Are unlabeled data helpful for obtaining a robust model?
A simple SSL algorithm: label propagation
[Figure: data points, some labeled + and some labeled -]
Twitter data analysis: e.g., age prediction, gender prediction
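The slides name label propagation without spelling out the steps. The sketch below shows one simple variant under stated assumptions: the data are nodes in a graph, a few seed nodes carry a known label (+1 or -1), and each unlabeled node repeatedly takes the average score of its neighbors while the seeds stay fixed. The graph, the node numbering, and the function name are hypothetical.

```python
def propagate_labels(neighbors, seed_labels, iterations=200):
    """Spread seed scores (+1 / -1) over a graph by repeated neighbor averaging.

    neighbors: dict mapping each node to a list of adjacent nodes.
    seed_labels: dict mapping the labeled nodes to +1 or -1.
    """
    scores = {node: float(seed_labels.get(node, 0.0)) for node in neighbors}
    for _ in range(iterations):
        updated = {}
        for node, adjacent in neighbors.items():
            if node in seed_labels:
                # Labeled nodes are clamped to their known label.
                updated[node] = float(seed_labels[node])
            else:
                updated[node] = sum(scores[n] for n in adjacent) / len(adjacent)
        scores = updated
    return scores


# Hypothetical chain of six users: 0 - 1 - 2 - 3 - 4 - 5.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
seeds = {0: +1, 5: -1}  # only the two end nodes are labeled

scores = propagate_labels(chain, seeds)
# A score of exactly 0 is assigned "-" here, an arbitrary tie-breaking choice.
predicted = {node: "+" if score > 0 else "-" for node, score in scores.items()}
print(predicted)
```

Nodes closer to the + seed end up with positive scores and nodes closer to the - seed with negative ones, which is how the unlabeled data contribute to the final labeling.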
Motivation for Active Learning (AL)
● Also addresses the scenario where labeled data are scarce
● But uses a different strategy from SSL
Procedure of AL
● Repeat
– Use all labeled data to train the model (e.g., a classifier)
– Select an unlabeled sample and query its label (it becomes labeled data in the next iteration)
How to decide the importance?
● Use the uncertainty based on the current model
● Select the sample with the most uncertainty
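The loop above can be sketched on a toy problem. Everything specific here is an assumption made for illustration: the data are numbers in [0, 1], the model is a single threshold placed halfway between the two classes, uncertainty is measured as closeness to that threshold, and the `oracle` function stands in for a human annotator who answers label queries.

```python
def fit_threshold(labeled):
    """Train a 1-D classifier: the boundary sits halfway between the classes.

    Assumes `labeled` already contains at least one example of each class.
    """
    highest_negative = max(x for x, y in labeled if y == 0)
    lowest_positive = min(x for x, y in labeled if y == 1)
    return (highest_negative + lowest_positive) / 2.0


def most_uncertain(pool, boundary):
    """The sample closest to the boundary is the one the model is least sure of."""
    return min(pool, key=lambda x: abs(x - boundary))


def active_learning(pool, labeled, oracle, budget):
    pool = list(pool)
    labeled = list(labeled)
    for _ in range(budget):
        if not pool:
            break
        boundary = fit_threshold(labeled)           # train on all labeled data
        query = most_uncertain(pool, boundary)      # pick the most uncertain sample
        pool.remove(query)
        labeled.append((query, oracle(query)))      # ask for its label
    return fit_threshold(labeled), labeled


def oracle(x):
    """Hypothetical annotator: the true (hidden) boundary is at 0.62."""
    return 1 if x >= 0.62 else 0


unlabeled_pool = [i / 100 for i in range(1, 100)]
seed_examples = [(0.0, 0), (1.0, 1)]  # one labeled example per class to start

boundary, labeled = active_learning(unlabeled_pool, seed_examples, oracle, budget=8)
print(boundary, len(labeled))
```

With only eight queries out of 99 unlabeled samples, the learned boundary lands close to the hidden one, because each query is spent where the current model is least certain rather than on samples it already classifies confidently.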