Model-Based Active Exploration
Pranav Shyam, Wojciech Jaskowski, Faustino Gomez
arxiv.org/abs/1810.12162
Presentation by Danijar Hafner
Reinforcement Learning

[Diagram: a learning agent receives sensor input from and sends motor output to an unknown environment; an objective and an algorithm define how the agent learns.]
Intrinsic Motivation

[Diagram: the same loop of sensor input and motor output with an unknown environment, but the objective is intrinsic to the learning agent rather than provided by the environment.]
Many Intrinsic Objectives
Information gain e.g. Lindley 1956, Sun 2011, Houthooft 2017
Prediction error e.g. Schmidhuber 1991, Bellemare 2016, Pathak 2017
Empowerment e.g. Klyubin 2005, Tishby 2011, Gregor 2016
Skill discovery e.g. Eysenbach 2018, Sharma 2020, Co-Reyes 2018
Surprise minimization e.g. Schrödinger 1944, Friston 2013, Berseth 2020
Bayes-adaptive RL e.g. Gittins 1979, Duff 2002, Ross 2007
Information Gain

Without rewards, the agent can only learn about the environment.
A model W represents our knowledge. E.g.: input density, forward prediction.
Need to represent uncertainty about W to tell how much we have learned.

Data collection updates the belief over the model from the prior p(W) to the posterior p(W | X).

To gain the most information, we aim to maximize the mutual information between future sensory inputs X and model parameters W. Both W and X are random variables:

max_a I(X; W | A=a) = ?
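To make the update from p(W) to p(W | X) concrete, here is a minimal sketch assuming a toy Bernoulli environment with a conjugate Beta belief over W (an illustration only, not the paper's setup):

```python
import numpy as np

# Toy example: W is the unknown success probability of a Bernoulli
# "environment" and our knowledge p(W) is a Beta distribution.
alpha, beta = 1.0, 1.0  # prior p(W) = Beta(1, 1): uniform, we know nothing yet

# Data collection: observe binary sensory inputs X from the environment.
rng = np.random.default_rng(0)
true_w = 0.8
X = rng.random(10) < true_w  # ten Bernoulli(0.8) observations

# Conjugate update: p(W | X) = Beta(alpha + successes, beta + failures).
alpha_post = alpha + X.sum()
beta_post = beta + len(X) - X.sum()

print(f"prior mean {alpha / (alpha + beta):.2f}, "
      f"posterior mean {alpha_post / (alpha_post + beta_post):.2f}")
```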
I(X; W | A=a) = E_X KL[ p(W | X, A=a) || p(W | A=a) ]

Retrospective Infogain (e.g. VIME, ICM, RND)
Collect episodes, train the world model, record its improvement, and reward the controller by this improvement.
Infogain depends on the agent's knowledge, which keeps changing, making it a non-stationary objective.
The learned controller will lag behind and go to states that were previously novel but are not anymore.

Expected Infogain (e.g. MAX, PETS-ET, LD)
Need to search for actions that will lead to high information gain without additional environment interaction.
Learn a forward model of the environment to search for actions by planning or learning in imagination.
Computing the expected information gain requires computing entropies of a model with uncertainty estimates.
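As a sanity check on the identity above, the following sketch compares it numerically against the entropy decomposition used on the later slides, I(X; W | A=a) = H(X | A=a) − H(X | W, A=a), for an assumed toy problem with two candidate models and a binary observation:

```python
import numpy as np

prior = np.array([0.5, 0.5])        # p(W | A=a) over two candidate models
likelihood = np.array([[0.9, 0.1],  # p(X | W=w0, A=a)
                       [0.2, 0.8]]) # p(X | W=w1, A=a)
marginal = prior @ likelihood       # p(X | A=a)

def entropy(p):
    return -(p * np.log(p)).sum()

# Entropy form: total uncertainty minus expected aleatoric uncertainty.
info_entropy = entropy(marginal) - sum(
    pw * entropy(px) for pw, px in zip(prior, likelihood))

# Expected-KL form: posterior-vs-prior divergence, averaged over outcomes X.
posterior = likelihood * prior[:, None] / marginal  # p(W | X, A=a) per column
info_kl = sum(
    marginal[x] * (posterior[:, x] * np.log(posterior[:, x] / prior)).sum()
    for x in range(len(marginal)))

print(np.isclose(info_entropy, info_kl))  # True: both forms agree
```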
Retrospective Novelty

Episode 1: everything unknown → random behavior → high novelty → reinforce behavior
Episode 2: repeat behavior → reach similar states → not surprising anymore → unlearn behavior
Episode 3: repeat behavior → still not novel → unlearn behavior
Episode 4: back to random behavior

The agent builds a map of where it was already and avoids those states.
Expected Novelty

Episode 1: everything unknown → consider options → execute plan → observe new data
Episode 2: consider options → execute plan → observe new data
Ensemble of Dynamics Models

Learn dynamics both to represent knowledge and to plan for expected infogain.
Capture uncertainty as an ensemble of non-linear Gaussian predictors.

I(X; W | A=a) = H(X | A=a) − H(X | W, A=a)
(epistemic uncertainty = total uncertainty − aleatoric uncertainty)

Information gain targets uncertain trajectories with low expected noise:
Wide predictions mean high expected noise; overlapping modes mean less total uncertainty.
Narrow predictions mean low expected noise; distant modes mean large total uncertainty.
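A rough numeric illustration of these two regimes, assuming a one-dimensional ensemble of two equally weighted Gaussians and a Monte Carlo estimate of the mixture entropy (a sketch, not the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

def infogain(means, stds, n=100_000):
    # Epistemic part of I(X; W | A=a): entropy of the equally weighted
    # Gaussian mixture minus the average entropy of its members.
    idx = rng.integers(len(means), size=n)
    x = rng.normal(means[idx], stds[idx])  # samples from the mixture
    density = np.mean(
        np.exp(-0.5 * ((x[:, None] - means) / stds) ** 2)
        / (stds * np.sqrt(2 * np.pi)), axis=1)
    total = -np.mean(np.log(density))  # H(X | A=a), Monte Carlo estimate
    aleatoric = np.mean(0.5 * np.log(2 * np.pi * np.e * stds ** 2))
    return total - aleatoric

# Wide, overlapping members: much noise, little disagreement, low infogain.
print(infogain(np.array([0.0, 0.2]), np.array([1.0, 1.0])))  # close to 0
# Narrow, distant members: little noise, strong disagreement, high infogain.
print(infogain(np.array([0.0, 3.0]), np.array([0.1, 0.1])))  # close to log 2
```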
Expected Infogain Approximation

I(X; W | A=a) = H(X | A=a) − H(X | W, A=a)
(epistemic uncertainty = total uncertainty − aleatoric uncertainty)

Ensemble members: p(X | W=w_k, A=a)
Aggregate prediction: p(X | A=a) = (1/K) Σ_k p(X | W=w_k, A=a)
Aleatoric uncertainty: (1/K) Σ_k H(p(X | W=w_k, A=a))
Total uncertainty: H((1/K) Σ_k p(X | W=w_k, A=a))

Gaussian entropy has a closed form, so we can compute the aleatoric uncertainty. GMM entropy does not: sample it, or switch to Rényi entropy, which has a closed form.
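Both closed forms are easy to write down for a one-dimensional ensemble with equal weights: the exact Gaussian entropy gives the aleatoric term, and the quadratic Rényi entropy of the mixture stands in for the intractable GMM Shannon entropy. A sketch under these assumptions (not the paper's code):

```python
import numpy as np

def aleatoric(stds):
    # Closed-form Gaussian entropy, averaged over members:
    # (1/K) Σ_k H(p(X | W=w_k, A=a)), with H(N(m, s^2)) = 0.5 log(2 pi e s^2).
    return np.mean(0.5 * np.log(2 * np.pi * np.e * stds ** 2))

def renyi2_total(means, stds):
    # Quadratic Renyi entropy of the mixture, H_2 = -log ∫ p(x)^2 dx.
    # For a Gaussian mixture the integral has a closed form via pairwise
    # overlaps: ∫ N(x; mi, si^2) N(x; mj, sj^2) dx = N(mi; mj, si^2 + sj^2).
    var = stds[:, None] ** 2 + stds[None, :] ** 2
    overlap = (np.exp(-0.5 * (means[:, None] - means[None, :]) ** 2 / var)
               / np.sqrt(2 * np.pi * var))
    return -np.log(overlap.sum() / len(means) ** 2)

means, stds = np.array([0.0, 3.0]), np.array([0.1, 0.1])
# Renyi-2 lower-bounds Shannon entropy, so this is a conservative
# stand-in for the total-uncertainty term H(X | A=a).
print(renyi2_total(means, stds) - aleatoric(stds))
```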
Compared Algorithms

Learning from imagined trajectories (Expected):
MAX: JSD infogain
TVAX: State variance

Learning from experience replay (Retrospective):
JDRX: JSD infogain
PERX: Prediction error
Exploration Chain Domain

[Figure: chain environment with a small reward of +0.001 at one end and a large reward of +1 at the other.]
State coverage of Ant Maze
Zero-Shot Adaptation

Learn an evaluation policy inside of the learned model, given a known reward function.

[Plots: performance compared against model-free with 10x data; one task needs no exploration, the other needs exploration.]
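A minimal random-shooting sketch of this idea: search for actions inside the learned model against the known reward function, with no further environment interaction. The names model.predict and reward_fn are hypothetical placeholders, not the paper's API:

```python
import numpy as np

def plan(model, reward_fn, state, horizon=20, candidates=512, act_dim=2,
         rng=None):
    """Pick an action by imagined rollouts in the learned model (MPC-style)."""
    rng = rng or np.random.default_rng()
    actions = rng.uniform(-1.0, 1.0, size=(candidates, horizon, act_dim))
    returns = np.zeros(candidates)
    states = np.repeat(state[None], candidates, axis=0)
    for t in range(horizon):
        # Roll all candidate action sequences forward through the model;
        # the known reward function scores the imagined states.
        states = model.predict(states, actions[:, t])
        returns += reward_fn(states, actions[:, t])
    # Execute only the first action of the best imagined sequence.
    return actions[np.argmax(returns), 0]
```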
Conclusions

Information gain is a principled task-agnostic objective
As a non-stationary objective, it should be optimized in expectation
This requires a dynamics model for planning to explore
Ensemble of Gaussian dynamics is a practical way to represent uncertainty