deep reinforcement learning

Deep RL

Abecon, 20.05.2016

RL-type Problems

• games: chess, Go, Space Invaders

• balancing a unicycle

• investing in stock market

• running a business

• making fast food

• life…!

Markov Decision Process

• S - set of states

• A - set of actions (or the actions available in each state)

• P(s, s’ | a) - probability of moving from state s to state s’ under action a

• R(s, s’ | a) - reward for that transition

• γ ∈ [0, 1] - discount factor
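The five ingredients above can be written out as plain data. The sketch below is only an illustration: the two states, the two actions, the probabilities and the reward rule are all made up for this example, and `isValidMdp` and `step` are hypothetical helper names, not part of any library.

```javascript
// A hypothetical two-state MDP (all names and numbers are illustrative).
const mdp = {
  states: ['idle', 'working'],          // S
  actions: ['rest', 'work'],            // A
  gamma: 0.9,                           // discount factor
  // P[s][a] maps each next state s' to its transition probability.
  P: {
    idle:    { rest: { idle: 1.0 }, work: { working: 0.8, idle: 0.2 } },
    working: { rest: { idle: 1.0 }, work: { working: 1.0 } },
  },
  // R(s, s' | a): reward received for the transition.
  R: function (s, a, sNext) { return sNext === 'working' ? 1 : 0; },
};

// Check that the outgoing probabilities sum to 1 for every (state, action).
function isValidMdp(m) {
  return m.states.every(s => m.actions.every(a => {
    const total = Object.values(m.P[s][a]).reduce((x, y) => x + y, 0);
    return Math.abs(total - 1) < 1e-9;
  }));
}

// Sample one transition; rand can be replaced for reproducible runs.
function step(m, s, a, rand = Math.random) {
  let u = rand();
  const entries = Object.entries(m.P[s][a]);
  for (const [sNext, p] of entries) {
    if (u < p) return { sNext: sNext, reward: m.R(s, a, sNext) };
    u -= p;
  }
  // Fallback for floating-point round-off: use the last listed state.
  const last = entries[entries.length - 1][0];
  return { sNext: last, reward: m.R(s, a, last) };
}
```

Passing a fixed `rand` (for example `() => 0.5`) makes a single step deterministic, which is handy when checking the transition table by hand.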

The GOAL

Maximize the total discounted reward.

γ: discount factor, ranging from 0 (instant gratification) to 1 (patience)
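To make the two extremes concrete: for a finite sequence of rewards r_0, r_1, ..., the discounted total is the sum of γ^t · r_t. The helper below is a minimal sketch of that sum; `discountedReturn` is a hypothetical name and the reward sequence in the usage lines is invented.

```javascript
// Sum of gamma^t * rewards[t], accumulated from the last reward backwards.
function discountedReturn(rewards, gamma) {
  let total = 0;
  for (let t = rewards.length - 1; t >= 0; t--) {
    total = rewards[t] + gamma * total;
  }
  return total;
}

// With gamma = 0 only the first reward counts (instant gratification);
// with gamma = 1 every reward counts equally (patience).
const greedy = discountedReturn([1, 1, 1], 0);   // 1
const patient = discountedReturn([1, 1, 1], 1);  // 3
```

Values of γ between 0 and 1 trade off the two: later rewards still count, but less than earlier ones.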

Value Functions

http://cs.stanford.edu/people/karpathy/reinforcejs/gridworld_dp.html

http://cs.stanford.edu/people/karpathy/reinforcejs/gridworld_td.html

Q-Learning

http://cs.stanford.edu/people/karpathy/reinforcejs/gridworld_td.html
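The slide itself only links to the demo, so as a reminder, here is a minimal sketch of the standard tabular Q-learning update, Q(s, a) ← Q(s, a) + α (r + γ max_a’ Q(s’, a’) − Q(s, a)). The function names `makeQTable` and `qUpdate` are hypothetical, and states and actions are assumed to be small integer indices.

```javascript
// Q-table of zeros: one row per state, one column per action.
function makeQTable(numStates, numActions) {
  return Array.from({ length: numStates }, () => new Array(numActions).fill(0));
}

// One Q-learning update for the observed transition (s, a, r, sNext).
function qUpdate(Q, s, a, r, sNext, alpha, gamma) {
  const bestNext = Math.max(...Q[sNext]);        // max over a' of Q(s', a')
  const target = r + gamma * bestNext;           // bootstrapped target
  Q[s][a] += alpha * (target - Q[s][a]);         // move Q(s, a) toward it
  return Q[s][a];
}
```

Calling `qUpdate` once per observed transition is the whole learning rule; how actions are chosen (for example epsilon-greedy) is a separate decision not shown here.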

Reinforce.js

http://cs.stanford.edu/people/karpathy/reinforcejs/

// create an environment object
var env = {};
env.getNumStates = function() { return 8; };
env.getMaxNumActions = function() { return 4; };

// create the DQN agent
var spec = { alpha: 0.01 };
var agent = new RL.DQNAgent(env, spec);

// start the learning loop
setInterval(function() {
  var action = agent.act(s); // s is an array of length 8 (the current state)
  // ... execute the action in the environment and observe the reward
  agent.learn(reward);       // reward is a number supplied by the environment
}, 0);

2013: Deep RL

http://arxiv.org/abs/1312.5602

2014: Google buys DeepMind

2015: AlphaGo

Deep Q-Learning

1. Do a feedforward pass for the current state s to get predicted Q-values for all actions.

2. Do a feedforward pass for the next state s’ and calculate the maximum over all network outputs, max_a’ Q(s’, a’).

3. Set the Q-value target for the action taken to r + γ max_a’ Q(s’, a’) (use the max calculated in step 2). For all other actions, set the Q-value target to the value originally returned in step 1, making the error 0 for those outputs.

4. Update the weights using backpropagation.

http://www.nervanasys.com/demystifying-deep-reinforcement-learning/
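Step 3 of the procedure above is the part that is easiest to get wrong, so here is a minimal sketch of how the target vector could be assembled. It assumes the two feedforward passes have already produced plain arrays of Q-values (`qCurrent` for s, `qNext` for s’); `buildTarget` is a hypothetical helper, and the network and the backpropagation step are deliberately left out.

```javascript
// Build the training target for one transition (s, action, reward, s').
// qCurrent: network outputs for s (step 1); qNext: outputs for s' (step 2).
function buildTarget(qCurrent, qNext, action, reward, gamma) {
  const target = qCurrent.slice();               // other actions: error stays 0
  const maxNext = Math.max(...qNext);            // max over a' of Q(s', a')
  target[action] = reward + gamma * maxNext;     // step 3
  return target;
}

// Made-up numbers: three actions, action 1 was taken, reward 1, gamma 0.5.
const qCurrent = [1, 2, 3];
const qNext = [0.5, 4, 1];
const target = buildTarget(qCurrent, qNext, 1, 1, 0.5);
// Per-output error that would be backpropagated in step 4.
const error = target.map((t, i) => t - qCurrent[i]);
```

Only the entry for the action actually taken differs from the network's own prediction, so the gradient flows through that one output.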

https://www.youtube.com/watch?v=32y3_iyHpBc

http://gabrielecirulli.github.io/2048/

Asynchronous Gradient Descent

http://arxiv.org/abs/1602.01783

http://www.rethinkrobotics.com/baxter/
