![Page 1: GPU-BASED A3C FOR DEEP REINFORCEMENT LEARNING · M. Babaeizadeh†,‡ , I.Frosio‡ , S.Tyree ‡ , J. Clemons , J.Kautz‡ †University of Illinois at Urbana-Champaign, USA ‡](https://reader034.vdocuments.site/reader034/viewer/2022042923/5f6f599dcffb822e781c2de4/html5/thumbnails/1.jpg)
M. Babaeizadeh†,‡ , I.Frosio‡ , S.Tyree‡ , J. Clemons‡ , J.Kautz‡
†University of Illinois at Urbana-Champaign, USA ‡ NVIDIA, USA
GPU-BASED A3C FOR DEEP REINFORCEMENT LEARNING
An ICLR 2017 paper
A github project
![Page 2: GPU-BASED A3C FOR DEEP REINFORCEMENT LEARNING · M. Babaeizadeh†,‡ , I.Frosio‡ , S.Tyree ‡ , J. Clemons , J.Kautz‡ †University of Illinois at Urbana-Champaign, USA ‡](https://reader034.vdocuments.site/reader034/viewer/2022042923/5f6f599dcffb822e781c2de4/html5/thumbnails/2.jpg)
M. Babaeizadeh†,‡, I.Frosio‡, S.Tyree‡, J. Clemons‡, J.Kautz‡
†University of Illinois at Urbana-Champaign, USA ‡ NVIDIA, USA
GPU-BASED A3C FOR DEEP REINFORCEMENT LEARNING
![Page 3: GPU-BASED A3C FOR DEEP REINFORCEMENT LEARNING · M. Babaeizadeh†,‡ , I.Frosio‡ , S.Tyree ‡ , J. Clemons , J.Kautz‡ †University of Illinois at Urbana-Champaign, USA ‡](https://reader034.vdocuments.site/reader034/viewer/2022042923/5f6f599dcffb822e781c2de4/html5/thumbnails/3.jpg)
M. Babaeizadeh†,‡, I.Frosio‡, S.Tyree‡, J. Clemons‡, J.Kautz‡
†University of Illinois at Urbana-Champaign, USA ‡ NVIDIA, USA
GPU-BASED A3C FOR DEEP REINFORCEMENT LEARNING
![Page 4: GPU-BASED A3C FOR DEEP REINFORCEMENT LEARNING · M. Babaeizadeh†,‡ , I.Frosio‡ , S.Tyree ‡ , J. Clemons , J.Kautz‡ †University of Illinois at Urbana-Champaign, USA ‡](https://reader034.vdocuments.site/reader034/viewer/2022042923/5f6f599dcffb822e781c2de4/html5/thumbnails/4.jpg)
4
GPU-BASED A3C FORDEEP REINFORCEMENT LEARNING
![Page 5: GPU-BASED A3C FOR DEEP REINFORCEMENT LEARNING · M. Babaeizadeh†,‡ , I.Frosio‡ , S.Tyree ‡ , J. Clemons , J.Kautz‡ †University of Illinois at Urbana-Champaign, USA ‡](https://reader034.vdocuments.site/reader034/viewer/2022042923/5f6f599dcffb822e781c2de4/html5/thumbnails/5.jpg)
5
GPU-BASED A3C FOR DEEP REINFORCEMENT LEARNING
Learning to accomplish a task
Image from www.33rdsquare.com
![Page 6: GPU-BASED A3C FOR DEEP REINFORCEMENT LEARNING · M. Babaeizadeh†,‡ , I.Frosio‡ , S.Tyree ‡ , J. Clemons , J.Kautz‡ †University of Illinois at Urbana-Champaign, USA ‡](https://reader034.vdocuments.site/reader034/viewer/2022042923/5f6f599dcffb822e781c2de4/html5/thumbnails/6.jpg)
6
GPU-BASED A3C FOR DEEP REINFORCEMENT LEARNING
Definitions
St
at = p(St)
~Rt
StRL agent
✓ Environment
✓ Agent
✓ Observable status St
✓ Reward Rt
✓ Action at
✓ Policy at = p(St)
, Rt
![Page 7: GPU-BASED A3C FOR DEEP REINFORCEMENT LEARNING · M. Babaeizadeh†,‡ , I.Frosio‡ , S.Tyree ‡ , J. Clemons , J.Kautz‡ †University of Illinois at Urbana-Champaign, USA ‡](https://reader034.vdocuments.site/reader034/viewer/2022042923/5f6f599dcffb822e781c2de4/html5/thumbnails/7.jpg)
7
GPU-BASED A3C FOR DEEP REINFORCEMENT LEARNING
Definitions
St, Rt
at = p(St)
~Rt
StDeep RL agent
![Page 8: GPU-BASED A3C FOR DEEP REINFORCEMENT LEARNING · M. Babaeizadeh†,‡ , I.Frosio‡ , S.Tyree ‡ , J. Clemons , J.Kautz‡ †University of Illinois at Urbana-Champaign, USA ‡](https://reader034.vdocuments.site/reader034/viewer/2022042923/5f6f599dcffb822e781c2de4/html5/thumbnails/8.jpg)
8
GPU-BASED A3C FOR DEEP REINFORCEMENT LEARNING
Definitions
St, Rt
at = p(St)
Dp(∙)
R0 R1 R2 R3 R4
~Rt
St
![Page 9: GPU-BASED A3C FOR DEEP REINFORCEMENT LEARNING · M. Babaeizadeh†,‡ , I.Frosio‡ , S.Tyree ‡ , J. Clemons , J.Kautz‡ †University of Illinois at Urbana-Champaign, USA ‡](https://reader034.vdocuments.site/reader034/viewer/2022042923/5f6f599dcffb822e781c2de4/html5/thumbnails/9.jpg)
9
GPU-BASED A3C FORDEEP REINFORCEMENT LEARNING
![Page 10: GPU-BASED A3C FOR DEEP REINFORCEMENT LEARNING · M. Babaeizadeh†,‡ , I.Frosio‡ , S.Tyree ‡ , J. Clemons , J.Kautz‡ †University of Illinois at Urbana-Champaign, USA ‡](https://reader034.vdocuments.site/reader034/viewer/2022042923/5f6f599dcffb822e781c2de4/html5/thumbnails/10.jpg)
10
GPU-BASED A3C FOR DEEP REINFORCEMENT LEARNING
Asynchronous Advantage Actor-Critic (Mnih et al., arXiv:1602.01783v2, 2015)
Dp(∙)
p’(∙)Master
model
St, Rt
R0 R1 R2 R3 R4at = p(St)
St, Rt
R0 R1 R2 R3 R4at = p(St)
St, Rt
R0 R1 R2 R3 R4at = p(St)
…
Agent
1A
gent
2A
gent
16
![Page 11: GPU-BASED A3C FOR DEEP REINFORCEMENT LEARNING · M. Babaeizadeh†,‡ , I.Frosio‡ , S.Tyree ‡ , J. Clemons , J.Kautz‡ †University of Illinois at Urbana-Champaign, USA ‡](https://reader034.vdocuments.site/reader034/viewer/2022042923/5f6f599dcffb822e781c2de4/html5/thumbnails/11.jpg)
11
GPU-BASED A3C FORDEEP REINFORCEMENT LEARNING
![Page 12: GPU-BASED A3C FOR DEEP REINFORCEMENT LEARNING · M. Babaeizadeh†,‡ , I.Frosio‡ , S.Tyree ‡ , J. Clemons , J.Kautz‡ †University of Illinois at Urbana-Champaign, USA ‡](https://reader034.vdocuments.site/reader034/viewer/2022042923/5f6f599dcffb822e781c2de4/html5/thumbnails/12.jpg)
12
GPU-BASED A3C FOR DEEP REINFORCEMENT LEARNING
The GPU
![Page 13: GPU-BASED A3C FOR DEEP REINFORCEMENT LEARNING · M. Babaeizadeh†,‡ , I.Frosio‡ , S.Tyree ‡ , J. Clemons , J.Kautz‡ †University of Illinois at Urbana-Champaign, USA ‡](https://reader034.vdocuments.site/reader034/viewer/2022042923/5f6f599dcffb822e781c2de4/html5/thumbnails/13.jpg)
13
GPU-BASED A3C FOR DEEP REINFORCEMENT LEARNING
LOW OCCUPANCY
LOW OCCUPANCY (33%)
![Page 14: GPU-BASED A3C FOR DEEP REINFORCEMENT LEARNING · M. Babaeizadeh†,‡ , I.Frosio‡ , S.Tyree ‡ , J. Clemons , J.Kautz‡ †University of Illinois at Urbana-Champaign, USA ‡](https://reader034.vdocuments.site/reader034/viewer/2022042923/5f6f599dcffb822e781c2de4/html5/thumbnails/14.jpg)
14
GPU-BASED A3C FOR DEEP REINFORCEMENT LEARNING
HIGH OCCUPANCY (100%)
Batc
h s
ize
![Page 15: GPU-BASED A3C FOR DEEP REINFORCEMENT LEARNING · M. Babaeizadeh†,‡ , I.Frosio‡ , S.Tyree ‡ , J. Clemons , J.Kautz‡ †University of Illinois at Urbana-Champaign, USA ‡](https://reader034.vdocuments.site/reader034/viewer/2022042923/5f6f599dcffb822e781c2de4/html5/thumbnails/15.jpg)
15
GPU-BASED A3C FOR DEEP REINFORCEMENT LEARNING
HIGH OCCUPANCY (100%), LOW UTILIZATION (40%)
time time
![Page 16: GPU-BASED A3C FOR DEEP REINFORCEMENT LEARNING · M. Babaeizadeh†,‡ , I.Frosio‡ , S.Tyree ‡ , J. Clemons , J.Kautz‡ †University of Illinois at Urbana-Champaign, USA ‡](https://reader034.vdocuments.site/reader034/viewer/2022042923/5f6f599dcffb822e781c2de4/html5/thumbnails/16.jpg)
16
GPU-BASED A3C FOR DEEP REINFORCEMENT LEARNING
HIGH OCCUPANCY (100%), HIGH UTILIZATION (100%)
timetime
![Page 17: GPU-BASED A3C FOR DEEP REINFORCEMENT LEARNING · M. Babaeizadeh†,‡ , I.Frosio‡ , S.Tyree ‡ , J. Clemons , J.Kautz‡ †University of Illinois at Urbana-Champaign, USA ‡](https://reader034.vdocuments.site/reader034/viewer/2022042923/5f6f599dcffb822e781c2de4/html5/thumbnails/17.jpg)
17
GPU-BASED A3C FOR DEEP REINFORCEMENT LEARNING
BANDWIDTH LIMITED
timetime
![Page 18: GPU-BASED A3C FOR DEEP REINFORCEMENT LEARNING · M. Babaeizadeh†,‡ , I.Frosio‡ , S.Tyree ‡ , J. Clemons , J.Kautz‡ †University of Illinois at Urbana-Champaign, USA ‡](https://reader034.vdocuments.site/reader034/viewer/2022042923/5f6f599dcffb822e781c2de4/html5/thumbnails/18.jpg)
18
+
Dp(∙)
p’(∙)Master
model
St, Rt
R0 R1 R2 R3 R4at = p(St)
St, Rt
R0 R1 R2 R3 R4at = p(St)
St, Rt
R0 R1 R2 R3 R4at = p(St)
…
Agent
1A
gent
2A
gent
16
![Page 19: GPU-BASED A3C FOR DEEP REINFORCEMENT LEARNING · M. Babaeizadeh†,‡ , I.Frosio‡ , S.Tyree ‡ , J. Clemons , J.Kautz‡ †University of Illinois at Urbana-Champaign, USA ‡](https://reader034.vdocuments.site/reader034/viewer/2022042923/5f6f599dcffb822e781c2de4/html5/thumbnails/19.jpg)
19
REGRESSION, CLASSIFICATION, …
MAPPING DEEP PROBLEMS TO A GPU
REINFORCEMENT LEARNING
Pear, pear, pear, pear, …
Empty, empty, …
Fig, fig, fig, fig, fig, fig,
Strawberry, Strawberry,
Strawberry, …
…
data
labels
100% utilization / occupancy
status,
reward
action
?
![Page 20: GPU-BASED A3C FOR DEEP REINFORCEMENT LEARNING · M. Babaeizadeh†,‡ , I.Frosio‡ , S.Tyree ‡ , J. Clemons , J.Kautz‡ †University of Illinois at Urbana-Champaign, USA ‡](https://reader034.vdocuments.site/reader034/viewer/2022042923/5f6f599dcffb822e781c2de4/html5/thumbnails/20.jpg)
20
A3C
Master
model
St, Rt
R0 R1 R2 R3 R4at = p(St)
St, Rt
R0 R1 R2 R3 R4at = p(St)
St, Rt
R0 R1 R2 R3 R4at = p(St)
…
Agent
1A
gent
2A
gent
16
![Page 21: GPU-BASED A3C FOR DEEP REINFORCEMENT LEARNING · M. Babaeizadeh†,‡ , I.Frosio‡ , S.Tyree ‡ , J. Clemons , J.Kautz‡ †University of Illinois at Urbana-Champaign, USA ‡](https://reader034.vdocuments.site/reader034/viewer/2022042923/5f6f599dcffb822e781c2de4/html5/thumbnails/21.jpg)
21
A3C
Master
model
St, Rt
R0 R1 R2 R3 R4at = p(St)
St, Rt
R0 R1 R2 R3 R4at = p(St)
St, Rt
R0 R1 R2 R3 R4at = p(St)
…
Agent
1A
gent
2A
gent
16
![Page 22: GPU-BASED A3C FOR DEEP REINFORCEMENT LEARNING · M. Babaeizadeh†,‡ , I.Frosio‡ , S.Tyree ‡ , J. Clemons , J.Kautz‡ †University of Illinois at Urbana-Champaign, USA ‡](https://reader034.vdocuments.site/reader034/viewer/2022042923/5f6f599dcffb822e781c2de4/html5/thumbnails/22.jpg)
22
A3C
Master
model
St, Rt
R0 R1 R2 R3 R4at = p(St)
St, Rt
R0 R1 R2 R3 R4at = p(St)
St, Rt
R0 R1 R2 R3 R4at = p(St)
…
Agent
1A
gent
2A
gent
16
![Page 23: GPU-BASED A3C FOR DEEP REINFORCEMENT LEARNING · M. Babaeizadeh†,‡ , I.Frosio‡ , S.Tyree ‡ , J. Clemons , J.Kautz‡ †University of Illinois at Urbana-Champaign, USA ‡](https://reader034.vdocuments.site/reader034/viewer/2022042923/5f6f599dcffb822e781c2de4/html5/thumbnails/23.jpg)
23
A3C
Master
model
St, Rt
R0 R1 R2 R3 R4at = p(St)
St, Rt
R0 R1 R2 R3 R4at = p(St)
St, Rt
R0 R1 R2 R3 R4at = p(St)
…
Agent
1A
gent
2A
gent
16
![Page 24: GPU-BASED A3C FOR DEEP REINFORCEMENT LEARNING · M. Babaeizadeh†,‡ , I.Frosio‡ , S.Tyree ‡ , J. Clemons , J.Kautz‡ †University of Illinois at Urbana-Champaign, USA ‡](https://reader034.vdocuments.site/reader034/viewer/2022042923/5f6f599dcffb822e781c2de4/html5/thumbnails/24.jpg)
24
A3C
Master
model
St, Rt
R0 R1 R2 R3 R4at = p(St)
St, Rt
R0 R1 R2 R3 R4at = p(St)
St, Rt
R0 R1 R2 R3 R4at = p(St)
…
Agent
1A
gent
2A
gent
16
Dp(∙)
![Page 25: GPU-BASED A3C FOR DEEP REINFORCEMENT LEARNING · M. Babaeizadeh†,‡ , I.Frosio‡ , S.Tyree ‡ , J. Clemons , J.Kautz‡ †University of Illinois at Urbana-Champaign, USA ‡](https://reader034.vdocuments.site/reader034/viewer/2022042923/5f6f599dcffb822e781c2de4/html5/thumbnails/25.jpg)
25
A3C
Dp(∙)
p’(∙)Master
model
St, Rt
R0 R1 R2 R3 R4at = p(St)
St, Rt
R0 R1 R2 R3 R4at = p(St)
St, Rt
R0 R1 R2 R3 R4at = p(St)
…
Agent
1A
gent
2A
gent
16
![Page 26: GPU-BASED A3C FOR DEEP REINFORCEMENT LEARNING · M. Babaeizadeh†,‡ , I.Frosio‡ , S.Tyree ‡ , J. Clemons , J.Kautz‡ †University of Illinois at Urbana-Champaign, USA ‡](https://reader034.vdocuments.site/reader034/viewer/2022042923/5f6f599dcffb822e781c2de4/html5/thumbnails/26.jpg)
26
GPU-based A3C
GA3C
El Capitan big wall, Yosemite Valley
![Page 27: GPU-BASED A3C FOR DEEP REINFORCEMENT LEARNING · M. Babaeizadeh†,‡ , I.Frosio‡ , S.Tyree ‡ , J. Clemons , J.Kautz‡ †University of Illinois at Urbana-Champaign, USA ‡](https://reader034.vdocuments.site/reader034/viewer/2022042923/5f6f599dcffb822e781c2de4/html5/thumbnails/27.jpg)
27
GA3C (INFERENCE)
Master
model
… …
St
predictors
{St}
prediction queue
Agent
1A
gent
2
…
Agent
N
{at}
at
![Page 28: GPU-BASED A3C FOR DEEP REINFORCEMENT LEARNING · M. Babaeizadeh†,‡ , I.Frosio‡ , S.Tyree ‡ , J. Clemons , J.Kautz‡ †University of Illinois at Urbana-Champaign, USA ‡](https://reader034.vdocuments.site/reader034/viewer/2022042923/5f6f599dcffb822e781c2de4/html5/thumbnails/28.jpg)
28
GA3C (TRAINING)
… …
R0 R1 R2
R4
trainers
{St,Rt}
training queue
Dp(∙)
Agent
1A
gent
2
…
Agent
N
St,Rt
Master
model
![Page 29: GPU-BASED A3C FOR DEEP REINFORCEMENT LEARNING · M. Babaeizadeh†,‡ , I.Frosio‡ , S.Tyree ‡ , J. Clemons , J.Kautz‡ †University of Illinois at Urbana-Champaign, USA ‡](https://reader034.vdocuments.site/reader034/viewer/2022042923/5f6f599dcffb822e781c2de4/html5/thumbnails/29.jpg)
29
GA3C
Master
model
… …
St
predictors
{St}
prediction queue
Agent
1A
gent
2
…
Agent
N
{at}
at
… …
R0 R1 R2
R4
trainers
{St,Rt}
training queue
Dp(∙)
![Page 30: GPU-BASED A3C FOR DEEP REINFORCEMENT LEARNING · M. Babaeizadeh†,‡ , I.Frosio‡ , S.Tyree ‡ , J. Clemons , J.Kautz‡ †University of Illinois at Urbana-Champaign, USA ‡](https://reader034.vdocuments.site/reader034/viewer/2022042923/5f6f599dcffb822e781c2de4/html5/thumbnails/30.jpg)
30
GPU-based A3C
GA3C
El Capitan big wall, Yosemite Valley
![Page 31: GPU-BASED A3C FOR DEEP REINFORCEMENT LEARNING · M. Babaeizadeh†,‡ , I.Frosio‡ , S.Tyree ‡ , J. Clemons , J.Kautz‡ †University of Illinois at Urbana-Champaign, USA ‡](https://reader034.vdocuments.site/reader034/viewer/2022042923/5f6f599dcffb822e781c2de4/html5/thumbnails/31.jpg)
31
GPU-based A3C
GA3C
El Capitan big wall, Yosemite Valley
Learn how to balance
![Page 32: GPU-BASED A3C FOR DEEP REINFORCEMENT LEARNING · M. Babaeizadeh†,‡ , I.Frosio‡ , S.Tyree ‡ , J. Clemons , J.Kautz‡ †University of Illinois at Urbana-Champaign, USA ‡](https://reader034.vdocuments.site/reader034/viewer/2022042923/5f6f599dcffb822e781c2de4/html5/thumbnails/32.jpg)
32
GA3C: PREDICTIONS PER SECOND (PPS)
Master
model
… …
St
predictors
{St}
prediction queue
Agent
1A
gent
2
…
Agent
N
{at}
at
… …
R0 R1 R2
R4
trainers
{St,Rt}
training queue
Dp(∙)
![Page 33: GPU-BASED A3C FOR DEEP REINFORCEMENT LEARNING · M. Babaeizadeh†,‡ , I.Frosio‡ , S.Tyree ‡ , J. Clemons , J.Kautz‡ †University of Illinois at Urbana-Champaign, USA ‡](https://reader034.vdocuments.site/reader034/viewer/2022042923/5f6f599dcffb822e781c2de4/html5/thumbnails/33.jpg)
33
GA3C: TRAININGS PER SECOND (TPS)
Master
model
… …
St
predictors
{St}
prediction queue
Agent
1A
gent
2
…
Agent
N
{at}
at
… …
R0 R1 R2
R4
trainers
{St,Rt}
training queue
Dp(∙)
![Page 34: GPU-BASED A3C FOR DEEP REINFORCEMENT LEARNING · M. Babaeizadeh†,‡ , I.Frosio‡ , S.Tyree ‡ , J. Clemons , J.Kautz‡ †University of Illinois at Urbana-Champaign, USA ‡](https://reader034.vdocuments.site/reader034/viewer/2022042923/5f6f599dcffb822e781c2de4/html5/thumbnails/34.jpg)
34
AUTOMATIC SCHEDULINGBalancing the system at run time
ATARI Boxing ATARI Pong
NP = # predictors, NT = # trainers, NA = # agents, TPS = training per seconds
![Page 35: GPU-BASED A3C FOR DEEP REINFORCEMENT LEARNING · M. Babaeizadeh†,‡ , I.Frosio‡ , S.Tyree ‡ , J. Clemons , J.Kautz‡ †University of Illinois at Urbana-Champaign, USA ‡](https://reader034.vdocuments.site/reader034/viewer/2022042923/5f6f599dcffb822e781c2de4/html5/thumbnails/35.jpg)
35
THE ADVANTAGE OF SPEEDMore frames = faster convergence
![Page 36: GPU-BASED A3C FOR DEEP REINFORCEMENT LEARNING · M. Babaeizadeh†,‡ , I.Frosio‡ , S.Tyree ‡ , J. Clemons , J.Kautz‡ †University of Illinois at Urbana-Champaign, USA ‡](https://reader034.vdocuments.site/reader034/viewer/2022042923/5f6f599dcffb822e781c2de4/html5/thumbnails/36.jpg)
36
LARGER DNNS
A3C (ATARI) Others (robotics)
Timothy P. Lillicrap et al., Continuouscontrol with deep reinforcement learning,International Conference on LearningRepresentations, 2016.
S. Levine et al., End-to-end training of deepvisuomotor policies, Journal of MachineLearning Research, 17:1-40, 2016.
For real world applications (e.g. robotics, automotive)
Conv 16
8x8
filters,
stride 4
Conv 32
4x4
filters,
stride 2
FC 256
Conv 32
8x8
filters,
stride 1,
2, 3, 4
Conv 32
4x4
filters,
stride 2
FC 256
Conv 64
4x4
filters,
stride 2 ?
![Page 37: GPU-BASED A3C FOR DEEP REINFORCEMENT LEARNING · M. Babaeizadeh†,‡ , I.Frosio‡ , S.Tyree ‡ , J. Clemons , J.Kautz‡ †University of Illinois at Urbana-Champaign, USA ‡](https://reader034.vdocuments.site/reader034/viewer/2022042923/5f6f599dcffb822e781c2de4/html5/thumbnails/37.jpg)
37
GA3C VS. A3C*: PREDICTIONS PER SECONDS* Our Tensor Flow implementation on a CPU
1
10
100
1000
10000
Small DNN Large DNN,stride 4
Large DNN,stride 3
Large DNN,stride 2
Large DNN,stride 1
4x 11x 12x20x
45x
PP
S
A3C
GA3C
![Page 38: GPU-BASED A3C FOR DEEP REINFORCEMENT LEARNING · M. Babaeizadeh†,‡ , I.Frosio‡ , S.Tyree ‡ , J. Clemons , J.Kautz‡ †University of Illinois at Urbana-Champaign, USA ‡](https://reader034.vdocuments.site/reader034/viewer/2022042923/5f6f599dcffb822e781c2de4/html5/thumbnails/38.jpg)
38
CPU & GPU UTILIZATION IN GA3CFor larger DNNs
0
10
20
30
40
50
60
70
80
90
100
Small DNN Large DNN,stride 4
Large DNN,stride 3
Large DNN,stride 2
Large DNN,stride 1
Uti
lizat
ion
(%
)
CPU %
GPU %
![Page 39: GPU-BASED A3C FOR DEEP REINFORCEMENT LEARNING · M. Babaeizadeh†,‡ , I.Frosio‡ , S.Tyree ‡ , J. Clemons , J.Kautz‡ †University of Illinois at Urbana-Champaign, USA ‡](https://reader034.vdocuments.site/reader034/viewer/2022042923/5f6f599dcffb822e781c2de4/html5/thumbnails/39.jpg)
39
Master
model
… …
St
predictors
{St}
prediction queue
Agent
1A
gent
2
…
Agent
N
{at}
… …
R0 R1 R2
R4
trainers
{St,Rt}
training queue
Dp(∙)
GA3C POLICY LAGAsynchronous playing and training
DNN A
DNN B
![Page 40: GPU-BASED A3C FOR DEEP REINFORCEMENT LEARNING · M. Babaeizadeh†,‡ , I.Frosio‡ , S.Tyree ‡ , J. Clemons , J.Kautz‡ †University of Illinois at Urbana-Champaign, USA ‡](https://reader034.vdocuments.site/reader034/viewer/2022042923/5f6f599dcffb822e781c2de4/html5/thumbnails/40.jpg)
40
STABILITY AND CONVERGENCE SPEEDReducing policy lag through min training batch size
![Page 41: GPU-BASED A3C FOR DEEP REINFORCEMENT LEARNING · M. Babaeizadeh†,‡ , I.Frosio‡ , S.Tyree ‡ , J. Clemons , J.Kautz‡ †University of Illinois at Urbana-Champaign, USA ‡](https://reader034.vdocuments.site/reader034/viewer/2022042923/5f6f599dcffb822e781c2de4/html5/thumbnails/41.jpg)
41
GPU-based A3C
GA3C (45x faster)
El Capitan big wall, Yosemite Valley
Balancing computational
resources, speed and
stability.
![Page 42: GPU-BASED A3C FOR DEEP REINFORCEMENT LEARNING · M. Babaeizadeh†,‡ , I.Frosio‡ , S.Tyree ‡ , J. Clemons , J.Kautz‡ †University of Illinois at Urbana-Champaign, USA ‡](https://reader034.vdocuments.site/reader034/viewer/2022042923/5f6f599dcffb822e781c2de4/html5/thumbnails/42.jpg)
42
THEORY
RESOURCES
M. Babaeizadeh, I. Frosio, S. Tyree, J.Clemons, J. Kautz, ReinforcementLearning through AsynchronousAdvantage Actor-Critic on a GPU,ICLR 2017 (available athttps://openreview.net/forum?id=r1VGvBcxl¬eId=r1VGvBcxl).
CODING
GA3C, a GPU implementation of A3C (open source at https://github.com/NVlabs/GA3C).
A general architecture to generate and consume training data.
![Page 43: GPU-BASED A3C FOR DEEP REINFORCEMENT LEARNING · M. Babaeizadeh†,‡ , I.Frosio‡ , S.Tyree ‡ , J. Clemons , J.Kautz‡ †University of Illinois at Urbana-Champaign, USA ‡](https://reader034.vdocuments.site/reader034/viewer/2022042923/5f6f599dcffb822e781c2de4/html5/thumbnails/43.jpg)
43
QUESTIONS
QUESTIONS
Policy lag
Multiple GPUs
Why TensorFlow
Replay memory
…
ATARI 2600
Github: https://github.com/NVlabs/GA3C ICLR 2017: Reinforcement Learning through
Asynchronous Advantage Actor-Critic on a GPU.
![Page 44: GPU-BASED A3C FOR DEEP REINFORCEMENT LEARNING · M. Babaeizadeh†,‡ , I.Frosio‡ , S.Tyree ‡ , J. Clemons , J.Kautz‡ †University of Illinois at Urbana-Champaign, USA ‡](https://reader034.vdocuments.site/reader034/viewer/2022042923/5f6f599dcffb822e781c2de4/html5/thumbnails/44.jpg)
BACKUP SLIDES
![Page 45: GPU-BASED A3C FOR DEEP REINFORCEMENT LEARNING · M. Babaeizadeh†,‡ , I.Frosio‡ , S.Tyree ‡ , J. Clemons , J.Kautz‡ †University of Illinois at Urbana-Champaign, USA ‡](https://reader034.vdocuments.site/reader034/viewer/2022042923/5f6f599dcffb822e781c2de4/html5/thumbnails/45.jpg)
45
POLICY LAG IN GA3C
Potentially large time lag between training data generation and network update
![Page 46: GPU-BASED A3C FOR DEEP REINFORCEMENT LEARNING · M. Babaeizadeh†,‡ , I.Frosio‡ , S.Tyree ‡ , J. Clemons , J.Kautz‡ †University of Illinois at Urbana-Champaign, USA ‡](https://reader034.vdocuments.site/reader034/viewer/2022042923/5f6f599dcffb822e781c2de4/html5/thumbnails/46.jpg)
46
SCORES
![Page 47: GPU-BASED A3C FOR DEEP REINFORCEMENT LEARNING · M. Babaeizadeh†,‡ , I.Frosio‡ , S.Tyree ‡ , J. Clemons , J.Kautz‡ †University of Illinois at Urbana-Champaign, USA ‡](https://reader034.vdocuments.site/reader034/viewer/2022042923/5f6f599dcffb822e781c2de4/html5/thumbnails/47.jpg)
47
Training a larger DNN
![Page 48: GPU-BASED A3C FOR DEEP REINFORCEMENT LEARNING · M. Babaeizadeh†,‡ , I.Frosio‡ , S.Tyree ‡ , J. Clemons , J.Kautz‡ †University of Illinois at Urbana-Champaign, USA ‡](https://reader034.vdocuments.site/reader034/viewer/2022042923/5f6f599dcffb822e781c2de4/html5/thumbnails/48.jpg)
48
BALANCING COMPUTATIONAL RESOURCESThe actors: CPU, PCI-E & GPU
Trainings per second Training queue size Prediction queue size