Monte-Carlo Go Reinforcement Learning Experiments
Paper by Bruno Bouzy, Guillaume Chaslot, IEEE 2006
Presented by Markus Enzenberger. Go Seminar, University of Alberta.
November 16, 2006
Background
  Indigo project
Monte-Carlo Go
  Basic Monte-Carlo Go
  Monte-Carlo Go with specific knowledge
Reinforcement Learning and Monte-Carlo Go
  The randomness dimension
Experiments
  Experiment 1: One program vs. population
  Experiment 2: Relative difference at MC level
Conclusion
Indigo project
Basic Monte-Carlo Go
- General MC model (Abramson)
- All-moves-as-first heuristic (Brügmann)
- Standard deviation of random-game scores on 9×9: ≈ 35 points
- Desired precision: 1 point
- 1000 games ≡ 68% confidence
- 4000 games ≡ 95% confidence
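A back-of-the-envelope check of the last three bullets, assuming the usual standard-error argument with σ ≈ 35 for 9×9 random-game scores:

```latex
% Standard error of the mean score over N random games:
%   sigma_mean = sigma / sqrt(N),  with sigma ~ 35 points on 9x9
\[
\frac{35}{\sqrt{1000}} \approx 1.1 \text{ pts}
  \;\Rightarrow\; \pm 1 \text{ pt covers} \approx 68\% \ (1\,\sigma_{\text{mean}}),
\qquad
\frac{35}{\sqrt{4000}} \approx 0.55 \text{ pts}
  \;\Rightarrow\; \pm 1 \text{ pt covers} \approx 95\% \ (2\,\sigma_{\text{mean}}).
\]
```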
Progressive pruning
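The figure for this slide did not survive extraction. For orientation, here is a minimal sketch of the idea, assuming the usual formulation from Bouzy's earlier Monte-Carlo Go work: a move is pruned once its mean score plus a confidence margin falls below the best move's mean minus that margin. Names and thresholds below are illustrative, not the paper's exact values.

```python
import math

def progressive_pruning(outcomes, n_sigma=2.0, min_games=30):
    """Prune moves that are statistically inferior to the current best move.

    outcomes: dict mapping move -> list of scores from random games so far.
    A move survives while [mean - margin, mean + margin] still overlaps the
    best move's interval, with margin = n_sigma * (std dev of the mean).
    Illustrative sketch only; the thresholds here are assumptions.
    """
    bounds = {}
    for move, scores in outcomes.items():
        n = len(scores)
        if n < min_games:
            # Too few games to judge: keep the move alive unconditionally.
            bounds[move] = (float("inf"), float("-inf"))
            continue
        mean = sum(scores) / n
        var = sum((s - mean) ** 2 for s in scores) / n
        margin = n_sigma * math.sqrt(var / n)
        bounds[move] = (mean + margin, mean - margin)  # (upper, lower)

    best_lower = max(lower for _, lower in bounds.values())
    return [m for m, (upper, _) in bounds.items() if upper >= best_lower]

# Example: move "C3" scores clearly worse and gets pruned.
data = {"D5": [20, 25, 22] * 20, "C3": [-30, -28, -35] * 20}
print(progressive_pruning(data))  # -> ['D5']
```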
Advantages of MC
- Bouzy, Helmstetter: MC programs on par with Indigo2002
- MC takes advantage of faster computers
- Heavily knowledge-based programs are less robust
- Global view (avoids errors due to decomposition)
- Easy to develop
Monte-Carlo Go with specific knowledge
Indigo2003
Two ways of associating Go knowledge with MC:
- Pre-select moves for MC
- Insert knowledge into the random games ("pseudo-random" := non-uniform move probability)
Two Modules of Indigo2003
Pseudo-Random (PR) player
- Capture-escape urgency: fill the last liberty of a one-liberty string; urgency linear in string size
- Cut-connect urgency: 3×3 patterns (smaller: insufficient; larger: lower urgency)
- Total urgency is the sum of both urgencies
- Probability of a move is proportional to its total urgency (see the sketch below)
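A minimal sketch of the last two bullets: summing the two urgency sources and sampling a move with probability proportional to the total. The urgency values and the small base weight are illustrative assumptions; the paper's actual tables are not reproduced here.

```python
import random

def total_urgency(capture_urgency, pattern_urgency, legal_moves, base=1):
    """Total urgency per move = capture-escape urgency + cut-connect urgency.

    `base` gives every legal move a small non-zero weight so the policy
    stays stochastic (an assumption, not stated on the slide).
    """
    return {m: capture_urgency.get(m, 0) + pattern_urgency.get(m, 0) + base
            for m in legal_moves}

def sample_move(urgencies):
    """Pick a move with probability proportional to its total urgency."""
    moves = list(urgencies)
    return random.choices(moves, weights=[urgencies[m] for m in moves], k=1)[0]

# Example: "B2" escapes an atari on a six-stone string, "C3" matches a
# cut/connect 3x3 pattern; both urgency values are made up for illustration.
urg = total_urgency({"B2": 6}, {"C3": 4}, ["A1", "B2", "C3"])
print(sample_move(urg))  # "B2" is drawn most often, but never deterministically
```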
MC(Manual) vs. MC(Zero)
- Manual: urgencies of 3×3 patterns assigned by a human expert
- The patterns correspond to Go concepts such as cut and connect
- Non-zero and high values occur only sparsely
Reinforcement Learning and Monte-Carlo Go
- Q-learning for learning action values
- p1 being better than p2 (at the random level) does not necessarily mean that MC(p1) is better than MC(p2) (at the MC level); for example, a stronger but nearly deterministic random policy explores fewer variations, which can bias the MC evaluation
- Randomization vs. determinism
The randomness dimension
Experiment 1: One program vs. population
- Patterns have action values Q ∈ ]−1, +1[
- Urgency: U = ((1 + Q) / (1 − Q))^k
- Value update after playing a block of b games: Q_play ← Q_play + λ_b · Q_learn
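A small numeric sketch of the two formulas above. Reading λ_b as a scalar learning rate for the block is an assumption; the extracted slide is ambiguous about the notation.

```python
def urgency(q, k=3):
    """U = ((1 + Q) / (1 - Q))**k maps Q in ]-1, +1[ to a positive urgency."""
    assert -1 < q < 1
    return ((1 + q) / (1 - q)) ** k

def block_update(q_play, q_learn, lambda_b=0.1):
    """After a block of games: Q_play <- Q_play + lambda_b * Q_learn.

    lambda_b is treated here as a plain learning rate (an assumption).
    """
    return q_play + lambda_b * q_learn

# Q = 0 gives urgency 1; Q = 0.5 with k = 3 gives (1.5 / 0.5)**3 = 27,
# so even moderate Q-values quickly dominate the move distribution.
print(urgency(0.0), urgency(0.5))
```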
Experiment 1a: one unique learning program
- RLPR ≫ Zero
- RLPR < Manual
- MC(RLPR) ≪ MC(Manual)
- Highly dependent on initial conditions
- Q-values are learnt in the first block; afterwards, only these Q-values are increased
Experiment 1b: a population of learning programs
- Evolutionary computing
- Test against Zero and Manual
- Select and duplicate the best programs according to some scheme (a generic sketch follows the results below)

Starting population = Zero:
- RLPR = Zero + 180
- RLPR = Manual − 50
- MC(RLPR) ≪ MC(Manual)

Starting population = Manual:
- RLPR = Zero + 180
- RLPR = Manual + 50
- MC(RLPR) = MC(Manual) − 20
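The selection scheme itself is not specified on the slide. As a purely generic illustration (not the paper's method), here is truncation selection: keep the top fraction of the population by fitness and refill by duplicating survivors, optionally with mutation.

```python
import random

def truncation_step(population, fitness, keep_frac=0.5, mutate=None):
    """One generation of a generic select-and-duplicate scheme.

    population: list of programs (e.g., pattern-urgency tables).
    fitness: callable scoring a program, e.g., match results vs. Zero/Manual.
    Purely illustrative; the paper's actual scheme is not given on the slide.
    """
    ranked = sorted(population, key=fitness, reverse=True)
    survivors = ranked[: max(1, int(len(population) * keep_frac))]
    refill = [mutate(random.choice(survivors)) if mutate else random.choice(survivors)
              for _ in range(len(population) - len(survivors))]
    return survivors + refill

# Example with toy one-number "programs" and identity fitness:
print(truncation_step([3, 1, 4, 1, 5, 9], fitness=lambda p: p))
```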
- RLPR programs have a tendency toward determinism
- Using EC reduces the problem but does not remove it completely
Experiment 2: relative difference at MC level
- MC(RLPR) plays against itself again
- Update rule to avoid determinism, for a pair of patterns a, b with target urgency ratio u_a / u_b = exp(C (V_a − V_b)):
  δ = Q_a − Q_b − C (V_a − V_b)
  Q_a ← Q_a − α δ
  Q_b ← Q_b + α δ
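A small sketch of this update. Note the sign on the second line: the extracted text shows a minus for both patterns, but then the difference Q_a − Q_b would never move, so the plus sign is a reconstruction; the C and α values below are illustrative.

```python
def pairwise_update(q_a, q_b, v_a, v_b, c=1.0, alpha=0.1):
    """Nudge Q_a - Q_b toward the target C * (V_a - V_b).

    delta = Q_a - Q_b - C*(V_a - V_b)
    Q_a <- Q_a - alpha * delta
    Q_b <- Q_b + alpha * delta   (sign reconstructed, see lead-in)
    """
    delta = q_a - q_b - c * (v_a - v_b)
    return q_a - alpha * delta, q_b + alpha * delta

# Repeated updates drive the Q-difference toward C*(V_a - V_b) = 0.1,
# so the urgency ratio stays bounded and the policy stays stochastic.
qa, qb = 0.8, -0.8   # initially a near-deterministic preference for pattern a
for _ in range(100):
    qa, qb = pairwise_update(qa, qb, v_a=0.2, v_b=0.1)
print(round(qa - qb, 3))  # -> 0.1
```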
9×9:
- MC(RLPR) = MC(Manual) + 3

19×19 (learning on 9×9):
- MC(RLPR) = MC(Manual) − 30
Conclusion
- Determinism was identified as an obstacle
- Evolutionary computing was used to avoid determinism (but showed strong dependence on initial conditions)
- Relative differences were used to avoid determinism