Reinforcement Learning for the Soccer Dribbling Task

Arthur Carvalho, Renato Oliveira


Introduction

- RoboCup soccer simulation
- Scoring: "A Data Mining Approach to Solve the Goal Scoring Problem"
- Passing: "A New Passing Strategy Based on Q-Learning Algorithm in RoboCup"
- Dribbling: ?

Notes: This research started when we were developing a RoboCup soccer simulation team. Our idea was to create a team where all the relevant soccer skills could in some way be learned by the soccer agents, much like humans learn them. We then listed a couple of individual skills that we considered relevant, such as scoring, passing, and dribbling. For most of these skills, a lot of work had already been done; dribbling was the exception. Perhaps the reason is that dribbling is not easy to define. For example, how can we measure this skill? How can we possibly say that an agent is a good dribbler?

Soccer Dribbling Task

Notes: That's when we defined the soccer dribbling task. The idea is that a soccer agent, the dribbler, must go from the beginning to the end of a region (a rectangle), keeping possession of the ball, while an adversary attempts to gain possession. Now we can measure the quality of an agent's dribbling skill: we just count the number of times the dribbler successfully performs this task.

Outline

- The soccer dribbling task as a RL problem
- RL solution
- Experiments
- Conclusion

Notes: Now I'd like to talk about how we map the soccer dribbling task onto a reinforcement-learning framework.

The Soccer Dribbling Task as a RL Problem

- Coach
  - Setting positions: the dribbler is placed in the center-left region together with the ball; the adversary is placed in a random position
  - Managing the play: the adversary wins when he gains possession or when the ball goes out of the field; the dribbler wins when he crosses the field with the ball

Notes: For this, we use a coach, who is responsible for setting the positions of the players and of the ball inside the field at the beginning of each episode. The dribbler is placed in the center-left region of the field together with the ball, and the adversary is placed in a random position. The coach is also responsible for managing the learning process: whenever the adversary gains possession or the ball goes out of the field, the adversary wins the episode; when the dribbler crosses the entire field with the ball, he wins the episode.

The Soccer Dribbling Task as a RL Problem

- When an episode ends, the coach starts a new one
- The RoboCup soccer simulator operates in discrete time steps
- Episodic reinforcement-learning framework

Notes: Whenever an episode ends, the coach starts a new one, so we have a series of episodes. Another important point is that the RoboCup soccer simulator operates in discrete time steps. These two facts together allow us to map the soccer dribbling task onto an episodic reinforcement-learning framework. Consequently, we can use traditional RL algorithms to solve this problem.

The Soccer Dribbling Task as a RL Problem

- Actions
  - HoldBall()
  - Dribble(α, k)
  - Dribble(30, 5), Dribble(330, 5), Dribble(0, 5), Dribble(0, 10)
- The dribbler can kick the ball forward (strongly and weakly), diagonally upward, and diagonally downward.

Notes: Now we need to define the actions available to the dribbler. The first one is HoldBall(), where the dribbler holds the ball close to his body, keeping it in a position that is difficult for the adversary to gain possession of. The second one is Dribble(α, k), where the dribbler turns his body towards the global angle α, kicks the ball k meters ahead of him, and moves to intercept the ball. For the second action we used four different parameter settings; in words, the dribbler can kick the ball forward (strongly and weakly), diagonally upward, and diagonally downward.
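As a rough illustration (a hypothetical encoding; HoldBall and Dribble stand in for simulator commands whose interface is not shown in the talk), the resulting discrete action set could be written as:

```python
# Hypothetical encoding of the dribbler's five discrete actions.
# Angles are global angles in degrees; k is the kick distance in meters.
ACTIONS = [
    ("HoldBall",),            # keep the ball close to the body
    ("Dribble", 0.0, 5.0),    # kick forward, weakly
    ("Dribble", 0.0, 10.0),   # kick forward, strongly
    ("Dribble", 30.0, 5.0),   # kick diagonally upward
    ("Dribble", 330.0, 5.0),  # kick diagonally downward
]
```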

The Soccer Dribbling Task as a RL Problem

State Variable               Meaning
posY(dribbler)               Vertical position of the dribbler
ang(dribbler)                Global angle of the dribbler
ang(dribbler, adversary)     Relative angle between the dribbler and the adversary
ang(ball, adversary)         Relative angle between the ball and the adversary
dist(ball, adversary)        Distance between the ball and the adversary

Notes: Now we need to define the state representation used by the dribbler. We used five state variables, which help the dribbler locate himself and the adversary inside the field.

Outline

- The soccer dribbling task as a RL problem
- RL solution
- Experiments
- Conclusion

Notes: Now let me talk about our reinforcement-learning solution to the soccer dribbling task.

RL Solution

- It combines the reinforcement-learning algorithm Sarsa with CMAC for function approximation

Notes: In detail, Sarsa estimates the traditional action-value function Q for the current policy π and for all state-action pairs (s, a). The update is Q(s_t, a_t) ← Q(s_t, a_t) + α·δ_t, where α is the learning-rate parameter and δ_t is the traditional temporal-difference error: δ_t = r_{t+1} + λ·Q(s_{t+1}, a_{t+1}) − Q(s_t, a_t), with λ the discount rate. A few words about the reward: we set it to 1 whenever the dribbler wins, −1 when the adversary wins, and 0 otherwise.

RL Solution

- CMAC
  - Partitions the state space into several receptive fields (hyper-rectangles)
  - Each one is associated with a weight
  - Multiple partitions of the state space (layers) are usually used
  - The CMAC's response to a given input is equal to the sum of the weights of the excited receptive fields
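A minimal sketch of this scheme, assuming axis-aligned uniform tilings with randomly shifted layers (the talk does not give the actual bin sizes, value ranges, or number of layers):

```python
import numpy as np

class CMAC:
    """Tile coding: each layer partitions the state space into
    receptive fields (hyper-rectangles), each holding a weight; the
    response to an input is the sum of the excited fields' weights."""

    def __init__(self, lows, highs, bins_per_dim=10, n_layers=2, seed=0):
        self.lows = np.asarray(lows, dtype=float)
        self.highs = np.asarray(highs, dtype=float)
        self.bins = bins_per_dim
        self.n_layers = n_layers
        self.width = (self.highs - self.lows) / bins_per_dim
        rng = np.random.default_rng(seed)
        # Each layer is shifted by a random fraction of a field width.
        self.offsets = [rng.uniform(0.0, 1.0, size=self.lows.size) * self.width
                        for _ in range(n_layers)]
        # One weight per receptive field, per layer (+1 bin absorbs the shift).
        self.weights = np.zeros((n_layers,) + (bins_per_dim + 1,) * self.lows.size)

    def _fields(self, state):
        """Index of the single excited receptive field in each layer."""
        s = np.asarray(state, dtype=float)
        return [tuple(np.clip(((s - self.lows + off) // self.width).astype(int),
                              0, self.bins))
                for off in self.offsets]

    def value(self, state):
        return sum(self.weights[(l,) + f]
                   for l, f in enumerate(self._fields(state)))

    def train(self, state, target, lr=0.1):
        """Delta rule: spread the error equally over the excited fields."""
        err = target - self.value(state)
        for l, f in enumerate(self._fields(state)):
            self.weights[(l,) + f] += lr * err / self.n_layers
```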

RL Solution

[Figure: a CMAC example with two state variables and two layers of receptive fields.]

Notes: I have an example here that may make this clear. We have two state variables, X and Y, and the black dot can be seen as a state. It excites two receptive fields, one from each layer, and the response of the CMAC to that input is the sum of the weights of the excited receptive fields. I forgot to mention it, but the CMAC is trained using the traditional delta rule.

Outline

- The soccer dribbling task as a RL problem
- RL solution
- Experiments
- Conclusion

Notes: Let me talk about the experiments that we did.

Experiments

- We used the standard RoboCup soccer simulator
- A 20 by 20 region
- The dribbler used an ε-greedy policy, choosing a random action 1% of the time
- α = 0.125
- λ = 1
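Putting these pieces together, here is a minimal sketch of one training episode under the stated setup: Sarsa with the ε-greedy policy and parameters above, assuming one CMAC value function per action (the talk does not say how Q is represented across actions) and a hypothetical env interface. ACTIONS and CMAC refer to the sketches earlier.

```python
import random

ALPHA, LAMBDA, EPSILON = 0.125, 1.0, 0.01  # learning rate, discount rate, exploration

def choose_action(qfuncs, s):
    """Epsilon-greedy over one CMAC value function per action."""
    if random.random() < EPSILON:
        return random.randrange(len(qfuncs))
    return max(range(len(qfuncs)), key=lambda a: qfuncs[a].value(s))

def run_episode(env, qfuncs):
    """One Sarsa episode; reward is +1 if the dribbler wins,
    -1 if the adversary wins, and 0 otherwise."""
    s = env.reset()
    a = choose_action(qfuncs, s)
    done = False
    while not done:
        s2, r, done = env.step(ACTIONS[a])  # hypothetical simulator interface
        if done:
            target = r                      # terminal step: no bootstrap term
        else:
            a2 = choose_action(qfuncs, s2)
            target = r + LAMBDA * qfuncs[a2].value(s2)  # Sarsa target
        # Q(s,a) <- Q(s,a) + ALPHA * (target - Q(s,a)), via the CMAC delta rule
        qfuncs[a].train(s, target, lr=ALPHA)
        if not done:
            s, a = s2, a2
```

Here qfuncs could be built as [CMAC(lows, highs) for _ in ACTIONS], with lows and highs bounding the five state variables.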

Experiments

- Adversary
  - Fixed policy
  - It computes a near-optimal interception point (UvA Trilearn 2003 team)
- Two phases
  - Training
  - Testing

Notes: The adversary used in our experiments follows a fixed policy: he uses an iterative scheme to compute a near-optimal interception point to take the ball, based on the UvA Trilearn 2003 team. Our experiments had two phases: training and testing.
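The talk only names the scheme, so the following is a rough, hypothetical illustration of one common iterative interception computation in the spirit of UvA Trilearn (the team's actual implementation and the simulator constants may differ): check successive predicted ball positions until the first one the player can reach in time.

```python
import math

# Hypothetical constants; the real simulator parameters may differ.
BALL_DECAY = 0.94     # per-cycle decay of ball speed
PLAYER_SPEED = 1.0    # meters per cycle, a rough bound

def interception_point(ball_pos, ball_vel, player_pos, max_cycles=100):
    """Return the first predicted ball position the player can reach
    no later than the ball does (a near-optimal interception point)."""
    bx, by = ball_pos
    vx, vy = ball_vel
    for t in range(1, max_cycles + 1):
        bx, by = bx + vx, by + vy                  # ball moves...
        vx, vy = vx * BALL_DECAY, vy * BALL_DECAY  # ...and slows down
        dist = math.hypot(bx - player_pos[0], by - player_pos[1])
        if dist <= t * PLAYER_SPEED:               # reachable within t cycles?
            return (bx, by)
    return (bx, by)  # fall back to the last predicted position
```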

Experiments

- Training phase: 5 independent runs, each one lasting 50,000 episodes

[Figure: histogram of the dribbler's average number of wins during training, in bins of 500 episodes, reaching about 53% by the end.]

Notes: During the training phase, we ran the experiment 5 independent times, each run lasting 50,000 episodes. At the end, we computed the average number of wins by the dribbler; the histogram above shows that information, using bins of 500 episodes. As we can see, the dribbler greatly improves his performance as the number of episodes grows: at the end of the training process, he is winning on average 53% of the time.

Experiments

- Qualitatively: Rule #1

Notes: Qualitatively, the dribbler seems to learn two different rules. In the first one, he kicks the ball to the side opposite the adversary until the angle between them makes it difficult for the adversary to intercept the ball; after that, he starts kicking the ball forward.

Experiments

- Qualitatively: Rule #2

Notes: In the second one, the dribbler seems to keep holding the ball until he feels safe to go straight ahead. Usually this happens when the adversary is close to him.

Outline

- The soccer dribbling task as a RL problem
- RL solution
- Experiments
- Conclusion

Conclusion

- Dribble
  - Soccer dribbling task
  - Reinforcement learning solution
- Benchmark
- Starting point for dribbling tasks in other sports games
  - E.g., hockey, basketball, and football

Notes: In conclusion, we studied dribbling in the RoboCup soccer simulation, where we defined the soccer dribbling task and proposed a reinforcement-learning solution, obtaining very good results. We believe this task can be used as a benchmark for comparing different machine learning techniques. Our work can also be a good starting point for dribbling tasks in other sports games, like hockey and basketball.

Thank you!

Source code available at: http://sites.google.com/site/soccerdribbling

Arthur Carvalho ([email protected])
Renato Oliveira ([email protected])

Notes: That is the end of my presentation. Thank you for listening. A brief note: the source code of our algorithm is available at the web address above. Thank you.