Evolving Heuristics for Searching Games
Evolutionary Computation and Artificial Life
Supervisor: Moshe Sipper
Achiya Elyasaf, June 2010
Overview

Searching Games State-Graphs
• Representation
• Uninformed Search
• Heuristics
• Informed Search

Rush Hour
• Domain Specific Heuristic
• Evolving Heuristics
• Coevolving Game Boards
• Results

Freecell
• Domain Specific Heuristic
• Coevolving Game Boards
• Learning Methods
• Results
Every puzzle/game can be represented as a state graph:
• Single-player games (puzzles, board games, etc.): every piece move leads to a different state
• Multi-player games (chess, Robocode, etc.): the positions of the player and the enemy, together with the remaining parameters (health, shield, …), define a state
Searching Games State-Graphs: Representation

Rush Hour: [board figure]

Blocksworld: [board figure]
Searching Games State-Graphs: Uninformed Search

BFS: exponential in the search depth
DFS: linear in the length of the current search path. BUT:
• We might "never" track down the right path
• Games usually contain cycles

Iterative Deepening: a combination of BFS & DFS
• In each iteration, a DFS with a depth limit is performed
• The limit grows from one iteration to the next
• Worst case: traverse the entire graph
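The iterative-deepening idea above can be sketched as follows. A minimal illustration over an explicit successor map; the names (`graph`, `start`, `goal`) are made up for illustration and are not the thesis code:

```python
def depth_limited_dfs(graph, state, goal, limit, path):
    # DFS that refuses to go deeper than `limit` moves from the start.
    if state == goal:
        return path
    if limit == 0:
        return None
    for succ in graph.get(state, []):
        if succ in path:          # skip cycles on the current path
            continue
        found = depth_limited_dfs(graph, succ, goal, limit - 1, path + [succ])
        if found is not None:
            return found
    return None

def iterative_deepening(graph, start, goal, max_depth=50):
    # Each iteration runs DFS with a growing depth limit: BFS-like
    # shortest-path behavior with DFS-like memory use.
    for limit in range(max_depth + 1):
        result = depth_limited_dfs(graph, start, goal, limit, [start])
        if result is not None:
            return result
    return None
```

Because the limit grows one level at a time, the first solution found is a shortest one, at the cost of re-expanding shallow nodes on every iteration.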
Searching Games State-Graphs: Uninformed Search (Cont.)

Most game domains are PSPACE-complete!
Worst case: traverse the entire graph. We need an informed search!
Searching Games State-Graphs: Heuristics

h: states → ℝ
• For every state s, h(s) is an estimate of the minimal distance/cost from s to a solution
• If h is perfect, an informed search that tries states with the lowest h-score first will simply stroll to the solution
• With a bad heuristic, the search might never reach an answer
• For hard problems, finding a good h is hard

We need a good heuristic function to guide informed search
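One way to let h guide the search is greedy best-first search: always expand the open state with the smallest h estimate. A minimal sketch with illustrative names (`successors` and `h` are callables supplied by the caller):

```python
import heapq

def best_first(successors, h, start, goal):
    # Frontier ordered by heuristic value only (greedy best-first).
    frontier = [(h(start), start)]
    came_from = {start: None}
    while frontier:
        _, state = heapq.heappop(frontier)
        if state == goal:
            # Reconstruct the path by walking parents back to the start.
            path = []
            while state is not None:
                path.append(state)
                state = came_from[state]
            return path[::-1]
        for succ in successors(state):
            if succ not in came_from:
                came_from[succ] = state
                heapq.heappush(frontier, (h(succ), succ))
    return None
```

With a good h this expands few nodes; with a misleading h it can wander arbitrarily, which is exactly the "might never reach an answer" risk above.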
Searching Games State-Graphs: Informed Search (Cont.)

IDA*: Iterative Deepening with A*
• Expanded nodes are pushed onto the DFS stack in descending order of heuristic value
• Let g(s) be the minimal depth of state s: only nodes with f(s) = g(s) + h(s) ≤ depth-limit are visited

Near-optimal solution (depends on the depth limit)
The heuristic needs to be admissible
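The IDA* scheme above can be sketched as a depth-first search whose cutoff is the f = g + h bound rather than a plain depth, with the bound raised to the smallest pruned f-value between iterations. Function and variable names here are illustrative, not the thesis code:

```python
def ida_star(successors, h, start, goal):
    # successors(s) -> iterable of (succ, cost); h must be admissible.
    bound = h(start)
    path = [start]

    def search(g, bound):
        state = path[-1]
        f = g + h(state)
        if f > bound:
            return f            # pruned: report f so the bound can grow
        if state == goal:
            return True
        minimum = float('inf')
        for succ, cost in successors(state):
            if succ in path:    # avoid cycles on the current path
                continue
            path.append(succ)
            t = search(g + cost, bound)
            if t is True:
                return True
            minimum = min(minimum, t)
            path.pop()
        return minimum

    while True:
        t = search(0, bound)
        if t is True:
            return path
        if t == float('inf'):
            return None         # no reachable goal
        bound = t               # next iteration: smallest f that was pruned
```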
Rush Hour: Domain-Specific Heuristic

GP-Rush [Hauptman et al., 2009]
Hand-crafted heuristics:
• Goal distance: Manhattan distance
• Blockers estimation: a lower bound (admissible)
• Hybrid blockers distance: combines the two above
• Is Move To Secluded: did the car enter a secluded area?
• Is Releasing Move
GA/GP

For building blocks H1, …, Hn, how should we choose the fittest heuristic?
• Minimum? Maximum? A linear combination?

GA/GP may be used for:
1. Building new heuristics from existing building blocks
2. Finding weights for each heuristic (for applying a linear combination)
3. Finding conditions for applying each heuristic

• H should probably fit the stage of the search
• E.g., "goal" heuristics when we assume we're close
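Option 2 above, a weighted linear combination of building blocks, can be sketched as follows. The toy building blocks and weight values are made up for illustration:

```python
def combine(heuristics, weights):
    # Returns a new heuristic: the weighted sum of the building blocks.
    # A GA would evolve the `weights` vector; here it is fixed by hand.
    def h(state):
        return sum(w * hi(state) for w, hi in zip(weights, heuristics))
    return h

# Toy building blocks over an integer "state" (purely illustrative):
h1 = lambda s: s % 7
h2 = lambda s: s // 2
h = combine([h1, h2], [0.4, 0.6])
```

A GA individual is then just the weight vector, and its fitness is how well the combined h guides the search on training boards.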
GA/GP (Cont.)

[Figure: an example evolved tree. The root is an If node whose condition is And(H1 ≤ 0.4, H2 ≥ 0.7); the True branch combines H3 and H1 with + and *, and the False branch combines H5 and H1 with * and /]
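A condition tree like the one in the figure can be evaluated recursively. The nested-tuple encoding below is an assumption for illustration, not the GP system's actual representation:

```python
def evaluate(node, h):
    # h maps heuristic names ('H1', ...) to their value for the current state.
    if isinstance(node, (int, float)):
        return node               # numeric constant leaf
    if isinstance(node, str):
        return h[node]            # heuristic leaf
    op, *args = node
    if op == 'if':
        cond, then_branch, else_branch = args
        return evaluate(then_branch if evaluate(cond, h) else else_branch, h)
    if op == 'and':
        return all(evaluate(a, h) for a in args)
    if op == '<=':
        return evaluate(args[0], h) <= evaluate(args[1], h)
    if op == '>=':
        return evaluate(args[0], h) >= evaluate(args[1], h)
    if op == '+':
        return evaluate(args[0], h) + evaluate(args[1], h)
    if op == '*':
        return evaluate(args[0], h) * evaluate(args[1], h)
    raise ValueError('unknown operator: %r' % op)

# An illustrative tree in the spirit of the figure:
tree = ('if', ('and', ('<=', 'H1', 0.4), ('>=', 'H2', 0.7)),
        ('+', 'H3', ('*', 'H1', 0.5)),
        ('*', 'H5', 'H1'))
```

Crossover then swaps subtrees between two such trees, and mutation replaces a subtree with a random one, as in standard Koza-style GP.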
GA/GP (Cont.): Back to Rush Hour

Functions & Terminals:
• Condition terminals: IsMoveToSecluded, isReleasingMove, g, PhaseByDistance, PhaseByBlockers, NumberOfSyblings, DifficultyLevel, BlockersLowerBound, GoalDistance, Hybrid, 0, 0.1, …, 0.9, 1
• Result terminals: BlockersLowerBound, GoalDistance, Hybrid, 0, 0.1, …, 0.9, 1
• Condition function set: If, And, Or, ≤, ≥
• Result function set: +, *

Genetic operators: crossover & mutation on trees, as Koza describes
Fitness measure? Cross-over? Mutation?
GA/GP (Cont.): Policies

Condition      Result
Condition 1    Heuristics Weights 1
Condition 2    Heuristics Weights 2
…              …
Condition n    Heuristics Weights n
Default        Heuristics Weights
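Applying such a policy amounts to scanning the rows in order and taking the weight vector of the first condition that holds, falling back to the default row. A sketch with illustrative conditions and weights:

```python
def apply_policy(policy, default_weights, state):
    # policy: ordered list of (condition, weights) rows.
    for condition, weights in policy:
        if condition(state):
            return weights
    return default_weights

# Illustrative rows: pick weights by how deep the search currently is.
policy = [
    (lambda s: s['depth'] < 5,  [0.8, 0.2]),   # early in the search
    (lambda s: s['depth'] < 20, [0.5, 0.5]),   # mid-search
]
```

The selected weight vector is then fed to the linear combination of heuristic building blocks, so different search phases get different guidance.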
Co-Evolving Difficult Solvable 8x8 Boards

Our enhanced IDA* search solved over 90% of the 6x6 problems
We wanted to demonstrate our method's scalability to larger boards

Fitness measure? Cross-over? Mutation?
[Figure: two example co-evolved 8x8 Rush Hour boards]
Rush Hour Results
Average percentage of nodes required to solve test problems, with respect to the number of nodes scanned by a blind search:
Rush Hour Results (Cont.)
Time (in seconds) required to solve problems JAM01 . . . JAM40:
Freecell: Intro

FreeCell remained relatively obscure until it was bundled with Windows 95
There are 32,000 Microsoft deals (known as the Microsoft 32K), all solvable except for game #11982, which has eluded solution so far
Freecell: Intro (Cont.)

[Figure: the Freecell layout: free cells, foundations, and cascades]
Freecell: Heuristics

• Lowest card at foundations
• Number of well-placed cards
• Number of cards not at foundations
• Number of free FreeCells and free cascades
• Sum of the cascades' bottom cards
• Highest home card minus lowest home card
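One of the building blocks above, the number of cards not yet at the foundations, makes a natural distance estimate (fewer remaining cards means closer to a solution). A sketch; the state encoding here is an assumption for illustration:

```python
def cards_not_at_foundations(state):
    # state['foundations'] maps suit -> highest rank placed (0 if empty);
    # a full deck has 52 cards, 13 per suit.
    placed = sum(state['foundations'].values())
    return 52 - placed

# Illustrative mid-game state: 3 spades, 1 diamond already home.
state = {'foundations': {'spades': 3, 'hearts': 0, 'diamonds': 1, 'clubs': 0}}
```

Each of the other bullets becomes a similar small function, and the GA/GP machinery combines them.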
As opposed to Rush Hour, blind search could not solve even one problem
The best solver to date solves 89% of the Microsoft 32K
Reasons:
• High branching factor
• Hard to generate a good heuristic
Freecell: Learning Methods

In Rush Hour:
• A population of hyper-heuristics
• Each generation, all individuals solve 5 different randomly selected instances
• Test set: 20% of the problems
• Training set: the rest

In Freecell:
• This method failed
First try:
• Sort the problems by difficulty
• Gradually learn the whole training set

FAILED:
• Days of training
• Overfitting and forgetting
Second try:
Co-evolution:
• First population: hyper-heuristics
• Second population: game boards, with a Hillis-style "Hall of Fame"

FAILED:
• Ambiguous reason for low fitness
Third try:
Co-evolution:
• First population: hyper-heuristics
• Second population: groups of 8 game boards

SUCCESS:
• Fast learning process
• No ambiguity
• We create the right competition
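The two-population loop of the third try can be sketched as follows: hyper-heuristics are scored against evolving groups of boards, and a board group is fit when it challenges the solvers. All names and fitness details are illustrative, not the thesis code:

```python
import random

def coevolve(solvers, board_groups, solve_score, generations=10):
    # solvers: list of dicts (hyper-heuristic individuals).
    # board_groups: list of dicts, each with a 'boards' list of 8 boards.
    # solve_score(solver, board) -> value in [0, 1], higher is better.
    for _ in range(generations):
        # Solver fitness: average score against one opposing board group.
        for s in solvers:
            group = random.choice(board_groups)['boards']
            s['fitness'] = sum(solve_score(s, b) for b in group) / len(group)
        # Board-group fitness: how much the group challenges a solver.
        for g in board_groups:
            s = random.choice(solvers)
            g['fitness'] = 1.0 - sum(solve_score(s, b)
                                     for b in g['boards']) / len(g['boards'])
        # (Selection, crossover and mutation of both populations go here.)
    return max(solvers, key=lambda s: s['fitness'])
```

Scoring whole groups rather than single boards is what removes the ambiguity: one unlucky board no longer decides an individual's fitness.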
Freecell Results

Run                        Node reduction   Time reduction   Solution-length reduction   % of solved problems
HSD                        100%             100%             100%                        89%
GA-1                       23%              31%              1%                          71%
GA-2                       23%              30%              -3%                         70%
GP                         -                -                -                           -
Policy                     28%              36%              6%                          74%
GA with Co-Evolution       60%              69%              37%                         98%
Policy with Co-Evolution   59%              69%              30%                         99%