Evolving Heuristics for Searching Games
Evolutionary Computation and Artificial Life
Supervisor: Moshe Sipper
Achiya Elyasaf, June 2010
Overview

Searching Games State-Graphs
• Representation
• Uninformed Search
• Heuristics
• Informed Search

Rush Hour
• Domain Specific Heuristic
• Evolving Heuristics
• Coevolving Game Boards
• Results

Freecell
• Domain Specific Heuristic
• Coevolving Game Boards
• Learning Methods
• Results
Every puzzle/game can be represented as a state graph:
• Single-player games (puzzles, board games, etc.): every piece move leads to a different state
• Multi-player games (chess, Robocode, etc.): the positions of the player and the enemy, together with the remaining parameters (health, shield, …), define a state
Searching Games State-Graphs: Representation

Rush Hour: [board figure]

Blocksworld: [board figure]
Searching Games State-Graphs: Uninformed Search

BFS: exponential in the search depth
DFS: linear in the length of the current search path. BUT:
• We might "never" track down the right path
• Games usually contain cycles

Iterative Deepening: a combination of BFS & DFS
• In each iteration, a DFS with a depth limit is performed
• The limit grows from one iteration to the next
• Worst case: traverse the entire graph
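The iterative-deepening idea above can be sketched as follows. A minimal illustration over an explicit successor map; the names (`graph`, `start`, `goal`) are made up for illustration and are not the thesis code:

```python
def depth_limited_dfs(graph, state, goal, limit, path):
    # DFS that refuses to go deeper than `limit` moves from the start.
    if state == goal:
        return path
    if limit == 0:
        return None
    for succ in graph.get(state, []):
        if succ in path:          # skip cycles on the current path
            continue
        found = depth_limited_dfs(graph, succ, goal, limit - 1, path + [succ])
        if found is not None:
            return found
    return None

def iterative_deepening(graph, start, goal, max_depth=50):
    # Each iteration runs DFS with a growing depth limit: BFS-like
    # shortest-path behavior with DFS-like memory use.
    for limit in range(max_depth + 1):
        result = depth_limited_dfs(graph, start, goal, limit, [start])
        if result is not None:
            return result
    return None
```

Because the limit grows one level at a time, the first solution found is a shortest one, at the cost of re-expanding shallow nodes on every iteration.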
Searching Games State-Graphs: Uninformed Search (Cont.)

Most game domains are PSPACE-complete!
Worst case: traverse the entire graph. We need an informed search!
Searching Games State-Graphs: Heuristics

h: states → ℝ
• For every state s, h(s) is an estimate of the minimal distance/cost from s to a solution
• If h is perfect, an informed search that tries states with the lowest h-score first will simply stroll to the solution
• With a bad heuristic, the search might never reach an answer
• For hard problems, finding a good h is hard

We need a good heuristic function to guide informed search
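One way to let h guide the search is greedy best-first search: always expand the open state with the smallest h estimate. A minimal sketch with illustrative names (`successors` and `h` are callables supplied by the caller):

```python
import heapq

def best_first(successors, h, start, goal):
    # Frontier ordered by heuristic value only (greedy best-first).
    frontier = [(h(start), start)]
    came_from = {start: None}
    while frontier:
        _, state = heapq.heappop(frontier)
        if state == goal:
            # Reconstruct the path by walking parents back to the start.
            path = []
            while state is not None:
                path.append(state)
                state = came_from[state]
            return path[::-1]
        for succ in successors(state):
            if succ not in came_from:
                came_from[succ] = state
                heapq.heappush(frontier, (h(succ), succ))
    return None
```

With a good h this expands few nodes; with a misleading h it can wander arbitrarily, which is exactly the "might never reach an answer" risk above.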
Searching Games State-Graphs: Informed Search (Cont.)

IDA*: Iterative Deepening with A*
• Expanded nodes are pushed onto the DFS stack in descending order of heuristic value
• Let g(s) be the minimal depth of state s: only nodes with f(s) = g(s) + h(s) ≤ depth-limit are visited

Near-optimal solution (depends on the depth limit)
The heuristic needs to be admissible
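The IDA* scheme above can be sketched as a depth-first search whose cutoff is the f = g + h bound rather than a plain depth, with the bound raised to the smallest pruned f-value between iterations. Function and variable names here are illustrative, not the thesis code:

```python
def ida_star(successors, h, start, goal):
    # successors(s) -> iterable of (succ, cost); h must be admissible.
    bound = h(start)
    path = [start]

    def search(g, bound):
        state = path[-1]
        f = g + h(state)
        if f > bound:
            return f            # pruned: report f so the bound can grow
        if state == goal:
            return True
        minimum = float('inf')
        for succ, cost in successors(state):
            if succ in path:    # avoid cycles on the current path
                continue
            path.append(succ)
            t = search(g + cost, bound)
            if t is True:
                return True
            minimum = min(minimum, t)
            path.pop()
        return minimum

    while True:
        t = search(0, bound)
        if t is True:
            return path
        if t == float('inf'):
            return None         # no reachable goal
        bound = t               # next iteration: smallest f that was pruned
```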
Rush Hour: Domain-Specific Heuristic

GP-Rush [Hauptman et al., 2009]
Hand-crafted heuristics:
• Goal distance: Manhattan distance
• Blockers estimation: a lower bound (admissible)
• Hybrid blockers distance: combines the two above
• Is Move To Secluded: did the car enter a secluded area?
• Is Releasing Move
GA/GP

For building blocks H1, …, Hn, how should we choose the fittest heuristic?
• Minimum? Maximum? A linear combination?

GA/GP may be used for:
1. Building new heuristics from existing building blocks
2. Finding weights for each heuristic (for applying a linear combination)
3. Finding conditions for applying each heuristic

• H should probably fit the stage of the search
• E.g., "goal" heuristics when we assume we're close
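Option 2 above, a weighted linear combination of building blocks, can be sketched as follows. The toy building blocks and weight values are made up for illustration:

```python
def combine(heuristics, weights):
    # Returns a new heuristic: the weighted sum of the building blocks.
    # A GA would evolve the `weights` vector; here it is fixed by hand.
    def h(state):
        return sum(w * hi(state) for w, hi in zip(weights, heuristics))
    return h

# Toy building blocks over an integer "state" (purely illustrative):
h1 = lambda s: s % 7
h2 = lambda s: s // 2
h = combine([h1, h2], [0.4, 0.6])
```

A GA individual is then just the weight vector, and its fitness is how well the combined h guides the search on training boards.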
GA/GP (Cont.)

[Figure: an example evolved tree. The root is an If node whose condition is And(H1 ≤ 0.4, H2 ≥ 0.7); the True branch combines H3 and H1 with + and *, and the False branch combines H5 and H1 with * and /]
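A condition tree like the one in the figure can be evaluated recursively. The nested-tuple encoding below is an assumption for illustration, not the GP system's actual representation:

```python
def evaluate(node, h):
    # h maps heuristic names ('H1', ...) to their value for the current state.
    if isinstance(node, (int, float)):
        return node               # numeric constant leaf
    if isinstance(node, str):
        return h[node]            # heuristic leaf
    op, *args = node
    if op == 'if':
        cond, then_branch, else_branch = args
        return evaluate(then_branch if evaluate(cond, h) else else_branch, h)
    if op == 'and':
        return all(evaluate(a, h) for a in args)
    if op == '<=':
        return evaluate(args[0], h) <= evaluate(args[1], h)
    if op == '>=':
        return evaluate(args[0], h) >= evaluate(args[1], h)
    if op == '+':
        return evaluate(args[0], h) + evaluate(args[1], h)
    if op == '*':
        return evaluate(args[0], h) * evaluate(args[1], h)
    raise ValueError('unknown operator: %r' % op)

# An illustrative tree in the spirit of the figure:
tree = ('if', ('and', ('<=', 'H1', 0.4), ('>=', 'H2', 0.7)),
        ('+', 'H3', ('*', 'H1', 0.5)),
        ('*', 'H5', 'H1'))
```

Crossover then swaps subtrees between two such trees, and mutation replaces a subtree with a random one, as in standard Koza-style GP.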
GA/GP (Cont.): Back to Rush Hour

Functions & Terminals:
• Condition terminals: IsMoveToSecluded, isReleasingMove, g, PhaseByDistance, PhaseByBlockers, NumberOfSyblings, DifficultyLevel, BlockersLowerBound, GoalDistance, Hybrid, 0, 0.1, …, 0.9, 1
• Result terminals: BlockersLowerBound, GoalDistance, Hybrid, 0, 0.1, …, 0.9, 1
• Condition function set: If, And, Or, ≤, ≥
• Result function set: +, *

Genetic operators: crossover & mutation on trees, as Koza describes
Fitness measure? Cross-over? Mutation?
GA/GP (Cont.): Policies

Condition      Result
Condition 1    Heuristics Weights 1
Condition 2    Heuristics Weights 2
…              …
Condition n    Heuristics Weights n
Default        Heuristics Weights
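Applying such a policy amounts to scanning the rows in order and taking the weight vector of the first condition that holds, falling back to the default row. A sketch with illustrative conditions and weights:

```python
def apply_policy(policy, default_weights, state):
    # policy: ordered list of (condition, weights) rows.
    for condition, weights in policy:
        if condition(state):
            return weights
    return default_weights

# Illustrative rows: pick weights by how deep the search currently is.
policy = [
    (lambda s: s['depth'] < 5,  [0.8, 0.2]),   # early in the search
    (lambda s: s['depth'] < 20, [0.5, 0.5]),   # mid-search
]
```

The selected weight vector is then fed to the linear combination of heuristic building blocks, so different search phases get different guidance.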
Co-Evolving Difficult Solvable 8x8 Boards

Our enhanced IDA* search solved over 90% of the 6x6 problems
We wanted to demonstrate our method's scalability to larger boards

Fitness measure? Cross-over? Mutation?
[Figure: two example co-evolved 8x8 Rush Hour boards]
Rush Hour Results
Average percentage of nodes required to solve test problems, with respect to the number of nodes scanned by a blind search:
Rush Hour Results (Cont.)
Time (in seconds) required to solve problems JAM01 . . . JAM40:
Freecell: Intro

FreeCell remained relatively obscure until it was bundled with Windows 95
There are 32,000 Microsoft deals (known as the Microsoft 32K), all solvable except for game #11982, which has eluded solution so far
Freecell: Intro (Cont.)

[Figure: the Freecell layout: free cells, foundations, and cascades]
Freecell: Heuristics

• Lowest card at foundations
• Number of well-placed cards
• Number of cards not at foundations
• Number of free FreeCells and free cascades
• Sum of the cascades' bottom cards
• Highest home card minus lowest home card
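One of the building blocks above, the number of cards not yet at the foundations, makes a natural distance estimate (fewer remaining cards means closer to a solution). A sketch; the state encoding here is an assumption for illustration:

```python
def cards_not_at_foundations(state):
    # state['foundations'] maps suit -> highest rank placed (0 if empty);
    # a full deck has 52 cards, 13 per suit.
    placed = sum(state['foundations'].values())
    return 52 - placed

# Illustrative mid-game state: 3 spades, 1 diamond already home.
state = {'foundations': {'spades': 3, 'hearts': 0, 'diamonds': 1, 'clubs': 0}}
```

Each of the other bullets becomes a similar small function, and the GA/GP machinery combines them.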
As opposed to Rush Hour, blind search could not solve even one problem
The best solver to date solves 89% of the Microsoft 32K
Reasons:
• High branching factor
• Hard to generate a good heuristic
Freecell: Learning Methods

In Rush Hour:
• A population of hyper-heuristics
• Each generation, all individuals solve 5 different randomly selected instances
• Test set: 20% of the problems
• Training set: the rest

In Freecell:
• This method failed
First try:
• Sort the problems by difficulty
• Gradually learn the whole training set

FAILED:
• Days of training
• Overfitting and forgetting
Second try:
Co-evolution:
• First population: hyper-heuristics
• Second population: game boards, with a Hillis-style "Hall of Fame"

FAILED:
• Ambiguous reason for low fitness
Third try:
Co-evolution:
• First population: hyper-heuristics
• Second population: groups of 8 game boards

SUCCESS:
• Fast learning process
• No ambiguity
• We create the right competition
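The two-population loop of the third try can be sketched as follows: hyper-heuristics are scored against evolving groups of boards, and a board group is fit when it challenges the solvers. All names and fitness details are illustrative, not the thesis code:

```python
import random

def coevolve(solvers, board_groups, solve_score, generations=10):
    # solvers: list of dicts (hyper-heuristic individuals).
    # board_groups: list of dicts, each with a 'boards' list of 8 boards.
    # solve_score(solver, board) -> value in [0, 1], higher is better.
    for _ in range(generations):
        # Solver fitness: average score against one opposing board group.
        for s in solvers:
            group = random.choice(board_groups)['boards']
            s['fitness'] = sum(solve_score(s, b) for b in group) / len(group)
        # Board-group fitness: how much the group challenges a solver.
        for g in board_groups:
            s = random.choice(solvers)
            g['fitness'] = 1.0 - sum(solve_score(s, b)
                                     for b in g['boards']) / len(g['boards'])
        # (Selection, crossover and mutation of both populations go here.)
    return max(solvers, key=lambda s: s['fitness'])
```

Scoring whole groups rather than single boards is what removes the ambiguity: one unlucky board no longer decides an individual's fitness.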
Freecell Results

Run                        Node reduction   Time reduction   Solution-length reduction   % of solved problems
HSD                        100%             100%             100%                        89%
GA-1                       23%              31%              1%                          71%
GA-2                       23%              30%              -3%                         70%
GP                         -                -                -                           -
Policy                     28%              36%              6%                          74%
GA with Co-Evolution       60%              69%              37%                         98%
Policy with Co-Evolution   59%              69%              30%                         99%