TRANSCRIPT
Artificial Intelligence in Game Design
N-Grams and Decision Tree Learning
Predicting Player Actions
• Type of games where this works best:
  – Player has choice between few possible actions
    • Attack left
    • Attack right
  – Character can take simultaneous counteraction
    • Each player action has a “correct” counteraction
      – Attack left → defend left
      – Attack right → defend right
• Goal: Character should learn to “anticipate” current player action based on past actions
Probabilistic Approach
• Keep track of last n actions taken by player
  – Size of n is “window” of character memory
• Compute probability of next player action based on these
  – Base decision about counteraction on that probability
• Example: Last 10 player actions L L R L R L L L R L
• Estimated probabilities of next action:
Left attack 70%
Right attack 30%
Probabilistic Actions
• Simple majority approach
  – Since left attack has highest probability, defend left
  – Problem: Character will take same action for long periods
    • Too predictable!
  – Example:
    • Player’s next 4 attacks are from right
    • Character will still choose defend left, since left attacks still have highest probability
    • Character looks very stupid!

      player:    L L R L R L L L R L R R R R
      character:                     L L L L

      (Left attack is still the majority player action in the last 10 moves)
Probabilistic Actions
• Probabilistic approach: Choose action with same probability as corresponding player action
• Biased towards recent player actions
• Far less predictable
  – Player may notice that character is changing tactics without being aware of how this is done

  Left attack 70%   →  Left defend 70%
  Right attack 30%  →  Right defend 30%
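The probabilistic approach can be sketched in a few lines. The action names, window size, and function name below are illustrative, not from any particular engine:

```python
# Sketch of the probabilistic counteraction: remember the last n player
# attacks and pick a defense with the same probabilities the player attacks.
import random
from collections import Counter, deque

WINDOW = 10  # n: size of the character's "memory"

def choose_defense(history, rng=random):
    """Defend a side with the same probability the player attacks it."""
    counts = Counter(history)                 # e.g. {"L": 7, "R": 3}
    sides = list(counts)
    weights = [counts[s] for s in sides]
    return rng.choices(sides, weights=weights)[0]

history = deque("LLRLRLLLRL", maxlen=WINDOW)  # last 10 player attacks
# For this history: P(defend left) = 0.7, P(defend right) = 0.3
```

Because the history is a `deque` with `maxlen`, appending each new attack automatically drops the oldest one, sliding the window forward.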
Window Size
• Key question: What is good value for n?
  – n = size of “window” used to determine probabilities
• Can be too small (n = 2, for example)
  – Character has no “memory” of past player actions
    L L R L R L L L R L R [R L]  →  Left defend 50%, Right defend 50%
• Can be too large (n = 20, for example)
  – Too slow to react to changes in player tactics
    L L L L L L L L L L L L R R R R R R R R  →  Left defend 60%, Right defend 40%
• No best solution
  – Will need to experiment
N-Grams
• Conditional probabilities based on sequence of user actions
• Example:
  – “Last two player attacks were left and then right”
  – “What has the player done next after the last times they attacked left then right?”
• After the last 4 left-then-right attacks:
  – Attacked right 3 times
  – Attacked left 1 time
• Conclusion: Player has 75% chance of attacking right next
N-Grams Example
• Example:
  – Window of memory = last 12 actions
  – Base decision on last two actions taken by player (past sequence length = 2)
• Goal: Determine what action player is likely to take next given last two actions L R
  – Previous actions: L L R L R R L L R L L R ?
  – Previous cases of L R:
    • Followed by L twice
    • Followed by R once
Left attack 67%
Right attack 33%
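The lookup just described can be sketched as a scan over the window for every place the prefix occurred, counting what the player did next (function name is illustrative):

```python
# N-gram conditional probability over a window of past actions.
from collections import Counter

def next_action_probs(history, prefix):
    """P(next action | last len(prefix) actions == prefix), over the window."""
    n = len(prefix)
    followers = Counter(
        history[i + n]                        # the action that came next
        for i in range(len(history) - n)
        if history[i:i + n] == prefix
    )
    total = sum(followers.values())
    return {a: c / total for a, c in followers.items()}

# The slide's example: window "L L R L R R L L R L L R", prefix "L R"
probs = next_action_probs("LLRLRRLLRLLR", "LR")  # {"L": 2/3, "R": 1/3}
```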
N-Grams and Sequence Size
• Number of statistics to keep grows exponentially with length of past sequences
  – Number of possible actions = a
  – Past sequence length = L
  – Number of possible configurations of past sequences = a^L
• L must be small (no more than 2 or 3)
Storing N-Grams
• Algorithm:
  – Keep statistics for all possible action strings of length L
    • Based on “window” of past actions
• Example: L L L L R R L L R L L R

  Previous action string | Instances of next action | P(next is L) | P(next is R)
  L L                    | L L R R R                | 40%          | 60%
  L R                    | L R                      | 50%          | 50%
  R L                    | L L                      | 100%         | 0%
  R R                    | L                        | 100%         | 0%
N-Grams and Updating
• When player takes new action
  – Add instance for that action and update statistics
  – Remove oldest action from list and update statistics
    • Move “window”
• Example: L L L L R R L L R L L R L
  – Rightmost L is the new action; the oldest L at the left is now outside the “window”

  Previous action string | Instances of next action | P(next is L) | P(next is R)
  L L                    | L R R R                  | 25%          | 75%
  L R                    | R L L                    | 67%          | 33%
  R L                    | L L                      | 100%         | 0%
  R R                    | L                        | 100%         | 0%
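The add/remove bookkeeping above can be sketched as a small class that keeps (prefix, next-action) counts over a sliding window. The class and method names are illustrative:

```python
# Sliding-window n-gram statistics, updated incrementally per player action.
from collections import Counter

class NGramPredictor:
    def __init__(self, seq_len=2, window_size=12):
        self.seq_len = seq_len
        self.window_size = window_size
        self.window = []                      # recent player actions
        self.counts = Counter()               # (prefix, follower) -> count

    def update(self, action):
        """Add the new action's statistics, then drop the oldest action."""
        n = self.seq_len
        self.window.append(action)
        if len(self.window) > n:              # a new (prefix, follower) pair
            prefix = tuple(self.window[-n - 1:-1])
            self.counts[(prefix, action)] += 1
        if len(self.window) > self.window_size:   # move the "window"
            old = self.window[:n + 1]             # oldest pair falls outside
            self.counts[(tuple(old[:n]), old[n])] -= 1
            self.window.pop(0)

    def predict(self, prefix):
        """Probabilities of the next action given the last seq_len actions."""
        matches = {f: c for (p, f), c in self.counts.items()
                   if p == tuple(prefix) and c > 0}
        total = sum(matches.values())
        return {f: c / total for f, c in matches.items()}
```

Feeding the slide's window L L L L R R L L R L L R gives P(L | L L) = 40%; after the new action L arrives, the same query gives 25%, matching the updated table.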
Decision Trees
• Simple tree traversal
  – Node = question
  – Branch to follow = answer
  – Leaf = final action to take
• Can create dynamically from a set of examples
• Example tree:

  Hit points < 5?
    yes → Obstacle between myself and player?
            yes → hide
            no  → run
    no  → Within one unit of player?
            yes → attack
            no  → Path to player clear?
                    yes → run towards player
                    no  → run sideways
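The example tree can be encoded as nested dicts and walked with a simple loop. The situation keys (`hit_points_low`, etc.) are illustrative names for the questions:

```python
# An inner node holds a question and its yes/no branches; a bare string is a
# leaf action.
TREE = {
    "question": "hit_points_low",             # hit points < 5?
    "yes": {"question": "obstacle_between",   # obstacle between myself and player?
            "yes": "hide",
            "no": "run"},
    "no": {"question": "near_player",         # within one unit of player?
           "yes": "attack",
           "no": {"question": "path_clear",   # path to player clear?
                  "yes": "run towards player",
                  "no": "run sideways"}},
}

def decide(tree, situation):
    """Simple tree traversal: follow the branch that answers each question."""
    node = tree
    while isinstance(node, dict):
        node = node["yes" if situation[node["question"]] else "no"]
    return node                               # leaf: the action to take
```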
Decision Tree Learning
• Based on a set of training examples
  – Attribute values about current situation
    • From world or from character state
  – Target action character should take on basis of those attribute values
• Example: Estimating driving time
  – Attributes:
    • Hour of departure: 8, 9, or 10
    • Weather: sunny, cloudy, or rainy
    • Accident on route: yes or no
    • Stalled car on route: yes or no
  – Desired action:
    • Commute time: short, medium, or long
Training Examples

Example  Hour  Weather  Accident  Stall  Commute
1 8 sunny no no long
2 8 cloudy no yes long
3 10 sunny no no short
4 9 rainy yes no long
5 9 sunny yes yes long
6 10 sunny no no short
7 10 cloudy no no short
8 9 rainy no no medium
9 9 sunny yes no long
10 10 cloudy yes yes long
11 10 rainy no no short
12 8 cloudy yes no long
13 9 sunny no no medium
Decision Tree Building
node BuildTree(examples[] E) {
    if (all examples in E have same target action T) {
        return new leaf with action T
    }
    else {
        choose “best” attribute A to create branches for examples E
        set question at this node to A
        for (all possible values V of attribute A) {
            create branch with value V
            E_V = all examples in E that have value V for attribute A
            attach BuildTree(E_V) to that branch
        }
    }
}
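The pseudocode translates directly to Python. Since the slides leave the “best” attribute heuristic open (ID3's information gain is the usual choice, covered below), it is passed in as a function here; the data format and example names are illustrative:

```python
# Direct Python rendering of the BuildTree pseudocode.
def build_tree(examples, attributes, choose_attribute):
    """examples: list of (attribute_dict, target_action) pairs."""
    targets = {t for _, t in examples}
    if len(targets) == 1:                     # all examples share one action
        return targets.pop()                  # leaf holding that action
    best = choose_attribute(examples, attributes)
    node = {"question": best, "branches": {}}
    rest = [a for a in attributes if a != best]
    for v in {attrs[best] for attrs, _ in examples}:  # one branch per value
        subset = [(attrs, t) for attrs, t in examples if attrs[best] == v]
        node["branches"][v] = build_tree(subset, rest, choose_attribute)
    return node

# Tiny hypothetical training set, with "pick the first attribute" standing in
# for the "best attribute" rule:
examples = [({"hp": "low",  "cover": "yes"}, "hide"),
            ({"hp": "low",  "cover": "no"},  "run"),
            ({"hp": "high", "cover": "yes"}, "attack"),
            ({"hp": "high", "cover": "no"},  "attack")]
first = lambda ex, attrs: attrs[0]
tree = build_tree(examples, ["hp", "cover"], first)
```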
Decision Tree Building
• Start tree by using BuildTree(all examples) to create root
• Example: Suppose Hour were chosen at root:

  Hour: examples 1-13 (4 short, 2 medium, 7 long)
    8  → examples 1, 2, 12 (0 short, 0 medium, 3 long) → leaf: Long
    9  → examples 4, 5, 8, 9, 13 (0 short, 2 medium, 3 long) → another question
    10 → examples 3, 6, 7, 10, 11 (4 short, 0 medium, 1 long) → another question
ID3 Decision Tree Learning
• Choose attribute with highest information gain
• Based on definition of entropy from information theory
• Entropy over examples E with target actions T:

  Entropy(E) = − Σ_T (|E_T| / |E|) log₂ (|E_T| / |E|)

  where E_T = the examples in E whose target action is T
• Example: entropy for training set
  = −(4/13) log₂(4/13) − (2/13) log₂(2/13) − (7/13) log₂(7/13) = 1.42
  (4 short examples, 2 medium examples, 7 long examples)
ID3 Decision Tree Learning
• Expected entropy remaining after attribute A is used:
  – Sum of entropies of child nodes down branches V
  – Weighted by percentage of examples down that branch

  Remaining(A) = Σ_V (|E_V| / |E|) Entropy(E_V)

• Information gain for attribute A = original entropy minus that value

  Attribute | Entropy remaining | Information gain
  hour      | 0.65              | 0.77
  weather   | 1.29              | 0.13
  accident  | 0.92              | 0.50
  stall     | 1.17              | 0.25

  → hour is the best attribute at root
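The entropy and gain numbers in the table can be reproduced in a few lines over the commute training set; a sketch:

```python
# Entropy and information gain, checked against the commute table above.
from math import log2
from collections import Counter

def entropy(examples):
    """Entropy over the target actions of (attributes, target) examples."""
    n = len(examples)
    counts = Counter(t for _, t in examples)
    return -sum((c / n) * log2(c / n) for c in counts.values())

def information_gain(examples, attribute):
    """Original entropy minus expected entropy after splitting on attribute."""
    n = len(examples)
    remaining = 0.0
    for v in {attrs[attribute] for attrs, _ in examples}:
        subset = [(a, t) for a, t in examples if a[attribute] == v]
        remaining += (len(subset) / n) * entropy(subset)
    return entropy(examples) - remaining

rows = [  # (hour, weather, accident, stall, commute) from the training table
    (8, "sunny",  "no",  "no",  "long"),   (8, "cloudy",  "no",  "yes", "long"),
    (10, "sunny", "no",  "no",  "short"),  (9, "rainy",   "yes", "no",  "long"),
    (9, "sunny",  "yes", "yes", "long"),   (10, "sunny",  "no",  "no",  "short"),
    (10, "cloudy","no",  "no",  "short"),  (9, "rainy",   "no",  "no",  "medium"),
    (9, "sunny",  "yes", "no",  "long"),   (10, "cloudy", "yes", "yes", "long"),
    (10, "rainy", "no",  "no",  "short"),  (8, "cloudy",  "yes", "no",  "long"),
    (9, "sunny",  "no",  "no",  "medium"),
]
commute_examples = [({"hour": h, "weather": w, "accident": a, "stall": s}, c)
                    for h, w, a, s, c in rows]
```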
Using ID3 in Games
• ID3 efficient enough for on-line learning
  – Execution time proportional to number of examples
• Best applied to games if:
  – Few possible actions for NPC to choose from (attack, retreat, defend)
  – Key attributes have few possible values (or continuous values partitioned into predefined ranges)
  – Have easy way to determine desired actions NPC should take based on significant number of player actions
    • May require tweaking rules of game!
Black and White Game
• Player given “creature” at beginning of game
• Creature “trained” by player by observing player actions in different situations
  – What other armies to attack in what circumstances
• Later in game (after creature grows), creature takes those same actions
Black and White Game
• Sample training set:
Example Allegiance Defense Tribe Attack
1 friendly weak Celtic no
2 enemy weak Celtic yes
3 friendly strong Norse no
4 enemy strong Norse no
5 friendly weak Greek no
6 enemy medium Greek yes
7 enemy strong Greek no
8 enemy medium Aztec yes
9 friendly weak Aztec no
Black and White Game
• Tree created from examples:

  Allegiance?
    friendly → No attack
    enemy    → Defense?
                 weak   → Attack
                 medium → Attack
                 strong → No attack
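The learned tree can be written out as a function and checked against all nine training examples. Note that Tribe never appears: Allegiance and Defense already separate the examples perfectly. A sketch:

```python
# The learned Black & White tree above, as a yes/no attack decision.
def creature_decision(allegiance, defense):
    if allegiance == "friendly":
        return "no"                   # friendly armies are never attacked
    return "no" if defense == "strong" else "yes"

# The nine training rows: (allegiance, defense, attack)
training = [("friendly", "weak",   "no"),  ("enemy", "weak",   "yes"),
            ("friendly", "strong", "no"),  ("enemy", "strong", "no"),
            ("friendly", "weak",   "no"),  ("enemy", "medium", "yes"),
            ("enemy",    "strong", "no"),  ("enemy", "medium", "yes"),
            ("friendly", "weak",   "no")]
```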
Black and White Game
• Nature of game suited to decision tree learning:
  – Large number of examples created by player actions
  – Known target actions based on player actions
  – Small number of possible actions and attribute values
• Game specifically designed around algorithm!