269202 algorithms for isne

269202 ALGORITHMS FOR ISNEDR. KENNETH COSHWEEK 12

REVIEW Searching

Uninformed Searches Breadth First Depth First Iterative

Informed Searches Best First Heuristics

Hill Climbing Simulated Annealing

THIS WEEK

Adversarial Searches Investigating how to plan ahead in environments where other agents are planning against us.

Game Theory

WHO STOLE THE FINAL EXAM?OK, I know one of you stole the final exam…

You have 3 choices, “Admit it”, “Accuse someone else” or “Deny knowledge”.

If you “Deny Knowledge” you will get a D, unless everyone denies knowledge, when you’ll get a B

If you “Accuse someone” you will get a C, or if everyone else accuses the same person you will get a B, unless they admit it when you get a D.

If you “Admit it” you get an F, unless everyone else accuses you, when you get an A.

GAME THEORY

That example is similar to the famous “Prisoner’s Dilemma” “Two suspects, A and B, are arrested by the police. The police have insufficient evidence

for a conviction, and, having separated both prisoners, visit each of them to offer the same deal: if one testifies for the prosecution against the other and the other remains silent, the betrayer goes free and the silent accomplice receives the full 10-year sentence. If both stay silent, the police can sentence both prisoners to only six months in jail for a minor charge. If each betrays the other, each will receive a two-year sentence. Each prisoner must make the choice of whether to betray the other or to remain silent. However, neither prisoner knows for sure what choice the other prisoner will make. So the question this dilemma poses is: What will happen? How will the prisoners act?”

GAME THEORY

A branch of economics investigates any multiagent environment where one agents actions impact on another agent, whether co-operatively or competitively.

Games are deterministic, turn-taking, two-player, zero-sum games with perfect information. In other words a deterministic and fully observable environment with two

agents who alternate actions. Utility values at the end of the game are equal and opposite - the winner of chess receives ‘1’, the loser ‘-1’ (or possibly ‘0’ in stalemate).

GAMES

Games fit well with A.I. as they are easy to represent and agents are restricted to a small number of actions, the results of the actions are well defined. This is in contrast to physical games such as football (or American

Rugby) - here there are imprecise rules and a wide variety of actions. Computer game manufacturers attempts at producing such games often result in limited and repetitive versions.

GAME SUCCESS AI has had some success at game playing;

Machines are better than humans in Othello and checkers.

Sometimes they can win at chess and backgammon. Why not more success?

At Go AI only performs amateurly. The average game of chess has a branching factor of

35, with roughly 50 moves per player - which is 35100 or 10154 nodes on a search graph!

So we have to make a decision, even if it isn’t optimal.

FORMAL DEFINITION Initial State - the initial board position indicating who’s

move.Successor Function - returns a list of moves and their

resulting states.Terminal Test - determines if it is a terminal state, i.e. if

the game is over.Utility Function - An objective function giving a numeric

value for states - terminal states could be -1, 0 or 1.

SEARCHING

From that definition we can create a game tree, which can be searched for optimal moves.

Normal searches produce a sequence of actions leading to a terminal win state, but in games the other agent has an impact.

Therefore the search needs to find a contingent strategy . The optimal strategy is one which finds an outcome at least as good

as any other against an infallible opponent.

GAME TREESX X X X X X X X XInitial Moves (X):

X X X X X X XXO O O O O O O OMove 2 (O): ….

XO XO XO XO XO XOXOX X X X X X X ….Move 3 (X):

…. ….….

XOX XOX XOXXOOX XXX XOOO OOTerminal States:

Utility: -1 0 1

The search tree even for a game as simple as tic tac toe, is v. complex

MINIMAX ALGORITHM Assuming the opponent will play optimally, the minimax algorithm is designed to maximise the

worst case outcome - in tictactoe this is a draw.

In this simple example - which action should A take (a1, a2 or a3)?

B

A

C D

a1 a2 a3

b1 b2b3 c1

c2c3 d1

d2d3

4 7 3 2 12 6 10 11 2

MINIMAX ALGORITHM

Even though the best utility is found with ‘a2’ (12), and the best average is found with ‘a3’ (7.67), ‘a1’ is the choice as we assume that the opponent will choose their optimum strategy.

B

A

C D

a1 a2 a3

b1 b2b3 c1

c2c3 d1

d2d3

3

4 7 3

2

2 12 6

2

10 11 2

MULTIPLAYER GAMES

In games with more than 2 players, the utility is scored as a vector containing utilities for each player (5,2,3). Minimax functions will choose an action which still leads to their best worst case scenario.

In multiplayer games, often ‘alliances’ are formed, with more than one player joining forces in order to stop another player from reaching a terminal win state.

MINIMAX EFFICIENCY

If the maximum depth of the tree is ‘m’ and there are ‘b’ legal moves at each point, then time complexity is O(bm) and space complexity is O(bm), i.e. Exponential time efficiency.

We can improve the efficiency, for instance through Alpha-Beta Pruning; If a player has a better choice ‘m’ either at the parent node of ‘n’ or any node higher up, then ‘n’

will never be reached, so effectively we can prune it.

PRUNING EXAMPLE

First we examine choice ‘a1’ with depth first search, finding the worst case ‘3’.B

A

C D

a1 a2 a3

b1 b2b3 c1

c2c3 d1

d2d3

3

4 7 3

2

2 12 6

2

10 11 2

B

Aa1 a2 a3

b1 b2b3

3

4 7 3

PRUNING EXAMPLE 2

Examining choice ‘a2’, we immediately find a worse worst case ‘2’, so no need to continue searching

B

A

C

a1 a2 a3

b1 b2b3 c1

c2c3

3

4 7 3

2

2

B

A

C D

a1 a2 a3

b1 b2b3 c1

c2c3 d1

d2d3

3

4 7 3

2

2

2

10 11 2

When we look at choice ‘a3’, we have to search all 3 leaves before we find a worse worst case.

PRUNING EFFICIENCY

Pruning choices in this way has dramatic effects on efficiency. We only need to examine O(bd/2) nodes

This is assuming that we can examine successors that are likely to be best first. - You know what ASS U ME means.

Even with random selection, the efficiency improves, however, for many games this time inefficiency makes playing the game impractical - a minimax decision for chess would take longer than the allowed 2 hour game.

IMPERFECT, REAL-TIME DECISIONS

Our algorithm needs to be able to make a decision within a specified time period, based on imperfect information.

Humans use (sub-consciously) a heuristic utility function to judge the value of a position as humans are even more limited in searching capabilities - so a similar heuristic is needed here

I.e. Cut the search off at a non-terminal state.

DESIGNING A HEURISTIC

A heuristic needs to comprise various features, weighted by their importance - perhaps using probabilities of success based on previous experiences (in this case 72% of previous instances have lead to success).

Human experience can be used to guide the heuristic, for example in chess; Material Value - each piece can be given a value, pawn = 1, queen =8 Pawn Structure - a good pawn layout might be worth more. King Safety - a well protected King might be worth a few points. 2 bishops together are worth a little over 2*1bishop. Bishops are more valuable during the end game than at the start.

SEARCH CUT-OFF

Given the sophisticated heuristic function, which can be applied to various states, a cut off point must be chosen so the amount of time used is acceptable.

The wrong cut off point could give rise to what is known as the ‘horizon effect’ - where a poor worst case is just over the horizon of the search.

ALTERNATIVE HORIZONS

An alternative vision of the horizon effect comes from this situation.

A computer playing as black almost inexplicably chose to move e7-e8 - why?

GAMES WITH CHANCE

Some games, such as Backgammon combine chance with skill, where chance is controlled by the roll of a dice. We can’t construct a game tree as we don’t know what our opponents possible moves are - but we

do know what their likely moves are. So the game tree needs to be expanded to include probabilities of the opponent receiving certain

dice rolls.

PROBABILITY

Probability is based on our percepts of the environment - what we know about the environment. My doctor said golf caused my shoulder injury as soon as he knew I played golf - even though the

shoulder injury dates from before I started playing golf. When we pick a card there is a 1/52 chance it is the Ace of Spades, after we look at it, the chance is

either 0 or 1. Probabilities can change when more evidence is acquired.

UTILITY THEORY

Utility Theory represents preferences for certain states - the quality of a state being useful; Is it preferable to choose plan C1440 (leaving for the airport 24 hours early) which has a 99.999%

probability of success over plan A60, which has a 95% probability of success? Given the poor utility of the long wait, perhaps not.

Should I stop playing golf to improve my ‘lack of shoulder pain’ utility, and risk lowering my ‘playing golf pleasure’ utility?

DECISION THEORY

Decision Theory = Probability Theory + Utility Theory. An agent is rational if they choose the action which yields the highest utility averaged

across all possible outcomes of the action. The principle of Maximum Expected Utility (MEU).

ACTING UNDER UNCERTAINTY


Suppose we need to check in at the airport and need to choose a time to set off. Plan A60 involves leaving 60 minutes before check-in - we could assume this was a good plan.

We can’t say “Plan A60 will get us to the airport in time”, only “Plan A60 will get us to the airport in time so long as we have enough gas, and there are no accidents, and the car doesn’t break down, and check-in doesn’t close early, and…”

Perhaps Plan B90 would be better?


While Plan B90 increases the ‘degree of belief’ that we will get to the airport on time, it also introduces a likely long unproductive wait at the airport.

To maximise an agents performance, the relative importance of both goals needs to be considered; ‘getting to the airport’ ‘avoiding a long wait’

The rational decision is A60, but how do we draw that conclusion?

DIAGNOSIS

Diagnosis is a task aimed at dealing with uncertainty normally using probability theory to construct a degree of belief in various statements. Your car won’t start, so there’s a 80% chance the battery is dead. You’ve got pain in your left calf so there’s a 70% chance you’ve pulled a muscle, and a 2% chance

your leg has fallen off. Probability provides a way of summarising the uncertainty that comes from our laziness

and ignorance.

DOMAIN RANDOM VARIABLES

Boolean Random Variables Late<True, False>

Discrete RandomVariables Weather<Sunny, Rainy, Snowy, Cloudy>

Continuous Random Variables Temperature = 30.2

ATOMIC EVENTS

Atomic Events are a particular combination of states; Late = True, Weather = Cloudy Late = False, Weather = Sunny

The existence of certain atomic events can lead to certain understandings; Late = False, Weather = Rainy, means <Rainy => Late> = False

PRIOR PROBABILITY

Unconditional Probabilities can be assigned to each state - degree of belief with no other information; P(Weather = Sunny) = 0.8 P(Weather = Rainy) = 0.1 P(Weather = Cloudy) = 0.0999 P(Weather = Snowy) = 0.0001

JOINT PROBABILITY

The probabilities for a combination of random variables can be stored in a grid - 2 or more dimensional. Take a simple example with 3 boolean variables, Late, Male, AM.

Late Not LateMale Not Male Male Not Male

AM 0.22 0.25 0.08 0.02not AM 0.1 0.18 0.1 0.05

JOINT PROBABILITY

The sum of all probabilities adds up to 1. Suppose we want to know the probability of being male OR late;

P(Male OR Late) = 0.22+0.25+0.1+0.18+0.08+0.1 = 0.93 Or the probability of being male AND late;

P(Male AND Late) = 0.22+0.1 = 0.32 Thus we can deduce probabilities given certain inputs.

JOINT PROBABILITY

Suppose we add a further random variable - the weather. We have to expand our table, in reality adding 4 times the size of the weather, one for each

weather condition. In doing this it is reasonable to ask how the former and latter table are related; how does

P(Late, AM, Male, Weather = sunny) relate to P(Late, AM, Male)?

PRODUCT RULE

We can use the ‘Product Rule’ The probability of a and b is the same as the probability of b multiplied by the

probability of a given that b is the case; P(a^b) = P(a|b)P(b)

Or in our case; P(Late, AM, Male, Weather = sunny) = P(Weather = sunny | Late, AM,

Male)P(Late, AM, Male) The probability of a man being late on a sunny morning is the same as the

probability of it being sunny given that a man is late in the morning, multiplied by the probability of a man being late in the morning.

HANG ON A MINUTE!

Let’s review that; “The probability of a man being late on a sunny morning is the

same as the probability of it being sunny given that a man is late in the morning, multiplied by the probability of a man being late in the morning.”

Unless we are a weathermonger, the probability of it being sunny isn’t influenced by a man’s morning tardiness!

PRODUCT RULE

The Product Rule can be stated in 2 ways; P(a^b) = P(a|b)P(b) P(a^b) = P(b|a)P(a)

Or in our case; P(Late, AM, Male, Weather = sunny) = P(Late, AM, Male | Weather =

sunny)P(Weather = sunny) The probability of a man being late on a sunny morning is the same as the

probability of a man is late in the morning given that it is sunny, multiplied by the probability of it being sunny.

JOINT PROBABILITIES There is intuitively something more satisfactory about this

statement - perhaps the weather could be a factor in determining lateness.

We could expand our knowledge about the domain by adding further random variables - perhaps adding Transport<Car, Bike, Walk> or FavouriteFood<Icecream, Steak, Fish>.

Transport ‘might’ have a direct influence over tardiness - it could be argued that ‘walkers’ should set off earlier, but surely FavouriteFood can be considered ‘Independent’.

When Independence can be found, the subset of data to be analysed can be greatly reduced.

BAYES’ LAW Requires 2 unconditional probabilities and 1 conditional

probability, to calculate 1 further conditional probability! But, if we have those probabilities, then it can be very useful.

Meningitis causes patients to have stiff necks 50% of the time. The probability of having meningitis is 1 in 50,000 and the probability that any patient has a stiff neck is 1 in 20. P(s|m) = 0.5 P(m) = 1/50000 P(s) = 1/20 P(m|s) = P(s|m)P(m)/P(s) = (0.5*(1/50000))/(1/20) = 0.0002 or

1 in 5,000

WELCOME TO WUMPUS WORLD

Wumpus World is a cave consisting of rooms connected by passageways. Somewhere in the cave is the evil Wumpus beast who eats anyone who enters the room. An agent can kill the Wumpus though by shooting an arrow - the arrow goes straight until it hits the Wumpus or a wall. Wumpus World also contains deathly pits in the floor which trap anyone who enters the room (except the Wumpus). On the positive side there is a big heap of gold that can be won by a successful agent. In rooms surrounding the Wumpus the agent can perceive a stench, in rooms surrounding a pit the agent perceives a breeze, and in rooms surrounding the gold the agent perceives a glitter. Agents can move through the 4*4 grid by turning left or right (90 degrees) or walking forward.

WUMPUS WORLD

PIT

PIT

PIT

Breeze

BreezeBreeze

Breeze

BreezeBreeze

GOLD

WUMPUS

Stench

Stench

Stench Glitter

Glitter

Glitter

WUMPUS WORLD

OK

OK

WUMPUS WORLD

Breeze ?P

?POK

WUMPUS WORLD

Breeze P!

Stench

W!

OK

WUMPUS WORLD

Breeze P!

Stench

W! OK?G

OK?GGlitter

ASSIGNMENT – PART 1

To what extent does Bayes’ Law affect the probability of locating a pit? Consider the layout to

the right, what can you deduce about the probability of where the pit is?

Breeze

Breeze

ASSIGNMENT – PART II

Assuming obvious robotic and vision constraints have been solved; how could the approaches to game playing that we’ve discussed today be applied to robot pool. Could a robot win a game of pool?

SUMMARY

Games are an excellent test for A.I. The minimax algorithm assumes the opponent is infallible and chooses the best worst case

scenario for each turn. First developing a search tree and then performing depth first search on it.

The search tree is pruned by alpha beta pruning, but in many cases it is still too large a search tree - so decisions have to be made when to cut off the search.

269202 algorithms for isne

Documents