CS 561: Artificial Intelligence
Instructor: Sofus A. Macskassy ([email protected])
TAs: Nadeesha Ranashinghe ([email protected]), William Yeoh ([email protected]), Harris Chiu ([email protected])
Lectures: MW 5:00-6:20pm, OHE 122 / DEN
Office hours: By appointment
Class page: http://www-rcf.usc.edu/~macskass/CS561-Spring2010/
This class will use http://www.uscden.net/ and the class webpage:
- Up-to-date information
- Lecture notes
- Relevant dates, links, etc.
Course material: [AIMA] Artificial Intelligence: A Modern Approach, by Stuart Russell and Peter Norvig (2nd ed.)
Previously: Problem-Solving
Problem solving:
◦ Goal formulation
◦ Problem formulation (states, operators)
◦ Search for solution
Problem formulation:
◦ Initial state
◦ Operators
◦ Goal test
◦ Path cost
Problem types:
◦ single state: accessible and deterministic environment
◦ multiple state: inaccessible and deterministic environment
◦ contingency: inaccessible and nondeterministic environment
◦ exploration: unknown state-space
CS561 - Lecture 05-06 - Macskassy - Spring 2010
Previously: Finding a solution
Function General-Search(problem, strategy) returns a solution, or failure
initialize the search tree using the initial state of problem
loop do
if there are no candidates for expansion then return failure
choose a leaf node for expansion according to strategy
if the node contains a goal state then return the corresponding solution
else expand the node and add resulting nodes to the search tree
end
Solution: a sequence of operators that brings you from the current state to the goal state
Basic idea: offline, systematic exploration of simulated state-space by generating successors of explored states (expanding)
Strategy: The search strategy is determined by the order in which the nodes are expanded.
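As a concrete sketch, the General-Search loop above can be written in Python; the `is_goal`, `expand`, and `choose` callbacks are hypothetical stand-ins for the problem formulation and the strategy:

```python
def general_search(initial_state, is_goal, expand, choose):
    """Generic tree search: `choose` implements the strategy by
    selecting (and removing) one leaf node from the fringe."""
    fringe = [(initial_state, [])]  # (state, path of operators so far)
    while fringe:
        state, path = choose(fringe)       # pick a leaf per the strategy
        if is_goal(state):
            return path                    # solution: a sequence of operators
        for op, succ in expand(state):     # expand: yields (operator, successor)
            fringe.append((succ, path + [op]))
    return None                            # no candidates for expansion: failure

# Toy usage example: reach the number 5 by repeatedly adding 1 or 2.
result = general_search(
    0,
    is_goal=lambda s: s == 5,
    expand=lambda s: [("+1", s + 1), ("+2", s + 2)] if s < 5 else [],
    choose=lambda fringe: fringe.pop(0),   # FIFO pop = breadth-first order
)
```

With the FIFO `choose`, this behaves as breadth-first search and finds a shortest operator sequence.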
Previously: Evaluation of search strategies
A search strategy is defined by picking the order of node expansion.
Search algorithms are commonly evaluated according to the following four criteria:
◦ Completeness: does it always find a solution if one exists?
◦ Time complexity: how long does it take as function of num. of nodes?
◦ Space complexity: how much memory does it require?
◦ Optimality: does it guarantee the least-cost solution?
Time and space complexity are measured in terms of:
◦ b – max branching factor of the search tree
◦ d – depth of the least-cost solution
◦ m – max depth of the search tree (may be infinity)
Previously: Uninformed search strategies
Use only information available in the problem formulation
Breadth-first
Uniform-cost
Depth-first
Depth-limited
Iterative deepening
Now: heuristic search [AIMA Ch. 4]
Informed search:
Use heuristics to guide the search
◦ Best first
◦ A*
◦ Heuristics
◦ Hill-climbing
◦ Simulated annealing
Review: Tree Search
A strategy is defined by picking the order of node expansion
Best-first search
Idea: use an evaluation function for each node
◦ estimate of “desirability”
Expand most desirable unexpanded node.
Implementation:
fringe is a queue sorted in decreasing order of desirability
Special cases:
◦ greedy search
◦ A* search
Romania with step costs in km
(figure: map of Romania with step costs in km; not reproduced)
Greedy search
Evaluation function h(n) (heuristic)
= estimate of cost from n to closest goal
For example:
hSLD(n) = straight-line distance from n to Bucharest
Greedy search expands first the node that appears to be
closest to the goal, according to h(n).
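A minimal sketch of greedy best-first search, assuming a hypothetical toy graph and heuristic table (not the Romania data):

```python
import heapq

def greedy_search(start, goal, neighbors, h):
    """Greedy best-first: always expand the fringe node with lowest h.
    `neighbors` maps a state to its successor states (toy graph below)."""
    fringe = [(h[start], start, [start])]
    visited = set()                      # repeated-state check (for completeness)
    while fringe:
        _, state, path = heapq.heappop(fringe)
        if state == goal:
            return path
        if state in visited:
            continue
        visited.add(state)
        for succ in neighbors.get(state, []):
            heapq.heappush(fringe, (h[succ], succ, path + [succ]))
    return None

# Hypothetical toy graph with heuristic values:
graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["G"]}
h = {"A": 5, "B": 4, "C": 1, "D": 2, "G": 0}
path = greedy_search("A", "G", graph, h)   # expands low-h node C before B
```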
Greedy search example
(figure: successive expansion steps of greedy search on the Romania map; not reproduced)
Properties of Greedy Search
Complete?
No - can get stuck in loops, e.g., Iasi → Neamt → Iasi → Neamt → …
Complete in finite space with repeated-state checking.
Time?
O(b^m), but a good heuristic can give dramatic improvement
Space?
O(b^m) - keeps all nodes in memory
Optimal?
No.
A* search
Idea: avoid expanding paths that are already expensive
evaluation function: f(n) = g(n) + h(n)
where:
g(n) – cost so far to reach n
h(n) – estimated cost to goal from n
f(n) – estimated total cost of path through n to goal
A* search uses an admissible heuristic, that is,
h(n) ≤ h*(n), where h*(n) is the true cost from n.
For example: hSLD (n) never overestimates actual road
distance.
Theorem: A* search is optimal
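A sketch of A* under the same toy-graph assumptions (the graph, step costs, and admissible h below are hypothetical, not from the slides):

```python
import heapq

def a_star(start, goal, neighbors, h):
    """A*: expand the fringe node with lowest f = g + h. `neighbors` maps a
    state to (successor, step_cost) pairs; h must be admissible for optimality."""
    fringe = [(h[start], 0, start, [start])]   # (f, g, state, path)
    best_g = {}                                # cheapest g found per state
    while fringe:
        f, g, state, path = heapq.heappop(fringe)
        if state == goal:
            return path, g
        if state in best_g and best_g[state] <= g:
            continue                           # already reached more cheaply
        best_g[state] = g
        for succ, cost in neighbors.get(state, []):
            g2 = g + cost
            heapq.heappush(fringe, (g2 + h[succ], g2, succ, path + [succ]))
    return None, float("inf")

# Hypothetical toy graph (step costs), with an admissible h:
graph = {"A": [("B", 1), ("C", 4)], "B": [("C", 1), ("D", 5)],
         "C": [("D", 2)], "D": []}
h = {"A": 3, "B": 2, "C": 2, "D": 0}
path, cost = a_star("A", "D", graph, h)
```

Here A* returns the least-cost path A-B-C-D (cost 4), not the low-h shortcut a greedy search might prefer.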
A* Search Example
(figure: successive expansion steps of A* on the Romania map; not reproduced)
Optimality of A* (standard proof)
Suppose some suboptimal goal G2 has been generated and is in the queue.
Let n be an unexpanded node on a shortest path to an optimal goal G1.
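Completing the standard argument from this setup:

```latex
\begin{align*}
f(G_2) &= g(G_2) + h(G_2) = g(G_2) && \text{since } h(G_2) = 0 \text{ at a goal}\\
       &> g(G_1)                   && \text{since } G_2 \text{ is suboptimal}\\
f(n)   &= g(n) + h(n) \le g(G_1)   && \text{since } h \text{ is admissible}
\end{align*}
```

Hence f(n) ≤ g(G1) < f(G2), so A* expands n before G2; a suboptimal goal is never selected for expansion.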
Optimality of A* (more useful proof)
f-contours
What do the contours look like when h(n) = 0? (With h = 0, A* reduces to uniform-cost search, so the contours are "circles" of equal g around the start.)
Properties of A*
Complete?
Yes, unless there are infinitely many nodes with f ≤ f(G)
Time?
Exponential in [relative error in h × length of soln.]
Space?
Keeps all nodes in memory.
Optimal?
Yes - cannot expand the (i+1)-th f-contour until the i-th is finished.
A* expands all nodes with f(n) < C*,
some nodes with f(n) = C*,
and no nodes with f(n) > C*.
Proof of lemma: pathmax
Proof of lemma: consistency
Admissible heuristics
For the 8-puzzle (from AIMA): h1(n) = number of misplaced tiles; h2(n) = total Manhattan distance of the tiles from their goal positions.
Dominance
If h2(n) ≥ h1(n) for all n (both admissible), then h2 dominates h1: A* with h2 never expands more nodes than A* with h1.
Relaxed Problems
Admissible heuristics can be derived from the exact solution cost of a relaxed version of the problem.
If the rules of the 8-puzzle are relaxed so that a tile can
move anywhere, then h1(n) gives the shortest solution.
If the rules are relaxed so that a tile can move to any adjacent square, then h2(n) gives the shortest solution.
Key point: the optimal solution cost of a relaxed problem is no greater than the optimal solution cost of the real problem
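Under the standard AIMA reading that h1 counts misplaced tiles and h2 sums Manhattan distances, both relaxed-problem heuristics can be sketched as:

```python
def h1(state, goal):
    """Misplaced-tile count: exact solution cost of the relaxation where a
    tile may move anywhere in one step (the blank, 0, is not counted)."""
    return sum(1 for s, g in zip(state, goal) if s != 0 and s != g)

def h2(state, goal):
    """Sum of Manhattan distances: exact solution cost of the relaxation
    where a tile may move to any adjacent square, occupied or not."""
    total = 0
    for tile in range(1, 9):             # tiles 1..8 of the 8-puzzle
        i, j = state.index(tile), goal.index(tile)
        total += abs(i // 3 - j // 3) + abs(i % 3 - j % 3)
    return total

# 3x3 boards as flat tuples, 0 = blank:
goal  = (1, 2, 3, 4, 5, 6, 7, 8, 0)
state = (1, 2, 3, 4, 5, 6, 7, 0, 8)      # one move away from the goal
```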
Relaxed problems (cont’d)
So far…
Heuristic functions estimate costs of shortest paths
Good heuristics can dramatically reduce search cost
Greedy best-first search expands lowest h
◦ incomplete and not always optimal
A* search expands lowest g + h
◦ complete and optimal
◦ also optimally efficient (up to tie-breaks, for forward search)
Admissible heuristics can be derived from exact solution of relaxed problems
… let’s continue
Hill-climbing
Simulated annealing
Genetic algorithms (briefly)
Local search in continuous spaces (very briefly)
Iterative improvement
In many optimization problems, the path is irrelevant; the goal state itself is the solution.
Then, state space = space of "complete" configurations. Algorithm goal:
- find an optimal configuration (e.g., TSP), or
- find a configuration satisfying constraints (e.g., n-queens)
In such cases, we can use iterative improvement algorithms: keep a single "current" state, and try to improve it.
Iterative improvement example: Traveling salesperson problem
Start with any complete tour, perform pairwise exchanges
Variants of this approach get within 1% of optimal very quickly with thousands of cities
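The pairwise-exchange idea (often called 2-opt) can be sketched on a hypothetical 4-city instance; the function names and coordinates are illustrative:

```python
import math

def tour_length(tour, pts):
    """Total length of a closed tour over point indices."""
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def two_opt(tour, pts):
    """Pairwise-exchange improvement: repeatedly reverse a tour segment
    whenever doing so shortens the tour, until no exchange helps."""
    improved = True
    while improved:
        improved = False
        for i in range(1, len(tour) - 1):
            for j in range(i + 1, len(tour)):
                cand = tour[:i] + tour[i:j][::-1] + tour[j:]
                if tour_length(cand, pts) < tour_length(tour, pts) - 1e-12:
                    tour, improved = cand, True
    return tour

# Hypothetical instance: the 4 corners of a unit square; optimal length is 4.
pts = [(0, 0), (0, 1), (1, 1), (1, 0)]
best = two_opt([0, 2, 1, 3], pts)        # start from a self-crossing tour
```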
Iterative improvement example: n-queens
Goal: put n chess queens on an n x n board, with no two queens on the same row, column, or diagonal.
Here, goal state is initially unknown but is specified by constraints that it must satisfy.
Move a queen to reduce number of conflicts
Almost always solves n-queens problems almost instantaneously, even for very large n (e.g., n = 1 million)
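A min-conflicts sketch for n-queens (the function name and step budget are illustrative choices, not the course's code):

```python
import random

def min_conflicts_queens(n, max_steps=10000, seed=0):
    """Iterative improvement for n-queens: one queen per column; repeatedly
    move a random conflicted queen to a minimum-conflict row in its column."""
    rng = random.Random(seed)
    rows = [rng.randrange(n) for _ in range(n)]    # rows[c] = queen row in column c

    def conflicts(col, row):
        return sum(1 for c in range(n) if c != col and
                   (rows[c] == row or abs(rows[c] - row) == abs(c - col)))

    for _ in range(max_steps):
        conflicted = [c for c in range(n) if conflicts(c, rows[c]) > 0]
        if not conflicted:
            return rows                            # no conflicts: solved
        col = rng.choice(conflicted)
        scores = [conflicts(col, r) for r in range(n)]
        best = min(scores)
        rows[col] = rng.choice([r for r, s in enumerate(scores) if s == best])
    return None                                    # step budget exhausted

solution = min_conflicts_queens(8)
```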
Hill climbing (or gradient ascent/descent)
Iteratively maximize the "value" of the current state, by replacing it with the successor state that has the highest value, as long as possible.
Hill climbing
Note: minimizing a "value" function v(n) is equivalent to maximizing -v(n),
so both notions are used interchangeably.
Notion of "extremization": find extrema (minima or maxima) of a value function.
Hill climbing
Problem: depending on initial state, may get stuck in local extremum.
Minimizing energy
Let’s now change the formulation of the problem a bit, so that we can employ a new formalism:
- let’s compare our state space to that of a physical system that is subject to natural interactions,
- and let’s compare our value function to the overall potential energy E of the system.
On every update, we have ΔE ≤ 0.
(figure: energy landscape with states A-E and the basin of attraction for C; not reproduced)
Minimizing energy
Hence the dynamics of the system tend to move E toward a minimum.
We stress that there may be different such states — they are local minima. Global minimization is not guaranteed.
Local Minima Problem
Question: How do you avoid this local minimum?
(figure: 1-D energy curve showing descent from the starting point into a local minimum, with a barrier to local search separating it from the global minimum; not reproduced)
Consequences of the Occasional Ascents
Help escaping the local optima (the desired effect).
Might pass the global optimum after reaching it (an adverse effect, easy to avoid by keeping track of the best-ever state).
Boltzmann machines
(figure: the energy landscape with states A-E and a shake amplitude h; not reproduced)
The Boltzmann Machine of Hinton, Sejnowski, and Ackley (1984) uses simulated annealing to escape local minima.
To motivate their solution, consider how one might get a ball-bearing traveling along the curve to "probably end up" in the deepest minimum. The idea is to shake the box "about h hard" - then the ball is more likely to go from D to C than from C to D. So, on average, the ball should end up in C's valley.
Simulated annealing: basic idea
From current state, pick a random successor state;
If it has a better value than the current state, then "accept the transition," that is, use the successor state as the current state;
Otherwise, do not give up, but instead flip a coin and accept the transition with a given probability (lower the worse the successor is).
So we sometimes accept to "un-optimize" the value function a little, with a non-zero probability.
Boltzmann’s statistical theory of gases
In the statistical theory of gases, the gas is described not by a deterministic dynamics, but rather by the probability that it will be in different states.
The 19th-century physicist Ludwig Boltzmann developed a theory that included a probability distribution of temperature (i.e., every small region of the gas had the same kinetic energy).
Hinton, Sejnowski and Ackley’s idea was that this distribution might also be used to describe neural interactions, where low temperature T is replaced by a small noise term T (the neural analog of random thermal motion of molecules). While their results primarily concern optimization using neural networks, the idea is more general.
Boltzmann distribution
At thermal equilibrium at temperature T, the Boltzmann distribution gives the relative probability that the system will occupy state A vs. state B as:
where E(A) and E(B) are the energies associated with states A and B.
P(A) / P(B) = exp( -(E(A) - E(B)) / T ) = exp(-E(A)/T) / exp(-E(B)/T)
Simulated annealing
Kirkpatrick et al. 1983:
Simulated annealing is a general method for making likely the escape from local minima by allowing jumps to higher energy states.
The analogy here is with the process of annealing used by a craftsman in forging a sword from an alloy.
He heats the metal, then slowly cools it as he hammers the blade into shape.
◦ If he cools the blade too quickly, the metal will form patches of different composition;
◦ If the metal is cooled slowly while it is shaped, the constituent metals will form a uniform alloy.
Real annealing: Sword
(figure: annealing a sword blade; the previous slide's text repeated alongside, not reproduced)
Simulated annealing in practice
- set T
- optimize for given T
- lower T (see Geman & Geman, 1984)
- repeat
MDSA: Molecular Dynamics Simulated Annealing
Simulated annealing in practice (cont’d)
Geman & Geman (1984): if T is lowered sufficiently slowly (with respect to the number of iterations used to optimize at a given T), simulated annealing is guaranteed to find the global minimum.
Caveat: this algorithm has no end (Geman & Geman’s T-decrease schedule is proportional to 1/log of the number of iterations, so T never reaches zero), so it may take an infinite amount of time to find the global minimum.
Simulated annealing algorithm
Idea: escape local extrema by allowing "bad moves," but gradually decrease their size and frequency.
Note: the goal here is to maximize E.
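A sketch of the algorithm under these conventions (maximizing E, acceptance probability exp(dE/T)); the 1-D test function, cooling schedule, and step budget below are hypothetical examples:

```python
import math
import random

def simulated_annealing(state, value, neighbor, schedule, steps=10000, seed=0):
    """Always accept an uphill move; accept a downhill move (dE < 0, since we
    maximize) with probability exp(dE / T), with T set by the cooling schedule."""
    rng = random.Random(seed)
    best = state
    for t in range(1, steps + 1):
        T = schedule(t)
        nxt = neighbor(state, rng)
        dE = value(nxt) - value(state)
        if dE > 0 or rng.random() < math.exp(dE / T):
            state = nxt
        if value(state) > value(best):
            best = state      # keep the best-ever state (avoids the adverse effect)
    return best

# Hypothetical 1-D example: maximize a bumpy function over integers 0..100.
value = lambda x: -((x - 70) ** 2) + 30 * math.cos(x / 3.0)
neighbor = lambda x, rng: min(100, max(0, x + rng.choice([-1, 1])))
best = simulated_annealing(5, value, neighbor, schedule=lambda t: 100.0 / t)
```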
Note on simulated annealing: limit cases
Boltzmann distribution: accept a "bad move" (ΔE < 0; the goal is to maximize E) with probability P(ΔE) = exp(ΔE/T)
If T is large: for ΔE < 0,
ΔE/T is negative but close to 0,
so exp(ΔE/T) is close to 1 and we
accept bad moves with high probability
→ random walk
If T is near 0: for ΔE < 0,
ΔE/T is a large negative number,
so exp(ΔE/T) is close to 0 and we
accept bad moves with low probability
→ deterministic down-hill (pure greedy moves)
Local beam search
Idea: keep k states instead of 1; choose top k of all their successors
Not the same as k searches run in parallel!
Searches that find good states recruit other searches to join them
Problem: quite often, all k states end up on same local hill
Idea: choose k successors randomly, biased towards good ones
Observe the close analogy to natural selection!
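A sketch of local beam search on a hypothetical 1-D landscape (names and parameters are illustrative):

```python
def local_beam_search(starts, value, neighbors, k, steps=100):
    """Keep k states; at each step, pool all successors of all current states
    and keep the k best. Not k independent searches: successors of good
    states can crowd out those of bad ones."""
    beam = sorted(starts, key=value, reverse=True)[:k]
    for _ in range(steps):
        pool = {s for state in beam for s in neighbors(state)} | set(beam)
        new_beam = sorted(pool, key=value, reverse=True)[:k]
        if new_beam == beam:
            break                        # no successor improves the beam
        beam = new_beam
    return beam[0]

# Hypothetical landscape: maximize -(x - 42)^2 over the integers.
value = lambda x: -(x - 42) ** 2
neighbors = lambda x: [x - 1, x + 1]
best = local_beam_search([0, 10, 90], value, neighbors, k=3)
```

Note how the beam quickly concentrates around the start nearest the optimum; this is also how all k states can end up on the same hill.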
Genetic Algorithms
Genetic algorithms (cont’d)
GAs require states encoded as strings (GPs use programs)
Crossover helps iff substrings are meaningful components
GAs ≠ evolution: e.g., real genes encode replication machinery!
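A minimal GA sketch on the classic "one-max" toy problem (all parameter values, and the elitism choice, are illustrative):

```python
import random

def genetic_algorithm(fitness, length, pop_size=20, generations=200, seed=0):
    """Minimal GA: states encoded as bit strings, fitness-proportional parent
    selection, one-point crossover, point mutation, and elitism."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        weights = [fitness(ind) + 1e-9 for ind in pop]   # avoid all-zero weights
        new_pop = [max(pop, key=fitness)]                # elitism: keep the best
        while len(new_pop) < pop_size:
            p1, p2 = rng.choices(pop, weights=weights, k=2)
            cut = rng.randrange(1, length)               # one-point crossover
            child = p1[:cut] + p2[cut:]
            if rng.random() < 0.1:                       # point mutation
                i = rng.randrange(length)
                child[i] = 1 - child[i]
            new_pop.append(child)
        pop = new_pop
    return max(pop, key=fitness)

# "One-max": fitness = number of 1 bits; the optimum is the all-ones string.
best = genetic_algorithm(fitness=sum, length=12)
```

Here the substrings are trivially meaningful (each bit is independent), which is exactly the regime where crossover helps.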
Continuous state spaces
With continuous variables there are infinitely many successors; one standard option is gradient ascent, x ← x + α∇f(x).
Summary
Best-first search = general search, where the minimum-cost nodes (according to some measure) are expanded first.
Greedy search = best-first with the estimated cost to reach the goal as a heuristic measure.
- Generally faster than uninformed search
- not optimal
- not complete
A* search = best-first with measure = path cost so far + estimated path cost to goal.
- combines advantages of uniform-cost and greedy searches
- complete, optimal, and optimally efficient
- space complexity still exponential
Summary
The time complexity of heuristic algorithms depends on the quality of the heuristic function. Good heuristics can sometimes be constructed by examining the problem definition or by generalizing from experience with the problem class.
Iterative improvement algorithms keep only a single state in memory.
They can get stuck in local extrema; simulated annealing provides a way to escape, and is complete and optimal given a sufficiently slow cooling schedule.