15-780: Grad AI, Lecture 7: Optimization, Games. Geoff Gordon (this lecture), Ziv Bar-Joseph. TAs: Geoff Hollinger, Henry Lin.


TRANSCRIPT

Page 1

15-780: Grad AI
Lecture 7: Optimization, Games

Geoff Gordon (this lecture), Ziv Bar-Joseph
TAs: Geoff Hollinger, Henry Lin

Page 2

Admin

Questions on HW1?

Page 3

Review

Page 4

Linear and PO planners

Linear planners: forward and backward chaining

Partial-order planning: action orderings, open preconditions, guard intervals, plan refinement

Monkey & bananas example

Page 5

Plan graphs

Plan graphs for propositional planning
How to build them: mutex conditions for literals, actions
How to use them: direct search, conversion to SAT

Page 6

Optimization & Search

Classes of optimization problem: LP, ILP, MILP
linear constraints, objective, integrality

Using search for optimization
pruning w/ lower bounds on objective
stopping early w/ upper bounds

Page 7

Relaxation

Relaxation = increase the feasible region
Good way to get upper bounds on a max
Particularly, the LP relaxation of an ILP
And its dual

Page 8

Duality

How to find the dual of an LP or ILP
Interpretations of the dual:
linearly combine constraints to get a new constraint orthogonal to the objective
find the best prices for scarce resources

Page 9

Duality w/ equality

Page 10

Recall duality w/ inequality

Take a linear combination of constraints to bound the objective:
(a + 2b) w + (a + 5b) d ≤ 4a + 12b
profit = 1w + 2d
So, if 1 ≤ (a + 2b) and 2 ≤ (a + 5b), we know that profit ≤ 4a + 12b

Page 11

Equality example

minimize y subject to
x + y = 1
2y − z = 1
x, y, z ≥ 0

Page 12

Equality example

Want to prove a bound y ≥ …
Look at the 2nd constraint:
2y − z = 1  ⇒  y − z/2 = 1/2
Since z ≥ 0, dropping −z/2 can only increase the LHS  ⇒  y ≥ 1/2

Page 13

Duality w/ equalities

In general, could start from any linear combination of the equality constraints
no need to restrict to a +ve combination
a (x + y − 1) + b (2y − z − 1) = 0
a x + (a + 2b) y − b z = a + b

Page 14

Duality w/ equalities

a x + (a + 2b) y − b z = a + b
As long as the coefficients on the LHS are ≤ (0, 1, 0),
objective = 0x + 1y + 0z ≥ a + b
So, maximize a + b subject to
a ≤ 0
a + 2b ≤ 1
−b ≤ 0
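This dual is small enough to sanity-check numerically. A sketch in Python (the grid search and variable names are mine, not the slides'):

```python
from fractions import Fraction as F

# Primal: minimize y  s.t.  x + y = 1,  2y - z = 1,  x, y, z >= 0.
# Feasible points satisfy x = 1 - y and z = 2y - 1, so we only need
# to scan y; feasibility forces 1/2 <= y <= 1.
primal_min = min(
    y for y in (F(k, 100) for k in range(0, 101))
    if 1 - y >= 0 and 2 * y - 1 >= 0
)

# Dual: maximize a + b  s.t.  a <= 0,  a + 2b <= 1,  -b <= 0.
dual_best = max(
    a + b
    for a in (F(k, 10) for k in range(-20, 1))
    for b in (F(k, 10) for k in range(0, 21))
    if a <= 0 and a + 2 * b <= 1 and -b >= -b * 1 and b >= 0
)

print(primal_min, dual_best)  # both 1/2: no duality gap for this LP
```

The dual optimum sits at a = 0, b = 1/2, matching the y ≥ 1/2 bound derived two slides earlier.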

Page 15

Duality example

Page 16

Path planning LP

Find the min-cost path: variables

Page 17

Path planning LP

Page 18

Optimal solution

p_sy = p_yg = 1, p_sx = p_xg = 0, cost 3

Page 19

Matrix form

(figure: constraint matrix, with one dual variable per node: λs, λx, λy, λg)

Page 20

Dual

Page 21

Optimal dual solution

(figure: node prices 0, 1, 3, 3)

Any solution which adds a constant to all λs also works; λx = 2 also works

Page 22

Interpretation

Dual variables are prices on nodes: how much does it cost to start there?
Dual constraints are local price constraints: edge xg (cost 3) means that node x can't cost more than 3 + the price of node g

Page 23

Search in ILPs

Page 24

Simple search algorithm

Run DFS
node = partial assignment
neighbor = set one variable

Prune if a constraint is unsatisfiable
E.g., in a 0/1 problem, setting y = 0 in x + 3y ≥ 4

If we reach a feasible full assignment, calculate its value, keep the best
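The simple search above fits in a few lines. This is my own illustrative sketch, not course code; for concreteness it uses the 0/1 program that appears a few slides later (maximize x subject to three ≤-constraints):

```python
def dfs_ilp(constraints, objective, n_vars):
    """DFS over partial 0/1 assignments; prune a node as soon as some
    constraint sum(c_i * x_i) <= rhs has no satisfying completion."""
    best = [None]

    def unsatisfiable(assign):
        for coeffs, rhs in constraints:
            # Most optimistic completion: each unset variable takes
            # whichever of {0, 1} makes the left-hand side smallest.
            lo = sum(c * assign[i] if i < len(assign) else min(c, 0)
                     for i, c in enumerate(coeffs))
            if lo > rhs:
                return True
        return False

    def rec(assign):
        if unsatisfiable(assign):
            return                      # prune this branch
        if len(assign) == n_vars:
            val = objective(assign)     # feasible full assignment
            if best[0] is None or val > best[0]:
                best[0] = val
            return
        for v in (1, 0):                # neighbor = set one more variable
            rec(assign + [v])

    rec([])
    return best[0]

# 0/1 variables x, y, z; maximize x subject to the deck's three constraints.
cons = [([2, 2, -1], 2), ([2, -1, 1], 2), ([-1, 2, -1], 0)]
print(dfs_ilp(cons, lambda a: a[0], 3))  # -> 1 (x=1, y=0, z=0 is feasible)
```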

Page 25

More pruning

Constraint from the best solution so far: objective ≥ M (for a maximization problem)
Constraint from the optimal dual solution: objective ≤ M′
Can we find more pruning to do?

Page 26

First idea

Analog of constraint propagation or unit resolution
When we set x, check constraints w/ x in them to see if they restrict the domain of another variable y
E.g., setting x to 1 in the implication constraint (1−x) + y ≥ 1 forces y = 1

Page 27

Example

0/1 variables x, y, z
maximize x subject to
2x + 2y − z ≤ 2
2x − y + z ≤ 2
−x + 2y − z ≤ 0

Page 28

Problem w/ constraint propagation

Constraint propagation doesn't prune as early as it could:
2x + 2y − z ≤ 2
2x − y + z ≤ 2
−x + 2y − z ≤ 0

Consider z = 1:
2x + 2y ≤ 3
2x − y ≤ 1
x − 2y ≥ −1

Page 29

Generalizing constraint propagation

Try adding two constraints, then propagating:
2x + 2y ≤ 3
(2x − y ≤ 1) × 2:  4x − 2y ≤ 2
Sum: 6x ≤ 5
⇒ objective = x = 0
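A brute-force check of this combined-cut reasoning (illustrative code, not from the slides): with z = 1 fixed, no feasible 0/1 point has x = 1, exactly as 6x ≤ 5 predicts, even though each substituted constraint taken alone still allows x = 1.

```python
from itertools import product

# The example's constraints, as (coeffs, rhs) for c . x <= rhs.
cons = [([2, 2, -1], 2), ([2, -1, 1], 2), ([-1, 2, -1], 0)]

def feasible(p):
    return all(sum(c * v for c, v in zip(cs, p)) <= r for cs, r in cons)

# Branch z = 1: enumerate all 0/1 points in the branch.
branch = [p for p in product((0, 1), repeat=3) if p[2] == 1 and feasible(p)]

# The cut 6x <= 5 says x = 0 everywhere in this branch; brute force agrees.
assert all(p[0] == 0 for p in branch)
print(branch)  # -> [(0, 0, 1)]
```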

Page 30

Using the dual

We just applied the duality trick to the LP after fixing z = 1
Used a linear combination of two constraints to get a bound on the objective
Leads to an algorithm called branch and bound

Page 31

Branch and bound

Each time we fix a variable, solve the resulting LP
Gives a tighter upper bound on the value of the objective in this branch
If this upper bound < the value of a previous solution, we can prune
Called fathoming the branch

Page 32

Can we do more?

Yes: we can make bounds tighter by looking at the…

Page 33

Duality gap

Page 34

Factory LP

(plot: doodads vs. widgets)
Ca: w + d ≤ 4
Cb: 2w + 5d ≤ 12
(1/3) Ca + (1/3) Cb:  w + 2d ≤ 16/3
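The numbers on this slide can be reproduced directly (a sketch; the brute-force ranges are my own):

```python
from fractions import Fraction as F

# Ca: w + d <= 4,  Cb: 2w + 5d <= 12,  w, d >= 0,  profit = w + 2d.
# (1/3)Ca + (1/3)Cb reproduces the profit coefficients exactly:
assert F(1, 3) * 1 + F(1, 3) * 2 == 1   # coefficient on w
assert F(1, 3) * 1 + F(1, 3) * 5 == 2   # coefficient on d
lp_bound = F(1, 3) * 4 + F(1, 3) * 12   # 16/3, i.e. 5 1/3

# Best integer point, by brute force over the (small) feasible box:
best = max(w + 2 * d
           for w in range(5) for d in range(3)
           if w + d <= 4 and 2 * w + 5 * d <= 12)

print(lp_bound, best, lp_bound - best)  # 16/3  5  1/3 (the duality gap)
```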

Page 35

Duality gap

We got a bound of 5 1/3 either from the primal LP relaxation or from the dual LP
Compare to the actual best profit of 5 (respecting the integrality constraints)
The difference of 1/3 is the duality gap

The term is also used for the ratio 5 / (5 1/3)
Pretty close to optimal, right?

Page 36

Unfortunately…

(plot: doodads vs. widgets; profit = w + 2d)

Page 37

Bad gap

In this example, the duality gap is 3 vs. 8.5, a ratio of about 0.35
The ratio can be arbitrarily bad

Page 38

Aside: bounding the gap

Can often bound the gap for classes of ILPs
E.g., the straightforward ILP from MAX SAT
MAX SAT: satisfy as many clauses as possible in a CNF formula
Gap no worse than 1 − 1/e = 0.632…

Page 39

Early stopping

A duality gap this large won't let us prune or stop our search early
To fix this problem: cutting planes

Page 40

Cutting plane

A cutting plane is a new linear constraint that cuts off some of the non-integral points in the LP relaxation while leaving all integral points feasible

Page 41

Cutting plane

(plot: doodads vs. widgets, showing the constraint from the dual optimum and the cutting plane)

Page 42

Cutting plane method

Solve the LP relaxation
Use the solution to find a cutting plane
Add the cutting plane to the LP
the LP is now a stronger relaxation
Repeat until the solution to the LP is integral

Page 43

How can we find a cutting plane?

One suggestion: Gomory cuts (R. E. Gomory, 1963)
First to guarantee finite termination of the cutting plane method
Example above was a Gomory cut

Page 44

Gomory cut example

A linear combination of constraints: w + 2d ≤ 5 1/3
Since w, d are integers, so is w + 2d
So we also have w + 2d ≤ 5
Can (but won't) generalize the recipe
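The rounding step is easy to verify mechanically (an illustrative sketch; the loop bounds are mine):

```python
from fractions import Fraction as F
from math import floor

# Valid for the LP relaxation: w + 2d <= 16/3. Since w and d are
# integers, w + 2d is an integer, so the right-hand side rounds down:
cut_rhs = floor(F(16, 3))                    # -> 5, the Gomory-style cut
assert cut_rhs == 5

# The cut keeps every integer-feasible point...
for w in range(5):
    for d in range(3):
        if w + d <= 4 and 2 * w + 5 * d <= 12:
            assert w + 2 * d <= cut_rhs
# ...but cuts off the fractional LP optimum (w, d) = (8/3, 4/3):
assert F(8, 3) + 2 * F(4, 3) > cut_rhs
print("cut:", "w + 2d <=", cut_rhs)
```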

Page 45

Cutting planes

How good is the Gomory cut in general?
Sadly, not so great.
Other general cuts have been proposed, but the best cuts are often problem-specific

Page 46

Branch and Cut

Page 47

Branch and cut

The cutting planes recipe doesn't use branching
What if we try to interleave search with cut generation?
The resulting branch and cut methods are some of the most popular algorithms for solving ILPs and MILPs

Page 48

Recipe

DFS as for branch and bound
Every so often, solve the LP relaxation
prune if the bound shows the branch is useless
while not bored, use the solution to generate a cut, re-solve
Branch on the next variable, repeat

Page 49

Tension of cutting v. branching

After a branch it may become easier to generate more cuts
so cutting gets easier as we go down the tree
Cuts at a node N are valid at N's children
so it's worth spending more effort higher in the search tree

Page 50

Example: robot task assignment

Team of robots must explore an unknown area

Page 51

(map: points of interest and the base)

Page 52

Exploration plan

Page 53

ILP

Variables (all 0/1):
z_ij = robot i does task j
x_ijkt = robot i uses edge jk at step t

Cost = path cost − task bonus:
∑ x_ijkt c_ijkt − ∑ z_ij t_ij

Page 54

Constraints

Assigned tasks: ∀ i, j:  ∑_{k,t} x_ikjt ≥ z_ij

One edge per step: ∀ i, t:  ∑_{j,k} x_ijkt = 1
self-loops @ base to allow idling

For each i, x_ijkt forms a tour from the base: ∀ i, j, t:  ∑_k x_ikjt = ∑_k x_ijk(t+1)
edges used into a node = edges used out

Page 55

More on duality, search, optimization

Page 56

General optimization

minimize f(x) over a region defined by pieces
g_i(x) = 0 or g_i(x) ≤ 0
assume f(x) convex, so the difficulty is g

Page 57

Minimization

Unconstrained: set ∇f(x) = 0
E.g., minimize f(x, y) = x² + y² + 6x − 4y + 5
∇f(x, y) = (2x + 6, 2y − 4)
(x, y) = (−3, 2)
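A quick finite-difference check of this stationary point (a sketch; the step size h is an arbitrary choice of mine):

```python
# f(x, y) = x^2 + y^2 + 6x - 4y + 5; the slide claims grad f = 0 at (-3, 2).
f = lambda x, y: x * x + y * y + 6 * x - 4 * y + 5

h = 1e-6
gx = (f(-3 + h, 2) - f(-3 - h, 2)) / (2 * h)   # central difference in x
gy = (f(-3, 2 + h) - f(-3, 2 - h)) / (2 * h)   # central difference in y

assert abs(gx) < 1e-6 and abs(gy) < 1e-6       # gradient vanishes
assert f(-3, 2) == -8                          # 9 + 4 - 18 - 8 + 5 = -8
print(f(-3, 2))
```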

Page 58

Equality constraints

Equality constraint: minimize f(x) s.t. g(x) = 0
can't just set ∇f = 0 (might violate g(x) = 0)

Instead, the objective gradient should be along the constraint normal
any motion that decreases the objective will violate the constraint

Page 59

Example

Minimize x² + y² subject to x + y = 2

(figure from lp.nb: the constraint surface 0 = g(x, y) = x + y − 2, the contours of f(x, y) = x² + y², and the solution)

Page 60

Lagrange multipliers

Minimize f(x) s.t. g(x) = 0
Constraint normal is ∇g
(1, 1) in our example
Want ∇f parallel to ∇g
Equivalently, want ∇f = λ∇g
λ is a Lagrange multiplier

(figure from lp.nb, as on the previous slide)

Page 61

Lagrange multipliers

Original constraint: x + y = 2
∇f = λ∇g: (2x, 2y) = λ(1, 1)

x + y = 2
2x = λ
2y = λ
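Solving this system gives x = y = 1 and λ = 2. A small check (a sketch; the parameterization along the constraint is mine):

```python
# System: x + y = 2, 2x = lam, 2y = lam  =>  x = y = 1, lam = 2.
x = y = 1.0
lam = 2.0

# Stationarity: grad f = (2x, 2y) equals lam * grad g = lam * (1, 1).
assert (2 * x, 2 * y) == (lam * 1, lam * 1)

# Minimality along the constraint: parameterize x = 1+t, y = 1-t,
# so f = (1+t)^2 + (1-t)^2 = 2 + 2t^2 >= 2 = f(1, 1).
f = lambda x, y: x * x + y * y
assert all(f(1 + t, 1 - t) >= f(1, 1)
           for t in [k / 10 - 1 for k in range(21)])
print(f(x, y))  # -> 2.0
```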

Page 62

More than one constraint

With multiple constraints, use multiple multipliers:
min x² + y² + z² s.t.
x + y = 2
x + z = 2

(2x, 2y, 2z) = λ(1, 1, 0) + µ(1, 0, 1)

Page 63

5 equations, 5 unknowns

x + y = 2
x + z = 2
2x = λ + µ
2y = λ
2z = µ
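The 5×5 system can be solved exactly with rational arithmetic. A sketch using hand-rolled Gauss-Jordan elimination (the `solve` helper is mine, not from the course):

```python
from fractions import Fraction as F

def solve(A, b):
    """Gauss-Jordan elimination with exact rationals (assumes A invertible)."""
    n = len(A)
    M = [[F(v) for v in row] + [F(c)] for row, c in zip(A, b)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        M[col] = [v / M[col][col] for v in M[col]]      # normalize pivot row
        for r in range(n):
            if r != col and M[r][col] != 0:             # eliminate elsewhere
                M[r] = [a - M[r][col] * p for a, p in zip(M[r], M[col])]
    return [row[-1] for row in M]

# Unknowns ordered (x, y, z, lam, mu); the five equations from the slide:
A = [[1, 1, 0,  0,  0],   # x + y = 2
     [1, 0, 1,  0,  0],   # x + z = 2
     [2, 0, 0, -1, -1],   # 2x = lam + mu
     [0, 2, 0, -1,  0],   # 2y = lam
     [0, 0, 2,  0, -1]]   # 2z = mu
x, y, z, lam, mu = solve(A, [2, 2, 0, 0, 0])
print(x, y, z, lam, mu)   # -> 4/3 2/3 2/3 4/3 4/3
```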

Page 64

Two constraints: the picture

(figure from lp.nb: the two constraint surfaces together with a level surface of the objective. Neither constraint is tangent to the level surface; instead, the normal to the level surface is a linear combination of the normals to the two constraint surfaces, with coefficients p and q. The solution to the above equations is x = 4/3, y = 2/3, z = 2/3.)

Page 65

What about inequalities?

If the minimum is in the interior, can get it by setting ∇f = 0

Page 66

What about inequalities?

If the minimum is on the boundary, treat as if the boundary were an equality constraint (use a Lagrange multiplier)

Page 67

What about inequalities?

The minimum could be at a corner: two boundary constraints active
In n dims, up to n linear inequalities may be active (more if degenerate)

Page 68

Search

So, a strategy for solving problems with inequality constraints: search through the sets of constraints that might be active
For each active set, solve a linear system of equations, get a possible solution
Test whether the solution is feasible
If so, record its objective value

Page 69

Search

Search space:
node = active set of constraints
corresponds to a setting of the variables (solve a linear system)
objective = as given, plus a penalty for constraint violations
neighbor = add, delete, or swap constraints

Page 70

Connection to duality

Linear combination of constraint normals = gradient of objective

Page 71

Connection to duality

Each active set defines Lagrange multipliers λ
active set: G(x) = 0
∇f = ∇G λ

Multipliers at the optimal solution are the optimal dual solution

Page 72

LPs and Simplex

Page 73

Back to LP

(plot: doodads vs. widgets, feasible region shaded)
w + d ≤ 4
2w + 5d ≤ 12
profit = w + 2d

Page 74

Back to LP

(figure only: doodads vs. widgets plot)

Page 75

Back to LP

(figure only: doodads vs. widgets plot)

Page 76

Back to LP

(figure only: doodads vs. widgets plot)

Page 77

Simplex

The objective increased monotonically throughout the search
Turns out, this is always possible, which leads to a lot of pruning!
We have just defined the simplex algorithm

Page 78

Connection to duality

Each active set defines Lagrange multipliers
min c′x s.t. Ax = b (A, b = active set)
∇(c′x) = c
∇(Ax − b) = A′
So, A′λ = c

Multipliers at the optimal solution are the optimal dual solution

Page 79

Game search

Page 80

Games

We will consider games like checkers and chess:
sequential
zero-sum
deterministic, alternating moves
complete information

Generalizations later

Page 81

Chess

Classic AI challenge problem
In the late '90s, Deep Blue became the first computer to beat the reigning human champion

Page 82

History:

Minimax with heuristic: 1950
Learning the heuristic: 1950s (Samuel's checkers)
Alpha-beta pruning: 1966
Transposition tables: 1967 (hash table to find duplicate positions)
Quiescence: 1960s
DFID: 1975
End-game databases: 1977 (all 5-piece and some 6-piece)
Opening books: ?

Page 83

Game tree

Page 84

Page 85

Game tree for chess

Page 86

Minimax search

For small games, we can determine the value of each node in the game tree by working backwards from the leaves
My move: a node's value is the maximum over its children
Opponent's move: the value is the minimum over its children

Page 87

Minimax example: 2x2 tic-tac-toe

Page 88

Synthetic example

Page 89

Principal variation

Page 90

Making it work

Minimax is all well and good for small games
But what about bigger ones? 2 answers:
cutting off search early (big win)
pruning (smaller win but still useful)

Page 91

Heuristics

Quickly and approximately evaluate a position without search
E.g., Q = 9, R = 5, B = N = 3, P = 1
Build out the game tree as far as we can, and use the heuristic at the leaves in lieu of real values
might want to build it out unevenly (more below)

Page 92

Heuristics

Deep Blue used: material, mobility, king position, center control, open file for a rook, paired bishops/rooks, … (> 6000 total features!)
Weights are context-dependent, learned from a DB of grandmaster games, then hand-tweaked

Page 93

Pruning

Idea: don't bother looking at parts of the tree we can prove are irrelevant

Page 94

Pruning example

Page 95

Pruning example

Page 96

Alpha-beta pruning

Do a DFS through the game tree
At each node n on the stack, keep bounds:
α(n): value of the best deviation so far for MAX along the path to n
β(n): value of the best deviation so far for MIN along the path to n

Page 97

Alpha-beta pruning

Deviation = a way of leaving the path to n
So, to get α:
take all MAX nodes on the path to n
look at all their children that we've finished evaluating
the best (highest) of these children is α
The lowest such child of the MIN nodes is β

Page 98

Example of alpha and beta

Page 99

Alpha-beta pruning

At a max node:
receive α and β values from the parent
expand the children one by one
update α as we go
if α ever gets higher than β, stop
play won't ever reach this node (return α)

Page 100

Alpha-beta pruning

At a min node:
receive α and β values from the parent
expand the children one by one
update β as we go
if β ever gets lower than α, stop
play won't ever reach this node (return β)
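The max-node and min-node rules of the last two slides combine into the usual fail-hard alpha-beta routine. A sketch (the tree encoding and leaf counter are mine, not from the course):

```python
import math

def alphabeta(node, alpha=-math.inf, beta=math.inf, maximizing=True, stats=None):
    """Fail-hard alpha-beta on a nested-list tree (numbers = leaf values)."""
    if isinstance(node, (int, float)):
        if stats is not None:
            stats[0] += 1                  # count leaves evaluated
        return node
    if maximizing:
        v = -math.inf
        for child in node:
            v = max(v, alphabeta(child, alpha, beta, False, stats))
            alpha = max(alpha, v)
            if alpha >= beta:              # MIN above will avoid this node
                break
        return v
    else:
        v = math.inf
        for child in node:
            v = min(v, alphabeta(child, alpha, beta, True, stats))
            beta = min(beta, v)
            if beta <= alpha:              # MAX above will avoid this node
                break
        return v

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
stats = [0]
val = alphabeta(tree, stats=stats)
print(val, stats[0])  # value 3; only 7 of the 9 leaves are evaluated
```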

Page 101

Example

Page 102

How much do we save?

Original tree: b^d nodes
b = branching factor, d = depth

If we expand children in random order, pruning will touch b^(3d/4) nodes
Lower bound (best node first): b^(d/2)

Can often get close to the lower bound w/ move-ordering heuristics
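The savings are easy to observe empirically. The sketch below counts leaves touched on a uniform tree whose equal leaf values make every first child "best" (the construction and helper name are mine):

```python
import math

def leaves_visited(b, d):
    """Leaves touched by alpha-beta on a uniform b-ary, depth-d tree.
    All leaf values are equal, so child ordering is trivially best-first
    and cutoffs fire as early as possible."""
    count = [0]

    def ab(depth, alpha, beta, maximizing):
        if depth == 0:
            count[0] += 1
            return 0                       # every leaf has value 0
        for _ in range(b):
            v = ab(depth - 1, alpha, beta, not maximizing)
            if maximizing:
                alpha = max(alpha, v)
            else:
                beta = min(beta, v)
            if alpha >= beta:
                break
        return alpha if maximizing else beta

    ab(d, -math.inf, math.inf, True)
    return count[0]

b, d = 3, 6
print(b ** d, leaves_visited(b, d))  # 729 leaves total; far fewer visited,
                                     # on the order of b^(d/2) = 27
```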