Agents and Environments
![Page 1: Agents and environments](https://reader031.vdocuments.site/reader031/viewer/2022020921/56815d64550346895dcb6c6d/html5/thumbnails/1.jpg)
AGENTS AND ENVIRONMENTS
![Page 2: Agents and environments](https://reader031.vdocuments.site/reader031/viewer/2022020921/56815d64550346895dcb6c6d/html5/thumbnails/2.jpg)
Environment types

Fully observable (vs. partially observable): An agent's sensors give it access to the complete state of the environment at each point in time.

Deterministic (vs. stochastic): The next state of the environment is completely determined by the current state and the action executed by the agent.

Episodic (vs. sequential): The agent's experience is divided into atomic "episodes" (each episode consists of the agent perceiving and then performing a single action), and the choice of action in each episode depends only on the episode itself.
![Page 3: Agents and environments](https://reader031.vdocuments.site/reader031/viewer/2022020921/56815d64550346895dcb6c6d/html5/thumbnails/3.jpg)
Environment types

Static (vs. dynamic): The environment is unchanged while an agent is deliberating.

Discrete (vs. continuous): A limited number of distinct, clearly defined percepts and actions.

Single agent (vs. multiagent): An agent operating by itself in an environment.

Adversarial (vs. benign): There is an opponent in the environment who is actively trying to thwart you.
![Page 4: Agents and environments](https://reader031.vdocuments.site/reader031/viewer/2022020921/56815d64550346895dcb6c6d/html5/thumbnails/4.jpg)
Example

Some of these descriptions can be ambiguous, depending on your assumptions and interpretation of the domain.

Properties: Continuous, Stochastic, Partially Observable, Adversarial
Games: Chess, Checkers, Robot Soccer, Poker, Hide and Seek, Card Solitaire, Minesweeper
![Page 5: Agents and environments](https://reader031.vdocuments.site/reader031/viewer/2022020921/56815d64550346895dcb6c6d/html5/thumbnails/5.jpg)
Environment types

|                  | Chess with a clock | Chess without a clock | Taxi driving |
|------------------|--------------------|-----------------------|--------------|
| Fully observable | Yes                | Yes                   | No           |
| Deterministic    | Yes                | Yes                   | No           |
| Episodic         | No                 | No                    | No           |
| Static           | Semi               | Yes                   | No           |
| Discrete         | Yes                | Yes                   | No           |
| Single agent     | No                 | No                    | No           |

The real world is partially observable, stochastic, sequential, dynamic, continuous, and multi-agent.
![Page 6: Agents and environments](https://reader031.vdocuments.site/reader031/viewer/2022020921/56815d64550346895dcb6c6d/html5/thumbnails/6.jpg)
GAMES (I.E., ADVERSARIAL SEARCH)
![Page 7: Agents and environments](https://reader031.vdocuments.site/reader031/viewer/2022020921/56815d64550346895dcb6c6d/html5/thumbnails/7.jpg)
Games vs. search problems

Search: you only had to worry about your own actions.

Games: the opponent's moves are interspersed with yours, so you need to consider the opponent's actions as well.

Games typically have time limits. Often, an OK decision now is better than a perfect decision later.
![Page 8: Agents and environments](https://reader031.vdocuments.site/reader031/viewer/2022020921/56815d64550346895dcb6c6d/html5/thumbnails/8.jpg)
Games

Card games, strategy games, FPS games, training games, …
![Page 9: Agents and environments](https://reader031.vdocuments.site/reader031/viewer/2022020921/56815d64550346895dcb6c6d/html5/thumbnails/9.jpg)
Single Player, Deterministic Games
![Page 10: Agents and environments](https://reader031.vdocuments.site/reader031/viewer/2022020921/56815d64550346895dcb6c6d/html5/thumbnails/10.jpg)
Two-Player, Deterministic, Zero-Sum Games

Zero-sum: one player's gain (or loss) of utility is exactly balanced by the losses (or gains) of utility of the other player(s). E.g., chess, checkers, rock-paper-scissors, …
![Page 11: Agents and environments](https://reader031.vdocuments.site/reader031/viewer/2022020921/56815d64550346895dcb6c6d/html5/thumbnails/11.jpg)
Two-Player, Deterministic, Zero-Sum Games

Formally (in the standard textbook notation):

- S₀: the initial state
- PLAYER(s): defines which player has the move in state s
- ACTIONS(s): defines the set of legal moves in state s
- RESULT(s, a): the transition model that defines the result of move a in state s
- TERMINAL-TEST(s): returns true if the game is over; in that case s is called a terminal state
- UTILITY(s, p): a utility function (objective function) that defines the numeric value of terminal state s for player p
![Page 12: Agents and environments](https://reader031.vdocuments.site/reader031/viewer/2022020921/56815d64550346895dcb6c6d/html5/thumbnails/12.jpg)
Minimax
![Page 13: Agents and environments](https://reader031.vdocuments.site/reader031/viewer/2022020921/56815d64550346895dcb6c6d/html5/thumbnails/13.jpg)
Game tree (2-player, deterministic, turns)
![Page 14: Agents and environments](https://reader031.vdocuments.site/reader031/viewer/2022020921/56815d64550346895dcb6c6d/html5/thumbnails/14.jpg)
Minimax
![Page 15: Agents and environments](https://reader031.vdocuments.site/reader031/viewer/2022020921/56815d64550346895dcb6c6d/html5/thumbnails/15.jpg)
Minimax

"Perfect play" for deterministic games. Idea: choose the move to the position with the highest minimax value, i.e., the best achievable payoff against best play.
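A minimal Python sketch of this idea, assuming the game tree is given explicitly as nested lists whose leaves are terminal utilities (a representation chosen for illustration, not from the slides):

```python
def minimax(node, maximizing=True):
    """Return the minimax value of `node`: the best achievable payoff
    against best play. MAX and MIN levels alternate down the tree."""
    if isinstance(node, (int, float)):  # terminal state: return its utility
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

# Two-ply example: root is MAX, each sublist is a MIN node over three leaves.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(minimax(tree))  # prints 3: MIN holds the branches to 3, 2, 2; MAX picks 3
```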
![Page 16: Agents and environments](https://reader031.vdocuments.site/reader031/viewer/2022020921/56815d64550346895dcb6c6d/html5/thumbnails/16.jpg)
Is minimax optimal? It depends:

- If the opponent is not rational, there could be a better play.
- Yes, under the assumption that both players always make the best move.
![Page 17: Agents and environments](https://reader031.vdocuments.site/reader031/viewer/2022020921/56815d64550346895dcb6c6d/html5/thumbnails/17.jpg)
Properties of minimax

Complete? Yes (if the tree is finite)

Space complexity? O(bd) (depth-first exploration)

Optimal? Yes (against an optimal opponent)

Time complexity? O(b^d)

For chess, b ≈ 35 and d ≈ 100 for "reasonable" games, so b^d ≈ 10^154: an exact solution is completely infeasible.
![Page 18: Agents and environments](https://reader031.vdocuments.site/reader031/viewer/2022020921/56815d64550346895dcb6c6d/html5/thumbnails/18.jpg)
How to handle suboptimal opponents?

We can build a model of the opponent's behavior and use it to guide the search, rather than assuming an optimal MIN player. Reinforcement learning (later in the semester) provides another approach.
![Page 19: Agents and environments](https://reader031.vdocuments.site/reader031/viewer/2022020921/56815d64550346895dcb6c6d/html5/thumbnails/19.jpg)
α-β pruning

Do we need to explore every node in the search tree? Insight: some moves are clearly bad choices.
![Page 20: Agents and environments](https://reader031.vdocuments.site/reader031/viewer/2022020921/56815d64550346895dcb6c6d/html5/thumbnails/20.jpg)
α-β pruning example
![Page 21: Agents and environments](https://reader031.vdocuments.site/reader031/viewer/2022020921/56815d64550346895dcb6c6d/html5/thumbnails/21.jpg)
α-β pruning example
![Page 22: Agents and environments](https://reader031.vdocuments.site/reader031/viewer/2022020921/56815d64550346895dcb6c6d/html5/thumbnails/22.jpg)
What is the value of this node?
![Page 23: Agents and environments](https://reader031.vdocuments.site/reader031/viewer/2022020921/56815d64550346895dcb6c6d/html5/thumbnails/23.jpg)
And this one?
![Page 24: Agents and environments](https://reader031.vdocuments.site/reader031/viewer/2022020921/56815d64550346895dcb6c6d/html5/thumbnails/24.jpg)
First option is worth 3, so root is at least that good
![Page 25: Agents and environments](https://reader031.vdocuments.site/reader031/viewer/2022020921/56815d64550346895dcb6c6d/html5/thumbnails/25.jpg)
Now consider the second option
![Page 26: Agents and environments](https://reader031.vdocuments.site/reader031/viewer/2022020921/56815d64550346895dcb6c6d/html5/thumbnails/26.jpg)
What is this node worth?
![Page 27: Agents and environments](https://reader031.vdocuments.site/reader031/viewer/2022020921/56815d64550346895dcb6c6d/html5/thumbnails/27.jpg)
At most 2
![Page 28: Agents and environments](https://reader031.vdocuments.site/reader031/viewer/2022020921/56815d64550346895dcb6c6d/html5/thumbnails/28.jpg)
But what if we had these leaf values, say 1 and 99? It doesn't matter: they can't make any difference, so don't look at them.
![Page 29: Agents and environments](https://reader031.vdocuments.site/reader031/viewer/2022020921/56815d64550346895dcb6c6d/html5/thumbnails/29.jpg)
α-β pruning example
![Page 30: Agents and environments](https://reader031.vdocuments.site/reader031/viewer/2022020921/56815d64550346895dcb6c6d/html5/thumbnails/30.jpg)
α-β pruning example
![Page 31: Agents and environments](https://reader031.vdocuments.site/reader031/viewer/2022020921/56815d64550346895dcb6c6d/html5/thumbnails/31.jpg)
α-β pruning example
![Page 32: Agents and environments](https://reader031.vdocuments.site/reader031/viewer/2022020921/56815d64550346895dcb6c6d/html5/thumbnails/32.jpg)
Why didn’t we check this node first?
![Page 33: Agents and environments](https://reader031.vdocuments.site/reader031/viewer/2022020921/56815d64550346895dcb6c6d/html5/thumbnails/33.jpg)
Properties of α-β

Pruning does not affect the final result, i.e., it returns the same best move (caveat: only if we can search the entire tree!).

Good move ordering improves the effectiveness of pruning. With "perfect ordering," time complexity = O(b^(m/2)). We can come close in practice with various heuristics.
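The cutoff tests can be sketched in Python, assuming a game tree represented as nested lists whose leaves are terminal utilities (a representation chosen for illustration, not from the slides):

```python
import math

def alphabeta(node, maximizing=True, alpha=-math.inf, beta=math.inf):
    """Minimax value of `node` with alpha-beta pruning.

    alpha: best value MAX can already guarantee on this path.
    beta:  best value MIN can already guarantee on this path.
    Branches that cannot affect the final decision are skipped."""
    if isinstance(node, (int, float)):  # terminal state
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:  # MIN would never let play reach this branch
                break
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta))
        beta = min(beta, value)
        if alpha >= beta:  # MAX already has a better option elsewhere
            break
    return value

# Once the first branch guarantees MAX at least 3, the second branch is
# abandoned as soon as MIN finds a 2: the 1 and 99 leaves are never examined.
tree = [[3, 12, 8], [2, 1, 99], [14, 5, 2]]
print(alphabeta(tree))  # prints 3, the same value plain minimax returns
```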
![Page 34: Agents and environments](https://reader031.vdocuments.site/reader031/viewer/2022020921/56815d64550346895dcb6c6d/html5/thumbnails/34.jpg)
Bounding search

Similar to depth-limited search: we don't have to search all the way to a terminal state; search to some depth instead, and find some way of evaluating non-terminal states.
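A sketch of this cutoff, again over a nested-list game tree; the `heuristic` below (a recursive average of the values under a node) is purely a hypothetical stand-in for a real evaluation function:

```python
def heuristic(node):
    """Hypothetical evaluation: recursively average the values below
    `node`. A real program would score the position without searching."""
    if isinstance(node, (int, float)):
        return node
    values = [heuristic(child) for child in node]
    return sum(values) / len(values)

def h_minimax(node, depth, maximizing=True):
    """Minimax to a fixed depth; states at the cutoff are scored by the
    evaluation function instead of being searched further."""
    if isinstance(node, (int, float)):  # true terminal state
        return node
    if depth == 0:                      # cutoff: estimate, don't search
        return heuristic(node)
    values = [h_minimax(child, depth - 1, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(h_minimax(tree, depth=2))  # deep enough to reach the leaves: prints 3
print(h_minimax(tree, depth=1))  # cutoff at the MIN nodes: estimates instead
```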
![Page 35: Agents and environments](https://reader031.vdocuments.site/reader031/viewer/2022020921/56815d64550346895dcb6c6d/html5/thumbnails/35.jpg)
Evaluation function

A way of estimating how good a position is. Humans consider (relatively) few moves and don't search very deep, yet they can play many games well: the evaluation function is key. There are a LOT of possibilities for the evaluation function.
![Page 36: Agents and environments](https://reader031.vdocuments.site/reader031/viewer/2022020921/56815d64550346895dcb6c6d/html5/thumbnails/36.jpg)
A simple function for chess

White = 9 * #queens + 5 * #rooks + 3 * #bishops + 3 * #knights + #pawns
Black = 9 * #queens + 5 * #rooks + 3 * #bishops + 3 * #knights + #pawns

Utility = White - Black
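The material count above translates directly into code; the piece-count dictionaries are a hypothetical representation chosen for illustration:

```python
# Piece weights from the slide: queen 9, rook 5, bishop 3, knight 3, pawn 1.
WEIGHTS = {"queen": 9, "rook": 5, "bishop": 3, "knight": 3, "pawn": 1}

def material(counts):
    """Weighted sum of the piece counts for one side."""
    return sum(WEIGHTS[piece] * n for piece, n in counts.items())

def evaluate(white_counts, black_counts):
    """Utility = White - Black; positive values favor White."""
    return material(white_counts) - material(black_counts)

# Example: White has won a rook in exchange for a pawn.
white = {"queen": 1, "rook": 2, "bishop": 2, "knight": 2, "pawn": 7}
black = {"queen": 1, "rook": 1, "bishop": 2, "knight": 2, "pawn": 8}
print(evaluate(white, black))  # prints 4: +5 for the rook, -1 for the pawn
```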
![Page 37: Agents and environments](https://reader031.vdocuments.site/reader031/viewer/2022020921/56815d64550346895dcb6c6d/html5/thumbnails/37.jpg)
Other ways of evaluating a game position?

Features:
- Spaces you control
- How compressed your pieces are
- Threat-to-you minus threat-to-opponent
- How much the position restricts the opponent's options
![Page 38: Agents and environments](https://reader031.vdocuments.site/reader031/viewer/2022020921/56815d64550346895dcb6c6d/html5/thumbnails/38.jpg)
Interesting ordering

| Game    | Branching factor | Computer quality |
|---------|------------------|------------------|
| Go      | 360              | << human         |
| Chess   | 35               | ≈ human          |
| Othello | 10               | >> human         |
![Page 39: Agents and environments](https://reader031.vdocuments.site/reader031/viewer/2022020921/56815d64550346895dcb6c6d/html5/thumbnails/39.jpg)
Implications

| Game    | Branching factor | Computer quality |
|---------|------------------|------------------|
| Go      | 360              | << human         |
| Chess   | 35               | ≈ human          |
| Othello | 10               | >> human         |

- A larger branching factor is (relatively) harder for computers.
- People rely more on the evaluation function than on search.
![Page 40: Agents and environments](https://reader031.vdocuments.site/reader031/viewer/2022020921/56815d64550346895dcb6c6d/html5/thumbnails/40.jpg)
Deterministic games in practice
Othello: human champions refuse to compete against computers, who are too good.
Chess: Deep Blue defeated human world champion Garry Kasparov in a six-game match in 1997.
Checkers: Chinook ended the 40-year reign of human world champion Marion Tinsley in 1994. In 2007, its developers announced that the program had been improved to the point where it cannot lose a game.
Go: human champions refuse to compete against computers, who are too bad.
![Page 41: Agents and environments](https://reader031.vdocuments.site/reader031/viewer/2022020921/56815d64550346895dcb6c6d/html5/thumbnails/41.jpg)
More on checkers

Checkers has a branching factor of 10. Why isn't the result like Othello's?

Complexity of imagining moves: a single move can change a lot of board positions, a limitation that does not affect computers.

| Game    | Branching factor | Computer quality |
|---------|------------------|------------------|
| Go      | 360              | << human         |
| Chess   | 35               | ≈ human          |
| Othello | 10               | >> human         |
![Page 42: Agents and environments](https://reader031.vdocuments.site/reader031/viewer/2022020921/56815d64550346895dcb6c6d/html5/thumbnails/42.jpg)
Summary

Games are a core (and fun) part of AI: they illustrate several important points about AI and provide good visuals and demos.

Turn-based games (that can fit in memory) are well addressed, though we make many assumptions (optimal opponent, turn-based, no alliances, etc.).
![Page 43: Agents and environments](https://reader031.vdocuments.site/reader031/viewer/2022020921/56815d64550346895dcb6c6d/html5/thumbnails/43.jpg)
Questions?