ARTIFICIAL INTELLIGENCE CSC 361 Tutorial#6

Post on 29-May-2020



EXERCISE#1

Suppose there are only four items on TV: Good Shows, Bad Shows, Good Show Commercials and Bad Show Commercials. Write down a Markov chain that describes what is available on TV. Assume that we have the following transition matrix (Pij denotes the probability of going from state i to state j):

[Figure: transition matrix, not reproduced in this transcript]


First we label the states:
• 1. Good show.
• 2. Good show Ad.
• 3. Bad show.
• 4. Bad show Ad.
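With the states labeled, the chain can be sketched in a few lines of Python. The exercise's transition matrix is not available in this transcript, so the entries of `P` below are hypothetical placeholders chosen only so that each row sums to 1. The helper names `is_row_stochastic` and `step` are likewise illustrative. The sketch checks that `P` is a valid transition matrix and propagates a starting distribution forward.

```python
from fractions import Fraction as F

# States 1-4 from the exercise, stored at indices 0-3.
STATES = ["Good show", "Good show Ad", "Bad show", "Bad show Ad"]

# HYPOTHETICAL transition matrix (placeholder numbers, not the exercise's).
# P[i][j] is the probability of going from state i to state j.
P = [
    [F(1, 2), F(3, 10), F(1, 5), F(0)],     # from Good show
    [F(9, 10), F(0), F(1, 10), F(0)],       # from Good show Ad
    [F(1, 10), F(0), F(2, 5), F(1, 2)],     # from Bad show
    [F(1, 5), F(0), F(4, 5), F(0)],         # from Bad show Ad
]


def is_row_stochastic(matrix):
    """True if every entry is non-negative and every row sums to exactly 1."""
    return all(
        all(p >= 0 for p in row) and sum(row) == 1
        for row in matrix
    )


def step(dist, matrix):
    """One step of the chain: new_dist[j] = sum_i dist[i] * matrix[i][j]."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


# Start in "Good show" with certainty and look one and two steps ahead.
start = [F(1), F(0), F(0), F(0)]
after_one = step(start, P)
after_two = step(after_one, P)
```

Starting from a single state, one step simply reproduces that state's row of the matrix. Exact fractions are used here so the row-sum check does not depend on floating-point rounding.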


EXERCISE#2

In micro-blackjack, you repeatedly draw a card (with replacement) that is equally likely to be a 2, 3, or 4. You can either Draw or Stop if the total score of the cards you have drawn is less than 6. Otherwise, you must Stop. When you Stop, your utility is equal to your total score (up to 5), or zero if you get a total of 6 or higher. When you Draw, you receive no utility. There is no discount.

What are the states and the actions for this MDP?
• The state is the current sum of your cards, plus a terminal state: 0, 2, 3, 4, 5, Done.
• The actions are: {Draw, Stop}.


What are the transition function and the reward function for this MDP?

The transition function is:
• T(s, Stop, Done) = 1
• T(s, Draw, s') =
  • 1/3 if s' - s ∈ {2, 3, 4} and s' ≤ 5.
  • 1/3 if s = 2 and s' = Done.
  • 2/3 if s = 3 and s' = Done.
  • 1 if s = 4 or 5 and s' = Done.
  • 0 otherwise.

The reward function is:
• R(s, Stop, Done) = s, for s ≤ 5
• R(s, a, s') = 0 otherwise
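As a check on the definitions above, the transition and reward functions can be derived directly from the game rules rather than written out case by case. The following is a minimal sketch in Python; the names `transitions` and `reward` and the dictionary representation are illustrative choices, not part of the tutorial.

```python
from fractions import Fraction

DONE = "Done"
STATES = [0, 2, 3, 4, 5, DONE]
ACTIONS = ["Draw", "Stop"]
CARDS = [2, 3, 4]  # each card is drawn with probability 1/3


def transitions(s, a):
    """Return {next_state: probability} for action a in state s."""
    if s == DONE:
        return {}  # terminal state: nothing happens afterwards
    if a == "Stop":
        return {DONE: Fraction(1)}
    out = {}
    for card in CARDS:
        total = s + card
        # A total of 6 or more ends the game, so it maps to Done.
        s_next = total if total <= 5 else DONE
        out[s_next] = out.get(s_next, Fraction(0)) + Fraction(1, 3)
    return out


def reward(s, a, s_next):
    """Stopping pays the current total; every other transition pays 0."""
    if a == "Stop" and s != DONE:
        return s
    return 0
```

For example, `transitions(3, "Draw")` gives probability 1/3 to state 5 and 2/3 to Done, because drawing a 3 or a 4 pushes the total to 6 or higher.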


EXERCISE#3

Consider the following MDP. Apply Value Iteration until convergence.

[Figure: MDP diagram, not reproduced in this transcript]
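The MDP diagram for this exercise is not available in this transcript, so the sketch below illustrates value iteration on the micro-blackjack MDP from Exercise #2 instead. It is one possible implementation under that substitution, not the tutorial's worked solution. The helper names `outcomes`, `q_value`, `value_iteration`, and `greedy_policy` are illustrative. Each sweep applies the Bellman update V(s) = max over a of the sum over s' of T(s,a,s') [R(s,a,s') + gamma V(s')], and the loop stops when a sweep changes nothing.

```python
from fractions import Fraction

DONE = "Done"
STATES = [0, 2, 3, 4, 5]      # non-terminal states; Done always has value 0
ACTIONS = ("Draw", "Stop")


def outcomes(s, a):
    """List of (probability, next_state, reward) for action a in state s."""
    if a == "Stop":
        return [(Fraction(1), DONE, s)]
    result = []
    for card in (2, 3, 4):
        total = s + card
        result.append((Fraction(1, 3), total if total <= 5 else DONE, 0))
    return result


def q_value(V, s, a, gamma=1):
    """Expected return of taking a in s and then following values V."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes(s, a))


def value_iteration(gamma=1):
    """Repeat Bellman updates until the value table stops changing."""
    V = {s: Fraction(0) for s in STATES}
    V[DONE] = Fraction(0)
    while True:
        new_V = {DONE: Fraction(0)}
        for s in STATES:
            new_V[s] = max(q_value(V, s, a, gamma) for a in ACTIONS)
        if new_V == V:   # exact fractions, so equality is a safe stopping test
            return V
        V = new_V


def greedy_policy(V, gamma=1):
    """Pick, in each state, the action with the highest Q-value under V."""
    return {s: max(ACTIONS, key=lambda a: q_value(V, s, a, gamma)) for s in STATES}


values = value_iteration()
policy = greedy_policy(values)
```

On this MDP the values settle after a few sweeps at V(0) = 10/3, V(2) = 3, V(3) = 3, V(4) = 4, V(5) = 5, and the greedy policy draws in states 0 and 2 and stops in states 3, 4, and 5. With floating-point rewards or a discount below 1, a tolerance-based stopping test would be the more usual choice.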


Value Iteration (Cont.)


QUESTIONS..