artificial intelligence - ucsbyuxiangw/classes/cs165a...artificial intelligence cs 165a mar 7, 2019...
TRANSCRIPT
Artificial IntelligenceCS 165A
Mar 7, 2019
Instructor: Prof. Yu-Xiang Wang
® Reinforcement Learning® Logic
1
Announcement
• HW4 will be released by midnight
• In the TA’s discussion class this Friday– MP1
– MDP examples
• Next Thursday (Mar 14)– HW4 due
– MP2 due
– Review lecture
2
Recap: Tabular MDP
• State, Action, Reward and Observation
• Policy:
• Max expected (cumulative) reward– Finite horizon (episodic) RL:
– Infinite horizon RL:
• Parameters of a tabular MDP:– Transition dynamics P(s’| s,a)– Expected immediate reward E[r | s,a]– Initial state distribution P(s) 3
⇡ : S ! A<latexit sha1_base64="JSR2KBCrB1Pfm6GeQfT6grf0tX8=">AAACEnicbZDLSsNAFIYnXmu9RV26GSyCbkoiguKq6sZlRXuBJpTJdNIOncyEmYlSQp7Bja/ixoUibl25822ctAG19YeBj/+cw5zzBzGjSjvOlzU3v7C4tFxaKa+urW9s2lvbTSUSiUkDCyZkO0CKMMpJQ1PNSDuWBEUBI61geJnXW3dEKir4rR7FxI9Qn9OQYqSN1bUPvZieQehFSA8wYulNBj1J+wONpBT3P/551rUrTtUZC86CW0AFFKp37U+vJ3ASEa4xQ0p1XCfWfoqkppiRrOwlisQID1GfdAxyFBHlp+OTMrhvnB4MhTSPazh2f0+kKFJqFAWmM19RTddy879aJ9HhqZ9SHieacDz5KEwY1ALm+cAelQRrNjKAsKRmV4gHSCKsTYplE4I7ffIsNI+qruHr40rtooijBHbBHjgALjgBNXAF6qABMHgAT+AFvFqP1rP1Zr1PWuesYmYH/JH18Q0lz53J</latexit><latexit sha1_base64="JSR2KBCrB1Pfm6GeQfT6grf0tX8=">AAACEnicbZDLSsNAFIYnXmu9RV26GSyCbkoiguKq6sZlRXuBJpTJdNIOncyEmYlSQp7Bja/ixoUibl25822ctAG19YeBj/+cw5zzBzGjSjvOlzU3v7C4tFxaKa+urW9s2lvbTSUSiUkDCyZkO0CKMMpJQ1PNSDuWBEUBI61geJnXW3dEKir4rR7FxI9Qn9OQYqSN1bUPvZieQehFSA8wYulNBj1J+wONpBT3P/551rUrTtUZC86CW0AFFKp37U+vJ3ASEa4xQ0p1XCfWfoqkppiRrOwlisQID1GfdAxyFBHlp+OTMrhvnB4MhTSPazh2f0+kKFJqFAWmM19RTddy879aJ9HhqZ9SHieacDz5KEwY1ALm+cAelQRrNjKAsKRmV4gHSCKsTYplE4I7ffIsNI+qruHr40rtooijBHbBHjgALjgBNXAF6qABMHgAT+AFvFqP1rP1Zr1PWuesYmYH/JH18Q0lz53J</latexit><latexit sha1_base64="JSR2KBCrB1Pfm6GeQfT6grf0tX8=">AAACEnicbZDLSsNAFIYnXmu9RV26GSyCbkoiguKq6sZlRXuBJpTJdNIOncyEmYlSQp7Bja/ixoUibl25822ctAG19YeBj/+cw5zzBzGjSjvOlzU3v7C4tFxaKa+urW9s2lvbTSUSiUkDCyZkO0CKMMpJQ1PNSDuWBEUBI61geJnXW3dEKir4rR7FxI9Qn9OQYqSN1bUPvZieQehFSA8wYulNBj1J+wONpBT3P/551rUrTtUZC86CW0AFFKp37U+vJ3ASEa4xQ0p1XCfWfoqkppiRrOwlisQID1GfdAxyFBHlp+OTMrhvnB4MhTSPazh2f0+kKFJqFAWmM19RTddy879aJ9HhqZ9SHieacDz5KEwY1ALm+cAelQRrNjKAsKRmV4gHSCKsTYplE4I7ffIsNI+qruHr40rtooijBHbBHjgALjgBNXAF6qABMHgAT+AFvFqP1rP1Zr1PWuesYmYH/JH18Q0lz53J</latexit><latexit sha1_base64="JSR2KBCrB1Pfm6GeQfT6grf0tX8=">AAACEnicbZDLSsNAFIYnXmu9RV26GSyCbkoiguKq6sZlRXuBJpTJdNIOncyEmYlSQp7Bja/ixoUibl25822ctAG19YeBj/+cw5zzBzGjSjvOlzU3v7C4tFxaKa+urW9s2lvbTSUSiUkDCyZkO0CKMMpJQ1PNSDuWBEUBI61geJnXW3dEKir4rR7FxI9Qn9OQYqSN1bUPvZieQehFSA8wYulNBj1J+wONpBT3P/551rUrTtUZC86CW0AFFKp37U+vJ3ASEa4xQ0p1XCfWfoqkppiRrOwlisQID1GfdAxyFBHlp+OTMrhvnB4MhTSPazh2f0+kKFJqFAWmM19RTddy879aJ9HhqZ9SHieacDz5KEwY1ALm+cAelQRrNjKAsKRmV4gHSCKsTYplE4I7ffIsNI+qruHr40rtooijBHbBHjgALjgBNXAF6qABMHgAT+AFvFqP1rP1Zr1PWuesYmYH/JH18Q0lz53J</latexit>
St 2 S<latexit sha1_base64="H4OVRyT8Zmoun872yVAuceDZNfk=">AAAB+3icbVBNS8NAFHypX7V+1Xr0slgETyURQY9FLx4rtbXQhLDZbtqlm03Y3Ygl5K948aCIV/+IN/+NmzYHbR1YGGbe481OkHCmtG1/W5W19Y3Nrep2bWd3b/+gftjoqziVhPZIzGM5CLCinAna00xzOkgkxVHA6UMwvSn8h0cqFYvFvZ4l1IvwWLCQEayN5NcbXV+7TCA3wnpCMM+6uV9v2i17DrRKnJI0oUTHr3+5o5ikERWacKzU0LET7WVYakY4zWtuqmiCyRSP6dBQgSOqvGyePUenRhmhMJbmCY3m6u+NDEdKzaLATBYR1bJXiP95w1SHV17GRJJqKsjiUJhypGNUFIFGTFKi+cwQTCQzWRGZYImJNnXVTAnO8pdXSf+85Rh+d9FsX5d1VOEYTuAMHLiENtxCB3pA4Ame4RXerNx6sd6tj8VoxSp3juAPrM8f6YGUWQ==</latexit><latexit sha1_base64="H4OVRyT8Zmoun872yVAuceDZNfk=">AAAB+3icbVBNS8NAFHypX7V+1Xr0slgETyURQY9FLx4rtbXQhLDZbtqlm03Y3Ygl5K948aCIV/+IN/+NmzYHbR1YGGbe481OkHCmtG1/W5W19Y3Nrep2bWd3b/+gftjoqziVhPZIzGM5CLCinAna00xzOkgkxVHA6UMwvSn8h0cqFYvFvZ4l1IvwWLCQEayN5NcbXV+7TCA3wnpCMM+6uV9v2i17DrRKnJI0oUTHr3+5o5ikERWacKzU0LET7WVYakY4zWtuqmiCyRSP6dBQgSOqvGyePUenRhmhMJbmCY3m6u+NDEdKzaLATBYR1bJXiP95w1SHV17GRJJqKsjiUJhypGNUFIFGTFKi+cwQTCQzWRGZYImJNnXVTAnO8pdXSf+85Rh+d9FsX5d1VOEYTuAMHLiENtxCB3pA4Ame4RXerNx6sd6tj8VoxSp3juAPrM8f6YGUWQ==</latexit><latexit sha1_base64="H4OVRyT8Zmoun872yVAuceDZNfk=">AAAB+3icbVBNS8NAFHypX7V+1Xr0slgETyURQY9FLx4rtbXQhLDZbtqlm03Y3Ygl5K948aCIV/+IN/+NmzYHbR1YGGbe481OkHCmtG1/W5W19Y3Nrep2bWd3b/+gftjoqziVhPZIzGM5CLCinAna00xzOkgkxVHA6UMwvSn8h0cqFYvFvZ4l1IvwWLCQEayN5NcbXV+7TCA3wnpCMM+6uV9v2i17DrRKnJI0oUTHr3+5o5ikERWacKzU0LET7WVYakY4zWtuqmiCyRSP6dBQgSOqvGyePUenRhmhMJbmCY3m6u+NDEdKzaLATBYR1bJXiP95w1SHV17GRJJqKsjiUJhypGNUFIFGTFKi+cwQTCQzWRGZYImJNnXVTAnO8pdXSf+85Rh+d9FsX5d1VOEYTuAMHLiENtxCB3pA4Ame4RXerNx6sd6tj8VoxSp3juAPrM8f6YGUWQ==</latexit><latexit sha1_base64="H4OVRyT8Zmoun872yVAuceDZNfk=">AAAB+3icbVBNS8NAFHypX7V+1Xr0slgETyURQY9FLx4rtbXQhLDZbtqlm03Y3Ygl5K948aCIV/+IN/+NmzYHbR1YGGbe481OkHCmtG1/W5W19Y3Nrep2bWd3b/+gftjoqziVhPZIzGM5CLCinAna00xzOkgkxVHA6UMwvSn8h0cqFYvFvZ4l1IvwWLCQEayN5NcbXV+7TCA3wnpCMM+6uV9v2i17DrRKnJI0oUTHr3+5o5ikERWacKzU0LET7WVYakY4zWtuqmiCyRSP6dBQgSOqvGyePUenRhmhMJbmCY3m6u+NDEdKzaLATBYR1bJXiP95w1SHV17GRJJqKsjiUJhypGNUFIFGTFKi+cwQTCQzWRGZYImJNnXVTAnO8pdXSf+85Rh+d9FsX5d1VOEYTuAMHLiENtxCB3pA4Ame4RXerNx6sd6tj8VoxSp3juAPrM8f6YGUWQ==</latexit>
At 2 A<latexit sha1_base64="xX32X1fWfQeu2hPnv8gbSWA79Eo=">AAAB+3icbVBNS8NAFHypX7V+1Xr0slgETyURQY+tXjxWsLXQhLDZbtqlm03Y3Ygl5K948aCIV/+IN/+NmzYHbR1YGGbe481OkHCmtG1/W5W19Y3Nrep2bWd3b/+gftjoqziVhPZIzGM5CLCinAna00xzOkgkxVHA6UMwvSn8h0cqFYvFvZ4l1IvwWLCQEayN5NcbHV+7TCA3wnpCMM86uV9v2i17DrRKnJI0oUTXr3+5o5ikERWacKzU0LET7WVYakY4zWtuqmiCyRSP6dBQgSOqvGyePUenRhmhMJbmCY3m6u+NDEdKzaLATBYR1bJXiP95w1SHV17GRJJqKsjiUJhypGNUFIFGTFKi+cwQTCQzWRGZYImJNnXVTAnO8pdXSf+85Rh+d9FsX5d1VOEYTuAMHLiENtxCF3pA4Ame4RXerNx6sd6tj8VoxSp3juAPrM8fsa2UNQ==</latexit><latexit sha1_base64="xX32X1fWfQeu2hPnv8gbSWA79Eo=">AAAB+3icbVBNS8NAFHypX7V+1Xr0slgETyURQY+tXjxWsLXQhLDZbtqlm03Y3Ygl5K948aCIV/+IN/+NmzYHbR1YGGbe481OkHCmtG1/W5W19Y3Nrep2bWd3b/+gftjoqziVhPZIzGM5CLCinAna00xzOkgkxVHA6UMwvSn8h0cqFYvFvZ4l1IvwWLCQEayN5NcbHV+7TCA3wnpCMM86uV9v2i17DrRKnJI0oUTXr3+5o5ikERWacKzU0LET7WVYakY4zWtuqmiCyRSP6dBQgSOqvGyePUenRhmhMJbmCY3m6u+NDEdKzaLATBYR1bJXiP95w1SHV17GRJJqKsjiUJhypGNUFIFGTFKi+cwQTCQzWRGZYImJNnXVTAnO8pdXSf+85Rh+d9FsX5d1VOEYTuAMHLiENtxCF3pA4Ame4RXerNx6sd6tj8VoxSp3juAPrM8fsa2UNQ==</latexit><latexit sha1_base64="xX32X1fWfQeu2hPnv8gbSWA79Eo=">AAAB+3icbVBNS8NAFHypX7V+1Xr0slgETyURQY+tXjxWsLXQhLDZbtqlm03Y3Ygl5K948aCIV/+IN/+NmzYHbR1YGGbe481OkHCmtG1/W5W19Y3Nrep2bWd3b/+gftjoqziVhPZIzGM5CLCinAna00xzOkgkxVHA6UMwvSn8h0cqFYvFvZ4l1IvwWLCQEayN5NcbHV+7TCA3wnpCMM86uV9v2i17DrRKnJI0oUTXr3+5o5ikERWacKzU0LET7WVYakY4zWtuqmiCyRSP6dBQgSOqvGyePUenRhmhMJbmCY3m6u+NDEdKzaLATBYR1bJXiP95w1SHV17GRJJqKsjiUJhypGNUFIFGTFKi+cwQTCQzWRGZYImJNnXVTAnO8pdXSf+85Rh+d9FsX5d1VOEYTuAMHLiENtxCF3pA4Ame4RXerNx6sd6tj8VoxSp3juAPrM8fsa2UNQ==</latexit><latexit sha1_base64="xX32X1fWfQeu2hPnv8gbSWA79Eo=">AAAB+3icbVBNS8NAFHypX7V+1Xr0slgETyURQY+tXjxWsLXQhLDZbtqlm03Y3Ygl5K948aCIV/+IN/+NmzYHbR1YGGbe481OkHCmtG1/W5W19Y3Nrep2bWd3b/+gftjoqziVhPZIzGM5CLCinAna00xzOkgkxVHA6UMwvSn8h0cqFYvFvZ4l1IvwWLCQEayN5NcbHV+7TCA3wnpCMM86uV9v2i17DrRKnJI0oUTXr3+5o5ikERWacKzU0LET7WVYakY4zWtuqmiCyRSP6dBQgSOqvGyePUenRhmhMJbmCY3m6u+NDEdKzaLATBYR1bJXiP95w1SHV17GRJJqKsjiUJhypGNUFIFGTFKi+cwQTCQzWRGZYImJNnXVTAnO8pdXSf+85Rh+d9FsX5d1VOEYTuAMHLiENtxCF3pA4Ame4RXerNx6sd6tj8VoxSp3juAPrM8fsa2UNQ==</latexit>
Rt 2 R<latexit sha1_base64="mcKcKCPJb1sMOgc9hprL//Z0AUs=">AAAB+nicbVDLSsNAFL2pr1pfqS7dDBbBVUlE0GXRjcta7APaECbTSTt0MgkzE6XEfoobF4q49Uvc+TdO2iy09cDA4Zx7uWdOkHCmtON8W6W19Y3NrfJ2ZWd3b//Arh52VJxKQtsk5rHsBVhRzgRta6Y57SWS4ijgtBtMbnK/+0ClYrG419OEehEeCRYygrWRfLva8vWACTSIsB4HQdaa+XbNqTtzoFXiFqQGBZq+/TUYxiSNqNCEY6X6rpNoL8NSM8LprDJIFU0wmeAR7RsqcESVl82jz9CpUYYojKV5QqO5+nsjw5FS0ygwk3lCtezl4n9eP9XhlZcxkaSaCrI4FKYc6RjlPaAhk5RoPjUEE8lMVkTGWGKiTVsVU4K7/OVV0jmvu4bfXdQa10UdZTiGEzgDFy6hAbfQhDYQeIRneIU368l6sd6tj8VoySp2juAPrM8fFw2T4Q==</latexit><latexit sha1_base64="mcKcKCPJb1sMOgc9hprL//Z0AUs=">AAAB+nicbVDLSsNAFL2pr1pfqS7dDBbBVUlE0GXRjcta7APaECbTSTt0MgkzE6XEfoobF4q49Uvc+TdO2iy09cDA4Zx7uWdOkHCmtON8W6W19Y3NrfJ2ZWd3b//Arh52VJxKQtsk5rHsBVhRzgRta6Y57SWS4ijgtBtMbnK/+0ClYrG419OEehEeCRYygrWRfLva8vWACTSIsB4HQdaa+XbNqTtzoFXiFqQGBZq+/TUYxiSNqNCEY6X6rpNoL8NSM8LprDJIFU0wmeAR7RsqcESVl82jz9CpUYYojKV5QqO5+nsjw5FS0ygwk3lCtezl4n9eP9XhlZcxkaSaCrI4FKYc6RjlPaAhk5RoPjUEE8lMVkTGWGKiTVsVU4K7/OVV0jmvu4bfXdQa10UdZTiGEzgDFy6hAbfQhDYQeIRneIU368l6sd6tj8VoySp2juAPrM8fFw2T4Q==</latexit><latexit sha1_base64="mcKcKCPJb1sMOgc9hprL//Z0AUs=">AAAB+nicbVDLSsNAFL2pr1pfqS7dDBbBVUlE0GXRjcta7APaECbTSTt0MgkzE6XEfoobF4q49Uvc+TdO2iy09cDA4Zx7uWdOkHCmtON8W6W19Y3NrfJ2ZWd3b//Arh52VJxKQtsk5rHsBVhRzgRta6Y57SWS4ijgtBtMbnK/+0ClYrG419OEehEeCRYygrWRfLva8vWACTSIsB4HQdaa+XbNqTtzoFXiFqQGBZq+/TUYxiSNqNCEY6X6rpNoL8NSM8LprDJIFU0wmeAR7RsqcESVl82jz9CpUYYojKV5QqO5+nsjw5FS0ygwk3lCtezl4n9eP9XhlZcxkaSaCrI4FKYc6RjlPaAhk5RoPjUEE8lMVkTGWGKiTVsVU4K7/OVV0jmvu4bfXdQa10UdZTiGEzgDFy6hAbfQhDYQeIRneIU368l6sd6tj8VoySp2juAPrM8fFw2T4Q==</latexit><latexit sha1_base64="mcKcKCPJb1sMOgc9hprL//Z0AUs=">AAAB+nicbVDLSsNAFL2pr1pfqS7dDBbBVUlE0GXRjcta7APaECbTSTt0MgkzE6XEfoobF4q49Uvc+TdO2iy09cDA4Zx7uWdOkHCmtON8W6W19Y3NrfJ2ZWd3b//Arh52VJxKQtsk5rHsBVhRzgRta6Y57SWS4ijgtBtMbnK/+0ClYrG419OEehEeCRYygrWRfLva8vWACTSIsB4HQdaa+XbNqTtzoFXiFqQGBZq+/TUYxiSNqNCEY6X6rpNoL8NSM8LprDJIFU0wmeAR7RsqcESVl82jz9CpUYYojKV5QqO5+nsjw5FS0ygwk3lCtezl4n9eP9XhlZcxkaSaCrI4FKYc6RjlPaAhk5RoPjUEE8lMVkTGWGKiTVsVU4K7/OVV0jmvu4bfXdQa10UdZTiGEzgDFy6hAbfQhDYQeIRneIU368l6sd6tj8VoySp2juAPrM8fFw2T4Q==</latexit>
⇡⇤ = argmax⇡2⇧
E[1X
t=1
�t�1Rt]<latexit sha1_base64="YlLbWUf0D4dk67J+wTbjs5rWhLg=">AAACOHicbVDLShxBFK3W+Bpfo1m6KTIIIijdIuhGkAQhu4who8J0T3O7pnosrKpuqm6LQ9Gf5cbPcBeyyUIJ2foFqRlnER8HCg7n3Mutc7JSCoth+DOYmv4wMzs3v9BYXFpeWW2urZ/ZojKMd1ghC3ORgeVSaN5BgZJflIaDyiQ/z66+jPzza26sKPQPHJY8UTDQIhcM0Etp81tcit42PaIxmEGs4CZ1XomFjtuipl7AyyxzJ3WXxrZSqcOjqO457+c49P4AlIKew52opt9TpEnabIW74Rj0LYkmpEUmaKfN+7hfsEpxjUyCtd0oLDFxYFAwyetGXFleAruCAe96qkFxm7hx8JpueqVP88L4p5GO1f83HChrhyrzk6Mk9rU3Et/zuhXmh4kTuqyQa/Z8KK8kxYKOWqR9YThDOfQEmBH+r5RdggGGvuuGLyF6HfktOdvbjTw/3W8df57UMU82yCeyRSJyQI7JV9ImHcLILflFHshjcBf8Dv4Ef59Hp4LJzkfyAsHTP4yErNg=</latexit><latexit sha1_base64="YlLbWUf0D4dk67J+wTbjs5rWhLg=">AAACOHicbVDLShxBFK3W+Bpfo1m6KTIIIijdIuhGkAQhu4who8J0T3O7pnosrKpuqm6LQ9Gf5cbPcBeyyUIJ2foFqRlnER8HCg7n3Mutc7JSCoth+DOYmv4wMzs3v9BYXFpeWW2urZ/ZojKMd1ghC3ORgeVSaN5BgZJflIaDyiQ/z66+jPzza26sKPQPHJY8UTDQIhcM0Etp81tcit42PaIxmEGs4CZ1XomFjtuipl7AyyxzJ3WXxrZSqcOjqO457+c49P4AlIKew52opt9TpEnabIW74Rj0LYkmpEUmaKfN+7hfsEpxjUyCtd0oLDFxYFAwyetGXFleAruCAe96qkFxm7hx8JpueqVP88L4p5GO1f83HChrhyrzk6Mk9rU3Et/zuhXmh4kTuqyQa/Z8KK8kxYKOWqR9YThDOfQEmBH+r5RdggGGvuuGLyF6HfktOdvbjTw/3W8df57UMU82yCeyRSJyQI7JV9ImHcLILflFHshjcBf8Dv4Ef59Hp4LJzkfyAsHTP4yErNg=</latexit><latexit sha1_base64="YlLbWUf0D4dk67J+wTbjs5rWhLg=">AAACOHicbVDLShxBFK3W+Bpfo1m6KTIIIijdIuhGkAQhu4who8J0T3O7pnosrKpuqm6LQ9Gf5cbPcBeyyUIJ2foFqRlnER8HCg7n3Mutc7JSCoth+DOYmv4wMzs3v9BYXFpeWW2urZ/ZojKMd1ghC3ORgeVSaN5BgZJflIaDyiQ/z66+jPzza26sKPQPHJY8UTDQIhcM0Etp81tcit42PaIxmEGs4CZ1XomFjtuipl7AyyxzJ3WXxrZSqcOjqO457+c49P4AlIKew52opt9TpEnabIW74Rj0LYkmpEUmaKfN+7hfsEpxjUyCtd0oLDFxYFAwyetGXFleAruCAe96qkFxm7hx8JpueqVP88L4p5GO1f83HChrhyrzk6Mk9rU3Et/zuhXmh4kTuqyQa/Z8KK8kxYKOWqR9YThDOfQEmBH+r5RdggGGvuuGLyF6HfktOdvbjTw/3W8df57UMU82yCeyRSJyQI7JV9ImHcLILflFHshjcBf8Dv4Ef59Hp4LJzkfyAsHTP4yErNg=</latexit><latexit sha1_base64="YlLbWUf0D4dk67J+wTbjs5rWhLg=">AAACOHicbVDLShxBFK3W+Bpfo1m6KTIIIijdIuhGkAQhu4who8J0T3O7pnosrKpuqm6LQ9Gf5cbPcBeyyUIJ2foFqRlnER8HCg7n3Mutc7JSCoth+DOYmv4wMzs3v9BYXFpeWW2urZ/ZojKMd1ghC3ORgeVSaN5BgZJflIaDyiQ/z66+jPzza26sKPQPHJY8UTDQIhcM0Etp81tcit42PaIxmEGs4CZ1XomFjtuipl7AyyxzJ3WXxrZSqcOjqO457+c49P4AlIKew52opt9TpEnabIW74Rj0LYkmpEUmaKfN+7hfsEpxjUyCtd0oLDFxYFAwyetGXFleAruCAe96qkFxm7hx8JpueqVP88L4p5GO1f83HChrhyrzk6Mk9rU3Et/zuhXmh4kTuqyQa/Z8KK8kxYKOWqR9YThDOfQEmBH+r5RdggGGvuuGLyF6HfktOdvbjTw/3W8df57UMU82yCeyRSJyQI7JV9ImHcLILflFHshjcBf8Dv4Ef59Hp4LJzkfyAsHTP4yErNg=</latexit>
⇡⇤ = argmax⇡2⇧
E[TX
t=1
Rt]<latexit sha1_base64="TjJh8rvPqqmVSrTZo42NEAGcvKs=">AAACJXicbZDNSsNAFIUn/tb6V3XpZrAI4kISEXRhoSiCyypWC0kaJtNpHTqZhJkbsYS8jBtfxY0LiwiufBUntQttPTBw+O69zL0nTATXYNuf1szs3PzCYmmpvLyyurZe2di81XGqKGvSWMSqFRLNBJesCRwEayWKkSgU7C7snxf1uwemNI/lDQwS5kekJ3mXUwIGBZVTL+HtfVzDHlE9LyKPQWaIx6XX4Dk2AO7DMLvIXezpNAoyqDl5O7vJrwPAflCp2gf2SHjaOGNTRWM1gsrQ68Q0jZgEKojWrmMn4GdEAaeC5WUv1SwhtE96zDVWkohpPxtdmeNdQzq4GyvzJOAR/T2RkUjrQRSazmJtPVkr4H81N4XuiZ9xmaTAJP35qJsKDDEuIsMdrhgFMTCGUMXNrpjeE0UomGDLJgRn8uRpc3t44Bh/dVStn43jKKFttIP2kIOOUR1dogZqIoqe0At6Q0Pr2Xq13q2Pn9YZazyzhf7I+voGRi6lIQ==</latexit><latexit sha1_base64="TjJh8rvPqqmVSrTZo42NEAGcvKs=">AAACJXicbZDNSsNAFIUn/tb6V3XpZrAI4kISEXRhoSiCyypWC0kaJtNpHTqZhJkbsYS8jBtfxY0LiwiufBUntQttPTBw+O69zL0nTATXYNuf1szs3PzCYmmpvLyyurZe2di81XGqKGvSWMSqFRLNBJesCRwEayWKkSgU7C7snxf1uwemNI/lDQwS5kekJ3mXUwIGBZVTL+HtfVzDHlE9LyKPQWaIx6XX4Dk2AO7DMLvIXezpNAoyqDl5O7vJrwPAflCp2gf2SHjaOGNTRWM1gsrQ68Q0jZgEKojWrmMn4GdEAaeC5WUv1SwhtE96zDVWkohpPxtdmeNdQzq4GyvzJOAR/T2RkUjrQRSazmJtPVkr4H81N4XuiZ9xmaTAJP35qJsKDDEuIsMdrhgFMTCGUMXNrpjeE0UomGDLJgRn8uRpc3t44Bh/dVStn43jKKFttIP2kIOOUR1dogZqIoqe0At6Q0Pr2Xq13q2Pn9YZazyzhf7I+voGRi6lIQ==</latexit><latexit sha1_base64="TjJh8rvPqqmVSrTZo42NEAGcvKs=">AAACJXicbZDNSsNAFIUn/tb6V3XpZrAI4kISEXRhoSiCyypWC0kaJtNpHTqZhJkbsYS8jBtfxY0LiwiufBUntQttPTBw+O69zL0nTATXYNuf1szs3PzCYmmpvLyyurZe2di81XGqKGvSWMSqFRLNBJesCRwEayWKkSgU7C7snxf1uwemNI/lDQwS5kekJ3mXUwIGBZVTL+HtfVzDHlE9LyKPQWaIx6XX4Dk2AO7DMLvIXezpNAoyqDl5O7vJrwPAflCp2gf2SHjaOGNTRWM1gsrQ68Q0jZgEKojWrmMn4GdEAaeC5WUv1SwhtE96zDVWkohpPxtdmeNdQzq4GyvzJOAR/T2RkUjrQRSazmJtPVkr4H81N4XuiZ9xmaTAJP35qJsKDDEuIsMdrhgFMTCGUMXNrpjeE0UomGDLJgRn8uRpc3t44Bh/dVStn43jKKFttIP2kIOOUR1dogZqIoqe0At6Q0Pr2Xq13q2Pn9YZazyzhf7I+voGRi6lIQ==</latexit><latexit sha1_base64="TjJh8rvPqqmVSrTZo42NEAGcvKs=">AAACJXicbZDNSsNAFIUn/tb6V3XpZrAI4kISEXRhoSiCyypWC0kaJtNpHTqZhJkbsYS8jBtfxY0LiwiufBUntQttPTBw+O69zL0nTATXYNuf1szs3PzCYmmpvLyyurZe2di81XGqKGvSWMSqFRLNBJesCRwEayWKkSgU7C7snxf1uwemNI/lDQwS5kekJ3mXUwIGBZVTL+HtfVzDHlE9LyKPQWaIx6XX4Dk2AO7DMLvIXezpNAoyqDl5O7vJrwPAflCp2gf2SHjaOGNTRWM1gsrQ68Q0jZgEKojWrmMn4GdEAaeC5WUv1SwhtE96zDVWkohpPxtdmeNdQzq4GyvzJOAR/T2RkUjrQRSazmJtPVkr4H81N4XuiZ9xmaTAJP35qJsKDDEuIsMdrhgFMTCGUMXNrpjeE0UomGDLJgRn8uRpc3t44Bh/dVStn43jKKFttIP2kIOOUR1dogZqIoqe0At6Q0Pr2Xq13q2Pn9YZazyzhf7I+voGRi6lIQ==</latexit>
Recap: Value functions
• State value function: Vp(s)– expected return when starting in s and following p
• State-action value function: Qp(s,a)– expected return when starting in s, performing a, and following p
• Bellman equation
• Bellman optimality equation
4
Recap: Summary of RL algorithms
• Model-based:– Policy iteration / Value iteration– Global optimal solution (need to know the dynamics)
• Model-free: (no need to “explicitly” estimate dynamics)– Monte Carlo methods (Estimate only Q functions)– SARSA, Q-learning (Bootstrap so as to be more efficient)– Global optimal solution (when the model’s correctly specified)
• Absolutely model-free (do not even need MDP model)– Policy gradient– Local search, local optimal solution
5
Recap: Alpha-Go!
• Parameterize the policynetworks with CNN
• Supervised learninginitialization
• RL using Policy gradient• Fit Value Network (This is aheuristic function!)
• Monte-Carlo Tree Search
6
https://www.youtube.com/watch?v=4D5yGiYe8p4
D. Silver. Mastering the game of Go with Deep Neural Networks and Tree Search. Nature, vol. 529 issue 7587
The final lecture series on “logic”
• So far:– Reflex agents (Supervised learning)– Problem solving / planning / game solving agents (Search)– Utility-based agents (RL)
• They can:– Quantify uncertainty– Make rational decisions– Learn from experience
• What’s missing?– Knowledge, reasoning, logical deduction
7
Why do we care?
• Minesweeper
8
• Imagine how you would solve this?
• Imagine how an RL agent wouldsolve this?
Knowledge Base:• Encode the rules.• Encode the observations so far.• TELL operation: add evidence.• ASK operation: check if a tile has amine under it, or not, orundetermined.
9
Knowledge and reasoning
• We want powerful methods for – Representing Knowledge – general methods for representing facts
about the world and how to act in world– Carrying out Reasoning – general methods for deducing additional
information and a course of action to achieve goals– Focus on knowledge and reasoning rather than states and search
• Note that search is still critical
• This brings us to the idea of logic, but….
Example
• A certain country is inhabited by people who always tell the truth or always tell lies and who will respond only to yes/no questions.A tourist comes to a fork in the road where one branch leads to a restaurant and one does not.
No sign indicating which branch to take, but thereis an inhabitant Mr. X standing on the road.
With a single yes/no question, can the hungry tourist askto find the way to the restaurant?
10
Example (cont.)
• Answer: Is exactly one of the following true:1. you always tell the truth2. the restaurant is to the left
• Truth Table:X is truth teller; restaurant is to left; responsetrue; true; notrue; false; yesfalse; true; nofalse; false; yes
11
Another Example
• Bob looks at Alice. Alice looks at George.Bob is married. George is unmarried.Does a married person ever look at an unmarried one;yes, no, cannot be determined?
12
Another Example (cont.)
• Amarried or ~AmarriedBlooksA and AlooksG
BlooksA ^ AlooksG =BlooksA ^ AlooksG ^ Amarried or BlooksA ^ AlooksG ^ ~Amarried
• Case 1: Amarried = true, then BlooksA ^ AlooksG ^ Amarried satisfies conclusion
Case 2: Amarried = false, then BlooksA ^ AlooksG ^ ~Amarried satisfies conclusion
13
14
Logic – the basis of reasoning
• Logics are formal languages for representing information such that conclusions can be drawn– I.e., to support reasoning
• Syntax defines the sentences in the language– Allowable symbols– Rules for constructing grammatically correct sentences
• Semantics defines the meaning of sentences – Correspondence (isomorphism) between sentences and facts in the
world– Gives an interpretation to the sentence– Sentences can be true or false with respect to the semantics
• Proof theory specifies the reasoning steps that are sound (the inference procedure)
15
Knowledge-based agents
• A knowledge-based agent uses reasoning based on priorand acquired knowledge in order to achieve its goals
• Two important components:– Knowledge Base (KB)
• Represents facts about the world (the agent’s environment)– Fact = “sentence” in a particular knowledge representation
language (KRL)• KB = set of sentences in the KRL
– Inference Engine – determines what follows from the knowledge base (what the knowledge base entails)
• Inference / deduction– Process for deriving new sentences from old ones
» Sound reasoning from facts to conclusions
16
Who is this key historical AI figure?
• Built a calculating machine that could add and subtract (which Pascal’s couldn’t)
• But his dream was much grander – to reduce human reasoning to a kind of calculation and to ultimately build a machine capable of carrying out such calculations
• Co-inventor of the calculus
“For it is unworthy of excellent men to lose hours like slaves in the labor of calculation which could safely be relegated to anyone else if the machine were used.”
Gottfried Leibniz (1646-1716)
17
Knowledge Base
Inference engine
Domain specific content; facts
ASK
TELL
Domain independent algorithms; can deduce new facts from the KB
KB AgentsTrue sentences
18
Inference and Entailment
• Given a set of (true) sentences, logical inference generates new sentences– Sentence a follows from sentences { bi }– Sentences { bi } entail sentence a– The classic example is modus ponens: P Þ Q and P entail what?
• An inference procedure i can derive a from KB
KB a
KB a
• A knowledge base (KB) entails sentences a
19
Inference and Entailment (cont.)
Representation
World
FactFOLLOWS
ENTAILS
Facts
Sentences
Semantics
Sentence
Semantics
Inference (n.):a. The act or process of deriving logical conclusions from premises known or assumed to be true. b. The act of reasoning from factual knowledge or evidence.
20
Inference engine
• An inference engine is a program that applies inference rules to knowledge– Goal: To infer new (and useful) knowledge
• Separation of– Knowledge– Rules– Control
Which rules should we apply when?
Inference engine
21
Inference procedures
• An inference procedure– Generates new sentences a that purport to be entailed by the
knowledge base…or...
– Reports whether or not a sentence a is entailed by the knowledge base
• Not every inference procedure can derive all sentences that are entailed by the KB
• A sound or truth-preserving inference procedure generates only entailed sentences
• Inference derives valid conclusions independent of the semantics (i.e., independent of the interpretation)
22
Inference procedures (cont.)
• Soundness of an inference procedure– i is sound if whenever KB i a, it is also true that KB a– I.e., the procedure only generates entailed sentences
• Completeness of an inference procedure– i is complete if whenever KB a, it is also true that KB i a– I.e., the procedure can find a proof for any sentence that is entailed
• The derivation of a sentence by a sound inference procedure is called a proof– Hence, the proof theory of a logical language specifies the
reasoning steps that are sound
23
Logics
• We will soon define a logic which is expressive enough to say most things of interest, and for which there exists a sound and complete inference procedure– I.e., the procedure will be able to derive anything that is derivable
from the KB– This is first-order logic, a.k.a. first-order predicate calculus– But first, we need to define propositional logic
24
Propositional (Boolean) Logic
• Symbols represent propositions (statements of fact, sentences)– P means “San Francisco is the capital of California”– Q means “It is raining in Seattle”
• Sentences are generated by combining proposition symbols with Boolean (logical) connectives
25
Propositional Logic
• Syntax– True, false, propositional symbols– ( ) , ¬ (not), Ù (and), Ú (or), Þ (implies), Û (equivalent)
• Examples of sentences in propositional logic
P1, P2, etc. (propositions)( S1 )¬ S1
S1Ù S2
S1Ú S2
S1 Þ S2
S1 Û S2
trueP1 Ù true Ù ¬( P2 Þ false )P Ù Q Û Q Ù P
26
Entailment and equivalence
• What is the meaning of a entails β, or a β
• a ≡ β iff a β and β a• Examples of logical equivalences
– Commutativity of Ù , Ú– Associativity of Ù , Ú– Distributive laws
• A and (B or C)=(A and B) or (A and C)
– De Morgan’s laws
• NOT (P OR Q) = (NOT P) AND (NOT Q)
• NOT (P AND Q) = (NOT P) OR (NOT Q)
– P Þ Q ≡ ¬P Ú Q
27
Precedence of operators (logical connectives)
• Levels of precedence, evaluating left to right1. ¬ (NOT)2. Ù (AND, conjunction)3. Ú (OR, disjunction)4. Þ (implies, conditional)5. Û (equivalence, biconditional)
• P Ù ¬ Q Þ R– (P Ù (¬Q)) Þ R
• P Ú Q Ù R– P Ú (Q Ù R)
• P Û Q Ù R Þ S– P Û ((Q Ù R) Þ S)
28
Satisfiability and Validity
• Is this true: ( P Ù Q ) ?– It depends on the values of P and Q– This is a satisfiable sentence – there are some interpretations for
which it is true and others for which it is false• Is this true: ( P Ù ¬P ) ?
– No, it is never true– This is an unsatisfiable sentence (self-contradictory) – there is no
interpretation for which it is true• Is this true: ( ((P Ú Q) Ù ¬Q) Þ P ) ?
– Yes, independent of the values of P and Q– This is a valid sentence – it is true under all possible
interpretations (a.k.a. a tautology)
29
Things to know!
• What is a sound inference procedure?– The procedure only generates entailed sentences
• What is a complete inference procedure?– The procedure can find a proof for any sentence that is entailed
• What is a satisfiable sentence?– There are some interpretations for which it is true
• What is an unsatisfiable sentence?– There is no interpretation for which it is true
• What is a valid sentence?– It is true under all possible interpretations
30
Propositional (Boolean) Logic (cont.)
• Semantics– Defined by clearly interpreted symbols and straightforward
application of truth tables – Rules for evaluating truth: Boolean algebra– Simple method: truth tables
2N rows for N propositions
31
Using propositional logic: rules of inference
• Inference rules capture patterns of sound inference– Once established, don’t need to show the truth table every time– E.g., we can define an inference rule: ((P Ú H) Ù ¬H) P for
variables P and H
ab
“If we know a, then we can conclude b”
• Alternate notation for inference rule a b :
(where a and b are propositional logic sentences)
32
Inference
KBb1
KB, b1b2
KB, b1, b2b3
…® ® ®
• Inference steps
KBb
a1, a 2, …b
or
• We’re particularly interested in
So we need a mechanism to do this!Inference rules that can be applied to sentences in our KB
33
Important Inference Rules for Propositional Logic
34
Example
KBQ ® ¬SP Ú ¬WRPP ® Q
What can we infer ( ) if we add this sentence with no inference rules?
P ® Q
What can we infer ( ) if we then add this inference procedure:
(a ® b) Ù a b
Nothing
Q and ¬S(a ® b), a
b
35
Resolution Rule: one rule for all inferences
rprqqp
ÚÚ¬Ú ,
Propositional calculus resolution
rprqqp
Þ¬ÞÞ¬ ,
cacbba
ÞÞÞ ,or
Remember: pÞ q Û ¬p Ú q, so let’s rewrite it as:
Resolution is really the “chaining” of implications.
36
Show that (α Ú β) Ù (¬β Ú γ) Þ (α Ú γ)
α β γ α Ú β ¬β Ú γ αÚβ Ù ¬βÚγ α Ú γ
0 0 0 0 1 0 0
0 0 1 0 1 0 1
0 1 0 1 0 0 0
0 1 1 1 1 1 1
1 0 0 1 1 1 1
1 0 1 1 1 1 1
1 1 0 1 0 0 1
1 1 1 1 1 1 1
This is always true for all propositions α, β, and γ, so we can make it an inference rule
Soundness:
37
Show that (¬αÞβ ) Ù (βÞ γ) Þ (¬α Þ γ)
α β γ ¬αÞ β βÞ γ ¬αÞβ Ù βÞ γ ¬αÞ γ
0 0 0 0 1 0 0
0 0 1 0 1 0 1
0 1 0 1 0 0 0
0 1 1 1 1 1 1
1 0 0 1 1 1 1
1 0 1 1 1 1 1
1 1 0 1 0 0 1
1 1 1 1 1 1 1
This is always true for all propositions α, β, and γ, so we can make it an inference rule
Soundness:
38
Conversion to Conjunctive Normal Form: CNF
• Resolution rule is stated for conjunctions of disjunctions• Question:
– Can every statement in PL be represented this way?• Answer: Yes
– Can show every sentence in propositional logic is equivalent to conjunction of disjunctions
• Conjunctive normal form (CNF)• Procedure for obtaining CNF
– Replace (P Û Q) with (P Þ Q) and (Q Þ P)– Eliminate implications: Replace (P Þ Q) with (¬P Ú Q)– Move ¬ inwards: ¬¬, ¬(P Ú Q), ¬(P Ù Q)– Distribute Ù over Ú, e.g.: (P Ù Q) Ú R becomes (P Ú R) Ù (Q Ú R)
[What about (P Ú Q) Ù R ?]– Flatten nesting: (P Ù Q) Ù R becomes P Ù Q Ù R
39
Complexity of reasoning
• Validity– NP-complete
• Satisfiability– NP-complete
• α is valid iff ¬ α is unsatisfiable• Efficient decidability test for validity iff efficient
decidability test for satisfiability.• To check if KB a, test if (KB ˄¬ α) is unsatisfiable.• For a restricted set of formulas (horn clauses), this check
can be made in linear time.– Forward chaining– Backward chaining
40
Propositional logic is quite limited
• Propositional logic has simple syntax and semantics, and limited expressiveness– Though it is handy to illustrate the process of inference
• However, it only has one representational device, the proposition, and cannot generalize– Input: facts; Output: facts– Result: Many, many rules are necessary to represent any non-
trivial world– It is impractical for even very small worlds
• The solution?– First-order logic, which can represent propositions, objects, and
relations between objects– Worlds can be modeled with many fewer rules