Artificial Intelligence (kti.mff.cuni.cz/~bartak/ui2/lectures/lecture01eng.pdf, 2016-02-29)
-
Artificial Intelligence

Roman Barták, Department of Theoretical Computer Science and Mathematical Logic
Introduction
We construct rational agents. An agent is an entity that perceives its environment through sensors and acts upon that environment through actuators. A rational agent is an agent maximizing its expected performance measure.

In AI1 we dealt mainly with a logical approach to agent design (no uncertainty). We ignored:
– interface to the environment (sensors, actuators)
– uncertainty
– the possibility of self-improvement (learning)
-
Course structure

Introduction – motivation and background on probability
Probabilistic reasoning – uncertainty, probabilistic reasoning, Bayesian networks, Hidden Markov Models
Rational decisions – utility theory, Markov Decision Processes, game theory, mechanism design
Machine learning – decision trees, regression, SVM, reinforcement learning
Resources
Artificial Intelligence: A Modern Approach
– S. Russell and P. Norvig
– Prentice Hall, 2010 (3rd ed.)
– http://aima.cs.berkeley.edu/

Umělá inteligence 1-6
– Vladimír Mařík, Olga Štěpánková, Jiří Lažanský et al.
– Academia
-
Course website

http://ktiml.mff.cuni.cz/~bartak/ui2

You can find there:
– slides
– links and resources
– contacts
– quiz
– …
Links to other courses

Seminar on Artificial Intelligence II – how to apply AI techniques in practice
Machine learning – how can computers learn new things
Multi-agent systems – how to handle multiple agents
Probabilistic graphical models – how to do Bayesian inference efficiently, etc.
Human-like artificial agents – how to design agents for virtual environments
Practical course on robotics – how to design hardware agents
-
Uncertainty so far

Can we handle uncertain information in the purely logical approach?
belief states – represent sets of all possible world states for the agent

Drawbacks:
– a logical agent must consider every logically possible explanation for the observations, no matter how unlikely (large and complex representations)
– a correct contingent plan must consider arbitrarily unlikely contingencies (big plans)
– sometimes there is no plan that is guaranteed to achieve the goal, yet the agent must act
Example
Diagnosing a dental patient's toothache. Let us try to apply propositional logic:
Toothache ⇒ Cavity

Hmm, is it really true?
– not all patients with toothaches have cavities; some of them have gum disease, an abscess, or other problems
Toothache ⇒ Cavity ∨ GumProblem ∨ Abscess ∨ …

We could try turning the rule into a causal rule:
Cavity ⇒ Toothache

But this is not right either – not all cavities cause pain.
The only way to fix the rule is to make it logically exhaustive!
-
Using a logical approach

Why does logic fail to cope with a domain like medical diagnosis?

• laziness: it is too much work to list the complete set of antecedents or consequents, and too hard to use such rules
• theoretical ignorance: medical science has no complete theory for the domain
• practical ignorance: even if we know all the rules, we might be uncertain because not all the necessary tests have been or can be run

We need another tool to deal with degrees of belief – probability theory.

A logical agent believes each sentence to be true or false, or has no opinion. A probabilistic agent may have a numerical degree of belief between 0 (certainly false) and 1 (certainly true).
Basic probability notation

Like logical assertions, probabilistic assertions are about possible worlds – the sample space Ω.
– the possible worlds are mutually exclusive and exhaustive

Each possible world ω is associated with a numerical probability P(ω) such that:
0 ≤ P(ω) ≤ 1
Σω∈Ω P(ω) = 1

Example: If we are about to roll two (distinguishable) dice, there are 36 possible worlds to consider: (1,1), (1,2), …, (6,6)
P(ω) = 1/36
-
Events
The sets of possible worlds are called events.
Example: "doubles are rolled" is an event

The probability of an event is the sum of the probabilities of the possible worlds in the event:
P(φ) = Σω∈φ P(ω)
Example: P(doubles) = 1/36 + 1/36 + 1/36 + 1/36 + 1/36 + 1/36 = 1/6

These probabilities are called unconditional or prior probabilities ("priors" for short).
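The event machinery above can be checked directly by enumerating the two-dice sample space; a minimal Python sketch (the variable names are illustrative, not from the slides):

```python
from fractions import Fraction

# Sample space for two distinguishable dice: 36 possible worlds,
# each with probability 1/36.
omega = [(d1, d2) for d1 in range(1, 7) for d2 in range(1, 7)]
p = {w: Fraction(1, 36) for w in omega}

# The event "doubles are rolled" is a set of possible worlds;
# its probability is the sum over the worlds it contains.
doubles = {w for w in omega if w[0] == w[1]}
p_doubles = sum(p[w] for w in doubles)
print(p_doubles)  # 1/6
```

Using exact fractions keeps the arithmetic free of floating-point noise for small discrete examples like this.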
Conditional probability

Frequently, we have some information (evidence) and we are interested in the probability of some event.

For example, what is the probability of doubles if we already know that the first die rolled 5?
P(doubles | Die1 = 5) = (1/36) / (6 · 1/36) = 1/6

This is called conditional or posterior probability:
P(a|b) = P(a ∧ b) / P(b), whenever P(b) > 0

This can also be written in a different form called the product rule:
P(a ∧ b) = P(a|b) · P(b)

Beware! If we have more evidence, then the conditional probability needs to take it into account:
P(doubles | Die1 = 5, Die2 = 5) = 1
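The definition P(a|b) = P(a ∧ b) / P(b) can be applied mechanically to the dice worlds; a short sketch (helper names are mine):

```python
from fractions import Fraction

# 36 equally likely worlds for two distinguishable dice.
omega = [(d1, d2) for d1 in range(1, 7) for d2 in range(1, 7)]
p = {w: Fraction(1, 36) for w in omega}

def prob(event):
    """P(event): sum of the probabilities of the worlds in the event."""
    return sum(p[w] for w in event)

doubles = {w for w in omega if w[0] == w[1]}
die1_is_5 = {w for w in omega if w[0] == 5}

# P(a | b) = P(a and b) / P(b), whenever P(b) > 0
p_cond = prob(doubles & die1_is_5) / prob(die1_is_5)
print(p_cond)  # 1/6
```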
-
Random variables

In a factored representation, a possible world is represented by a set of variable/value pairs. Variables in probability theory are called random variables. Every random variable has a domain – the set of possible values it can take on (similarly to a CSP).

Die1 – represents the value on the first die (1, …, 6)
Cavity – describes whether the patient has a cavity (true, false)

A possible world is fully identified by the values of all random variables:
P(Die1 = 5, Die2 = 5)
Probability distribution

The probability of all possible worlds can be described using a table called a full joint probability distribution – the elements are indexed by values of the random variables. For the dental domain with random variables Toothache, Catch, and Cavity:

            toothache            ¬toothache
            catch    ¬catch      catch    ¬catch
 cavity     0.108    0.012       0.072    0.008
 ¬cavity    0.016    0.064       0.144    0.576

Given the table, we can calculate the probabilities of values of any random variable:

P(toothache=true) = 0.108 + 0.012 + 0.016 + 0.064 = 0.2
P(toothache=false) = 0.072 + 0.008 + 0.144 + 0.576 = 0.8

We will describe the table in a short way as:
P(Toothache) = 〈0.2, 0.8〉
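The full joint table can be represented as a dictionary keyed by world, and marginals computed by summing out the other variables; a sketch with the dental numbers:

```python
# Full joint distribution over (Toothache, Catch, Cavity);
# keys are (toothache, catch, cavity) truth values.
joint = {
    (True,  True,  True):  0.108, (True,  False, True):  0.012,
    (False, True,  True):  0.072, (False, False, True):  0.008,
    (True,  True,  False): 0.016, (True,  False, False): 0.064,
    (False, True,  False): 0.144, (False, False, False): 0.576,
}

# Marginal P(toothache): sum out Catch and Cavity.
p_toothache = sum(p for (t, _c, _cav), p in joint.items() if t)
print(round(p_toothache, 3))  # 0.2
```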
-
Probability axioms

P(¬a) = 1 – P(a)

inclusion-exclusion principle:
P(a ∨ b) = P(a) + P(b) – P(a ∧ b)

chain rule:
P(A,B,C,D) = P(A|B,C,D) P(B,C,D)
           = P(A|B,C,D) P(B|C,D) P(C,D)
           = P(A|B,C,D) P(B|C,D) P(C|D) P(D)
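The inclusion-exclusion principle can be verified on the dice sample space with two overlapping events (the event choices here are mine, for illustration):

```python
from fractions import Fraction

# 36 equally likely worlds for two distinguishable dice.
omega = [(d1, d2) for d1 in range(1, 7) for d2 in range(1, 7)]

def prob(event):
    """Uniform worlds: P(event) = |event| / 36."""
    return Fraction(len(event), 36)

a = {w for w in omega if w[0] == w[1]}  # doubles
b = {w for w in omega if w[0] == 1}     # first die shows 1

# Inclusion-exclusion: P(a or b) = P(a) + P(b) - P(a and b)
assert prob(a | b) == prob(a) + prob(b) - prob(a & b)
print(prob(a | b))  # 11/36
```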
Inference using full joint distributions

How to answer questions? The knowledge base is represented using the full joint distribution. To compute the posterior probability of a query proposition given observed evidence, we add up the probabilities of the possible worlds in which the proposition is true (marginalization or summing out):
P(φ) = Σω:ω⊨φ P(ω)
P(Y) = Σz∈Z P(Y, z)
-
Example of probabilistic inference

P(φ) = Σω:ω⊨φ P(ω)
P(Y) = Σz∈Z P(Y, z)

P(toothache) (= P(Toothache=true)) = 0.108 + 0.012 + 0.016 + 0.064 = 0.2

P(cavity ∨ toothache) = 0.108 + 0.012 + 0.072 + 0.008 + 0.016 + 0.064 = 0.28

P(¬cavity | toothache) = P(¬cavity ∧ toothache) / P(toothache)
= (0.016 + 0.064) / (0.108 + 0.012 + 0.016 + 0.064) = 0.4
Normalization
P(¬cavity | toothache) = P(¬cavity ∧ toothache) / P(toothache)
= (0.016 + 0.064) / (0.108 + 0.012 + 0.016 + 0.064) = 0.4

P(cavity | toothache) = P(cavity ∧ toothache) / P(toothache)
= (0.108 + 0.012) / (0.108 + 0.012 + 0.016 + 0.064) = 0.6

Notice that the denominators are identical in both formulas! We do not even need to know the exact value of the denominator:
P(¬cavity | toothache) + P(cavity | toothache) = 1

We can use a normalization constant α instead, computed such that the evaluated distribution adds up to 1:
P(Cavity | toothache) = α P(Cavity, toothache)
= α [P(Cavity, toothache, catch) + P(Cavity, toothache, ¬catch)]
= α [〈0.108, 0.016〉 + 〈0.012, 0.064〉]
= α 〈0.12, 0.08〉
= 〈0.6, 0.4〉
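The normalization trick is easy to mirror in code: collect the unnormalized values, then scale them so they sum to 1. A sketch using the dental joint table:

```python
# Full joint over (Toothache, Catch, Cavity) from the dental example.
joint = {
    (True,  True,  True):  0.108, (True,  False, True):  0.012,
    (False, True,  True):  0.072, (False, False, True):  0.008,
    (True,  True,  False): 0.016, (True,  False, False): 0.064,
    (False, True,  False): 0.144, (False, False, False): 0.576,
}

# Unnormalized distribution over Cavity given toothache = true:
unnorm = {cav: sum(p for (t, _c, c), p in joint.items() if t and c == cav)
          for cav in (True, False)}
alpha = 1 / sum(unnorm.values())  # alpha = 1 / P(toothache)
posterior = {cav: alpha * p for cav, p in unnorm.items()}
print(round(posterior[True], 1), round(posterior[False], 1))  # 0.6 0.4
```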
-
Inference via enumeration

In a typical case, we know values e of random variables E from the observation, and we are looking for the probability distribution of random variables Y from the query. The other random variables are hidden: H = X – Y – E.
P(Y | E=e) = α P(Y, E=e) = α Σh P(Y, E=e, H=h)

Some drawbacks of inference by enumeration:
• the worst-case time complexity is O(d^n), where d is the number of values in the domain of each random variable and n is the number of variables
• to store the full joint probability distribution we need O(d^n) space
• last but not least, it is not easy to obtain probabilities for all possible worlds
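The enumeration scheme P(Y | E=e) = α Σh P(Y, E=e, H=h) can be written as one generic loop over the joint table; the `query` helper below is an illustrative sketch, not code from the slides:

```python
# Full joint over (Toothache, Catch, Cavity), as in the dental example.
joint = {
    (True,  True,  True):  0.108, (True,  False, True):  0.012,
    (False, True,  True):  0.072, (False, False, True):  0.008,
    (True,  True,  False): 0.016, (True,  False, False): 0.064,
    (False, True,  False): 0.144, (False, False, False): 0.576,
}

def query(joint, var_index, evidence):
    """P(Y | E=e) by enumeration: sum the joint over all worlds that
    agree with the evidence, grouped by the query variable's value,
    then normalize with alpha.  Visits every table entry, i.e. O(d^n)."""
    unnorm = {True: 0.0, False: 0.0}
    for world, p in joint.items():
        if all(world[i] == v for i, v in evidence.items()):
            unnorm[world[var_index]] += p
    alpha = 1 / sum(unnorm.values())
    return {val: alpha * p for val, p in unnorm.items()}

# P(Cavity | toothache): variable 0 is Toothache, 2 is Cavity.
posterior = query(joint, 2, {0: True})
print(round(posterior[True], 1))  # 0.6
```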
Independence
Let us expand the full joint distribution by adding a fourth variable Weather with the domain {cloudy, sunny, rain, snow}
– the new full joint distribution has 2 × 2 × 2 × 4 = 32 elements (possible worlds).

P(toothache, catch, cavity, cloudy) = P(cloudy | toothache, catch, cavity) · P(toothache, catch, cavity)

Do one's dental problems influence the weather?
P(cloudy | toothache, catch, cavity) = P(cloudy)

We can write in general:
P(Toothache, Catch, Cavity, Weather) = P(Toothache, Catch, Cavity) · P(Weather)

Hence the full joint distribution can be constructed from two smaller tables, one with 8 elements and one with 4 elements.

This property is called (absolute) independence:
P(X|Y) = P(X) or P(Y|X) = P(Y) or P(X,Y) = P(X) · P(Y)
-
Conditional independence

Full independence allows us to reduce the size of the domain representation, but unfortunately full independence is rare, and even independent subsets can be quite large.

When one has a cavity, does catch depend on toothache?
P(catch | toothache, cavity) = P(catch | cavity)
P(catch | toothache, ¬cavity) = P(catch | ¬cavity)

Random variables Catch and Toothache are independent if we know the value of Cavity:
P(Catch | Toothache, Cavity) = P(Catch | Cavity)

This property is called conditional independence:
P(X|Y,Z) = P(X|Y) or P(Z|X,Y) = P(Z|Y) or P(Z,X|Y) = P(Z|Y) P(X|Y)
Exploiting conditional independence

Conditional independence can be used to further reduce the size of the domain representation.

P(Toothache, Catch, Cavity)
= P(Toothache | Catch, Cavity) P(Catch, Cavity)
= P(Toothache | Catch, Cavity) P(Catch | Cavity) P(Cavity)
= P(Toothache | Cavity) P(Catch | Cavity) P(Cavity)

The full joint distribution can be constructed from three smaller tables of sizes 2 + 2 + 1 = 5 (only independent elements are represented).
-
Diagnostic systems

Let us go back to diagnostic problems. Usually we are looking for a disease (the source of problems) based on symptoms (observations).
– we are interested in the diagnostic direction expressed as the conditional probability P(disease | symptoms)

However, from past experience we often have other information:
– the probability of the disease P(disease)
– the probability of the symptoms P(symptoms)
– the causal relation expressed as the conditional probability P(symptoms | disease)

How can this information be exploited to get the probability of the diagnostic direction?
Bayes' rule
Recall the product rule:
P(a ∧ b) = P(a|b) P(b) = P(b|a) P(a)

We can deduce the so-called Bayes' rule (law or theorem):
P(a|b) = P(b|a) P(a) / P(b)

in general:
P(Y|X) = P(X|Y) P(Y) / P(X) = α P(X|Y) P(Y)

It looks like two steps backward, as now we need to know P(X|Y), P(Y), P(X). But these are the values that we frequently have:
P(cause | effect) = P(effect | cause) P(cause) / P(effect)
– P(effect | cause) describes the causal direction
– P(cause | effect) describes the diagnostic relation
-
Using Bayes' rule

Medical diagnosis
– from past cases we know P(symptoms | disease), P(disease), P(symptoms)
– for a new patient we know the symptoms and we are looking for the diagnosis P(disease | symptoms)

Example:
– meningitis causes a stiff neck 70% of the time
– the prior probability of meningitis is 1/50000
– the prior probability of a stiff neck is 1%
What is the probability that a patient having a stiff neck has meningitis?

P(m|s) = P(s|m) · P(m) / P(s) = 0.7 · (1/50000) / 0.01 = 0.0014
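The meningitis numbers plug into Bayes' rule directly; a quick check in Python:

```python
# Bayes' rule for the meningitis example: P(m|s) = P(s|m) P(m) / P(s).
p_s_given_m = 0.7   # meningitis causes a stiff neck 70% of the time
p_m = 1 / 50000     # prior probability of meningitis
p_s = 0.01          # prior probability of a stiff neck

p_m_given_s = p_s_given_m * p_m / p_s
print(round(p_m_given_s, 6))  # 0.0014
```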
Why is the conditional probability for the diagnostic direction not stored directly?
• diagnostic knowledge is often more fragile than causal knowledge
• for example, if there is a sudden epidemic of meningitis, the unconditional probability of meningitis P(m) will go up, so P(m|s) should also go up, while the causal relation P(s|m) is unaffected by the epidemic, as it reflects how meningitis works
Naive Bayes model

What if there are more observations? We can exploit conditional independence as follows:

P(Toothache, Catch, Cavity) = P(Toothache | Cavity) P(Catch | Cavity) P(Cavity)

If all the effects are conditionally independent given the cause variable, we get:
P(Cause, Effect1, …, Effectn) = P(Cause) Πi P(Effecti | Cause)

Such a probability distribution is called a naive Bayes model (it is often used even in cases where the "effect" variables are not actually conditionally independent given the value of the cause variable).
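A naive Bayes posterior over a Boolean cause is just the factored product followed by normalization; the `naive_bayes` helper below is an illustrative sketch (the dental conditional tables are derived from the full joint given earlier):

```python
def naive_bayes(p_cause, p_effects, observed):
    """Posterior P(Cause | e1..en) for the naive Bayes model
    P(Cause, e1..en) = P(Cause) * prod_i P(ei | Cause), normalized.
    p_effects[i][cause] = P(effect_i = true | Cause = cause)."""
    unnorm = {}
    for cause in (True, False):
        p = p_cause if cause else 1 - p_cause
        for pe, e in zip(p_effects, observed):
            p *= pe[cause] if e else 1 - pe[cause]
        unnorm[cause] = p
    alpha = 1 / sum(unnorm.values())
    return {c: alpha * v for c, v in unnorm.items()}

# Dental example: Cause = Cavity, effects = Toothache and Catch.
post = naive_bayes(0.2, [{True: 0.6, False: 0.1},
                         {True: 0.9, False: 0.2}],
                   [True, True])
print(round(post[True], 3))  # P(cavity | toothache, catch) = 0.871
```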
-
The Wumpus world revisited

Wumpus is back! We have a maze with pits that are detected in neighboring squares via breeze (the Wumpus and gold will not be assumed now).

Where should the agent go if there is breeze at (1,2) and (2,1)?

Pure logical inference can conclude nothing about which square is most likely to be safe!

To which square should the agent go?
Wumpus: probabilistic model

Boolean variables:
Pi,j – pit at square (i,j)
Bi,j – breeze at square (i,j) (only for the observed squares B1,1, B1,2, and B2,1)

Full joint probability distribution:
P(P1,1, …, P4,4, B1,1, B1,2, B2,1) = P(B1,1, B1,2, B2,1 | P1,1, …, P4,4) · P(P1,1, …, P4,4)   (product rule)
P(P1,1, …, P4,4) = Πi,j P(Pi,j)   (pits are spread independently)
P(P1,1, …, P4,4) = 0.2^n · 0.8^(16−n)   (the probability of a pit is 0.2 and there are n pits)
-
Wumpus: query and simple reasoning

Assume that we have evidence:
b = ¬b1,1 ∧ b1,2 ∧ b2,1
known = ¬p1,1 ∧ ¬p1,2 ∧ ¬p2,1

We are interested in answering queries such as P(P1,3 | known, b).

The answer can be computed by enumeration of the full joint probability distribution. Let Unknown be the variables Pi,j except P1,3 and Known:
P(P1,3 | known, b) = α Σunknown P(P1,3, unknown, known, b)

But that means exploring all possible values of the variables Unknown, and there are 2^12 = 4096 terms!

Can we do it better (faster)?
Wumpus: conditional independence

Observation: The observed breezes are conditionally independent of the other variables given the known (white), frontier (yellow), and query variables.

We split the set of hidden variables into fringe and other variables:
Unknown = Fringe ∪ Other

From conditional independence we get:
P(b | P1,3, known, unknown) = P(b | P1,3, known, fringe)

Now, let us exploit this formula.
-
Wumpus: reasoning

P(P1,3 | known, b)
= α Σunknown P(P1,3, known, unknown, b)
= α Σunknown P(b | P1,3, known, unknown) P(P1,3, known, unknown)   (product rule P(X,Y) = P(X|Y) P(Y))
= α Σfringe Σother P(b | P1,3, known, fringe, other) P(P1,3, known, fringe, other)
= α Σfringe Σother P(b | P1,3, known, fringe) P(P1,3, known, fringe, other)   (conditional independence)
= α Σfringe P(b | P1,3, known, fringe) Σother P(P1,3, known, fringe, other)
= α Σfringe P(b | P1,3, known, fringe) Σother P(P1,3) P(known) P(fringe) P(other)   (pits are independent)
= α P(known) P(P1,3) Σfringe P(b | P1,3, known, fringe) P(fringe) Σother P(other)
= α′ P(P1,3) Σfringe P(b | P1,3, known, fringe) P(fringe)

where α′ = α · P(known) and Σother P(other) = 1.
Wumpus: solution

P(P1,3 | known, b) = α′ P(P1,3) Σfringe P(b | P1,3, known, fringe) P(fringe)
Let us explore the possible models (values) of Fringe that are compatible with the observation b.

P(P1,3 | known, b) = α′ 〈0.2 (0.04 + 0.16 + 0.16), 0.8 (0.04 + 0.16)〉 = 〈0.31, 0.69〉

P(P2,2 | known, b) = 〈0.86, 0.14〉

Definitely avoid the square (2,2)!
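The fringe sums for P(P1,3 | known, b) can be reproduced by brute-force enumeration over the two fringe squares (2,2) and (3,1); this sketch hard-codes the consistency condition for b (breeze at (1,2) needs a pit at (1,3) or (2,2); breeze at (2,1) needs a pit at (2,2) or (3,1)):

```python
from itertools import product

p_pit = 0.2  # prior probability of a pit in any square

# Enumerate the query variable P1,3 together with the fringe (P2,2, P3,1).
unnorm = {True: 0.0, False: 0.0}
for p13 in (True, False):
    for p22, p31 in product((True, False), repeat=2):
        if (p13 or p22) and (p22 or p31):  # fringe model compatible with b
            p_fringe = ((p_pit if p22 else 1 - p_pit) *
                        (p_pit if p31 else 1 - p_pit))
            unnorm[p13] += (p_pit if p13 else 1 - p_pit) * p_fringe

alpha = 1 / sum(unnorm.values())
posterior = {v: alpha * p for v, p in unnorm.items()}
print(round(posterior[True], 2))  # 0.31
```

The unnormalized pair is 〈0.2 · 0.36, 0.8 · 0.2〉 = 〈0.072, 0.16〉, matching the slide's fringe sums before normalization.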
-
Summary

Probability theory is a formal mechanism to handle uncertainty. The full joint distribution describes the probabilities of all possible worlds. Answers to queries can be obtained by summing out the probabilities of the possible worlds consistent with the observation. However, larger problems will require a better approach. We are going to exploit independence and conditional independence.
© 2016 Roman Barták, Department of Theoretical Computer Science and Mathematical Logic
bartak@ktiml.mff.cuni.cz