TRANSCRIPT
TDDD10 AI Programming
Automated Planning
Cyrille Berger
Planning slides borrowed from Dana Nau: http://www.cs.umd.edu/~nau/planning/slides/
Lectures
1 AI Programming: …
2 Introduction to …
3 Agents and Agents …
4 Multi-Agent and …
5 Multi-Agent Decision …
6 Cooperation And Coordination
7 Cooperation And Coordination
8 Machine Learning
9 Automated Planning
10 Putting It All Together
Lecture content
Automated Planning: What is planning?
Type of planners
Domain-dependent planners
Configurable planners
Reinforcement Learning: task allocation learning
Individual and Group Assignment
Automated Planning
What is planning?
Dictionary Definitions of "Plan"
1. A scheme, program, or method worked out beforehand for the accomplishment of an objective: a plan of attack.
2. A proposed or tentative project or course of action: had no plans for the evening.
3. A systematic arrangement of elements or important parts; a configuration or outline: a seating plan; the plan …
4. A drawing or diagram made to scale showing the structure or arrangement of something.
5. A program or policy stipulating a service or benefit: a pension plan.
AI Definition of Plan
"[a representation] of future behavior (...) usually a set of actions, with temporal and other constraints on them, for execution by some agent or agents."
– Austin Tate, MIT Encyclopedia of the Cognitive Sciences, 1999
State transition system
The real world is absurdly complex; we need to approximate it. Only represent what the planner needs to reason about.
State transition system Σ = (S, A, E, ɣ)
S = {abstract states}: e.g., states might include a robot's location, but not its position and orientation
A = {abstract actions}: e.g., "move robot from loc2 to loc1" may need a complex lower-level implementation
E = {abstract exogenous events}: not under the agent's control
ɣ = state transition function: gives the next state, or possible next states, after an action or event
ɣ: S × (A ∪ E) → S or ɣ: S × (A ∪ E) → {S₁, ...}
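The Σ = (S, A, E, ɣ) tuple can be sketched directly in code. The states, the single action, and the `step` helper below are hypothetical; ɣ is a partial function encoded as a dict:

```python
# A minimal sketch of a state-transition system Σ = (S, A, E, ɣ).
# States, the single action, and the step helper are hypothetical;
# ɣ is a partial function encoded as a dict.

S = {"s0", "s1"}
A = {"move"}
E = set()  # no exogenous events

gamma = {          # ɣ: S × A → S, partial
    ("s0", "move"): "s1",
    ("s1", "move"): "s0",
}

def step(state, action):
    """Apply ɣ; return the next state, or None if the action is inapplicable."""
    return gamma.get((state, action))

print(step("s0", "move"))  # -> s1
```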
Example
States S = {s₀, …, s₅}
A = {move1, move2, put, take, load, unload}
E = ∅
ɣ: S × A → S: defined in the figure on the right side
From plan to execution
Planning problem
Description of Σ = (S, A, E, ɣ)
Initial state or set of states
Objective: goal state, set of goal states, set of tasks, "trajectory" of states, objective function, …
Example: initial state = s₀, goal state = s₅
Plan
Classical plan: a sequence of actions: ⟨take, move1, load, move2⟩
Policy: a partial function from S into A:
{(s₀, take), (s₁, move1), (s₃, load), (s₄, move2)}
{(s₀, move1), (s₂, take), (s₃, load), (s₄, move2)}
Both, if executed starting at s₀, produce s₅
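Both plan representations can be executed against the transition function ɣ. The transition table below is a hypothetical reconstruction, chosen only to be consistent with the plan ⟨take, move1, load, move2⟩ and with both policies reaching s₅:

```python
# Executing a classical plan vs. a policy on the example system.
# The transition table is a hypothetical reconstruction consistent
# with the plan ⟨take, move1, load, move2⟩ and both policies.

gamma = {
    ("s0", "take"): "s1", ("s0", "move1"): "s2",
    ("s1", "move1"): "s3", ("s2", "take"): "s3",
    ("s3", "load"): "s4", ("s4", "move2"): "s5",
}

def run_plan(state, plan):
    """Apply a fixed action sequence, regardless of observed states."""
    for a in plan:
        state = gamma[(state, a)]
    return state

def run_policy(state, policy, goal):
    """Repeatedly look up the action for the current state until the goal."""
    while state != goal:
        state = gamma[(state, policy[state])]
    return state

plan = ["take", "move1", "load", "move2"]
policy = {"s0": "take", "s1": "move1", "s3": "load", "s4": "move2"}

print(run_plan("s0", plan))            # -> s5
print(run_policy("s0", policy, "s5"))  # -> s5
```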
Planning and Scheduling
Scheduling: decide when and how to perform a given set of actions. Time and resource constraints and priorities (e.g., the scheduler of your kernel). NP-complete.
Planning: decide what actions to use to achieve some set of objectives. Can be much worse than NP-complete; the worst case is undecidable.
Applications
Robotics: sequence of actions; path and motion planning
Industrial: printers; production machines (sheet-metal bending)
Forest fire
Packages
Games: playing bridge, ...
Type of planners
Type of planners
Domain-specific: made or tuned for a specific planning domain; won't work well (if at all) in other planning domains
Domain-independent: in principle, works in any planning domain; in practice, needs restrictions on the kind of planning domain
Configurable: domain-independent planning engine; input includes info about how to solve problems in some domain
Domain-Specific Planners
Most successful real-world planning systems work this way: Mars exploration, sheet-metal bending, playing bridge, etc.
Often use problem-specific techniques that are difficult to generalize to other planning domains
Domain-Specific Planner - Example
Bridge
After the cards are dealt, players make bids and prepare a plan that they will follow during the game
Domain specific: remove states depending on the cards you are holding
For instance, North will not choose "hearts" as the trump suit
Domain-Independent Planners
In principle, works in any planning domain: no domain-specific knowledge except the description of the system Σ
In practice: not feasible to make domain-independent planners work well in all possible planning domains
Make simplifying assumptions to restrict the set of domains: classical planning
Historical focus of most research on automated planning
Restrictive Assumptions
A0: Finite system: finitely many states, actions, events
A1: Fully observable: the controller always knows Σ's current state
A2: Deterministic: each action has only one outcome
A3: Static (no exogenous events): no changes but the controller's actions
A4: Attainment goals: a set of goal states Sg
A5: Sequential plans: a plan is a linearly ordered sequence of actions (a₁, a₂, ..., aₙ)
A6: Implicit time: no time durations; linear sequence of instantaneous states
A7: Off-line planning: the planner doesn't know the execution status
Domain-dependent planners
Classical Planning
Classical planning requires all eight restrictive assumptions: offline generation of action sequences for a deterministic, static, finite system, with complete knowledge, attainment goals, and implicit time
Reduces to the following: given a planning problem P = (Σ, s₀, Sg), find a sequence of actions (a₁, a₂, ..., aₙ) that produces a sequence of state transitions (s₁, s₂, ..., sₙ) such that sₙ is in Sg.
This is just path-searching in a graph
Nodes = states
Edges = actions
Is this trivial?
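Since classical planning reduces to path search over the state graph, a breadth-first search is already a minimal (if hopelessly unscalable) planner. The transition table is the same hypothetical reconstruction of the running example:

```python
from collections import deque

# Classical planning as path search: BFS over the state graph
# (nodes = states, edges = actions). The transition table is a
# hypothetical reconstruction of the slide's example.

gamma = {
    ("s0", "take"): "s1", ("s0", "move1"): "s2",
    ("s1", "move1"): "s3", ("s2", "take"): "s3",
    ("s3", "load"): "s4", ("s4", "move2"): "s5",
}

def plan_bfs(s0, goal_states):
    """Return a shortest action sequence from s0 into goal_states, or None."""
    frontier = deque([(s0, [])])
    visited = {s0}
    while frontier:
        state, plan = frontier.popleft()
        if state in goal_states:
            return plan
        for (s, a), s_next in gamma.items():   # expand applicable actions
            if s == state and s_next not in visited:
                visited.add(s_next)
                frontier.append((s_next, plan + [a]))
    return None

print(plan_bfs("s0", {"s5"}))  # -> ['take', 'move1', 'load', 'move2']
```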
Classical Planning
Generalize the earlier example:
5 locations,
3 robot vehicles,
100 containers,
3 pallets to stack containers on
This is probably just a single boat...
Then there are 10²⁷⁷ states
The number of particles in the universe is only about 10⁸⁷
The example is more than 10¹⁹⁰ times as large
Automated-planning research has been heavily dominated by classical planning: dozens (hundreds?) of different algorithms
Plan-Space Planning
Decompose sets of goals into the individual goals
Plan for them separately: bookkeeping info to detect and resolve interactions
Produce a partially ordered plan that retains as much flexibility as possible
The Mars rovers used a temporal-planning extension of this
Planning Graphs
Rough idea: first, solve a relaxed problem. Each "level" contains all effects of all applicable actions, even though the effects may contradict each other
Next, do a state-space search within the planning graph
Graphplan, IPP, CGP, DGP, LGP, PGP, SGP, TGP, ...
Heuristic Search
Heuristic function like those in A*: created using techniques similar to planning graphs
Problem: A* quickly runs out of memory, so do a greedy search instead
Greedy search can get trapped in local minima: use greedy search plus local search at local minima
HSP [Bonet & Geffner]
Fast Forward [Hoffmann]
Configurable planners
Configurable planners
In any fixed planning domain, a domain-independent planner usually will not work as well as a domain-specific planner made specifically for that domain. A domain-specific planner may be able to go directly toward a solution in situations where a domain-independent planner would explore many alternative paths.
But we don't want to write a whole new planner for every domain
Configurable planners: domain-independent planning engine
Input includes info about how to solve problems in the domain
Generally this means one can write a planning engine with fewer restrictions than domain-independent planners
Hierarchical Task Network (HTN) planning
Planning with control formulas
Planning with Control Formulas
At each state s, we have a control formula written in temporal logic
e.g., "never pick up x unless x needs to go on top of something else"
For each successor of s, derive a control formula using logical progression
Prune any successor state in which the progressed formula is false
TLPlan, TALplanner, ...
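A real system like TLPlan progresses a temporal-logic formula from state to state; the sketch below replaces progression with a plain predicate on states (a hypothetical simplification) just to show the pruning step:

```python
# Simplified sketch of pruning with control rules. Real systems such as
# TLPlan progress a temporal-logic formula; here the "formula" is
# approximated by a plain predicate on states (hypothetical example:
# a state is a pair (holding, needs_move)).

def control_rule(state):
    # "never hold an item unless it needs to move"
    holding, needs_move = state
    return not holding or needs_move

def successors(state):
    # hypothetical successor generator: toggle holding / toggle needs_move
    holding, needs_move = state
    return [(not holding, needs_move), (holding, not needs_move)]

def expand(state, successors, rule):
    """Generate successors, pruning any state where the rule is false."""
    return [s for s in successors(state) if rule(s)]

pruned = expand((False, False), successors, control_rule)
print(pruned)  # -> [(False, True)]: picking up is pruned, nothing needs moving
```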
HTN Planning (1/2)
Problem reduction: tasks (activities) rather than goals
Methods to decompose tasks into subtasks
Enforce constraints, backtrack if necessary (e.g., a taxi is not good for long distances)
Real-world applications
Noah, Nonlin, O-Plan, SIPE, SIPE-2, SHOP, SHOP2
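The problem-reduction idea can be sketched as a tiny SHOP-like decomposer. The travel domain, the 10 km threshold, and the task names are hypothetical; the point is that methods with preconditions rewrite a task into subtasks until only primitive tasks remain:

```python
# A minimal HTN-style problem-reduction sketch (in the spirit of SHOP,
# but greatly simplified). The travel domain and its threshold are
# hypothetical. Each method is a (precondition, subtasks) pair.

methods = {
    "travel": [
        (lambda st: st["distance"] <= 10, ["take-taxi"]),
        (lambda st: st["distance"] > 10, ["go-to-airport", "fly", "taxi-from-airport"]),
    ],
}

def decompose(task, state):
    """Recursively decompose a task; primitive tasks go straight into the plan."""
    if task not in methods:          # primitive task
        return [task]
    for precond, subtasks in methods[task]:
        if precond(state):           # first applicable method wins
            plan = []
            for sub in subtasks:
                plan += decompose(sub, state)
            return plan
    raise ValueError(f"no applicable method for {task}")

print(decompose("travel", {"distance": 300}))
# -> ['go-to-airport', 'fly', 'taxi-from-airport']
```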
HTN Planning (2/2)
Forward and Backward Search
In state-space planning, one must choose whether to search forward or backward
In HTN planning, there are two choices to make about direction: forward or backward, and up or down
Limitation of Ordered-Task Planning
Problem of total order
This could be nicer
Solved with the partial-order method
Planning in an uncertain world
Until now, we have assumed that each action has only one possible outcome. But often that's unrealistic.
In many situations, actions may have more than one possible outcome:
Action failures, e.g., the gripper drops its load
Exogenous events, e.g., a road is closed
We would like to be able to plan in such situations
One approach: Markov Decision Processes
Automated Planning - Summary
Domain-specific planner: write an entire computer program (lots of work); lots of domain-specific performance improvements
Domain-independent planner: just give it the basic actions (not much effort); not very efficient
Reinforcement Learning
Reinforcement Learning - Definition
Agents are given sensory inputs:
State s ∊ S
Reward R ∊ ℝ
At each step, the agent selects an output:
Action a ∊ A
Naïve Approach
Use supervised learning to learn: f(s, a) = R
For any input state, pick the best action: a = argmaxₐ∊A f(s, a)
Will that work?
Markov Decision Process (1/2)
The agent needs to think ahead!
It needs a good sequence of actions.
Formalized in the Markov Decision Process framework!
Markov Decision Process (2/2)
Finite set of states S, finite set of actions A
At each discrete time step, the agent observes state sₜ ∊ S, chooses action aₜ ∊ A, and receives an immediate reward rₜ.
The state changes to sₜ₊₁ ∊ S
Markov assumption: sₜ₊₁ = δ(sₜ, aₜ) and rₜ = r(sₜ, aₜ).
Policy Function
The policy function decides which action to take in each state: aₜ = π(sₜ)
The policy function is what we want to learn!
Rewards
To think ahead, an agent looks at future rewards: r(sₜ₊₁, aₜ₊₁), r(sₜ₊₂, aₜ₊₂), ...
Formalized as the discounted sum of rewards (also called utility or value):
V = ∑ₜ ɣᵗ r(sₜ, aₜ)
ɣ is the discount factor, making rewards far off into the future less valuable.
If we follow a specific policy π, the value of state sₜ is: Vπ(sₜ) = r(sₜ, π(sₜ)) + ɣ Vπ(sₜ₊₁)
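The discounted sum V = ∑ₜ ɣᵗ r(sₜ, aₜ) can be computed directly; the reward sequence below is hypothetical, and the two calls show how a smaller ɣ shrinks the value of a distant reward:

```python
# Discounted sum of rewards V = Σ ɣ^t r(s_t, a_t) for a hypothetical
# reward sequence. A smaller ɣ makes distant rewards matter less.

def discounted_return(rewards, gamma=0.9):
    return sum(gamma**t * r for t, r in enumerate(rewards))

rewards = [0, 0, 10]                     # reward arrives two steps in the future
print(discounted_return(rewards))        # 0.9**2 * 10 ≈ 8.1
print(discounted_return(rewards, 0.5))   # 0.5**2 * 10 = 2.5
```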
Value Function
Value function for random movement: (figure)
Optimal value function for the optimal policy: (figure)
Optimal Policy
If we follow a specific policy π: Vπ(sₜ) = r(sₜ, π(sₜ)) + ɣ Vπ(sₜ₊₁)
If we know Vπ(sₜ), then the policy π is given by: π(s) = argmaxₐ (r(s, a) + ɣ Vπ(δ(s, a)))
Finding the optimal policy is about finding π(s) or Vπ(sₜ) or both.
Value Iteration
Initialize the function V(s) with random values V₀(s)
For each state sₜ and each iteration k, compute: Vₖ₊₁(sₜ) = maxₐ (r(sₜ, a) + ɣ Vₖ(δ(sₜ, a)))
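The iteration above on a tiny hypothetical deterministic MDP (two states; action a in s1 pays reward 1). With ɣ = 0.9 the values should approach V(s₁) = 1/(1 - ɣ) = 10 and V(s₀) = ɣ · 10 = 9:

```python
# Value iteration on a tiny hypothetical deterministic MDP:
# V_{k+1}(s) = max_a ( r(s, a) + ɣ V_k(δ(s, a)) ).

delta = {("s0", "a"): "s1", ("s0", "b"): "s0",
         ("s1", "a"): "s1", ("s1", "b"): "s0"}
reward = {("s0", "a"): 0, ("s0", "b"): 0,
          ("s1", "a"): 1, ("s1", "b"): 0}
states, actions, gamma = ["s0", "s1"], ["a", "b"], 0.9

V = {s: 0.0 for s in states}            # V0: arbitrary initial values
for k in range(100):
    V = {s: max(reward[(s, a)] + gamma * V[delta[(s, a)]] for a in actions)
         for s in states}

print(V)  # V(s1) ≈ 1/(1-0.9) = 10, V(s0) ≈ 0.9 * 10 = 9
```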
Q-Function
Learning in an unknown environment
Optimal policy: π*(s) = argmaxₐ (r(s, a) + ɣ V*(δ(s, a)))
What if we do not know δ(s, a)? Or r(s, a)?
Q-Function: Q(s, a) = r(s, a) + ɣ V*(δ(s, a))
π*(s) = argmaxₐ Q(s, a)
Update the Q-Function
Q and V* are closely related: V*(s) = maxₐ Q(s, a)
Q can be written recursively:
Q(sₜ, aₜ) = r(sₜ, aₜ) + ɣ V*(δ(sₜ, aₜ)) = r(sₜ, aₜ) + ɣ maxₐ' Q(sₜ₊₁, a')
If Q̂ denotes the current approximation of Q, then it can be updated by: Q̂(s, a) := r + ɣ maxₐ' Q̂(s', a')
Q-Learning for Deterministic Worlds
For each s, a initialize the table entry Q̂(s, a) ⟵ 0.
Observe the current state s.
Do forever:
Select an action a and execute it
Receive the immediate reward r
Observe the new state s'
Update the table entry for Q̂(s, a): Q̂(s, a) := r + ɣ maxₐ' Q̂(s', a')
s ⟵ s'
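The loop above, run on the same hypothetical two-state MDP with purely random exploration. Since the world is deterministic, the slide's overwrite update suffices (no learning rate is needed yet):

```python
import random

# Tabular Q-learning for a deterministic world, following the update
# Q̂(s,a) := r + ɣ max_a' Q̂(s',a'). The two-state environment
# (delta, reward) is hypothetical.

delta = {("s0", "a"): "s1", ("s0", "b"): "s0",
         ("s1", "a"): "s1", ("s1", "b"): "s0"}
reward = {("s0", "a"): 0, ("s0", "b"): 0,
          ("s1", "a"): 1, ("s1", "b"): 0}
actions, gamma = ["a", "b"], 0.9

Q = {(s, a): 0.0 for s in ("s0", "s1") for a in actions}
random.seed(0)
s = "s0"
for _ in range(5000):
    a = random.choice(actions)                       # explore randomly
    r, s_next = reward[(s, a)], delta[(s, a)]
    Q[(s, a)] = r + gamma * max(Q[(s_next, a2)] for a2 in actions)
    s = s_next

print(Q[("s1", "a")])  # converges toward 1/(1-ɣ) = 10
```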
Q-Learning Example
Q-Learning for Nondeterministic Worlds
What if the world is non-deterministic?
V and Q are then expected values:
V = E[∑ₜ ɣᵗ r(sₜ, aₜ)]
Q(s, a) = E[r(s, a) + ɣ V*(δ(s, a))]
Q-Learning for Nondeterministic Worlds
Learning Q becomes: Q̂ₙ(s, a) := (1 - αₙ) Q̂ₙ₋₁(s, a) + αₙ (r + ɣ maxₐ' Q̂ₙ₋₁(s', a'))
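The α-weighted update on a hypothetical one-state toy problem: a single action whose reward is 0 or 1 with equal probability. With ɣ = 0.5, Q should approach E[r]/(1 - ɣ) = 0.5/0.5 = 1 despite the noisy rewards:

```python
import random

# Nondeterministic Q-learning update on a hypothetical one-state,
# one-action problem with a noisy reward:
# Q̂_n := (1 - α_n) Q̂_{n-1} + α_n (r + ɣ Q̂_{n-1}), with α_n = 1/n.

random.seed(0)
gamma, Q, n = 0.5, 0.0, 0
for _ in range(200000):
    n += 1
    alpha = 1 / n                  # decaying learning rate α_n
    r = random.choice([0, 1])      # noisy reward, E[r] = 0.5
    Q = (1 - alpha) * Q + alpha * (r + gamma * Q)

print(Q)  # approaches E[r]/(1-ɣ) = 1
```

The decaying learning rate averages out the reward noise, which the deterministic overwrite update cannot do.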
Task allocation learning
Context
Most tasks require more than one agent: extinguishing a fire
Which building to extinguish?
How many agents per task?
Task allocation learning
Fires are …
Local decision for which building to …
Selective decision tree with Q-values
At each step, use the tree to get the reward for extinguishing a specific building
Summary
Advantages: allows an agent to adapt to maximise rewards in a potentially unknown environment.
Disadvantages: requires computation polynomial in the number of states!
The number of states grows exponentially with input dimensions!
Reinforcement learning assumes discrete state and action spaces.
Individual and Group Assignment
Project
Groups of 4 to 6 students
Implement a RoboRescue team
Work individually on a subpart of the problem
Tasks
Foundation tasks: navigation, communication
Agents: police, ambulance, fire brigade
Exploration
Prediction
Task allocation
Reports
Individual plan: find around 4 related articles; write a one-page description; deadline: October 30th
Individual report: implement and evaluate the technique; write a report describing the technique, results and a discussion; deadline, draft: December 16th, final: January 6th; comments deadline: December 21st
Group report: one per team! A description of the algorithms and strategies used; deadline, final: January 6th
What is a good Report?
The grade is based on the report quality, such as readability, language, pictures, structure and length, and the level of technical detail weighted with the difficulty of the chosen approaches.
The reports should be 5-6 pages, but it is more important to make it possible for the reader to understand your work than to get the exact right number of pages.
Summary
Automated planning: classical planning problem, HTN
Reinforcement learning: Markov decision process, Q-Learning