TRANSCRIPT
Logical Induction   Andrew Critch   critch@intelligence.org
Logical Induction
Scott Garrabrant, Andrew Critch, Tsvi Benson-Tilsen, Nate Soares, Jessica Taylor
(scott|critch|tsvi|nate|jessica)@intelligence.org
Machine Intelligence Research Institute
http://intelligence.org/
Outline
Rough plan for this talk:
[5 mins] The problem of logical induction
[10 mins] Motivation from AI safety and other fields
[30 mins] Beamer presentation of technical results
[15 mins] Implications and takeaways
Credence after: | 1 min | 1 day | ∞
#1. P(D10 = 7) | 10% | 10% | 10%
#2. P(D10 = 7 | snapshot) | 10% | 15% | 16%
#3. P(10th digit of √10 = 7) | 10% | 1% | 0%
Snapshot for #2: [image]
Credences should change with time spent thinking/computing:
Probability theory gives rules for how probabilities should relate to each other and change with new observations, assuming logical omniscience…
…but what rules should credences follow over time, as computation is carried out on observations that have already been made?
Also, 50% would be a worse answer to start with here... can we make a principled theory from which this claim would follow?
Goal: call the purple processes “logical induction” and figure out how it should work.
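Row #3 of the table can be made concrete with a minimal sketch. It computes the leading digits of √10 with exact integer arithmetic, which shows that the answer is fixed by computation alone and involves no new observation. The function name `sqrt10_decimal_digits` and the reading of “10th digit” as the 10th digit after the decimal point are assumptions made for illustration, not part of the talk.

```python
from math import isqrt


def sqrt10_decimal_digits(n: int) -> str:
    """Return the first n digits of sqrt(10) after the decimal point.

    isqrt(10 * 10**(2n)) equals floor(sqrt(10) * 10**n) exactly, so no
    floating-point rounding is involved.
    """
    scaled = isqrt(10 * 10 ** (2 * n))
    # sqrt(10) is about 3.16, so the integer part is one leading digit.
    return str(scaled)[1:]


digits = sqrt10_decimal_digits(10)
tenth_digit = int(digits[9])  # assumed convention: 10th digit after the point
print(digits, tenth_digit)
```

Before running this, a credence near 10% for “the digit is 7” is reasonable. After running it, the credence should be 0% or 100%, which is the kind of change over computation time that the table describes.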
Why develop a theoretical model of logical induction?
Q: How can we reason about a highly capable AI system before it exists?
A: One approach is to model it as “good at stuff”, like:
• choosing actions to achieve objectives given beliefs → it roughly obeys rational choice theory (e.g. VNM theorem)
• updating beliefs according to new evidence → it roughly obeys probability theory (e.g. Bayes’ theorem)
• computing belief updates with resource limitations → it roughly obeys <?????> theory (e.g. <*****> theorem)
In hopes of developing it, <?????> has been called “logical uncertainty”, and we call the process of refining logical uncertainties “logical induction”.
Past desiderata for “good reasoning” under logical uncertainty:
1. computable approximability: the process should be approximable by a Turing Machine. (Demski, 2012)
2. coherent limit: after infinite time, credences should satisfy the laws of probability theory, such as (A→B) ⇒ (P(A) ≤ P(B)). (Gaifman, 1964)
3. partial coherence: credences at finite times should roughly satisfy some coherence properties, such as Q(A∧B) + Q(A∨B) ≈ Q(A) + Q(B). (Good, 1950; Hacking, 1967)
4. calibration: the process should be right roughly 90% of the time when it’s 90% confident. (Savage, 1967)
5. introspection: the process should be able to describe and reason about itself. (Hintikka, 1962; Fagin, 1995; Christiano, 2013; Campbell-Moore, 2015)
6. self-trust: it should understand that it is reliable and that it will become more reliable with time. (Hilbert, 1900)
7. non-dogmatism: it does not assign 100% or 0% credence to claims unless they have been proven or disproven, respectively. (Carnap, 1962; Gaifman, 1982; Snir, 1982)
8. PA-capable: it should assign non-zero probability to the consistency of Peano Arithmetic, i.e. to the set of consistent completions of PA.
9. rough inexploitability: it should not be easy to “Dutch book” the process / make bets against it that are guaranteed to win. (von Neumann and Morgenstern, 1944; de Finetti, 1979)
10. Gaifman inductivity: it should come to believe (∀x, f(x)) in the limit as it examines every example of x and confirms f(x). (Gaifman, 1964; Hutter, 2013)
11. Efficiency: it runs in polynomial (preferably quadratic) time.
12. Decision-relevant: should be able to focus computation on questions relevant to decisions.
13. Updates on old evidence. (Glymour, 1980)
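Two of the desiderata above, partial coherence (3) and calibration (4), are directly measurable. The following is a minimal sketch of how one might check them for a table of credences. The function names, the tolerance parameter, and the sample numbers are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def coherence_gap(q_a: float, q_b: float, q_and: float, q_or: float) -> float:
    """Absolute violation of Q(A and B) + Q(A or B) = Q(A) + Q(B)."""
    return abs((q_and + q_or) - (q_a + q_b))


def empirical_accuracy(credences: Sequence[float], outcomes: Sequence[bool],
                       target: float, tol: float = 0.05) -> float:
    """Fraction of true claims among those with credence within tol of target."""
    hits = [o for c, o in zip(credences, outcomes) if abs(c - target) <= tol]
    if not hits:
        raise ValueError("no claims with credence near the target")
    return sum(hits) / len(hits)


# Made-up credences that satisfy the coherence property exactly.
gap = coherence_gap(q_a=0.5, q_b=0.4, q_and=0.2, q_or=0.7)

# Made-up record: ten claims held at 90% credence, nine of which were true.
accuracy = empirical_accuracy([0.9] * 10, [True] * 9 + [False], target=0.9)
print(gap, accuracy)
```

A reasoner meeting desideratum 3 keeps `coherence_gap` small at finite times, and one meeting desideratum 4 has `empirical_accuracy` close to `target`.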
Let’s defer applications until later in the talk, when the idea has been made more precise.
Any questions so far about the problem itself before we get into formal definitions?
The current state of logical uncertainty theory
Domain of Study | Agent Concept | Minimalistic Sufficient Conditions | Desirability Arguments | Feasibility
rational choice theory / economics | VNM utility maximizer | VNM axioms | Dutch book arguments, compelling axioms, … | AIXI, POMDP solvers, …
probability theory | Bayesian updater | axioms of probability theory | Dutch book arguments, compelling axioms, … | Solomonoff induction
logical uncertainty theory | Garrabrant inductor | ??? | Dutch book arguments, historical desiderata, … | LIA2016
recent progress
Paths forward
1. Improving logical uncertainty theory (minimalistic conditions, more consequences…)
2. Using Garrabrant inductors / LIA2016 to pose and solve new problems in AI alignment
3. Other approaches to AI alignment*
MIRI’s focus
* Must eventually address logical uncertainty implicitly or explicitly, so expect some convergence.
How will logical induction be applicable?
Conceptual tools for reasoning about incentives, competition, and goal pursuit are under-developed for computationally bounded agents. They presume agents are logically omniscient, because we already had good theoretical models for developing them that way:
• Game theory and economics:
  – Von Neumann-Morgenstern utility theorem
  – Nash equilibria and correlated equilibria
  – Efficient market theory:
    • Fundamental theorems of welfare economics
    • Coase’s Theorem
  – Value of Information (VOI)
• Mechanism design:
  – Gibbard–Satterthwaite theorem
  – Myerson–Satterthwaite theorem
  – Revenue Equivalence theorem
We can use our theoretical model of logical induction to refine and expand these fields for better application to artificial agents.
Visualizing a theoretical application
Currently, game theory analyzes scenarios with logically omniscient agents…
Now we can better theoretically analyze scenarios with bounded reasoners:
What have we learned so far?
The following are more feasible than one might think:
• Inexploitability. An algorithm can satisfy a fairly arbitrary set of inexploitability conditions using Brouwer’s FPT.
• Self-trust. Introspection and self-trust need not lead to mathematical paradoxes.
• […], by an uncomputably large margin on poly-time generable questions.
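The role of a fixed-point theorem in inexploitability can be illustrated with a one-dimensional toy. This is a sketch under strong assumptions and not the construction from the paper: a single hypothetical trader has a continuous demand for one claim as a function of its price in [0, 1], and the intermediate value theorem (the one-dimensional case of Brouwer’s theorem) guarantees a price at which the trader wants to trade nothing. Bisection finds that price. The names `neutral_price` and `trader` are illustrative.

```python
from typing import Callable


def neutral_price(demand: Callable[[float], float], tol: float = 1e-9) -> float:
    """Find p in [0, 1] with demand(p) approximately 0 by bisection.

    Assumes demand is continuous with demand(0) >= 0 and demand(1) <= 0,
    i.e. the trader is willing to buy a free claim and to sell one that
    costs its full payout.
    """
    lo, hi = 0.0, 1.0
    if demand(lo) < 0 or demand(hi) > 0:
        raise ValueError("need demand(0) >= 0 and demand(1) <= 0")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if demand(mid) > 0:
            lo = mid  # trader still buys, so the price is too low
        else:
            hi = mid  # trader sells or abstains, so the price is high enough
    return (lo + hi) / 2


def trader(price: float) -> float:
    """Hypothetical trader who values the claim at 0.7 and trades in
    proportion to the perceived mispricing."""
    return 0.7 - price


price = neutral_price(trader)
print(round(price, 6))
```

At the returned price the trader holds no position, so it has nothing to gain from this claim. The talk’s point is that an analogous price-setting step exists for a fairly arbitrary set of such conditions.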
What have we learned so far?
The following are less “required” than one might think for a rational gambler to avoid exploitation:
• Calibration. So far it looks like one need only be calibrated about logical bets that are settled sufficiently quickly (this is being actively researched).
• Hard-coded belief coherence. A powerful bet-balancing procedure can and must learn to “mimic” deductive rules used to settle its bets.
Meta updates
MIRI’s general approach includes developing “big” questions about how AI can and should work, past the stages of philosophical conversation and into the domain of math and CS.
Philosophy | Mathematics/CS
big questions about AI | technical answers
Meta updates
I was not personally expecting logical induction to be “solved” in this way for at least a decade, so I’ve updated that:
• the methodology of breaking unsettled philosophical questions down into math/CS and grinding through them is more fruitful than I thought; and
• perhaps other seemingly “out of reach” problems in AI alignment, like decision theory and logical counterfactuals, might be amenable to this approach.
Thanks!
To
• Scott Garrabrant, for the core idea and many rapid subsequent insights
• Tsvi Benson-Tilsen, Nate Soares, and Jessica Taylor for coauthoring the paper
• Jimmy Rintjema for a lot of help with LaTeX bugs and collaborative editing issues
<end of this talk>
Slides from other talks I could end up wanting to use in response to questions:
Some questions
Is it feasible to build a useful superintelligence that, e.g.,
• Shares our values, and will not take them to extremes? (“value learning”)
• Will not compete with us for resources? (“convergent incentives”)
• Will not resist us modifying its goals or shutting it down? (“corrigibility”)
• Can understand itself without deriving contradictions via bounded Löb’s Theorem? (“self-reflective stability”)
Examples of technical understanding
• Vickrey second-price auctions (1961):
  – Well-understood optimality results (truthful bidding is optimal)
  – Real-world applications (network routing)
  – Decades of peer review
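The optimality claim for second-price auctions can be checked by brute force on a small grid. The sketch below is illustrative only: the function names, the grid, and the tie-breaking rule (ties go to a rival) are assumptions rather than details from the talk.

```python
def second_price_payoff(value: float, bid: float, rival_bids: list[float]) -> float:
    """Payoff to one bidder in a sealed-bid second-price auction.

    Assumption for this sketch: ties go to a rival, so the bidder wins
    only with a strictly highest bid, and then pays the highest rival bid.
    """
    top_rival = max(rival_bids)
    return value - top_rival if bid > top_rival else 0.0


def truthful_is_weakly_dominant(value: float, grid: list[float]) -> bool:
    """Check that bidding `value` never does worse than any other bid.

    Only the highest rival bid matters, so a single rival bid per case
    is enough to cover every configuration on the grid.
    """
    return all(
        second_price_payoff(value, value, [rival])
        >= second_price_payoff(value, bid, [rival])
        for rival in grid
        for bid in grid
    )


grid = [i / 10 for i in range(11)]
result = all(truthful_is_weakly_dominant(v, grid) for v in grid)
print(result)
```

Overbidding can only add wins at a price above the bidder’s value, and underbidding can only forfeit wins at a price below it, which is why the check passes for every value on the grid.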
• Nash equilibria (1951):
Problem: Counterfactuals for Self-Reflective Agents
What does it mean for a program A to improve some feature of a larger program E in which A is running, and which A can understand?
def Environment():
    …
    def Agent(senseData):
        …
    def Utility(globalVariables):
        …
    …
    do Agent(senseData1)
    …
    do Agent(senseData2)
    …
end
(optional pause for discussion of IndignationBot)
Example: π maximizing
What would happen if I changed the first digit of π to 9? This seems absurd because π is logically determined. However, the result of running a computer program (e.g. the evolution of the Schrödinger equation) is logically determined by its source code and inputs…
…when an agent reasons to do X “because X is better than Y”, considering what would happen if it did Y instead means considering a mathematical impossibility. (If the agent has access to its own source code, it can derive a contradiction from the hypothesis “I do Y”, from which anything follows.) This is clearly not how we want our AI to reason. How do we?
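The step “from which anything follows” is the principle of explosion. A minimal Lean 4 statement of it is given here. Reading `P` as the hypothesis “I do Y” and `hn` as the agent’s proof, from its own source code, that it does not do Y is an illustrative labeling and not notation from the talk.

```lean
-- From a hypothesis and its negation, any proposition Q follows.
example (P Q : Prop) (h : P) (hn : ¬P) : Q := absurd h hn
```

Since `Q` is arbitrary, an agent reasoning this way could “conclude” that doing Y yields any utility at all, which is why naive counterfactuals over the agent’s own actions are uninformative.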
Current formalisms are “Cartesian” in that they separate an agent’s source code and cognitive machinery from its environment.
This is a type error, and in combination with other subtleties, it has some serious consequences.
Examples (page 1)
• Robust Cooperation in the Prisoners’ Dilemma (LaVictoire et al., 2014) demonstrates non-classical cooperative behavior in agents with open source codes;
• Memory Issues of Intelligent Agents (Orseau and Ring, AGI 2012) notes that Cartesian agents are oblivious to damage to their cognitive machinery;
Examples (page 2)
• Space-Time Embedded Intelligence (Orseau and Ring, AGI 2012) provides a more naturalized framework for agents inside environments;
• Problems of self-reference in self-improving space-time embedded intelligence (Fallenstein and Soares, AGI 2014) identifies problems persisting in the Orseau-Ring framework, including procrastination and issues with self-trust arising from Löb’s theorem;
Examples (page 3)
• Vingean Reflection: Reliable Reasoning for Self-Improving Agents (Fallenstein and Soares, 2015) provides some approaches to resolving some of these issues;
• …lots more; see intelligence.org/research for additional reading.