truth-conduciveness without reliability: a non-theological explanation of ockham’s razor kevin t....
Post on 22-Dec-2015
220 views
TRANSCRIPT
Truth-conduciveness Truth-conduciveness Without Reliability: Without Reliability: A Non-Theological A Non-Theological
Explanation of Ockham’s Explanation of Ockham’s RazorRazor
Kevin T. KellyKevin T. KellyDepartment of PhilosophyDepartment of Philosophy
Carnegie Mellon UniversityCarnegie Mellon Universitywww.cmu.eduwww.cmu.edu
I. The Puzzle I. The Puzzle
Which Theory is True?Which Theory is True?
???
Ockham Says:Ockham Says:
Choose theSimplest!
But Why?But Why?
Gotcha!
PuzzlePuzzle
An indicator must be An indicator must be sensitivesensitive to to what it indicates.what it indicates.
simple
PuzzlePuzzle
An indicator must be An indicator must be sensitivesensitive to to what it indicates.what it indicates.
complex
PuzzlePuzzle
But Ockham’s razor always points But Ockham’s razor always points at simplicity.at simplicity.
simple
PuzzlePuzzle
But Ockham’s razor always points But Ockham’s razor always points at simplicity.at simplicity.
complex
PuzzlePuzzle
If a broken compass is known to If a broken compass is known to point North, then we already know point North, then we already know where North is. where North is.
complex
PuzzlePuzzle
But then who needs the compass?But then who needs the compass?
complex
Proposed AnswersProposed Answers
1.1. EvasiveEvasive
2.2. CircularCircular
3.3. MagicalMagical
A. EvasionsA. Evasions
Truth
A. EvasionsA. Evasions
Truth
TestabilityExplanation
Unity
Brevity
VirtuesVirtues
Simple theories have virtues:Simple theories have virtues: TestableTestable UnifiedUnified ExplanatoryExplanatory SymmetricalSymmetrical BoldBold Compress dataCompress data
VirtuesVirtues
Simple theories have virtues:Simple theories have virtues: TestableTestable UnifiedUnified ExplanatoryExplanatory SymmetricalSymmetrical BoldBold Compress dataCompress data
But to assume that the But to assume that the truthtruth has has these virtues is these virtues is wishful thinking.wishful thinking.[van Fraassen]
ConvergenceConvergence
At least a simplicity bias At least a simplicity bias doesn’t doesn’t preventprevent convergenceconvergence to the truth. to the truth.
Complexity
truth
ConvergenceConvergence
At least a simplicity bias At least a simplicity bias doesn’t doesn’t preventprevent convergenceconvergence to the truth. to the truth.
Complexity
truth
Plink!
Blam!
ConvergenceConvergence
At least a simplicity bias At least a simplicity bias doesn’t doesn’t preventprevent convergenceconvergence to the truth. to the truth.
Complexity
truth
Plink!Blam!
ConvergenceConvergence
At least a simplicity bias At least a simplicity bias doesn’t doesn’t preventprevent convergenceconvergence to the truth. to the truth.
Complexity
truth
Plink!
Blam!
ConvergenceConvergence
Convergence allows for Convergence allows for any theory any theory choice whatever in the short runchoice whatever in the short run, so , so this is not an argument for Ockham’s this is not an argument for Ockham’s razor razor nownow..
Alternative ranking
truth
OverfittingOverfitting Empirical estimatesEmpirical estimates based on based on
complex models have greater complex models have greater expected distance from the truthexpected distance from the truth
Truth
OverfittingOverfitting Empirical estimatesEmpirical estimates based on based on
complex models have greater complex models have greater expected distance from the truth.expected distance from the truth.
Pop!Pop!Pop!Pop!
OverfittingOverfitting Empirical estimatesEmpirical estimates based on based on
complex models have greater complex models have greater expected distance from the truth.expected distance from the truth.
clamp
Truth
OverfittingOverfitting Empirical estimatesEmpirical estimates based on based on
complex models have greater complex models have greater expected distance from the truth.expected distance from the truth.
clamp
Truth
Pop!Pop!Pop!Pop!
OverfittingOverfitting ...even if the simple theory is ...even if the simple theory is knownknown
to be false…to be false…
clamp
Four eyes!
C. CirclesC. Circles
Prior ProbabilityPrior Probability Assign high prior probability to Assign high prior probability to
simple theories.simple theories.
Simplicity is plausible now because it was yesterday.
Miracle ArgumentMiracle Argument
e e would would not be a miraclenot be a miracle given given C C;; ee would be a would be a miraclemiracle given given PP. .
P C
Miracle ArgumentMiracle Argument
e e would would not be a miraclenot be a miracle given given C C;; ee would be a would be a miraclemiracle given given PP. .
C S
’
However…However…
e e would would notnot be a miracle given be a miracle given PP((););
C S
Why not this?
The Real MiracleThe Real MiracleIgnorance Ignorance about model:about model:
pp((CC) ) pp((PP); );
++ Ignorance Ignorance about parameter setting:about parameter setting:
p’p’((PP(() | ) | PP) ) pp((PP((’ ’ ) | ) | PP).).
== Knowledge Knowledge about about CC vs. vs. PP(():):
pp((PP(()) << )) << pp((CC).).Is it knognorance or Ignoredge?
CP
The Ellsberg ParadoxThe Ellsberg Paradox
1/3 ? ?
The Ellsberg ParadoxThe Ellsberg Paradox
1/3 ? ?
Human betting Human betting preferencespreferences
>
> 1/3
The Ellsberg ParadoxThe Ellsberg Paradox
1/3 ? ?
>
Human betting Human betting preferencespreferences
>
> 1/3< 1/3
Human ViewHuman View
1/3 ? ?
>
Human betting Human betting preferencespreferences
>
knowledge ignorance
Bayesian ViewBayesian View
1/3 1/3 1/3
>
Human betting Human betting preferencespreferences
>
ignoredgeignoredge
MoralMoral
1/3 ? ?
Even in the most mundane contexts, when Bayesians offer to replace our ignorance with ignoredge, we vote with our feet.
Probable TrackingProbable Tracking1.1.If the simple theory If the simple theory SS were true, then were true, then
the data would probably be simple so the data would probably be simple so Ockham’s razor would probably Ockham’s razor would probably believe believe SS. .
2.2.If the simple theory If the simple theory SS were false, then were false, then the complex alternative theory the complex alternative theory CC would be true, so the data would would be true, so the data would probably be complex so you would probably be complex so you would probably believe probably believe CC rather than rather than SS. .
Probable TrackingProbable TrackingGiven that you use Ockham’s razor:Given that you use Ockham’s razor:
pp(B((B(SS) | ) | SS) = ) = pp((eeSS | | SS) = 1.) = 1.
pp(not-B((not-B(SS) | not-) | not-SS) = 1 - ) = 1 - pp((eeSS | | CC) = 1.) = 1.
Probable TrackingProbable TrackingGiven that you use Ockham’s razor:Given that you use Ockham’s razor:
pp(B((B(CC) | ) | CC) = 1 ) = 1
= probability that the data look = probability that the data look simple given simple given CC..
pp(B((B(CC) | not-) | not-CC) = 0 ) = 0
= probability that the data look = probability that the data look simple given alternative theory simple given alternative theory PP..
B. MagicB. Magic
Truth Simplicity
MagicMagic Simplicity Simplicity informsinforms via via hidden causeshidden causes..
G
Simple B(Simple)
Simple B(Simple)
Simple B(Simple)
Leibniz, evolution
Kant
Ouija board
MagicMagic SimplerSimpler to explain Ockham’s razor to explain Ockham’s razor
withoutwithout hidden causes. hidden causes.
Metaphysicians
for Ockham ?
Reductio of Naturalism Reductio of Naturalism (Koons 2000)(Koons 2000)
Suppose that the crucial probabilities Suppose that the crucial probabilities pp((TT | | TT) )
in the Bayesian miracle argument are natural in the Bayesian miracle argument are natural chances, so that Ockham’s razor really is chances, so that Ockham’s razor really is reliable.reliable.
Suppose that Suppose that TT is the fundamental theory of is the fundamental theory of natural chance, so that natural chance, so that TT determines the true determines the true
pp for some choice of for some choice of ..
But if But if pp((TT) is defined at all, it should be 1 if ) is defined at all, it should be 1 if = =
and 0 otherwise. and 0 otherwise. So natural science can only produce So natural science can only produce
fundamental knowledge of natural chance if fundamental knowledge of natural chance if there are non-natural chances.there are non-natural chances.
DiagnosisDiagnosis Indication or trackingIndication or tracking
Too Too strong: strong: Circles, evasions, or magic required.Circles, evasions, or magic required.
Convergence Convergence Too Too weakweak Doesn’t single out simplicityDoesn’t single out simplicity
ComplexSimple
ComplexSimple
DiagnosisDiagnosis Indication or trackingIndication or tracking
Too Too strong: strong: Circles or magic required.Circles or magic required.
Convergence Convergence Too Too weakweak Doesn’t single out simplicityDoesn’t single out simplicity
““Straightest” convergenceStraightest” convergence Just right?Just right?
Complex
Complex
Simple
Simple
ComplexSimple
II. Straightest II. Straightest ConvergenceConvergence
ComplexSimple
Empirical ProblemsEmpirical Problems
T1 T2 T3
Set Set KK of infinite of infinite input sequencesinput sequences.. Partition of Partition of KK into alternative into alternative
theoriestheories..K
Empirical MethodsEmpirical Methods
T1 T2 T3
Map finite input sequences to Map finite input sequences to theories or to “?”.theories or to “?”.
K
T3
e
Method ChoiceMethod Choice
T1 T2 T3
e1 e2 e3 e4
Input history
Output historyAt each stage, scientist can choose a new method (agreeing with past theory choices).
Aim: Aim: Converge to the Converge to the TruthTruth
T1 T2 T3
K
T3 ? T2 ? T1 T1 T1 T1 . . .T1 T1 T1
RetractionRetraction
Choosing Choosing TT and then not choosing and then not choosing TT next next
T’T’
TT
??
Aim: Aim: Eliminate Eliminate Needless Needless RetractionsRetractions
Truth
Aim: Aim: Eliminate Eliminate Needless Needless RetractionsRetractions
Truth
Aim: Eliminate Aim: Eliminate Needless Needless DelaysDelays to Retractions to Retractions
theory
applicationapplicationapplication
applicationcorollary
applicationtheory
applicationapplication
corollary applicationcorollary
Aim: Eliminate Aim: Eliminate Needless Needless DelaysDelays to Retractions to Retractions
Easy Retraction Time Easy Retraction Time ComparisonsComparisons
T1 T1 T2 T2
T1 T1 T2 T2 T3 T3T2 T4 T4
T2 T2
Method 1
Method 2
T4 T4 T4
. . .
. . .
at least as many at least as late
Worst-case Retraction Time Worst-case Retraction Time BoundsBounds
T1 T2
Output sequences
T1 T2
T1 T2
T4
T3
T3
T3
T3
T3 T3
T4
T4
T4
T4 T4
. . .
(1, 2, ∞)
. . .
. . .
. . .. . .
. . .T4
T4
T4
T1 T2 T3 T3 T3 T4T3 . . .
II. Ockham Without II. Ockham Without Circles, Evasions, Circles, Evasions,
or Magic or Magic
Curve FittingCurve Fitting
Data = open intervals around Data = open intervals around YY at rational values of at rational values of X.X.
Curve FittingCurve Fitting
No effects:No effects:
Curve FittingCurve Fitting
First-order effect:First-order effect:
Curve FittingCurve Fitting
Second-order effect:Second-order effect:
Empirical EffectsEmpirical Effects
Empirical EffectsEmpirical Effects
Empirical EffectsEmpirical Effects
May take arbitrarily long to discoverMay take arbitrarily long to discover
Empirical EffectsEmpirical Effects
May take arbitrarily long to discoverMay take arbitrarily long to discover
Empirical EffectsEmpirical Effects
May take arbitrarily long to discoverMay take arbitrarily long to discover
Empirical EffectsEmpirical Effects
May take arbitrarily long to discoverMay take arbitrarily long to discover
Empirical EffectsEmpirical Effects
May take arbitrarily long to discoverMay take arbitrarily long to discover
Empirical EffectsEmpirical Effects
May take arbitrarily long to discoverMay take arbitrarily long to discover
Empirical EffectsEmpirical Effects
May take arbitrarily long to discoverMay take arbitrarily long to discover
Empirical TheoriesEmpirical Theories True theory determined by which True theory determined by which
effects appear.effects appear.
Empirical ComplexityEmpirical Complexity
More complex
Background ConstraintsBackground Constraints
More complex
Background ConstraintsBackground Constraints
More complex
?
Background ConstraintsBackground Constraints
More complex
?
Ockham’s RazorOckham’s Razor Don’t select a theory unless it is Don’t select a theory unless it is
uniquely simplest in light of uniquely simplest in light of experience.experience.
Weak Ockham’s RazorWeak Ockham’s Razor Don’t select a theory unless it Don’t select a theory unless it
among the simplest in light of among the simplest in light of experience.experience.
StalwartnessStalwartness Don’t retract your answer while it Don’t retract your answer while it
is uniquely simplestis uniquely simplest
StalwartnessStalwartness Don’t retract your answer while it Don’t retract your answer while it
is uniquely simplestis uniquely simplest
Uniform ProblemsUniform Problems All paths of accumulating effects All paths of accumulating effects
starting at a level have the same length.starting at a level have the same length.
Timed Retraction BoundsTimed Retraction Bounds
rr((M, e, nM, e, n)) = = the least timed retraction the least timed retraction bound covering the total timed bound covering the total timed retractionsretractions of of M M along input streams along input streams of complexityof complexity n n that extendthat extend e e
Empirical Complexity 0 1 2 3 . . .
. . .
M
EfficiencyEfficiency of Method of Method M M at at ee
MM convergesconverges to the truth no matter to the truth no matter what;what;
For each convergent For each convergent M’M’ that agrees that agrees with with MM up to the end of up to the end of ee, and for , and for each each nn:: rr((MM, , ee, , nn) ) rr((M’M’, , ee, , nn))
Empirical Complexity 0 1 2 3 . . .
. . .
M M’
MM is is Strongly Beaten Strongly Beaten at at ee
There exists convergent There exists convergent M’M’ that that agrees with agrees with MM up to the end of up to the end of ee, , such thatsuch that For each For each nn, , rr((MM, , ee, , nn) ) >> rr((M’M’, , ee, , nn).).
Empirical Complexity 0 1 2 3 . . .
. . .
M M’
MM is is Weakly Beaten Weakly Beaten at at ee
There exists convergent There exists convergent M’M’ that that agrees with agrees with MM up to the end of up to the end of ee, , such thatsuch that For each For each nn, , rr((MM, , ee, , nn) ) rr((M’M’, , ee, , nn);); Exists Exists nn, , rr((MM, , ee, , nn) ) > > rr((M’M’, , ee, , nn).).
Empirical Complexity 0 1 2 3 . . .
. . .
M M’
IdeaIdea No matter what convergent No matter what convergent MM has done has done
in the past, nature can in the past, nature can forceforce MM to to produce each answer down an arbitrary produce each answer down an arbitrary effect path, arbitrarily often.effect path, arbitrarily often.
Nature can also Nature can also forceforce violators of violators of Ockham’s razor or stalwartness either Ockham’s razor or stalwartness either into an into an extraextra retraction or a retraction or a latelate retraction in retraction in eacheach complexity class. complexity class.
Ockham Violation with Ockham Violation with RetractionRetraction
Ockham violation
Extra retraction in eachcomplexity class
Ockham Violation without Ockham Violation without RetractionRetraction
Ockham violationLate retraction in eachcomplexity class
UniformUniform Ockham Efficiency Ockham Efficiency TheoremTheorem
Let Let MM be a be a solution solution to ato a uniform uniform problem. The following are problem. The following are equivalent:equivalent: MM is is strongly Ockhamstrongly Ockham and and stalwartstalwart
at at ee;; MM is is efficientefficient at at ee;; MM is is notnot strongly beatenstrongly beaten at at ee..
IdeaIdea
Similar, but if convergent Similar, but if convergent MM already violates strong Ockham’s already violates strong Ockham’s razor by favoring an answer razor by favoring an answer TT at at the root of a longer path, the root of a longer path, sticking with sticking with TT may reduce may reduce retractions in complexity classes retractions in complexity classes reached only along the longer reached only along the longer path. path.
Violation Favoring Shorter Violation Favoring Shorter PathPath
Ockham violation
?
Late or extra retraction ineach complexity class
Non-uniform problem
Violation Favoring Longer Violation Favoring Longer Path without RetractionPath without Retraction
Ockham violation
?
Ouch!Extra retraction in eachcomplexity class!
Non-uniform problem
But at But at FirstFirst Violation… Violation…
First Ockham violation
??
? Breaks even each class.
Non-uniform problem
But at But at FirstFirst Violation… Violation…
First Ockham violation
??
? Breaks even each class.
Loses in class 0 whentruth is red.
Non-uniform problem
Ockham Efficiency Ockham Efficiency TheoremTheorem
Let Let MM be a be a solutionsolution. The . The following are equivalent:following are equivalent: MM is is alwaysalways strongly Ockham and strongly Ockham and
stalwart;stalwart; MM is is alwaysalways efficient; efficient; MM is is nevernever weaklyweakly beaten. beaten.
Application: Causal Application: Causal InferenceInference Causal graph theoryCausal graph theory: more correlations : more correlations
more causes.more causes.
Idealized dataIdealized data = list of conditional = list of conditional dependencies discovered so far.dependencies discovered so far.
AnomalyAnomaly = the addition of a conditional = the addition of a conditional dependency to the list.dependency to the list.
partial correlations
S G(S)
Causal Path RuleCausal Path Rule XX, , YY are are dependent dependent conditional on conditional on
set set SS of variables not containing of variables not containing XX, , YY iff iff XX, , YY are connected by at least are connected by at least one path in which: one path in which: no non-collider is in no non-collider is in SS and and each collider has a descendent in each collider has a descendent in SS. .
S
X Y
[Pearl, SGS]
Forcible Sequence of Forcible Sequence of ModelsModels
X Y Z W
Forcible Sequence of Forcible Sequence of ModelsModels
X Y Z W
X dep Y | {Z}, {W}, {Z,W}
Forcible Sequence of Forcible Sequence of ModelsModels
X Y Z W
X dep Y | {Z}, {W}, {Z,W}Y dep Z | {X}, {W}, {X,W}X dep Z | {Y}, {Y,W}
Forcible Sequence of Forcible Sequence of ModelsModels
X Y Z W
X dep Y | {Z}, {W}, {Z,W}Y dep Z | {X}, {W}, {X,W}X dep Z | {Y}, {W}, {Y,W}
Forcible Sequence of Forcible Sequence of ModelsModels
X Y Z W
X dep Y | {Z}, {W}, {Z,W}Y dep Z | {X}, {W}, {X,W}X dep Z | {Y}, {W}, {Y,W}Z dep W| {X}, {Y}, {X,Y}Y dep W| {Z}, {X,Z}
Forcible Sequence of Forcible Sequence of ModelsModels
X Y Z W
X dep Y | {Z}, {W}, {Z,W}Y dep Z | {X}, {W}, {X,W}X dep Z | {Y}, {W}, {Y,W}Z dep W| {X}, {Y}, {X,Y}Y dep W| {X}, {Z}, {X,Z}
Policy PredictionPolicy Prediction
Y Z
Y Z
Y Z
Y Z
Consistent policy Consistent policy estimatorestimator can be can be forcedforced into retractions. into retractions.
““Failure of Failure of uniform uniform consistencyconsistency”.”.
No non-trivial No non-trivial confidence confidence intervalinterval..[Robins, Wasserman, Zhang][Robins, Wasserman, Zhang]
MoralMoral
Not Not true model true model vs.vs. prediction prediction.. Issue: Issue: actual actual vs.vs.
counterfactual model counterfactual model selectionselection and and predictionprediction..
In counterfactual prediction, In counterfactual prediction, formform of model matters and of model matters and retractions are unavoidableretractions are unavoidable..
Y Z
Y Z
Y Z
Y Z
IV. SimplicityIV. Simplicity
AimAim
General General definition definition ofof simplicitysimplicity..
ProveProve Ockham efficiency Ockham efficiency theoremtheorem for general for general definition.definition.
ApproachApproach
Empirical complexityEmpirical complexity reflects reflects nested problems of inductionnested problems of induction posed by the problem.posed by the problem.
Hence, simplicity is Hence, simplicity is problem-problem-relativerelative..
Empirical ProblemsEmpirical Problems
T1 T2 T3
Set Set KK of infinite of infinite input sequencesinput sequences.. Partition of Partition of KK into alternative into alternative
theoriestheories..
K
Grove SystemsGrove Systems A A sphere systemsphere system for for KK is just a is just a
downward-nested sequence of downward-nested sequence of subsets of subsets of KK starting with starting with K.K.
0
1
2
K
Grove SystemsGrove Systems Think of successive differences as Think of successive differences as
levels of increasing levels of increasing empirical empirical complexitycomplexity in in KK..
0
1
2
Answer-preservingAnswer-preserving Grove Grove SystemsSystems
No answer is No answer is splitsplit across levels. across levels.
0
1
2
Answer-preservingAnswer-preserving Grove Grove SystemsSystems
Refine offending answer if Refine offending answer if necessary.necessary.
0
1
2
Data-drivenData-driven Grove Grove SystemsSystems
Each answer is decidable given a Each answer is decidable given a complexity level.complexity level.
Each upward union of levels is Each upward union of levels is verifiable.verifiable.
Verifiable
0
1
2
DecidableDecidable
0
1
2
Grove System UpdateGrove System Update Update by restriction.Update by restriction.
Grove System UpdateGrove System Update Update by restrictionUpdate by restriction
0
1
ForcibleForcible Grove Systems Grove Systems At each stage, the data presented by a At each stage, the data presented by a
world at a level are compatible with the world at a level are compatible with the next level up (if there is a next level).next level up (if there is a next level).
. . .
Forcible PathForcible Path A forcible restriction of a Grove A forcible restriction of a Grove
system.system.
0
1
2
Forcible Path to TopForcible Path to Top A forcible restriction of a Grove A forcible restriction of a Grove
system that intersects with every system that intersects with every level.level.
0
1
2
Simplicity ConceptSimplicity Concept A data-driven, answer-preserving Grove A data-driven, answer-preserving Grove
system for which each restriction to a system for which each restriction to a possible data event has a forcible path to the possible data event has a forcible path to the top. top.
0
1
2
Uniform Simplicity Uniform Simplicity ConceptsConcepts
If a data event intersects a level, it If a data event intersects a level, it intersects each higher level.intersects each higher level.
0
1
2
UniformUniform Ockham Efficiency Ockham Efficiency TheoremTheorem
Let Let MM be a be a solution solution to ato a uniform uniform problem. The following are problem. The following are equivalent:equivalent: MM is is strongly Ockhamstrongly Ockham and and stalwartstalwart
at at ee;; MM is is efficientefficient at at ee;; MM is is strongly beatenstrongly beaten at at ee..
Ockham Efficiency Ockham Efficiency TheoremTheorem
Let Let MM be a be a solutionsolution. The . The following are equivalent:following are equivalent: MM is is alwaysalways strongly Ockham and strongly Ockham and
stalwart;stalwart; MM is is alwaysalways efficient; efficient; MM is is nevernever weaklyweakly beaten. beaten.
V. Stochastic V. Stochastic OckhamOckham
Mixed StrategiesMixed Strategies Require that the strategy Require that the strategy converge in converge in
chancechance to the true model. to the true model.
Sample size
Chance of producing true model at parameter
. . .
Retractions in ChanceRetractions in Chance
Total drop in chanceTotal drop in chance of of producing an arbitrary answer as producing an arbitrary answer as sample size increases.sample size increases.
Retraction in Retraction in signalsignal, not actual , not actual retractions due to retractions due to noisenoise..
Sample size
Chance of producing true model at parameter
. . .
Ockham EfficiencyOckham Efficiency
Bound retractions in chance by easy Bound retractions in chance by easy comparisons of time and magnitude.comparisons of time and magnitude.
Ockham efficiency still follows.Ockham efficiency still follows.
Sample size
Chance of producing true model at parameter
. . .
(0, 0, .5, 0, 0, 0, .5, 0, 0, …)
Classification ProblemsClassification Problems
Points from plane sampled IID, labeled with Points from plane sampled IID, labeled with half-plane membership. half-plane membership. Edge of half-plane Edge of half-plane is some polynomialis some polynomial. What is its degree?. What is its degree?
Uniform Ockham efficiency theoremUniform Ockham efficiency theorem applies.applies.
[Cosma Shalizi]
Model Selection Model Selection ProblemsProblems
Random variables.Random variables. IID sampling.IID sampling. Joint distribution continuously Joint distribution continuously
parametrized.parametrized. Partition over parameter space.Partition over parameter space. Each partition cell is a “model”.Each partition cell is a “model”. Method maps sample sequences to models.Method maps sample sequences to models.
Two Dimensional Two Dimensional ExampleExample
Assume:Assume: independent bivariate normal independent bivariate normal distribution of unit variance.distribution of unit variance.
Question:Question: how manyhow many components of the components of the joint mean are zero?joint mean are zero?
Intuition: Intuition: more nonzeros = more more nonzeros = more complexcomplex
Puzzle:Puzzle: How does it help to favor How does it help to favor simplicity in less-than-simplest worlds?simplicity in less-than-simplest worlds?
A Standard Model Selection A Standard Model Selection MethodMethod
Bayes Information Criterion (BIC)Bayes Information Criterion (BIC) BIC(BIC(MM, sample) = , sample) =
- log(max prob that - log(max prob that MM can assign to can assign to sample) + sample) +
+ log(sample size) + log(sample size) model complexity model complexity ½. ½.
BIC method: choose BIC method: choose MM with least with least BIC score.BIC score.
Official BIC PropertyOfficial BIC Property
In the limit, minimizing BIC finds a In the limit, minimizing BIC finds a model with maximal conditional model with maximal conditional probability when the prior probability probability when the prior probability is flat over models and fairly flat over is flat over models and fairly flat over parameters within a model. parameters within a model.
But it is also mind-change-efficient.But it is also mind-change-efficient.
Toy ProblemToy Problem Truth is bivariate normal of known covariance.Truth is bivariate normal of known covariance. Count non-zero components of mean vector.Count non-zero components of mean vector.
Pure MethodPure Method Acceptance zones for different Acceptance zones for different
answers in sample mean space.answers in sample mean space.
ComplexComplex
SimpleSimple
Performance in Simplest Performance in Simplest WorldWorld
nn = 2 = 2 = (0, 0).= (0, 0).
Retractions Retractions = 0= 0
ComplexComplex
SimpleSimple
-2 -1 0 1 2 3
-2
-1
0
1
2
3
95%
Performance in Simplest Performance in Simplest WorldWorld
nn = 2 = 2 = (0, 0).= (0, 0).
Retractions Retractions = 0= 0
ComplexComplex
SimpleSimple
-2 -1 0 1 2 3
-2
-1
0
1
2
3
Performance in Simplest Performance in Simplest WorldWorld
nn = 100 = 100 = (0, 0).= (0, 0).
Retractions Retractions = 0= 0
ComplexComplex
SimpleSimple
-0.75 -0.5 -0.25 0 0.25 0.5 0.75 1
-0.75
-0.5
-0.25
0
0.25
0.5
0.75
1
Performance in Simplest Performance in Simplest WorldWorld
nn = 4,000,000 = 4,000,000 = (0, 0).= (0, 0).
Retractions Retractions = 0= 0
ComplexComplex
SimpleSimple
-0.0075-0.005-0.0025 0 0.0025 0.005 0.0075
-0.0075
-0.005
-0.0025
0
0.0025
0.005
0.0075
Performance in Simplest Performance in Simplest WorldWorld
nn = 20,000,000 = 20,000,000 = (0, 0).= (0, 0).
Retractions Retractions = 0= 0
ComplexComplex
SimpleSimple
-0.006 -0.004 -0.002 0 0.002 0.004 0.006
-0.006
-0.004
-0.002
0
0.002
0.004
0.006
Performance in Complex Performance in Complex WorldWorld
nn = 2 = 2 = (.05, .005).= (.05, .005).
Retractions Retractions = 0= 0
-2 -1 0 1 2 3
-2
-1
0
1
2
3
ComplexComplex
SimpleSimple
95%
Performance in Complex Performance in Complex WorldWorld
nn = 100 = 100 = (.05, .005).= (.05, .005).
-0.75 -0.5 -0.25 0 0.25 0.5 0.75 1
-0.75
-0.5
-0.25
0
0.25
0.5
0.75
1
Retractions Retractions = 0= 0
ComplexComplex
SimpleSimple
Performance in Complex Performance in Complex WorldWorld
nn = 30,000 = 30,000 = (.05, .005).= (.05, .005).
-0.1 -0.05 0 0.05 0.1
-0.1
-0.05
0
0.05
0.1 Retractions Retractions
= 1= 1
ComplexComplex
SimpleSimple
Performance in Complex Performance in Complex WorldWorld
nn = 4,000,000 (!) = 4,000,000 (!) = (.05, .005).= (.05, .005).
-0.04 -0.02 0 0.02 0.04
-0.04
-0.02
0
0.02
0.04 Retractions Retractions = 2= 2
ComplexComplex
SimpleSimple
Causal Inference from Causal Inference from Stochastic DataStochastic Data
Suppose that the true linear causal Suppose that the true linear causal model is:model is:
X Y Z W
.998
.99 -.99 .1
Variables are standard normal
Causal Inference from Causal Inference from Stochastic DataStochastic Data
[Scheines, Mayo-Wilson, and Fancsali]
X Y Z W
Sample size 40. In 9 out of 10 samples, PC algorithm outputs:
Sample size 100,000. In 9 out of 10 samples, PC outputs truth:
X Y Z W
Variables standard normal
Deterministic Sub-Deterministic Sub-problemsproblems
Membership degree = 0
Membership Degree = 1w
n
Worst-case cost at Worst-case cost at ww = =
supsupw’ w’ mem(mem(ww, , w’w’) ) XX cost( cost(w’w’))
Worst-case cost = supWorst-case cost = supw w worst-case cost at worst-case cost at w.w.
Statistical Sub-problemsStatistical Sub-problems
p p’p’
Membership(p, p’) = 1 – (p, p’)
Worst-case cost at Worst-case cost at pp = =
supsupw’w’ mem(mem(pp, , p’p’) ) XX cost( cost(pp))
Worst-case cost = supWorst-case cost = supp p worst-case cost at worst-case cost at p.p.
Future DirectionFuture Direction
Consistency:Consistency: Converge to production of Converge to production of true answer with chance > 1 - true answer with chance > 1 - ..
Compare worst-case timed bounds on Compare worst-case timed bounds on retractions in chance of retractions in chance of -consistent -consistent methods over each complexity class.methods over each complexity class. Generalized powerGeneralized power: minimizing retraction : minimizing retraction
time forces simple acceptance zones to be time forces simple acceptance zones to be powerful.powerful.
Generalized significanceGeneralized significance: minimizing : minimizing retractions forces simple zone to be size retractions forces simple zone to be size
BalanceBalance balance depends on balance depends on ..
V. Conclusion:V. Conclusion:
Ockham’s RazorOckham’s Razor
NecessaryNecessary for staying on the for staying on the straightest pathstraightest path to the to the truthtruth
Does not Does not point atpoint at or or indicateindicate the the truth.truth.
Works without Works without circlescircles, , evasionsevasions, or , or magicmagic..
Such a theory is motivated in Such a theory is motivated in counterfactual inferencecounterfactual inference and and estimationestimation..
(with C. Glymour) (with C. Glymour) “Why Probability Does Not “Why Probability Does Not Capture the Logic of Scientific Justification”,Capture the Logic of Scientific Justification”, C. C. Hitchcock, ed., Hitchcock, ed., Contemporary Debates in the Philosophy of ScienceContemporary Debates in the Philosophy of Science, , Oxford: Blackwell, 2004.Oxford: Blackwell, 2004.
““Justification as Truth-finding Efficiency: How Justification as Truth-finding Efficiency: How Ockham's Razor Works”,Ockham's Razor Works”, Minds and MachinesMinds and Machines 14: 2004, 14: 2004, pp. 485-505.pp. 485-505.
““Ockham's Razor, Efficiency, and the Unending Ockham's Razor, Efficiency, and the Unending Game of Science”,Game of Science”, forthcoming in proceedings, forthcoming in proceedings, Foundations Foundations of the Formal Sciences 2004: Infinite Game Theoryof the Formal Sciences 2004: Infinite Game Theory, Springer, under , Springer, under review.review.
““How Simplicity Helps You Find the Truth How Simplicity Helps You Find the Truth Without PointingWithout Pointingat it”,at it”, forthcoming, V. Harazinov, M. Friend, and N. Goethe, eds.forthcoming, V. Harazinov, M. Friend, and N. Goethe, eds.Philosophy of Mathematics and InductionPhilosophy of Mathematics and Induction, Dordrecht: Springer., Dordrecht: Springer.
““Ockham's Razor, Empirical Complexity, and Ockham's Razor, Empirical Complexity, and Truth-findingTruth-findingEfficiency”,Efficiency”, forthcomingforthcoming, Theoretical Computer Science, Theoretical Computer Science..
““Learning, Simplicity, Truth, and Learning, Simplicity, Truth, and Misinformation”,Misinformation”, forthcoming inVan Benthem, J. and forthcoming inVan Benthem, J. and Adriaans, P., eds. Philosophy of Information.Adriaans, P., eds. Philosophy of Information.
Further ReadingFurther Reading
II. Navigation II. Navigation Without a Without a Compass Compass
Asking for DirectionsAsking for Directions
Where’s …Where’s …
Asking for DirectionsAsking for DirectionsTurn around. The Turn around. The freeway ramp is freeway ramp is on the left.on the left.
Asking for DirectionsAsking for Directions
Goal
Helpful AdviceHelpful Advice
Goal
Best RouteBest Route
Goal
Best Route to Any GoalBest Route to Any Goal
Disregarding Advice is Disregarding Advice is BadBad
Extra U-turn
Best Route to Any GoalBest Route to Any Goal
…so fixed advice can help you reach a hidden goalwithout circles, evasions, or magic.
There is no difference whatsoever in It. He goes from death to There is no difference whatsoever in It. He goes from death to
death, who sees difference, as it were, in Itdeath, who sees difference, as it were, in It [Brihadaranyaka [Brihadaranyaka 4.4.19-20] 4.4.19-20]
"Living in the midst of ignorance and considering themselves "Living in the midst of ignorance and considering themselves intelligent and enlightened, the senseless people go round and intelligent and enlightened, the senseless people go round and round, following crooked courses, just like the blind led by the round, following crooked courses, just like the blind led by the
blind." Katha Upanishad I. ii. 5.blind." Katha Upanishad I. ii. 5.
AcademicAcademic
AcademicAcademic
Poof!If there weren’t an apple on the tableI wouldn’t be a brain in a vat, so I wouldn’t see one.