Truth-conduciveness Without Reliability: A Non-Theological Explanation of Ockham’s Razor

Kevin T. Kelly
Department of Philosophy
Carnegie Mellon University
www.cmu.edu

I. The Puzzle

Which Theory is True?

???

Ockham Says:

Choose the Simplest!

But Why?

Gotcha!

Puzzle

An indicator must be sensitive to what it indicates.

But Ockham’s razor always points at simplicity.

If a broken compass is known to point North, then we already know where North is.

But then who needs the compass?

[Figure panels: simple, complex.]

Proposed Answers

1. Evasive

2. Circular

3. Magical

A. Evasions

Truth

Testability, Explanation, Unity, Brevity

Virtues

Simple theories have virtues:
Testable
Unified
Explanatory
Symmetrical
Bold
Compress data

But to assume that the truth has these virtues is wishful thinking. [van Fraassen]

Convergence

At least a simplicity bias doesn’t prevent convergence to the truth.

[Figure: theories ranked by complexity, with the truth marked; callouts "Plink!" and "Blam!".]

Convergence allows for any theory choice whatever in the short run, so this is not an argument for Ockham’s razor now.

[Figure: an alternative ranking, with the truth marked.]

Overfitting

Empirical estimates based on complex models have greater expected distance from the truth...

...even if the simple theory is known to be false…

[Figure: estimates scattered around the Truth, with a clamp; callouts "Pop! Pop! Pop! Pop!" and "Four eyes!".]

C. Circles

Prior Probability

Assign high prior probability to simple theories.

Simplicity is plausible now because it was yesterday.

Miracle Argument

e would not be a miracle given C;
e would be a miracle given P.

[Figure labels: P, C, S.]

However…

e would not be a miracle given P(θ);

Why not this?

The Real Miracle

Ignorance about model:
p(C) ≈ p(P);

+ Ignorance about parameter setting:
p(P(θ) | P) ≈ p(P(θ’) | P).

= Knowledge about C vs. P(θ):
p(P(θ)) << p(C).

Is it knognorance or Ignoredge?
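The arithmetic behind this slide can be sketched numerically. The sketch below assumes, purely for illustration, that the complex theory P has k equally weighted parameter settings and that C and P each get prior 1/2; the function name `prior_over_settings` and the choice of k are hypothetical, not from the talk.

```python
from fractions import Fraction

def prior_over_settings(k):
    """Flat prior over the models {C, P}, then flat over P's k parameter settings.

    Returns (p(C), p(P(theta))) for any single setting theta.
    k is an illustrative assumption; the talk does not fix a number of settings.
    """
    p_c = Fraction(1, 2)      # ignorance about model: p(C) = p(P)
    p_p = Fraction(1, 2)
    p_setting = p_p / k       # ignorance about the setting within P
    return p_c, p_setting

for k in (1, 10, 1000):
    p_c, p_setting = prior_over_settings(k)
    # The last column is how strongly the "ignorant" prior favors C over P(theta).
    print(k, p_c, p_setting, p_c / p_setting)
```

Two layers of professed ignorance combine into a prior that favors C over each particular P(θ) by a factor of k, which is the slide's point about "ignoredge".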

The Ellsberg Paradox

[Figure: an urn with one known proportion, 1/3, and two unknown proportions, "?" and "?".]

Human betting preferences

[Figure: preferred bets marked ">", annotated "> 1/3" and "< 1/3".]

Human View

[Figure: 1/3, ?, ?; labeled "knowledge" and "ignorance".]

Bayesian View

[Figure: 1/3, 1/3, 1/3; labeled "ignoredge".]

Moral

Even in the most mundane contexts, when Bayesians offer to replace our ignorance with ignoredge, we vote with our feet.
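A quick check of why the human preferences resist a single prior. This sketch assumes the standard three-color version of the Ellsberg urn (red known to be 1/3, black and yellow unknown but summing to 2/3) and the usual pair of preferences; the slides do not spell out the colors, so those labels are an assumption.

```python
from fractions import Fraction

def rationalizing_priors(steps=60):
    """Search a grid of priors for the unknown color 'black'.

    Returns every grid value of p(black) under which an expected-value
    bettor would show both typical human preferences:
      (1) bet on red           over  bet on black
      (2) bet on black-or-yellow  over  bet on red-or-yellow
    """
    p_red = Fraction(1, 3)
    found = []
    for i in range(steps + 1):
        p_black = Fraction(2, 3) * Fraction(i, steps)
        p_yellow = Fraction(2, 3) - p_black
        prefers_red = p_red > p_black
        prefers_black_or_yellow = p_black + p_yellow > p_red + p_yellow
        if prefers_red and prefers_black_or_yellow:
            found.append(p_black)
    return found

print(rationalizing_priors())  # prints []
```

The first preference needs p(black) < 1/3 and the second needs p(black) > 1/3, so no grid value qualifies, whatever the resolution.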

Probable Tracking

1. If the simple theory S were true, then the data would probably be simple, so Ockham’s razor would probably believe S.

2. If the simple theory S were false, then the complex alternative theory C would be true, so the data would probably be complex, so you would probably believe C rather than S.

Probable Tracking

Given that you use Ockham’s razor:

p(B(S) | S) = p(e_S | S) = 1.

p(not-B(S) | not-S) = 1 - p(e_S | C) = 1.

Probable Tracking

Given that you use Ockham’s razor:

p(B(C) | C) = 1
= probability that the data look simple given C.

p(B(C) | not-C) = 0
= probability that the data look simple given alternative theory P.

B. Magic

[Figure: Truth, Simplicity.]

Magic

Simplicity informs via hidden causes.

[Figure: G, Simple, B(Simple); variants labeled "Leibniz, evolution", "Kant", and "Ouija board".]

Magic

Simpler to explain Ockham’s razor without hidden causes.

[Figure: "Metaphysicians for Ockham?"]

Reductio of Naturalism (Koons 2000)

Suppose that the crucial probabilities p(T | T) in the Bayesian miracle argument are natural chances, so that Ockham’s razor really is reliable.

Suppose that T is the fundamental theory of natural chance, so that T determines the true p for some choice of .

But if p(T) is defined at all, it should be 1 if = and 0 otherwise.

So natural science can only produce fundamental knowledge of natural chance if there are non-natural chances.

Diagnosis

Indication or tracking
Too strong: circles, evasions, or magic required.

Convergence
Too weak: doesn’t single out simplicity.

“Straightest” convergence
Just right?

[Figure: paths between Simple and Complex.]

II. Straightest Convergence

Empirical Problems

Set K of infinite input sequences.
Partition of K into alternative theories.

[Figure: K partitioned into T1, T2, T3.]

Empirical Methods

Map finite input sequences to theories or to “?”.

[Figure: a finite input sequence e in K, mapped to T3.]

Method Choice

At each stage, the scientist can choose a new method (agreeing with past theory choices).

[Figure: input history e1, e2, e3, e4 and output history over T1, T2, T3.]

Aim: Converge to the Truth

[Figure: output sequence T3 ? T2 ? T1 T1 T1 T1 . . . T1 T1 T1.]

Retraction

Choosing T and then not choosing T next.

[Figure: T followed by T’ or by “?”.]
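The definition of a retraction can be sketched in a few lines. The sketch assumes an output history is a list of strings, with "?" standing for suspended judgment; the function names are illustrative.

```python
def retraction_times(outputs):
    """Stages at which a previously chosen theory is no longer chosen.

    A retraction happens at stage t when the output at t-1 was a theory
    (not "?") and the output at t is anything else.
    """
    return [
        t
        for t, (prev, nxt) in enumerate(zip(outputs, outputs[1:]), start=1)
        if prev != "?" and nxt != prev
    ]

def count_retractions(outputs):
    """Total number of retractions along an output history."""
    return len(retraction_times(outputs))

# Output history in the style of the convergence slide.
history = ["T3", "?", "T2", "?", "T1", "T1", "T1"]
print(retraction_times(history))   # prints [1, 3]
print(count_retractions(history))  # prints 2
```

Moving from "?" to a theory is not counted, since nothing was chosen at the earlier stage.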

Aim: Eliminate Needless Retractions

[Figure: Truth.]

Aim: Eliminate Needless Delays to Retractions

[Figure: a theory with its corollaries and applications.]

Easy Retraction Time Comparisons

[Figure: output sequences of Method 1 and Method 2 over T1, T2, T3, T4, compared by "at least as many" retractions and "at least as late".]
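One way to read "at least as many, at least as late" is sketched below. It assumes each method is summarized by the list of stages at which it retracts, and it treats method B as no better than method A when every retraction of A can be paired with a distinct retraction of B that occurs no earlier. This pairing rule is an interpretation of the slide, not a quotation of the talk's formal definition.

```python
def no_worse(times_a, times_b):
    """True if B retracts at least as many times as A, each matched at least as late.

    Pairing A's sorted retraction times with B's latest retraction times is
    enough: if any valid pairing exists, this one is valid too.
    """
    a, b = sorted(times_a), sorted(times_b)
    if len(a) > len(b):
        return False
    tail = b[len(b) - len(a):]          # B's latest len(a) retractions
    return all(ta <= tb for ta, tb in zip(a, tail))

print(no_worse([2], [2, 5, 7]))    # A is no worse than B
print(no_worse([2, 5, 7], [2]))    # A has more retractions than B
print(no_worse([4], [3]))          # same count, but A retracts later
```

Under this reading the comparison is only a partial order: two methods can each be worse than the other in one respect, and then neither call returns True.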

Worst-case Retraction Time Bounds

[Figure: tree of output sequences over T1, T2, T3, T4, with worst-case bound (1, 2, ∞).]

III. Ockham Without Circles, Evasions, or Magic

Curve Fitting

Data = open intervals around Y at rational values of X.

[Figure panels: No effects; First-order effect; Second-order effect.]

Empirical Effects

May take arbitrarily long to discover.

Empirical Theories

True theory determined by which effects appear.

Empirical Complexity

[Figure: theories ordered toward "More complex".]

Background Constraints

[Figure: the complexity ordering under background constraints, toward "More complex".]

Ockham’s Razor

Don’t select a theory unless it is uniquely simplest in light of experience.

Weak Ockham’s Razor

Don’t select a theory unless it is among the simplest in light of experience.

Stalwartness

Don’t retract your answer while it is uniquely simplest.
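A toy method that obeys both rules, as a sketch under stated assumptions: theories are labeled by how many effects they posit ("T0", "T1", ...), each observation is either `None` (no new effect) or the name of a newly detected effect, and the simplest theory compatible with experience posits exactly the effects seen so far. The labels and the encoding of observations are hypothetical.

```python
def ockham_method(observations):
    """Answer after each observation: the theory positing only the effects seen.

    Ockham: the answer is always the uniquely simplest theory compatible
    with the data so far. Stalwart: the answer changes only when a new
    effect appears, i.e. only when it stops being uniquely simplest.
    """
    answers, effects = [], set()
    for obs in observations:
        if obs is not None:
            effects.add(obs)
        answers.append(f"T{len(effects)}")
    return answers

stream = [None, None, "linear", None, "quadratic"]
print(ockham_method(stream))  # prints ['T0', 'T0', 'T1', 'T1', 'T2']
```

Each retraction here is forced by a newly observed effect; the method never jumps ahead to a more complex theory on its own initiative.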

Uniform Problems

All paths of accumulating effects starting at a level have the same length.

Timed Retraction Bounds

r(M, e, n) = the least timed retraction bound covering the total timed retractions of M along input streams of complexity n that extend e.

[Figure: bounds for M by empirical complexity 0, 1, 2, 3, . . .]

Efficiency of Method M at e

M converges to the truth no matter what;

For each convergent M’ that agrees with M up to the end of e, and for each n:
r(M, e, n) ≤ r(M’, e, n).

M is Strongly Beaten at e

There exists convergent M’ that agrees with M up to the end of e, such that
for each n, r(M, e, n) > r(M’, e, n).

M is Weakly Beaten at e

There exists convergent M’ that agrees with M up to the end of e, such that
for each n, r(M, e, n) ≥ r(M’, e, n);
there exists n such that r(M, e, n) > r(M’, e, n).
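The three comparisons can be sketched on toy data. This sketch simplifies the bounds r(M, e, n) to plain retraction counts indexed by complexity class n (the talk's bounds also carry retraction times), and it compares M against a finite, hand-picked list of rivals rather than all convergent methods. Both simplifications are assumptions made for illustration.

```python
def strongly_beats(r_rival, r_m):
    """Rival does strictly better than M in every complexity class."""
    return all(rm > rr for rm, rr in zip(r_m, r_rival))

def weakly_beats(r_rival, r_m):
    """Rival is never worse than M and strictly better in some class."""
    pairs = list(zip(r_m, r_rival))
    return all(rm >= rr for rm, rr in pairs) and any(rm > rr for rm, rr in pairs)

def efficient_among(r_m, rivals):
    """M's bound is at most each rival's bound in every complexity class."""
    return all(all(rm <= rr for rm, rr in zip(r_m, r)) for r in rivals)

ockham   = [0, 1, 2]   # hypothetical bounds for classes n = 0, 1, 2
violator = [1, 2, 3]   # one extra retraction in each class
mixed    = [0, 2, 2]   # worse in class 1 only

print(strongly_beats(ockham, violator))          # True
print(strongly_beats(ockham, mixed))             # False
print(weakly_beats(ockham, mixed))               # True
print(efficient_among(ockham, [violator, mixed]))  # True
```

Being strongly beaten implies being weakly beaten, so "never weakly beaten" is the more demanding requirement on a method.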

Idea

No matter what convergent M has done in the past, nature can force M to produce each answer down an arbitrary effect path, arbitrarily often.

Nature can also force violators of Ockham’s razor or stalwartness either into an extra retraction or a late retraction in each complexity class.

Ockham Violation with Retraction

Ockham violation

Extra retraction in each complexity class

Ockham Violation without Retraction

Ockham violation

Late retraction in each complexity class

Uniform Ockham Efficiency Theorem

Let M be a solution to a uniform problem. The following are equivalent:
M is strongly Ockham and stalwart at e;
M is efficient at e;
M is not strongly beaten at e.

Idea

Similar, but if convergent M already violates strong Ockham’s razor by favoring an answer T at the root of a longer path, sticking with T may reduce retractions in complexity classes reached only along the longer path.

Violation Favoring Shorter Path

Ockham violation
Late or extra retraction in each complexity class
Non-uniform problem

Violation Favoring Longer Path without Retraction

Ockham violation
Ouch! Extra retraction in each complexity class!
Non-uniform problem

But at First Violation…

First Ockham violation
Breaks even in each class.
Loses in class 0 when truth is red.
Non-uniform problem

Ockham Efficiency Theorem

Let M be a solution. The following are equivalent:
M is always strongly Ockham and stalwart;
M is always efficient;
M is never weakly beaten.

Application: Causal Inference

Causal graph theory: more correlations ⇒ more causes.

Idealized data = list of conditional dependencies discovered so far.

Anomaly = the addition of a conditional dependency to the list.

[Figure: partial correlations S mapped to graph G(S).]

Causal Path Rule

X, Y are dependent conditional on set S of variables not containing X, Y iff X, Y are connected by at least one path in which:
no non-collider is in S and
each collider has a descendant in S.

[Figure: a path between X and Y, with conditioning set S.]

[Pearl, SGS]
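The causal path rule can be checked mechanically on small graphs. The sketch below assumes a DAG given as a dict from each node to the set of its children, enumerates simple paths by brute force, and counts every node as one of its own descendants (so a collider that is itself in S also opens the path). That last convention and all function names are assumptions of this sketch.

```python
def descendants(dag, node):
    """The node itself plus everything reachable along directed edges."""
    seen, stack = {node}, [node]
    while stack:
        for child in dag.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

def simple_paths(dag, x, y):
    """All simple paths from x to y, ignoring edge direction."""
    nbrs = {}
    for parent, children in dag.items():
        for child in children:
            nbrs.setdefault(parent, set()).add(child)
            nbrs.setdefault(child, set()).add(parent)

    def extend(path):
        if path[-1] == y:
            yield path
            return
        for nxt in nbrs.get(path[-1], ()):
            if nxt not in path:
                yield from extend(path + [nxt])

    yield from extend([x])

def dependent(dag, x, y, s):
    """Path rule: some path has no non-collider in s and every collider
    with a descendant in s."""
    s = set(s)
    for path in simple_paths(dag, x, y):
        active = True
        for a, b, c in zip(path, path[1:], path[2:]):
            collider = b in dag.get(a, ()) and b in dag.get(c, ())
            if collider:
                if not (descendants(dag, b) & s):
                    active = False
            elif b in s:
                active = False
        if active:
            return True
    return False

chain = {"X": {"Y"}, "Y": {"Z"}}                   # X -> Y -> Z
collider = {"X": {"Z"}, "Y": {"Z"}, "Z": {"W"}}    # X -> Z <- Y, Z -> W
print(dependent(chain, "X", "Z", set()), dependent(chain, "X", "Z", {"Y"}))
print(dependent(collider, "X", "Y", set()), dependent(collider, "X", "Y", {"W"}))
```

Conditioning on the middle of a chain blocks the dependence, while conditioning on a collider or its descendant creates one, which is how new conditional dependencies can keep appearing as anomalies.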

Forcible Sequence of Models

X Y Z W

X Y Z W
X dep Y | {Z}, {W}, {Z,W}

X Y Z W
X dep Y | {Z}, {W}, {Z,W}
Y dep Z | {X}, {W}, {X,W}
X dep Z | {Y}, {Y,W}

X Y Z W
X dep Y | {Z}, {W}, {Z,W}
Y dep Z | {X}, {W}, {X,W}
X dep Z | {Y}, {W}, {Y,W}

X Y Z W
X dep Y | {Z}, {W}, {Z,W}
Y dep Z | {X}, {W}, {X,W}
X dep Z | {Y}, {W}, {Y,W}
Z dep W | {X}, {Y}, {X,Y}
Y dep W | {Z}, {X,Z}

X Y Z W
X dep Y | {Z}, {W}, {Z,W}
Y dep Z | {X}, {W}, {X,W}
X dep Z | {Y}, {W}, {Y,W}
Z dep W | {X}, {Y}, {X,Y}
Y dep W | {X}, {Z}, {X,Z}

Policy Prediction

[Figure: alternative causal models relating Y and Z.]

Consistent policy estimator can be forced into retractions.

“Failure of uniform consistency”.

No non-trivial confidence interval.
[Robins, Wasserman, Zhang]

Moral

Not true model vs. prediction.
Issue: actual vs. counterfactual model selection and prediction.

In counterfactual prediction, form of model matters and retractions are unavoidable.

[Figure: alternative causal models relating Y and Z.]

IV. Simplicity

Aim

General definition of simplicity.

Prove Ockham efficiency theorem for general definition.

Approach

Empirical complexity reflects nested problems of induction posed by the problem.

Hence, simplicity is problem-relative.

Empirical Problems

Set K of infinite input sequences.
Partition of K into alternative theories.

[Figure: K partitioned into T1, T2, T3.]

Grove Systems

A sphere system for K is just a downward-nested sequence of subsets of K starting with K.

Think of successive differences as levels of increasing empirical complexity in K.

[Figure: nested spheres in K with levels 0, 1, 2.]

Answer-preserving Grove Systems

No answer is split across levels.

Refine offending answer if necessary.

[Figure: levels 0, 1, 2.]
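A finite toy version of these definitions, as a sketch: worlds are strings, a sphere system is a list of sets that shrink from K inward, levels are the successive differences, and an answer is a set of worlds. Real problems have infinite K; the finite encoding and the function names are assumptions made for illustration.

```python
def is_sphere_system(spheres, k):
    """Downward-nested sequence of subsets of K that starts with K itself."""
    starts_with_k = bool(spheres) and set(spheres[0]) == set(k)
    nested = all(set(b) <= set(a) for a, b in zip(spheres, spheres[1:]))
    return starts_with_k and nested

def levels(spheres):
    """Successive differences: level i is sphere i minus sphere i+1."""
    out = []
    for i, sphere in enumerate(spheres):
        inner = spheres[i + 1] if i + 1 < len(spheres) else set()
        out.append(set(sphere) - set(inner))
    return out

def answer_preserving(spheres, answers):
    """No answer is split across two or more levels."""
    lv = levels(spheres)
    return all(
        sum(1 for level in lv if level & set(answer)) <= 1
        for answer in answers
    )

K = {"w0", "w1", "w2", "w3"}
spheres = [K, {"w1", "w2", "w3"}, {"w3"}]
print(is_sphere_system(spheres, K))
print(levels(spheres))
print(answer_preserving(spheres, [{"w0"}, {"w1", "w2"}, {"w3"}]))
print(answer_preserving(spheres, [{"w0", "w1"}, {"w2"}, {"w3"}]))
```

In the last call the answer {"w0", "w1"} straddles levels 0 and 1; refining it into {"w0"} and {"w1"} restores answer preservation.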

Data-driven Grove Systems

Each answer is decidable given a complexity level.

Each upward union of levels is verifiable.

[Figure: levels 0, 1, 2, annotated "Decidable" and "Verifiable".]

Grove System Update

Update by restriction.

[Figure: levels 0, 1 after restriction.]

Forcible Grove Systems

At each stage, the data presented by a world at a level are compatible with the next level up (if there is a next level).

Forcible Path

A forcible restriction of a Grove system.

[Figure: levels 0, 1, 2.]

Forcible Path to Top

A forcible restriction of a Grove system that intersects with every level.

[Figure: levels 0, 1, 2.]

Simplicity Concept

A data-driven, answer-preserving Grove system for which each restriction to a possible data event has a forcible path to the top.

[Figure: levels 0, 1, 2.]

Uniform Simplicity Concepts

If a data event intersects a level, it intersects each higher level.

[Figure: levels 0, 1, 2.]

Uniform Ockham Efficiency Theorem

Let M be a solution to a uniform problem. The following are equivalent:
M is strongly Ockham and stalwart at e;
M is efficient at e;
M is not strongly beaten at e.

Ockham Efficiency Theorem

Let M be a solution. The following are equivalent:
M is always strongly Ockham and stalwart;
M is always efficient;
M is never weakly beaten.

V. Stochastic Ockham

Mixed Strategies

Require that the strategy converge in chance to the true model.

[Figure: chance of producing the true model at a parameter, plotted against sample size.]

Retractions in Chance

Total drop in chance of producing an arbitrary answer as sample size increases.

Retraction in signal, not actual retractions due to noise.

[Figure: chance plotted against sample size.]
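Retractions in chance can be computed from a chance curve. The sketch assumes the chance of producing one fixed answer is given as a list indexed by sample size; the numbers are made up, and exact fractions are used to avoid rounding noise.

```python
from fractions import Fraction

def retractions_in_chance(chances):
    """Drop in the chance of producing an answer at each sample-size step.

    Rises in chance contribute 0; only drops count as retraction.
    """
    return [max(prev - nxt, 0) for prev, nxt in zip(chances, chances[1:])]

def total_retraction_in_chance(chances):
    """Total drop in chance as sample size increases."""
    return sum(retractions_in_chance(chances))

half = Fraction(1, 2)
# Hypothetical chance of answering "simple" at successive sample sizes.
curve = [Fraction(0), half, half, Fraction(0), half, Fraction(0)]
print(retractions_in_chance(curve))
print(total_retraction_in_chance(curve))  # prints 1
```

The per-step list plays the role of a timed sequence of retractions, so the time-and-magnitude comparisons used in the deterministic case have something to act on.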

Ockham Efficiency

Bound retractions in chance by easy comparisons of time and magnitude.

Ockham efficiency still follows.

[Figure: chance plotted against sample size, with retractions in chance (0, 0, .5, 0, 0, 0, .5, 0, 0, …).]

Classification Problems

Points from plane sampled IID, labeled with half-plane membership. Edge of half-plane is some polynomial. What is its degree?

Uniform Ockham efficiency theorem applies.

[Cosma Shalizi]

Model Selection Problems

Random variables.
IID sampling.
Joint distribution continuously parametrized.
Partition over parameter space.
Each partition cell is a “model”.
Method maps sample sequences to models.

Two Dimensional Example

Assume: independent bivariate normal distribution of unit variance.

Question: how many components of the joint mean are zero?

Intuition: more nonzeros = more complex.

Puzzle: How does it help to favor simplicity in less-than-simplest worlds?

A Standard Model Selection Method

Bayes Information Criterion (BIC)

BIC(M, sample) = - log(max prob that M can assign to sample) + log(sample size) × model complexity × ½.

BIC method: choose M with least BIC score.
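The BIC score above can be worked through for the two-dimensional example. The sketch assumes independent unit-variance normal components, natural logarithms, and model complexity equal to the number of mean components allowed to be nonzero; the function names and the tiny data set are illustrative.

```python
import math
from itertools import product

def neg_log_lik(data, mean):
    """Negative log-likelihood of 2-D points under independent unit-variance normals."""
    squared = sum((x - mean[0]) ** 2 + (y - mean[1]) ** 2 for x, y in data)
    return len(data) * math.log(2 * math.pi) + 0.5 * squared

def bic(data, free):
    """BIC score of the model whose mean components flagged in `free` may be nonzero."""
    n = len(data)
    sample_mean = (sum(x for x, _ in data) / n, sum(y for _, y in data) / n)
    # Maximum likelihood: free components sit at the sample mean, the rest at 0.
    best_mean = tuple(m if f else 0.0 for m, f in zip(sample_mean, free))
    complexity = sum(free)  # number of components allowed to be nonzero
    return neg_log_lik(data, best_mean) + 0.5 * complexity * math.log(n)

def bic_choice(data):
    """The model (as a pair of 'may be nonzero' flags) with least BIC score."""
    return min(product([False, True], repeat=2), key=lambda free: bic(data, free))

sample = [(0.1, 2.1), (-0.1, 1.9)]
print(bic_choice(sample))  # prints (False, True)
```

In this setup, freeing a component pays off only when n times its squared sample mean exceeds log n, so small apparent effects are ignored until the sample is large enough to force the issue.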

Official BIC Property

In the limit, minimizing BIC finds a model with maximal conditional probability when the prior probability is flat over models and fairly flat over parameters within a model.

But it is also mind-change-efficient.

Toy Problem

Truth is bivariate normal of known covariance.
Count non-zero components of mean vector.

Pure Method

Acceptance zones for different answers in sample mean space.

[Figure: acceptance zones labeled Complex and Simple.]

Performance in Simplest World

[Plots: Simple and Complex acceptance zones in sample mean space, true mean = (0, 0), at n = 2, n = 100, n = 4,000,000, and n = 20,000,000; a 95% region is marked in the first panel.]

Retractions = 0

Performance in Complex World

[Plots: Simple and Complex acceptance zones in sample mean space, true mean = (.05, .005), at increasing sample sizes; a 95% region is marked in the first panel.]

n = 2
n = 100
n = 30,000: Retractions = 1
n = 4,000,000 (!): Retractions = 2

Causal Inference from Stochastic Data

Suppose that the true linear causal model is:

[Figure: linear causal graph over X, Y, Z, W with edge coefficients .998, .99, -.99, .1. Variables are standard normal.]

Causal Inference from Stochastic Data

[Scheines, Mayo-Wilson, and Fancsali]

Sample size 40. In 9 out of 10 samples, PC algorithm outputs:

[Figure: graph over X, Y, Z, W.]

Sample size 100,000. In 9 out of 10 samples, PC outputs truth:

[Figure: graph over X, Y, Z, W. Variables standard normal.]

Deterministic Sub-problems

[Figure: world w at stage n, with regions of membership degree = 1 and membership degree = 0.]

Worst-case cost at w = sup_{w’} mem(w, w’) × cost(w’)

Worst-case cost = sup_w worst-case cost at w.
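The two formulas can be sketched on a finite set of worlds, where the supremum becomes a maximum. The membership function, the cost function, and the example worlds below are all hypothetical stand-ins chosen only to make the computation concrete.

```python
def worst_case_cost_at(w, worlds, mem, cost):
    """sup over w' of mem(w, w') * cost(w'), taken as a max over a finite set."""
    return max(mem(w, v) * cost(v) for v in worlds)

def worst_case_cost(worlds, mem, cost):
    """sup over w of the worst-case cost at w."""
    return max(worst_case_cost_at(w, worlds, mem, cost) for w in worlds)

# Hypothetical worlds: (complexity class, retractions incurred in that world).
worlds = [(0, 0), (0, 1), (1, 1), (1, 3)]

def same_class(w, v):
    """Membership degree 1 inside w's complexity class, 0 outside it."""
    return 1 if w[0] == v[0] else 0

def retractions(v):
    return v[1]

print(worst_case_cost_at((0, 0), worlds, same_class, retractions))  # prints 1
print(worst_case_cost(worlds, same_class, retractions))             # prints 3
```

With 0/1 membership this is just the worst case within a sub-problem; a graded membership function weights nearby worlds partially instead of cutting them off.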

Statistical Sub-problems

[Figure: distributions p and p’.]

Membership(p, p’) = 1 – (p, p’)

Worst-case cost at p = sup_{p’} mem(p, p’) × cost(p’)

Worst-case cost = sup_p worst-case cost at p.

Future Direction

Consistency: Converge to production of true answer with chance > 1 - α.

Compare worst-case timed bounds on retractions in chance of α-consistent methods over each complexity class.

Generalized power: minimizing retraction time forces simple acceptance zones to be powerful.

Generalized significance: minimizing retractions forces simple zone to be size α.

Balance: balance depends on α.

VI. Conclusion:

Ockham’s Razor

Necessary for staying on the straightest path to the truth.

Does not point at or indicate the truth.

Works without circles, evasions, or magic.

Such a theory is motivated in counterfactual inference and estimation.

Further Reading

(with C. Glymour) “Why Probability Does Not Capture the Logic of Scientific Justification”, C. Hitchcock, ed., Contemporary Debates in the Philosophy of Science, Oxford: Blackwell, 2004.

“Justification as Truth-finding Efficiency: How Ockham's Razor Works”, Minds and Machines 14: 2004, pp. 485-505.

“Ockham's Razor, Efficiency, and the Unending Game of Science”, forthcoming in proceedings, Foundations of the Formal Sciences 2004: Infinite Game Theory, Springer, under review.

“How Simplicity Helps You Find the Truth Without Pointing at it”, forthcoming, V. Harazinov, M. Friend, and N. Goethe, eds. Philosophy of Mathematics and Induction, Dordrecht: Springer.

“Ockham's Razor, Empirical Complexity, and Truth-finding Efficiency”, forthcoming, Theoretical Computer Science.

“Learning, Simplicity, Truth, and Misinformation”, forthcoming in Van Benthem, J. and Adriaans, P., eds. Philosophy of Information.

II. Navigation Without a Compass

Asking for Directions

Where’s …

Turn around. The freeway ramp is on the left.

[Figure: Goal.]

Helpful Advice

Best Route

Best Route to Any Goal

Disregarding Advice is Bad

Extra U-turn

Best Route to Any Goal

…so fixed advice can help you reach a hidden goal without circles, evasions, or magic.

There is no difference whatsoever in It. He goes from death to death, who sees difference, as it were, in It. [Brihadaranyaka 4.4.19-20]

"Living in the midst of ignorance and considering themselves intelligent and enlightened, the senseless people go round and round, following crooked courses, just like the blind led by the blind." Katha Upanishad I. ii. 5.

Academic

Poof! If there weren’t an apple on the table I wouldn’t be a brain in a vat, so I wouldn’t see one.
