the linkage between popperian science and statistical analysisbio.classes.ucsc.edu/bio162/moorea...

1

Rigorous Science - Based on a probability value?

The linkage between Popperian science and statistical analysis

2

The Philosophy of science: the scientific Method - from a Popperian perspective

Philosophy

Science

Design

How we understand the world

How we expand that understanding

How we implement science

Arguments over how we understand and expand our understanding are the basis of debates over how science has been, is and should be done

The Philosophy of science: the scientific Method - from a Popperian perspective

Terms:1. Science - A method for understanding rules of assembly or

organizationa) Problem: How do we, (should we) make progress in science

2. Theory - a set of ideas formulated to explain something3. Hypothesis - supposition or conjecture (prediction) put forward to

account for certain facts, used as a basis for further investigations4. Induction or inductive reasoning - reasoning that general

(universal) laws exist because particular cases that seem to be examples of those laws also exist

5. Deduction or deductive reasoning - reasoning that something must be true because it is a particular case of a general (universal) law

Particular General

Induction

Deduction

3

The Scientific Method - from a Popperian perspective

Extreme example1. Induction

“Every swan I have seen is white, therefore all swans are white”

2. Deduction“All swans are white, the next one I see will be

white”

Compare these statements:1. Which can be put into the form of a testable hypothesis?

(eg. prediction, if - then statement) 2. Which is closer to how we operate in the world?3. Which type of reasoning is most repeatable?

Is there a difference between ordinary understanding and scientific understanding (should there be?)

INSIGHT

Existing Theory

General hypothesis

Previous Observations

Perceived Problem

Belief

Comparison withnew observations

Specific hypotheses(and predictions)Confirmation Falsification

Conception

Assessment

- Inductive reasoning

- Deductive reasoning

H supported (accepted)H rejectedO

AH rejectedH supported (accepted)O

A

The Scientific Method - from a Popperian perspectiveHypothetico - deductive method

1. Conception - Inductive reasoninga. Observationsb. Theoryc. Problemd. Regulatione. Belief

2. Leads to Insight and a GeneralHypothesis

3. Assessment is done bya. Formulating Specific

hypothesesb. Comparison with new

observations

4. Which leads to:a. Falsification - and rejection

of insight, and specific and general hypotheses, or

b. Confirmation - and retesting of alternative hypotheses

4

INSIGHT

Existing TheoryGeneral hypothesis


Perceived Problem

Belief


Specific hypothesesThe next swan I see will be white)

Confirmation Falsification

Conception

Assessment





A


1. Is there any provision for accepting the insight or working hypothesis?

2. Propositions not subject to rejection by contrary observations are not “scientific”

3. Confirmation does not end hypothesis testing - new hypotheses should always be put forth for a particular observation, theory, belief…

4. In practice but rarely reported, alternatives are tested until only one (or a few) are left (not rejected). Then we say things like: suggest, indicates, is evidence for

5. Why is there no provision for accepting theory or working hypotheses?

Questions and Notes

All swans are white

a) Because it is easy to find confirmatory observations for almost any hypothesis, but one negative result refutes it absolutely (this assumes test was adequate - the quality of falsification is important)


Considerations - problems with the Popperianhypothetico -deductive approach)

INSIGHT



Perceived Problem

Belief




Conception

Assessment





A

All swans are white

1) This type of normal science may rarely lead to revolutions in Science (Kuhn)

A) Falsification science leads to paradigms - essentially a way of doing and understanding science that has followers

B) Paradigms have momentum - mainly driven by tradition, infrastructure and psychology

C) Evidence against accepted theory is considered to be exceptions (that prove the rule)

D) Only major crises lead to scientific revolutions 1) paradigms collapse from weight of exceptions -

normal science - crisis - revolution - normal science

5

By adjusting the velocities of these concentric spheres, many features of planetary motion could be explained. However, the troubling observations of varying planetary brightness and retrograde motion could not be accommodated: the spheres moved with constant angular velocity, and the objects attached to them were always the same distance from the earth because they moved on spheres with the earth at the center.

The paradigm: The earth must be the center of the universe350 BC

The "solution" to these problems came in the form of a mad, but clever proposal: planets were attached, not to the concentric spheres themselves, but to circles attached to the concentric spheres, as illustrated in the adjacent diagram. These circles were called "Epicycles", and the concentric spheres to which they were attached were termed the "Deferents". Then, the centers of the epicycles executed uniform circular motion as they went around the deferent at uniform angular velocity, and at the same time the epicyles (to which the planets were attached) executed their own uniform circular motion.

Exceptions are explained

6

However, in practice, even this was not enough to account for the detailed motion of the planets on the celestial sphere! In more sophisticated epicycle models further "refinements" were introduced:

•In some cases, epicycles were themselves placed on epicycles, as illustrated in the figure below. •In actual models, the center of the epicycle moved with uniform circular motion, not around the center of the deferent, but around a point that was displaced by some distance from the center of the deferent.

That ancient astronomers could convince themselves that this elaborate scheme still corresponded to "uniform circular motion" is testament to the power of three ideas that we now know to be completely wrong, but that were so ingrained in the astronomers of an earlier age that they were essentially never questioned:

1.All motion in the heavens is uniform circular motion. 2.The objects in the heavens are made from perfect material, and cannot change their intrinsic properties (e.g., their brightness). 3.The Earth is at the center of the Universe.

These ideas concerning uniform circular motion and epicycles were catalogued by Ptolemy in 150 A.D. His book was called the "Almagest" (literally, "The Greatest"), and this picture of the structure of the Solar System has come to be called the "Ptolemaic Universe".

Paradigm nears scientific collapse

Ptolemy and the paradigm

By the Middle Ages, such ideas took on a new power as the philosophy of Aristotle (newly rediscovered in Europe) was wedded to Medieval theology in the great synthesis of Christianity and Reason undertaken by philosopher-theologians such as Thomas Aquinas. The Prime Mover of Aristotle's universe became the God of Christian theology, the outermost sphere of the Prime Mover became identified with the Christian Heaven, and the position of the Earth at the center of it all was understood in terms of the concern that the Christian God had for the affairs of mankind.

Thus, the ideas largely originating with pagan Greek philosophers were baptized into the Catholic church and eventually assumed the power of religious dogma: to challenge this view of the Universe was not merely a scientific issue; it became a theological one as well, and subjected dissenters to the considerable and not always benevolent power of the Church.

Religion Intervenes

7

The Copernican Revolution1543 AD



1) This type of normal science may rarely lead to revolutions in Science (Kuhn)

A) Falsification science leads to paradigms - essentially a way of doing and understanding science that has followers

B) Paradigms have momentum - mainly driven by tradition, infrastructure and psychology

C) Evidence against accepted theory is considered to be exceptions (that prove the rule)

D) Only major crises lead to scientific revolutions 1) paradigms collapse from weight of exceptions -

normal science - crisis - revolution - normal science

Copernican Universe

Aristotle - Ptolemian universe

8


INSIGHT



Perceived Problem

Belief




Conception

Assessment





A

All swans are white

2) Philosophical opposition - (e.g. Roughgarden 1983) A) Establishment of empirical fact is by building a

convincing case for that fact. B) We don’t use formal rules in everyday life, instead we

use native abilities and common sense in building and evaluating claims of fact

C) Even if we say we are using the hypothetico - deductive approach, we are not, instead we use intuition and make it appear to be deduction



INSIGHT



Perceived Problem

Belief




Conception

Assessment





A

All swans are white


3) Practical opposition - (e.g. Quinn and Dunham 1983) A) In practice ecology and evolution differ from Popperian

science 1) they are largely inductive 2) although falsification works well in physical and

some experimental areas of biology - it is difficult to apply in complex systems of multiple causality - e.g. Ecology and Evolution

3) Hypothetico - deductive reasoning works well if potential cause is shown not to work at all (falsified) but this rarely occurs in Ecology or Evolution - usually effects are of degree.

9


INSIGHT



Perceived Problem

Belief




Conception

Assessment





A

All swans are white

1) Choice of Method for doing science. Platt (1964) reviewed scientific discoveries and concluded that the most efficient way of doing science consisted of a method of formal hypothesis testing he called Strong Inference.

A) Apply the following steps to every problem in Science - formally, explicitly and regularly:

1) Devise alternative hypotheses 2) Devise critical experiments with alternative possible

outcomes, each of which will exclude one or more of the hypotheses (rejection)

3) Carry out procedure so as to get a clean result 1’) Recycle the procedure, making subhypotheses or sequential ones to define possibilities that remain



INSIGHT



Perceived Problem

Belief




Conception

Assessment





A

All swans are white

1) Choice of Method for doing science. Platt (1964) reviewed scientific discoveries and concluded that the most efficient way of doing science consisted of a method of formal hypothesis testing he called Strong Inference.

A) Apply the following steps to every problem in Science - formally, explicitly and regularly:

1) Devise alternative hypotheses 2) Devise critical experiments with alternative possible

outcomes, each of which will exclude one or more of the hypotheses (rejection)

3) Carry out procedure so as to get a clean result 1’) Recycle the procedure, making subhypotheses or sequential ones to define possibilities that remain


Very much like the Popperian method!

10


INSIGHT



Perceived Problem

Belief




Conception

Assessment





A

All swans are white


3) Practical opposition - (e.g. Quinn and Dunham 1983) A) In practice ecology and evolution differ from Popperian

science 1) they are largely inductive 2) although falsification works well in physical and

some experimental areas of biology - it is difficult to apply in complex systems of multiple causality - e.g. Ecology and Evolution

3) Hypothetico - deductive reasoning works well if potential cause is shown not to work at all (falsified) but this rarely occurs in Ecology or Evolution - usually effects are of degree.

This may be a potent criticism and it leads to the use of inferential statistics

A) Philosophical underpinnings of Popperian Method is based on absolute differences

1) E.g. All swans are white, therefore the next swan I see will be white - If the next swan is not white then the hypothesis is refuted absolutely.

B) Instead, most results are based on comparisons of measured variables 1) not really true vs. false but degree to which an effect exists

Example - Specific hypothesis – number of Oak seedlings is higher in areas outside impact sites than inside impact sites

Absolute vs. measured differences

Observation 1: Number inside Number outside 0 10 0 15 0 18 0 12 0 13

Mean 0 13

Observation 2: Number inside Number outside 3 10 5 7 2 9 8 12 7 8

Mean 5 9.2

What counts as a difference?Are these different?

11

Statistical analysis - cause, probability, and effectI) What counts as a difference - this is a statistical

philosophical question of two parts A) How shall we compare measurements - statistical

methodology B) What will count as a significant difference -

philosophical question and as such subject to convention

II) Statistical Methodology - General A) Null hypotheses - Ho

1) In most sciences, we are faced with: a) NOT whether something is true or false

(Popperian decision) b) BUT rather the degree to which an effect exists

(if at all) - a statistical decision. 2) Therefore 2 competing statistical hypotheses are

posed: a) HA: there is a difference in effect between

(usually posed as < or >) b) HO: there is no difference in effect between

INSIGHT

Existing Theory

General hypothesis


Perceived Problem

Belief


Specific hypotheses(and predictions)Confirmation Falsification

Conception

Assessment



Specific hypothesis HA:Number of oak seedlings isGreater in areas outside impactSites than inside impact sites



A

Statistical tests - a set of rules whereby a decision about hypotheses is reached (accept, reject)

1) Associated with rules - some indication of the accuracy of the decisions - that measure is a probability statement or p-value

2) Statistical hypotheses: a) do not become false when a critical p-value is exceeded b) do not become true if bounds are not exceeded c) Instead p-values indicate a level of acceptable uncertainty d) critical p-values are set by convention - what counts as acceptable

uncertainty e) Example - if critical p-value = 0.05 this means that we are unwilling to

accept the posed alternative hypothesis unless: 1) we 95% sure that it is correct, or equivalently that 2) we are willing to accept an error rate of 5% or less that we are wrong

when we accept the hypothesis

Statistical analysis - cause, probability, and effect

12

The logic of statistical tests - how they are performed1) Assume the Null Hypothesis (Ho) is true (e.g. no difference in number of

Oak seedlings in impact and non-impact sites).

2) Compare measurements - generally this means comparing two sample distributions (determined from the numbers from the experiment or survey)

A) Comparison of distributions - generally by comparing means and the estimate of error associated with the sampling of the meanssimplest case is the Standard Error of the Mean (SE or SEM) = Sx

.5 sd/n = standard deviation / square root of level of replication

3) Determine the probability that distributions are similar/different

4) Compare with a critical p-value to assign significance

.

Statistical analysis - cause, probability, and effect

05101520253035404550

VALUES

Distribution of Oak Seedlings - pre-impact

Sites = 100Mean per site = 25Total seedlings = 2500

Calculation of statistical distributions

13

x fifty iterations

Example: sample 10 sites to determine mean

Evaluate effect of sample size on calculation of and confidence in Mean

Compare for sample size's of 5,10, 20, 50, 99 cellsIterate 50 times

22 27

12

3325

41

31

23

1936

23

24

36

2828 25

172140

16

X=21.5 X=25.8

0 20 40 60

True Mean = 25

22 27

12

3325

41

31

23

1936

Mean = 21.5

23

24

36

2828 25

172140

16

Mean = 25.8

Means21.522.323.023.924.925.125.826.527.829.9

Estimate of Mean

Num

ber

of c

ases

14

Effect of number of observations on estimate of Mean

Estimate of Mean15 20 25 30 350

4

8

12

16

# of

cas

es

0.0

0.1

0.2

0.3 Proportion per Bar


2 sd2 sd

2.5%2.5%

Estimate of Mean

Num

ber o

f Cas

es


4

8

12

16

# of

cas

es

0.0

0.1

0.2


Recall that standard deviation of distribution of means is approximated by sample standard error (se)

15


15 20 25 30 35

Estimate of Mean

0

4

8

12

16

Num

ber o

f cas

es

995020105

Mean based onX observations

Frequency distributions of sample means


4

8

12

16

# of

cas

es

0.0

0.1

0.2


0 10 20 30 40 50 60 70 80 90 100SAMPLES

0

5

10

15

20

25

30

Val

ue

Standard DeviationMean

How to estimate optimal sample size1) Do a preliminary study of variables that will be evaluated in project

2) Plot the mean and some estimate of variability of data as a function of sample size

3) Look for “sweet spot”where estimates of mean and variance (or standard deviation) converge on a stable value

4) Calculate a “bang for buck”relationship to determine if a robust design (sufficient sampling) can be paid for

16

Sample Size – e.g. number of replicate sites

Acc

urac

y1 Cost Bang per

buck+-

Trade off between accuracy and cost

(1Accuracy = index of approach to most reliable estimate of mean)

Type 1 and Type II error.1) By convention, the hypothesis tested is the null hypothesis (no difference

between)a) In statistics, assumption is made that a hypothesis is true (assume HO true =

assume HA false)b) accepting HO (saying it is likely to be true) is the same as rejecting HA

(falsification)c) Scientific method is to falsify competing alternative hypotheses (alternative

HA’s) 2) Errors in decision making

DecisionTruth Accept HO Reject HO

HO true no error (1-α) Type I error (α)HO false Type II error ( β ) no error (1- β)

Type I error - probability α that we mistakenly reject a true null hypothesis (HO)Type II error - probability β that we mistakenly fail to reject (accept) a false null

hypothesisPower of Test - probability (1-β) of not committing a Type II error - The more powerful

the test the more likely you are to correctly conclude that an effect exists when it really does (reject HO when HO false = accept HA when HA true).

Types of statistical error – Type 1 and II

17

INSIGHT

Existing Theory

General hypothesis


Perceived Problem

Belief


Specific hypotheses(and predictions)


Conception

Assessment





A

Decision

Truth Accept H O Reject H O

HO true no error (1-alpha) Type I error (alpha)

HO false Type II error (beta) no error (1-beta)

Scientific method andstatistical errors

Decision

Truth Accept HO Reject HO



INSIGHT

Existing Theory

Oil affects Oak seedlings

Oil leaking on a numberof sites with Oaks

Perceived Problem

Belief

Compare Seedling # on “impact and control sites

Seedling # will be higherIn control sites than on

“impact” sitesConfirmation Falsification

Conception - Inductive reasoning

Assessment - Deductive reasoning



A

No differenceMore seedlings in control sites

Scientific method andstatistical errors

- case example

18

Decision

Truth Accept HO Reject HO



Correct decisionImpact detected

Type II ErrorFailure to detect real impact; false sense of security

Impact

Type 1 ErrorFalse Alarm

Correct decisionNo impact detected

No Impact

ImpactNo ImpactBiological Truth

ConclusionMonitoring

Error types and implications in basic and environmental science

What type of error should we guard against?

y 1y 2

Statistical comparison of two distributions

Statistical Power, effect size, replication and alpha

19

Area under the curve = 1.00

Statistical Power, effect size, replication and alpha

15 20 25 30 350

4

8

12

16

Cou

nt

0.0

0.1

0.2


Estimate of Mean

9.5 9.7 9.9 10.1 10.3 10.5Mean Value from Sample

0

10

20

30

Cou

nt

0.0

0.1

0.2

0.3

Proportion per Bar

9.5 9.7 9.9 10.1 10.3 10.5Mean Value from Sample

0

5

10

15

20

25

Cou

nt

0.0

0.1


y1

y2

y1

y2

Ho true: Distributions of means are truly the same

y1 −t =

y2

1n1

1n2

+sp

y1 −t =

y2

1n1

1n2

+sp

-4 -3 -2 -1 0 1 2 3 4t - value

0

5

10

15

20

25

Cou

nt

0.0

0.1


t distribution

20



HO true no error (1-alpha)

Type I error (alpha)

HO false Type II error (beta)

no error (1-beta)

0t distribution (df)

y1 −t =

y2

1n1

1n2

+sp

y1 −t =

y2

1n1

1n2

+sp

y1

y2

tassign critical p (assume = 0.05)

t c






no error (1-beta)

0

If distributions are truly the same then: area to right of critical t C

> t C will cause incorrect conclusion that distributions are different

represents the Type 1 error rate (Blue); any calculated t

t distribution (df)y1 −

t =y2

1n1

1n2

+sp

y1 −t =

y2

1n1

1n2

+sp

21

Ho false: Distributions of means are truly different

y1 y2

9.5 10.0 10.5 11.0Mean value from sample

0

10

20

30

40

50

Cou

nt

0.0

0.1

0.2

0.3

0.4

0.5

Proportion per Bar


0

10

20

30

40

50

Cou

nt

0.0

0.1

0.2

0.3

0.4

0.5

Proportion per Bar

y1 −t =

y2

1n1

1n2

+sp

y1 −t =

y2

1n1

1n2

+sp

-5 0 5 10t - value

0

10

20

30

40

Cou

nt

0.0

0.1

0.2

0.3

0.4

Proportion per Bar

0

t distribution (df)

y1 −t =

y2

1n1

1n2

+sp

y1 −t =

y2

1n1

1n2

+sp

Central t distribution

Non-central t distribution


y1 y2

t

22

assign critical p (assume = 0.05)

t c





no error (1-beta)


If distributions are truly different then: area to left of critical t C

< t C will cause incorrect conclusion that distributions are same

represents the Type II error rate (Red); any Calculated t

0t distribution (df)y1 −

t =y2

1n1

1n2

+sp

y1 −t =

y2

1n1

1n2

+sp


t

0

t c

Type I errorType II error

Critical p

How to control Type II error (distributions are truly different)This will maximize statistical power to detect real impacts

1) Vary critical P-Values 2) Vary Magnitude of Effect3) Vary replication





no error (1-beta)


t

23

0

t cType I errorType II error

Critical p

How to control Type II error (distributions are truly different) 1) Vary critical P-Values (change blue area)

t c

Critical p

t c

Critical p

Reference

A) Make critical P morestringent (smaller)

A) Relax critical P (larger values)

Type II error increasesPower decreases

Type II error decreasePower increases





no error (1-beta)

tt

How to control Type II error (distributions are truly different) 2) Vary magnitude of effect (vary distance between y1 and y2, which affects non-central t distribution


0

10

20

30

40

50

60

70

Cou

nt

0.0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

Proportion per B

ar


0

10

20

30

40

50

60

Cou

nt

0.0

0.1

0.2

0.3

0.4

0.5

0.6

Proportion per Bar


0

10

20

30

40

50

60

70

Cou

nt

0.0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

Proportion per B

ar

-5 0 5 10 15 20t value

0

10

20

30

40

50

60

Cou

n t

0.0

0.1

0.2

0.3

0.4

0.5

0.6

Proportion per B

ar

-5 0 5 10 15 20t value

0

10

20

30

40

50

60

Cou

n t

0.0

0.1

0.2

0.3

0.4

0.5

0.6

Proportion per B

ar

A.

B.

C.

B vs A

C vs A

y1 −t =

y2

1n1

1n2

+sp

y1 −t =

y2

1n1

1n2

+sp

y = 10

y = 10.5

y = 12

24

0

t c


Critical p

t c

Reference

A) Make distance smaller

A) Make distance greater


Type II error decreasesPower increases

0

t c

How to control Type II error (distributions are truly different)





no error (1-beta)

0

2) Vary magnitude of effect (vary distance between y1 and y2, which affects non-central t distribution

tt

tt

tt

t c


Critical p

3a) Vary replication (which controls estimates of error)- Note tc is constant - and Type 1 error is allowed to vary

t c

Reference

A) Decrease replication

A) Increase replication

Type II error increasesPower decreases (note that Type I error increases)

Type II error decreasePower increases (note thatType I error decreases)

t c






no error (1-beta)

0

0

0

tt

tt

tt

25


t c


Critical p

Reference

A) Decrease replication

A) Increase replication


Type II error decreasePower increases

t c





no error (1-beta)

t c

3b) Vary replication (which controls estimates of error)- Note Type 1 error is constant and tc is allowed to vary

0

0

0

tt

tt

tt

Replication

Power

Low

High

Low High

Magnitude of Effect

Power

Low

High

Small Large

Critical alpha

Power

Low

High

Low High

POWER

26

Replication and independenceControl sites Impact sites

Pseudoreplicationin space

Pseudoreplicationin time

Pseudoreplicationin time and space

The fallacy of the primacy of alpha for environmental work

The trade-off between error and cost – an obvious conflict (I)

Sample size

Cost of assessment

27

Probability (statistical) of detecting true impact

Cost of assessment


The trade-off between error and cost – an obvious conflict (II)

Probability (statistical) of concluding no impact

Therefore, the cynical conclusion is that the best strategy for anapplicant would be to under-fund an assessment

A) The lower the cost the more likely that a conclusionof no impact will be reached (regardless of the truth)


The trade-off between error and cost – an obvious conflict (III)

Recommendations: 1) Determine the importance of error rates as a ratio

For example – what is the relative importance of:correctly concluding there is an impact (I) vs.correctly concluding there is no impact (N)

2) Determine the maximum Type 1 error acceptable (probability of concluding there is an effect (impact) when there is not one)

3) Estimate cost of assessment

4) Recycle procedure if necessary

5) Do not do the assessment if acceptable ratios and maximum error ratescannot be attained

28

General schematic for the design of a monitoring program1

Basic Sampling Design

Select Variables

Cost of errors

Refine design

Achievable?

Go

Effect Size Background Variation

Accept GreaterError Rate?

YY

N

1 from Keough and Mapstone 1993

the linkage between popperian science and statistical analysisbio.classes.ucsc.edu/bio162/moorea...

Documents