Effect at a population level Decision making procedure Testing for a group effect Linear effect First steps in data analysis with David Causeur Agrocampus Ouest IRMAR CNRS UMR 6625 http://math.agrocampus-ouest.fr/infoglueDeliverLive/membres/david.causeur


Post on 06-Mar-2018


Page 1: David Causeur Agrocampus Ouest IRMAR CNRS UMR 6625math.agrocampus-ouest.fr/.../digitalAssets/103542_SignifRelLecture.pdf · Does the genetic type of a pig affect his LMP, his fat

Effect at a population level Decision making procedure Testing for a group effect Linear effect

First steps in data analysis with R

David Causeur
Agrocampus Ouest

IRMAR CNRS UMR 6625
http://math.agrocampus-ouest.fr/infoglueDeliverLive/membres/david.causeur


Learning objectives

By the end of this course, students will be able to:
• Implement statistical methods for common data analysis issues;
• Choose appropriate procedures based on statistical arguments;
• Assess the performance of a statistical decision rule;
• Apply these key insights in class activities using statistical software.

Readings

D. Causeur and C.-F. Sheu (2017). Significance of a relationship. Online unpublished textbook.
Freely downloadable here: http://math.agrocampus-ouest.fr/infoglueDeliverLive/membres/david.causeur/teaching

Assignments

• In-class short exams: 50%
• Final project: 50%


Online resources


Why R?
• R covers a huge range of functionality;
• R is free;
• knowing R is explicitly demanded in many job offers.


Outline
1 Effect at a population level
2 Decision making procedure
3 Testing for a group effect
   Exploring for a group effect
   One-way analysis-of-variance model
   Least-squares estimation of effect parameters
   F-test
   The special case of a two-level factor: t-test
   Detailing a significant group effect
   Testing a group effect using paired data
4 Linear effect
   Linearity of an effect
   Linear regression modelling
   Least-squares fitting
   F-test
   Comparing regression lines


Effect at a population level


Effect at a population level
One of the most common questions addressed by statistics:

'Is there an effect of this on that?'

• Does an increase of a drug dose modify the blood pressure of a patient?
• Does the nitrogen content in soil have an impact on crop yield?
• Does the gender of a consumer affect their propensity to buy a product?


Illustrative example used throughout the lectures:
• The lean meat percentage (LMP) in a pig carcass measures its commercial value.
• In slaughterhouses, it is predicted using biometric measurements (tissue depths).
• To what extent is the LMP predictable from tissue depths?
• Does the genetic type of a pig affect its LMP, its fat depth?


Effect at a population level
How to handle this:

By considering that it involves two kinds of variables:
• the response variable $Y$,
• the explanatory variables $X$,

... the variations of $Y$ being possibly related to the values of $X$.


Definition
'$X$ has an effect on $Y$'

can be formulated mathematically as

'the distribution of $Y$ restricted to items having the same value $x$ of $X$, the so-called conditional distribution of $Y$ given $X = x$, actually depends on that value $x$'.


Definition (in the pig data context)
'The genetic type has an effect on the fat depth'

can be formulated mathematically as

'the within-genetic type distributions of the fat depths are not the same'.


Data for decision making
Data: joint observations $(x_i, y_i)_{i=1,\dots,n}$ of $X$ and $Y$, with $n \geq 2$.

Definition
The set of $n$ items for which we have observations $(x_i, y_i)_{i=1,\dots,n}$ is named the sample, and $n$ the sample size.


R script
> dta = read.table("pig.txt",header=TRUE)

dta has 60 rows, one for each pig carcass in the sample, and 6 columns:
• LMP for the lean meat percentage,
• VFAT for the fat depth measured in the area of the lumbar vertebra (mm),
• BFAT for the back fat depth (mm),
• BMUSCLE for the back muscle depth (mm),
• SEX for the sex of the animal and
• GENET for its genotype, regarding a gene of interest.
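Since R 4.0.0, read.table() no longer turns text columns into factors by default, so SEX and GENET are read as character columns unless stringsAsFactors = TRUE is set. The sketch below illustrates this on a small temporary file; its four rows are made-up values and the name dta_demo is hypothetical, it is not the pig data.

```r
# Hypothetical stand-in for pig.txt: four made-up rows with the same six columns.
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "LMP VFAT BFAT BMUSCLE SEX GENET",
  "59.1 16.2 13.4 61.0 F P0",
  "61.3 14.8 11.9 64.2 M P25",
  "57.6 18.9 15.7 58.3 F P50",
  "60.2 15.5 12.8 62.7 M P25"
), tmp)

# stringsAsFactors = TRUE makes SEX and GENET factors (the default before R 4.0.0).
dta_demo <- read.table(tmp, header = TRUE, stringsAsFactors = TRUE)

dim(dta_demo)            # number of rows and columns
str(dta_demo)            # type of each column
levels(dta_demo$GENET)   # categories of the genotype factor
```

Checking the column types right after import is a cheap way to make sure categorical variables will be treated as groups in later summaries and plots.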


Provided the sample is representative of a wider population, conclusions and/or decisions are supposed to be valid at the population level.

Definition
The statistical methodology aiming at making a decision for a population, based on a sample of this population, is named inferential statistics.


Inferential versus exploratory data analysis
Our focus: using sample data, making a decision about the existence or not of an effect at the population level.

Exploratory data analysis will be used first to describe the effect of interest:
• using graphical representations
• or summary statistics.

Exploratory data analysis is a complement to inferential statistics, insightful for building hypotheses.


Data summary
Summary statistics: summarize the distribution of random observations

R script
> summary(dta) # Provides a columnwise summary of the data table

      LMP             VFAT            BFAT            BMUSCLE
 Min.   :48.62   Min.   : 9.16   Min.   : 7.005   Min.   :48.10
 1st Qu.:58.07   1st Qu.:14.09   1st Qu.:11.050   1st Qu.:57.80
 Median :59.84   Median :15.95   Median :13.210   Median :61.27
 Mean   :59.47   Mean   :16.40   Mean   :13.424   Mean   :61.64
 3rd Qu.:61.18   3rd Qu.:18.43   3rd Qu.:15.328   3rd Qu.:66.05
 Max.   :66.30   Max.   :23.83   Max.   :22.195   Max.   :72.56

 SEX     GENET
 F:32   P0  :15
 M:28   P25 :23
        P50 :21
        NA's: 1


Data summary

Most importantly, the distribution of a random variable is first related to its nature:
• either numeric: measured on a continuous or discrete scale (LMP, tissue depths, ...)
• or categorical: defining subgroups in the population (sex, genetic type, ...)
... sometimes ambiguous: e.g. number of children.


Mean, median and quantiles
The mean and median indicate the position of a series $x_1, \dots, x_n$ on the real axis.

Definition
The mean of $(x_1, \dots, x_n)$ is defined as follows:

$$\bar{x} = \frac{x_1 + \dots + x_n}{n}.$$

The median is defined less explicitly by two properties:

$$\frac{1}{n}\,\mathrm{card}\{i = 1, \dots, n,\ x_i \leq \mathrm{median}(x)\} \geq 0.5,$$

$$\frac{1}{n}\,\mathrm{card}\{i = 1, \dots, n,\ x_i \geq \mathrm{median}(x)\} \geq 0.5.$$
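The two defining properties of the median can be checked numerically. The following sketch uses a made-up series and simply counts the proportion of values on each side of the value returned by median().

```r
# Illustrative series (made-up values).
x <- c(7, 1, 3, 12, 5)
m <- median(x)

# Proportion of values at or below, and at or above, the median.
prop_below <- mean(x <= m)
prop_above <- mean(x >= m)

c(median = m, prop_below = prop_below, prop_above = prop_above)
```

Both proportions are at least 0.5, as the definition requires.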


R script
> x = 1:4 # x is the series 1,2,3,4

> mean(x) # mean value of the series

[1] 2.5

> median(x) # median value

[1] 2.5

> x[4] = 40 # The 4th value is now 40

> mean(x)

[1] 11.5

> median(x)

[1] 2.5


The median can be viewed as one of the three quartiles: it is the 50% quantile $q_{0.5}(x)$, the 1st quartile being $q_{0.25}(x)$ and the 3rd $q_{0.75}(x)$.

Definition
For all $0 \leq \alpha \leq 1$, the $100\alpha\%$ quantile of $(x_1, \dots, x_n)$ is usually denoted $q_\alpha(x)$ and defined as follows:

$$\frac{1}{n}\,\mathrm{card}\{i = 1, \dots, n,\ x_i \leq q_\alpha(x)\} \geq \alpha,$$

$$\frac{1}{n}\,\mathrm{card}\{i = 1, \dots, n,\ x_i \geq q_\alpha(x)\} \geq 1 - \alpha.$$
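As a rough check of this definition, the sketch below tests both inequalities for a few levels. It assumes type = 1 in quantile(), which returns the inverse of the empirical distribution function; R's default (type = 7) interpolates between observations and need not satisfy both inequalities. The helper is_quantile and the series are made up for the illustration.

```r
# Illustrative series (made-up values).
x <- c(3, 1, 4, 1, 5, 9, 2, 6)

# TRUE when q satisfies both inequalities of the definition at level alpha.
# A small tolerance guards against floating-point rounding.
is_quantile <- function(x, q, alpha, tol = 1e-12) {
  mean(x <= q) >= alpha - tol && mean(x >= q) >= 1 - alpha - tol
}

alphas <- c(0.1, 0.25, 0.5, 0.75, 0.9)
# type = 1 asks quantile() for the inverse of the empirical distribution function.
q <- quantile(x, probs = alphas, type = 1, names = FALSE)
checks <- mapply(function(qa, a) is_quantile(x, qa, a), q, alphas)

data.frame(alpha = alphas, quantile = q, satisfies_definition = checks)
```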

Decision making procedure


Significance of an effect

Definition
The effect of $X$ on $Y$ is said to be significant if the data analysis provides evidence that this effect actually exists at the population level.


Decision errors
Decision making has to deal with two kinds of errors:

Definition
• Type-I error: declaring an effect as significant whereas it does not exist at the population level.
• Type-II error: declaring an effect as non-significant whereas it does exist at the population level.

Those two error types are antagonistic:
• Liberal decision making: declaring the effect as significant even on weak evidence, with a large risk of a type-I error and a low risk of a type-II error.
• Conservative decision making: declaring the effect as significant only if the evidence is overwhelming, with a low risk of a type-I error and a large risk of a type-II error.
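The antagonism between the two error types can be illustrated by simulation. The sketch below is only an illustration under assumed settings (two groups of 15 normal observations with unit standard deviation, a true mean difference of 0 or 1, and two arbitrary cutoffs on the observed difference); none of these values come from the lecture.

```r
set.seed(1)  # arbitrary seed, for reproducibility only

# Absolute difference between two group means, simulated n_rep times.
sim_diff <- function(delta, n_rep = 2000, n = 15) {
  replicate(n_rep, abs(mean(rnorm(n, mean = delta)) - mean(rnorm(n))))
}

d_null   <- sim_diff(delta = 0)  # no effect at the population level
d_effect <- sim_diff(delta = 1)  # an effect exists at the population level

# Decision rule: declare the effect significant when the difference exceeds a cutoff.
cutoffs <- c(liberal = 0.3, conservative = 1.2)
error_rates <- sapply(cutoffs, function(cut) c(
  type_I  = mean(d_null > cut),     # effect declared although none exists
  type_II = mean(d_effect <= cut)   # effect missed although it exists
))
error_rates
```

Lowering the cutoff reduces the type-II error rate at the cost of a higher type-I error rate, and conversely.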


The conservative decision making process
The statistical decision making process favours a low probability of a Type-I error.

Two asymmetric hypothesized states of the effect:

$$\begin{cases} H_0: \text{the effect does not exist at the population level} \\ H_1: \text{the effect actually exists at the population level} \end{cases}$$

$H_0$ is called the null hypothesis: it will only be rejected if there is clear evidence that $H_0$ is not consistent with the observations.

R.A. Fisher, 'The Design of Experiments' (1935)
The null hypothesis is never proved or established, but is possibly disproved, in the course of experimentation.


Definition
The test of $H_0$ consists in deciding whether or not to reject $H_0$ based on data.


Outline for a decision making process
The 3 components of a statistical decision making process:
• The test statistic $T$, relevantly chosen to measure the effect size: the larger the effect size in the data, the larger the value of $T$.
• The null distribution of $T$: the distribution of $T$ under $H_0$. Suppose we know that $P_{H_0}(T \geq 2) \leq 0.05$; then observing $T = 3$ encourages us to reject $H_0$.
• The p-value of the test: the probability, calculated under the null hypothesis, that the test statistic is at least as large as the observed value.

If the p-value is lower than a preset type-I error level $\alpha$ (usually $\alpha = 0.05$), then the effect is significant.
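To make the three components concrete, here is a minimal sketch under a purely illustrative assumption: under H0 the test statistic is T = |Z|, with Z a standard normal variable. This choice is not taken from the lecture; it only happens to satisfy P(T ≥ 2) ≤ 0.05. The function name null_tail is hypothetical.

```r
# Illustrative assumption: under H0, T = |Z| with Z a standard normal variable.
# P_H0(T >= t) is then twice the upper tail probability of N(0, 1).
null_tail <- function(t) 2 * pnorm(t, lower.tail = FALSE)

null_tail(2)             # probability under H0 that T is at least 2

t_obs   <- 3             # observed value of the test statistic
p_value <- null_tail(t_obs)
alpha   <- 0.05          # preset type-I error level

p_value
p_value < alpha          # TRUE means the effect is declared significant
```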


Outline for a decision making process
Inspired by a famous TV program: suppose you take a flight to an unknown destination, blindfolded.

• Your guess ($H_0$): the destination is Brittany, west of France;
• On arrival, you evaluate the outside temperature (test statistic) at 40°;
• Is T = 40° consistent with your guess?
• To answer this question, your knowledge is that the probability that the temperature exceeds 40° in Brittany is very low (null distribution).
• Your conclusion: the null hypothesis is rejected.

Testing for a group effect


Exploring group effect

Definition (reminder)
'$X$ has an effect on $Y$'

can be formulated mathematically as

'the conditional distribution of $Y$ given $X = x$ actually depends on that value $x$'.


Exploring a group effect using summary statistics
R script
> library(RcmdrMisc) # numSummary() is assumed to come from the RcmdrMisc package
> with(dta,numSummary(BFAT,groups=GENET))

        mean       sd    IQR     0%    25%    50%     75%   100%
P0  15.16467 2.593618 3.6725 10.600 13.260 15.440 16.9325 20.170
P25 12.95435 3.628696 3.9225  7.005 10.830 12.035 14.7525 22.195
P50 12.87071 2.420862 3.7850  9.420 10.855 12.835 14.6400 17.140

    data:n
P0      15
P25     23
P50     21
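Comparable within-group summaries can be obtained with base R only. The sketch below works on a small made-up data frame (toy is a hypothetical name and its values are not the pig data), so the numbers differ from the output above.

```r
# Made-up backfat depths (mm) for three genetic types; not the pig data.
toy <- data.frame(
  BFAT  = c(15.2, 13.9, 16.4, 12.1, 10.8, 14.7, 12.9, 11.5, 13.6),
  GENET = factor(rep(c("P0", "P25", "P50"), each = 3))
)

# Within-group mean, standard deviation and size, computed with base R only.
by_group <- split(toy$BFAT, toy$GENET)
group_summary <- data.frame(
  mean = sapply(by_group, mean),
  sd   = sapply(by_group, sd),
  n    = sapply(by_group, length)
)
group_summary
```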


Exploring group effect

Definition
The standard deviation is defined as follows:

$$s_x = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}.$$

The larger $s_x$, the larger the variations of $x_i$ around $\bar{x}$.

The variance $s_x^2$ is almost the mean of the squared variations $x_i - \bar{x}$.

Why divide by $n - 1$ and not $n$? The sum of the variations $x_i - \bar{x}$ being 0, the $x_i - \bar{x}$ amount to only $n - 1$ linearly independent variations.

General principle: the number $k$ of linear dependencies between variations is accounted for by dividing the sum of squared variations by its degrees of freedom $n - k$ and not $n$.
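A short numerical check of this definition, on made-up values: the variations around the mean sum to zero, and dividing the sum of squares by n − 1 reproduces R's sd().

```r
x <- c(10.6, 13.3, 15.4, 16.9, 20.2)    # made-up values
n <- length(x)
dev <- x - mean(x)                      # variations around the mean

sum(dev)                                # zero up to rounding: one linear dependency
s_manual <- sqrt(sum(dev^2) / (n - 1))  # divide by the n - 1 degrees of freedom
s_manual
sd(x)                                   # R's built-in standard deviation
```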


It is deduced that:
• the mean backfat depth is slightly larger for P0 than for P25 and P50;
• the backfat depth values are more dispersed for P25 than for P0 and P50.


Exploring group effect
Exploring a group effect using plots

R script
> with(dta,
+   plot(GENET,BFAT,col="darkgray",cex.lab=1.25,pch=16,
+     main="Distribution of the backfat depth (mm) across genetic types"))


Exploring group effect

[Figure: boxplots of the backfat depth (mm) for the genetic types P0, P25 and P50, titled 'Distribution of the backfat depth (mm) across genetic types'.]


Exploring group effect

Definition
A boxplot along the y-axis summarizes the dispersion of a series of numeric values by the following graphical elements:
• the box, whose lower edge is $q_{0.25}$ and upper edge $q_{0.75}$. The plain segment within the box locates the median;
• the lower whisker, which extends to the smallest value that is no more than 1.5 × IQR below the lower edge of the box;
• the upper whisker, which extends to the largest value that is no more than 1.5 × IQR above the upper edge of the box;
• isolated dots for each value beyond the limits of the whiskers.
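The elements of a boxplot can be inspected without drawing it. The sketch below uses boxplot.stats() on a made-up series; note that R measures the 1.5 × IQR distance from the edges of the box and computes those edges as Tukey's hinges, which may differ slightly from the quartiles returned by quantile().

```r
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 30)   # made-up series with one extreme value

bs <- boxplot.stats(x, coef = 1.5)      # coef = 1.5 is the usual whisker rule
bs$stats   # lower whisker, lower hinge, median, upper hinge, upper whisker
bs$out     # values drawn as isolated dots

# The upper limit computed by hand from the hinges.
box_width   <- bs$stats[4] - bs$stats[2]
upper_limit <- bs$stats[4] + 1.5 * box_width
max(x[x <= upper_limit])                # largest value within the limit: the upper whisker
```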


Exploring group effect
Exploring using plots

R script
> vbreaks = seq(from=7,to=23,by=2)
> # defines a partition of the BFAT values

> with(dta,
+   Hist(x=BFAT,groups=GENET,scale="percent",
+     breaks=vbreaks,col="darkgray",cex.lab=1.25,pch=16,
+     xlab="Backfat depth (mm)",
+     main="Distribution of backfat depth across genetic types"))


Exploring group effect

[Figure: histograms of the backfat depth (mm), in percent, in three panels (GENET = P0, GENET = P25, GENET = P50), titled 'Distribution of backfat depth across genetic types'.]


Exploring group effect

Definition
Let us split the interval covering all the $(x_1, \dots, x_n)$ into a $k$-partition: $B_1 = [a_0; a_1[$, $B_2 = [a_1; a_2[$, ..., $B_k = [a_{k-1}; a_k[$, where $k$ is a pre-chosen number of bins.

A histogram is a bar plot, the support of the $i$th bar being the bin $B_i$ and its area being proportional to the number $n_i$ (or the proportion $p_i$) of values falling within $B_i$.

When the widths of the bins are all equal, the heights of the bars are just proportional to $n_i$ (or $p_i$).
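The area property can be verified with base R's hist(), here called without plotting. The values and the unequal bin limits below are made up for the illustration; on the density scale, the area of each bar equals the proportion of values in its bin.

```r
x <- c(7.0, 9.4, 10.6, 10.8, 12.0, 12.8, 13.3, 14.6, 15.4, 16.9, 17.1, 20.2)  # made-up values
breaks <- c(7, 10, 12, 15, 23)   # bins of unequal widths (illustrative choice)

# right = FALSE gives left-closed bins [a, b[; the last bin also includes its upper end.
h <- hist(x, breaks = breaks, right = FALSE, plot = FALSE)

widths <- diff(h$breaks)
areas  <- h$density * widths     # area of each bar on the density scale
rbind(count = h$counts, proportion = h$counts / length(x), area = areas)
```

With unequal widths the bar heights (h$density) are no longer proportional to the counts, but the areas still are.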


Distribution of a numeric variable

Definition (reminder)
'$X$ has an effect on $Y$'

can be formulated mathematically as

'the conditional distribution of $Y$ given $X = x$ actually depends on that value $x$'.


Conditional distributions of $Y$ can have different:
• means;
• variances;
• density shapes.


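The normality framework
Usual restrictions:
• The general shape of within-group density functions is the same;
• The 'reference' distribution in group $i$ is $\mathcal{N}(\mu_i; \sigma_i)$:

$$P(Y \leq y \mid i\text{th group}) = \int_{-\infty}^{y} f_{\mu_i,\sigma_i}(t)\,dt,$$

where $f_{\mu,\sigma}$ is the density function of $\mathcal{N}(\mu; \sigma)$:

$$f_{\mu,\sigma}(t) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{1}{2\sigma^2}(t - \mu)^2\right).$$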
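As a sanity check, the normal density can be coded directly and compared with R's dnorm() and pnorm(). The values of mu, sigma and y below are arbitrary, and f is just a local helper written for this illustration.

```r
# Density of N(mu; sigma), with sigma the standard deviation.
f <- function(t, mu, sigma) {
  exp(-(t - mu)^2 / (2 * sigma^2)) / (sigma * sqrt(2 * pi))
}

mu <- 15; sigma <- 2.5; y <- 17          # illustrative values only

# P(Y <= y) obtained by integrating the density from -Inf to y.
p_integral <- integrate(f, lower = -Inf, upper = y, mu = mu, sigma = sigma)$value

c(by_integration = p_integral, by_pnorm = pnorm(y, mean = mu, sd = sigma))
f(y, mu, sigma) - dnorm(y, mean = mu, sd = sigma)   # zero up to rounding
```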


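The normality framework
Usual restrictions:
• The general shape of within-group density functions is the same;
• The 'reference' distribution in group $i$ is $\mathcal{N}(\mu_i; \sigma)$, with a standard deviation $\sigma$ common to all groups [homoscedasticity]:

$$P(Y \leq y \mid i\text{th group}) = \int_{-\infty}^{y} f_{\mu_i,\sigma}(t)\,dt,$$

where $f_{\mu,\sigma}$ is the density function of $\mathcal{N}(\mu; \sigma)$:

$$f_{\mu,\sigma}(t) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{1}{2\sigma^2}(t - \mu)^2\right).$$

Definition (simplified)
There is an effect of $X$ on $Y$ if, for at least two $i \neq i'$, $\mu_i \neq \mu_{i'}$.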


Statistical model for a group effect

If $Y_{ij}$ stands for the response value for the jth sampling item ($j = 1, \ldots, n_i$) of the ith group ($i = 1, \ldots, I$):

$$Y_{ij} = \mu_i + \varepsilon_{ij},$$

where $\varepsilon_{ij} \sim N(0, \sigma)$ is the residual error.

The above decomposition of $Y_{ij}$ exhibits two additive parts:
• the non-random part $\mu_i$ concentrates the variations of the response that are due only to the factor;
  ⇒ the $\mu_i$ are I unknown parameters.
• the random part $\varepsilon_{ij}$ captures the within-group variations of the response.
  ⇒ $\sigma$, the residual standard deviation, is another unknown parameter.
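To make the two additive parts concrete, here is a minimal simulation sketch of the model. The group labels, group means, group sizes and value of sigma are made-up values chosen for illustration only; they are not the pig data of the course.

```r
# Hypothetical illustration: simulate data from Y_ij = mu_i + eps_ij.
# All numerical values below are assumptions, not estimates from real data.
set.seed(1)
mu    <- c(A = 15, B = 13, C = 13)   # assumed group means mu_i
n_i   <- c(A = 15, B = 23, C = 21)   # assumed group sizes n_i
sigma <- 3                           # assumed common residual standard deviation

group <- factor(rep(names(mu), times = n_i))
eps   <- rnorm(sum(n_i), mean = 0, sd = sigma)   # random part eps_ij
y     <- unname(mu[as.character(group)]) + eps   # non-random part + random part
sim   <- data.frame(group = group, y = y)

# Group sizes and empirical group means of the simulated responses
table(sim$group)
tapply(sim$y, sim$group, mean)
```

The empirical group means fluctuate around the assumed values of mu, with fluctuations driven by sigma and the group sizes.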


Judging for the existence of an effect

Test for an effect of X on Y:

$$\begin{cases} H_0: \mu_1 = \ldots = \mu_I = \mu & \text{(no effect of X at the population level)} \\ H_1: \text{for at least one pair } (i, i') \text{ with } i \ne i',\ \mu_i \ne \mu_{i'}. \end{cases}$$

Choice between two models:
• the null model, for which X has no effect on Y [submodel]:

$$Y_{ij} = \mu + \varepsilon_{ij};$$

• and the nonnull model:

$$Y_{ij} = \mu_i + \varepsilon_{ij}.$$


The one-way ANOVA model

Definition
The one-way analysis of variance (ANOVA) model for the effect of X on Y is usually formulated as follows:

$$Y_{ij} = \mu + \alpha_i + \varepsilon_{ij},$$

where $\alpha_1, \ldots, \alpha_I$ are the effect parameters.

Parameterizations $(\mu_1, \ldots, \mu_I)$ and $(\mu, \alpha_1, \ldots, \alpha_I)$ are not equivalent: the second one has I + 1 parameters for only I group means, so that the system $\mu + \alpha_i = \mu_i$, for all $i = 1, \ldots, I$, has infinitely many solutions.

The most common ways of fixing this:
• Consider that $\alpha_1 + \ldots + \alpha_I = 0$. Since, for all i, $\mu + \alpha_i = \mu_i$, then $\mu = \sum_{i=1}^{I} \mu_i / I$ and $\alpha_i = \mu_i - \mu$.
• Consider that $\alpha_1 = 0$. Then $\mu = \mu_1$ and, for all $i = 1, \ldots, I$, $\alpha_i = \mu_i - \mu_1$.
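The two constraints can be tried out in R, where they correspond to two choices of contrasts in lm. The sketch below uses simulated data (hypothetical values, not the pig data); the only assumption about R itself is that the default contrasts for unordered factors are the "treatment" contrasts.

```r
# Hypothetical simulated data (assumed group means 15, 13, 13 and sigma = 3)
set.seed(2)
group <- factor(rep(c("A", "B", "C"), times = c(15, 23, 21)))
y     <- unname(c(A = 15, B = 13, C = 13)[as.character(group)]) + rnorm(59, sd = 3)
means <- tapply(y, group, mean)   # least-squares estimates of the mu_i

# Constraint alpha_1 = 0: "treatment" contrasts (R's default)
fit_trt <- lm(y ~ group)
coef(fit_trt)   # intercept = mean of group A; others = differences from A

# Constraint alpha_1 + ... + alpha_I = 0: "sum" contrasts
fit_sum <- lm(y ~ group, contrasts = list(group = "contr.sum"))
coef(fit_sum)   # intercept = unweighted average of the group means
```

Both fits give the same fitted values; only the meaning of the reported coefficients changes.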

Basic idea: Testing for the significance of an effect amounts to comparing the goodness-of-fit of the null and nonnull models.


Fitting a one-way ANOVA model

Definition
Fitting a one-way ANOVA model amounts to estimating its parameters, namely assigning values to the parameters so that the model is as close as possible to the data.

Close? ... minimization of the least-squares criterion:

$$SS(\mu_1, \ldots, \mu_I) = \sum_{j=1}^{n_1} (Y_{1j} - \mu_1)^2 + \ldots + \sum_{j=1}^{n_I} (Y_{Ij} - \mu_I)^2,$$

by separately minimizing the summands $\sum_{j=1}^{n_i} (Y_{ij} - \mu_i)^2$:

$$\hat{\mu}_i = \frac{Y_{i1} + \ldots + Y_{in_i}}{n_i} = \bar{Y}_{i\bullet}.$$

$\hat{\mu}_i$ is said to be the least-squares estimator of $\mu_i$.
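The minimization step can be spelled out as follows (a short worked derivation, using only the notation above). Each summand is a quadratic function of $\mu_i$ alone, so it is enough to set its derivative to zero:

$$\frac{\partial}{\partial \mu_i} \sum_{j=1}^{n_i} (Y_{ij} - \mu_i)^2 = -2 \sum_{j=1}^{n_i} (Y_{ij} - \mu_i) = 0 \iff n_i \mu_i = \sum_{j=1}^{n_i} Y_{ij} \iff \mu_i = \bar{Y}_{i\bullet}.$$

The second derivative is $2 n_i > 0$, so this stationary point is indeed a minimum.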

Definition
An estimator $\hat{\theta}$ of $\theta$ is a function of the data, designed to ensure that $\hat{\theta}$ is close to the true value $\theta$ of the parameter.

R script

> # Model without intercept: one coefficient per group mean
> bfat.lm1 = lm(BFAT ~ -1+GENET,data=dta,
+               na.action=na.exclude)
> # na.action tells what to do with missing data (exclusion)

> coef(bfat.lm1) # Extracts estimated coefficients
 GENETP0 GENETP25 GENETP50
15.16467 12.95435 12.87071

> # Model with intercept: P0 is the baseline, other coefficients are differences from P0
> bfat.lm2 = lm(BFAT ~ GENET,data=dta,
+               na.action=na.exclude)

> coef(bfat.lm2)
(Intercept)    GENETP25    GENETP50
  15.164667   -2.210319   -2.293952

Estimation accuracy

Definition
The accuracy of an estimator $\hat{\theta}$ of $\theta$ is usually measured by:
• its bias, namely the expected value of the estimation error $\hat{\theta} - \theta$: $b_{\hat{\theta}} = E(\hat{\theta} - \theta)$;
• its Mean Squared Error (MSE), $\mathrm{MSE}_{\hat{\theta}} = E\big((\hat{\theta} - \theta)^2\big)$, or equivalently its Root Mean Squared Error, $\mathrm{RMSE}_{\hat{\theta}} = \sqrt{\mathrm{MSE}_{\hat{\theta}}}$.

If $b_{\hat{\theta}} = 0$,
• $\hat{\theta}$ is said to be unbiased;
• $\mathrm{RMSE}_{\hat{\theta}}$ coincides with the standard deviation $\sigma_{\hat{\theta}}$ of $\hat{\theta}$.

Accuracy of least-squares estimation

The estimation error $\hat{\mu}_i - \mu_i$ has the following expression:

$$\hat{\mu}_i - \mu_i = \frac{Y_{i1} + \ldots + Y_{in_i}}{n_i} - \mu_i = \frac{(Y_{i1} - \mu_i) + \ldots + (Y_{in_i} - \mu_i)}{n_i} = \frac{\varepsilon_{i1} + \ldots + \varepsilon_{in_i}}{n_i}.$$

Therefore,
• $\hat{\mu}_i - \mu_i$ is normally distributed;
• $\hat{\mu}_i$ is unbiased;
• assuming independent residual errors, $\mathrm{MSE}_{\hat{\mu}_i}$ is given by:

$$\mathrm{MSE}_{\hat{\mu}_i} = \frac{\mathrm{Var}(\varepsilon_{i1}) + \ldots + \mathrm{Var}(\varepsilon_{in_i})}{n_i^2} = \frac{\sigma^2 + \ldots + \sigma^2}{n_i^2} = \frac{\sigma^2}{n_i}.$$
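The unbiasedness and the value $\sigma^2/n_i$ of the MSE can be checked by simulation. This is only a Monte Carlo sketch: the values of mu_i, sigma, n_i and the number B of replications are assumptions chosen for illustration.

```r
# Monte Carlo check of bias and MSE of a group mean (hypothetical values)
set.seed(3)
mu_i  <- 13      # assumed true group mean
sigma <- 3       # assumed residual standard deviation
n_i   <- 23      # assumed group size
B     <- 20000   # number of simulated samples

# Each replication draws one group of size n_i and computes its mean
mu_hat <- replicate(B, mean(rnorm(n_i, mean = mu_i, sd = sigma)))

bias_mc <- mean(mu_hat) - mu_i       # should be close to 0
mse_mc  <- mean((mu_hat - mu_i)^2)   # should be close to sigma^2 / n_i
c(bias = bias_mc, MSE = mse_mc, theory = sigma^2 / n_i)
```

Up to simulation noise, the empirical bias is near zero and the empirical MSE is near the theoretical value.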

Least-squares fit of the ANOVA model

Decomposition of $Y_{ij}$ as a sum $\hat{Y}_{ij} + \hat{\varepsilon}_{ij}$ of:
• fitted values $\hat{Y}_{ij} = \hat{\mu}_i = \hat{\mu} + \hat{\alpha}_i$, whose variations are only due to the explanatory variable,
• and, by difference, residuals $\hat{\varepsilon}_{ij} = Y_{ij} - \hat{Y}_{ij} = Y_{ij} - \bar{Y}_{i\bullet}$, whose variations are within-group differences between the response values and their group means.

R script

> bfat.fit = fitted(bfat.lm2) # Extracts fitted values

> # numSummary() is not in base R; it is available from the RcmdrMisc package
> numSummary(bfat.fit,group=dta$GENET,
+            statistics=c("mean","sd"))
        mean           sd data:n
P0  15.16467 1.162899e-15     15
P25 12.95435 0.000000e+00     23
P50 12.87071 5.086712e-15     21

> bfat.res = residuals(bfat.lm2) # Extracts residuals

> numSummary(bfat.res,group=dta$GENET,
+            statistics=c("mean","sd"))
             mean       sd data:n
P0   1.110223e-15 2.593618     15
P25  1.927052e-17 3.628696     23
P50 -2.193599e-16 2.420862     21

Estimation of the residual standard deviation

Since $\sigma^2$ is the variance of the residual error terms, it is estimated from the residuals:

$$\hat{\sigma}^2 = \frac{\sum_{j=1}^{n_1} (Y_{1j} - \bar{Y}_{1\bullet})^2 + \ldots + \sum_{j=1}^{n_I} (Y_{Ij} - \bar{Y}_{I\bullet})^2}{n - I} = \frac{RSS}{n - I},$$

where $n = n_1 + \ldots + n_I$ is the total sample size.

Note: RSS is not divided by n but by n − I.

n − I is the number of residual degrees of freedom, which can be viewed as the number of linearly independent residuals.
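The formula can be checked against the value reported by lm. The sketch below uses simulated data (hypothetical group means and sigma, not the pig data) and computes RSS/(n − I) by hand.

```r
# Hypothetical simulated data (assumed group means 15, 13, 13 and sigma = 3)
set.seed(4)
group <- factor(rep(c("A", "B", "C"), times = c(15, 23, 21)))
y     <- unname(c(A = 15, B = 13, C = 13)[as.character(group)]) + rnorm(59, sd = 3)
fit   <- lm(y ~ group)

n <- length(y)        # total sample size
I <- nlevels(group)   # number of groups

# ave() returns, for each item, the mean of its own group
rss       <- sum((y - ave(y, group))^2)
sigma_hat <- sqrt(rss / (n - I))

c(by_hand = sigma_hat, from_lm = summary(fit)$sigma)
```

Dividing by n instead of n − I would give a slightly smaller value than the one reported by summary.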

R script

> # summary() extracts useful statistics from the fitted model,
> # including the residual standard deviation
> summary(bfat.lm2)$sigma
[1] 2.99127

ANOVA equation

Testing for an effect: comparison between the null and nonnull models.

Goodness-of-fit is measured by the residual sum of squares:

$$RSS = \sum_{i=1}^{I} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\bullet})^2, \quad \text{[nonnull model]}$$

$$RSS_0 = \sum_{i=1}^{I} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{\bullet\bullet})^2, \quad \text{[null model]}$$

where $\bar{Y}_{\bullet\bullet} = \frac{n_1}{n}\bar{Y}_{1\bullet} + \frac{n_2}{n}\bar{Y}_{2\bullet} + \ldots + \frac{n_I}{n}\bar{Y}_{I\bullet}$.

ANOVA equation:

$$RSS_0 = \sum_{i=1}^{I} n_i (\bar{Y}_{i\bullet} - \bar{Y}_{\bullet\bullet})^2 + RSS.$$
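The ANOVA equation follows from a short calculation, sketched here with the same notation. Write each deviation from the overall mean as a within-group part plus a between-group part, then square and sum:

$$Y_{ij} - \bar{Y}_{\bullet\bullet} = (Y_{ij} - \bar{Y}_{i\bullet}) + (\bar{Y}_{i\bullet} - \bar{Y}_{\bullet\bullet}),$$

$$RSS_0 = \underbrace{\sum_{i=1}^{I}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\bullet})^2}_{RSS} + \sum_{i=1}^{I} n_i (\bar{Y}_{i\bullet} - \bar{Y}_{\bullet\bullet})^2 + 2 \sum_{i=1}^{I} (\bar{Y}_{i\bullet} - \bar{Y}_{\bullet\bullet}) \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\bullet}).$$

The cross term vanishes because, within each group, the deviations from the group mean sum to zero: $\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\bullet}) = 0$.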

Model assessment using the R²

The effect is assessed by the following ratio:

$$R^2 = \frac{RSS_0 - RSS}{RSS_0} = \frac{\sum_{i=1}^{I} n_i (\bar{Y}_{i\bullet} - \bar{Y}_{\bullet\bullet})^2}{RSS_0}.$$

Indeed,
• $0 \le R^2 \le 1$,
• $R^2 = 0$ corresponds to ’no effect’ (all observed group means are equal),
• $R^2 = 1$ corresponds to ’perfect effect’ (no within-group variation: RSS = 0).
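As a quick check of the definition, the ratio can be computed by hand and compared with the value stored in the summary of the fitted model. The data below are simulated with made-up parameters; they only serve as an illustration.

```r
# Hypothetical simulated data (assumed group means 15, 13, 13 and sigma = 3)
set.seed(9)
group <- factor(rep(c("A", "B", "C"), times = c(15, 23, 21)))
y     <- unname(c(A = 15, B = 13, C = 13)[as.character(group)]) + rnorm(59, sd = 3)
fit   <- lm(y ~ group)

rss0 <- sum((y - mean(y))^2)     # residual sum of squares of the null model
rss  <- sum(residuals(fit)^2)    # residual sum of squares of the nonnull model

r2_by_hand <- (rss0 - rss) / rss0
c(by_hand = r2_by_hand, from_lm = summary(fit)$r.squared)
```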

R script

> # Extracts the R2
> summary(bfat.lm2)$r.squared
[1] 0.1016868

Significance of an effect using the F-test

Is R² = 0.10 too weak to consider that genetic-type differences exist at the population level?

Test statistic (the F-test):

$$F = \frac{(RSS_0 - RSS)/(I - 1)}{RSS/(n - I)}.$$

Degrees of freedom:
• $RSS$: n − I d.f.;
• $RSS_0 - RSS$: I − 1 d.f.

Indeed, the between-group deviations $n_i(\bar{Y}_{i\bullet} - \bar{Y}_{\bullet\bullet})$ sum to zero: only I − 1 of them are linearly independent.

R script

> # Extracts the F-test statistic
> summary(bfat.lm2)$fstatistic
    value     numdf     dendf
 3.169528  2.000000 56.000000

p-value and decision rule

Whether the effect is declared significant now relies on judging whether F = 3.170 is abnormally large with respect to its null distribution.

Here, the null distribution is the Fisher distribution $F_{I-1,n-I}$ with I − 1 and n − I degrees of freedom.
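The F-statistic and its p-value can be reproduced step by step from the two residual sums of squares. This is a sketch on simulated data (hypothetical parameters), where the null and nonnull models are fitted explicitly and compared with anova.

```r
# Hypothetical simulated data (assumed group means 15, 13, 13 and sigma = 3)
set.seed(5)
group <- factor(rep(c("A", "B", "C"), times = c(15, 23, 21)))
y     <- unname(c(A = 15, B = 13, C = 13)[as.character(group)]) + rnorm(59, sd = 3)

fit0 <- lm(y ~ 1)       # null model: Y_ij = mu + eps_ij
fit1 <- lm(y ~ group)   # nonnull model: Y_ij = mu_i + eps_ij

n    <- length(y)
I    <- nlevels(group)
rss0 <- sum(residuals(fit0)^2)
rss  <- sum(residuals(fit1)^2)

# F-statistic: ratio of the two mean squares
F_by_hand <- ((rss0 - rss) / (I - 1)) / (rss / (n - I))
# p-value: upper tail of the Fisher distribution with I - 1 and n - I d.f.
p_by_hand <- pf(F_by_hand, df1 = I - 1, df2 = n - I, lower.tail = FALSE)

tab <- anova(fit0, fit1)   # model comparison performed by R
c(F_by_hand = F_by_hand, F_anova = tab$F[2],
  p_by_hand = p_by_hand, p_anova = tab[["Pr(>F)"]][2])
```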

R script

> # Defines a sequence of x values
> x = seq(from=0,to=8,by=0.01)

> # Calculates the Fisher density function for each x
> y = df(x,df1=2,df2=56)

> # Plots the density function
> plot(x,y,type="l",xlab="F-test statistics",ylab="Density",
+      main="Density function of the Fisher distribution with 2 and 56 d.f.",
+      lwd=2)

[Figure: density function of the Fisher distribution with 2 and 56 d.f. Horizontal axis: Fisher test statistics (0 to 8); vertical axis: density (0 to 1).]

R script

> # p-value of the test
> pf(3.170,df1=2,df2=56,lower.tail=FALSE)
[1] 0.04963573

> # Displays the complete ANOVA table
> anova(bfat.lm2)
Analysis of Variance Table

Response: BFAT
          Df Sum Sq Mean Sq F value  Pr(>F)
GENET      2  56.72 28.3600  3.1695 0.04966
Residuals 56 501.07  8.9477

The first row of the ANOVA table is for the effect of the genetic type and the second row for the residual error:
• Df: degrees of freedom, respectively I − 1 and n − I;
• Sum Sq: sums of squares, respectively $RSS_0 - RSS$ and $RSS$;
• Mean Sq: mean squares, respectively $(RSS_0 - RSS)/(I - 1)$ and $RSS/(n - I)$;
• F value: F-statistic, the ratio of the mean squares;
• Pr(>F): p-value of the F-test.

Two-group comparison

Test for the effect of a 2-level factor:

$$\begin{cases} H_0: \delta = \mu_1 - \mu_2 = 0 \\ H_1: \delta = \mu_1 - \mu_2 \ne 0 \end{cases}$$

R script

> bfatsex.lm = lm(BFAT ~ SEX,data=dta)

> anova(bfatsex.lm)
Analysis of Variance Table

Response: BFAT
          Df Sum Sq Mean Sq F value   Pr(>F)
SEX        1   74.0  74.001  8.6237 0.004752
Residuals 58  497.7   8.581

Two-group comparison t-test

Let us first focus on the F-test statistic. With I = 2 groups, I − 1 = 1 and

$$F = \frac{n_1(\bar{Y}_{1\bullet} - \bar{Y}_{\bullet\bullet})^2 + n_2(\bar{Y}_{2\bullet} - \bar{Y}_{\bullet\bullet})^2}{\hat{\sigma}^2},$$

with

$$n_1(\bar{Y}_{1\bullet} - \bar{Y}_{\bullet\bullet})^2 = n_1\Big(\bar{Y}_{1\bullet} - \frac{n_1\bar{Y}_{1\bullet} + n_2\bar{Y}_{2\bullet}}{n_1 + n_2}\Big)^2 = n_1 n_2^2 \Big(\frac{\bar{Y}_{1\bullet} - \bar{Y}_{2\bullet}}{n_1 + n_2}\Big)^2,$$

$$n_2(\bar{Y}_{2\bullet} - \bar{Y}_{\bullet\bullet})^2 = n_2 n_1^2 \Big(\frac{\bar{Y}_{1\bullet} - \bar{Y}_{2\bullet}}{n_1 + n_2}\Big)^2.$$

Therefore,

$$F = \Big(\frac{\bar{Y}_{1\bullet} - \bar{Y}_{2\bullet}}{\hat{\sigma}}\Big)^2 \frac{n_1 n_2 (n_1 + n_2)}{(n_1 + n_2)^2} = \Bigg(\underbrace{\frac{\bar{Y}_{1\bullet} - \bar{Y}_{2\bullet}}{\hat{\sigma}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}}_{T}\Bigg)^2.$$

In other words, $F = T^2$, where

$$T = \frac{\hat{\delta}}{\hat{\sigma}_{\hat{\delta}}}, \quad \hat{\delta} = \bar{Y}_{1\bullet} - \bar{Y}_{2\bullet}, \quad \hat{\sigma}_{\hat{\delta}} = \hat{\sigma}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}.$$

Indeed, the two group means being independent,

$$\sigma^2_{\hat{\delta}} = \mathrm{Var}(\bar{Y}_{1\bullet} - \bar{Y}_{2\bullet}) = \mathrm{Var}(\bar{Y}_{1\bullet}) + \mathrm{Var}(\bar{Y}_{2\bullet}) = \sigma^2\Big(\frac{1}{n_1} + \frac{1}{n_2}\Big).$$

Definition
Let $\hat{\theta}$ be an estimator of $\theta$ and $\hat{\sigma}_{\hat{\theta}}$ an estimator of the standard deviation of $\hat{\theta}$.

For the test of $H_0: \theta = \theta_0$, $T_{\theta_0} = (\hat{\theta} - \theta_0)/\hat{\sigma}_{\hat{\theta}}$ is called a t-test statistic.
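The identity F = T² is easy to verify numerically. The sketch below uses simulated two-group data (group labels, means and sigma are assumptions for illustration) and computes T both by hand and with t.test.

```r
# Hypothetical simulated two-group data (assumed means 12.5 and 14.5, sigma = 3)
set.seed(6)
sex <- factor(rep(c("F", "M"), times = c(32, 28)))
y   <- ifelse(sex == "F", 12.5, 14.5) + rnorm(60, sd = 3)

fit       <- lm(y ~ sex)
sigma_hat <- summary(fit)$sigma   # pooled residual standard deviation
n1 <- sum(sex == "F")
n2 <- sum(sex == "M")

# t-test statistic by hand: estimated difference over its estimated standard deviation
T_by_hand <- (mean(y[sex == "F"]) - mean(y[sex == "M"])) /
  (sigma_hat * sqrt(1 / n1 + 1 / n2))

T_ttest <- unname(t.test(y ~ sex, var.equal = TRUE)$statistic)
F_anova <- anova(fit)[["F value"]][1]

c(T_by_hand = T_by_hand, T_ttest = T_ttest,
  T_squared = T_by_hand^2, F_anova = F_anova)
```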

R script

> # var.equal=TRUE in t.test states that
> # within-group standard deviations are assumed to be equal
> # mu=0 states that the mean difference under the null is zero
> # mu=0 is the default option
> t.test(BFAT ~ SEX,var.equal=TRUE,data=dta,mu=0)$statistic
        t
-2.936608

In the present situation, the null distribution of T is the Student distribution with n − 2 degrees of freedom, denoted $T_{n-2}$.

R script

> # pt is the cumulative distribution function of the Student distribution
> # Two-sided p-value: twice the probability of exceeding |T|
> 2*pt(2.936608,df=58,lower.tail=FALSE)
[1] 0.004751769

R script

> # The same value is also provided as an output of t.test
> t.test(BFAT ~ SEX,var.equal=TRUE,data=dta)$p.value
[1] 0.004751764

One-sided t-tests

Test of $H_0: \delta = 0$ against $H_1: \delta < 0$.

The rejection rule is one-sided: $H_0$ is rejected if T is suspiciously small under the null.

Consistently, the p-value is just the probability that a $T_{n-2}$ variable is lower than the observed value of T:

R script

> pt(-2.936608,df=58,lower.tail=TRUE)
[1] 0.002375884

R script

> # alternative="less" is used for the present one-sided test
> t.test(BFAT ~ SEX,var.equal=TRUE,data=dta,
+        alternative="less")$p.value
[1] 0.002375882


Confidence intervals for parameters

Back to the two-sided test, with |T| = 2.9366:

The difference between the mean backfat depths of males and females is significant, with type-I error level α = 0.05.

What would be the largest value $t^\star$ of |T| for which the null hypothesis would not be rejected, with type-I error level α?

If $|T| = t^\star$, then the p-value equals α, the threshold at or above which the null hypothesis is not rejected.

Therefore, $t^\star = t^{(n-2)}_{1-\alpha/2}$ is the 100(1 − α/2)%-quantile of the $T_{n-2}$ distribution.

R script

> # 97.5% quantile of the Student distribution with 58 d.f.
> qt(0.975,df=58)
[1] 2.001717

The confidence interval $CI_{1-\alpha}(\delta)$, with confidence level 1 − α, can be viewed as:

$$\begin{aligned} CI_{1-\alpha}(\delta) &= \{\delta_0 : H_0: \delta = \delta_0 \text{ is not rejected at level } \alpha\}, \\ &= \Big\{\delta_0 : -t^{(n-2)}_{1-\alpha/2} \le \frac{\hat{\delta} - \delta_0}{\hat{\sigma}_{\hat{\delta}}} \le t^{(n-2)}_{1-\alpha/2}\Big\}, \\ &= \Big[\hat{\delta} - t^{(n-2)}_{1-\alpha/2}\hat{\sigma}_{\hat{\delta}};\ \hat{\delta} + t^{(n-2)}_{1-\alpha/2}\hat{\sigma}_{\hat{\delta}}\Big]. \end{aligned}$$

Definition
The set of values $\theta_0$ such that the null hypothesis $H_0: \theta = \theta_0$ is not rejected by a pre-chosen test at type-I error level α is a confidence interval for θ, with confidence level 1 − α.
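The closed-form interval can be computed directly and compared with the one returned by t.test. As before, this is a sketch on simulated two-group data with made-up parameters, not on the pig data.

```r
# Hypothetical simulated two-group data (assumed means 12.5 and 14.5, sigma = 3)
set.seed(7)
sex <- factor(rep(c("F", "M"), times = c(32, 28)))
y   <- ifelse(sex == "F", 12.5, 14.5) + rnorm(60, sd = 3)

n1 <- sum(sex == "F")
n2 <- sum(sex == "M")
delta_hat <- mean(y[sex == "F"]) - mean(y[sex == "M"])   # estimated difference
sigma_hat <- summary(lm(y ~ sex))$sigma                  # pooled residual sd
se_delta  <- sigma_hat * sqrt(1 / n1 + 1 / n2)           # estimated sd of delta_hat
t_quant   <- qt(0.975, df = n1 + n2 - 2)                 # 97.5% Student quantile

ci_by_hand <- c(delta_hat - t_quant * se_delta, delta_hat + t_quant * se_delta)
ci_ttest   <- as.numeric(t.test(y ~ sex, var.equal = TRUE)$conf.int)

rbind(by_hand = ci_by_hand, t.test = ci_ttest)
```

Any value δ0 inside this interval would not be rejected by the two-sided t-test at level 0.05, and any value outside would be.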

R script

> # var.equal=TRUE in t.test states that
> # within-group standard deviations are assumed to be equal
> t.test(BFAT ~ SEX,var.equal=TRUE,data=dta)

	Two Sample t-test

data:  BFAT by SEX
t = -2.9366, df = 58, p-value = 0.004752
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -3.7434566 -0.7086862
sample estimates:
mean in group F mean in group M
       12.38500        14.61107

For the mean parameters $\mu_1$ and $\mu_2$:

$$CI_{1-\alpha}(\mu_i) = \Big[\hat{\mu}_i - t^{(n-2)}_{1-\alpha/2}\frac{\hat{\sigma}}{\sqrt{n_i}};\ \hat{\mu}_i + t^{(n-2)}_{1-\alpha/2}\frac{\hat{\sigma}}{\sqrt{n_i}}\Big].$$

R script

> # Confidence intervals for mean backfat depths by sex
> bfatsex.lm1 = lm(BFAT ~ -1+SEX,data=dta)

> # level = 0.95 (default) sets the confidence level at 0.95
> cbind(coef(bfatsex.lm1),confint(bfatsex.lm1,level=0.95))
                 2.5 %   97.5 %
SEXF 12.38500 11.34843 13.42157
SEXM 14.61107 13.50293 15.71921


Power of the t-test

To what extent can the t-test detect a targeted mean difference at the population level?

In the pig data context: if the mean difference at the population level between males and females is 1, what is the probability that the t-test declares the effect as significant?

Definition
Let us consider the test of the null hypothesis $H_0: \theta = \theta_0$ against the alternative hypothesis $H_1: \theta = \theta_0 + \tau$, with $\tau \ne 0$, at the type-I error level α.

The power of the test is the probability that the test rejects the null under $H_1$:

$$\mathrm{Power}(\tau) = P_{\theta = \theta_0 + \tau}\big(|T| \ge t^{(n-2)}_{1-\alpha/2}\big).$$

Distribution of the t-test statistic under $H_1: \theta = \theta_0 + \tau$: noncentral Student distribution $T_{n-2}(\lambda)$ with:

$$\lambda = \frac{\tau}{\sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}. \quad \text{[noncentrality parameter]}$$
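The role of the noncentral Student distribution can be illustrated by comparing the power obtained from it with a Monte Carlo estimate. All numerical values below (group sizes, sigma, tau, number B of replications) are assumptions made for this sketch.

```r
# Power of the two-sided t-test: noncentral formula vs simulation (hypothetical values)
set.seed(8)
n1 <- 32; n2 <- 28   # assumed group sizes
sigma <- 3           # assumed within-group standard deviation
tau   <- 1           # assumed mean difference under H1
alpha <- 0.05
dof   <- n1 + n2 - 2

lambda  <- tau / (sigma * sqrt(1 / n1 + 1 / n2))   # noncentrality parameter
t_quant <- qt(1 - alpha / 2, df = dof)

# Probability that |T| exceeds the quantile when T follows T_dof(lambda)
power_nc <- pt(-t_quant, df = dof, ncp = lambda) +
  pt(t_quant, df = dof, ncp = lambda, lower.tail = FALSE)

# Monte Carlo estimate: proportion of rejections over B simulated data sets
B <- 5000
reject <- replicate(B, {
  y1 <- rnorm(n1, mean = tau, sd = sigma)
  y2 <- rnorm(n2, mean = 0, sd = sigma)
  t.test(y1, y2, var.equal = TRUE)$p.value <= alpha
})
power_mc <- mean(reject)

c(noncentral = power_nc, monte_carlo = power_mc)
```

The two values agree up to simulation noise, which shrinks as B grows.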

R script

> x = seq(from=-5,to=5,by=0.1)
> # Sequence of regularly spaced values in [-5,5]

> dstud = dt(x,df=58)
> # Density function of the Student distribution (58 d.f.)

> plot(x,dstud,main="Density function of Student distribution (58 d.f.)",
+      ylab="Density",lwd=2,col="darkgray",cex.lab=1.25,cex.axis=1.25,type="l",
+      ylim=c(0,0.5)) # Plots the density curve

> dstud.nc = dt(x,df=58,ncp=1)
> # Density of the noncentral Student distribution (58 d.f., lambda=1)

> lines(x,dstud.nc,lwd=2,col="blue") # Adds the noncentral density curve
> legend("topleft",col=c("darkgray","blue"),lwd=2,bty="n",
+        legend=c("Student (58 d.f)","Noncentral Student (58 d.f., ncp=1)"))

[Figure: density functions of the Student distribution (58 d.f.) and of the noncentral Student distribution (58 d.f., ncp=1). Horizontal axis: x; vertical axis: density (0 to 0.5).]

R script

> qttest = qt(0.975,df=58) # 97.5% quantile of the null distribution
> qttest
[1] 2.001717

> sigma = summary(bfatsex.lm)$sigma # Residual standard deviation

> # Noncentrality parameter for a mean difference of 1
> lambda = 1/(sigma*sqrt((1/32)+(1/28)))

> # Power of the t-test: probability that |T| exceeds the quantile under H1
> pt(-qttest,df=58,ncp=lambda)+
+ pt(qttest,df=58,ncp=lambda,lower.tail=FALSE)
[1] 0.2543652

R script

> power.t.test(delta=1,n=30,sd=sigma,sig.level=0.05)
> # delta is the mean difference to detect at population level
> # n = 30 is the group size
> # sd is the within-group standard deviation
> # sig.level is the type-I error level

     Two-sample t test power calculation

              n = 30
          delta = 1
             sd = 2.929351
      sig.level = 0.05
          power = 0.2547309
    alternative = two.sided

NOTE: n is number in *each* group
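The link between the noncentral Student distribution and `power.t.test` can be checked by hand. The sketch below assumes two groups of equal size 30 and reuses the rounded value `sd = 2.929351` printed above (the names `power_upper`, `power_both`, etc. are illustrative). By default `power.t.test` only counts rejections in the upper tail; `strict=TRUE` adds the lower tail. The equal group sizes and the single tail are plausible reasons for the small gap with the value 0.2543652 obtained earlier with group sizes 32 and 28.

```r
# Assumption: sigma is the rounded within-group SD printed in the output above
sigma <- 2.929351
n <- 30                                       # common group size
df <- 2 * n - 2                               # degrees of freedom of the t-test
lambda <- 1 / (sigma * sqrt(1 / n + 1 / n))   # noncentrality parameter for delta = 1
q <- qt(0.975, df = df)                       # two-sided 5% critical value

# Probability of rejecting in the upper tail only
power_upper <- pt(q, df = df, ncp = lambda, lower.tail = FALSE)
# Adding the (very small) probability of rejecting in the lower tail
power_both <- power_upper + pt(-q, df = df, ncp = lambda)

power_builtin <- power.t.test(delta = 1, n = n, sd = sigma,
                              sig.level = 0.05)$power
power_strict <- power.t.test(delta = 1, n = n, sd = sigma,
                             sig.level = 0.05, strict = TRUE)$power

c(upper = power_upper, builtin = power_builtin,
  both = power_both, strict = power_strict)
```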

What makes a t-test powerful?

The larger λ, the more powerful the t-test.

Consequently, the power of a t-test depends on:
• |τ|/σ: the larger, the more powerful the test.

R script

> power.t.test(delta=5,n=30,sd=sigma,sig.level=0.05)$power

[1] 0.9999972

• n1 and n2: the larger, the more powerful the test.

R script

> power.t.test(delta=1,power=0.90,sd=sigma,sig.level=0.05)$n

[1] 181.2963
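To see how the power grows with the group size, the same call can be repeated over a few values of n. This is only a sketch: the sizes are arbitrary illustrative choices, and `sigma` is again the rounded value printed earlier. The last size, 182, is the required group size 181.2963 rounded up.

```r
# Assumption: sigma is the rounded within-group SD printed earlier
sigma <- 2.929351
sizes <- c(30, 60, 120, 182)   # illustrative group sizes

# Power to detect a mean difference of 1 for each group size
power_by_n <- sapply(sizes, function(n)
  power.t.test(delta = 1, n = n, sd = sigma, sig.level = 0.05)$power)

round(setNames(power_by_n, sizes), 3)
```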

Post-hoc tests

Declaring the effect of a factor as significant means that some group mean responses are different.

Which groups?

For example, for a significant effect of 'Genetic type':
• P0 ≠ P25 ≠ P50?
• or P0 ≠ {P25 = P50}?
• or ...

Post-hoc tests for I groups: I(I − 1)/2 simultaneous tests of the null hypotheses

$$H_0^{(ii')}: \alpha_i = \alpha_{i'}, \quad \text{for } 1 \le i < i' \le I.$$

Is it safe to test simultaneously many hypotheses with α = 0.05?

If I is large, the number I(I − 1)/2 of pairwise comparisons can become very large,

... which increases the probability of one or more erroneous rejections of $H_0^{(ii')}$:

$$1 - P_{\text{all } H_0^{(ii')}}\left(H_0^{(12)} \text{ not rejected}, \ldots, H_0^{(I-1,I)} \text{ not rejected}\right)$$
$$= 1 - P_{H_0^{(12)}}\left(H_0^{(12)} \text{ not rejected}\right) \cdots P_{H_0^{(I-1,I)}}\left(H_0^{(I-1,I)} \text{ not rejected}\right) \quad \text{[under an independence assumption among tests]}$$
$$\le 1 - (1-\alpha)^{I(I-1)/2}.$$

The probability of one or more erroneous rejections is called the Family Wise Error Rate (FWER):

$$\text{FWER} \le 1 - (1-\alpha)^{I(I-1)/2}.$$

R script

> 1-(1-0.05)^3

[1] 0.142625
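The value above corresponds to I = 3 groups, hence 3 pairwise comparisons. A minimal sketch of how the bound grows with the number of groups is given below; the function name `fwer_bound` is illustrative and not part of the course material.

```r
# Upper bound on the FWER for all pairwise comparisons among I groups,
# each tested at level alpha (independence among tests is assumed)
fwer_bound <- function(I, alpha = 0.05) {
  m <- I * (I - 1) / 2          # number of pairwise comparisons
  1 - (1 - alpha)^m
}

groups <- 2:8
data.frame(I = groups,
           comparisons = groups * (groups - 1) / 2,
           bound = round(fwer_bound(groups), 3))
```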

Sidak's correction of α: $\alpha^* = 1-(1-\alpha)^{2/(I(I-1))}$ guarantees that FWER ≤ α:

R script

> alpha = 1-(1-0.05)^(1/3)

> alpha

[1] 0.01695243

> 1-(1-alpha)^3

[1] 0.05

Simultaneous tests

R script

> bfat.lm = lm(BFAT ~ GENET,data=dta)

> summary(bfat.lm)$coefficients

             Estimate Std. Error   t value     Pr(>|t|)
(Intercept) 15.164667  0.7723426 19.634636 8.809707e-27
GENETP25    -2.210319  0.9927454 -2.226471 3.002270e-02
GENETP50    -2.293952  1.0112339 -2.268469 2.717540e-02

> tmp = dta # Temporary dataset similar to dta

> tmp$GENET = relevel(dta$GENET,"P25")
> # tmp$GENET = dta$GENET except that the reference level is P25

> tmp.lm = lm(BFAT ~ GENET,data=tmp)

> summary(tmp.lm)$coefficients # P25 vs P50

               Estimate Std. Error     t value     Pr(>|t|)
(Intercept) 12.95434783  0.6237230 20.76939415 5.504528e-28
GENETP0      2.21031884  0.9927454  2.22647094 3.002270e-02
GENETP50    -0.08363354  0.9028351 -0.09263435 9.265247e-01
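The three pairwise p-values reported in the two coefficient tables can be compared with Sidak's corrected level. The sketch below copies the p-values from the outputs above; the vector names are illustrative. Two comparisons fall below 0.05 but none falls below the corrected level.

```r
# p-values copied from the coefficient tables above
pvalues <- c(P0_vs_P25  = 3.002270e-02,
             P0_vs_P50  = 2.717540e-02,
             P25_vs_P50 = 9.265247e-01)

I <- 3                                 # number of genetic types
m <- I * (I - 1) / 2                   # number of pairwise comparisons
alpha_star <- 1 - (1 - 0.05)^(1 / m)   # Sidak's corrected level

data.frame(p.value = pvalues,
           below_0.05 = pvalues < 0.05,
           below_sidak = pvalues < alpha_star)
```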

Paired data

'Toy' case study: 2 food products rated by 3 observers on a 'liking' scale from 1 ('dislike') to 10 ('like').

          Observers
Products  J1  J2  J3
A          2   3   6
B          4   6   8

Definition

Let Yij denote the response value measured for the jth sampling item, 1 ≤ j ≤ J, in the ith group, 1 ≤ i ≤ I.

If the jth sampling item is the same in all groups, then the data are said to be paired.

R script

> toydta = data.frame(Observer=rep(c("J1","J2","J3"),rep(2,3)),
+ Product=rep(c("A","B"),3),Rating=c(2,4,3,6,6,8))

> toydta

  Observer Product Rating
1       J1       A      2
2       J1       B      4
3       J2       A      3
4       J2       B      6
5       J3       A      6
6       J3       B      8

F-test for the 'product' effect on 'rating'

R script

> toydta.lm1 = lm(Rating ~ Product,data=toydta)

> anova(toydta.lm1)

Analysis of Variance Table

Response: Rating
          Df  Sum Sq Mean Sq F value Pr(>F)
Product    1  8.1667  8.1667    1.96 0.2341
Residuals  4 16.6667  4.1667

Can we seriously declare that there is no statistical evidence of a 'product' effect?

          Observers
Products  J1  J2  J3
A          2   3   6
B          4   6   8

A 2-way analysis of variance model for paired data

Definition

Let Yij denote the response value measured for the jth sampling item, 1 ≤ j ≤ J, in the ith group, 1 ≤ i ≤ I:

$$Y_{ij} = \mu + \alpha_i + \beta_j + e_{ij},$$

where
• αi, i = 1, ..., I, are the group effect parameters,
• βj, j = 1, ..., J, are the 'individual' effect parameters.

The residual error eij ∼ N(0; σ).

Remark: the 1-way ANOVA model is a submodel of the 2-way ANOVA model, in which the residual term is εij = βj + eij.

Fitting of the two-way analysis of variance model

Minimization of the least-squares criterion:

$$\sum_{i=1}^{I}\sum_{j=1}^{J}(Y_{ij}-\hat\mu-\hat\alpha_i-\hat\beta_j)^2 = \min_{\mu,\alpha_i,\beta_j}\sum_{i=1}^{I}\sum_{j=1}^{J}(Y_{ij}-\mu-\alpha_i-\beta_j)^2.$$

Equating to zero the partial derivatives of the least-squares criterion with respect to µ, αi, i = 2, ..., I, and βj, j = 2, ..., J:

$$\begin{cases}
-2\sum_{i=1}^{I}\sum_{j=1}^{J}(Y_{ij}-\hat\mu-\hat\alpha_i-\hat\beta_j)=0,\\
-2\sum_{j=1}^{J}(Y_{ij}-\hat\mu-\hat\alpha_i-\hat\beta_j)=0, & \text{for } i=2,\ldots,I,\\
-2\sum_{i=1}^{I}(Y_{ij}-\hat\mu-\hat\alpha_i-\hat\beta_j)=0, & \text{for } j=2,\ldots,J.
\end{cases}$$

Developing the sums in each equation and dividing by the number of summands:

$$\begin{cases}
\bar Y_{\bullet\bullet}-\hat\mu-\sum_{i=1}^{I}\hat\alpha_i/I-\sum_{j=1}^{J}\hat\beta_j/J=0,\\
\bar Y_{i\bullet}-\hat\mu-\hat\alpha_i-\sum_{j=1}^{J}\hat\beta_j/J=0, & \text{for } i=2,\ldots,I,\\
\bar Y_{\bullet j}-\hat\mu-\sum_{i=1}^{I}\hat\alpha_i/I-\hat\beta_j=0, & \text{for } j=2,\ldots,J,
\end{cases}$$

where $\bar Y_{\bullet j}=\sum_{i=1}^{I}Y_{ij}/I$.

It is deduced that

$$\hat\mu=\bar Y_{\bullet\bullet}-\sum_{i=1}^{I}\hat\alpha_i/I-\sum_{j=1}^{J}\hat\beta_j/J.$$

Plugging $\hat\mu$ into the remaining equations:

$$\hat\alpha_i-\sum_{i'=1}^{I}\hat\alpha_{i'}/I=\bar Y_{i\bullet}-\bar Y_{\bullet\bullet},\quad\text{for } i=2,\ldots,I,$$
$$\hat\beta_j-\sum_{j'=1}^{J}\hat\beta_{j'}/J=\bar Y_{\bullet j}-\bar Y_{\bullet\bullet},\quad\text{for } j=2,\ldots,J.$$

Summing the (I − 1) first (respectively (J − 1) last) equations above, with α1 = 0 and β1 = 0 for the reference levels, gives:
• $\sum_{i=1}^{I}\hat\alpha_i/I=-(\bar Y_{1\bullet}-\bar Y_{\bullet\bullet})$,
• $\sum_{j=1}^{J}\hat\beta_j/J=-(\bar Y_{\bullet 1}-\bar Y_{\bullet\bullet})$.

Therefore,
• $\hat\alpha_i=\bar Y_{i\bullet}-\bar Y_{1\bullet}$, for i = 1, ..., I,
• $\hat\beta_j=\bar Y_{\bullet j}-\bar Y_{\bullet 1}$, for j = 1, ..., J.

R script

> toydta.lm2 = lm(Rating ~ Product+Observer,data=toydta)

> # Extract the least-squares estimation of coefficients
> coef(toydta.lm2)

(Intercept)    ProductB  ObserverJ2  ObserverJ3
   1.833333    2.333333    1.500000    4.000000
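These coefficients can be recovered from group means alone. The sketch below rebuilds the toy dataset and assumes the reference levels are product A and observer J1 (R's default ordering); the intercept is then the mean of A plus the mean of J1 minus the grand mean, and each effect is a difference of means with the reference level. The object names are illustrative.

```r
# Toy dataset from the case study
toydta <- data.frame(Observer = rep(c("J1", "J2", "J3"), each = 2),
                     Product  = rep(c("A", "B"), 3),
                     Rating   = c(2, 4, 3, 6, 6, 8))

# Means by product, by observer, and grand mean
prod_means <- sapply(split(toydta$Rating, toydta$Product), mean)
obs_means  <- sapply(split(toydta$Rating, toydta$Observer), mean)
grand_mean <- mean(toydta$Rating)

# Estimates written as differences of means (reference levels: A and J1)
closed_form <- c(
  Intercept  = prod_means[["A"]] + obs_means[["J1"]] - grand_mean,
  ProductB   = prod_means[["B"]] - prod_means[["A"]],
  ObserverJ2 = obs_means[["J2"]] - obs_means[["J1"]],
  ObserverJ3 = obs_means[["J3"]] - obs_means[["J1"]])

# Least-squares fit for comparison
fit <- coef(lm(Rating ~ Product + Observer, data = toydta))

all.equal(unname(closed_form), unname(fit))
```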

Residuals:

$$\hat e_{ij}=Y_{ij}-\hat\mu-\hat\alpha_i-\hat\beta_j=Y_{ij}-\bar Y_{i\bullet}-\bar Y_{\bullet j}+\bar Y_{\bullet\bullet}.$$

Estimation of the residual variance σ²:

$$\hat\sigma^2=\frac{\sum_{i=1}^{I}\sum_{j=1}^{J}(Y_{ij}-\bar Y_{i\bullet}-\bar Y_{\bullet j}+\bar Y_{\bullet\bullet})^2}{(I-1)(J-1)}.$$

R script

> summary(toydta.lm2)$sigma

[1] 0.4082483

> summary(toydta.lm1)$sigma

[1] 2.041241
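The residual standard deviation of the two-way model can be reproduced directly from the residual formula. The sketch below stores the toy ratings as a products-by-observers matrix (a layout chosen here for convenience) and removes row and column means with `sweep`.

```r
# Toy ratings: rows are products, columns are observers
Y <- matrix(c(2, 3, 6,
              4, 6, 8), nrow = 2, byrow = TRUE,
            dimnames = list(Product = c("A", "B"),
                            Observer = c("J1", "J2", "J3")))
I <- nrow(Y)
J <- ncol(Y)

# Residuals: Yij - (row mean) - (column mean) + (grand mean)
res <- sweep(sweep(Y, 1, rowMeans(Y)), 2, colMeans(Y)) + mean(Y)

# Residual variance with (I - 1)(J - 1) degrees of freedom
sigma2_hat <- sum(res^2) / ((I - 1) * (J - 1))
sqrt(sigma2_hat)
```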

Two-way analysis of variance F-test

Let us start from the one-way ANOVA equation:

$$\sum_{i=1}^{I}\sum_{j=1}^{J}(Y_{ij}-\bar Y_{\bullet\bullet})^2
=\sum_{i=1}^{I}J(\bar Y_{i\bullet}-\bar Y_{\bullet\bullet})^2+\sum_{i=1}^{I}\sum_{j=1}^{J}(Y_{ij}-\bar Y_{i\bullet})^2$$
$$=\sum_{i=1}^{I}J(\bar Y_{i\bullet}-\bar Y_{\bullet\bullet})^2+\sum_{j=1}^{J}I(\bar Y_{\bullet j}-\bar Y_{\bullet\bullet})^2+\sum_{i=1}^{I}\sum_{j=1}^{J}(Y_{ij}-\bar Y_{i\bullet}-\bar Y_{\bullet j}+\bar Y_{\bullet\bullet})^2.$$

R script

> toydta.lm2 = lm(Rating ~ Product+Observer,data=toydta)

> anova(toydta.lm2)

Analysis of Variance Table

Response: Rating
          Df  Sum Sq Mean Sq F value Pr(>F)
Product    1  8.1667  8.1667      49 0.0198
Observer   2 16.3333  8.1667      49 0.0200
Residuals  2  0.3333  0.1667
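The rows of this table follow from the decomposition of the total sum of squares. A minimal sketch on the toy ratings, stored here as a products-by-observers matrix (names are illustrative):

```r
# Toy ratings: rows are products, columns are observers
Y <- matrix(c(2, 3, 6,
              4, 6, 8), nrow = 2, byrow = TRUE)
I <- nrow(Y)
J <- ncol(Y)

ss_total    <- sum((Y - mean(Y))^2)
ss_product  <- J * sum((rowMeans(Y) - mean(Y))^2)   # between products
ss_observer <- I * sum((colMeans(Y) - mean(Y))^2)   # between observers
ss_residual <- ss_total - ss_product - ss_observer  # what is left

# F-test for the product effect
ms_residual <- ss_residual / ((I - 1) * (J - 1))
F_product <- (ss_product / (I - 1)) / ms_residual
p_product <- pf(F_product, I - 1, (I - 1) * (J - 1), lower.tail = FALSE)

round(c(SS_product = ss_product, SS_observer = ss_observer,
        SS_residual = ss_residual, F = F_product, p = p_product), 4)
```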

Equivalent paired t-test for a two-group comparison:

R script

> # Paired t-test
> t.test(Rating ~ Product,data=toydta,paired=TRUE)

        Paired t-test

data:  Rating by Product
t = -7, df = 2, p-value = 0.0198
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -3.7675509 -0.8991158
sample estimates:
mean of the differences
              -2.333333
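Some recent R releases reportedly reject `paired=TRUE` in the formula interface of `t.test`. The sketch below is one alternative that passes the two rating vectors explicitly (the vector names are illustrative). It also shows that the paired t-test is a one-sample t-test on the within-observer differences, and that the squared t statistic equals the F value of the two-way ANOVA.

```r
# Ratings of the two products, in the same observer order (J1, J2, J3)
ratingA <- c(J1 = 2, J2 = 3, J3 = 6)
ratingB <- c(J1 = 4, J2 = 6, J3 = 8)

# Paired t-test with explicit vectors
paired <- t.test(ratingA, ratingB, paired = TRUE)

# Same test written as a one-sample t-test on the differences
one_sample <- t.test(ratingA - ratingB)

c(t = unname(paired$statistic),
  t_squared = unname(paired$statistic)^2,
  p = paired$p.value)
```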

Outline

1 Effect at a population level
2 Decision making procedure
3 Testing for a group effect
   Exploring for a group effect
   One-way analysis-of-variance model
   Least-squares estimation of effect parameters
   F-test
   The special case of a two-level factor: t-test
   Detailing a significant group effect
   Testing a group effect using paired data
4 Linear effect
   Linearity of an effect
   Linear regression modelling
   Least-squares fitting
   F-test
   Comparing regression lines

Prediction of the Lean Meat Percentage

R script

> # Scatterplot of LMP against backfat depth
> with(dta,plot(BFAT,LMP,bty="l",xlab="Backfat depth (mm)",
+ ylab="LMP",cex.lab=1.25,pch=16,
+ main="Effect of backfat depth on Lean Meat Percentage"))

[Figure: scatterplot "Effect of backfat depth on Lean Meat Percentage"; x-axis: Backfat depth (mm), y-axis: LMP.]

R script

> # Assigns each item to a class of the partition 7<9<11<...<23
> cutbfat = cut(dta$BFAT,breaks=seq(from=7,to=23,by=2))

> xcenters = seq(from=8,to=22,by=2) # Centers of the classes

> # Means of LMP in each class of the partition
> ymeans = tapply(dta$LMP,cutbfat,mean)

> # Empirical effect curve
> lines(xcenters,ymeans,lwd=2,type="b",pch=16,
+ col="blue",cex=2)

[Figure: scatterplot "Effect of backfat depth on Lean Meat Percentage" with the empirical effect curve (class means of LMP) overlaid in blue; x-axis: Backfat depth (mm), y-axis: LMP.]

Correlation

The statistical evidence of a linear relationship should depend
• neither on the position
• nor on the dispersion
of the marginal distributions of X and Y.

It can therefore be assessed on the scaled series $\tilde x$ and $\tilde y$.

Definition

Let (x1, ..., xn) be a series of numeric values, with empirical mean $\bar x$ and standard deviation $s_x$. The scaled values $\tilde x_i$ are obtained by subtracting the mean and dividing by the standard deviation:

$$\tilde x_i=\frac{x_i-\bar x}{s_x}.$$

R script

> bfat.scaled = scale(dta$BFAT)

> lmp.scaled = scale(dta$LMP)

> mean(bfat.scaled);mean(lmp.scaled)

[1] 2.366597e-16
[1] 5.855451e-16

> sd(bfat.scaled);sd(lmp.scaled)

[1] 1
[1] 1

Scaled values can be used to identify extreme values:

R script

> # Identification of 'extreme' backfat values
> extrem.bfat = which(abs(bfat.scaled)>1.96)

> # Display of the 'outliers'
> data.frame(WHICH=extrem.bfat,
+ BFAT=bfat.scaled[extrem.bfat],LMP=lmp.scaled[extrem.bfat])

  WHICH      BFAT       LMP
1     6  2.817719 -1.089538
2    43 -2.062037  1.455189
3    48  2.167192 -2.437327

Definition

The correlation coefficient $r_{xy}$ is the mean cross-product of the scaled values:

$$r_{xy}=\frac{\sum_{i=1}^{n}\tilde x_i\tilde y_i}{n-1}.$$

Equivalently,

$$r_{xy}=\frac{1}{n-1}\,\frac{\sum_{i=1}^{n}(x_i-\bar x)(y_i-\bar y)}{s_x s_y}=\frac{s_{xy}}{s_x s_y},$$

where $s_{xy}$ is the covariance of the series:

$$s_{xy}=\frac{\sum_{i=1}^{n}(x_i-\bar x)(y_i-\bar y)}{n-1}.$$

rxy can be interpreted as follows:
• rxy ≈ 1 can be a good indicator of a clear and increasing linear relationship between X and Y.
• rxy ≈ −1 can be a good indicator of a clear and decreasing linear relationship between X and Y.
• rxy ≈ 0 can be a good indicator of an absence of a linear relationship between X and Y.

rxy should complement the visual impression deduced from a scatterplot.

R script

> # Calculation of the correlation coefficient between
> # backfat depth and LMP
> cor(dta$BFAT,dta$LMP)

[1] -0.7770074
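The definition of the correlation coefficient as a mean cross-product of scaled values can be checked against `cor`. The data below are hypothetical values made up for illustration (they are not the pig dataset), and the object names are illustrative.

```r
# Hypothetical data, for illustration only
x <- c(10.2, 12.5, 14.1, 15.8, 18.3, 20.6)
y <- c(63.1, 60.4, 59.8, 57.2, 55.9, 52.7)
n <- length(x)

# Scaled values: subtract the mean, divide by the standard deviation
x_scaled <- (x - mean(x)) / sd(x)
y_scaled <- (y - mean(y)) / sd(y)

# Correlation as the mean cross-product of the scaled values
r_cross <- sum(x_scaled * y_scaled) / (n - 1)

# Correlation as the covariance divided by the standard deviations
r_cov <- cov(x, y) / (sd(x) * sd(y))

c(cross_product = r_cross, covariance = r_cov, builtin = cor(x, y))
```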

Linear regression

Relationship between the LMP (Y) and the backfat depth (X): the way E(Y | X = x) depends on x can be assumed to be well described by a linear function of x.

Definition

It is assumed that, given X = x, Y is normally distributed, with the same standard deviation for all x.

There is a linear effect of X on Y if the conditional mean of Y given X = x is a linear function of x:

$$E(Y \mid X = x) = \beta_0 + \beta_1 x,$$

where β0 is the intercept parameter and β1 the slope parameter. The above model is named the simple linear regression model.

Why regression?

The statistical meaning of regression is inherited from

Francis Galton (1886). "Regression towards mediocrity in hereditary stature". The Journal of the Anthropological Institute of Great Britain and Ireland, Vol. 15, 246–263,

who aimed at understanding the heritability of the phenotype height in humans.

[Figure: Galton (1886)'s data; scatterplot of height of child (inches) against height of mid-parent (inches), both axes ranging from about 60 to 75 inches.]

Regression models are models for the conditional expectation of Y, given a profile of p explanatory variables x = (x1, ..., xp)′:

$$E(Y \mid X = x) = f(x),$$

where f gives its specific shape to the regression model.

The simple linear regression model is a particular case:
• x is restricted to only one variable (hence simple);
• f(x) has the specific form of a straight line;
• the conditional distribution of Y given X = x is normal;
• the conditional variance of Y given X = x is constant.

Alternatively,

$$Y = \beta_0 + \beta_1 x + \varepsilon,$$

where ε = Y − E(Y | X = x) = Y − β0 − β1x is named the residual term.

Given X = x, ε is normally distributed with
• E(ε | X = x) = 0;
• and Var(ε | X = x) = σ².

Least-squares estimation

Data: (xi, yi), i = 1, ..., n.

Minimization of the least-squares criterion SS(β):

$$SS(\beta)=\sum_{i=1}^{n}(Y_i-\beta_0-\beta_1x_i)^2.$$

The least-squares estimators $\hat\beta_0$ and $\hat\beta_1$ are the minimizers of SS(β).

For a sampling item with X = x, the fitted response value is $\hat Y=\hat\beta_0+\hat\beta_1x$.

Analogously, the fitted regression line is the line with equation $x\mapsto\hat\beta_0+\hat\beta_1x$: the closest line to the data.

Closed-form expression for least-squares estimators

Estimating equations:

$$\begin{cases}
\dfrac{\partial SS}{\partial\beta_0}(\hat\beta)=-2\sum_{i=1}^{n}(Y_i-\hat\beta_0-\hat\beta_1x_i)=0,\\[2ex]
\dfrac{\partial SS}{\partial\beta_1}(\hat\beta)=-2\sum_{i=1}^{n}x_i(Y_i-\hat\beta_0-\hat\beta_1x_i)=0.
\end{cases}$$

Dividing the first equation by n:

$$\hat\beta_0=\bar Y-\hat\beta_1\bar x.$$

... the 'mean' individual, with coordinates $(\bar x,\bar y)$, lies on the fitted regression line.

Plugging this expression into the second equation:

$$\sum_{i=1}^{n}x_i\left(Y_i-\bar Y-\hat\beta_1[x_i-\bar x]\right)=0,$$

or equivalently, dividing by n − 1,

$$s_x^2\,\hat\beta_1=s_{xy},\qquad\hat\beta_1=\frac{s_{xy}}{s_x^2}.$$

R script

> # Least-squares fit of the regression model
> lmp.lm = lm(LMP ~ BFAT,data=dta)

> # Extract estimated coefficients
> beta = coef(lmp.lm)
> beta

(Intercept)        BFAT
 71.0607926  -0.8631092
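The closed-form expressions for the slope and the intercept can be compared with the output of `lm`. The data below are hypothetical values made up for illustration (not the pig dataset), and the object names are illustrative.

```r
# Hypothetical data, for illustration only
x <- c(10.2, 12.5, 14.1, 15.8, 18.3, 20.6)
y <- c(63.1, 60.4, 59.8, 57.2, 55.9, 52.7)

# Closed-form least-squares estimators
slope <- cov(x, y) / var(x)               # s_xy / s_x^2
intercept <- mean(y) - slope * mean(x)    # the mean point lies on the line

# Least-squares fit for comparison
fit <- coef(lm(y ~ x))

rbind(closed_form = c(intercept, slope), lm = unname(fit))
```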

R script

> # Adds the regression line on the scatterplot
> abline(beta,lwd=2)

> # Adds a legend to the plot
> legend("bottomleft",lwd=2,bty="n",col=c("blue","black"),
+ legend=c("Empirical effect curve","Least-squares linear fit"))

[Figure: Effect of backfat depth on Lean Meat Percentage. Scatterplot of LMP against backfat depth (mm), with the empirical effect curve and the least-squares linear fit.]

Residual standard deviation

The residuals $\hat\varepsilon_i = Y_i - \hat\beta_0 - \hat\beta_1 x_i$ linearly depend on each other:

$$
\begin{cases}
\hat\varepsilon_1 + \ldots + \hat\varepsilon_n = 0,\\
x_1\hat\varepsilon_1 + \ldots + x_n\hat\varepsilon_n = 0.
\end{cases}
$$

Degree-of-freedom corrected sample variance $\hat\sigma^2$:

$$\hat\sigma^2 = \frac{\sum_{i=1}^{n}\left(Y_i - \hat\beta_0 - \hat\beta_1 x_i\right)^2}{n - 2}.$$
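The two constraints on the residuals, and the division by n − 2, can be checked numerically. This is a minimal sketch on simulated data (x, y and the generating values are hypothetical, not the dta dataset):

```r
# Simulated data; illustrative values only
set.seed(1)
n <- 60
x <- runif(n, min = 6, max = 23)
y <- 71 - 0.86 * x + rnorm(n, sd = 2.2)
fit <- lm(y ~ x)
res <- residuals(fit)

# The two linear constraints on the residuals (zero up to rounding error)
c(sum(res), sum(x * res))

# Residual variance estimated with n - 2 degrees of freedom
sigma2_hat <- sum(res^2) / (n - 2)
c(manual = sqrt(sigma2_hat), lm = summary(fit)$sigma)
```

The two constraints explain why two degrees of freedom are lost.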

Confidence intervals for the regression parameters

$\hat\beta_0$ and $\hat\beta_1$ are linear combinations of the $Y_i$:

$$\hat\beta_1 = \frac{1}{n-1}\sum_{i=1}^{n}\frac{x_i - \bar{x}}{s_x^2}\,Y_i.$$

As a linear combination of the normally and independently distributed observations $Y_i$, $\hat\beta_1$ is itself normally distributed with:

$$
\begin{aligned}
E(\hat\beta_1 \mid X = x) &= \frac{1}{n-1}\sum_{i=1}^{n}\frac{x_i - \bar{x}}{s_x^2}\,E(Y_i \mid X_i = x_i),\\
&= \beta_0\left[\frac{1}{n-1}\sum_{i=1}^{n}\frac{x_i - \bar{x}}{s_x^2}\right] + \beta_1\left[\frac{1}{n-1}\sum_{i=1}^{n}\frac{(x_i - \bar{x})\,x_i}{s_x^2}\right],\\
&= \beta_1,
\end{aligned}
$$

since $\sum_{i=1}^{n}(x_i - \bar{x}) = 0$ and $\sum_{i=1}^{n}(x_i - \bar{x})\,x_i = (n-1)\,s_x^2$.

The variance of $\hat\beta_1$ follows from the independence of the $Y_i$:

$$
\begin{aligned}
\mathrm{Var}(\hat\beta_1 \mid X = x) &= \frac{1}{(n-1)^2}\sum_{i=1}^{n}\frac{(x_i - \bar{x})^2}{s_x^4}\,\mathrm{Var}(Y_i \mid X_i = x_i),\\
&= \frac{\sigma^2}{(n-1)^2}\sum_{i=1}^{n}\frac{(x_i - \bar{x})^2}{s_x^4},\\
&= \frac{\sigma^2}{n-1}\,\frac{1}{s_x^2}.
\end{aligned}
$$

Given $X = x$,

$$\hat\beta_1 - \beta_1 \sim \mathcal{N}\left(0;\ \frac{\sigma}{\sqrt{n-1}}\,\frac{1}{s_x}\right).$$

The estimation accuracy is favored by
• a small $\sigma$, or equivalently a good adequacy of the regression model to the data,
• a large sample size $n$,
• a large dispersion of the values of $x$.

To sum up: $\hat\beta_0$ and $\hat\beta_1$ are normally distributed with means $\beta_0$ and $\beta_1$ respectively and standard deviations $\sigma_{\hat\beta_0}$ and $\sigma_{\hat\beta_1}$ respectively:

$$
\sigma^2_{\hat\beta_0} = \frac{\sigma^2}{n-1}\left[\frac{n-1}{n} + \frac{\bar{x}^2}{s_x^2}\right],
\qquad
\sigma^2_{\hat\beta_1} = \frac{\sigma^2}{n-1}\,\frac{1}{s_x^2}.
$$

Therefore, the following confidence intervals, with confidence level $1-\alpha$, are deduced for $\beta_0$ and $\beta_1$: for $j = 0$ or $j = 1$,

$$CI_{1-\alpha}(\beta_j) = \left[\hat\beta_j - t^{(n-2)}_{1-\alpha/2}\,\hat\sigma_{\hat\beta_j}\ ;\ \hat\beta_j + t^{(n-2)}_{1-\alpha/2}\,\hat\sigma_{\hat\beta_j}\right],$$

where $\hat\sigma_{\hat\beta_j}$ is obtained by plugging in the estimator $\hat\sigma^2$ of $\sigma^2$, and $t^{(n-2)}_{1-\alpha/2}$ is the quantile of order $1-\alpha/2$ of the Student distribution with $n-2$ degrees of freedom.
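The intervals can be rebuilt by hand and compared with confint(). The sketch below assumes simulated data (x, y and their generating values are hypothetical) and uses the variance of the intercept estimator in the form σ²[1/n + x̄²/((n − 1)s²ₓ)]:

```r
# Simulated data; illustrative values only
set.seed(1)
n <- 60
x <- runif(n, min = 6, max = 23)
y <- 71 - 0.86 * x + rnorm(n, sd = 2.2)
fit <- lm(y ~ x)
sigma_hat <- summary(fit)$sigma

# Plug-in standard deviations of the two estimators
se_b0 <- sigma_hat * sqrt(1 / n + mean(x)^2 / ((n - 1) * var(x)))
se_b1 <- sigma_hat / (sqrt(n - 1) * sd(x))

# Student quantile with n - 2 degrees of freedom, for a 95% confidence level
tq <- qt(1 - 0.05 / 2, df = n - 2)
est <- unname(coef(fit))
manual <- cbind(lower = est - tq * c(se_b0, se_b1),
                upper = est + tq * c(se_b0, se_b1))
manual
confint(fit, level = 0.95)
```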

R script

> # Confidence intervals for the regression coefficients
> # level = 0.95 (default) sets the confidence level at 0.95
> cbind(coef(lmp.lm),confint(lmp.lm,level=0.95))
                           2.5 %     97.5 %
(Intercept) 71.0607926 68.529254 73.5923316
BFAT        -0.8631092 -1.046898 -0.6793204

Confidence band for the regression line

The confidence band of the regression line at level $1-\alpha$, denoted $CB_{1-\alpha}(\beta)$, is the following family of confidence intervals:

$$CB_{1-\alpha}(\beta) = \left\{CI_{1-\alpha}(\beta_0 + \beta_1 x);\ \text{for all } x\right\},$$

where $CI_{1-\alpha}(\beta_0 + \beta_1 x)$ is a confidence interval at level $1-\alpha$ for $\beta_0 + \beta_1 x$.

Let us start with the estimation of $\beta_0 + \beta_1 x$:

$$
\begin{aligned}
\hat{Y} = \hat\beta_0 + \hat\beta_1 x &= \bar{Y} + \hat\beta_1 (x - \bar{x}) = \bar{Y} + \frac{x - \bar{x}}{s_x^2}\,s_{xy},\\
&= \frac{1}{n}\sum_{i=1}^{n}\left[1 + \frac{n}{n-1}\,\frac{(x - \bar{x})(x_i - \bar{x})}{s_x^2}\right]Y_i,\\
&= \sum_{i=1}^{n} h_i(x)\,Y_i, \quad\text{where } h_i(x) = \frac{1}{n} + \frac{(x - \bar{x})}{n-1}\,\frac{x_i - \bar{x}}{s_x^2}.
\end{aligned}
$$

Note:
• $\sum_{i=1}^{n} h_i(x) = 1$
• $\sum_{i=1}^{n} h_i(x)\,x_i = x$
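The weights h_i(x) and their two properties can be checked numerically. This is a minimal sketch on simulated data; x, y, the generating values and the evaluation point x0 are hypothetical choices:

```r
# Simulated data; illustrative values only
set.seed(1)
n <- 60
x <- runif(n, min = 6, max = 23)
y <- 71 - 0.86 * x + rnorm(n, sd = 2.2)
fit <- lm(y ~ x)

# Weights h_i(x0) at an arbitrary point x0
x0 <- 15
h <- 1 / n + (x0 - mean(x)) / (n - 1) * (x - mean(x)) / var(x)

# The weights sum to 1 and reproduce x0
c(sum_h = sum(h), sum_hx = sum(h * x))

# The weighted sum of the responses is the fitted value at x0
y_hat <- unname(predict(fit, newdata = data.frame(x = x0)))
c(weighted_sum = sum(h * y), predict = y_hat)
```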

As a linear combination of the $Y_i$, $\hat{Y}$ is normally distributed, with:

$$
\begin{aligned}
E(\hat{Y} \mid X = x) &= \sum_{i=1}^{n} h_i(x)\,E(Y_i \mid X_i = x_i),\\
&= \beta_0\left[\sum_{i=1}^{n} h_i(x)\right] + \beta_1\left[\sum_{i=1}^{n} h_i(x)\,x_i\right],\\
&= \beta_0 + \beta_1 x.
\end{aligned}
$$

The variance of $\hat{Y}$ is:

$$
\begin{aligned}
\mathrm{Var}(\hat{Y} \mid X = x) &= \sum_{i=1}^{n} h_i^2(x)\,\mathrm{Var}(Y_i \mid X_i = x_i),\\
&= \sigma^2\sum_{i=1}^{n} h_i^2(x) = \frac{\sigma^2}{n}\left[1 + \frac{n}{n-1}\left(\frac{x - \bar{x}}{s_x}\right)^2\right].
\end{aligned}
$$
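The last equality can be worked out by expanding the square of the weights, with $h_i(x) = \frac{1}{n} + \frac{(x - \bar{x})}{n-1}\,\frac{x_i - \bar{x}}{s_x^2}$, and using $\sum_i (x_i - \bar{x}) = 0$ and $\sum_i (x_i - \bar{x})^2 = (n-1)\,s_x^2$. A sketch of the intermediate steps:

```latex
\begin{aligned}
\sum_{i=1}^{n} h_i^2(x)
 &= \sum_{i=1}^{n}\left[\frac{1}{n^2}
    + \frac{2\,(x-\bar{x})(x_i-\bar{x})}{n(n-1)\,s_x^2}
    + \frac{(x-\bar{x})^2 (x_i-\bar{x})^2}{(n-1)^2\,s_x^4}\right] \\
 &= \frac{1}{n} + 0 + \frac{(x-\bar{x})^2}{(n-1)\,s_x^2} \\
 &= \frac{1}{n}\left[1 + \frac{n}{n-1}\left(\frac{x-\bar{x}}{s_x}\right)^2\right].
\end{aligned}
```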

It is deduced from the above results that

$$
CI_{1-\alpha}(\beta_0 + \beta_1 x) = \left[\hat{Y} - t^{(n-2)}_{1-\alpha/2}\,\frac{\hat\sigma}{\sqrt{n}}\sqrt{1 + \frac{n}{n-1}\left(\frac{x - \bar{x}}{s_x}\right)^2}\ ;\ \hat{Y} + t^{(n-2)}_{1-\alpha/2}\,\frac{\hat\sigma}{\sqrt{n}}\sqrt{1 + \frac{n}{n-1}\left(\frac{x - \bar{x}}{s_x}\right)^2}\right].
$$

Whatever the model adequacy, the estimation accuracy can be improved arbitrarily,

... provided the sample size can be increased with no limit.
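The interval formula can be compared with what predict() returns for interval="confidence". The sketch below assumes simulated data (x, y and their generating values are hypothetical) and evaluates the band on a small grid:

```r
# Simulated data; illustrative values only
set.seed(1)
n <- 60
x <- runif(n, min = 6, max = 23)
y <- 71 - 0.86 * x + rnorm(n, sd = 2.2)
fit <- lm(y ~ x)
sigma_hat <- summary(fit)$sigma
est <- unname(coef(fit))

# Fitted values and half-widths of the 95% band on a grid of x values
grid <- seq(from = 6, to = 23, length.out = 5)
y_hat <- est[1] + est[2] * grid
tq <- qt(1 - 0.05 / 2, df = n - 2)
half <- tq * sigma_hat / sqrt(n) *
  sqrt(1 + n / (n - 1) * ((grid - mean(x)) / sd(x))^2)

manual <- cbind(fit = y_hat, lwr = y_hat - half, upr = y_hat + half)
ref <- predict(fit, newdata = data.frame(x = grid), interval = "confidence")
manual
ref
```

The band is narrowest at the mean of x and widens as x moves away from it.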

R script

> # Defines a high resolution sequence of x values
> x = seq(from=6,to=23,length=1000)

> # Calculates predictions for the sequence of x values
> # with 95%-confidence intervals for the regression line
> pred = predict(lmp.lm,newdata=data.frame(BFAT=x),interval="confidence")
> # pred has 3 columns: fitted values (fit), lower (lwr) and upper (upr) limits

> # Plots the upper limit of the confidence band
> plot(x,pred[,"upr"],type="l",xlab="Backfat depth",ylab="LMP",ylim=c(48,68))
> lines(x,pred[,"lwr"]) # Adds the lower limit of the confidence band ...

> # Shades the confidence band
> polygon(c(x,rev(x)),c(pred[,"upr"],rev(pred[,"lwr"])),col="gray95")
> lines(x,pred[,"fit"],lwd=2) # Adds the regression line

[Figure: Confidence band for the regression line. LMP against backfat depth, with the fitted regression line and the shaded confidence band.]

Confidence band for prediction

Whatever the estimation accuracy, the prediction accuracy also has to account for the model adequacy.

Let $Y^\star$ be the unobserved response value of an item for which $X^\star = x^\star$:

$\hat{Y}^\star = \hat\beta_0 + \hat\beta_1 x^\star$ is the corresponding prediction.

Since $Y^\star$ is independent of the observations used to fit the model, the variance of the prediction error is

$$
\begin{aligned}
\mathrm{Var}(Y^\star - \hat{Y}^\star \mid X = x,\ X^\star = x^\star) &= \mathrm{Var}(Y^\star \mid X^\star = x^\star) + \mathrm{Var}(\hat{Y}^\star \mid X = x),\\
&= \sigma^2 + \frac{\sigma^2}{n}\left[1 + \frac{n}{n-1}\left(\frac{x^\star - \bar{x}}{s_x}\right)^2\right].
\end{aligned}
$$

Therefore, the lowest variance of the prediction error is reached for $x^\star = \bar{x}$ and equals $\sigma^2 + \sigma^2/n$.
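The variance of the prediction error leads to a prediction interval that can be compared with predict(..., interval="prediction"). A minimal sketch, assuming simulated data and an arbitrary new point x_star (all names and values are hypothetical):

```r
# Simulated data; illustrative values only
set.seed(1)
n <- 60
x <- runif(n, min = 6, max = 23)
y <- 71 - 0.86 * x + rnorm(n, sd = 2.2)
fit <- lm(y ~ x)
sigma_hat <- summary(fit)$sigma

# Estimated variance of the prediction error at a new point x_star
x_star <- 15
var_err <- sigma_hat^2 +
  sigma_hat^2 / n * (1 + n / (n - 1) * ((x_star - mean(x)) / sd(x))^2)

# 95% prediction interval built from the Student quantile with n - 2 df
y_hat <- sum(unname(coef(fit)) * c(1, x_star))
tq <- qt(1 - 0.05 / 2, df = n - 2)
manual <- c(fit = y_hat,
            lwr = y_hat - tq * sqrt(var_err),
            upr = y_hat + tq * sqrt(var_err))
ref <- predict(fit, newdata = data.frame(x = x_star), interval = "prediction")
manual
ref
```

Unlike the confidence band, this interval does not shrink to zero width as n grows, because the term σ² remains.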

R script

> # Calculates predictions for the sequence of x values
> # with 95%-confidence intervals for the predictions
> pred = predict(lmp.lm,newdata=data.frame(BFAT=x),interval="prediction")

> # Plots the limits of the confidence band
> lines(x,pred[,"upr"])
> lines(x,pred[,"lwr"])

> # Shades the confidence band
> color = adjustcolor("darkgray",alpha=0.3) # Creates a transparent gray
> polygon(c(x,rev(x)),c(pred[,"upr"],rev(pred[,"lwr"])),col=color)

[Figure: Confidence band for prediction. LMP against backfat depth, with the shaded confidence band for predictions.]

Measuring how strong a linear effect is

Yet another model comparison issue:

To what extent does
• $\mathcal{M}$: $Y = \beta_0 + \beta_1 x + \varepsilon$, with $RSS = \sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2$,
fit the data better than
• $\mathcal{M}_0$: $Y = \beta_0 + \varepsilon$, with $RSS_0 = \sum_{i=1}^{n}(Y_i - \bar{Y})^2$?

ANOVA equation:

$$RSS_0 = \sum_{i=1}^{n}(\hat{Y}_i - \bar{Y})^2 + RSS.$$
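The ANOVA equation can be verified numerically. This is a sketch on simulated data (x, y and their generating values are hypothetical, not the dta dataset):

```r
# Simulated data; illustrative values only
set.seed(1)
n <- 60
x <- runif(n, min = 6, max = 23)
y <- 71 - 0.86 * x + rnorm(n, sd = 2.2)
fit <- lm(y ~ x)

rss  <- sum(residuals(fit)^2)            # residual sum of squares of M
rss0 <- sum((y - mean(y))^2)             # residual sum of squares of M0
ess  <- sum((fitted(fit) - mean(y))^2)   # part of RSS0 explained by x

c(rss0 = rss0, ess_plus_rss = ess + rss)
```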

R script

> fit = fitted(lmp.lm) # Extracts fitted values

> plot(dta$LMP,fit,pch=16,xlim=c(48,67),ylim=c(48,67),
+ xlab="Observed LMP values",ylab="Fitted LMP values")

> abline(a=0,b=1,lwd=2,col="gray") # Adds the line y=x to the plot

[Figure: Fitted LMP values against observed LMP values, with the line y = x.]

The $R^2$ coefficient compares $RSS$ and $RSS_0$:

$$R^2 = \frac{RSS_0 - RSS}{RSS_0}.$$

• $0 \le R^2 \le 1$;
• $R^2 = 0$: absence of an effect of $x$;
• $R^2 = 1$: 'perfect' effect of $x$.
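The R² can be computed directly from the two residual sums of squares and compared with the value stored by summary(), and with the squared correlation between observed and fitted values. A sketch on simulated data (hypothetical x, y and generating values):

```r
# Simulated data; illustrative values only
set.seed(1)
n <- 60
x <- runif(n, min = 6, max = 23)
y <- 71 - 0.86 * x + rnorm(n, sd = 2.2)
fit <- lm(y ~ x)

rss  <- sum(residuals(fit)^2)
rss0 <- sum((y - mean(y))^2)

r2 <- (rss0 - rss) / rss0
c(from_rss = r2,
  from_summary = summary(fit)$r.squared,
  squared_cor = cor(y, fitted(fit))^2)
```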

R script

> # Extracts the R2
> summary(lmp.lm)$r.squared
[1] 0.6037405

Equivalently, the $R^2$ coefficient is the squared correlation between the observed and the fitted values:

$$R^2 = r^2_{y,\hat{y}}.$$

R script

> cor(dta$LMP,fit)^2
[1] 0.6037405

Testing for a linear effect

The $R^2$ can be associated with a p-value for the following test:

$$
\begin{cases}
H_0: \mathcal{M} \text{ does not fit the data better than } \mathcal{M}_0\\
H_1: \mathcal{M} \text{ fits the data better than } \mathcal{M}_0
\end{cases}
$$

F-test for the significance of the relationship between $Y$ and $x$:

$$F = \frac{RSS_0 - RSS}{RSS/(n-2)}.$$

One degree of freedom for $RSS_0 - RSS = \sum_{i=1}^{n}(\hat{Y}_i - \bar{Y})^2$:

$\hat{Y}_i - \bar{Y} = \hat\beta_1(x_i - \bar{x})$ is proportional to $x_i - \bar{x}$.
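The F statistic and its p-value can be computed by hand and compared with the values reported by summary() and anova(). A minimal sketch on simulated data (hypothetical x, y and generating values):

```r
# Simulated data; illustrative values only
set.seed(1)
n <- 60
x <- runif(n, min = 6, max = 23)
y <- 71 - 0.86 * x + rnorm(n, sd = 2.2)
fit <- lm(y ~ x)

rss  <- sum(residuals(fit)^2)
rss0 <- sum((y - mean(y))^2)

# F statistic with 1 and n - 2 degrees of freedom, and its upper-tail probability
f_stat <- (rss0 - rss) / (rss / (n - 2))
p_val  <- pf(f_stat, df1 = 1, df2 = n - 2, lower.tail = FALSE)

c(manual = f_stat, summary = unname(summary(fit)$fstatistic["value"]))
c(manual = p_val, anova = anova(fit)[["Pr(>F)"]][1])
```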

R script

> # Extracts the F-test statistic
> summary(lmp.lm)$fstatistic
   value    numdf    dendf
88.36873  1.00000 58.00000

Whether or not the effect is significant now relies on judging whether F = 88.369 is large with respect to its distribution under the null hypothesis.

This null distribution is a Fisher distribution $F_{1,n-2}$.

R script

> # Displays the complete ANOVA table
> anova(lmp.lm)
Analysis of Variance Table

Response: LMP
          Df Sum Sq Mean Sq F value    Pr(>F)
BFAT       1 425.90  425.90  88.369 2.915e-13
Residuals 58 279.53    4.82

ANOVA table:
• Df: degrees of freedom, respectively 1 and n − 2;
• Sum Sq: sum-of-squares, respectively RSS0 − RSS and RSS;
• Mean Sq: mean squares, respectively (RSS0 − RSS)/1 and RSS/(n − 2);
• F value: F-statistic, the ratio of the mean squares;
• Pr(>F): p-value, the probability that the F-statistic exceeds F value under the null hypothesis.
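The columns of the table can be recomputed from its first two columns. The sketch below uses the rounded sums of squares printed in the ANOVA table above (425.90 and 279.53, with 1 and 58 degrees of freedom), so the results agree with the R output only up to rounding; the variable names are illustrative.

```r
# Values copied from the printed ANOVA table (rounded to two decimals)
ss_model <- 425.90; df_model <- 1    # RSS0 - RSS
ss_resid <- 279.53; df_resid <- 58   # RSS

ms_model <- ss_model / df_model      # Mean Sq, first row
ms_resid <- ss_resid / df_resid      # Mean Sq, second row
f_value  <- ms_model / ms_resid      # F value
p_value  <- pf(f_value, df_model, df_resid, lower.tail = FALSE)  # Pr(>F)

# The same sums of squares also give the R2 coefficient
r2 <- ss_model / (ss_model + ss_resid)

c(ms_resid = ms_resid, f_value = f_value, p_value = p_value, r2 = r2)
```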

Group-wise linear effects

Could the relationship between the LMP and the backfat depth be specific to the genetic type?

In a regression modeling framework, this issue introduces two explanatory variables:
• the backfat depth, a numeric covariate;
• and the genetic type, a factor.

When the effect of the numeric covariate differs according to the level of the grouping variable, there is an interaction effect between the two explanatory variables.

Introducing a group effect in a regression model

Let $Y_{ij}$ stand for the response value of the $j$th sampling item, $j = 1, \ldots, n_i$, in the $i$th group, $i = 1, \ldots, I$,

and $x_{ij}$ the corresponding value of the explanatory variable.

Regression model within the 1st group ('reference'):

$$Y_{1j} = \mu + \beta x_{1j} + \varepsilon_{1j}, \quad \varepsilon_{1j} \sim \mathcal{N}(0;\sigma)$$

Regression model within the $i$th group, with $i \neq 1$:

$$Y_{ij} = \mu + \alpha_i + (\beta + \gamma_i)\,x_{ij} + \varepsilon_{ij}, \quad \varepsilon_{ij} \sim \mathcal{N}(0;\sigma)$$

where:
• $\alpha_2, \ldots, \alpha_I$ are the group effect parameters;
• $\gamma_2, \ldots, \gamma_I$ are the interaction effect parameters.

Note that:
• the one-way analysis of variance model for the mean comparison of $Y$ across groups is a submodel, obtained with $\beta = 0$ and $\gamma_2 = \ldots = \gamma_I = 0$;
• the linear regression model for the study of the effect of $x$ on $Y$ is also a submodel, obtained with $\alpha_2 = \ldots = \alpha_I = 0$ and $\gamma_2 = \ldots = \gamma_I = 0$.

R script

> lmp.lm = lm(LMP ~ BFAT*GENET,data=dta)
> coef(lmp.lm)
  (Intercept)          BFAT      GENETP25      GENETP50
   80.2215328    -1.4575042    -9.7600733   -13.7187769
BFAT:GENETP25 BFAT:GENETP50
    0.6268832     0.9550714

Correspondence between the R output and the parameters:

Parameter   Name in R        Estimate
µ           (Intercept)       80.2215328
β           BFAT              -1.4575042
α2          GENETP25          -9.7600733
α3          GENETP50         -13.7187769
γ2          BFAT:GENETP25      0.6268832
γ3          BFAT:GENETP50      0.9550714

Is the effect of the backfat depth on the LMP really
• the most obvious in genetic sub-population P0,
• less clear in sub-population P25
• and even less clear in sub-population P50?
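To read the estimates as one regression line per genetic type, the intercepts µ + α_i and slopes β + γ_i can be computed from the coefficients printed above. A short sketch (the object names est and lines_by_group are illustrative):

```r
# Estimates copied from the coef(lmp.lm) output
est <- c(mu = 80.2215328, beta = -1.4575042,
         alpha2 = -9.7600733, alpha3 = -13.7187769,
         gamma2 = 0.6268832, gamma3 = 0.9550714)

# Group-specific lines: intercept mu + alpha_i, slope beta + gamma_i
# (alpha_1 = gamma_1 = 0 for the reference group P0)
lines_by_group <- data.frame(
  group     = c("P0", "P25", "P50"),
  intercept = est[["mu"]]   + c(0, est[["alpha2"]], est[["alpha3"]]),
  slope     = est[["beta"]] + c(0, est[["gamma2"]], est[["gamma3"]])
)
lines_by_group
```

The estimated slopes are about −1.46 for P0, −0.83 for P25 and −0.50 for P50; whether these differences are significant is a separate question, addressed by a test.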

Testing for group effect in a regression model

The corresponding model comparison issue:

To what extent does
• $\mathcal{M}$: $Y_{ij} = \mu + \alpha_i + (\beta + \gamma_i)\,x_{ij} + \varepsilon_{ij}$
fit the data better than
• $\mathcal{M}_0$: $Y_{ij} = \mu + \alpha_i + \beta x_{ij} + \varepsilon_{ij}$?

R script

> lmp.lm0 = lm(LMP ~ BFAT+GENET,data=dta)
> coef(lmp.lm0)
(Intercept)        BFAT    GENETP25    GENETP50
71.33759632 -0.87167292 -0.34433658 -0.08245607

> sum(residuals(lmp.lm)^2) # RSS for the full model
[1] 229.4831

> sum(residuals(lmp.lm0)^2) # RSS for the submodel
[1] 278.2727

For the following test:

$$
\begin{cases}
H_0: \mathcal{M} \text{ does not fit the data better than } \mathcal{M}_0\\
H_1: \mathcal{M} \text{ fits the data better than } \mathcal{M}_0
\end{cases}
$$

the appropriate F-test is given by:

$$F = \frac{(RSS_0 - RSS)/(I - 1)}{RSS/(n - 2I)}.$$

General rules for the degrees of freedom:
• $RSS_0 - RSS$: difference between the numbers of parameters of the two models;
• $RSS$: difference between the sample size $n$ and the number of parameters of the full model.

Under the null hypothesis, $F \sim F_{I-1,\,n-2I}$.
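The F formula and its degrees of freedom can be checked against anova() on two nested fits. The sketch below assumes simulated data with I = 3 groups of equal size; the group labels reuse P0, P25 and P50, but the data, slopes and noise level are hypothetical.

```r
# Simulated grouped data; illustrative values only
set.seed(2)
I <- 3
n_i <- 20
g <- factor(rep(c("P0", "P25", "P50"), each = n_i))
x <- runif(I * n_i, min = 6, max = 23)
slope <- unname(c(P0 = -1.4, P25 = -0.8, P50 = -0.5)[as.character(g)])
y <- 75 + slope * x + rnorm(I * n_i, sd = 2)
n <- length(y)

full <- lm(y ~ x * g)   # model M: group-specific intercepts and slopes
sub  <- lm(y ~ x + g)   # model M0: group-specific intercepts, common slope
rss  <- sum(residuals(full)^2)
rss0 <- sum(residuals(sub)^2)

# F statistic with I - 1 and n - 2I degrees of freedom
f_stat <- ((rss0 - rss) / (I - 1)) / (rss / (n - 2 * I))
p_val  <- pf(f_stat, I - 1, n - 2 * I, lower.tail = FALSE)
c(f_stat = f_stat, p_val = p_val)

tab <- anova(sub, full)
tab
```

The full model has 2I parameters and the submodel I + 1, hence the I − 1 degrees of freedom in the numerator.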

R script

> anova(lmp.lm0,lmp.lm)
Analysis of Variance Table

Model 1: LMP ~ BFAT + GENET
Model 2: LMP ~ BFAT * GENET
  Res.Df    RSS Df Sum of Sq      F   Pr(>F)
1     55 278.27
2     53 229.48  2     48.79 5.6341 0.006045
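The F value and p-value of this table can be reproduced by hand from the two residual sums of squares printed earlier (278.2727 and 229.4831) and the degrees of freedom reported by anova(). A short sketch (variable names are illustrative):

```r
# Values copied from the R output
rss0 <- 278.2727; rss <- 229.4831   # RSS of the submodel and of the full model
df_num <- 2                         # Df column: difference in numbers of parameters
df_den <- 53                        # Res.Df of the full model

f_value <- ((rss0 - rss) / df_num) / (rss / df_den)
p_value <- pf(f_value, df_num, df_den, lower.tail = FALSE)
c(sum_of_sq = rss0 - rss, f_value = f_value, p_value = p_value)
```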

Columnwise description of the ANOVA table:
• Res.Df: residual degrees of freedom;
• RSS: residual sum-of-squares;
• Df: degrees of freedom of RSS0 − RSS;
• Sum of Sq: fitting gain RSS0 − RSS;
• F: F-test statistic for the comparison of Model 1 and Model 2;
• Pr(>F): p-value of the test.

Tests for pairwise comparisons

R script

> # Sidak correction for a control of the FWER at level 0.05
> alpha = 1-(1-0.05)^(1/3)
> alpha
[1] 0.01695243

R script

> lmp.lm = lm(LMP ~ BFAT*GENET,data=dta)
> summary(lmp.lm)$coefficients
                 Estimate Std. Error   t value     Pr(>|t|)
(Intercept)    80.2215328  3.2957112 24.341190 1.935662e-30
BFAT           -1.4575042  0.2144210 -6.797394 9.555535e-09
GENETP25       -9.7600733  3.6821573 -2.650640 1.056916e-02
GENETP50      -13.7187769  4.1457585 -3.309111 1.687184e-03
BFAT:GENETP25   0.6268832  0.2468264  2.539774 1.406063e-02
BFAT:GENETP50   0.9550714  0.2879532  3.316759 1.649469e-03

R script

> tmp = dta # Temporary dataset similar to dta
> tmp$GENET = relevel(dta$GENET,"P25")
> # tmp$GENET = dta$GENET except that the reference level is P25

> tmp.lm = lm(LMP ~ GENET*BFAT,data=tmp)
> summary(tmp.lm)$coefficients # P25 vs P50
                Estimate Std. Error   t value     Pr(>|t|)
(Intercept)   70.4614595  1.6421236 42.908743 7.681692e-43
GENETP0        9.7600733  3.6821573  2.650640 1.056916e-02
GENETP50      -3.9587036  3.0036929 -1.317945 1.931901e-01
BFAT          -0.8306210  0.1222575 -6.794030 9.675360e-09
GENETP0:BFAT  -0.6268832  0.2468264 -2.539774 1.406063e-02
GENETP50:BFAT  0.3281883  0.2277884  1.440759 1.555347e-01
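One possible reading of these outputs is to compare the three interaction p-values, each testing the equality of two slopes, with the Sidak-corrected level. The sketch below takes this approach as an assumption; the p-values are copied from the two coefficient tables, and the object names are illustrative.

```r
# Sidak-corrected level for m = 3 pairwise comparisons, FWER controlled at 0.05
m <- 3
alpha_sidak <- 1 - (1 - 0.05)^(1 / m)

# p-values of the interaction terms (differences between slopes)
p_slopes <- c("P0 vs P25"  = 1.406063e-02,   # BFAT:GENETP25, reference P0
              "P0 vs P50"  = 1.649469e-03,   # BFAT:GENETP50, reference P0
              "P25 vs P50" = 1.555347e-01)   # GENETP50:BFAT, reference P25

alpha_sidak
p_slopes < alpha_sidak   # TRUE where the slopes differ significantly
```

Under this reading, the slope in P0 differs significantly from those in P25 and P50, whereas the slopes in P25 and P50 are not found to differ.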