Bivariate data (two quantitative variables)
Hypothetical scatter plots
[Figure: twelve scatter-plot panels, titled by correlation coefficient: 0, 0.4, -0.2, -0.5, 0.6, 0.8, -0.7, -0.9, 0.9, 0.95, -0.95, -0.99]
Correlation coefficient
Aim: establish and estimate association between two variables.
Formula (Pearson's):
$$r_{YZ} = \frac{\sum_{i=1}^{n} (y_i - \bar{y})(z_i - \bar{z})}{(n-1)\sqrt{S_Y^2 \, S_Z^2}}$$
Property: $-1 \le r_{YZ} \le 1$.
Confidence intervals and tests of the hypothesis ρ = 0 assume that (Y, Z) is bivariate normal with correlation coefficient ρ.
If the variables are not normal, other coefficients are used:
Kendall's correlation coefficient τ uses the ranks of the observations instead of their values.
Spearman's correlation coefficient rS is also computed from ranks.
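The three coefficients can be illustrated with a minimal Python sketch. The data and the function names (`pearson`, `ranks`, `spearman`, `kendall_tau`) are hypothetical and not part of the slides; Kendall's τ is computed in its simplest form, which assumes no ties.

```python
import math

def pearson(y, z):
    """Pearson's r: sum of cross-deviations over (n-1)*sqrt(S2_Y * S2_Z)."""
    n = len(y)
    my, mz = sum(y) / n, sum(z) / n
    s2y = sum((a - my) ** 2 for a in y) / (n - 1)  # sample variance of Y
    s2z = sum((b - mz) ** 2 for b in z) / (n - 1)  # sample variance of Z
    sxy = sum((a - my) * (b - mz) for a, b in zip(y, z))
    return sxy / ((n - 1) * math.sqrt(s2y * s2z))

def ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman(y, z):
    """Spearman's r_S: Pearson's r applied to the ranks."""
    return pearson(ranks(y), ranks(z))

def kendall_tau(y, z):
    """Kendall's tau (no ties assumed): (concordant - discordant) / all pairs."""
    n = len(y)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            p = (y[i] - y[j]) * (z[i] - z[j])
            s += (p > 0) - (p < 0)
    return s / (n * (n - 1) / 2)

# Hypothetical data: Z increases with Y, but not linearly.
y_demo = [1, 2, 3, 4, 5]
z_demo = [1, 4, 9, 16, 100]
print("Pearson :", pearson(y_demo, z_demo))
print("Spearman:", spearman(y_demo, z_demo))
print("Kendall :", kendall_tau(y_demo, z_demo))
```

Because the rank-based coefficients only use the ordering of the observations, they reach 1 for any strictly increasing relation, while Pearson's r reaches 1 only for an exactly linear one.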
An example
[Figure: scatter plot "Calories and price of some brands of beer"; x-axis: Calories, y-axis: Price]
Pearson: $r_{YZ} = 0.419937$; 95% confidence interval: 0.2271418 to 0.5810573; $P(> |r_{YZ}| \mid \rho = 0) = 6.31 \cdot 10^{-5}$
Kendall: $\tau = 0.3488677$; $P(> |\tau|) = 3.39 \cdot 10^{-6}$
Spearman: $r_S = 0.5008197$; $P(> |r_S|) = 1.053 \cdot 10^{-6}$
In this case all three coefficients give similar results: a moderate positive association, significantly different from 0.
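The slides do not say how the 95% confidence interval was obtained. One common construction under the bivariate-normal assumption is the Fisher z-transformation, sketched below. The function name `fisher_ci` is hypothetical, and the sample size n = 85 is an assumption: it is not given in the slides and is used here only because it yields an interval close to the one quoted.

```python
import math
from statistics import NormalDist

def fisher_ci(r, n, level=0.95):
    """Approximate confidence interval for rho via Fisher's z-transformation.

    atanh(r) is treated as approximately normal with standard
    error 1/sqrt(n - 3); the interval is mapped back with tanh.
    """
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    q = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level=0.95
    return math.tanh(z - q * se), math.tanh(z + q * se)

# r is taken from the example; n = 85 is an assumed sample size.
lo, hi = fisher_ci(0.419937, 85)
print(f"95% CI: {lo:.4f} to {hi:.4f}")
```

The interval is not symmetric around r, because the transformation is symmetric on the z scale rather than on the correlation scale.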
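Linear regression
With bivariate data, we can choose to predict Y on the basis of X:
$$Y = \alpha + \beta X + \varepsilon \quad (\varepsilon \text{ error}).$$
For each value $x_i$ of X, there are:
$y_i$ (observed value) and $\hat{y}_i = \alpha + \beta x_i$ (predicted value).
α and β are chosen to minimize $\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$.
[Figure: "A regression"; scatter plot with fitted line; x-axis: age (eta), y-axis: pressure (pressione); annotation: "obs. - pred."]
$$\hat{\beta} = \frac{\sum_{i=1}^{n} (y_i - \bar{y})(x_i - \bar{x})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad \hat{\alpha} = \bar{y} - \hat{\beta}\,\bar{x}.$$
Formulae similar to correlation, but interpretation very different.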
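The least-squares formulas can be tried out with a short Python sketch. The age and pressure values are invented for illustration and are not the data behind the slide's plot; `fit_line` is a hypothetical helper name.

```python
import math

def fit_line(x, y):
    """Least-squares estimates (alpha, beta) for y = alpha + beta * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((yi - my) * (xi - mx) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    beta = sxy / sxx
    alpha = my - beta * mx
    return alpha, beta

# Hypothetical data (not from the slides): age in years, blood pressure.
age = [25, 35, 45, 55, 65, 75]
pressure = [118, 125, 134, 142, 155, 160]

alpha, beta = fit_line(age, pressure)
predicted = [alpha + beta * a for a in age]
residuals = [p - ph for p, ph in zip(pressure, predicted)]  # obs. - pred.
print("alpha =", alpha, "beta =", beta)
print("sum of residuals =", sum(residuals))

# Link with correlation: beta = r * sqrt(S2_Y / S2_X).
n = len(age)
mx, my = sum(age) / n, sum(pressure) / n
sxx = sum((a - mx) ** 2 for a in age)
syy = sum((p - my) ** 2 for p in pressure)
sxy = sum((a - mx) * (p - my) for a, p in zip(age, pressure))
r = sxy / math.sqrt(sxx * syy)
print("r * sqrt(syy / sxx) =", r * math.sqrt(syy / sxx))
```

The slope and the correlation coefficient share the same numerator, which is why the formulae look alike. The slope is expressed in units of Y per unit of X and changes if the roles of X and Y are swapped, whereas r is unit-free and symmetric in the two variables.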