![Page 1: Vooruitblik 10 en 11 Dinsdag 30 september 2008. Chapter 10 Correlation and Regression 1. Correlation 2. Regression 3. Variation and Prediction Intervals](https://reader036.vdocuments.site/reader036/viewer/2022081602/551a75ca5503463e778b619e/html5/thumbnails/1.jpg)
Preview of Chapters 10 and 11
Tuesday, 30 September 2008
![Page 2: Vooruitblik 10 en 11 Dinsdag 30 september 2008. Chapter 10 Correlation and Regression 1. Correlation 2. Regression 3. Variation and Prediction Intervals](https://reader036.vdocuments.site/reader036/viewer/2022081602/551a75ca5503463e778b619e/html5/thumbnails/2.jpg)
Chapter 10: Correlation and Regression
1. Correlation
2. Regression
3. Variation and Prediction Intervals
4. Rank Correlation
![Page 3: Vooruitblik 10 en 11 Dinsdag 30 september 2008. Chapter 10 Correlation and Regression 1. Correlation 2. Regression 3. Variation and Prediction Intervals](https://reader036.vdocuments.site/reader036/viewer/2022081602/551a75ca5503463e778b619e/html5/thumbnails/3.jpg)
1. Correlation
• Relationship between two measured variables in a data set, at the interval or ratio level of measurement
• In this book: linear relationships only
• Mind the requirements!
• Measure: Pearson product-moment correlation, r (sample) or ρ (population)
• No correlation: r = 0; maximum correlation: r = -1 or r = +1
• Critical values: Table A-6
![Page 4: Vooruitblik 10 en 11 Dinsdag 30 september 2008. Chapter 10 Correlation and Regression 1. Correlation 2. Regression 3. Variation and Prediction Intervals](https://reader036.vdocuments.site/reader036/viewer/2022081602/551a75ca5503463e778b619e/html5/thumbnails/4.jpg)
Scatterplots of Paired Data
Figure 10-2
![Page 5: Vooruitblik 10 en 11 Dinsdag 30 september 2008. Chapter 10 Correlation and Regression 1. Correlation 2. Regression 3. Variation and Prediction Intervals](https://reader036.vdocuments.site/reader036/viewer/2022081602/551a75ca5503463e778b619e/html5/thumbnails/5.jpg)
Scatterplots of Paired Data
Figure 10-2
![Page 6: Vooruitblik 10 en 11 Dinsdag 30 september 2008. Chapter 10 Correlation and Regression 1. Correlation 2. Regression 3. Variation and Prediction Intervals](https://reader036.vdocuments.site/reader036/viewer/2022081602/551a75ca5503463e778b619e/html5/thumbnails/6.jpg)
Formula 10-1

$$
r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n(\sum x^2) - (\sum x)^2}\,\sqrt{n(\sum y^2) - (\sum y)^2}}
$$

The linear correlation coefficient r measures the strength of a linear relationship between the paired values in a sample.
Calculators can compute r.
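As a minimal sketch, Formula 10-1 can be evaluated directly from the five sums it contains. The function name `pearson_r` and the data are hypothetical illustrations, not taken from the textbook.

```python
from math import sqrt

def pearson_r(x, y):
    """Linear correlation coefficient r, computed from the sums in Formula 10-1."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired samples of equal length (n >= 2)")
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator

# Hypothetical paired data (illustration only).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 4))
```

The resulting r would then be compared with the critical value in Table A-6 for the given n.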
![Page 7: Vooruitblik 10 en 11 Dinsdag 30 september 2008. Chapter 10 Correlation and Regression 1. Correlation 2. Regression 3. Variation and Prediction Intervals](https://reader036.vdocuments.site/reader036/viewer/2022081602/551a75ca5503463e778b619e/html5/thumbnails/7.jpg)
Figure 10-3
Hypothesis Test for a Linear Correlation
![Page 8: Vooruitblik 10 en 11 Dinsdag 30 september 2008. Chapter 10 Correlation and Regression 1. Correlation 2. Regression 3. Variation and Prediction Intervals](https://reader036.vdocuments.site/reader036/viewer/2022081602/551a75ca5503463e778b619e/html5/thumbnails/8.jpg)
2. Regression
• Follows on from correlation
• Computing the regression line in the scatterplot: the line that best fits the cloud of points
• Goal: predicting values
![Page 9: Vooruitblik 10 en 11 Dinsdag 30 september 2008. Chapter 10 Correlation and Regression 1. Correlation 2. Regression 3. Variation and Prediction Intervals](https://reader036.vdocuments.site/reader036/viewer/2022081602/551a75ca5503463e778b619e/html5/thumbnails/9.jpg)
Regression
The typical equation of a straight line, y = mx + b, is expressed in the form $\hat{y} = b_0 + b_1 x$, where $b_0$ is the y-intercept and $b_1$ is the slope.
The regression equation expresses a relationship between x (called the independent variable, predictor variable or explanatory variable) and $\hat{y}$ (called the dependent variable or response variable).
![Page 10: Vooruitblik 10 en 11 Dinsdag 30 september 2008. Chapter 10 Correlation and Regression 1. Correlation 2. Regression 3. Variation and Prediction Intervals](https://reader036.vdocuments.site/reader036/viewer/2022081602/551a75ca5503463e778b619e/html5/thumbnails/10.jpg)
Formulas for b0 and b1

Formula 10-2 (slope):

$$
b_1 = \frac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2}
$$

Formula 10-3 (y-intercept):

$$
b_0 = \bar{y} - b_1 \bar{x}
$$

Calculators or computers can compute these values.
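A small sketch of Formulas 10-2 and 10-3, assuming paired lists of equal length. The name `regression_line` and the data are hypothetical and are not the Old Faithful values from Table 10-1.

```python
def regression_line(x, y):
    """Return (b0, b1) for the line y_hat = b0 + b1 * x (Formulas 10-2 and 10-3)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired samples of equal length (n >= 2)")
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # slope
    b0 = sum_y / n - b1 * sum_x / n                                # y-intercept
    return b0, b1

# Hypothetical paired data (illustration only).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = regression_line(x, y)
y_hat_at_6 = b0 + b1 * 6  # predicted value for x = 6
print(round(b0, 3), round(b1, 3), round(y_hat_at_6, 3))
```

Using the regression line for prediction is only sensible when the correlation is significant; otherwise the sample mean of y is the usual best prediction.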
![Page 11: Vooruitblik 10 en 11 Dinsdag 30 september 2008. Chapter 10 Correlation and Regression 1. Correlation 2. Regression 3. Variation and Prediction Intervals](https://reader036.vdocuments.site/reader036/viewer/2022081602/551a75ca5503463e778b619e/html5/thumbnails/11.jpg)
Given the sample data in Table 10-1, find the regression equation.
Example: Old Faithful - cont
![Page 12: Vooruitblik 10 en 11 Dinsdag 30 september 2008. Chapter 10 Correlation and Regression 1. Correlation 2. Regression 3. Variation and Prediction Intervals](https://reader036.vdocuments.site/reader036/viewer/2022081602/551a75ca5503463e778b619e/html5/thumbnails/12.jpg)
Procedure for Predicting
Figure 10-7
![Page 13: Vooruitblik 10 en 11 Dinsdag 30 september 2008. Chapter 10 Correlation and Regression 1. Correlation 2. Regression 3. Variation and Prediction Intervals](https://reader036.vdocuments.site/reader036/viewer/2022081602/551a75ca5503463e778b619e/html5/thumbnails/13.jpg)
3. Variation and Prediction Intervals
• Follows on from the regression line
• (Ch. 7) Confidence interval = interval estimate of population parameters: proportion, mean, variance
• Here: an interval estimate of the predicted value of a variable
![Page 14: Vooruitblik 10 en 11 Dinsdag 30 september 2008. Chapter 10 Correlation and Regression 1. Correlation 2. Regression 3. Variation and Prediction Intervals](https://reader036.vdocuments.site/reader036/viewer/2022081602/551a75ca5503463e778b619e/html5/thumbnails/14.jpg)
Key Concept
In this section we proceed to consider a method for constructing a prediction interval, which is an interval estimate of a predicted value of y.
![Page 15: Vooruitblik 10 en 11 Dinsdag 30 september 2008. Chapter 10 Correlation and Regression 1. Correlation 2. Regression 3. Variation and Prediction Intervals](https://reader036.vdocuments.site/reader036/viewer/2022081602/551a75ca5503463e778b619e/html5/thumbnails/15.jpg)
Prediction Interval for an Individual y

$$
\hat{y} - E < y < \hat{y} + E
$$

where

$$
E = t_{\alpha/2}\, s_e \sqrt{1 + \frac{1}{n} + \frac{n(x_0 - \bar{x})^2}{n(\sum x^2) - (\sum x)^2}}
$$

$x_0$ represents the given value of x
$t_{\alpha/2}$ has n – 2 degrees of freedom
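The interval above can be sketched as follows. This is an illustration under stated assumptions: the critical value of t is passed in by hand (looked up in a t table for n – 2 degrees of freedom), the standard error of estimate is taken as $s_e = \sqrt{\sum (y - \hat{y})^2 / (n - 2)}$, and the function name and data are hypothetical.

```python
from math import sqrt

def prediction_interval(x, y, x0, t_crit):
    """Prediction interval (low, high) for an individual y at x = x0.

    t_crit is the critical value of t with n - 2 degrees of freedom,
    supplied by the caller from a t table.
    """
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sxx = n * sum_x2 - sum_x ** 2
    b1 = (n * sum_xy - sum_x * sum_y) / sxx
    b0 = sum_y / n - b1 * sum_x / n
    y_hat = b0 + b1 * x0
    # Standard error of estimate: spread of the observed y around the line.
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_e = sqrt(sse / (n - 2))
    x_bar = sum_x / n
    margin = t_crit * s_e * sqrt(1 + 1 / n + n * (x0 - x_bar) ** 2 / sxx)
    return y_hat - margin, y_hat + margin

# Hypothetical data; 3.182 is the two-tailed 5% critical t value for 3 degrees of freedom.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
low, high = prediction_interval(x, y, x0=3, t_crit=3.182)
print(round(low, 3), round(high, 3))
```

The margin E grows as x0 moves away from the mean of x, which is why predictions far outside the sampled range are less reliable.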
![Page 16: Vooruitblik 10 en 11 Dinsdag 30 september 2008. Chapter 10 Correlation and Regression 1. Correlation 2. Regression 3. Variation and Prediction Intervals](https://reader036.vdocuments.site/reader036/viewer/2022081602/551a75ca5503463e778b619e/html5/thumbnails/16.jpg)
Standard Error of Estimate
Definition
The standard error of estimate, denoted by $s_e$, is a measure of the differences (or distances) between the observed sample y-values and the predicted values $\hat{y}$ that are obtained using the regression equation.
![Page 17: Vooruitblik 10 en 11 Dinsdag 30 september 2008. Chapter 10 Correlation and Regression 1. Correlation 2. Regression 3. Variation and Prediction Intervals](https://reader036.vdocuments.site/reader036/viewer/2022081602/551a75ca5503463e778b619e/html5/thumbnails/17.jpg)
4. Rank Correlation
• Nonparametric method = distribution-free test = no assumptions about the distribution in the population
• Test of association between two variables
• Spearman's: $r_s$ (sample) or, for the population, $\rho_s$
• Procedure in Figure 10-10 (p. 537)
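A minimal sketch of the rank correlation idea: convert both variables to ranks (tied values share the mean of their ranks) and compute the linear correlation of the ranks. The function names and data are hypothetical, and the procedure in Figure 10-10 may differ in detail.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1 (smallest) upward; ties receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rs(x, y):
    """Spearman's rank correlation: Pearson correlation applied to the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)

# Hypothetical data (illustration only).
x = [10, 20, 30, 40, 50]
y = [1, 3, 2, 5, 4]
rs = spearman_rs(x, y)
print(round(rs, 3))
```

Because only the ranks matter, any strictly increasing relationship (linear or not) gives $r_s = 1$.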
![Page 18: Vooruitblik 10 en 11 Dinsdag 30 september 2008. Chapter 10 Correlation and Regression 1. Correlation 2. Regression 3. Variation and Prediction Intervals](https://reader036.vdocuments.site/reader036/viewer/2022081602/551a75ca5503463e778b619e/html5/thumbnails/18.jpg)
Example
![Page 19: Vooruitblik 10 en 11 Dinsdag 30 september 2008. Chapter 10 Correlation and Regression 1. Correlation 2. Regression 3. Variation and Prediction Intervals](https://reader036.vdocuments.site/reader036/viewer/2022081602/551a75ca5503463e778b619e/html5/thumbnails/19.jpg)
Chapter 11: Multinomial Experiments and Contingency Tables
1. Goodness-of-fit: multinomial
2. Contingency tables
3. Analysis of variance (ANOVA)
![Page 20: Vooruitblik 10 en 11 Dinsdag 30 september 2008. Chapter 10 Correlation and Regression 1. Correlation 2. Regression 3. Variation and Prediction Intervals](https://reader036.vdocuments.site/reader036/viewer/2022081602/551a75ca5503463e778b619e/html5/thumbnails/20.jpg)
Overview
We focus on analysis of categorical (qualitative or attribute) data that can be separated into different categories (often called cells).
Use the $\chi^2$ (chi-square) test statistic (Table A-4).
The goodness-of-fit test uses a one-way frequency table (single row or column).
The contingency table uses a two-way frequency table (two or more rows and columns).
![Page 21: Vooruitblik 10 en 11 Dinsdag 30 september 2008. Chapter 10 Correlation and Regression 1. Correlation 2. Regression 3. Variation and Prediction Intervals](https://reader036.vdocuments.site/reader036/viewer/2022081602/551a75ca5503463e778b619e/html5/thumbnails/21.jpg)
1. Goodness-of-fit: multinomial
• Does the observed distribution of a nominal variable agree with an expected distribution?
• H0: p1 = x, p2 = y, p3 = z, p4 = ..., etc.
• H1: At least one of the observed proportions differs from the expected probability.
![Page 22: Vooruitblik 10 en 11 Dinsdag 30 september 2008. Chapter 10 Correlation and Regression 1. Correlation 2. Regression 3. Variation and Prediction Intervals](https://reader036.vdocuments.site/reader036/viewer/2022081602/551a75ca5503463e778b619e/html5/thumbnails/22.jpg)
Goodness-of-Fit Test in Multinomial Experiments
Test Statistic

$$
\chi^2 = \sum \frac{(O - E)^2}{E}
$$

Critical Values
1. Found in Table A-4 using k – 1 degrees of freedom, where k = number of categories.
2. Goodness-of-fit hypothesis tests are always right-tailed.
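The test statistic can be sketched in a few lines. The die-rolling counts below are hypothetical and are not the last-digit data of Table 11-2; the function name `chi_square_gof` is likewise an illustration.

```python
def chi_square_gof(observed, expected_probs):
    """Return (chi2, df) for a goodness-of-fit test.

    observed       -- observed frequency O for each of the k categories
    expected_probs -- probability of each category claimed under H0
    """
    if len(observed) != len(expected_probs):
        raise ValueError("one expected probability is needed per category")
    n = sum(observed)
    expected = [n * p for p in expected_probs]  # E = n * p for each category
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1                      # k - 1 degrees of freedom
    return chi2, df

# Hypothetical example: a die rolled 60 times, H0: all six faces equally likely.
observed = [8, 12, 9, 11, 10, 10]
chi2, df = chi_square_gof(observed, [1 / 6] * 6)
print(round(chi2, 3), df)
```

The test is right-tailed: H0 is rejected only when the statistic exceeds the critical value for k – 1 degrees of freedom, which is 11.070 at the 0.05 level for the five degrees of freedom in this example.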
![Page 23: Vooruitblik 10 en 11 Dinsdag 30 september 2008. Chapter 10 Correlation and Regression 1. Correlation 2. Regression 3. Variation and Prediction Intervals](https://reader036.vdocuments.site/reader036/viewer/2022081602/551a75ca5503463e778b619e/html5/thumbnails/23.jpg)
Example: Last Digit Analysis
Test the claim that the digits in Table 11-2 do not occur with the same frequency.
![Page 24: Vooruitblik 10 en 11 Dinsdag 30 september 2008. Chapter 10 Correlation and Regression 1. Correlation 2. Regression 3. Variation and Prediction Intervals](https://reader036.vdocuments.site/reader036/viewer/2022081602/551a75ca5503463e778b619e/html5/thumbnails/24.jpg)
Relationships Among the $\chi^2$ Test Statistic, P-Value, and Goodness-of-Fit
Figure 11-3
![Page 25: Vooruitblik 10 en 11 Dinsdag 30 september 2008. Chapter 10 Correlation and Regression 1. Correlation 2. Regression 3. Variation and Prediction Intervals](https://reader036.vdocuments.site/reader036/viewer/2022081602/551a75ca5503463e778b619e/html5/thumbnails/25.jpg)
2. Contingency Tables
• In this section we consider contingency tables (or two-way frequency tables), which include frequency counts for categorical data arranged in a table with at least two rows and at least two columns.
• We present a method for testing the claim that the row and column variables are independent of each other.
• We will use the same method for a test of homogeneity, whereby we test the claim that different populations have the same proportion of some characteristics.
![Page 26: Vooruitblik 10 en 11 Dinsdag 30 september 2008. Chapter 10 Correlation and Regression 1. Correlation 2. Regression 3. Variation and Prediction Intervals](https://reader036.vdocuments.site/reader036/viewer/2022081602/551a75ca5503463e778b619e/html5/thumbnails/26.jpg)
Case-Control Study of Motorcycle Drivers

| | Black | White | Yellow/Orange | Row Totals |
|---|---|---|---|---|
| Controls (not injured) | 491 | 377 | 31 | 899 |
| Cases (injured or killed) | 213 | 112 | 8 | 333 |
| Column Totals | 704 | 489 | 39 | 1232 |

$$
E = \frac{(\text{row total})(\text{column total})}{(\text{grand total})}
$$

For the upper left hand cell:

$$
E = \frac{(899)(704)}{1232} = 513.714
$$
![Page 27: Vooruitblik 10 en 11 Dinsdag 30 september 2008. Chapter 10 Correlation and Regression 1. Correlation 2. Regression 3. Variation and Prediction Intervals](https://reader036.vdocuments.site/reader036/viewer/2022081602/551a75ca5503463e778b619e/html5/thumbnails/27.jpg)
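Case-Control Study of Motorcycle Drivers

| | Black | White | Yellow/Orange | Row Totals |
|---|---|---|---|---|
| Controls (not injured) | 491 | 377 | 31 | 899 |
| Expected | 513.714 | 356.827 | 28.459 | |
| Cases (injured or killed) | 213 | 112 | 8 | 333 |
| Expected | 190.286 | 132.173 | 10.541 | |
| Column Totals | 704 | 489 | 39 | 1232 |

$$
\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(491 - 513.714)^2}{513.714} + \cdots + \frac{(8 - 10.541)^2}{10.541}
$$

$$
\chi^2 = 8.775
$$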
![Page 28: Vooruitblik 10 en 11 Dinsdag 30 september 2008. Chapter 10 Correlation and Regression 1. Correlation 2. Regression 3. Variation and Prediction Intervals](https://reader036.vdocuments.site/reader036/viewer/2022081602/551a75ca5503463e778b619e/html5/thumbnails/28.jpg)
H0: Row and column variables are independent.
H1: Row and column variables are dependent.
The test statistic is $\chi^2 = 8.775$.
$\alpha = 0.05$
The number of degrees of freedom is (r–1)(c–1) = (2–1)(3–1) = 2.
The critical value (from Table A-4) is $\chi^2_{0.05,\,2} = 5.991$.
Case-Control Study of Motorcycle Drivers
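The whole calculation for the motorcycle helmet table can be reproduced with a short sketch. The function name `chi_square_independence` is hypothetical; the counts and the critical value 5.991 are the ones given on the slides.

```python
def chi_square_independence(table):
    """Return (chi2, df, expected) for a two-way table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # E = (row total)(column total) / (grand total) for every cell.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)  # (r - 1)(c - 1)
    return chi2, df, expected

# Rows: controls (not injured), cases (injured or killed).
# Columns: black, white, yellow/orange helmets.
observed = [
    [491, 377, 31],
    [213, 112, 8],
]
chi2, df, expected = chi_square_independence(observed)
critical_value = 5.991  # Table A-4, alpha = 0.05, 2 degrees of freedom
reject_h0 = chi2 > critical_value
print(round(chi2, 3), df, reject_h0)
```

The same function would serve for a test of homogeneity, since that test uses the identical statistic and degrees of freedom.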
![Page 29: Vooruitblik 10 en 11 Dinsdag 30 september 2008. Chapter 10 Correlation and Regression 1. Correlation 2. Regression 3. Variation and Prediction Intervals](https://reader036.vdocuments.site/reader036/viewer/2022081602/551a75ca5503463e778b619e/html5/thumbnails/29.jpg)
Because the test statistic (8.775) exceeds the critical value (5.991), we reject the null hypothesis. It appears there is an association between helmet color and motorcycle safety.
Case-Control Study of Motorcycle Drivers
Figure 11-4
![Page 30: Vooruitblik 10 en 11 Dinsdag 30 september 2008. Chapter 10 Correlation and Regression 1. Correlation 2. Regression 3. Variation and Prediction Intervals](https://reader036.vdocuments.site/reader036/viewer/2022081602/551a75ca5503463e778b619e/html5/thumbnails/30.jpg)
3. Analysis of Variance (ANOVA)
• ANalysis Of VAriance
• H0: the means of several populations are equal
• F distribution (Table A-7)
• Test based on the P-value
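As a rough sketch of where the F value comes from in a one-way ANOVA: the test statistic is the ratio of the variance between the sample means to the variance within the samples. The function name and the three groups are hypothetical, and the P-value itself would still come from software or Table A-7.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way analysis of variance."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the individual values around their own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical samples from three populations (illustration only).
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df1, df2 = one_way_anova_f(groups)
print(round(f_stat, 3), df1, df2)
```

A large F (and therefore a small P-value) is evidence against H0 that all population means are equal.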
![Page 31: Vooruitblik 10 en 11 Dinsdag 30 september 2008. Chapter 10 Correlation and Regression 1. Correlation 2. Regression 3. Variation and Prediction Intervals](https://reader036.vdocuments.site/reader036/viewer/2022081602/551a75ca5503463e778b619e/html5/thumbnails/31.jpg)
![Page 32: Vooruitblik 10 en 11 Dinsdag 30 september 2008. Chapter 10 Correlation and Regression 1. Correlation 2. Regression 3. Variation and Prediction Intervals](https://reader036.vdocuments.site/reader036/viewer/2022081602/551a75ca5503463e778b619e/html5/thumbnails/32.jpg)
![Page 33: Vooruitblik 10 en 11 Dinsdag 30 september 2008. Chapter 10 Correlation and Regression 1. Correlation 2. Regression 3. Variation and Prediction Intervals](https://reader036.vdocuments.site/reader036/viewer/2022081602/551a75ca5503463e778b619e/html5/thumbnails/33.jpg)
FINALLY: Bayesian statistics
• Texts and 2 assignments (will be handed out)
• 1. Intuitive approach
• 2. Formal approach
![Page 34: Vooruitblik 10 en 11 Dinsdag 30 september 2008. Chapter 10 Correlation and Regression 1. Correlation 2. Regression 3. Variation and Prediction Intervals](https://reader036.vdocuments.site/reader036/viewer/2022081602/551a75ca5503463e778b619e/html5/thumbnails/34.jpg)
Example problem
• Given: In Orange County (USA), 51% of the population is male; 9.5% of the men smoke cigars, compared with 1.7% of the women
• Question: What is the probability that a randomly selected cigar smoker is a man?
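One possible way to work this problem, shown both with counts in an imaginary population and with Bayes' theorem. The population size of 100,000 is an arbitrary choice, and the 49% share of women is an assumption (the complement of the 51% given above).

```python
# Figures given in the problem.
p_male = 0.51
p_cigar_given_male = 0.095
p_cigar_given_female = 0.017
p_female = 1 - p_male  # assumption: everyone is counted as either male or female

# Counting approach: imagine 100,000 residents and count the cigar smokers.
population = 100_000
male_smokers = population * p_male * p_cigar_given_male        # about 4,845
female_smokers = population * p_female * p_cigar_given_female  # about 833
share_by_counting = male_smokers / (male_smokers + female_smokers)

# Formula approach: Bayes' theorem, P(M | C) = P(M) P(C | M) / P(C).
p_cigar = p_male * p_cigar_given_male + p_female * p_cigar_given_female
p_male_given_cigar = p_male * p_cigar_given_male / p_cigar

print(round(share_by_counting, 3), round(p_male_given_cigar, 3))
```

Both routes give a probability of about 0.853, far above the 51% prior share of men, because men are much more likely than women to smoke cigars.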
![Page 35: Vooruitblik 10 en 11 Dinsdag 30 september 2008. Chapter 10 Correlation and Regression 1. Correlation 2. Regression 3. Variation and Prediction Intervals](https://reader036.vdocuments.site/reader036/viewer/2022081602/551a75ca5503463e778b619e/html5/thumbnails/35.jpg)
1. Intuitive approach
![Page 36: Vooruitblik 10 en 11 Dinsdag 30 september 2008. Chapter 10 Correlation and Regression 1. Correlation 2. Regression 3. Variation and Prediction Intervals](https://reader036.vdocuments.site/reader036/viewer/2022081602/551a75ca5503463e778b619e/html5/thumbnails/36.jpg)
2. Formal approach
![Page 37: Vooruitblik 10 en 11 Dinsdag 30 september 2008. Chapter 10 Correlation and Regression 1. Correlation 2. Regression 3. Variation and Prediction Intervals](https://reader036.vdocuments.site/reader036/viewer/2022081602/551a75ca5503463e778b619e/html5/thumbnails/37.jpg)
End of preview
• Next week (week 6): – Question hour – No new material – Preparation for the practice exam