Week One
• Slides posted; new work item posted; videos for STA 6236 online on YouTube
• Pep talk for regression, especially in light of predictive analytics
• Reality check (informative quiz) reveals that many of you need to review reference distributions (normal, t(n), F(ν1, ν2), χ²(n)), matrix analysis, and likely hypothesis testing ("p-value low means null must go") and confidence/prediction intervals. Take it upon yourself to proceed accordingly.
1
Regression Models
Supply answers to the question: ‘What is the relationship between the variables?’
Answers take the form of equations involving:
• A numerical response (dependent) variable
• One or more numerical or categorical independent (explanatory) variables
Emphasis on prediction & estimation rather than inference (hypothesis testing)
2
Another Reason: Demonstrate No Relationship!
3
Brief Recap: Pep Talk for Regression
• Syllabus/Text/JMP Pro 11 tied together
• Regression analysis used extensively by statisticians as well as non-statisticians
• Exploratory mode (Y versus X or X's)
• Estimation
• Prediction
• Understanding
4
Regression to help with the placement of incisions for robotic surgery. Practice 6-7 times and good to go. Help less-experienced surgeons benefit from the experts: take results from many successful surgeries and save these measurements; then, given a patient's height, weight, and other factors, produce the distance from the ASIS, TIC to the camera incision.
5
Getting familiar with JMP platforms (or their equivalents)
• Get ahold of JMP Pro 11 (various options)
• Take tutorials, practice loading data sets
– Fit Y by X
– Fit Model
– Analyze distribution
– Formula
– Table manipulations
– Note linkages
• Read ahead in text as noted in FAQ for STA 6236
• Review matrix analysis if rusty
• One demo on YouTube using SLOPE.jmp file
6
Data sets: ALSM Ch1ta1
7
Practice with "Genuine" Fake Made-up Data
• Consider a model of the form y = 10 + 5x + error, where x: 1, 2, 3, 4, 5, replicated
• Check how you did
• Fit it (true, with error)
• Play with parameters, assumptions
• Try it with 1000 replicated data sets
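A minimal sketch of this exercise in Python, using a hand-rolled least squares fit. The five replicates per x value and the N(0, 1) errors are my assumptions; adjust to match the video.

```python
import random

random.seed(1)

# True model: y = 10 + 5x + error, x = 1..5, replicated (here: 5 replicates each)
xs = [x for x in (1, 2, 3, 4, 5) for _ in range(5)]
ys = [10 + 5 * x + random.gauss(0, 1) for x in xs]

# Least squares fit "by hand"
n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
sxx = sum((x - xbar) ** 2 for x in xs)
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
b1 = sxy / sxx            # estimated slope (true value: 5)
b0 = ybar - b1 * xbar     # estimated intercept (true value: 10)

print(b0, b1)
```

Wrapping the generation and fit in a loop (e.g., 1000 repetitions) shows how b0 and b1 bounce around their true values from sample to sample.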
8
YouTube Video #2, Part 2: Practice with "Genuine" Fake Made-up Data
• Consider a model of the form y = 10 + 5x + error, where x: 1, 2, 3, 4, 5, replicated
• Check how you did
• Fit it (true, with error)
• 15-minute limit!
• Need to create the groups and then fiddle with assumptions
• Play with parameters, assumptions
• Try it with 1000 replicated data sets
9
Homework assignment
• Simulated data problem; do this one as well
10
Zillow 2011 data
• Fit line
• Fit orthogonal
• Fit X on Y
• Fit Y on X; Fit Model
11
12
Fit Y by X
13
Functional v. Statistical Relations
Functional Relation: One Y-value for each X-value
14
Statistical Relation
Example 1: Y = Year-end employee evaluation X = Mid-year evaluation
15
Curvilinear Statistical Relation
Example 2: x = Age y = Steroid level in blood
Regression Objectives: Characterize the statistical Relation and/or predict new values
16
Statistical Relations
• Have distribution of Y-values for each X • The mean changes systematically with X
17
Simple Linear Regression Model: The Assumptions
18
Notes on Simple Linear Regression Model (Regression “lite”)
1. Model is simple because there is only one predictor (X)
2. Model is linear because the parameters enter linearly
3. Since X = X¹ (and X², X³, etc. are not present), the model is first-order
19
Homework problems. (Best to run many data sets. Examine output. Interpret. Where does it come from?)
• What I have done in the past: do as many problems as you wish. The student solution manual is floating around out in the cloud, evidently easy to find. Do as you wish, to get comfortable with the material.
• Not to be graded, but I may go through them in an extended class period following the usual lecture-format class (to be determined). You should be comfortable doing these types of problems. For a quiz situation, you should also be comfortable extracting relevant material from JMP output.
• 1.5, 1.6 (draw plot by hand), 1.7, 1.10, 1.11, 1.13, 1.16, 1.18, 1.19 (needs software and the data disc that comes with the book), 1.20 through 1.28, 1.29, 1.32, 1.33, 1.34, 1.35, 1.36, 1.39, 1.43, 1.44, 1.45, 1.46
20
Features of Model
1. εi is a random variable, so Yi is also a random variable
2. Mean of Yi is the regression function: E{Yi} = β0 + β1 Xi
3. εi is the vertical deviation of Yi from the mean at Xi
21
Features of Model (continued)
4. Variance is constant: Var(Yi) = Var(β0 + β1 Xi + εi) = Var(εi) = σ²
5. Yi is uncorrelated with Yj for i ≠ j
6. In summary, regression model (1.1) implies that responses Yi come from probability distributions whose means are E(Yi) = β0 + β1 Xi and whose variances are σ², the same for all levels of X. Any two responses Yi and Yj are uncorrelated.
22
Illustration of Simple Linear Regression
Error terms NOT assumed to be normally distributed—no distributional assumptions made other than on moments.
23
Section 1.4
• Observational Data
– Lung cancer impacts from smoking/smoking cessation; suggests causation but not provable
• Experimental Data
– Feasible to control assignment of subjects to treatments; STA 5205 designed experiment
• Completely Randomized Design
– Free of "bias"…possibly not efficient
24
25
Least Squares Estimators: Properties
1. Linear: We will show they are each linear combinations of the Yi's
2. Unbiased: E{b0} = β0 and E{b1} = β1
3. Best: Minimum variance (maximum precision) among all linear, unbiased estimators of these parameters
4. Estimators
Gauss-Markov Theorem: If the assumptions hold, the LS estimators are "BLUE": Best Linear Unbiased Estimators.
26
27
Relationship to Features of Model
• Y = f(X) does not appear linear
• εi and εj uncorrelated?
• Presumption of declining values of y
• Large drop in y suggests long duration until next y observed
• Is X fixed?
• Other than that, … no problem!
28
Alternative Version of Model: Use Centered Predictor(s)
Yi = β0* + β1 (Xi − X̄) + εi
where: β0* = β0 + β1 X̄
Same slope, different intercept!
29
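The "same slope, different intercept" claim is easy to verify numerically. A sketch with made-up data (the numbers below are arbitrary, chosen only for illustration):

```python
def ls_fit(xs, ys):
    """Least squares intercept and slope, computed from first principles."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1  # (intercept, slope)

xs = [1, 2, 3, 4, 5]
ys = [12.1, 19.8, 25.3, 29.9, 36.0]   # arbitrary illustrative data
xbar = sum(xs) / len(xs)

b0, b1 = ls_fit(xs, ys)                          # original predictor
b0c, b1c = ls_fit([x - xbar for x in xs], ys)    # centered predictor

# Same slope; the centered intercept equals b0 + b1*xbar (which is Ybar)
print(b1, b1c, b0c)
```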
Estimating the Regression Function
Example: Persistence Study. Each of 3 subjects given a difficult task. Yi is the number of attempts before quitting.

Subject i:              1    2    3
Age Xi:                20   55   30
Number of attempts Yi:  5   12   10
30
Estimating the Regression Function
Scatter Plot: [Attempts (Y, 0–15) versus Age (X, 20–55) for the three subjects]
Hypothesis: E{Y} = β0 + β1 X How do we estimate β0 and β1?
31
Criteria for choice of β0 and β1
• Sum of perpendicular distances ┴
• Sum of vertical distances (absolute values) ↕
• Sum of squared vertical distances (↕)²
• Sum of horizontal distances ↔
32
Least Squares Criterion
Find the values of β0 and β1 that minimize the least squares objective function Q, given the sample (X1, Y1), …, (Xn, Yn):
Q(β0, β1) = Σ [Yi − (β0 + β1 Xi)]²
Call those minimizing values b0 and b1.
33
Persistence Study
Who wins? Which fit is better?
34
How do we find b0 and b1? Calculus:
1. Take partial derivatives of Q with respect to β0 and β1, and set them equal to zero
2. Get two equations (the normal equations) in two unknowns; solve:
ΣYi = n b0 + b1 ΣXi
ΣXiYi = b0 ΣXi + b1 ΣXi²
Denote the solutions by b0 and b1:
b1 = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)²
b0 = Ȳ − b1 X̄
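Applying these formulas to the persistence data from the earlier slide (ages 20, 55, 30; attempts 5, 12, 10) gives, as a quick Python check:

```python
xs = [20, 55, 30]   # Age Xi
ys = [5, 12, 10]    # Number of attempts Yi

n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n   # 35, 9
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))  # 115
sxx = sum((x - xbar) ** 2 for x in xs)                      # 650
b1 = sxy / sxx          # 115/650 ≈ 0.1769 attempts per year of age
b0 = ybar - b1 * xbar   # ≈ 2.8077

print(round(b0, 4), round(b1, 4))  # → 2.8077 0.1769
```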
35
36
37
Least Squares Estimators: Properties
1. Best: Minimum variance (maximum precision)
2. Linear: We will show they are each linear combinations of the Yi's
3. Unbiased: E{b0} = β0 and E{b1} = β1
4. Estimators
Gauss-Markov Theorem: If the assumptions hold, the LS estimators are "BLUE": Best Linear Unbiased Estimators.
38
Example 1: Toluca Company Data
39
Toluca Company Fit
Bivariate Fit of Work Hours by Lot Size
40
JMP Pro 11 Output
Output from Fit Y by X Platform
41
Estimating the mean response at X
Regression Function: E{Y} = β0 + β1 X
Estimator: Ŷ = b0 + b1 X
Using centered-X model: Ŷ = Ȳ + b1 (X − X̄)
42
Residuals!
Estimated residuals are key to assessing fit and validity of assumptions
True residuals (always unknown!): εi = Yi − (β0 + β1 Xi)
Estimated residuals: ei = Yi − (b0 + b1 Xi) = Yi − Ŷi
43
More Properties related to b0 and b1
1. The residuals sum to zero: Σei = 0
2. The residuals are orthogonal to the predictor: ΣXi ei = 0
3. The observed and fitted values have the same sum: ΣYi = ΣŶi
4. The residuals are orthogonal to the fitted values: ΣŶi ei = 0
5. The regression line passes through the point (X̄, Ȳ)
44
Estimating the variance, σ²
Single population (no X's for now, just Y's):
s² = Σ(Yi − Ȳ)² / (n − 1)
Degrees of freedom is n − 1 here because the mean was estimated using one statistic, namely Ȳ. For regression, the mean is estimated by Ŷ, which uses two statistics, b0 and b1, so the degrees of freedom is n − 2:
s² = MSE = SSE / (n − 2)
45
Toluca data, ch01ta1
• Easy to load via best guess (then delete a column, probably)
• Can copy and paste from Excel as well
46
s2 is the “mean square for error” in Analysis of Variance (MSE = SSE/DF)
47
SSE = 54,825.46; MSE = SSE/(n − 2) = 54,825.46/23 ≈ 2384
√2384 = 48.82 is the estimate of the standard deviation, σ
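These numbers tie together as follows (n = 25 runs for the Toluca data, so the error degrees of freedom are 25 − 2 = 23). A quick arithmetic check:

```python
import math

sse = 54825.46
n = 25                  # Toluca data: 25 production runs
mse = sse / (n - 2)     # mean square for error
s = math.sqrt(mse)      # estimate of sigma

print(round(mse), round(s, 2))  # → 2384 48.82
```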
More Properties related to b0 and b1
1. The residuals sum to zero: Σei = 0
2. The residuals are orthogonal to the predictor: ΣXi ei = 0
3. The observed and fitted values have the same sum: ΣYi = ΣŶi
4. The residuals are orthogonal to the fitted values: ΣŶi ei = 0
5. The regression line passes through the point (X̄, Ȳ)
48
49
50
51
52
53
August 27, 2014
• No class on Labor Day…in case you forgot
• Next class after tonight is Sept. 3
54
Study Guide
• Understand terminology
• Assumptions for simple linear regression
• Least squares criterion; normal equations
• Derive estimators b0, b1 for β0, β1, resp.
• Sense in which the LS estimators are BLUE
• Be able to extract relevant numbers from JMP Pro 11 output (e.g., estimates; fitted model)
• Properties of b0, b1
55
Study Guide (continued)
• Assumptions for normal error regression
• LS versus MLE estimators
• Yi is normal; distribution of a linear combination of these Yi's
• Properties of the ki's
• Distribution, mean and variance of b1
• Difference between confidence interval on the regression function and prediction interval for future observations at xh
• SSTO = SSR + SSE and why we care about ANOVA
• General linear test; full, reduced model
• Definition and interpretation of R²
56
Normal Error Regression Model
Add to the "assumptions" one more item: the εi are independent N(0, σ²).
Notes:
1. N(0, σ²) means normally distributed with mean zero and variance σ².
2. "Uncorrelated" implies independence for normal errors.
3. Normality is a strong assumption; it might not be true!
57
One "Rationale" for Normality
Suppose the true model involves 21 weak predictors: X and Z1, …, Z20, so that:
Yi = β0 + β1 Xi + β2 Z1,i + β3 Z2,i + … + β21 Z20,i + εi*
But we use:
Yi = β0 + β1 Xi + εi
so that
εi = β2 Z1,i + β3 Z2,i + … + β21 Z20,i + εi*
The Central Limit Theorem suggests normality of εi. (It is not implausible that the error terms are normal.)
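A small simulation illustrates the point: the sum of many weak, independent contributions looks roughly normal even when each piece is far from normal. The 20 terms and the uniform(−1, 1) pieces below are my choices for illustration, not from the slide.

```python
import random

random.seed(0)

def lumped_error():
    """Sum of 20 weak, independent, non-normal (uniform) contributions."""
    return sum(random.uniform(-1, 1) for _ in range(20))

draws = [lumped_error() for _ in range(10000)]
mean = sum(draws) / len(draws)
sd = (sum((d - mean) ** 2 for d in draws) / len(draws)) ** 0.5

# For a normal distribution, about 68.3% of draws fall within one sd of the mean
frac = sum(abs(d - mean) < sd for d in draws) / len(draws)
print(round(frac, 3))
```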
58
Maximum Likelihood Estimation
Rationale: Use as estimates those values of the parameters that maximize the likelihood of the observed data
Case 1: Single sample; estimate µ. Assume σ2 = 100. Data: n = 3; Y1 = 250, Y2=265, Y3=259. Which is more likely, µ = 230 or µ = 259?
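The comparison can be made concrete by computing the log-likelihood of the three observations under each candidate mean (a sketch using the standard normal density with the stated σ² = 100):

```python
import math

data = [250, 265, 259]
var = 100.0  # assumed known: sigma^2 = 100

def log_likelihood(mu):
    """Log-likelihood of the sample under N(mu, 100)."""
    return sum(-0.5 * math.log(2 * math.pi * var) - (y - mu) ** 2 / (2 * var)
               for y in data)

print(log_likelihood(230), log_likelihood(259))
# mu = 259 makes the observed data far more likely than mu = 230
```

In fact the log-likelihood is maximized at the sample mean, Ȳ = 258, which is the MLE of µ here.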
59
Maximum Likelihood Estimators
The values of β0, β1, and σ² that maximize the likelihood function, namely β̂0, β̂1, and σ̂², are called the maximum likelihood estimators. Some results:
• β̂0 = b0 and β̂1 = b1 (identical to the least squares estimators)
• σ̂² = Σ(Yi − Ŷi)²/n = SSE/n (biased; the unbiased MSE divides by n − 2 instead)
60
Some KEY Points from Appendix 1
Note: Review Appendix 1 with special attention to:
A.1: Summation and product notation
A.3: Random variables
A.4: Normal and related distributions
A.6: Inferences about population mean
A.7: Comparisons of population means
A.8: Inferences about population variance
61
Linear Combinations of Random Variables
Let Y1, …, Yn be random variables, and let a1, …, an be constants. Then:
Z = a1Y1 + … + anYn
is a linear combination of the random variables Y1, …, Yn.
62
Examples of Linear Combinations
1. Example 1: Difference of two random variables
2. Example 2: The sample mean
63
Examples of Linear Combinations
1. Example 1: Difference of two random variables
X − Y
2. Example 2: The sample mean
X̄ = X1/n + X2/n + … + Xn/n
64
Expectation and Variance of Linear Combinations
1. Expectation (A.29a). Let E{Yi} = µi, for i = 1, 2, …, n, and let Z = a1Y1 + … + anYn. Then:
E{Z} = Σ ai µi
2. Variance (A.31): In addition to the above, assume that the {Yi} are mutually independent and σ²{Yi} = σi², for i = 1, 2, …, n. Then:
σ²{Z} = a1²σ1² + … + an²σn²
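These expectation and variance rules can be sanity-checked by simulation. A sketch with two independent normals; the coefficients and moments below are made up for illustration:

```python
import random

random.seed(42)

# Z = 2*Y1 - 3*Y2 with Y1 ~ N(1, 4), Y2 ~ N(2, 9), independent
# Theory: E{Z} = 2*1 - 3*2 = -4,  Var{Z} = 2^2*4 + (-3)^2*9 = 97
N = 200_000
zs = [2 * random.gauss(1, 2) - 3 * random.gauss(2, 3) for _ in range(N)]

mean = sum(zs) / N
var = sum((z - mean) ** 2 for z in zs) / (N - 1)
print(round(mean, 2), round(var, 1))  # close to -4 and 97
```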
65
Examples of Linear Combinations
1. Example 1: Difference of two independent random variables
E(X − Y) = E(X) − E(Y)
Var(X − Y) = Var(X) + Var(Y)
2. Example 2: The sample mean
E(X̄) = (1/n)[E(X1) + … + E(Xn)] = (1/n)(nµ) = µ
Var(X̄) = (1/n)²[σ² + … + σ²] = σ²/n
66
Expectation and Variance of Linear Combinations: Examples
Example 4: Let Z = Ȳ. Find E{Z} and σ²{Z}.
67
68
t Distribution Examples
Example 5: Find the t statistic corresponding to the sample average in a sample of size n. Assume E{Yi} = µ0, for i = 1,2, . . ., n
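For reference, the standard answer (a textbook result, stated here since the slide's equation did not survive extraction):

```latex
t \;=\; \frac{\bar{Y} - \mu_0}{s/\sqrt{n}} \;\sim\; t(n-1),
\qquad
s^2 \;=\; \frac{1}{n-1}\sum_{i=1}^{n}\left(Y_i - \bar{Y}\right)^2 .
```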
69
Linear Combinations of Independent Normal RVs (A.40)
70
Chapter 2
71
72