Linear Regression: A method of calculating a linear equation for the relationship between two or more variables using multiple data points.

Upload: magdalene-gardner

Post on 21-Dec-2015


TRANSCRIPT

Page 1: Linear Regression A method of calculating a linear equation for the relationship between two or more variables using multiple data points

Linear Regression

A method of calculating a linear equation for the relationship between two or more variables using multiple data points.

Page 2

Linear Regression

General form: Y = a + bX + e

“Y” is the dependent variable

“X” is the explanatory variable

“a” is the intercept parameter

“b” is the slope parameter

“e” is an error term or residual

Page 3

Regression Results

Y and X come from data

A computer program calculates estimates of a and b

e is the difference between the actual value of Y corresponding to X and a + bX, that is, e = Y - (a + bX)

OLS estimates of a and b minimize the sum of the squared residuals, ∑e^2

“OLS” is Ordinary Least Squares
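As a minimal sketch of what the computer program does for the one-variable case, the closed-form OLS estimates can be written in a few lines of Python. The data values and the function names `ols_fit` and `sum_squared_residuals` are made up for illustration; they are not from the slides.

```python
def ols_fit(x, y):
    """Return (a, b) that minimize the sum of squared residuals for Y = a + bX."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope: co-variation of X and Y divided by the variation of X.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    a = mean_y - b * mean_x
    return a, b


def sum_squared_residuals(x, y, a, b):
    """Sum of e^2, where e = Y - (a + bX)."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


# Made-up data points, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = ols_fit(x, y)
sse = sum_squared_residuals(x, y, a, b)
print(a, b, sse)
```

Moving a or b away from the fitted values in either direction increases the sum of squared residuals, which is what "least squares" refers to.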

Page 4

The Regression Line and the Residual

Page 5

Electricity Demand Example

Data for residential customers in U.S. states

Y is millions of kilowatt-hours sold

X is population

Other data include per capita income, price of electricity (cents/kWh), and price of natural gas

Page 6

Variable Means 2004

Statistic            Milkwh        Pop           Pkwh          Pgas          Income
Mean                 25364.4902    5756576.549   8.997647059   11.39823529   32344.62745
Standard Deviation   25605.09342   6502776.218   2.411107288   3.107310545   5307.123526
Minimum              1834          506529        6.13          4.88          24379
Maximum              120330        35893799      18.13         27.15         52101
Count                51            51            51            51            51

Page 7

Variable Means 2012

Statistic       Milkwh      Pop          Pkwh     Pgas      Income
Mean            26,951      6,198,605    10.35    12.14     42,492
Standard Dev.   27,327.10   7,036,162    4.1633   6.40641   7,605.86
Minimum         2,003       582,658      6.9      7.43      33,073
Maximum         137,412     38,332,521   34       52.86     74,710
Observations    51          51           51       51        51

Page 8

Excel Regression: Milkwh

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.868539405
R Square            0.754360698
Adjusted R Square   0.749347651
Standard Error      12819.23929
Observations        51

ANOVA
             df   SS            MS         F          Significance F
Regression   1    24728728564   2.47E+10   150.4795   1.49762E-16
Residual     49   8052311901    1.64E+08
Total        50   32781040465

            Coefficients   Standard Error   t Stat     P-value
Intercept   5677.407437    2407.873624      2.357851   0.022417
Pop         0.003419929    0.000278791      12.26701   1.5E-16
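The summary statistics in this Excel output are tied together by standard formulas, and they can be recomputed from the sums of squares, degrees of freedom, coefficients, and standard errors. The short Python check below uses only numbers copied from the output above; the variable names are illustrative.

```python
# Values copied from the ANOVA and coefficient tables above.
ss_regression = 24728728564.0
ss_residual = 8052311901.0
ss_total = 32781040465.0
df_regression = 1
df_residual = 49
observations = 51

# R Square: share of total variation explained by the regression.
r_square = ss_regression / ss_total

# Adjusted R Square penalizes additional explanatory variables.
adjusted_r_square = 1 - (1 - r_square) * (observations - 1) / df_residual

# Standard Error of the regression: square root of the residual mean square.
standard_error = (ss_residual / df_residual) ** 0.5

# F: regression mean square divided by residual mean square.
f_stat = (ss_regression / df_regression) / (ss_residual / df_residual)

# t Stat: coefficient divided by its standard error.
t_intercept = 5677.407437 / 2407.873624
t_pop = 0.003419929 / 0.000278791

print(r_square, adjusted_r_square, standard_error, f_stat, t_intercept, t_pop)
```

Each result agrees with the corresponding entry in the Excel output up to rounding of the displayed inputs.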

Page 9

Actual vs. Predicted Milkwh

Page 10

Useful Statistics and Tests

t-statistic: is the estimated coefficient significantly different from zero?

Coefficient of determination, or R-square: % of variation explained

F-statistic: statistical significance of the entire regression equation; or, is the R-square significantly different from zero?

Find these on the Excel regression output.

Page 11

Confidence and Significance Levels

99% Confidence = 1% Significance: P-value of 0.01 or less

95% Confidence = 5% Significance: P-value of 0.05 or less

90% Confidence = 10% Significance: P-value of 0.10 or less

Smaller Significance Levels Are Better

Find P-values for t and F statistics on Excel regression output
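The P-value thresholds above can be applied mechanically. The sketch below uses a hypothetical helper named `significance_level` (not part of Excel or the slides) that returns the smallest of the three conventional levels a P-value meets, and applies it to the two P-values from the simple Milkwh regression output.

```python
def significance_level(p_value):
    """Return the smallest conventional level (0.01, 0.05, 0.10) met, else None."""
    for level in (0.01, 0.05, 0.10):
        if p_value <= level:
            return level
    return None  # not significant even at the 10% level


# P-values copied from the simple regression of Milkwh on Pop.
p_values = {"Intercept": 0.022417, "Pop": 1.5e-16}

levels = {name: significance_level(p) for name, p in p_values.items()}
for name, level in levels.items():
    if level is None:
        print(name, "is not significant at the 10% level")
    else:
        print(name, "is significant at the", format(level, ".0%"), "level")
```

Here the intercept meets the 5% level but not the 1% level, while Pop meets the 1% level.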

Page 12

Multiple & Nonlinear Regression

Multiple Regression

Y = a + bX + cW + dZ

Nonlinear Regression

Quadratic: Y = a + bX + cX^2

Log-Linear: Y = aX^b Z^c

Or ln Y = (ln a) + b(ln X) + c(ln Z)
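Taking logs turns the multiplicative form into an equation that is linear in ln X and ln Z, so it can be estimated by ordinary least squares. The sketch below illustrates this on made-up data generated exactly from Y = aX^b Z^c with assumed values a = 2, b = 1.5, c = -0.5; the helper functions `solve_linear_system` and `ols` are hypothetical names for a plain normal-equations solver, not anything from the slides or from Excel.

```python
import math


def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def ols(rows, y):
    """OLS coefficients [constant, slope1, slope2, ...] via the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Made-up data generated exactly from Y = a * X^b * Z^c (no error term).
a_true, b_true, c_true = 2.0, 1.5, -0.5
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
zs = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
ys = [a_true * x ** b_true * z ** c_true for x, z in zip(xs, zs)]

# Regress ln Y on ln X and ln Z.
log_rows = [(math.log(x), math.log(z)) for x, z in zip(xs, zs)]
log_y = [math.log(y) for y in ys]
ln_a, b_hat, c_hat = ols(log_rows, log_y)
a_hat = math.exp(ln_a)  # the regression constant is ln a, so exponentiate

print(a_hat, b_hat, c_hat)
```

Because the data contain no error term, the fitted values recover a, b, and c up to floating-point precision; with real data the estimates would differ from any "true" values.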

Page 13

Multiple Regression: Milkwh

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.886907555
R Square            0.786605011
Adjusted R Square   0.777713553
Standard Error      12072.10091
Observations        51

ANOVA
             df   SS            MS         F         Significance F
Regression   2    25785730685   1.29E+10   88.4675   7.95099E-17
Residual     48   6995309780    1.46E+08
Total        50   32781040465

            Coefficients   Standard Error   t Stat     P-value
Intercept   22354.84026    6594.71082       3.389814   0.001407
Pop         0.003568436    0.000268271      13.30163   1.02E-17
Pkwh        -1948.545245   723.5281324      -2.69312   0.009721

Page 14

Quadratic Regression: Milkwh

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.955685683
R Square            0.913335125
Adjusted R Square   0.89682753
Standard Error      8224.476787
Observations        51

ANOVA
             df   SS            MS         F          Significance F
Regression   8    29940075691   3.74E+09   55.32818   7.73456E-20
Residual     42   2840964773    67642018
Total        50   32781040465

            Coefficients   Standard Error   t Stat     P-value
Intercept   -22436.17995   40605.45259      -0.55254   0.583507
Pop         0.005502909    0.000520656      10.56919   2.09E-13
Popsq       -6.71958E-11   1.69781E-11      -3.9578    0.000286
Pkwh        14663.0069     4863.649065      3.014816   0.004349
Pkwhsq      -850.7008024   242.651212       -3.50586   0.001097
PGas        -2752.477572   2070.841636      -1.32916   0.190971
PGsq        196.1586297    77.84468361      2.519872   0.015629
Income      -1.417855478   2.096191368      -0.6764    0.502497
Incsq       1.19331E-05    2.84886E-05      0.418872   0.677444

Page 15

Log Linear Regression: LnMilkwh

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.9883461
R Square            0.976828014
Adjusted R Square   0.974813058
Standard Error      0.175309427
Observations        51

ANOVA
             df   SS            MS         F          Significance F
Regression   4    59.59683702   14.89921   484.7889   5.81585E-37
Residual     46   1.413736172   0.030733
Total        50   61.01057319

            Coefficients   Standard Error   t Stat     P-value
Intercept   0.033793671    1.792943436      0.018848   0.985044
LnPop       1.01392625     0.023922385      42.384     1.63E-38
LnPkwh      -0.9191971     0.123573131      -7.43849   2.01E-09
LnPgas      0.416254503    0.113418083      3.670089   0.000629
LnInc       -0.45117695    0.179298162      -2.51635   0.015409
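Because the dependent variable here is the log of Milkwh, a prediction in original units requires exponentiating the fitted value, which is the same as using the multiplicative form Y = aX^b Z^c with a = exp(intercept). The sketch below shows both routes using the coefficients from the output above. The input values are the 2004 means from the earlier slide, chosen only as an illustrative point; the slides do not state which year's data this regression uses, and the function names are hypothetical.

```python
import math

# Coefficients copied from the log-linear regression output above.
coef = {
    "Intercept": 0.033793671,
    "LnPop": 1.01392625,
    "LnPkwh": -0.9191971,
    "LnPgas": 0.416254503,
    "LnInc": -0.45117695,
}


def predict_via_logs(pop, pkwh, pgas, income):
    """Compute fitted ln(Milkwh), then exponentiate to get Milkwh."""
    ln_y = (coef["Intercept"]
            + coef["LnPop"] * math.log(pop)
            + coef["LnPkwh"] * math.log(pkwh)
            + coef["LnPgas"] * math.log(pgas)
            + coef["LnInc"] * math.log(income))
    return math.exp(ln_y)


def predict_multiplicative(pop, pkwh, pgas, income):
    """Same prediction written as a * Pop^b * Pkwh^c * Pgas^d * Income^f."""
    a = math.exp(coef["Intercept"])
    return (a * pop ** coef["LnPop"] * pkwh ** coef["LnPkwh"]
            * pgas ** coef["LnPgas"] * income ** coef["LnInc"])


# Illustrative inputs only: the 2004 sample means from the earlier slide.
inputs = (5756576.549, 8.997647059, 11.39823529, 32344.62745)

pred_logs = predict_via_logs(*inputs)
pred_mult = predict_multiplicative(*inputs)
print(pred_logs, pred_mult)
```

The two functions return the same number up to floating-point rounding, since they are algebraically identical.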

Page 16

Demand Regression Project

DATA: ElecDemandData2012.xls under Project Materials on D2L

1. Using the data file above, run a linear regression of

Dependent Variable: Milkwh

Explanatory Variables: Pop, Pkwh, PGas, Income.

Which coefficients (including the constant) are statistically significant at the 10% level or better?

Which are not significant?

How much of the variation in the dependent variable is explained by the estimated equation?

Is the equation as a whole statistically significant? At what level?

Page 17

Finding the Marginal Revenue Equation: Overview

Evaluate estimated demand at means of all explanatory variables except price

Calculate average effect of non-price variables to get demand equation in this form: Q = A - b(P)

Rearrange to find Inverse Demand equation: P = (A/b) - (1/b)Q

MR has twice the slope of inverse demand: MR = (A/b) - (2/b)Q

The end result is an equation, not a number
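The "twice the slope" rule follows from differentiating total revenue. The derivation below is a standard result for linear demand, written out here as a supporting step; TR (total revenue) is a symbol introduced for this derivation and does not appear on the slide.

```latex
\begin{align*}
Q  &= A - bP \\
P  &= \frac{A}{b} - \frac{1}{b}\,Q \\
TR &= P \cdot Q = \frac{A}{b}\,Q - \frac{1}{b}\,Q^{2} \\
MR &= \frac{d\,TR}{dQ} = \frac{A}{b} - \frac{2}{b}\,Q
\end{align*}
```

The intercept A/b is unchanged, and the slope coefficient goes from 1/b to 2/b.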

Page 18

Finding the Marginal Revenue Equation: Example

Write your regression equation in this form

Milkwh = 11,000 - 3600Pkwh + 0.0041Pop + 2150PGas

11,000 is the intercept or constant coefficient

-3600, 0.0041, and 2150 are estimated coefficients

These are made-up numbers for this example

Use the mean values of the non-electricity-price variables

Pop = 5,756,577

PGas = 11.4

Substitute into your regression equation and simplify

Milkwh = 11,000 - 3600Pkwh + 0.0041(5,756,577) + 2150(11.4)

Milkwh = [11,000 + 23,601.97 + 24,510] - 3600(Pkwh)

Milkwh = 59,111.97 - 3600(Pkwh)

This is Q = A - bP from the previous slide

Page 19

Finding Marginal Revenue Example, Continued

Milkwh = 59,111.97 - 3600(Pkwh)

From end of previous slide

Rearrange to find Inverse Demand

P = (A/b) - (1/b)Q = (A - Q)/b

Pkwh = (59,111.97 - Milkwh)/(3600)

Pkwh = 16.42 - 0.00028(Milkwh)

This is the inverse demand equation

Marginal Revenue has twice the slope

MR = 16.42 - 0.00056(Milkwh)
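The arithmetic in this example can be reproduced in a few lines. The sketch below uses the made-up coefficients and mean values from the two example slides; the variable and function names are illustrative.

```python
# Made-up regression coefficients from the example slides.
intercept = 11_000.0
coef_pkwh = -3600.0
coef_pop = 0.0041
coef_pgas = 2150.0

# Mean values of the non-electricity-price variables.
mean_pop = 5_756_577
mean_pgas = 11.4

# Collapse the non-price terms into one constant: Q = A - bP.
A = intercept + coef_pop * mean_pop + coef_pgas * mean_pgas
b = -coef_pkwh

# Inverse demand: P = (A/b) - (1/b)Q.
inverse_intercept = A / b
inverse_slope = 1 / b

# Marginal revenue: same intercept, twice the slope.
mr_slope = 2 / b


def marginal_revenue(milkwh):
    """MR = (A/b) - (2/b)Q, evaluated at a given quantity."""
    return inverse_intercept - mr_slope * milkwh


print(round(A, 2))                  # constant A in Q = A - bP
print(round(inverse_intercept, 2))  # intercept of inverse demand and MR
print(round(inverse_slope, 5))      # slope of inverse demand
print(round(mr_slope, 5))           # slope of MR
```

Rounded as shown, the four printed values match the slide's 59,111.97, 16.42, 0.00028, and 0.00056. The end result is still the equation `marginal_revenue`, which can be evaluated at any quantity.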