L4&5 Multiple Regression 2010B



    Chi-square goodness-of-fit test

The chi-square test can be used to determine whether sample data conform to any kind of expected distribution when the data are categorical (nominal or ordinal). The test determines whether the data fit a given distribution, such as uniform or normal.

χ² = Σ (fo - fe)² / fe,   df = k - 1 - m

Where:

fo = frequency of observed (or actual) values

fe = frequency of expected (or theoretical) values

k = number of categories

m = number of parameters being estimated from the sample data


    Chi-square test for independence

The Chi-square test for independence is based on the counts in a contingency (or cross-tabs) table. It tests whether the counts for the row categories are probabilistically independent of the counts for the column categories.

χ² = Σij (Oij - Eij)² / Eij,   df = (rows - 1)(cols - 1)

Where:

Oij = observed number of observations in cell (i, j)

Eij = expected number of observations in cell (i, j)


Chi-square test - Local survey

• In a national survey, consumers were asked the question: In general, how would you rate the level of service that businesses in this country provide?

• The distribution of responses is in the National column.

• Suppose a manager wants to find out whether this result applies to the customers of her store in the city.

• She ran a similar survey of 207 randomly selected customers in her store and observed the results in the Local column.

• She can use the Chi-square test to see if her observed frequencies of responses match the frequencies that would be expected from the national survey (a short Python sketch follows the table).

Response      National   Local (of 207 asked)
Excellent     8%         21
Pretty good   47%        109
Only fair     34%        62
Poor          11%        15
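The same test can be run outside Excel. Here is a minimal Python sketch (not part of the original slides) using scipy, with the national percentages and the local counts from the table above:

```python
# A sketch of the chi-square goodness-of-fit test for the local survey,
# using scipy (the slides do the same calculation in Excel).
from scipy.stats import chisquare

national_props = [0.08, 0.47, 0.34, 0.11]  # Excellent, Pretty good, Only fair, Poor
local_counts = [21, 109, 62, 15]           # responses of the 207 local customers
n = sum(local_counts)                      # 207

expected = [p * n for p in national_props]  # frequencies expected under H0
stat, p_value = chisquare(f_obs=local_counts, f_exp=expected)  # df = k - 1 = 3

print(f"chi-square = {stat:.3f}, p = {p_value:.4f}")
# A small p-value would say the local responses do not follow the national pattern.
```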


Hypothesis Testing - Local Survey Example

• Example using Excel [embedded Microsoft Excel worksheet]



Steps in Hypothesis Testing

1: State the null and alternative hypotheses.

2: Make a judgment about the population distribution, the level of measurement, and then select the appropriate statistical test.

3: Decide upon the desired level of significance.

4: Collect data from a sample and compute the statistical test to see if the level of significance is met.

5: Accept or reject the null hypothesis.


Contingency Tables

Two-way table

Test whether rows and columns are associated (or independent)

Can calculate expected numbers in each cell if rows and columns are independent, and compare with actual (observed) counts:

χ² = Σ (Oi - Ei)² / Ei


Contingency Tables - Example

Two-way table, e.g. responses to question 6 (a, b, or c) by two groups:

Q6   Group 1   Group 2
a    10        18
b    12        22
c    15        26

The numbers in the table are counts (frequencies) of the number falling into each category


Contingency Tables - Example

Q6   Group 1   Group 2
a    27%       27%
b    32%       33%
c    41%       39%


Contingency Tables - Example

Q6   Group 1   Group 2
a    10        18
b    12        22
c    15        26

Chi-square test statistic = 0.0142

p-value = 0.9929

Not significant
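For reference, a minimal scipy sketch (not in the slides, which use an Excel worksheet) of the same independence test on the Q6 counts:

```python
# Chi-square test of independence on the Q6 contingency table above.
import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([[10, 18],   # response a
                     [12, 22],   # response b
                     [15, 26]])  # response c

stat, p_value, dof, expected = chi2_contingency(observed, correction=False)
print(f"chi-square = {stat:.4f}, df = {dof}, p = {p_value:.4f}")
# Should match the slide: chi-square = 0.0142, df = 2, p = 0.9929 - not significant.
```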


Statistical Decision

For t-test (for mean or proportion):
Null Hypothesis: no-change situation

For Chi-square test:
Null Hypothesis: the two variable sets are independent

Reject the Null Hypothesis when:

• test value t > t critical value (usually about 2)

• p-value < alpha (usually 0.05)

• test value Chi-square > Chi-square critical value


Type I and Type II errors

Two ways a hypothesis test result can be wrong:

I - finding the hypothesis is wrong when it is correct

II - finding the hypothesis is correct when it is wrong


Type I and Type II errors

                          REALITY
TEST FINDS                Hypothesis correct            Hypothesis wrong
Hypothesis correct        (correct)                     type II error
Hypothesis wrong          type I error                  (correct)
                          (test significance level)


Type I and Type II errors

Prob value = observed probability of a type I error

In control charts, control limits are often set at 3 standard deviations, equivalent to setting the probability of a type I error at 0.003

- minimizes reacting when we don't need to

Using t = 2 is equivalent to setting the probability of a type I error at 0.05


BUSM 4074 Management Decision Making

Prof. Clive Morley
Graduate School of Business


BUSM 4074 Management Decision Making

4. Multiple regression

5. Multiple regression (cont.)


Unit 4&5 - Learning Objectives

• To understand the use of the multiple regression technique, including linear, log-log, logit, autoregressive and time series models

• To be able to carry out straightforward multiple regression model estimation

• To be able to interpret standard computer output from a multiple regression exercise, including to assess variables for significance, estimate the size of an explanatory variable's impact on the dependent variable, assess model fit, and use the model to estimate values of the dependent variable


As my salary increases, computers are getting cheaper;

therefore, to get cheaper computers, pay me more.

What is wrong with this (very attractive) argument?


Multiple regression

A very powerful, widely used statistical technique, with many applications in all sorts of areas

Used to estimate the relationship between variables

For example, Y might be the sales of a certain item and X the price of it. The linear relationship is estimated:

Y = a + bX


    Multiple regression

Parameters a and b are estimated from data on the variables X and Y

Correlation establishes whether a linear relationship exists and how strong it is

Regression estimates what the relationship is


    Multiple regression

The model is readily extended to include other explanatory variables: for example, sales (Y) might depend on price (X1), buyers' incomes (X2) and advertising expenditure (X3), giving the equation to be estimated

Y = a + b1X1 + b2X2 + b3X3

Data on a number of cases (e.g. various sales areas or different times) for all the variables is needed


    Multiple regression

The explanatory variables do not exactly predict the value of Y

- due to random effects

- the impacts of other (hopefully minor) variables, etc.

so the equation does not exactly fit: residuals


    Purposes of multiple regression

• to estimate the equation, so we can predict Y for given values of the explanatory variables, or

• to estimate the effects of variables on Y (through the b parameters of the variables of interest, and also through the variables' correlation with Y), or

• to determine which potential explanatory variables have a significant impact on Y (through testing the significance of the relevant b values).


Theory - Least squares

The computer finds the values for the parameters that give the line of best fit

Best fit is defined as minimising the sum of squared errors (SSE)


Theory - model specification

Y is some function of a lot of explanatory variables

Narrow the lot of explanatory variables down to those expected to be important (ignore others)

Then specify the functional form of the relationship - linear is the usual starting point for regression

(but see the discussion of log-log models below)


Theory - model specification

Model specification - which variables, linear (or other) form, etc. - based on relevant theory

Estimated relationship then based on data


    Multiple regression

The overall fit of the equation estimated is measured by R-squared (R²), the proportion of the variation in Y explained by the equation

R² is also the square of the correlation between the fitted and actual Y values

Each parameter estimated (and hence each variable) can be tested for individual significance


    Linear Regression Example

Data:

House Price (y, $000)   Sq. Feet (x)
245                     1400
312                     1600
279                     1700
308                     1875
199                     1100
219                     1550
405                     2350
324                     2450
319                     1425
255                     1700


Linear Regression Example

Plot of data: [scatter plot of house price against square feet, with fitted line y = 75.814 + 0.123x (x = Sq. Feet)]


Simple Linear Regression Model

Yi = β0 + β1 Xi + εi

where β0 is the intercept, β1 is the slope, and εi is the random error for the Xi value.

[Diagram: the observed value of Y for Xi plotted against the fitted line Ŷ = β0 + β1 X]


Excel Residual Output for the House Price model

X (SqFt)   Y ($000)   Predicted Ŷ   Residual (Y - Ŷ)
1400       245        251.92        -6.92316
1600       312        273.88        38.12329
1700       279        284.85        -5.85348
1875       308        304.06        3.93716
1100       199        218.99        -19.99284
1550       219        268.39        -49.38832
2350       405        356.20        48.79749
2450       324        367.18        -43.17929
1425       319        254.67        64.33264
1700       255        284.85        -29.85348

It shows how well the regression line fits the data points. The best and worst predictions were 3.94 and 64.33, respectively.


Measures of variation

SSyy = Σ(Yi - Ȳ)²   (total variation)

SSE = Σ(Yi - Ŷi)²   (SSE: Sum of Squares of Error)

SSR = Σ(Ŷi - Ȳ)²   (SSR: Sum of Squares of Regression)

[Diagram: a data point Yi at Xi, the fitted line and the mean Ȳ, showing how the deviations split]


Measures of variation

• Total variation is made up of two parts: SSyy = SSR + SSE

SSyy = Σ(Yi - Ȳ)²   Total Sum of Squares: measures the variation of the Yi values around their mean Ȳ

SSR = Σ(Ŷi - Ȳ)²   Regression Sum of Squares: explained variation attributable to the relationship between X and Y

SSE = Σ(Yi - Ŷi)²   Error Sum of Squares: variation attributable to factors other than the relationship between X and Y

Where: Ȳ = average value of the dependent variable, Yi = observed values of the dependent variable, Ŷi = predicted value of Y for the given Xi value


Standard Error of the Estimate

The standard error of the estimate is the standard deviation of the error of a regression model.

SSE = Σy² - b0 Σy - b1 Σxy

se = √( SSE / (n - 2) )

The Standard Error of the Estimate tells us how spread out the errors are.
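These quantities are straightforward to compute directly. Below is a minimal numpy sketch (not in the original slides) using the house-price data from the earlier slide; it should roughly reproduce the fit reported on the next slide.

```python
# Compute SSyy, SSR, SSE, R-squared and the standard error of the estimate
# for the house-price data (least-squares line fitted with numpy).
import numpy as np

x = np.array([1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700])
y = np.array([245, 312, 279, 308, 199, 219, 405, 324, 319, 255])

b1, b0 = np.polyfit(x, y, 1)             # slope and intercept of the best-fit line
y_hat = b0 + b1 * x                      # predicted values

ss_yy = np.sum((y - y.mean()) ** 2)      # total variation
ssr = np.sum((y_hat - y.mean()) ** 2)    # explained (regression) variation
sse = np.sum((y - y_hat) ** 2)           # unexplained (error) variation

r_squared = ssr / ss_yy                  # proportion of variation explained
se = np.sqrt(sse / (len(y) - 2))         # standard error of the estimate

print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, R^2 = {r_squared:.3f}, se = {se:.2f}")
# Expect roughly b0 = 75.8, b1 = 0.123 and R^2 = 0.70, as on the next slide.
```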


Linear Regression Example

Computer output: Correlation R = 0.837, R-squared = 0.700

           Coefficient   t       sig
Constant   75.813        2.508   0.0204
Sq. feet   0.123         7.009   0.0000
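The same output can be produced with statsmodels; a minimal sketch (not from the slides), whose summary includes the Coefficient, t and sig columns above:

```python
# Fit the house-price regression with statsmodels to get the full output table.
import numpy as np
import statsmodels.api as sm

sqft = np.array([1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700])
price = np.array([245, 312, 279, 308, 199, 219, 405, 324, 319, 255])

X = sm.add_constant(sqft)      # adds the intercept ("Constant") column
model = sm.OLS(price, X).fit()
print(model.summary())         # expect coef about 75.813 and 0.123, t = 2.508 and 7.009
```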


Linear regression - Example

The model estimated is: House Price = 75.813 + 0.123 Sq. Feet

• The correlation between House Price and Sq. Feet is high, at 0.837, and the fit of the regression model is quite strong: R² = 0.700, i.e. 70%

• The Sq. Feet variable is highly significant: t = 7.009, p = 0.0000

• The implicit hypothesis is that the coefficient is zero, i.e. the variable has no impact


Linear regression - Example

Add another variable to the data - Location

Price   Sq. Feet   Location
245     1400       2
312     1600       3
279     1700       4
308     1875       3
199     1100       5
219     1550       1
405     2350       1
324     2450       5
319     1425       4
etc


Linear regression - Example

Computer output:

Correlation R = 0.839, R-squared = 0.705
(Without Location: Correlation R = 0.837, R-squared = 0.700)

           Coefficient   t       sig
Constant   73.510        2.366   0.0282
Sq. Feet   0.120         6.475   0.0000
Location   2.283         0.525   0.6050


Linear Regression Example

Slight improvement in R²

Location not significant (sig or p-value high) - consider dropping it from the model


Linear regression - example

Model Market to Book Value (MBV) as a function of Revenue

Data:

Company   MBV     Revenue
1         2.011   39.505
2         1.814   4.165
3         1.522   10.406
4         1.826   7.602
5         1.824   2.942
6         1.337   5.228
7         1.650   1.697
etc


Linear regression - example

Output

Dep Var: MBV   N: 71

Multiple R: 0.318

Squared multiple R: 0.101

Variable   Coefficient   t value   sig
Constant   2.010         11.465    0.000
Revenue    0.046         2.789     0.007


Linear regression - example

The model is: MBV = 2.010 + 0.046 Revenue

Fit not great (R² = 0.10, i.e. 10%) but significant (F = 7.778, p = 0.007)

The Revenue variable is significant (t = 2.789, p = 0.007)


Linear regression - example

More factors (variables) impact on MBV and need to be considered

*** WARNING ***

Case 1 has large leverage (Leverage = 0.243)
Case 8 has large leverage (Leverage = 0.163)
Case 56 is an outlier (Standardized Residual = 5.167)

Durbin-Watson D Statistic: 1.682
First Order Autocorrelation: 0.140


Multiple regression

Avoid step-wise regression

Look for non-linear patterns in the scatter plot

Diagnostic checks (a sketch of these checks follows the list):

• Multicollinearity (different x's move together in a systematic way)

• Autocorrelation (successive error terms are correlated with each other)

• Outliers (data points that are not together with the rest)

• Heteroscedasticity (non-constant variance)

• Leverage (observations with large effects on outcomes)
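A minimal statsmodels sketch of these diagnostic checks (not from the slides; the 50-observation data set here is made up purely for illustration):

```python
# Diagnostic checks: autocorrelation, multicollinearity, leverage, outliers.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
df = pd.DataFrame({"x1": rng.normal(size=50), "x2": rng.normal(size=50)})
df["y"] = 2 + 1.5 * df["x1"] - 0.5 * df["x2"] + rng.normal(size=50)

X = sm.add_constant(df[["x1", "x2"]])
fit = sm.OLS(df["y"], X).fit()

print("Durbin-Watson:", durbin_watson(fit.resid))        # near 2: no autocorrelation
print("VIFs:", [variance_inflation_factor(X.values, i)   # large values: multicollinearity
                for i in range(1, X.shape[1])])

influence = fit.get_influence()
print("Max leverage:", influence.hat_matrix_diag.max())  # points with large effects
print("Max |std. residual|:",
      np.abs(influence.resid_studentized_internal).max())  # large values flag outliers
# (For heteroscedasticity, plot fit.resid against fit.fittedvalues.)
```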


    Multiple regression - Example

    Hospital and Nursing Salary example

(9.10 of the textbook)


    Multiple regression - Example

Dialogue box

Dependent: Annual Nursing Salary

Independents: Number of beds in home
              Annual medical in-patient days
              Annual total patient days
              Rural (1) and non-rural (0) homes

(a statsmodels sketch of this model follows below)
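In Python the same model could be set up as below; this is a hedged sketch only, since the data file and column names (salary, beds, inpatient_days, total_days, rural) are hypothetical stand-ins for the textbook's data set:

```python
# Sketch of the nursing-salary multiple regression using the formula API.
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical file/columns standing in for the textbook's data set.
homes = pd.read_csv("nursing_homes.csv")

fit = smf.ols("salary ~ beds + inpatient_days + total_days + rural",
              data=homes).fit()
print(fit.summary())  # coefficients, standard errors, t values and p values (sig)
```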


    Multiple regression - Example

Model Summary

R        R Square   Adjusted R Square
0.8803   0.775      0.7557

Std. Error of the Estimate: $82,024.63

ANOVA        F         Sig
Regression   40.4375   0.000


    Multiple regression - Example

    R=0.88 Coeff. of correlation, the relationship between 2variables. R=-1: strong, negative relationship. R=1: strong,positive relationship. R=0: no relationship between 2 variables.

    R Square =0.775 Coeff. of Determination. 77.5% of a change in Ycan be explained by a change in X. The other 22.5% is by some

    other factors. This fit is quite strong.Adjusted R Square=0.7557 Adjusted for multiple variables. A

    decrease Adjusted R Square means the newly added variable is notsignificant.


    Multiple regression - Example

Std. Error of the Estimate: $82,024.63

Sig = 0.000 - significant fit

p = 0.1799 (beds) - too high (compared to α); should consider dropping beds as a variable


Multiple regression - Example

                                        Coefficients   Standard Error   t value   p value (sig)
Constant (Intercept)                    113.5003       495.4654         0.2291    0.8198
Number of beds in home                  9.6399         7.0804           1.3615    0.1799
Annual medical in-patient days (100s)   -7.4072        2.4012           -3.0848   0.0034
Annual total patient days (100s)        15.7674        2.7550           5.7232    0.0000
Rural (1) and non-rural (0) homes       -79.5796       288.1857         -0.2761   0.7837


Multiple regression - Example

The interpretation of the coefficients is that if the in-patient days, total patient days and rural factor are held constant, then the annual nursing salary is expected to increase by $9.64 for each extra bed in the home. Similarly, the annual nursing salary is expected to change by -$740.72, +$1,576.74 and -$79.58 for each extra in-patient day (100s), total patient day (100s) and the rural factor, respectively, other variables held constant. The $11,300 can be interpreted as the annual base salary.


Multiple regression - Example

• Compare the intercepts and the slopes of the multiple regression with those of the linear regression: changes have occurred. (Difficult to analyse in detail)

• se is still the error of the estimate. Note that the multiple regression yields a better se than the linear regression

• R² similarly (but it would increase with extra x's)

• Adjusted R² - a decrease indicates an added x does not belong in the equation.


Multiple regression - Example

• Tolerance stats OK (> 0.1), so no multicollinearity issue. If an individual R² is too high (almost equal to the R² of the multiple regression): suspect multicollinearity!

• Durbin-Watson stat d = 2.4789, somewhat of a negative autocorrelation issue. 1 < d ≤ 2 would indicate no autocorrelation concern!

• Outliers - see graphs of residuals: normal shape on the histogram, random (no pattern) on the scatter plots.


    Multiple regression - Example

[Histogram of the regression standardized residuals (-4.00 to 6.00); Dependent Variable: Current Salary; Std. Dev = 1.00, Mean = 0.00, N = 474]

The Standardized Residual distribution is relatively normal - a relatively good fit.


    Multiple regression - Example

Scatter plot: randomly distributed, on both sides of 0.00

[Scatterplot of the regression standardized residuals against Current Salary, and a second residual plot against the estimate]

The red plot is an example of heteroscedasticity (unequal variance distribution)


Multiple regression - Example

Dummy variables - categorical data, related to the dependent variable

Other names: indicators, 0-1 variables

If dummy variable = 1: in that category
If dummy variable = 0: not in that category

The coefficient of this variable indicates the dependent-variable difference due to this (dummy) variable (see the pandas sketch below)
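A small pandas sketch (not from the slides) of how such 0-1 dummies are built; the region column is made up for illustration:

```python
# Build 0-1 dummy (indicator) variables from a categorical column.
import pandas as pd

df = pd.DataFrame({"region": ["rural", "urban", "rural", "urban"],
                   "salary": [95, 118, 102, 121]})

# drop_first=True keeps one category as the baseline, avoiding redundancy.
dummies = pd.get_dummies(df["region"], prefix="region", drop_first=True)
df = pd.concat([df, dummies], axis=1)
print(df)  # region_urban is 1 for urban rows and 0 otherwise
```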


Multiple regression - Example

Salary = 113.50 + 9.64 Bed - 7.41 In-ptDay + 15.77 Tot-ptDay - 79.58 Rural

Rural = 0 vs. Rural = 1: Salary difference = -$7,958 (rural is lower)

Two or more categorical variables can be involved. The coefficient indicates the y difference when the rest is the same


Analysing a Regression

• p-value of the Regression

• p-value of each x, to consider dropping it or not

• Adjusted R-square value

• Standard Error of the Regression estimate

• Scatter plot of residuals - randomness, outliers, heteroscedasticity (non-equal variance)

• Histogram of the residuals

• Durbin-Watson statistic (d)


Linear, Quadratic and Log regression - example

The Public Service Electric Company produces different quantities of electricity each month, depending on the demand. The file Poly and Log examples - Power.xls lists the number of units of electricity produced (Units) and the total cost of producing these (Cost) for a 36-month period. How can regression be used to analyse the relationship between Cost and Units?


Multiple regression - Example

R Square 0.7359, Standard Error 2733.7424

R Square 0.8216, Standard Error 2280.7998


Log model

Very often we use multiple regression to fit a multiplicative model:

Y = a X1^b1 X2^b2 X3^b3

If any explanatory variable changes by 1%, the dependent variable changes by a constant percentage.

This can be estimated by making a logarithmic transformation of the equation, which gives:

ln(Y) = ln(a) + b1 ln(X1) + b2 ln(X2) + b3 ln(X3)


Log model

Thus we can calculate ln(Y), ln(X1), ln(X2), ln(X3) and regress these variables in the usual way, to estimate the parameters of the original equation (a short sketch follows below).
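A minimal sketch of this transformation (not from the slides; the positive data for Y, X1 and X2 are generated only to make the example runnable):

```python
# Estimate a multiplicative model Y = a * X1^b1 * X2^b2 by log-log regression.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x1 = rng.uniform(1, 10, 40)
x2 = rng.uniform(1, 10, 40)
y = 2.0 * x1**0.8 * x2**-0.3 * rng.lognormal(sigma=0.1, size=40)  # made-up data

X = sm.add_constant(np.column_stack([np.log(x1), np.log(x2)]))
fit = sm.OLS(np.log(y), X).fit()  # regress ln(Y) on ln(X1) and ln(X2)

ln_a, b1, b2 = fit.params
print(f"a = {np.exp(ln_a):.3f}, b1 = {b1:.3f}, b2 = {b2:.3f}")
# b1 and b2 are elasticities: a 1% change in X gives about a b% change in Y.
```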


Log model example

The file CarSales.xls contains annual data (1970-1999) on domestic auto sales in the United States. The variables are defined as:

Sales: annual domestic auto sales (in number of units)
PriceIndex: consumer price index of transportation
Income: real disposable income
Interest: prime rate of interest


Multiple regression - Example

Regression and Correlation:
Observations 30, Multiple R 0.7358, R Square 0.5414, Adjusted R Square 0.4680, Standard Error 758049.7773

LogRegres         Coefficients     t value    p value
Intercept         -110360558.48    -45.9500   0.0000
Log(Sales)        7522741.47       54.4195    0.0000
Log(PriceIndex)   35983.70         0.2297     0.8202
Log(Income)       -162258.29       -0.6222    0.5395
Log(Interest)     -13588.13        -0.2133    0.8328

Regression and Correlation:
Observations 30, Multiple R 0.9978, R Square 0.9956, Adjusted R Square 0.9949, Standard Error 74199.1103

MultiRegres   Coefficients    t value   p value
Intercept     513941538.55    0.7356    0.4688
Year          -258651.57      -0.7234   0.4761
PriceIndex    -18121.97       -0.4786   0.6364
Income        2175.75         1.1204    0.2732
Interest      -8895378.05     -1.4810   0.1511


Multiple regression - Example

Log model: probably the slightly better model

R-square = 0.99 - good

Fewer outliers - slightly better

Residual plots not necessarily better


Multiple Regression Goal

Remove any unimportant (multicollinearity or autocorrelation issues, etc.) variables from the equation and decide which variable(s) are important for the regression model.

Use that model for your prediction.


Multiple regression time series example

Plot the CarSales.xls data, Year vs. Sales


Multiple regression time series example

Period             Sales (000)
2003 Quarter I     25.4
2003 Quarter II    23.8
2003 Quarter III   22.0
2003 Quarter IV    28.6
2004 Quarter I     28.5
2004 Quarter II    27.0
etc


Multiple regression time series example

[Line plot of SALES (about 20 to 45) against TIME (0 to 20)]


Multiple regression time series example

Create dummy variables for the Quarters and the time period

Period       Sales   Time   QII   QIII   QIV
2003 Q I     25.4    1      0     0      0
2003 Q II    23.8    2      1     0      0
2003 Q III   22.0    3      0     1      0
2003 Q IV    28.6    4      0     0      1
2004 Q I     28.5    5      0     0      0
2004 Q II    27.0    6      1     0      0
etc


Multiple regression time series example

Squared multiple R: 0.987

Effect     Coefficient   t      P
CONSTANT   23.679        50.5   0.000
TIME       1.005         28.5   0.000
QII        -2.525        -5.2   0.000
QIII       -5.070        -9.8   0.000
QIV        0.450         0.9    0.401

Could drop QIV and re-estimate (a fitting sketch follows below)
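The fitting sketch (not from the slides): the model above estimated with statsmodels on the six observations shown; the full series would be needed to reproduce the slide's coefficients.

```python
# Fit Sales on a time trend plus quarterly dummy variables.
import pandas as pd
import statsmodels.formula.api as smf

sales = pd.DataFrame({
    "Sales": [25.4, 23.8, 22.0, 28.6, 28.5, 27.0],
    "Time":  [1, 2, 3, 4, 5, 6],
    "QII":   [0, 1, 0, 0, 0, 1],
    "QIII":  [0, 0, 1, 0, 0, 0],
    "QIV":   [0, 0, 0, 1, 0, 0],
})

fit = smf.ols("Sales ~ Time + QII + QIII + QIV", data=sales).fit()
print(fit.params)  # with the full data set, roughly the coefficients above
```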


Multiple regression time series example

The model as estimated is:

Sales = 23.679 + 1.005 Time - 2.525 QII - 5.070 QIII + 0.450 QIV

Say the data ended at Time = 24, i.e. 2008 QIV

Use the model to forecast, e.g. forecast sales in 2009 in quarters I and II


Multiple regression time series example

2009 quarter I is Time = 25, QI = 1, QII = 0, QIII = 0, QIV = 0

Sales = 23.679 + 1.005 × 25 - 0 - 0 + 0 = 48.808, i.e. $48,800

2009 quarter II is Time = 26, QI = 0, QII = 1, QIII = 0, QIV = 0

Sales = 23.679 + 1.005 × 26 - 2.525 - 0 + 0 = 47.284, i.e. $47,300
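The same forecast arithmetic as a small Python function (a sketch, using the slide's estimated coefficients):

```python
# Point forecast from the estimated seasonal-dummy model.
def forecast(time, qii=0, qiii=0, qiv=0):
    return 23.679 + 1.005 * time - 2.525 * qii - 5.070 * qiii + 0.450 * qiv

print(forecast(25))         # 2009 QI: about 48.8, i.e. $48,800
print(forecast(26, qii=1))  # 2009 QII: about 47.3, i.e. $47,300
```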


Autoregression

Another way of dealing with time series is Autoregression

Often used when Durbin-Watson indicates autocorrelation (a common issue with time series data)

Or because it makes theoretical sense that one period's value depends (partly) on the previous value of the series

Use previous (lagged) values as an explanatory variable


Autoregression

In the example, add another variable, which is the lagged sales:

Period       Sales   Time   QII   QIII   QIV   lagSales
2003 Q I     25.4    1      0     0      0     -
2003 Q II    23.8    2      1     0      0     25.4
2003 Q III   22.0    3      0     1      0     23.8
2003 Q IV    28.6    4      0     0      1     22.0
2004 Q I     28.5    5      0     0      0     28.6
2004 Q II    27.0    6      1     0      0     28.5
etc


Autoregression

The lagged variable would replace the Time (trend) variable

The first data point is lost, as we don't have a lagged value for it

Can handle seasonality by having another variable, Sales lagged by the seasonality period (e.g. 4 terms) - see the pandas sketch below
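A short pandas sketch (not from the slides) of building the lagged variables:

```python
# Create lagged sales: shift(1) for the previous quarter, shift(4) for seasonality.
import pandas as pd

sales = pd.DataFrame({"Sales": [25.4, 23.8, 22.0, 28.6, 28.5, 27.0]})
sales["lagSales"] = sales["Sales"].shift(1)   # previous quarter's value
sales["lag4Sales"] = sales["Sales"].shift(4)  # same quarter a year earlier
print(sales)  # the first row has no lagSales value, so that data point is lost
```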


Logit regression

If the dependent variable is categorical, not metric - e.g. for accounting graduates, membership of CPA Aust (or not) is the dependent variable; the X variables might be gender, age, importance of joining cost, importance of brand status, etc.

Regression is possible, with special technical issues
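A hedged sketch of such a logit model in statsmodels; the data file and column names (member, gender, age, joining_cost, brand_status) are hypothetical stand-ins:

```python
# Logit regression for a 0/1 dependent variable (e.g. CPA membership or not).
import pandas as pd
import statsmodels.formula.api as smf

grads = pd.read_csv("graduates.csv")  # hypothetical data set

fit = smf.logit("member ~ gender + age + joining_cost + brand_status",
                data=grads).fit()
print(fit.summary())  # coefficients are on the log-odds scale
```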

    Reference


Ragsdale (2008), Chapter 9 and pp. 522-528.