abbott 351tutorial3 f08tutorial

Upload: jsmith84

Post on 03-Apr-2018

217 views

Category:

Documents


0 download

TRANSCRIPT

  • 7/28/2019 Abbott 351Tutorial3 f08tutorial

    1/16

    ECONOMICS 351* -- Stata 10Tutorial 3 M.G. Abbott

    Stata 10Tutorial 3

    TOPIC: The Basics of OLS Estimation in Stata

    DATA: auto1.dta (theStata-format data file you created inTutorial 1)

    or

    auto2.dta (theStata-format data file you created inTutorial 2)

    TASKS:Stata Tutorial 3 is intended to introduce you to theStataregresscommand for OLS (Ordinary Least Squares) estimation of linearregression models. It also demonstrates several post-estimation commandsthat are often used with theregress command, and otherStatacommands

    that are useful in displaying and using the results of the regress command.

    TheStatacommands introduced in this tutorial are:codebook Displays properties of variables in the current data set.regress Performs OLS estimation of linear regression models.

    _b[varname] Contains thecoefficient estimatefor the regressor varname._se[varname] Contains thestandard error of the coefficient estimate for the

    regressor varname.e( ) Saves selected results from most recentregress command.

    vce Displays estimated covariance matrix of coefficient estimates.matrix get Accesses coefficient estimates and the covariance matrix.display Computes and displays the values of algebraic expressions.scalar Defines the contents of scalar variables.scalar list Lists the names and values of currently-defined scalar

    variables.scalar drop Eliminates previously-defined scalars from memory.matrix Defines matrices and performs matrix computations.matrix list Lists contents of a vector or matrix.

    NOTE: Statacommands arecase sensitive. All Statacommands must be typed inthe Command window in lower case letters.

    ECON 351* -- Fall 2008: Stata 10Tutorial 3 Page 1 of 16 pages

  • 7/28/2019 Abbott 351Tutorial3 f08tutorial

    2/16

    ECONOMICS 351* -- Stata 10Tutorial 3 M.G. Abbott

    HELP: Statahas an extensive on-lineHelp facility that provides fairly detailedinformation (including examples) on all Statacommands. Students shouldbecome familiar with theStataon-lineHelp system. In the course of doingthis tutorial, take the time to browse theHelp information on some of the

    aboveStatacommands. To access the on-lineHelp for anyStatacommand:

    choose (click on) Help from theStatamain menu bar click onStata Command in theHelp drop down menu type the full name of theStatacommand in theStatacommand dialog

    box and click OK

    Preparing for Your StataSession

    In Stata Tutorial 1, you created and saved on your own diskette theStata-formatdata setauto1.dta. In Stata Tutorial 2, you created and saved on your own diskettetheStata-format data set auto2.dta. Either of these twoStata-format data sets iscompletely adequate for doingStata Tutorial 3.

    Before beginning your Statasession, use Windows Explorer to copy theStata-format data set auto1.dta or auto2.dta to theStataworking directoryon the C:-drive or D:-drive of the computer at which you are working.

    On the computers in Dunning 350, the defaultStataworking directory isusually C:\data.

    On the computers in MC B111, the defaultStataworking directory is usuallyD:\courses.

    Start Your StataSession

    To start your Stata session, double-click on theStata 10 icon in the Windows

    desktop.

    After you double-click theStata 10 icon, you will see the now familiar screen offour Statawindows.

    ECON 351* -- Fall 2008: Stata 10Tutorial 3 Page 2 of 16 pages

  • 7/28/2019 Abbott 351Tutorial3 f08tutorial

    3/16

    ECONOMICS 351* -- Stata 10Tutorial 3 M.G. Abbott

    Record Your StataSession log using

    To record your Stata session, including all theStatacommands you enter and theresults (output) produced by these commands, make a .log file named

    351tutorial3.log. To open (begin) the .log file351tutorial3.log, enter in theCommand window:

    l og usi ng 351t ut or i al 3. l ogThis command opens a text-format file called351tutorial3.log in the currentStataworking directory. Remember that once you have opened the351tutorial3.log file,a copy of all the commands you enter during your Statasession and of all theresults they produce is recorded in that 351tutorial3.log file.

    An alternative way to open the.log file351tutorial3.log is to click on theLogbutton; click onSave as type: and selectLog ( *. l og) ; click on theFile name:box and type the file name351tutorial3; and click on theSave button.

    Loading a Stata-Format Data Set into Stata use

    Load, or read, into memory the data set you are using. To load theStata-formatdata fileauto1.dta into memory, enter in the Command window:

    use auto1This command loads into memory theStata-format data setauto1.dta.

    Alternatively, to load theStata-format data fileauto2.dta into memory, enter inthe Command window:

    use auto2

    This command loads into memory theStata-format data setauto2.dta.

    ECON 351* -- Fall 2008: Stata 10Tutorial 3 Page 3 of 16 pages

  • 7/28/2019 Abbott 351Tutorial3 f08tutorial

    4/16

    ECONOMICS 351* -- Stata 10Tutorial 3 M.G. Abbott

    Summarizing and Viewing the Current Data Set describe and list

    To summarize the contents of the current data set, use thedescribe command.Recall that thedescribe command displays a summary of the contents of the

    current data set in memory, which in this case is either auto1.dta or auto2.dta.

    To summarize the contents of the current data set in memory, enter in theCommand window:

    descr i beThe display generated by thisdescribe command may indicate that the currentdata set is sorted by the indicator variableforeign.

    If the current data set is not sorted by the indicator variableforeign, sort it andinspect the results by entering the following commands:

    sor t f or ei gndescr i be

    To see directly how the observations in the current data set are ordered, enterthe command:

    l i st f or ei gn pr i ce wei ght mpgNotice that all observations for domestic cars occur before the observations forforeign cars.

    Calculating descriptive summary statistics summarize

    Recall that thesummarize command calculates and displays descriptive summarystatistics for some or all of thenumeric variables in the current data set.

    To calculate and display basic descriptive summary statistics for all variablesand all observations in the current data set, enter in the Command window:

    summar i ze

    ECON 351* -- Fall 2008: Stata 10Tutorial 3 Page 4 of 16 pages

  • 7/28/2019 Abbott 351Tutorial3 f08tutorial

    5/16

    ECONOMICS 351* -- Stata 10Tutorial 3 M.G. Abbott

    Displaying variable characteristics codebook

    Thecodebookcommand displays the characteristics, or properties, of anyvariable(s) in the current data set.

    To display the characteristics of the variablemake, enter in the Commandwindow:

    codebook makeExamine carefully the screen display: it tells you that make is astring variablewith maximum length of 17 characters, thatmake takes 74 distinct values in thedata set, and that there are no missing values for the variablemake. It alsodisplays some examples of the string values taken by the variable make.

    Display the characteristics of some of thenumeric variables in the data set byentering in the Command window:

    codebook pr i ce mpg wei ght f orei gnAgain, examine carefully the screen display for each of thesenumericvariables. How is the displayed information for the indicator variableforeigndifferent from that for the other numeric variables?

    OLS estimation of linear regression models regress

    The basicStatacommand for OLS estimation of linear regression models is theregress command. SubsequentStata tutorials will help you learn how to use andinterpret the many features of theregress command. This section introduces you tothe basic features of theregress command.

    To estimate by OLS the simple linear regression model given by the PREii10ii uweightpriceY ++== (1)

    for the full sample of observations in the current data set, enter in the Commandwindow:

    ECON 351* -- Fall 2008: Stata 10Tutorial 3 Page 5 of 16 pages

  • 7/28/2019 Abbott 351Tutorial3 f08tutorial

    6/16

    ECONOMICS 351* -- Stata 10Tutorial 3 M.G. Abbott

    r egr ess pr i ce wei ght Examine the results of this command. See what elements of the resultsdisplayed by theregress command you can identify.

    Now estimate by OLS the simple linear regression model given by the PREii10i umpgprice ++= (2)

    for the full sample of observations in the current data set. Enter in theCommand window:

    r egr ess pr i ce mpgAgain, examine the results of thisregress command. Most of the displayedresults of theregress command will be meaningless to you now, but you willsoon learn what they all mean.

    Options to use with the regresscommand

    This section introduces you to someregress commandoptionsand to some othercommands that are often used with theregress command.

    Basic Syntax ofregress Command

    [by varname:] regress depvar varlist [ifexp] [in range] [,options]

    where

    depvar is the user-supplied name of the regressand, or dependent variable,Y i;

    varlist is a list of the user-supplied names of the regressors, or independentvariables, X1i, X2i, ..., Xki.

    Optional components of theregress command are enclosed in square brackets [ ],which are not typed as part of the command.

    ECON 351* -- Fall 2008: Stata 10Tutorial 3 Page 6 of 16 pages

  • 7/28/2019 Abbott 351Tutorial3 f08tutorial

    7/16

    ECONOMICS 351* -- Stata 10Tutorial 3 M.G. Abbott

    Two of theoptionsavailable with theregress command are:

    level(#) specifies the confidence level #, in percent, to be used forcomputing confidence intervals of the regression coefficients;

    noconstant suppresses the constant term in the regression function; i.e.,sets the intercept coefficient0 equal to zero.

    Computing OLS Regression Equations for Subsets of Sample Observations

    -- the ifoption on regress

    The ifoption can be used with theregress command to compute separate OLSsample regression equations for specified subsamples of observations. For

    example, suppose you wish to compute separate OLS estimates of regressionequation (1) for domestic and foreign cars. Recall that foreign is an indicatorvariablethat distinguishes between foreign and domestic cars; foreign is definedto equal 1 for foreign cars, and 0 for domestic cars.

    To compute OLS estimates of regression equation (1) for thesubsample ofdomestic cars, enter in the Command window:

    r egr ess pr i ce wei ght i f f or ei gn==0 Similarly, to compute OLS estimates of regression equation (1) for the

    subsample offoreign cars, enter in the Command window:

    r egr ess pr i ce wei ght i f f or ei gn==1 An alternative way to estimate separate OLS regression equations for specified

    subsamples of observations is to use thebysort varname:option in front of theregress command. For example, to estimate regression equation (1) separatelyfor the subsamples of foreign and domestic cars, enter the commands:

    bysor t f or ei gn: r egr ess pr i ce wei ght This command does the same thing as the two regress commands thatimmediately preceded it. Note that thebysort option checks to see if the currentdataset in memory is sorted according to the values of the indicator variableforeign: if it is not sorted, then thebysort option performs the sort before

    ECON 351* -- Fall 2008: Stata 10Tutorial 3 Page 7 of 16 pages

  • 7/28/2019 Abbott 351Tutorial3 f08tutorial

    8/16

    ECONOMICS 351* -- Stata 10Tutorial 3 M.G. Abbott

    executing the ensuingregress command; if it is already sorted by foreign, thenStata immediately proceeds to execute theregress command.

    Computing Confidence Intervals for the Regression Coefficients level(#)

    The level(#) option on theregress command can be used to change theconfidencelevel used in constructing two-sided confidence intervals for the regressioncoefficients. The#is set to the desired confidence level, such as 90 or 99. If thelevel(#) option is not specified, theregress command computestwo-sided 95percent confidence intervals for each regression coefficient; the default value ofthe confidence level #is therefore 95 (or 1 =0.95). To compute two-sidedconfidence intervals for confidence levels other than 95 percent, use the level(#)option on theregress command.

    To computetwo-sided 90 percent confidence intervals for each regressioncoefficient, enter either of the commands:

    r egr ess pr i ce wei ght , l evel ( 90) r egr ess, l evel ( 90)

    To computetwo-sided 99 percent confidence intervals for each regressioncoefficient, enter either of the commands:

    r egr ess pr i ce wei ght , l evel ( 99) r egr ess, l evel ( 99)

    Compare the two-sided confidence intervals for these three difference confidencelevels. For which confidence level are the confidence intervals widest? For whichconfidence level are the confidence intervals narrowest?

    Accessing Coefficient Estimates and Standard Errors _b[] and_se[]

    Basic Syntax:

    _b[varname] (or its synonym_coef[varname]);_se[varname]

    ECON 351* -- Fall 2008: Stata 10Tutorial 3 Page 8 of 16 pages

  • 7/28/2019 Abbott 351Tutorial3 f08tutorial

    9/16

    ECONOMICS 351* -- Stata 10Tutorial 3 M.G. Abbott

    wherevarnameis the user-supplied variable name for one of the regressors inthe most recentregress command.

    Accessing coefficient estimates. _b[varname], or its synonym_coef[varname], contains thecoefficient estimatefor the regressor varnameinthe most recentregress command. Thus,_b[weight] and_coef[weight] both

    contain the value of the OLS estimate of the regression coefficient on the

    regressor weight in the previous OLS regression.1

    Use either of the following display commands to simply display in the Results

    window the value of the OLS slope coefficient estimate :1

    di spl ay _b[ wei ght ] di spl ay _coef [ wei ght ]

    TheStatasystem variable_cons is always equal to the number 1, and refers to

    the intercept coefficient estimate when used with_b[ ] and_coef[ ]. Thus,

    _b[_cons] and_coef[_cons] both contain the value of the OLS estimate of

    the intercept coefficient0

    0 in the previous OLS regression. To display in the

    Results window the value of the OLS slope coefficient estimate , enter either

    of the following commands in the Command window:0

    di spl ay _b[ _cons] di spl ay _coef [ _cons] Note:_b[ ] and_coef[ ] are equivalent; that is, they are synonyms. Therefore,

    _b[_cons] =_coef[_cons] = = the OLS intercept coefficient estimate;0

    _b[weight] =_ coef[weight] = = the OLS slope coefficient estimate1for the regressor weight.

    Accessing estimated standard errors. _se[varname] contains theestimatedstandard error of the coefficient estimate for the regressor varnamein the most

    recentregress command. Thus,_se[weight] contains , the estimated

    standard error of the OLS slope coefficient estimate . Similarly,_se[_cons]

    )(es 1

    1

    ECON 351* -- Fall 2008: Stata 10Tutorial 3 Page 9 of 16 pages

  • 7/28/2019 Abbott 351Tutorial3 f08tutorial

    10/16

    ECONOMICS 351* -- Stata 10Tutorial 3 M.G. Abbott

    contains , the estimated standard error of the OLS intercept coefficient

    estimate .

    )(es 0

    0 Enter the followingdisplay commands to display in the Results window the

    values of and from the previous OLS regression:se$($ )1 )(es 0

    di spl ay _se[wei ght ] di spl ay _se[ _cons]

    Saving Coefficient Estimates and Standard Errors scalar

    Statastores the values of coefficient estimates in_b[ ] (or in_ coef[ ]) and the

    values of estimated standard errors in_se[ ] only temporarily that is until anothermodel estimation command such asregress is entered.

    To save the values of coefficient estimates and their estimated standard errorsfor subsequent use, you can usescalar commands to assign names to these values.

    Basic Syntax:

    scalar scalar_name= exp

    wherescalar_nameis the user-supplied name for the scalar and exp is an algebraicexpression or function.

    The followingscalar commands name and save the values of theOLScoefficient estimates and from the most recentregress command, which

    performed OLS estimation of regression equation (1). Enter the commands:0 1

    scal ar b0 = _b[ _cons]scal ar b1 = _b[ wei ght ]

    The followingscalar commands name and save the values of theestimatedstandard errors for the OLS coefficient estimates and . Enter the

    commands:0 1

    scal ar seb0 = _se[ _cons] scal ar seb1 = _se[ wei ght ]

    ECON 351* -- Fall 2008: Stata 10Tutorial 3 Page 10 of 16 pages

  • 7/28/2019 Abbott 351Tutorial3 f08tutorial

    11/16

    ECONOMICS 351* -- Stata 10Tutorial 3 M.G. Abbott

    You may also wish to generate theestimated variancesof the OLS coefficientestimates and , which are equal to the squares of the corresponding

    estimated standard errors. Enter the following scalar commands to do this:0 1

    scal ar varb0 = seb0 2scal ar varb1 = seb1 2

    To display the values of scalar variables, use thescalar list command. Thescalar list command for scalars is the analog of the list command for variables.

    To list the values ofall currently-defined scalars, enter either of the followingcommands:

    scal ar l i st _al lscal ar l i st

    To list only the values of the scalarsb0 andb1, enter the following command:scal ar l i st b0 b1

    Displaying the variance-covariance matrix of the coefficient estimates

    vce

    You can display thevariance-covariance matrix for the OLS coefficientestimates from the most recentregress command. This matrix contains the

    estimated variances of the OLS coefficient estimates and in the diagonal

    cells, and the estimated covariance of and in the off-diagonal cells. Enter

    in the Command window the following two commands:

    0 1

    0 1

    vcemat r i x l i st e( V)

    Examine the display. The format of the estimated variance-covariance matrix

    for the coefficient estimates and is as follows:0 1

    ECON 351* -- Fall 2008: Stata 10Tutorial 3 Page 11 of 16 pages

  • 7/28/2019 Abbott 351Tutorial3 f08tutorial

    12/16

    ECONOMICS 351* -- Stata 10Tutorial 3 M.G. Abbott

    =

    )(raV),(voC

    ),(voC)(raV

    )(raV),(voC

    ),(voC)(raV

    010

    101

    010

    011

    where the second equality reflects the fact the variance-covariance matrix is

    symmetric, which in turn follows from the fact that .

    Note that the following definitions are used:

    ),(voC),(voC 0110 =

    = the estimated variance of ;)(raV 0 0

    = the estimated variance of ;)(raV 1$1

    = the estimated covariance of and .),(voC),(voC 0110 = 0$1

    Saving the coefficient vector and variance-covariance matrix matrix

    To save the entire vector of OLS coefficient estimates and the associated variance-covariance matrix, you can use thematrix get or matrix command.

    Basic Syntax:

    matrix matname= get(internal_Stata_matrix_name)

    wherematnameis the user-supplied name given to the matrix or vector, andinternal_Stata_matrix_nameis the internal name that Statagives to the matrix orvector.

    1.The internal name thatStatagives to thevector of OLS coefficient estimatesis_b or e(b).

    2.The internal name thatStatagives to theestimated variance-covariance matrixisVCE or e(V).

    To save the vector of OLS coefficient estimatesand give it the namebvec,enter the following command:

    mat r i x bvec = get ( _b)

    ECON 351* -- Fall 2008: Stata 10Tutorial 3 Page 12 of 16 pages

  • 7/28/2019 Abbott 351Tutorial3 f08tutorial

    13/16

    ECONOMICS 351* -- Stata 10Tutorial 3 M.G. Abbott

    An alternative way to save the vector of OLS coefficient estimates is to use amatrix command and thee(b) matrix function. To save the OLS coefficientvector and give it the nameb, enter the following command:

    mat r i x b = e( b) Use the followingmatrix list commands to display the saved coefficient

    vectorsbvec andb (which are, of course, identical):

    mat r i x l i st bvecmat r i x l i st b

    To save the estimated variance-covariance matrix and give it the nameV1,enter the followingmatrix get command:

    mat r i x V1 = get ( VCE) Alternatively, a simplematrix command and thee(V) matrix function can be

    used to save the estimated variance-covariance matrix and give it the nameV2.Enter the followingmatrix command:

    mat r i x V2 = e( V) The followingmatrix list commands can be used to display the estimated

    variance-covariance matricesV1 andV2 (which are identical):

    mat r i x l i st V1mat r i x l i st V2

    Displaying and Saving Selected Regression Results e( )

    Stata temporarily stores selected results from the most recently executed regresscommand in thee( ) function. The contents of thee( ) function change each time a

    new regress command is executed.

    Let N denote the number of sample observations on which the last regresscommand was executed (here N =74), and K the total number of estimatedregression coefficients (K =2 for the simple regression model (1)). The followingscalars are saved ine( ) functions after each regress command is executed:

    ECON 351* -- Fall 2008: Stata 10Tutorial 3 Page 13 of 16 pages

  • 7/28/2019 Abbott 351Tutorial3 f08tutorial

    14/16

    ECONOMICS 351* -- Stata 10Tutorial 3 M.G. Abbott

    1. number of observations N2. explained sum of squares ESS3. degrees of freedom for ESS K14. residual sum of squares RSS5. degrees of freedom for RSS NK6. ANOVA F-statistic F[K1, NK]7. R-squared R28. adjusted R-squared 2R 9. root mean square error =

    Enter again theregress command for estimating PRE (1):

    r egr ess pr i ce wei ght

    To display all of the saved results for the most recent regress command, enterthe following command:

    er et ur n l i st

    Examine carefully the results of this command. It displays all the results thatStata temporarily saves from execution of aregress command.

    Todisplay(but not save) the current contents of individual scalar e( ) functionsfor the most recentregress command, enter the followingdisplay commands:

    di spl ay e( N)di spl ay e( mss)di spl ay e( df _m)di spl ay e( r ss)di spl ay e( df _r )di spl ay e( F)di spl ay e( r 2)di spl ay e( r 2_a)di spl ay e( r mse)

    Tosavethe current contents ofe( ) for the most recent regress command asnamed scalars, enter the following scalar commands:

    scal ar N = e( N)scal ar ESS = e( mss)

    ECON 351* -- Fall 2008: Stata 10Tutorial 3 Page 14 of 16 pages

  • 7/28/2019 Abbott 351Tutorial3 f08tutorial

    15/16

    ECONOMICS 351* -- Stata 10Tutorial 3 M.G. Abbott

    scal ar df ESS = e(df _m)scal ar RSS = e( r ss)scal ar df RSS = e( df _r )scal ar Fst at = e( F)scal ar Rsq = e( r 2)

    scal ar adj Rsq = e( r 2_a)scal ar si gma = e( r mse)

    To display the values of the scalars created by the foregoing commands, enterthe followingscalar list command:

    scal ar l i st N ESS df ESS RSS df RSS Fst at Rsq adj Rsq si gma

    You can also use thescalar command to save other results of theregresscommand. For example, enter the following commands to create and display

    some additional scalars for the sample regression equation obtained by OLSestimation of equation (1):

    scal ar K = df ESS + 1scal ar TSS = ESS + RSSscal ar df TSS = N - 1scal ar l i st N K TSS ESS RSS df TSS df ESS df RSS

    scal ar Rsq1 = ESS/ TSSscal ar Rsq2 = 1 - RSS/ TSSscal ar l i st Rsq Rsq1 Rsq2scal ar si gmasq = si gma 2scal ar si gmasq1 = RSS/ df RSSscal ar si gma1 = sqr t ( si gmasq1)scal ar l i st si gma si gma1 si gmasq si gmasq1

    You have just saved a lot of scalars. To display or listall of the currently-defined scalars, enter either of the following commands:

    scal ar l i stscal ar l i st _al l

    Compare the results of these two scalar commands; they should be identical.

    Now use thescalar drop command to drop some of the redundant scalars youhave created. Enter the command:

    scal ar drop Rsq2 si gmasq1 si gma1

    ECON 351* -- Fall 2008: Stata 10Tutorial 3 Page 15 of 16 pages

  • 7/28/2019 Abbott 351Tutorial3 f08tutorial

    16/16

    ECONOMICS 351* -- Stata 10Tutorial 3 M.G. Abbott

    Preparing to End Your StataSession

    Before you end your Stata session, you should do two things.

    First, if you wish to save the current data set (although it is entirely your choice,as you will not need it for future tutorials), use the following save command tosave the current data set asStata-format data set auto3.dta:

    save auto3 Second, close the .log file you have been recording. Enter the command:

    l og cl ose

    End Your StataSession -- exit

    You are now ready to conclude your Statasession.

    To end your Stata session, use theexit command. Enter the command:exi t or exi t , cl ear

    Cleaning Up and Clearing Out

    After returning to Windows, you should copy all the files you have used andcreated during your Statasession to your own portable electronic storage device.

    These files will be found in theStataworking directory, which is usually C:\dataon the computers in Dunning 350. There are two files you will want to be sure youtake with you: theStata-format data setauto3.dta; and theStata log file351tutorial3.log. Use the Windowscopy command to copy any files you want tokeep to your own portable electronic storage device (e.g., flash memory stick) in

    the E:-drive (or to a diskette in the A:-drive).

    Finally, as a courtesy to other users of the computing classroom, please delete allthe files you have used or created from theStataworking directory.

    ECON 351* -- Fall 2008: Stata 10Tutorial 3 Page 16 of 16 pages