Introduction to Stata – Lecture 3: Panel Data
Hayley Fisher
1 March 2010
Key reference: Cameron and Trivedi (2009), chapter 8.
1 Data used in this lecture
This lecture uses data from the first four waves of the British Household Panel Survey (BHPS). I will make the relevant source files temporarily available on my website but cannot host them there permanently. You can get the full set of files from the ESDS (http://www.esds.ac.uk/findingData/bhps.asp). If you want to learn more about the BHPS and how to use it in Stata, I recommend the BHPS introductory courses provided by the UK Longitudinal Studies Centre (ULSC) at the University of Essex – details and course materials are available at http://www.iser.essex.ac.uk/survey/bhps/courses. These notes have been loosely based on parts of their course.
2 The British Household Panel Survey
The BHPS began in 1991 and has interviewed its initial sample, and additional household members, every year since then. 5,500 households were selected initially, with additional samples of Scotland, Wales and Northern Ireland added since. Currently 17 waves of the survey are available. For a full description of the survey see Taylor, Brice, Buck and Prentice-Lane (2009).
Longitudinal datasets such as the BHPS are rarely provided in a format that is straightforward to read into Stata and start working with. The BHPS is available in a number of formats including Stata, but as a series of files containing different variables, split by year and by part of the survey to give manageable file sizes. I am using the ‘individual response’ and ‘household response’ files from the first four waves of the survey. A substantial part of this lecture will be devoted to putting together a panel from these datasets.
2.1 Assembling a cross section using individual and household data
Start your do-file to assemble your dataset by defining a macro for the folder in which the original data files are stored. This makes it easy to alter the folder referenced if necessary in future. Here I use a global macro:
global dir BHPS
To recall a global macro we prefix it with a $ sign – so here $dir.
We could simply read in the entire first file in question (aindresp), but this is a large dataset with many variables. Instead, we can load in just specific variables. We need to look at the codebook accompanying the dataset to choose the variables (alternatively look at the online codebooks at http://www.iser.essex.ac.uk/survey/bhps/documentation/volume-b-codebooks). We read in the specific variables by typing:
use ahid apno asex aage pid amastat ahgspn aqfachi afiyr afiyrl using $dir/aindresp
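The same pattern can be tried without the BHPS files. The following is only a minimal sketch: the file name toy_resp, its variables and its values are invented for illustration, and the current working directory stands in for the BHPS folder.

```stata
* Toy stand-in for a BHPS file (hypothetical names and values)
clear
input id inc age
1 200 30
2 600 41
3 250 25
end

* Assumption: the working directory plays the role of the data folder
global dir "`c(pwd)'"
save "$dir/toy_resp", replace

* Load only two of the three variables from the saved file
use id inc using "$dir/toy_resp", clear
describe

* Tidy up the temporary file
erase "$dir/toy_resp.dta"
```

Only id and inc should be in memory afterwards; age stays on disk because it was not named in the variable list.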
Note that all variables except pid have the prefix a. This is a convention in the BHPS data files – all files and variables associated with wave 1 have the prefix a, for wave 2 it is b and so on. Let’s describe the data to see what has been loaded here.
. describe
Contains data from BHPS/aindresp.dta
obs: 10,264
vars: 10
size: 348,976 (99.9% of memory free)
----------------------------------------------------------------------------
storage display value
variable name type format label variable label
----------------------------------------------------------------------------
ahid long %12.0g household identification number
apno byte %8.0g person number
asex byte %8.0g asex sex
pid long %12.0g cross-wave person identifier
amastat byte %8.0g amastat marital status
ahgspn byte %8.0g ahgspn pno of spouse/partner
aage byte %8.0g aage age at date of interview
aqfachi byte %8.0g aqfachi highest academic qualification
afiyrl double %10.0g afiyrl annual labour income (1.9.90-1.9.91)
afiyr double %10.0g afiyr annual income (1.9.90-1.9.91)
----------------------------------------------------------------------------
Sorted by:
Three variables here are vital for the construction of our panel dataset. ahid is a household identification number which we will use to match data from the household file, and apno is a person identification number within a given household. This can be used in combination with, for example, ahgspn to match couples together. pid is a cross-wave person identifier – it has no a prefix since it matches the same variable in all waves – and this connects people over time. We also have data on individuals’ sex, age, marital status, academic qualifications, labour income and total income.
We are going to merge in data from the household file, so we need to sort the individual data by the household identification number and save it.
. sort ahid
. save aind, replace
Then we load data from the household response file – ahhresp.dta.
use ahid atenure ahhsize ankids afihhyr using $dir/ahhresp
. describe
Contains data from BHPS/ahhresp.dta
obs: 5,511
vars: 5
size: 104,709 (99.9% of memory free)
--------------------------------------------------------------------------------------
storage display value
variable name type format label variable label
--------------------------------------------------------------------------------------
ahid long %12.0g household identification number
ahhsize byte %8.0g ahhsize number of persons in household
ankids byte %8.0g ankids number of children in household
atenure byte %8.0g atenure housing tenure
afihhyr double %10.0g afihhyr annual household income (1.9.90-1.9.91)
--------------------------------------------------------------------------------------
Sorted by:
This gives household ID numbers and data on household size, number of children, how housing is owned and total household income. We see there are 5,511 observations (as opposed to 10,264 from the individual dataset). After sorting by household ID we can merge the two datasets together:
. merge ahid using aind
variable ahid does not uniquely identify observations in aind.dta
. tabulate _merge
_merge | Freq. Percent Cum.
------------+-----------------------------------
1 | 6 0.06 0.06
3 | 10,264 99.94 100.00
------------+-----------------------------------
Total | 10,270 100.00
. keep if _merge==3
(6 observations deleted)
. drop _merge
We see that there are 6 observations for which there is no individual data, just household data. I drop these.
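The merge syntax used above is the older form. In Stata 11 and later the match type is stated explicitly and the files do not need to be sorted first. The following is a minimal sketch of the same household-to-individual merge on invented toy data (hid, pno, age and hhsize are hypothetical stand-ins for the BHPS variables):

```stata
* Toy individual file: two people in household 1, one in household 2
clear
input hid pno age
1 1 40
1 2 38
2 1 67
end
tempfile ind
save `ind'

* Toy household file: household 3 has no individual records
clear
input hid hhsize
1 2
2 1
3 4
end

* One household row matches many individual rows
merge 1:m hid using `ind'
tabulate _merge

* Keep matched observations only, as in the text
keep if _merge == 3
drop _merge
```

As in the BHPS example, the household file is in memory, so household-only rows appear with _merge equal to 1. Here that is household 3, which is dropped.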
Another useful command for describing data is codebook. This produces a codebook based on what is in Stata. I have extracted the codebook from my log file and posted it on my website for reference.
Having saved the dataset, we can do some analysis with this cross section. First, recode the missing values. Page A3-14 of Taylor et al. (2009) outlines the way missing values are handled in these data files – any negative values are in fact missing. We can recode these easily together using mvdecode:
. mvdecode _all, mv(-9/-1)
atenure: 17 missing values generated
ahgspn: 1 missing value generated
aqfachi: 371 missing values generated
afiyrl: 352 missing values generated
afiyr: 352 missing values generated
Here _all can be replaced with a list of variables. Having created the necessary variables, simple cross section regressions can be performed. Here I show that the xi prefix can be used to create interaction terms as well as a series of category dummies.
. generate married=amastat==1
. generate age2=aage^2
. generate lwages=log(afiyrl)
(3937 missing values generated)
. xi: regress lwages aage age2 i.asex*i.married ankids i.aqfachi, vce(robust)
i.asex _Iasex_1-2 (naturally coded; _Iasex_1 omitted)
i.married _Imarried_0-1 (naturally coded; _Imarried_0 omitted)
i.asex*i.marr~d _IaseXmar_#_# (coded as above)
i.aqfachi _Iaqfachi_1-7 (naturally coded; _Iaqfachi_1 omitted)
Linear regression Number of obs = 6318
F( 12, 6305) = 246.80
Prob > F = 0.0000
R-squared = 0.3383
Root MSE = .92103
------------------------------------------------------------------------------
| Robust
lwages | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
aage | .1932962 .0072313 26.73 0.000 .1791205 .2074719
age2 | -.0023262 .0000903 -25.75 0.000 -.0025033 -.0021491
_Iasex_2 | -.3133317 .0411562 -7.61 0.000 -.3940119 -.2326515
_Imarried_1 | .4771106 .0390765 12.21 0.000 .4005075 .5537138
_IaseXmar_~1 | -.7551111 .0507519 -14.88 0.000 -.854602 -.6556201
ankids | -.2082833 .0148482 -14.03 0.000 -.2373909 -.1791756
_Iaqfachi_2 | -.1241103 .0967663 -1.28 0.200 -.3138051 .0655845
_Iaqfachi_3 | -.1621818 .0964851 -1.68 0.093 -.3513254 .0269619
_Iaqfachi_4 | -.3929866 .0912033 -4.31 0.000 -.5717762 -.2141971
_Iaqfachi_5 | -.5157779 .0899938 -5.73 0.000 -.6921963 -.3393595
_Iaqfachi_6 | -.5263538 .1001379 -5.26 0.000 -.7226582 -.3300494
_Iaqfachi_7 | -.7906337 .0903131 -8.75 0.000 -.967678 -.6135893
_cons | 5.916767 .1677032 35.28 0.000 5.588011 6.245522
------------------------------------------------------------------------------
Here _Iasex_2 gives the effect of being female, _Imarried_1 the effect of being married for men, and _IaseXmar_~1 the difference in the effect of being married on wages between women and men.
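In Stata 11 and later, factor-variable notation produces category dummies and interactions without the xi prefix. The sketch below is only an illustration on the nlsw88 example dataset shipped with Stata, not on the BHPS. That dataset covers women only, so collgrad stands in for sex in the interaction.

```stata
* Example data shipped with Stata (not the BHPS)
sysuse nlsw88, clear
generate lwage = log(wage)

* c.age##c.age adds age and age squared;
* i.married##i.collgrad adds both dummies and their interaction
regress lwage c.age##c.age i.married##i.collgrad, vce(robust)

* The interaction coefficient is addressed by its factor-variable name
display _b[1.married#1.collgrad]
```

The interaction coefficient has the same interpretation as _IaseXmar_~1 above: the difference in the effect of being married between the two groups.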
2.2 Assembling a two wave panel
Panel data can be in two formats – long or wide. Wide data stores each variable separately for each wave, so only has one observation for each individual:
PID  inc1  inc2  inc3  inc4
1    200   210   220   250
2    600   660   700   750
3    250   280   200   210
4    150   190   250   300
Long data stores all observations of a variable, for example income, in the same variable, and has a wave variable and multiple observations for each individual:
PID  wave  inc
1    1     200
1    2     210
1    3     220
1    4     250
2    1     600
2    2     660
2    3     700
2    4     750
To use the panel data features in Stata you need to have your data in long format. If your data is in wide format you can reconfigure it using the reshape command.
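As a minimal sketch, the wide example table above can be typed in and converted to long format as follows (the lower-case variable name pid is my choice):

```stata
* The wide table from the text
clear
input pid inc1 inc2 inc3 inc4
1 200 210 220 250
2 600 660 700 750
3 250 280 200 210
4 150 190 250 300
end

* inc1-inc4 become one variable inc; the numeric suffix becomes wave
reshape long inc, i(pid) j(wave)
list, sepby(pid)
```

Typing reshape wide inc, i(pid) j(wave) afterwards would return the data to the wide layout.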
To put together a two wave panel in long format we need to extract the same data for wave 2 – with prefix b. This is done as above:
. use bhid bpno bsex bage pid bmastat bhgspn bqfachi bfiyr bfiyrl using $dir/bindresp
. sort bhid
. save bind, replace
file bind.dta saved
. use bhid btenure bhhsize bnkids bfihhyr using $dir/bhhresp
. sort bhid
. merge bhid using bind
variable bhid does not uniquely identify observations in bind.dta
. keep if _merge==3
(2 observations deleted)
. drop _merge
. save wave2, replace
file wave2.dta saved
There are two other steps to take for both the wave 1 and wave 2 datasets constructed here – to add a wave variable (just using generate), and to remove the prefix, which is done using the command renpfix.
. generate wave=2
. renpfix b
. save wave2, replace
To combine the two files we use the append command.[1]
. use wave1
. generate wave=1
. renpfix a
. append using wave2
Listing the first 20 observations shows that we now have a long panel dataset with two time periods.
. sort pid wave
. list pid wave hid sex age mastat in 1/20, clean
pid wave hid sex age mastat
1. 10002251 1 1000209 female 91 never ma
2. 10004491 1 1000381 male 28 never ma
3. 10004491 2 2000148 male 29 never ma
4. 10004521 1 1000381 male 26 never ma
5. 10004521 2 2000148 male 27 never ma
6. 10007857 1 1000667 female 57 widowed
7. 10007857 2 2000296 female 59 widowed
8. 10014578 1 1001221 female 54 married
9. 10014578 2 2000369 female 55 married
10. 10014608 1 1001221 male 57 married
11. 10014608 2 2000369 male 58 married
12. 10016813 1 1001418 male 36 married
13. 10016813 2 2000504 male 37 married
14. 10016848 1 1001418 female 32 married
15. 10016848 2 2000504 female 33 married
16. 10017933 1 1001507 female 49 married
17. 10017933 2 2000717 female 49 married
18. 10017968 1 1001507 male 46 married
19. 10017968 2 2000717 male 46 married
20. 10019057 1 1001604 female 59 never ma
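The prefix-stripping and append steps can be sketched on invented toy data. This sketch assumes Stata 12 or later, where rename accepts wildcards and replaces renpfix. The values are hypothetical.

```stata
* Toy wave 2 file with b-prefixed variables
clear
input pid bhid bage
1 201 31
2 202 45
end
generate wave = 2
* Strip the b prefix. This touches every variable starting with b,
* so a variable that legitimately starts with b would need excluding.
rename b* *
tempfile w2
save `w2'

* Toy wave 1 file with a-prefixed variables
clear
input pid ahid aage
1 101 30
2 102 44
3 103 60
end
generate wave = 1
rename a* *

* Stack the two waves into one long dataset
append using `w2'
sort pid wave
list, sepby(pid)
```

Person 3 appears only in wave 1, so the result is an unbalanced two-wave panel with five observations.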
2.3 Creating a longer panel
When performing the same operations on several waves of data, we can write do-files more efficiently using the foreach and forvalues loop commands. Here I use foreach to perform the same commands, just substituting the wave prefix each time.[2] The command to extract all of the data is shown below.
foreach w in a b c d {
    * individual file for this wave: load selected variables, sort by household ID
    use `w'hid `w'pno `w'sex `w'age pid `w'mastat `w'hgspn `w'qfachi `w'fiyr `w'fiyrl ///
        using $dir/`w'indresp
    sort `w'hid
    save `w'ind, replace
    clear
    * household file for this wave: merge in the individual data
    use `w'hid `w'tenure `w'hhsize `w'nkids `w'fihhyr using $dir/`w'hhresp
    sort `w'hid
    merge `w'hid using `w'ind
    keep if _merge==3
    drop _merge
    * strip the wave prefix and record the wave number (a=1, ..., d=4)
    renpfix `w'
    generate wave = index("abcd","`w'")
    save wave`w', replace
}
[1] To create a wide panel we would not remove the prefixes and would use merge to combine the datasets.
[2] The forvalues command would be used when you want to loop over numbers rather than letters or variables.
We start the foreach command by defining what should be replaced in each iteration of the loop – in this case w – and giving the list of values to substitute (a, b, c, d for the first 4 waves). We then write out the code, substituting `w' where the prefix would normally be. This reproduces the steps we went through above. The new function used here is index – this returns the position of `w' in the string abcd and so generates the wave variable.
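For comparison, here is a minimal sketch of the forvalues alternative, looping over wave numbers and recovering the letter prefix from the number. It uses strpos, which is the current name for the index function in recent versions of Stata.

```stata
* Loop over wave numbers 1 to 4 and look up the matching prefix
forvalues i = 1/4 {
    local w = substr("abcd", `i', 1)
    display "wave `i' uses prefix `w' (strpos returns " strpos("abcd", "`w'") ")"
}
```

Each line of output pairs a wave number with its prefix, and strpos returns the same number, which is the mapping used to generate the wave variable.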
After this code has been run, we can use a similar loop to append the files together, and to delete the files created in the process:
foreach w in a b c {
    append using wave`w'
}
compress
save BHPS, replace
foreach w in a b c d {
    capture erase wave`w'.dta
    capture erase `w'ind.dta
}
Note that compress ensures that the data is being stored as efficiently as possible. Sorting and listing the dataset produced shows:
. sort pid wave
. list pid wave hid sex age mastat in 1/20, clean
pid wave hid sex age mastat
1. 10002251 1 1000209 female 91 never ma
2. 10004491 1 1000381 male 28 never ma
3. 10004491 2 2000148 male 29 never ma
4. 10004521 1 1000381 male 26 never ma
5. 10004521 2 2000148 male 27 never ma
6. 10004521 3 3000192 male 28 never ma
7. 10007857 1 1000667 female 57 widowed
8. 10007857 2 2000296 female 59 widowed
9. 10007857 3 3000257 female 59 widowed
10. 10014578 1 1001221 female 54 married
11. 10014578 2 2000369 female 55 married
12. 10014578 3 3000389 female 56 married
13. 10014608 1 1001221 male 57 married
14. 10014608 2 2000369 male 58 married
15. 10014608 3 3000389 male 59 married
16. 10016813 1 1001418 male 36 married
17. 10016813 2 2000504 male 37 married
18. 10016813 3 3000508 male 37 married
19. 10016813 4 4000307 male 39 married
20. 10016848 1 1001418 female 32 married
This shows a panel in long format. Note that age mostly increases by one year between waves for each individual (the age variable here is age at interview date, so the increase can vary), whilst sex is constant.
In order to perform analysis exploiting the panel dimension of the dataset we must declare the data to be a panel – we do this using xtset, and declaring the panel variable (here pid) and time variable (here wave). Note that the panel variable and time variable must together uniquely identify every observation in the dataset.
. xtset pid wave
panel variable: pid (unbalanced)
time variable: wave, 1 to 4, but with gaps
delta: 1 unit
This shows that we have an unbalanced panel – as seen in the list above we do not have an observation for every person in every time period. Some useful commands to investigate panel data are xtdescribe, xtsum and xttrans:
. xtdescribe
pid: 10002251, 10004491, ..., 47737689 n = 12350
wave: 1, 2, ..., 4 T = 4
Delta(wave) = 1 unit
Span(wave) = 4 periods
(pid*wave uniquely identifies each observation)
Distribution of T_i: min 5% 25% 50% 75% 95% max
1 1 2 4 4 4 4
Freq. Percent Cum. | Pattern
---------------------------+---------
7643 61.89 61.89 | 1111
1009 8.17 70.06 | 1...
679 5.50 75.55 | 11..
596 4.83 80.38 | ...1
527 4.27 84.65 | 111.
458 3.71 88.36 | .111
418 3.38 91.74 | ..11
290 2.35 94.09 | .1..
197 1.60 95.68 | ..1.
533 4.32 100.00 | (other patterns)
---------------------------+---------
12350 100.00 | XXXX
xtdescribe gives information about the panel structure – we see that there are 12,350 individuals and 4 time periods, and that 62% of people have observations in all four time periods.
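The requirement that the panel and time variables uniquely identify each observation can be checked with isid before calling xtset. The following is a minimal sketch on an invented toy panel, in which person 2 is missing from wave 2:

```stata
* Toy unbalanced panel (hypothetical values)
clear
input pid wave inc
1 1 200
1 2 210
1 3 220
2 1 600
2 3 700
end

* isid stops with an error if pid and wave do not identify the rows
isid pid wave

xtset pid wave
xtdescribe
```

xtdescribe should list two participation patterns here: 111 for person 1 and 1.1 for person 2.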
. xtsum sex age nkids fiyr lwages mastat
Variable | Mean Std. Dev. Min Max | Observations
-----------------+--------------------------------------------+----------------
sex overall | 1.531028 .4990427 1 2 | N = 39190
between | .4995023 1 2 | n = 12350
within | 0 1.531028 1.531028 | T-bar = 3.17328
| |
age overall | 44.00559 18.43386 15 97 | N = 39190
between | 18.93358 15 96.5 | n = 12350
within | 1.044435 32.33892 50.33892 | T-bar = 3.17328
| |
nkids overall | .5951008 .9762426 0 9 | N = 39190
between | .9375915 0 9 | n = 12350
within | .2487689 -3.154899 3.595101 | T-bar = 3.17328
| |
fiyr overall | 8939.016 8929.79 0 287481.8 | N = 37455
between | 8186.321 0 160891.5 | n = 11982
within | 3631.936 -70011.81 204307.5 | T-bar = 3.12594
| |
lwages overall | 8.849195 1.166257 -3.321733 12.56891 | N = 23609
between | 1.206297 -3.321733 11.61299 | n = 8323
within | .4933544 1.250891 14.18753 | T-bar = 2.8366
| |
mastat overall | 2.497844 2.031626 0 6 | N = 39185
between | 2.043197 0 6 | n = 12350
within | .5396737 -2.002156 6.247844 | T-bar = 3.17287
xtsum gives summary statistics and shows variation between individuals and within individuals – so we see that sex does not vary within individuals, and that log wages vary more between individuals than within individuals. We also see the total number of observations (N), number of individuals with observations (n) and average number of time periods for each individual (T-bar).
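The between and within rows can be rebuilt by hand, which also shows why a within minimum can be negative for a variable such as nkids. xtsum reports the within figures for each value minus the individual's mean plus the overall mean. The following is a minimal sketch on an invented balanced toy panel, and the variable names are my own:

```stata
* Toy balanced panel (hypothetical values)
clear
input pid wave inc
1 1 200
1 2 210
1 3 220
1 4 250
2 1 600
2 2 660
2 3 700
2 4 750
end
xtset pid wave
xtsum inc

* Individual means: these drive the between row
bysort pid: egen double inc_i = mean(inc)

* Within transformation: deviation from own mean, recentred on the overall mean
quietly summarize inc
generate double inc_within = inc - inc_i + r(mean)
summarize inc_within

* One row per person gives the spread of the individual means
egen byte first = tag(pid)
summarize inc_i if first
```

In this toy panel the two summarize tables should match the within and between rows of the xtsum output.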
. xttrans married, freq
| Married=1
Married=1 | 0 1 | Total
-----------+----------------------+----------
0 | 10,669 492 | 11,161
| 95.59 4.41 | 100.00
-----------+----------------------+----------
1 | 367 15,312 | 15,679
| 2.34 97.66 | 100.00
-----------+----------------------+----------
Total | 11,036 15,804 | 26,840
| 41.12 58.88 | 100.00
xttrans gives an indication of whether there are transitions between groups for categorical variables – for example, we see here that 96% of unmarried individuals remain unmarried in the next period, and 98% of married individuals remain married in the next period.
3 Regression using panel data
Having set up our dataset we can perform some regressions. As previously, I use a local macro to store my list of independent variables:
local xlist "age age2 i.sex*i.married nkids i.qfachi"
We can estimate a pooled OLS regression using the regress command seen in the last lecture. We should use robust standard errors clustered by individual.
. xi: regress lwages `xlist', vce(cluster pid)
i.sex _Isex_1-2 (naturally coded; _Isex_1 omitted)
i.married _Imarried_0-1 (naturally coded; _Imarried_0 omitted)
i.sex*i.married _IsexXmar_#_# (coded as above)
i.qfachi _Iqfachi_1-7 (naturally coded; _Iqfachi_1 omitted)
Linear regression Number of obs = 23520
F( 12, 8285) = 409.27
Prob > F = 0.0000
R-squared = 0.3092
Root MSE = .96854
(Std. Err. adjusted for 8286 clusters in pid)
------------------------------------------------------------------------------
| Robust
lwages | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
age | .1890312 .0055574 34.01 0.000 .1781373 .1999251
age2 | -.0022714 .0000708 -32.10 0.000 -.0024101 -.0021327
_Isex_2 | -.3258962 .0300076 -10.86 0.000 -.3847186 -.2670737
_Imarried_1 | .488806 .0286158 17.08 0.000 .4327118 .5449002
_IsexXmar_~1 | -.654709 .0376179 -17.40 0.000 -.7284494 -.5809685
nkids | -.2203951 .0115555 -19.07 0.000 -.2430467 -.1977435
_Iqfachi_2 | -.1272603 .077578 -1.64 0.101 -.2793325 .024812
_Iqfachi_3 | -.1781531 .0785097 -2.27 0.023 -.3320517 -.0242544
_Iqfachi_4 | -.4158935 .0734584 -5.66 0.000 -.5598903 -.2718966
_Iqfachi_5 | -.5313831 .0726375 -7.32 0.000 -.6737708 -.3889953
_Iqfachi_6 | -.617404 .080617 -7.66 0.000 -.7754335 -.4593746
_Iqfachi_7 | -.8375914 .073551 -11.39 0.000 -.9817697 -.693413
_cons | 6.046451 .1298038 46.58 0.000 5.792003 6.300899
------------------------------------------------------------------------------
Fixed and random effects regressions are both carried out using the xtreg command. In both cases we should again get cluster robust standard errors. The default is for Stata to estimate random effects when xtreg is used – you must specify the option “fe” to get fixed effects:
. xi: xtreg lwages `xlist', fe vce(cluster pid)
i.sex _Isex_1-2 (naturally coded; _Isex_1 omitted)
i.married _Imarried_0-1 (naturally coded; _Imarried_0 omitted)
i.sex*i.married _IsexXmar_#_# (coded as above)
i.qfachi _Iqfachi_1-7 (naturally coded; _Iqfachi_1 omitted)
Fixed-effects (within) regression Number of obs = 23520
Group variable: pid Number of groups = 8286
R-sq: within = 0.0451 Obs per group: min = 1
between = 0.1672 avg = 2.8
overall = 0.1293 max = 4
F(11,8285) = 38.35
corr(u_i, Xb) = -0.3782 Prob > F = 0.0000
(Std. Err. adjusted for 8286 clusters in pid)
------------------------------------------------------------------------------
| Robust
lwages | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
age | .2902458 .0173775 16.70 0.000 .2561815 .3243102
age2 | -.0030951 .0002108 -14.68 0.000 -.0035083 -.0026819
_Isex_2 | (dropped)
_Imarried_1 | .0473639 .0421936 1.12 0.262 -.0353461 .1300739
_IsexXmar_~1 | -.0574721 .0682978 -0.84 0.400 -.1913529 .0764086
nkids | -.1347352 .0193899 -6.95 0.000 -.1727443 -.0967261
_Iqfachi_2 | .0239754 .2786775 0.09 0.931 -.5223023 .5702531
_Iqfachi_3 | -.2838254 .2629136 -1.08 0.280 -.7992019 .231551
_Iqfachi_4 | -.1524719 .2703457 -0.56 0.573 -.6824172 .3774734
_Iqfachi_5 | -.4847609 .2739852 -1.77 0.077 -1.021841 .0523187
_Iqfachi_6 | -.4576768 .3040223 -1.51 0.132 -1.053637 .138283
_Iqfachi_7 | -.3651604 .3077017 -1.19 0.235 -.9683329 .238012
_cons | 3.198992 .44452 7.20 0.000 2.327621 4.070362
-------------+----------------------------------------------------------------
sigma_u | 1.1636036
sigma_e | .59940523
rho | .79029066 (fraction of variance due to u_i)
------------------------------------------------------------------------------
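Here sex is dropped – this is because it is invariant over time, so it cannot be separated from the individual fixed effect. The lack of within-individual variation in qualifications and marital status over four waves explains the imprecise coefficient estimates for those variables.
. xi: xtreg lwages `xlist', re vce(cluster pid)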
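To see why a time-invariant variable drops out, note that the fixed effects (within) estimator is equivalent to OLS on data with each individual's mean subtracted, which turns a constant variable into a column of zeros. The following is a minimal sketch on simulated data. All names and parameter values are invented, and rnormal() assumes Stata 10.1 or later.

```stata
* Simulated panel: 100 individuals, 4 waves
clear
set seed 12345
set obs 400
generate pid = ceil(_n/4)
bysort pid: generate wave = _n
generate u = rnormal()
* Make u constant within each individual
bysort pid (wave): replace u = u[1]
* x is correlated with the individual effect u
generate x = rnormal() + u
generate y = 1 + 2*x + u + rnormal()
* A time-invariant dummy, standing in for sex
generate female = mod(pid, 2)

xtset pid wave
xtreg y x female, fe vce(cluster pid)
scalar b_fe = _b[x]

* The same slope from OLS on demeaned data
bysort pid: egen double ybar = mean(y)
bysort pid: egen double xbar = mean(x)
generate double yd = y - ybar
generate double xd = x - xbar
regress yd xd, noconstant
display "xtreg, fe: " b_fe "   demeaned OLS: " _b[xd]
```

The two slope estimates should agree, and female is omitted from the xtreg output just as sex is above. The standard errors from the demeaned regression are not comparable, because regress does not account for the estimated individual means.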
i.sex _Isex_1-2 (naturally coded; _Isex_1 omitted)
i.married _Imarried_0-1 (naturally coded; _Imarried_0 omitted)
i.sex*i.married _IsexXmar_#_# (coded as above)
i.qfachi _Iqfachi_1-7 (naturally coded; _Iqfachi_1 omitted)
Random-effects GLS regression Number of obs = 23520
Group variable: pid Number of groups = 8286
R-sq: within = 0.0341 Obs per group: min = 1
between = 0.3457 avg = 2.8
overall = 0.3053 max = 4
Random effects u_i ~ Gaussian Wald chi2(12) = 4356.47
corr(u_i, X) = 0 (assumed) Prob > chi2 = 0.0000
(Std. Err. adjusted for 8286 clusters in pid)
------------------------------------------------------------------------------
| Robust
lwages | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
age | .2133805 .0058073 36.74 0.000 .2019984 .2247626
age2 | -.0025222 .0000737 -34.22 0.000 -.0026667 -.0023778
_Isex_2 | -.4601064 .0312502 -14.72 0.000 -.5213557 -.3988571
_Imarried_1 | .341099 .0275778 12.37 0.000 .2870475 .3951504
_IsexXmar_~1 | -.4701444 .0385164 -12.21 0.000 -.5456351 -.3946536
nkids | -.201973 .0116565 -17.33 0.000 -.2248194 -.1791267
_Iqfachi_2 | -.0940869 .0991879 -0.95 0.343 -.2884916 .1003178
_Iqfachi_3 | -.1635236 .0960624 -1.70 0.089 -.3518024 .0247552
_Iqfachi_4 | -.3019628 .0911546 -3.31 0.001 -.4806225 -.1233031
_Iqfachi_5 | -.4841929 .0903848 -5.36 0.000 -.6613438 -.3070421
_Iqfachi_6 | -.524829 .0975616 -5.38 0.000 -.7160463 -.3336117
_Iqfachi_7 | -.785563 .0915491 -8.58 0.000 -.9649959 -.6061301
_cons | 5.485352 .1464953 37.44 0.000 5.198227 5.772478
-------------+----------------------------------------------------------------
sigma_u | .87706892
sigma_e | .59940523
rho | .68163491 (fraction of variance due to u_i)
------------------------------------------------------------------------------
Stata reports the estimated standard deviations of the error components as sigma_u and sigma_e. We also see different R2 statistics for within and between variation. These can be tabulated if the estimates have been stored (using estimates store after each regression, here under the names POLS, FE and RE).
. esttab POLS FE RE, b se stats(r2 r2_o r2_b r2_w)
------------------------------------------------------------
(1) (2) (3)
lwages lwages lwages
------------------------------------------------------------
age 0.189*** 0.290*** 0.213***
(0.00556) (0.0174) (0.00581)
age2 -0.00227*** -0.00310*** -0.00252***
(0.0000708) (0.000211) (0.0000737)
_Isex_2 -0.326*** 0 -0.460***
(0.0300) (0) (0.0313)
_Imarried_1 0.489*** 0.0474 0.341***
(0.0286) (0.0422) (0.0276)
_IsexXmar_~1 -0.655*** -0.0575 -0.470***
(0.0376) (0.0683) (0.0385)
nkids -0.220*** -0.135*** -0.202***
(0.0116) (0.0194) (0.0117)
_Iqfachi_2 -0.127 0.0240 -0.0941
(0.0776) (0.279) (0.0992)
_Iqfachi_3 -0.178* -0.284 -0.164
(0.0785) (0.263) (0.0961)
_Iqfachi_4 -0.416*** -0.152 -0.302***
(0.0735) (0.270) (0.0912)
_Iqfachi_5 -0.531*** -0.485 -0.484***
(0.0726) (0.274) (0.0904)
_Iqfachi_6 -0.617*** -0.458 -0.525***
(0.0806) (0.304) (0.0976)
_Iqfachi_7 -0.838*** -0.365 -0.786***
(0.0736) (0.308) (0.0915)
_cons 6.046*** 3.199*** 5.485***
(0.130) (0.445) (0.146)
------------------------------------------------------------
r2 0.309 0.0451
r2_o 0.129 0.305
r2_b 0.167 0.346
r2_w 0.0451 0.0341
------------------------------------------------------------
Standard errors in parentheses
* p<0.05, ** p<0.01, *** p<0.001
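The table above relies on estimates having been stored under the names POLS, FE and RE. esttab is a user-written command (part of the estout package from SSC), so the sketch below uses the built-in estimates table instead. The data are simulated and all names and values are invented.

```stata
* Simulated panel with an individual effect
clear
set seed 12345
set obs 400
generate pid = ceil(_n/4)
bysort pid: generate wave = _n
generate u = rnormal()
bysort pid (wave): replace u = u[1]
generate x = rnormal()
generate y = 1 + 2*x + u + rnormal()
xtset pid wave

* Store each set of results under a name straight after estimation
quietly regress y x, vce(cluster pid)
estimates store POLS
quietly xtreg y x, fe vce(cluster pid)
estimates store FE
quietly xtreg y x, re vce(cluster pid)
estimates store RE

* Side-by-side coefficients, standard errors and fit statistics
estimates table POLS FE RE, b(%9.3f) se(%9.3f) stats(N r2 r2_w r2_b r2_o)
```

Statistics that a model does not report, such as r2_w for pooled OLS, are left blank in the table.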
Estimation can also be easily implemented in first differences using the regress command and the difference operator “D.”. We do not need to generate variables in first differences. The option noconstant is used so that Stata does not add a constant term (which would be differenced out). For example:
. regress D.(lwages age age2 female married nkids), vce(cluster pid) noconstant
Linear regression Number of obs = 14667
F( 4, 6146) = 65.00
Prob > F = 0.0000
R-squared = 0.0208
Root MSE = .73681
(Std. Err. adjusted for 6147 clusters in pid)
------------------------------------------------------------------------------
| Robust
D.lwages | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
age |
D1. | .3179554 .0211961 15.00 0.000 .2764036 .3595071
age2 |
D1. | -.0034294 .0002499 -13.72 0.000 -.0039192 -.0029395
female |
D1. | (dropped)
married |
D1. | .0144043 .0390605 0.37 0.712 -.062168 .0909767
nkids |
D1. | -.0957519 .0221907 -4.31 0.000 -.1392534 -.0522504
------------------------------------------------------------------------------
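The number of observations falls in the first-difference regression because D. returns a missing value whenever the previous wave is absent. The following is a minimal sketch on an invented toy panel:

```stata
* Toy panel: person 2 has no wave 2 record
clear
input pid wave inc
1 1 200
1 2 210
1 3 220
2 1 600
2 3 700
end
xtset pid wave

* D.inc is inc minus its value one wave earlier for the same person
generate dinc = D.inc
list, sepby(pid)
```

Only person 1's waves 2 and 3 get a difference. Each person's first wave is missing, and so is person 2's wave 3 because wave 2 is absent.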
3.1 Hausman test
Stata can easily perform a Hausman test – that is, a test of whether the individual effects can be treated as random, in the sense of being uncorrelated with the regressors. The null hypothesis is that both the fixed and random effects estimators are consistent; the alternative hypothesis is that the random effects estimator is not consistent. We must first estimate the fixed and random effects models – and without robust standard errors, since the test relies on the random effects estimator being efficient under the null. Then, the Hausman test is conducted using the hausman command.
. quietly xi: xtreg lwages `xlist', fe
. estimates store FE1
. quietly xi: xtreg lwages `xlist', re
. estimates store RE1
. hausman FE1 RE1, sigmamore
---- Coefficients ----
| (b) (B) (b-B) sqrt(diag(V_b-V_B))
| FE1 RE1 Difference S.E.
-------------+----------------------------------------------------------------
age | .2902458 .2133805 .0768653 .0125061
age2 | -.0030951 -.0025222 -.0005729 .0001535
_Imarried_1 | .0473639 .341099 -.293735 .0336022
_IsexXmar_~1 | -.0574721 -.4701444 .4126723 .0470245
nkids | -.1347352 -.201973 .0672378 .0120169
_Iqfachi_2 | .0239754 -.0940869 .1180623 .1387245
_Iqfachi_3 | -.2838254 -.1635236 -.1203018 .1676963
_Iqfachi_4 | -.1524719 -.3019628 .1494909 .1630687
_Iqfachi_5 | -.4847609 -.4841929 -.000568 .1697845
_Iqfachi_6 | -.4576768 -.524829 .0671522 .2006431
_Iqfachi_7 | -.3651604 -.785563 .4204026 .1869317
------------------------------------------------------------------------------
b = consistent under Ho and Ha; obtained from xtreg
B = inconsistent under Ha, efficient under Ho; obtained from xtreg
Test: Ho: difference in coefficients not systematic
chi2(11) = (b-B)’[(V_b-V_B)^(-1)](b-B)
= 238.04
Prob>chi2 = 0.0000
Cameron and Trivedi (2009) recommend using the sigmamore option. Here we see the null hypothesis is clearly rejected (the reported p-value is 0.0000), which suggests that the random effects estimates are not consistent.
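The same sequence can be run on simulated data in which the individual effect is correlated with the regressor by construction, so that random effects should be inconsistent. This is only a sketch: all names and parameter values are invented.

```stata
* Simulated panel: 500 individuals, 4 waves
clear
set seed 2010
set obs 2000
generate pid = ceil(_n/4)
bysort pid: generate wave = _n
generate u = rnormal()
bysort pid (wave): replace u = u[1]
* x depends on u, which violates the random effects assumption
generate x = rnormal() + u
generate y = 1 + 2*x + u + rnormal()
xtset pid wave

* Both models estimated without robust standard errors
quietly xtreg y x, fe
estimates store FE1
quietly xtreg y x, re
estimates store RE1

hausman FE1 RE1, sigmamore
display "chi2 = " r(chi2) ", p = " r(p)
```

With one regressor the test has one degree of freedom. The statistic and p-value are left in r(chi2) and r(p) for later use.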
4 Creating variables to identify changes in variables
We may wish to create a variable which records whether a certain status has changed, for example whether marital status has changed. Once data is declared to be a panel this is straightforward. Let’s first recode mastat so that it has just three categories:
recode mastat (0=.) (1/2=1) (3/5=2) (6=3), generate(ma)
Then to find changes we generate a new variable which incorporates the lagged value and current value of ma:
generate mach=(10*L.ma)+ma
Having labelled the values we have a useful marital change variable. So we can see that there are 352 instances of individuals going from never having been married to having a partner in this sample.
. tabulate mach
marital change | Freq. Percent Cum.
-----------------------------+-----------------------------------
stayed in couple | 16,849 63.94 63.94
partnership ended | 360 1.37 65.30
partnered -> never married! | 113 0.43 65.73
ex-partner -> partnership | 180 0.68 66.41
stayed ex-partner | 3,622 13.74 80.16
never married -> partnership | 352 1.34 81.49
never married -> ex-partner | 14 0.05 81.55
stayed never married | 4,863 18.45 100.00
-----------------------------+-----------------------------------
Total | 26,353 100.00
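The construction of mach can be followed on a small invented panel. The label text below is my reading of the table above and is an assumption rather than the original labelling code: the tens digit is last wave's category and the units digit is this wave's, with 1 for partnered, 2 for ex-partner and 3 for never married.

```stata
* Toy panel using the three-category coding of ma
clear
input pid wave ma
1 1 3
1 2 1
1 3 1
2 1 1
2 2 2
end
xtset pid wave

* Tens digit = status last wave, units digit = status this wave
generate mach = 10*L.ma + ma

* Hypothetical labels for the codes that occur here
label define mach 11 "stayed in couple" 12 "partnership ended" ///
    31 "never married -> partnership"
label values mach mach
tabulate mach
list, sepby(pid)
```

Person 1 moves from never married to a partnership (31) and then stays in it (11), while person 2's partnership ends (12). Each person's first wave has no lag, so mach is missing there.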
References
Cameron, A. Colin and Pravin K. Trivedi, Microeconometrics Using Stata, Texas: Stata Press, 2009.
Taylor, Marcia Freed, John Brice, Nick Buck, and Elaine Prentice-Lane, “British Household Panel Survey User Manual Volume A: Introduction, Technical Report and Appendices,” ISER, University of Essex, Colchester, 2009.