
Yang Zhenlin

[email protected]
http://www.mysmu.edu/faculty/zlyang/

ECON686: Panel Data Analysis
Term II, 2019-2020

ECON686, Term II 2019-20 © Zhenlin Yang, SMU

Chapter 1: Introduction

This chapter presents some basics for panel data analysis, and basics for using the popular software Stata, including

• Concept of panel data or longitudinal data
• Popular sources and examples of panel data
• Benefits and limitations of panel data
• Basic panel data models, and related important concepts
• Introduction to Matrix Algebra
• Introduction to Econometrics
• Introduction to Stata
• An overview of the course


What is Panel Data?

Panel data refers to observations made on N units (individuals, households, firms, countries, etc.), over T points in time.

In economics and social sciences, this can be achieved by surveying a number of units, and following them over time.

If observations are made on N units at a fixed time point, we obtain cross-section data; if observations are made on one unit over T time periods, we obtain time series data.

Panel data combine the two, and panel data analysis represents a marriage of regression and time series analysis.

Panel data are usually observed at regular time intervals (monthly, yearly, etc.), and are balanced (all units are observed at all periods).

A panel can be a short panel (many units over few time periods), a long panel (few units over many time periods), or a large panel (many units over many time periods).
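As a minimal sketch of these definitions in Stata, the do-file below builds a small artificial balanced panel with N = 5 units and T = 4 periods and declares it as panel data. The variable names id, t and y and the simulated values are assumptions made purely for illustration; they are not part of the course data.

```stata
* Hypothetical toy panel: N = 5 units, T = 4 periods (all names are illustrative)
clear
set seed 686
set obs 5
generate id = _n                 // unit identifier i = 1,...,5
expand 4                         // four copies of each unit
bysort id: generate t = _n       // time index t = 1,...,4 within each unit
generate y = rnormal()           // placeholder outcome
xtset id t                       // declare the panel structure
xtdescribe                       // reports the participation pattern
```

Since every unit is observed in every period, xtset should report the panel as strongly balanced; dropping a few rows before xtset would give an unbalanced panel instead.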


Sources of Panel Data

In economics, short panels are synonymous with micro panels, and panels with small to moderate N are called macro panels. There are many open sources for construction of panel data.

The well-known sources for micro panel data include:

• Panel Study of Income Dynamics (PSID), by Institute for Social Research at University of Michigan, http://psidonline.isr.umich.edu.
• National Longitudinal Survey (NLS), a set of surveys sponsored by the Bureau of Labor Statistics, http://www.bls.gov/nls/home.htm.
• Current Population Survey (CPS), by Bureau of Census for the Bureau of Labor Statistics, http://www.census.gov/cps.
• Living Standard Measurement Study (LSMS), by World Bank, http://www.worldbank.org/LSMS.
• German Socio-Economic Panel, http://www.diw.de/soep.
• Canadian Survey of Labour and Income Dynamics (SLID), collected by Statistics Canada, www.statcan.gc.ca.
• Japanese Survey on Consumers (JPSC), www.kakeiken.or.jp.
• Russian Longitudinal Monitoring Survey, 1992, by Carolina Population Center, U. of North Carolina, http://www.cpc.unc.edu/projects/.
• Korea Labor and Income Panel Study, http://www.kli.re.kr/klips/en/about/introduce.jsp.

See Sec. 1.1 of Baltagi (2013) for details on sources of panel data.

The well-known macro panels include:

• Penn World Table (PWT), www.nber.org, 188 countries, 1950-2004.
• World Bank, http://data.worldbank.org.
• International Monetary Fund (IMF), www.imf.org.
• United Nations, http://unstats.un.org/unsd/economic_main.htm.
• European Central Bank, www.ecb.int.

There are also many ready-for-use panel data sets. For example,


Example 1: Statewide Capital Productivity

The data, from Munnell (1990), give indicators related to public capital productivity for 48 US states observed over 17 years (1970-1986). They can be downloaded from the link below:

http://people.stern.nyu.edu/wgreene/Econometrics/PanelDataEconometrics.htm

by choosing “Panel Data Sets”. The data set has been extensively used for illustrating the applications of the regular panel data models, and recently the applications of spatial panel data models.

STATE    YR    P_CAP     HWY      WATER    UTIL     PC        GSP    EMP     UNEMP
ALABAMA  1970  15032.67  7325.8   1655.68  6051.2   35793.8   28418  1010.5  4.7
ALABAMA  1971  15501.94  7525.94  1721.02  6254.98  37299.91  29375  1021.9  5.2
ALABAMA  1972  15972.41  7765.42  1764.75  6442.23  38670.3   31303  1072.3  4.7
ALABAMA  1973  16406.26  7907.66  1742.41  6756.19  40084.01  33430  1135.5  3.9
ALABAMA  1974  16762.67  8025.52  1734.85  7002.29  42057.31  33749  1169.8  5.5
ALABAMA  1975  17316.26  8158.23  1752.27  7405.76  43971.71  33604  1155.4  7.7
ALABAMA  1976  17732.86  8228.19  1799.74  7704.93  50221.57  35764  1207    6.8
ALABAMA  1977  18111.93  8365.67  1845.11  7901.15  51084.99  37463  1269.2  7.4
ALABAMA  1978  18479.74  8510.64  1960.51  8008.59  52604.05  39964  1336.5  6.3
...

Variables in the data file (productivity.csv) are:

• STATE = state name
• ST_ABB = state abbreviation
• YR = year, 1970,...,1986
• P_CAP = public capital
• HWY = highway capital
• WATER = water utility capital
• UTIL = utility capital
• PC = private capital
• GSP = gross state product
• EMP = employment
• UNEMP = unemployment rate

See Baltagi (2005, p. 25) for the analysis of these data. The article on which the analysis is based is Munnell, A., "Why has Productivity Declined? Productivity and Public Investment," New England Economic Review, 1990, pp. 3-22. The data can also be downloaded from the website for Baltagi's text: https://www.wiley.com/legacy/wileychi/baltagi3e/


Example 2: Cigarette Demand

This is another well-known panel data set that has been applied under various panel data model frameworks, non-spatial or spatial, fixed effects or random effects, static or dynamic. In particular, the demand equations for cigarettes for the United States were estimated, based on a panel of 46 states over 30 time periods (1963-1992), given on the Wiley website for Baltagi (2005): https://www.wiley.com/legacy/wileychi/baltagi3e/.

Variables in the data file Cigar.txt are:

(1) STATE = State abbreviation.
(2) YR = Year.
(3) Price per pack of cigarettes.
(4) Population.
(5) Population above the age of 16.
(6) CPI = Consumer price index (1983=100).
(7) NDI = Per capita disposable income.
(8) C = Cigarette sales in packs per capita.
(9) PIMIN = Minimum price in adjoining states per pack of cigarettes.

Several time dummies corresponding to the major policy interventions in 1965, 1968 and 1971 can be added into the model.


Example 3: Returns to Schooling Data

The Returns to Schooling Data, with 595 individuals and 7 years, were analysed in Cornwell, C. and Rupert, P. (1988), "Efficient Estimation with Panel Data: An Empirical Comparison of Instrumental Variable Estimators," Journal of Applied Econometrics, 3, pp. 149-155. See Baltagi (2005, Sec. 7.5) for further analysis. The data were downloaded from the same websites.

Variables in the file cornwell&rupert.csv are:

• EXP = work experience
• WKS = weeks worked
• OCC = occupation, 1 if blue collar
• IND = 1 if manufacturing industry
• SOUTH = 1 if resides in south
• SMSA = 1 if resides in a city (SMSA)
• MS = 1 if married
• FEM = 1 if female
• UNION = 1 if wage set by union contract
• ED = years of education
• BLK = 1 if individual is black
• LWAGE = log of wage


Why Should We Use Panel Data

There are numerous benefits of using panel data:

• Panel data enable us to control for individual heterogeneity;
• Panels give more informative data, more variability, less collinearity among the variables, more degrees of freedom, and more efficiency;
• With panel data, one is better able to study dynamics of adjustment;
• They are more suitable for identifying and measuring effects that are not detectable in pure cross-section and pure time-series data;
• They allow us to construct and test more complicated behavioural models than do pure cross-section and pure time-series data;
• Micro panel data gathered on individuals, firms, and households can be measured more accurately than similar variables measured at the macro level. Biases resulting from aggregation over time or individuals may be reduced or eliminated;
• Macro panel data, on the other hand, have longer time series, and panel unit root tests have standard asymptotic distributions.


Limitations of panel data include:

• Design and data collection problems;
• Distortions of measurement errors;
• Selectivity problems:
    • Self selection
    • Nonresponse
    • Attrition
• Short time series dimension;
• Cross-section dependence: macro panels on countries or regions with long time series that do not account for cross-country dependence may lead to misleading inference.

See Sec. 1.2 of Baltagi (2013) for details on the limitations of panel data.


Basic Panel Data Models

A panel data regression differs from a regular cross-section or time series regression in that its variables carry a double subscript, i.e.,

$$y_{it} = \alpha + X_{it}'\beta + u_{it}, \quad i = 1, \dots, N; \; t = 1, \dots, T, \tag{1.1}$$

where i represents individuals, households, firms, countries, etc., t represents time, $\alpha$ is a scalar parameter, $\beta$ is a K×1 vector of parameters, and $X_{it}$ is the it-th observation on K explanatory variables.

If the disturbances $\{u_{it}\}$ are independent and identically distributed (iid), then Model (1.1) is not different from a regular multiple linear regression model.

If $u_{it} = \mu_i + v_{it}$, where $\mu_i$ denotes the unobservable individual-specific effects, and $\{v_{it}\}$ are the remainder disturbances (idiosyncratic errors) which are iid across i and t, then Model (1.1) is called the one-way error component model.

If further $u_{it} = \mu_i + \lambda_t + v_{it}$, where $\lambda_t$ denotes the unobservable time-specific effects, then Model (1.1) is called the two-way error component model.

The $\mu_i$ is time-invariant, representing unobserved heterogeneity in individuals (innate ability, motivation, etc.); the $\lambda_t$ is individual-invariant, representing an unobserved macroeconomic shock at time t.

The individual and time effects $\{\mu_i\}$ and $\{\lambda_t\}$ could be correlated with the time-varying regressors $X_{it}$ in an arbitrary manner. If this is the case, $\{\mu_i\}$ and $\{\lambda_t\}$ have to be treated as unknown parameters, giving rise to a panel data model called the fixed effects model.

Otherwise, if $\{\mu_i\}$ and $\{\lambda_t\}$ are uncorrelated with $X_{it}$, then they are treated as iid random variables, giving the random effects model.

Thus, we could have (i) a one-way fixed effects model, (ii) a one-way random effects model, (iii) a two-way fixed effects model, and (iv) a two-way random effects model. The fixed vs random effects specification is an important issue in panel data modelling.
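The fixed vs random effects distinction can be illustrated with a small simulation. The do-file below is only a sketch under stated assumptions: the data are artificial, the names id, t, mu, x and y are hypothetical, and the true slope is set to 2. The regressor is built to be correlated with the individual effect, which is the situation where the fixed effects specification is the appropriate one.

```stata
* Simulated one-way error component model: y_it = 1 + 2*x_it + mu_i + v_it
clear
set seed 12345
set obs 200                          // N = 200 units
generate id = _n
generate mu = rnormal()              // individual effect mu_i, constant over time
expand 5                             // T = 5 periods per unit
bysort id: generate t = _n
generate x = rnormal() + 0.5*mu      // regressor correlated with mu_i
generate y = 1 + 2*x + mu + rnormal()
xtset id t
xtreg y x, fe                        // treats mu_i as fixed parameters
xtreg y x, re                        // treats mu_i as random, uncorrelated with x
```

In this design the fixed effects estimate of the slope should be close to 2, while the random effects estimate would typically be pulled away from it because its uncorrelatedness assumption is violated by construction.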


Introduction to Matrix Algebra

Some basics on vectors and matrices are necessary for the study of panel data models.

• A vector is an ordered set of numbers arranged either in a row or a column. Thus, a row vector is also a matrix with one row, and a column vector is a matrix with one column.

• A matrix can also be viewed as a set of column vectors or as a set of row vectors.

A matrix is a rectangular array of numbers, denoted:

$$\mathbf{A} = [a_{ik}] = [\mathbf{A}]_{ik} =
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1K} \\
a_{21} & a_{22} & \cdots & a_{2K} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nK}
\end{bmatrix}$$

where $a_{ik}$ denotes the element of A in the ith row and kth column; the first index i is always for the row and the second index k for the column.


The dimensions of a matrix are the numbers of rows and columns it contains. “A is an n×K matrix” (read “n by K”) will always mean that A has n rows and K columns. If n equals K, then A is a square matrix. Several particular types of square matrices occur frequently in econometrics.

• A symmetric matrix is one in which $a_{ik} = a_{ki}$ for all i and k.
• A diagonal matrix is a square matrix whose only nonzero elements appear on the main diagonal, that is, moving from upper left to lower right.
• An identity matrix is a scalar matrix (a diagonal matrix with the same value in all diagonal elements) with ones on the diagonal. This matrix is always denoted $I_k$, a k×k identity matrix.
• A zero matrix or null matrix is one whose elements are all zero.
• A triangular matrix is one that has only zeros either above or below the main diagonal. If the zeros are above the diagonal, the matrix is lower triangular; otherwise it is upper triangular.


Equality of Matrices: Matrices (or vectors) A and B are equal if and only if they have the same dimensions and each element of A equals the corresponding element of B. That is,

A = B if and only if $a_{ik} = b_{ik}$ for all i and k.

Transposition: The transpose of a matrix A, denoted A′, is obtained by creating the matrix whose kth row is the kth column of the original matrix. Thus, if B = A′, then the kth column of A will be the kth row of B. If A is n×K, then A′ is K×n. For a symmetric matrix A, n = K and A′ = A.

Matrix Addition: The operations of addition and subtraction are extended to matrices as:

$$A + B = [a_{ik} + b_{ik}]; \quad A - B = [a_{ik} - b_{ik}].$$

Vector Multiplication: Matrices are multiplied by using the inner product. The inner product, or dot product, of two column vectors, a and b, is a scalar and is written as

$$a'b = a_1b_1 + a_2b_2 + \cdots + a_nb_n = b'a.$$


Matrix Multiplication
For an n×K matrix A and a K×M matrix B, the product matrix, C = AB, is an n×M matrix whose ikth element is the inner product of row i of A and column k of B. That is,

$$C = AB \;\Rightarrow\; c_{ik} = a_i' b_k,$$

where $a_i'$ is the ith row of A, and $b_k$ is the kth column of B.

Note: To multiply two matrices, the number of columns in the first must be the same as the number of rows in the second, so that they are conformable for multiplication.

Scalar Multiplication
Scalar multiplication of a matrix is the operation of multiplying every element of the matrix by a given scalar. For scalar c and matrix A, $cA = [c\,a_{ik}]$.

Idempotent Matrix
An idempotent matrix, M, is one that is equal to its square: $M^2 = MM = M$. If M is a symmetric idempotent matrix, then M′M = M.
Note: all of the idempotent matrices we shall encounter are symmetric.
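These operations can be tried directly with Stata's matrix commands. The sketch below uses arbitrary illustrative matrices (the names A, B, C, M and their entries are assumptions, not course data) and checks numerically that the matrix I − X(X′X)⁻¹X′, with X = A here, is idempotent.

```stata
* Illustrative matrices (values chosen arbitrarily)
matrix A = (1, 2 \ 3, 4 \ 5, 6)      // 3 x 2
matrix B = (1, 0, 2 \ 0, 1, 3)       // 2 x 3
matrix C = A*B                       // conformable: (3 x 2)(2 x 3) gives 3 x 3
matrix At = A'                       // transpose, 2 x 3
matrix S = 2*A                       // scalar multiplication
matrix list C

* Symmetric idempotent matrix M = I - X(X'X)^{-1}X', with X = A
matrix M = I(3) - A*invsym(A'*A)*A'
matrix MM = M*M
display mreldif(MM, M)               // relative difference between MM and M
```

The last line should display a value that is zero up to rounding error, consistent with MM = M.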


Determinant of a Square Matrix
For a 2×2 matrix, its determinant is

$$|\mathbf{A}| = \begin{vmatrix} a & c \\ b & d \end{vmatrix} = ad - bc.$$

For a 3×3 matrix,

$$|\mathbf{A}| = \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}
= a_{11}a_{22}a_{33} + a_{31}a_{12}a_{23} + a_{21}a_{32}a_{13} - a_{31}a_{22}a_{13} - a_{21}a_{12}a_{33} - a_{11}a_{32}a_{23}.$$

For a general square matrix, its determinant is calculated in a similar way.

Trace of a Square Matrix
For a square n×n matrix, its trace is $\mathrm{tr}(\mathbf{A}) = \sum_{i=1}^{n} a_{ii}$.

Inverse of a Square Matrix
To solve the system Ax = b for x, if we could find a square matrix B such that BA = I, then the following would be obtained:

$$BAx = Bb \;\Rightarrow\; x = Bb.$$

If the matrix B exists, then it is the inverse of A, denoted $B = A^{-1}$.
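A quick numerical check of these three concepts in Stata is sketched below. The 2×2 matrix A and the vector b are arbitrary illustrative values; det(), trace() and inv() are Stata matrix functions.

```stata
* Arbitrary 2 x 2 example
matrix A = (2, 1 \ 1, 3)
display det(A)                 // 2*3 - 1*1 = 5
display trace(A)               // 2 + 3 = 5
matrix Ainv = inv(A)           // inverse of A
matrix b = (1 \ 2)
matrix x = Ainv*b              // solves A x = b
matrix list x
matrix check = A*x - b         // should be a zero vector up to rounding
matrix list check
```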


Properties of Transpose
• $(A + B)' = A' + B'$
• $(AB)' = B'A'$
• $(ABC)' = C'B'A'$

Properties of Trace
• $\mathrm{tr}(cA) = c\,\mathrm{tr}(A)$
• $\mathrm{tr}(I_K) = K$
• $\mathrm{tr}(A') = \mathrm{tr}(A)$
• $\mathrm{tr}(AB) = \mathrm{tr}(BA)$
• $\mathrm{tr}(A + B) = \mathrm{tr}(A) + \mathrm{tr}(B)$
• $a'a = \mathrm{tr}(a'a) = \mathrm{tr}(aa')$

Properties of Determinant
• $|A| = |A'|$
• $|cA| = c^k|A|$, where c is a constant and k = dim(A)
• $|AB| = |A||B|$
• If A has a zero row (column), $|A| = 0$

Properties of Inverse
• $|A^{-1}| = |A|^{-1}$
• $(A^{-1})^{-1} = A$
• $(A^{-1})' = (A')^{-1}$
• $(AB)^{-1} = B^{-1}A^{-1}$, if both inverse matrices exist.
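Several of these properties can be verified numerically. The following Stata sketch uses two arbitrary 2×2 matrices (illustrative values only) and displays differences that should all be zero up to rounding error.

```stata
* Arbitrary 2 x 2 matrices for checking a few properties
matrix A = (2, 1 \ 1, 3)
matrix B = (0, 1 \ 4, 2)
matrix AB = A*B
matrix BA = B*A
display det(AB) - det(A)*det(B)          // |AB| = |A||B|
display trace(AB) - trace(BA)            // tr(AB) = tr(BA)
matrix T1 = AB'                          // (AB)'
matrix T2 = B'*A'                        // B'A'
display mreldif(T1, T2)                  // (AB)' = B'A'
matrix Ainv = inv(A)
display det(Ainv) - 1/det(A)             // |A^{-1}| = |A|^{-1}
```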


Vector Space: Any set of vectors, closed under scalar multiplication and addition, is called a vector space.

Linear Dependence: A set of vectors is linearly dependent if at least one of the vectors in the set can be written as a linear combination of the others.

Linear Independence: A set of vectors is linearly independent if and only if the only solution to $\alpha_1 a_1 + \alpha_2 a_2 + \cdots + \alpha_K a_K = 0$ is $\alpha_1 = \alpha_2 = \cdots = \alpha_K = 0$.

Basis Vectors: A set of vectors in a vector space is a basis for that vector space if they are linearly independent and any vector in the vector space can be written as a linear combination of that set of vectors.

Rank of a Matrix
• Column Space: the vector space spanned by its column vectors;
• Column Rank: the dimension of the vector space that is spanned by its column vectors;
• Rank of a Matrix = its column rank = its row rank;
• Full Rank Matrix: column (row) rank = number of columns (rows).
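To make linear dependence and rank concrete, here is a small worked example. The three vectors are chosen arbitrarily for illustration and do not come from the course material.

```latex
\[
a_1 = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix},\quad
a_2 = \begin{pmatrix} 2 \\ 4 \\ 6 \end{pmatrix},\quad
a_3 = \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix}.
\]
% Dependence: a_2 is a multiple of a_1
Since $a_2 = 2a_1$, we have $2a_1 - a_2 + 0\,a_3 = 0$ with coefficients that are
not all zero, so $\{a_1, a_2, a_3\}$ is linearly dependent.

% Independence of the pair (a_1, a_3)
If $\alpha_1 a_1 + \alpha_3 a_3 = 0$, the first coordinate gives $\alpha_1 = 0$,
and the second then gives $\alpha_3 = 0$, so $\{a_1, a_3\}$ is linearly independent.

% Rank
Hence the column space of $A = [\,a_1 \; a_2 \; a_3\,]$ has dimension 2:
$\operatorname{rank}(A) = 2 < 3$, and $A$ does not have full rank.
```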


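Partitioned Matrices:

$$\mathbf{A} = \begin{bmatrix} \mathbf{A}_{11} & \mathbf{A}_{12} \\ \mathbf{A}_{21} & \mathbf{A}_{22} \end{bmatrix}$$

Determinant of Partitioned Matrices:

$$\begin{vmatrix} \mathbf{A}_{11} & \mathbf{A}_{12} \\ \mathbf{A}_{21} & \mathbf{A}_{22} \end{vmatrix}
= |\mathbf{A}_{22}|\,\big|\mathbf{A}_{11} - \mathbf{A}_{12}\mathbf{A}_{22}^{-1}\mathbf{A}_{21}\big|
= |\mathbf{A}_{11}|\,\big|\mathbf{A}_{22} - \mathbf{A}_{21}\mathbf{A}_{11}^{-1}\mathbf{A}_{12}\big|$$

Inverse of Partitioned Matrices: let $\mathbf{A}_{11.2} = \mathbf{A}_{11} - \mathbf{A}_{12}\mathbf{A}_{22}^{-1}\mathbf{A}_{21}$; then

$$\begin{bmatrix} \mathbf{A}_{11} & \mathbf{A}_{12} \\ \mathbf{A}_{21} & \mathbf{A}_{22} \end{bmatrix}^{-1}
= \begin{bmatrix}
\mathbf{A}_{11.2}^{-1} & -\mathbf{A}_{11.2}^{-1}\mathbf{A}_{12}\mathbf{A}_{22}^{-1} \\
-\mathbf{A}_{22}^{-1}\mathbf{A}_{21}\mathbf{A}_{11.2}^{-1} & \mathbf{A}_{22}^{-1}\mathbf{A}_{21}\mathbf{A}_{11.2}^{-1}\mathbf{A}_{12}\mathbf{A}_{22}^{-1} + \mathbf{A}_{22}^{-1}
\end{bmatrix}$$

A useful relation:

$$\big(\mathbf{A}_{22} - \mathbf{A}_{21}\mathbf{A}_{11}^{-1}\mathbf{A}_{12}\big)^{-1} = \mathbf{A}_{22}^{-1}\mathbf{A}_{21}\mathbf{A}_{11.2}^{-1}\mathbf{A}_{12}\mathbf{A}_{22}^{-1} + \mathbf{A}_{22}^{-1}.$$

For more details, see Appendix A: Matrix Algebra, of William H. Greene, Econometric Analysis, 7th Ed.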
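As a sanity check on the partitioned determinant and inverse formulas, consider the simplest case in which every block is a scalar. The numbers below are an arbitrary illustration, not taken from the course.

```latex
Let $A_{11} = 2$, $A_{12} = A_{21} = 1$, $A_{22} = 3$, so that
\[
\mathbf{A} = \begin{bmatrix} 2 & 1 \\ 1 & 3 \end{bmatrix}, \qquad
A_{11.2} = A_{11} - A_{12}A_{22}^{-1}A_{21} = 2 - \tfrac{1}{3} = \tfrac{5}{3}.
\]
% Determinant, computed both ways
\[
|A_{22}|\,|A_{11.2}| = 3 \cdot \tfrac{5}{3} = 5, \qquad
|A_{11}|\,\big|A_{22} - A_{21}A_{11}^{-1}A_{12}\big| = 2 \cdot \tfrac{5}{2} = 5,
\]
which agrees with the direct calculation $2 \cdot 3 - 1 \cdot 1 = 5$.

% Inverse, block by block
\[
A_{11.2}^{-1} = \tfrac{3}{5}, \quad
-A_{11.2}^{-1}A_{12}A_{22}^{-1} = -\tfrac{1}{5}, \quad
-A_{22}^{-1}A_{21}A_{11.2}^{-1} = -\tfrac{1}{5}, \quad
A_{22}^{-1}A_{21}A_{11.2}^{-1}A_{12}A_{22}^{-1} + A_{22}^{-1}
= \tfrac{1}{15} + \tfrac{1}{3} = \tfrac{2}{5},
\]
so the formula gives
\[
\mathbf{A}^{-1} = \begin{bmatrix} 3/5 & -1/5 \\ -1/5 & 2/5 \end{bmatrix}
= \frac{1}{5}\begin{bmatrix} 3 & -1 \\ -1 & 2 \end{bmatrix},
\]
which is the usual inverse of a $2 \times 2$ matrix.
```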


Introduction to Econometrics

The multiple linear regression model is used to study the relationship between a dependent variable and independent variables:

$$y = \beta_0 + x_1\beta_1 + x_2\beta_2 + \cdots + x_K\beta_K + \varepsilon,$$

where y is the dependent or explained variable and $x_1, \dots, x_K$ are the independent or explanatory variables. The y is called the regressand, and the $x_1, \dots, x_K$ are called the regressors or covariates.

• The term $\varepsilon$ is a random disturbance, so named because it “disturbs” an otherwise stable relationship. It is also called a “random error”.
• The model says: y relates to $x_1, \dots, x_K$ linearly with a random error.

To estimate the model and to make inferences about the regression coefficients $\boldsymbol{\beta} = (\beta_0, \beta_1, \dots, \beta_K)'$, we collect n observations (the data):

$$\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}; \quad
\mathbf{X} = \begin{bmatrix}
1 & x_{11} & x_{12} & \cdots & x_{1K} \\
1 & x_{21} & x_{22} & \cdots & x_{2K} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & x_{n1} & x_{n2} & \cdots & x_{nK}
\end{bmatrix}$$


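The model is written for each observation:

$$y_i = \beta_0 + x_{i1}\beta_1 + x_{i2}\beta_2 + \cdots + x_{iK}\beta_K + \varepsilon_i = \mathbf{x}_i'\boldsymbol{\beta} + \varepsilon_i,$$

where $\mathbf{x}_i' = (1, x_{i1}, x_{i2}, \dots, x_{iK})$, or in matrix form:

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon},$$

where $\boldsymbol{\varepsilon} = (\varepsilon_1, \varepsilon_2, \dots, \varepsilon_n)'$.

The least squares estimation: choose the value of $\boldsymbol{\beta}$ such that the sum of squares of the ‘errors’ is minimized:

$$\hat{\boldsymbol{\beta}} \text{ minimizes } S(\boldsymbol{\beta}) = \sum_{i=1}^{n} (y_i - \mathbf{x}_i'\boldsymbol{\beta})^2 = (\mathbf{y} - \mathbf{X}\boldsymbol{\beta})'(\mathbf{y} - \mathbf{X}\boldsymbol{\beta}).$$

The necessary condition for a minimum is

$$\frac{\partial S(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}} = -2\mathbf{X}'\mathbf{y} + 2\mathbf{X}'\mathbf{X}\boldsymbol{\beta} = \mathbf{0}.$$

Therefore, the least squares estimator of $\boldsymbol{\beta}$ is:

$$\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}.$$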
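The closed-form estimator can be reproduced with Stata's matrix commands and compared with regress. This is only a sketch: it uses the auto.dta example data shipped with Stata, and the choice of price as regressand and mpg and weight as regressors is an arbitrary illustration. Note that matrix accum and matrix vecaccum append the constant as the last column, whereas the notation above puts the column of ones first.

```stata
sysuse auto, clear
* X'X and y'X, with a constant appended as the last column (Stata's default)
matrix accum XX = mpg weight
matrix vecaccum yX = price mpg weight    // first variable is y; result is y'X (1 x 3)
matrix b = invsym(XX)*yX'                // (X'X)^{-1} X'y, a 3 x 1 vector
matrix list b
regress price mpg weight                 // coefficients should match b
```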


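Note: the existence of $(\mathbf{X}'\mathbf{X})^{-1}$ requires that X has full column rank (K+1), i.e., the columns of X are linearly independent.

Assumptions: (i) X has full column rank, (ii) $E(\varepsilon_i \mid \mathbf{X}) = 0$, and (iii) $\mathrm{Var}(\varepsilon_i \mid \mathbf{X}) = \sigma^2$, for all i = 1, 2, …, n.

Under Assumptions (i)-(iii), we have

$$E(\hat{\boldsymbol{\beta}} \mid \mathbf{X}) = E[(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y} \mid \mathbf{X}]
= (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'E(\mathbf{y} \mid \mathbf{X}) = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{X}\boldsymbol{\beta} = \boldsymbol{\beta};$$

$$\mathrm{Var}(\hat{\boldsymbol{\beta}} \mid \mathbf{X}) = \mathrm{Var}[(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y} \mid \mathbf{X}]
= (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\,\mathrm{Var}(\mathbf{y} \mid \mathbf{X})\,\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1} = \sigma^2(\mathbf{X}'\mathbf{X})^{-1}.$$

If further, (iv) $\boldsymbol{\varepsilon} \mid \mathbf{X} \sim N(\mathbf{0}, \sigma^2\mathbf{I}_n)$, then $\hat{\boldsymbol{\beta}} \mid \mathbf{X} \sim N\big(\boldsymbol{\beta}, \sigma^2(\mathbf{X}'\mathbf{X})^{-1}\big)$.

An unbiased estimator of $\sigma^2$ is

$$s^2 = \frac{1}{n-K-1}(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})'(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}).$$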

See Chapters 2 & 3 of William H. Greene, Econometric Analysis, 7th Ed.
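The variance formula and the estimator of σ² can also be checked against what regress stores. The sketch below again uses auto.dta with an arbitrary choice of regressors (an illustration, not a course example); e(rss), e(df_r) and e(V) are results saved by regress.

```stata
sysuse auto, clear
quietly regress price mpg weight
matrix accum XX = mpg weight              // X'X, constant last
scalar s2 = e(rss)/e(df_r)                // s^2 = RSS/(n - K - 1), here df = 74 - 3
matrix V = s2*invsym(XX)                  // s^2 (X'X)^{-1}
matrix V0 = e(V)                          // variance matrix reported by regress
display mreldif(V, V0)                    // should be zero up to rounding
display sqrt(V[1,1])                      // standard error of the mpg coefficient
display _se[mpg]                          // the same quantity as stored by regress
```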


Introduction to STATA

For new Stata users, we suggest entering Stata by clicking on the Stata icon, opening one of the Stata example data sets, and doing some basic statistical analysis. To use the menus:

• Select File > Example datasets... .
• Click on Example datasets installed with Stata.
• Click on describe for auto.dta, for descriptions of variables.
• Click on use for auto.dta, to read the dataset into Stata.

We are using Stata/SE 15 for Windows for this course. Other software such as R and Matlab can be used, but on your own.

We can get a quick glimpse at the data by browsing them in the Data Editor.

• This can be done by clicking on the Data Editor (Browse) button,
• or by selecting Data > Data Editor > Data Editor (Browse) from the menus,
• or by simply typing the command browse in the command window.


. sysdescribe auto.dta

Contains data                                 1978 Automobile Data
  obs:            74                          13 Apr 2016 17:45
 vars:            12
 size:         3,478
-----------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
-----------------------------------------------------------------------
make            str18   %-18s                 Make and Model
price           int     %8.0gc                Price
mpg             int     %8.0g                 Mileage (mpg)
rep78           int     %8.0g                 Repair Record 1978
headroom        float   %6.1f                 Headroom (in.)
trunk           int     %8.0g                 Trunk space (cu. ft.)
weight          int     %8.0gc                Weight (lbs.)
length          int     %8.0g                 Length (in.)
turn            int     %8.0g                 Turn Circle (ft.)
displacement    int     %8.0g                 Displacement (cu. in.)
gear_ratio      float   %6.2f                 Gear Ratio
foreign         byte    %8.0g      origin     Car type
-----------------------------------------------------------------------
Sorted by: foreign


The auto.dta file is a cross-section data set, and the key Stata commands for analyzing cross-section data are

• browse: to see the data,
• describe: to describe the data,
• summarize: to summarize the cross-section data,
• regress: to perform a linear regression of a response variable on a set of explanatory variables.

For more information, see the Stata manual gsw.pdf:

• click Help > PDF documentation > [GS] Getting started > [GSW] Getting started with Stata for Windows,
• or go to the folder Program Files (x86) in the C: drive, locate the folder Stata15 > docs, and find the file gsw.pdf.

The most useful manual is [U] User’s Guide, or the file u.pdf.


. describe

Contains data from C:\Program Files (x86)\Stata15\ado\base/a/auto.dta
  obs:            74                          1978 Automobile Data
 vars:            12                          13 Apr 2016 17:45
 size:         3,182                          (_dta has notes)
-------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
-------------------------------------------------------------------------------
make            str18   %-18s                 Make and Model
price           int     %8.0gc                Price
mpg             int     %8.0g                 Mileage (mpg)
rep78           int     %8.0g                 Repair Record 1978
headroom        float   %6.1f                 Headroom (in.)
trunk           int     %8.0g                 Trunk space (cu. ft.)
weight          int     %8.0gc                Weight (lbs.)
length          int     %8.0g                 Length (in.)
turn            int     %8.0g                 Turn Circle (ft.)
displacement    int     %8.0g                 Displacement (cu. in.)
gear_ratio      float   %6.2f                 Gear Ratio
foreign         byte    %8.0g      origin     Car type
-------------------------------------------------------------------------------
Sorted by: foreign


. summarize price mpg headroom trunk weight length turn gear_ratio

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
       price |         74    6165.257    2949.496       3291      15906
         mpg |         74     21.2973    5.785503         12         41
    headroom |         74    2.993243    .8459948        1.5          5
       trunk |         74    13.75676    4.277404          5         23
      weight |         74    3019.459    777.1936       1760       4840
-------------+---------------------------------------------------------
      length |         74    187.9324    22.26634        142        233
        turn |         74    39.64865    4.399354         31         51
  gear_ratio |         74    3.014865    .4562871       2.19       3.89


. regress price mpg headroom trunk weight length displacement foreign

      Source |       SS           df       MS      Number of obs   =        74
-------------+----------------------------------   F(7, 66)        =     13.25
       Model |   371020030         7  53002861.4   Prob > F        =    0.0000
    Residual |   264045367        66  4000687.37   R-squared       =    0.5842
-------------+----------------------------------   Adj R-squared   =    0.5401
       Total |   635065396        73  8699525.97   Root MSE        =    2000.2

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -15.34676   70.80743    -0.22   0.829    -156.7183    126.0248
    headroom |  -673.1991   372.6962    -1.81   0.075    -1417.311    70.91289
       trunk |   60.50331   91.91919     0.66   0.513    -123.0193    244.0259
      weight |   4.596442   1.197555     3.84   0.000     2.205446    6.987438
      length |  -83.39048    35.7935    -2.33   0.023    -154.8545   -11.92645
displacement |    10.0928   5.946751     1.70   0.094    -1.780274    21.96587
     foreign |    3764.15   664.6912     5.66   0.000     2437.051    5091.249
       _cons |   6357.472    5315.07     1.20   0.236    -4254.406    16969.35
------------------------------------------------------------------------------
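The session shown on these slides can be collected into a short do-file, sketched below. It assumes only that the example data auto.dta shipped with Stata are available; sysuse is used here in place of the menu clicks.

```stata
sysuse auto, clear                 // load the example data shipped with Stata
describe                           // variable names, storage types, labels
summarize price mpg headroom trunk weight length turn gear_ratio
regress price mpg headroom trunk weight length displacement foreign
display e(N)                       // number of observations used by regress
display e(r2)                      // R-squared stored by regress
```

Saving these lines in a file and running it with the do command should reproduce the describe, summarize and regress output above.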


For analyzing panel data,

• click Help > PDF documentation > [XT] Longitudinal Data/Panel Data,
• or go to the folder Program Files (x86) in the C: drive, locate the folder Stata15 > docs, and find the file xt.pdf.
• All Stata manuals can be found in Help > PDF documentation, or directly in the folder where Stata is installed, typically C: > Program Files (x86) > Stata15 > docs.

The manual xt.pdf documents the xt commands and is referred to as [XT] in cross-references. If you are new to xt commands, we recommend that you read the following sections first:

• [XT] xt       Introduction to xt commands
• [XT] xtset    Declare a dataset to be panel data
• [XT] xtreg    Fixed-, between-, and random-effects, and population-averaged linear models


Setup
  xtset        Declare data to be panel data

Data management and exploration tools
  xtdescribe   Describe pattern of xt data
  xtsum        Summarize xt data
  xttab        Tabulate xt data
  xtdata       Faster specification searches with xt data
  xtline       Panel-data line plots

Linear panel regression estimators
  xtreg        Fixed-, between-, and random-effects, and population-averaged linear models
  xtregar      Fixed- and random-effects models with an AR(1) disturbance
  xtgls        Fit panel-data models by using GLS
  xtpcse       Linear regression with panel-corrected standard errors
  xthtaylor    Hausman-Taylor estimator for error-components models
  xtivreg      Instrumental variables and two-stage least squares
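A typical workflow with a few of these commands might look like the sketch below. It assumes internet access, so that webuse can fetch the Grunfeld investment data from the Stata Press website; that data set (10 companies observed over 20 years) is an illustrative choice and is not one of the course examples.

```stata
* Sketch: assumes internet access so that webuse can fetch the Grunfeld data
webuse grunfeld, clear
xtset company year                 // declare the panel and time variables
xtdescribe                         // pattern of the panel
xtsum invest mvalue kstock         // overall, between and within variation
xtreg invest mvalue kstock, fe     // fixed-effects (within) estimator
xtreg invest mvalue kstock, re     // random-effects (GLS) estimator
```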


Other Useful Commands:

• help: finds information for a Stata command, e.g., at the command window, type “help regress” or “help function”;
• search: does a keyword search, and is useful if the Stata command is not exactly known, e.g., “search ols”;
• findit: provides the broadest possible keyword search, e.g., “findit weak instr”;
• hsearch: unlike the findit command, it uses a whole-word search, e.g., “hsearch weak instrument”.

Arithmetic, relational, and logical operators:

The arithmetic operators in Stata are: + (addition), - (subtraction), * (multiplication), / (division), ^ (raised to a power), and the prefix - (negation). For example, to compute and display −2 × {9/(8 + 2 − 7)}², we type display -2*(9/(8+2-7))^2 in the command window, which gives

. display -2*(9/(8+2-7))^2
-18


Matrix and Matrix Calculations:

The Stata manual [P] matrix, or p.pdf, summarizes the matrix commands.

• matrix define: defines a matrix, e.g., “matrix define A = (1,2,3 \ 4,5,6)”
• matrix list: shows the content of a matrix, e.g., “matrix list A”
• scalar c = A[2,3]: assigns the (2,3)-element of A to a scalar c.

Matrix Dyadic Operators:

• B \ C   add rows of C below rows of B (row join)
• B , C   add columns of C to the right of B (column join)
• B + C   addition
• B - C   subtraction
• B * C   multiplication (including multiplication by a scalar)
• B / z   division by scalar
• B # C   Kronecker product

The matrix monadic operators are

• -B   negation
• B'   transpose

Type in the command window:
help matrix
to get more information on matrix manipulations.
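The commands and operators listed above can be tried together in a few lines. In this sketch the matrix A is the one from the matrix define example; B and C, and all result names, are arbitrary illustrative choices.

```stata
matrix define A = (1, 2, 3 \ 4, 5, 6)     // 2 x 3 matrix
matrix list A
scalar c = A[2,3]                         // (2,3)-element of A
display c
matrix B = (1, 2 \ 3, 4)
matrix C = (5, 6 \ 7, 8)
matrix R = B \ C                          // row join, 4 x 2
matrix J = B , C                          // column join, 2 x 4
matrix P = B*C                            // matrix product, 2 x 2
matrix K = B # C                          // Kronecker product, 4 x 4
matrix Bt = B'                            // transpose
matrix list K
```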


Course Overview

In this course, we focus on the common panel data models, and their implementations using Stata. We also provide many real data applications. Major topics include:

Course Outline

• Panel Data Models with One-way Effects
• Panel Data Models with Two-way Effects
• Test Hypotheses with Panel Data
• Heteroskedasticity and Serial Correlation
• Dynamic Panel Data Models
• Spatial Panel Data Models