Regression method (basic level)

Joze Sambt
NTA Hands-On Workshop
Berkeley, CA, January 14, 2009


Page 2:

Why do we need a regression method?

Education and health care expenditures are usually reported at the household level, but in the NTA context everything has to be assigned to individuals.

Page 3:

Example of the linear relation between two variables

[Figure: consumption (vertical axis, 0 to 2500) plotted against income (horizontal axis, 0 to 3000), with the fitted line y = 0.4998x + 678.93, R² = 0.8269]

Page 4:

The idea of the regression analysis

$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_k x_{ki} + \varepsilon_i$

For example: $\text{CONSUMPTION}'_i = 678.93 + 0.4998 \cdot \text{INCOME}_i$

◦ The slope coefficient is about 0.50, suggesting that an increase in real income of 1 dollar leads, on average, to an increase of about 50 cents in real consumption expenditure.

◦ Constant: about 679 dollars is the level of autonomous consumption (consumption when a person receives no income, i.e. when the value of the independent variable is 0).
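To make the reading of the fitted line concrete, the following Stata sketch evaluates it at two income levels. The two coefficients are taken from the slide; the scalar names `b0`, `b1`, `c0` and `c1000` and the income value of 1,000 are chosen here purely for illustration.

```stata
* Coefficients of the fitted line shown on the slide
scalar b0 = 678.93
scalar b1 = 0.4998

* Predicted consumption at zero income (autonomous consumption)
scalar c0 = b0 + b1*0

* Predicted consumption at an illustrative income of 1,000
scalar c1000 = b0 + b1*1000

display %9.2f c0
display %9.2f c1000
```

At an income of 1,000 the predicted consumption is 678.93 + 499.80 = 1,178.73.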

Page 5:

Allocating health expenditure

Private health expenditure of household j is regressed on the number of household members in each age group x:

$CFH_j = \sum_{x \ge 0} \beta(x) M_j(x)$

$CFH_j = \beta_1 (M_{0-4,j}) + \beta_2 (M_{5-9,j}) + \dots + \beta_{19} (M_{90+,j})$

Using broader age groups could be a good idea (more degrees of freedom, given the small number of observations in some age groups). Don’t worry, your age profile will most likely not look like stairs because of that.
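As a small worked case, consider the three-person household used as an example later in the deck (a child aged 6, a mother aged 33 and a father aged 36). The notation $\beta_k$ for the coefficient of the k-th 5-year age group is an assumption, since the Greek letters did not survive in the transcript; the numbering 1 = 0-4, 2 = 5-9, and so on follows the 5-year grouping, so that 30-34 is group 7 and 35-39 is group 8.

```latex
M_{5-9,j} = 1,\quad M_{30-34,j} = 1,\quad M_{35-39,j} = 1,\quad
\text{all other } M_{\cdot,j} = 0
\;\Longrightarrow\;
CFH_j = \beta_2 + \beta_7 + \beta_8
```

For this household only three of the 19 regressors are non-zero, so its fitted expenditure is the sum of the three corresponding coefficients.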

Page 6:

STATA code

Grouping into 5-year age groups:

    gen agegrp=age
    recode agegrp (0/4=2.5) (5/9=7.5) (10/14=12.5) (15/19=17.5) … (90/max=90)

Calculating the number of individuals in each age group (by households):

    bysort hhid: egen p4=total(agegrp==2.5)
    bysort hhid: egen p9=total(agegrp==7.5)
    bysort hhid: egen p14=total(agegrp==12.5)
    bysort hhid: egen p19=total(agegrp==17.5)
    bysort hhid: egen p24=total(agegrp==22.5)
    …
    bysort hhid: egen p90=total(agegrp==90)
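The grouping and counting steps can be tried on a tiny data set. The sketch below is only an illustration: the three-person household and the restriction to three age groups are assumptions made to keep it short, and the variable names follow the slide.

```stata
clear
* Hypothetical toy data: one household with three members
input hhid age
1 6
1 33
1 36
end

* 5-year age groups labelled by their midpoint (only three groups shown)
generate agegrp = age
recode agegrp (5/9=7.5) (30/34=32.5) (35/39=37.5)

* Number of household members in each age group, repeated on every row
bysort hhid: egen p9  = total(agegrp == 7.5)
bysort hhid: egen p34 = total(agegrp == 32.5)
bysort hhid: egen p39 = total(agegrp == 37.5)

list hhid age agegrp p9 p34 p39
```

Each of the three rows ends up with p9, p34 and p39 equal to 1, because the household has exactly one member in each of these groups.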

Page 7:

Household health expenditures are regressed on the number of individuals in each age group within a household (without an intercept),

    reg cfhc p4 p9 p14 p19 p24 p29 p34 p39 p44 p49 p54 p59 p64 p69 p74 p79 p84 p89 p90 [w=weight], robust noconstant

Note: intercept suppressed! (option noconstant)

… and the coefficients are stored for future use:

    gen bp4=_b[p4]
    gen bp9=_b[p9]
    gen bp14=_b[p14]
    …
    gen bp90=_b[p90]
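A minimal sketch of the regression step, under strong simplifying assumptions: only two age groups, five made-up households whose expenditure is constructed as exactly 20*p9 + 100*p39, and no weights or robust option. The loop is one possible way of storing the coefficients without typing one `gen` line per age group; it is not taken from the slides.

```stata
clear
* Hypothetical toy data: counts of members in two age groups per household.
* Expenditure is constructed as 20*p9 + 100*p39, so the fit is exact.
input hhid p9 p39 cfhc
1 1 0  20
2 0 1 100
3 1 1 120
4 2 1 140
5 1 2 220
end

* Regression without an intercept
regress cfhc p9 p39, noconstant

* Store each coefficient in a variable named b<regressor>
foreach v of varlist p9 p39 {
    generate b`v' = _b[`v']
}

list hhid p9 p39 cfhc bp9 bp39
```

Because the toy expenditures were built from the coefficients 20 and 100, the regression returns these values (up to rounding) and stores them in bp9 and bp39.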

Page 8:

However,

◦ summing up the obtained values for all members of a household gives a different amount of health expenditure than the household reported in the survey.

◦ Therefore, a further adjustment is needed: we use only the relative size of the coefficients across household members, i.e. we treat them as within-household shares (weights).

Page 9:

For example: assume a household with three individuals:
◦ child, aged 6 years
◦ mother, aged 33 years
◦ father, aged 36 years

Let’s further assume that the regression coefficient for the age group 5-9 years is 20, for the age group 30-34 years it is 80, and for the age group 35-39 years it is 100. These sum to 200. However, the household reported 300 dollars of health expenditure in the survey. We therefore have to rescale the values so that they sum to 300:

          Regression                Rescaled    Share
          coefficient    Share      value       (remains the same)
Child          20         10%          30         10%
Mother         80         40%         120         40%
Father        100         50%         150         50%
SUM           200                     300
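The rescaling for this household can be reproduced in a few lines of Stata. This is a sketch for the single example household only; the variable names `member`, `coef`, `share` and `rescaled` are chosen here for illustration and do not appear in the slides.

```stata
clear
* Household from the example: regression coefficients 20, 80 and 100,
* reported household health expenditure of 300
input str6 member coef cfhc
"Child"   20 300
"Mother"  80 300
"Father" 100 300
end

* Sum of the coefficients in the household (200)
egen total = total(coef)

* Within-household share and rescaled value
generate share    = coef / total
generate rescaled = cfhc * share

list member coef share rescaled
```

The shares come out as 0.1, 0.4 and 0.5 and the rescaled values as 30, 120 and 150, which sum to the reported 300.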

Page 10:

STATA code

Coefficients are assigned to the corresponding age groups (each coefficient is multiplied by an indicator of the individual's age group, so that the household sum below equals the coefficients multiplied by the number of household members in each age group):

    gen hp4=bp4*(agegrp==2.5)
    gen hp9=bp9*(agegrp==7.5)
    gen hp14=bp14*(agegrp==12.5)
    …
    gen hp90=bp90*(agegrp==90)

    egen sum=rowtotal(hp4 hp9 hp14 hp19 hp24 hp29 hp34 hp39 hp44 hp49 hp54 hp59 hp64 hp69 hp74 hp79 hp84 hp89 hp90)

The sum of weights is calculated by household (i.e. the household's estimated expenditure on health):

    bysort hhid: egen total=total(sum)

Page 11:

Relative shares of weights (of individuals) in the total sum of household weights are used to distribute reported health expenditures ($CFH_j$) among household members:

$CFH_{ij} = CFH_j \cdot \dfrac{\beta(x)}{\sum_{x \ge 0} \beta(x) M_j(x)}$

(x in the numerator is the age group of individual i)

Page 12:

… rewritten as STATA code:

The relative share of each household member is calculated (by dividing the individual's coefficient by the total sum of coefficients of all household members):

    gen rhp4=hp4/total
    replace rhp4=0 if rhp4==.
    gen rhp9=hp9/total
    replace rhp9=0 if rhp9==.
    …
    gen rhp90=hp90/total
    replace rhp90=0 if rhp90==.

Finally, the relative shares of the members of each household are multiplied by that household's reported health expenditure to obtain health expenditure by individual:

    gen th4=cfhc*rhp4
    gen th9=cfhc*rhp9
    …
    gen th90=cfhc*rhp90
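A useful check after the allocation is that the individual amounts add up to the reported household amount. The sketch below illustrates this on made-up data for two households. To stay short it uses a single coefficient variable `hp` per individual instead of one variable per age group, which is a simplification relative to the slides; the names `rhp`, `th` and `check` are likewise illustrative.

```stata
clear
* Hypothetical toy data: two households, individual-level coefficients
* already attached (hp) and reported household expenditure (cfhc)
input hhid hp cfhc
1  20 300
1  80 300
1 100 300
2  50 120
2  50 120
end

* Household sum of coefficients, individual share, allocated expenditure
bysort hhid: egen total = total(hp)
generate rhp = hp / total
replace  rhp = 0 if missing(rhp)
generate th  = cfhc * rhp

* Adding-up check: allocated amounts sum to the reported household value
bysort hhid: egen check = total(th)
assert abs(check - cfhc) < 0.001
```

If the `assert` passes silently, household expenditure has been fully distributed. Note that a household whose coefficients sum to zero would have missing shares replaced by 0 and would therefore fail this check.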

Page 13:

It has already been said during the workshop: if you have information…

about the subcategories of the expenditures (for example, information on household members using in-patient (IN) or out-patient (OUT) services), use that information:

$CFH_j = \sum_{x \ge 0} \alpha(x) IN_j(x) + \sum_{x \ge 0} \beta(x) OUT_j(x)$

about household members being enrolled (E) or not enrolled (NE) in education, use that information:

$CFE_j = \sum_{x \ge 0} \alpha(x) E_j(x) + \sum_{x \ge 0} \beta(x) NE_j(x)$
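In Stata, using subcategory information amounts to replacing the age-group counts by separate counts of in-patient and out-patient users in each age group. The sketch below assumes two age groups, six made-up households and hypothetical variable names (`in9`, `in39`, `out9`, `out39`); expenditure is constructed from the coefficients 200, 500, 30 and 60 so that the fit is exact.

```stata
clear
* Hypothetical toy data: number of in-patient and out-patient users
* in two age groups, and household health expenditure constructed as
* 200*in9 + 500*in39 + 30*out9 + 60*out39
input hhid in9 in39 out9 out39 cfhc
1 1 0 0 0 200
2 0 1 0 0 500
3 0 0 1 0  30
4 0 0 0 1  60
5 1 0 1 1 290
6 0 1 2 1 620
end

* Regression without an intercept on both sets of counts
regress cfhc in9 in39 out9 out39, noconstant

* Keep the two in-patient coefficients for inspection
scalar b_in9  = _b[in9]
scalar b_in39 = _b[in39]
display b_in9
display b_in39
```

The same pattern would apply to education, with counts of enrolled and non-enrolled members in place of the in-patient and out-patient counts.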

Page 14:

… or if you…

have an external profile of per capita utilization by age (U) and the number of household members by age (M), use that information:

$CFH_j = \sum_{x \ge 0} \beta(x) U(x) M_j(x)$

have detailed data available (for example, separately reported expenditures on primary, secondary and tertiary education), use them: limit the analysis to those age groups for which the expenditures are relevant. This is especially relevant for education expenditures.
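One way to read the utilization formula is that each regressor becomes the age-group count multiplied by the external utilization rate for that age group. The sketch below illustrates this reading; the utilization rates (2 and 5), the variable names `um9` and `um39`, and the toy households are all assumptions made for the example, with expenditure constructed as 10*um9 + 12*um39.

```stata
clear
* Hypothetical toy data: counts of members in two age groups and
* household health expenditure
input hhid p9 p39 cfhc
1 1 0  20
2 0 1  60
3 1 1  80
4 2 1 100
end

* Hypothetical external per capita utilization rates by age group
scalar U9  = 2
scalar U39 = 5

* Regressors: utilization rate times number of members in the age group
generate um9  = U9  * p9
generate um39 = U39 * p39

* Regression without an intercept on the utilization-weighted counts
regress cfhc um9 um39, noconstant
```

With this constructed data the estimated coefficients are 10 and 12 (up to rounding), that is the expenditure per unit of utilization in each age group.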

Page 15:

Some final details

There is no constant term (i.e. the model is estimated in homogeneous form), so household consumption is fully allocated.

Constrain negative coefficients to zero.

In the case of education, do not apply smoothing.

You might want to use two age groups (age 0 and ages 1-4) instead of the 0-4 age group, to capture the higher health expenditures in the first year of life that are reported in countries where such detailed data are available.
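Constraining negative coefficients to zero can be done on the stored coefficient variables before the shares are computed. The sketch below is one possible way to do it; the three coefficient values are made up, and only the variable naming (bp4, bp9, bp14) follows the slides.

```stata
clear
* Hypothetical stored coefficients, one of them negative
input bp4 bp9 bp14
35.2 -4.1 12.7
end

* Replace negative coefficients by zero, leave the others unchanged
foreach v of varlist bp4 bp9 bp14 {
    replace `v' = max(`v', 0)
}

list
```

After the loop bp9 is 0, while bp4 and bp14 keep their original values.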