Exercise L2, linear model

 

Vella and Verbeek (1998) analysed the male data from the Youth Sample of the US National Longitudinal Survey for the period 1980-1987. The number of young males in the sample is 545. The version of the data set (wagepan.dat) we use was obtained from Wooldridge (2002). Here we study the determinants of wages. The same data were used by Rabe-Hesketh and Skrondal (2005, exercise 2.7).

 

Data description

Number of observations (rows): 4360

Number of variables (columns): 44

 

Description of the subset of variables we are going to use in this exercise:

 

nr=person identifier

year=1980 to 1987

black=1 if respondent is black, 0 otherwise

exper=labour market experience (age-6-educ)

expersq=exper squared

hisp=1 if respondent is Hispanic, 0 otherwise

poorhlth=1 if respondent has a health disability, 0 otherwise

married=1 if respondent is married, 0 otherwise

nrthcen=1 if respondent lives in the North Central part of the US, 0 otherwise

nrtheast=1 if respondent lives in the North East part of the US, 0 otherwise

rur=1 if respondent lives in a rural area, 0 otherwise

south=1 if respondent lives in the South of the US, 0 otherwise

educ=years of schooling

union=1 if the respondent is a member of a trade union, 0 otherwise

lwage=log of hourly wage in US dollars

d8m=1 if the year is 198m, 0 otherwise, m=1,…,7
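
The derived variables in this list follow simple rules, which the short Python sketch below spells out. It is only an illustration: age is not among the variables listed above (it appears only in the definition of exper), and the function name derived_variables is made up for this example.

```python
def derived_variables(age, educ, year):
    """Rebuild exper, expersq and the d81..d87 indicators for one person-year.

    `age` is a hypothetical input used only to illustrate exper = age-6-educ.
    """
    exper = age - 6 - educ  # labour market experience as defined in the list
    row = {"exper": exper, "expersq": exper ** 2}
    for m in range(1, 8):
        # d8m is 1 only in year 198m; 1980 has no indicator of its own
        row[f"d8{m}"] = 1 if year == 1980 + m else 0
    return row


example = derived_variables(age=25, educ=12, year=1983)
print(example)
```

Because there is no indicator for 1980, that year acts as the reference category whenever d81 to d87 are all included in a model.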

 

The first few rows and columns of the data look like this (the data set contains more variables than those listed above):

 

Start Sabre and specify transcript file:

 

out wagepan.log

 

data nr year agric black bus construc ent exper fin hisp poorhlth hours &
     manuf married min nrthcen nrtheast occ1 occ2 occ3 occ4 occ5 occ6 occ7 &
     occ8 occ9 per pro pub rur south educ tra trad union lwage d81 d82 d83 &
     d84 d85 d86 d87 expersq

read wagepan.dat
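
For readers who want to inspect the data outside Sabre, the following Python sketch reads the same file using the 44 column names from the data statement. It assumes that wagepan.dat is a plain whitespace-delimited numeric file with no header row, which has not been checked here; the function names parse_rows, load and panel_shape are hypothetical.

```python
from collections import Counter

# Column names copied from the Sabre data statement (44 in total).
NAMES = (
    "nr year agric black bus construc ent exper fin hisp poorhlth hours "
    "manuf married min nrthcen nrtheast occ1 occ2 occ3 occ4 occ5 occ6 occ7 "
    "occ8 occ9 per pro pub rur south educ tra trad union lwage d81 d82 d83 "
    "d84 d85 d86 d87 expersq"
).split()


def parse_rows(lines):
    """Turn whitespace-delimited numeric lines into one dict per row."""
    rows = []
    for line in lines:
        fields = line.split()
        if not fields:  # skip blank lines
            continue
        if len(fields) != len(NAMES):
            raise ValueError(f"expected {len(NAMES)} fields, got {len(fields)}")
        rows.append(dict(zip(NAMES, map(float, fields))))
    return rows


def load(path):
    """Read a data file such as wagepan.dat (assumed layout, see above)."""
    with open(path) as handle:
        return parse_rows(handle)


def panel_shape(rows):
    """Return (number of persons, set of distinct row counts per person)."""
    counts = Counter(row["nr"] for row in rows)
    return len(counts), set(counts.values())
```

If the layout assumption holds and the panel is balanced, panel_shape(load("wagepan.dat")) should report 545 persons with 8 rows each, which is consistent with 545 x 8 = 4360 observations.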

 

Suggested exercises:

 

(1) Estimate a linear model for lwage (log of hourly wage) without covariates.

(2) Allow for a person-specific random effect (identifier nr), using mass 64. Is this random effect significant?

(3) Add the covariates (educ, black, hisp, exper, expersq, married, union, d8m, m=1,2,…,7). How does the magnitude of the person-specific random effect change?

(4) Create interaction effects between the year indicators (d81,…,d87) and educ, and add them to the previous model. Do the returns to education vary with year? What do the results show?
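
Two of the steps above can be sketched in Python, purely as an illustration of the ideas rather than of the Sabre commands. The first function builds the year-by-education interactions needed in (4); the names d81educ to d87educ are made up for this example. The second function shows one common way to judge the significance of a random effect in (2): a likelihood ratio test in which the chi-square(1) p-value is halved because the variance lies on the boundary of the parameter space under the null hypothesis. This halving is an assumption of the sketch, not something stated in the exercise, and the log-likelihood values are placeholders rather than results from the wagepan data.

```python
import math


def add_year_educ_interactions(row):
    """Return a copy of row with d81educ..d87educ = d8m * educ added."""
    out = dict(row)
    for m in range(1, 8):
        out[f"d8{m}educ"] = row[f"d8{m}"] * row["educ"]
    return out


def boundary_lr_test(loglik_without, loglik_with):
    """Likelihood ratio test for one variance component.

    Assumes a 50:50 mixture of chi-square(0) and chi-square(1) as the null
    distribution, so the usual chi-square(1) p-value is halved.
    """
    stat = 2.0 * (loglik_with - loglik_without)
    if stat <= 0.0:
        return stat, 1.0
    # P(chi-square(1) > stat) = erfc(sqrt(stat / 2))
    return stat, 0.5 * math.erfc(math.sqrt(stat / 2.0))


# Placeholder log-likelihoods, NOT results from the wagepan data.
stat, p_value = boundary_lr_test(loglik_without=-100.0, loglik_with=-95.0)
print(f"LR statistic = {stat:.2f}, p-value = {p_value:.5f}")
```

With the interactions in the model, the coefficient on educ is the return to education in 1980, and the coefficient on each interaction is the difference between that return and the return in year 198m.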

 

References

Rabe-Hesketh, S., and Skrondal, A. (2005), Multilevel and Longitudinal Modeling Using Stata, Stata Press, College Station, TX.

 

Vella, F., and Verbeek, M. (1998), Whose wages do unions raise? A dynamic model of unionism and wage rate determination for young men, Journal of Applied Econometrics, 13, 163-183.

 

Wooldridge, J. M. (2002), Econometric Analysis of Cross Section and Panel Data, MIT Press, Cambridge, MA.