************************************************************
************************************************************
***                                                      ***
***        Do-file for working with pairfam data         ***
***                      WEIGHTING                       ***
***                ANCHOR DATA WAVES 1-11                ***
***                     Release 11.0                     ***
***                                                      ***
***                       May 2020                       ***
***                                                      ***
***                    Author: Herzig                    ***
***                                                      ***
************************************************************
************************************************************

/*
This do-file shows some examples of using weights.

We leave the decision whether to use post-stratification weights to the user.
We recommend using design weights if and only if the data are analyzed pooled
over the cohorts.

Hint: The examples focus on the pairfam main sample only! For the refreshment
sample (from wave 11 on), two post-stratification weights are available.

For further information on weights, please refer to the pairfam Data Manual,
Release 11.0, section "Weights".

If you need further help, please do not hesitate to contact: support@pairfam.de
*/


***************************************************************************
***                            PRELIMINARIES                            ***
***************************************************************************

clear all
set more off                                  // tells Stata not to pause for --more-- messages
set maxvar 15000                              // increases the maximum number of variables

global inpath "insert your datapath here"     // directory of original data
global oupath "insert your datapath here"     // working directory


******************************************
***            Using weights           ***
******************************************


***** I) Separate analyses by cohorts (post-stratification weight) *****

*** pairfam

cd "$inpath"
use id nkidsbioalv nkids psweight cohort relstat yeduc sex_gen using anchor1, clear   // load anchor data, wave 1
label language de                                        // use German labels

* Example: number of kids (without svy)
replace nkidsbioalv=nkids if nkidsbioalv==-7
mean nkidsbioalv, over(cohort)                           // unweighted
mean nkidsbioalv [pweight=psweight], over(cohort)        // weighted

* Example: number of kids (with svy)
* Use the svy commands if you want full flexibility
svyset [pweight=psweight]                                // post-stratification weight, pairfam sample
svy: mean nkidsbioalv, over(cohort)                      // weighted (using svy)

* Example: relationship status
recode relstat -7=.
proportion relstat if cohort==3                          // unweighted
svy, subpop(if cohort==3): proportion relstat            // weighted (exact case selection, df correct)
svy: proportion relstat if cohort==3                     // weighted (sloppy case selection, df wrong)
svy, subpop(if cohort==3): tab relstat                   // weighted (if you need only proportions)

* Example: regression of number of kids on gender and years of education
gen woman = sex_gen==2
recode yeduc -7/0=.
reg nkidsbioalv woman yeduc if cohort==3                 // unweighted
svy, subpop(if cohort==3): reg nkidsbioalv woman yeduc   // weighted

*** DemoDiff

use id nkidsbioalv nkids caweight ca1weight cohort using anchor1_DD, clear   // load DemoDiff anchor data, wave 1
label language de                                        // use German labels

* Example: number of kids (without svy)
replace nkidsbioalv=nkids if nkidsbioalv==-7
mean nkidsbioalv, over(cohort)                           // unweighted
mean nkidsbioalv [pweight=caweight], over(cohort)        // weighted

*** Combined sample pairfam & DemoDiff

use id nkidsbioalv nkids ca1weight cohort using anchor1, clear
quietly: append using anchor1_DD, keep(id nkidsbioalv nkids ca1weight cohort)

* Example: number of kids (without svy)
replace nkidsbioalv=nkids if nkidsbioalv==-7
mean nkidsbioalv, over(cohort)                           // unweighted
mean nkidsbioalv [pweight=ca1weight], over(cohort)       // weighted


***** II) Pooled over cohorts (design weight) *****

use id incnet yeduc age dxpsweight using anchor1, clear

* Example: regression of net income on years of education and age
svyset [pweight=dxpsweight]      // combination of design and post-stratification weight, pairfam sample
recode incnet -7/0=.
recode yeduc -7/0=.              // missing codes must be recoded again after reloading the data
reg incnet yeduc age             // unweighted
svy: reg incnet yeduc age        // weighted
* Use ddcaweight for the DemoDiff sample and d1ca1weight for the combined pairfam and DemoDiff sample


***** III) Using weights for long-format data (combined longitudinal and cross-sectional weights) *****

*** Data preparation: extracting weight variables and pooling waves
use id wave dxpsweight /*ddcaweight d1ca1weight*/ sample relstat age mardur sat6 using anchor1, clear
forvalues w = 2/11 {
	quietly append using anchor`w'.dta, keep(id wave dxpsweight /*ddcaweight d1ca1weight*/ lweight sample relstat age mardur sat6)
}
keep if sample == 1              // ONLY pairfam main sample (without DemoDiff and refreshment)

* Computing panel weights
gen panelweight = dxpsweight if wave==1   // wave 1: cross-sectional weight, pairfam sample (use ddcaweight for the
                                          // DemoDiff sample and d1ca1weight for the combined pairfam and DemoDiff sample)
bysort id (wave): replace panelweight = panelweight[_n-1] * lweight if wave>1   // waves 2-11: previous wave's panel weight times lweight of wave t
summ panelweight, det

* Some variables needed below
mvdecode _all, mv(-11/-1=.)      // define missings

* Life satisfaction
rename sat6 happy                // rename sat6 to happy
tab happy, missing

* Marital status (0 = never married, 1 = married, 2 = divorced or widowed)
recode relstat 1/3=0 4/5=1 6/11=2, into(marry)
lab var marry "Marriage"

*** Sample definition
* Exclude person-years with missing values on the outcome or the event variable
drop if happy==.
drop if marry==.
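
* Aside (a minimal, self-contained sketch on made-up toy data, not part of the
* original pairfam workflow): the sample restrictions that follow use sum() to
* turn a one-time flag into a flag for all later person-years of the same
* person. The ids, waves, and marry codes below are hypothetical. The
* preserve/restore pair leaves the anchor data in memory unchanged.
preserve
clear
input id wave marry
1 1 0
1 2 1
1 3 2
1 4 1
2 1 0
2 2 0
end
gen help = 0
replace help = 1 if marry>1                   // 1 only in the person-year in which the first marriage has ended
bysort id (wave): replace help = sum(help)    // running sum within person: stays 1 from that person-year onwards
assert help == (id==1 & wave>=3)              // id 1, waves 3-4 would be dropped, including the remarriage in wave 4
count if help>0
scalar demo_flagged = r(N)                    // number of flagged toy person-years
restore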
* Only persons who were never married when first observed
bysort id (wave): gen pynr = _n             // person-year ID (within person)
gen help=0
replace help=1 if marry>0 & pynr==1         // flag persons already married, divorced, or widowed at first observation
bysort id (wave): replace help = sum(help)  // ==1 for all person-years of these persons
keep if help==0
drop help

* All person-years after the end of the first marriage are excluded
gen help=0
replace help=1 if marry>1                   // flag person-years in which the first marriage has ended (divorced, widowed)
bysort id (wave): replace help=sum(help)    // flag all following person-years (which could include a second marriage)
keep if help==0                             // drop all person-years after the end of the first marriage
drop help pynr
bysort id (wave): gen pynr = _n

* Restrict the estimation sample to persons with at least 2 observations
bysort id: gen pycount = _N                 // number of person-years (within person)
tab pycount if pynr==1                      // length of the panels
keep if pycount>1

xtset id wave                               // declare the panel data structure

* Example: effect of marriage (event) on life satisfaction (outcome)
recode mardur .=0

* Fixed-effects and areg regressions without weights
xtreg happy i.marry c.mardur##c.mardur age, fe vce(cluster id)
est store FE1
areg happy i.marry c.mardur##c.mardur age, absorb(id) vce(cluster id)
est store AR1

* areg regression using panel weights
* (xtreg, fe requires weights that are constant within panels, but panelweight varies across waves)
areg happy i.marry c.mardur##c.mardur age [pw=panelweight], absorb(id) vce(cluster id)
est store AR2

estimates table FE1 AR1 AR2, b(%7.3f) se t stfmt(%6.0f) stats(N N_clust)
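
* Aside (a minimal, self-contained sketch on made-up toy data, not part of the
* original pairfam workflow): the recursive panel-weight computation in part III
* amounts to a cumulative product,
*   panelweight(t) = wave-1 cross-sectional weight * lweight(2) * ... * lweight(t).
* The recursion works because replace processes observations in order, so
* panelweight[_n-1] already holds the updated value of the previous wave.
* The variables w1 (stand-in for dxpsweight) and lweight are hypothetical toy values.
preserve
clear
input id wave w1 lweight
1 1 2 .
1 2 . 0.5
1 3 . 1.5
2 1 1 .
2 2 . 2
end
gen panelweight = w1 if wave==1
bysort id (wave): replace panelweight = panelweight[_n-1] * lweight if wave>1
* Closed form: wave-1 weight times the running product of lweight (via logs;
* sum() treats the missing ln(lweight) of wave 1 as 0)
bysort id (wave): gen double check = w1[1] * exp(sum(ln(lweight)))
assert reldif(panelweight, check) < 1e-6
summarize panelweight if id==1 & wave==3, meanonly
scalar demo_pw = r(mean)                      // 2 * 0.5 * 1.5
restore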