************************************************************
************************************************************
***														 ***
***		    Do-file for working with pairfam data   	 ***
***        	 		   WEIGHTING			 			 ***
***           	  ANCHOR DATA WAVES 1-11           		 ***
***                   Release 11.0			             ***
***	  													 ***
***					    May 2020	                 	 ***
***														 ***
***				 Author: Herzig							 ***
***														 ***			
************************************************************
************************************************************

/*
This do-file shows examples of how to use weights. The decision whether to use post-stratification weights is left to the user.
We recommend using design weights if and only if the data are analyzed pooled over the cohorts.
Note: The examples focus on the pairfam main sample only. For the refreshment sample (from W11), two post-stratification weights are available.
For further information on weights, please refer to the pairfam Data Manual, Release 11.0, section "Weights".

If you need further help please do not hesitate to contact:
support@pairfam.de
*/

***************************************************************************
***                     PRELIMINARIES                                   ***
***************************************************************************

clear all
set more off		// tells Stata not to pause for --more-- messages
set maxvar 15000	// increases maximal number of variables


global inpath "insert your datapath here"  // directory of original data
global oupath "insert your datapath here"  // working directory	


******************************************
***			 Using weights             ***
******************************************


***** I) Separate analyses by cohorts (post-stratification weight) *****


*** Pairfam 
cd "$inpath"
use id nkidsbioalv nkids psweight cohort relstat yeduc sex_gen using anchor1, clear  // load Anchor data wave 1
label language de	// use German labels

* Example: number of kids (without svy)
replace nkidsbioalv=nkids if nkidsbioalv==-7
mean nkidsbioalv, over(cohort)                     // unweighted
mean nkidsbioalv [pweight=psweight], over(cohort)  // weighted

* Example: number of kids (with svy)
* Use the svy commands if you want full flexibility
svyset [pweight=psweight]                          // Post-Stratification Weight pairfam sample
svy: mean nkidsbioalv, over(cohort)                // weighted (using svy)

* Example: relationship status
recode relstat -7=.
proportion relstat if cohort==3                	   // unweighted
svy, subpop(if cohort==3): proportion relstat  	   // weighted (exact case selection, df correct)
svy: proportion relstat if cohort==3           	   // weighted (sloppy case selection, df wrong)
svy, subpop(if cohort==3): tab relstat		   	   // weighted (if you need only proportions)
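
* Sketch (not part of the original examples): the same weighted proportions for
* all cohorts in one call. With over(), svy treats each cohort as a subpopulation,
* so the full design information is kept, as with subpop() above.
svy: proportion relstat, over(cohort)          	   // weighted, all cohorts at once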

* Example: Regression on number of kids
gen woman = sex_gen==2
recode yeduc -7/0=.
reg nkidsbioalv woman yeduc if cohort==3                 //unweighted
svy, subpop(if cohort==3): reg nkidsbioalv woman yeduc   //weighted



*** DemoDiff
use id nkidsbioalv nkids caweight ca1weight cohort using anchor1_DD, clear  	// load Anchor data wave 1
label language de		// use German labels

* Example: number of kids (without svy)
replace nkidsbioalv=nkids if nkidsbioalv==-7
mean nkidsbioalv, over(cohort)                     // unweighted
mean nkidsbioalv [pweight=caweight], over(cohort)  // weighted



*** Combined sample pairfam & DemoDiff
use id nkidsbioalv nkids ca1weight cohort using anchor1, clear
quietly: append using anchor1_DD, keep(id nkidsbioalv nkids ca1weight cohort)

* Example: number of kids (without svy)
replace nkidsbioalv=nkids if nkidsbioalv==-7
mean nkidsbioalv, over(cohort)                      // unweighted
mean nkidsbioalv [pweight=ca1weight], over(cohort)  // weighted




***** II) Pooled over cohorts (design weight) *****

use id incnet yeduc age dxpsweight using anchor1, clear

* Example: Regression of net income on years of education and age
svyset [pweight=dxpsweight]        // combined design and post-stratification weight, pairfam sample
recode incnet -7/0=.
reg incnet yeduc age               // unweighted
svy: reg incnet yeduc age	       // weighted

* Use ddcaweight for DemoDiff sample and d1ca1weight for combined sample pairfam and DemoDiff
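
* Sketch for the DemoDiff sample (an illustration under assumptions, not a tested
* example): it assumes that anchor1_DD contains incnet, yeduc, age and ddcaweight
* under these names.
use id incnet yeduc age ddcaweight using anchor1_DD, clear
svyset [pweight=ddcaweight]        // combined design and post-stratification weight, DemoDiff sample
recode incnet -7/0=.
svy: reg incnet yeduc age          // weighted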



***** III) Using weights for long-format data (combined longitudinal and cross-sectional weights) *****

*** Data Preparation: Extracting weight variables and pooling waves 

use id wave dxpsweight /*ddcaweight d1ca1weight*/ sample relstat age mardur sat6 using anchor1, clear
* Append waves 2-11 (forvalues replaces the outdated -for- command)
forvalues w = 2/11 {
	quietly append using anchor`w'.dta, keep(id wave dxpsweight /*ddcaweight d1ca1weight*/ lweight sample relstat age mardur sat6)
}
keep if sample == 1							// ONLY pairfam main sample (without DemoDiff and Refreshment)
	

* Computing panel weights
gen panelweight = dxpsweight if wave==1  	// Wave 1: cross-sectional weight, pairfam sample (use ddcaweight for the DemoDiff sample and
											// d1ca1weight for the combined pairfam and DemoDiff sample)

bysort id (wave): replace panelweight = panelweight[_n-1] * lweight if wave>1 		// Waves 2-11: panel weight of the previous observation times lweight of wave t
summ panelweight, det
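
* Optional check (a sketch, not part of the original workflow): because the panel
* weight is a running product, a missing lweight in one wave makes panelweight
* missing for that wave and for all later waves of the same person.
count if missing(panelweight)                              // person-years without a panel weight
tabstat panelweight, by(wave) statistics(n mean min max)   // distribution of the weight by wave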


* Some variables needed below
mvdecode _all, mv(-1/-11=.)    // set negative codes (-1 to -11) to missing

* Life Satisfaction
rename sat6 happy							// rename sat6 to happy
tab happy, missing

* Categorical marital status (0=never married, 1=married, 2=divorced or widowed)
recode relstat 1/3=0 4/5=1 6/11=2, into(marry)
lab var marry "Marriage"


***  Sample Definition

* Exclude person-years with missing on the outcome and event variable 
drop if happy==. 
drop if marry==.

* Only persons who were never married when first observed 
bysort id (wave): gen pynr    = _n   		// person-year ID (within person)
gen     help=0
replace help=1 if marry>0 & pynr==1       
bysort id (wave): replace help = sum(help) 	// ==1 for all pys of persons who were already (or previously) married when first observed
keep if help==0   
drop   help  

* All person-years from the end of the first marriage onward are excluded
gen     help=0
replace help=1 if marry>1                  	// flag pys in which the first marriage has ended (divorced or widowed)
bysort id (wave): replace help=sum(help)   	// running sum also flags all following pys (could be a second marriage)
keep if help==0                            	// drop all pys from the end of the first marriage onward
drop   help pynr
bysort id (wave): gen pynr = _n 

* Restricting the estimation sample to those with at least 2 observations
bysort id:        gen pycount = _N   		// # of person-years (within person)
tab pycount if pynr==1               		// Length of the panels
keep if pycount>1


xtset id wave                        		// Information on panel data structure
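
* Optional check (a sketch): xtdescribe lists the most frequent participation
* patterns, which shows how many persons have gaps between waves.
xtdescribe, patterns(10)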


* Example: Effect of marriage (event) on life satisfaction (outcome)
recode mardur .=0

* Fixed effects & areg regression without using weights
xtreg happy i.marry c.mardur##c.mardur age, fe vce(cluster id) 
est store FE1
areg  happy i.marry c.mardur##c.mardur age, absorb(id) vce(cluster id) 
est store AR1

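* Areg regression using panel weights
* (xtreg, fe requires weights that are constant within id; panelweight varies over waves, so areg is used)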
areg  happy i.marry c.mardur##c.mardur age [pw=panelweight], absorb(id) vce(cluster id)
est store AR2 

estimates table FE1 AR1 AR2, b(%7.3f) se t stfmt(%6.0f) stats(N N_clust)