************************************************************ ************************************************************ *** *** *** Do-file for working with pairfam data *** *** MERGING and DATA FORMAT TRANSFORMATION *** *** Release 13.0 *** *** *** *** May 2022 *** *** *** *** Authors: Daniel Fuß, Sabine Keller, Nina Schumann *** *** *** ************************************************************ ************************************************************ * This do-file introduces the most common operations for * merging the separate pairfam datasets to combined multi- * waves and/or multi-actor datasets. It provides a number * of commented examples that have to be adapted to your * specific problem. * Attention for wave 12 & 13: Before running the dofile please append the datasets * anchor12_capi.dta (anchor13_capi.dta) and anchor12_cati.dta (anchor13_cati.dta) * and name the combined dataset anchor12.dta (anchor13.dta)! For more information * on the mode change in W12/W13, please see the Data Manual, Chapter 12. ************************************************************ *** *** *** Preparing the merging process *** *** *** ************************************************************ clear all set more off // tells Stata not to pause for --more-- messages set maxvar 10000 // increases maximal number of variables * Definition of a macro for the datapath with pairfam datasets global inpath `""insert your datapath here""' // directory of original data global outpath `""insert your datapath here""' // working directory * Wave definition global wave = 13 * NOTE: A Stata problem might arise with the command "use varlist using filename" * for users of the English version: If the data set was saved with German labels * as the main labels, the English labels will be lost when opening the data with that command * Quick fix: Open each data set separately, change to English labels * with the command "label language en" and save the data. /* Overview of combinations of datasets and their corresponding examples: anchor | partner | parent | child | parenting wide long | wide long | wide long | wide long | wide long ------------------------------------------------------------------------ anchor 1.1 1.2 | 2.1 2.2 | 2.3 2.4 | 2.3 2.4 | 2.3 2.4 ------------------------------------------------------------------------ partner | 1.1 1.2 | 2.3 2.4 | 2.3 2.4 | 2.3 2.4 ------------------------------------------------------------------------ parent | | 1.1 1.2 | 2.5 2.6 | 2.5 2.6 ------------------------------------------------------------------------ child | | | 1.1 1.2 | 2.5 2.6 ------------------------------------------------------------------------ parenting | | | | 1.3 1.2 For instance, if you want to combine anchor and child data in wide format, see example 2.3. NOTES: - Format wide/long: Please note that this either refers to the combination of waves or actors. - Waves: The datasets child and parenting are only available since wave 2, the dataset parent is available for waves 2 to 8. - ID: Note that the data structures differ by actor. Whereas the anchor datasets only contain one identifying variable (id), the child datasets can contain various numbers of children identifiers (cid) per anchor, depending on the anchor's number of children. For more information, see the Data Manual, Chapter 2. */ ************************************************************ *** *** *** 1.1.1 Same actor - two waves *** *** ! WIDE-Format ! *** *** *** *** Example: anchor (wave 1 & wave 2) *** *** *** ************************************************************ * Use anchor1 and rename variables to var_1 cd $inpath use id wave nkids casmin egp using anchor1, clear // load specified variables of anchor data wave 1 foreach x in wave nkids casmin egp { // x specifies the variables which need to be renamed (NOT id) rename `x' `x'_1 // add wave suffix "_1" } cd $outpath save anchor_1, replace // save temporal data set * Use anchor2 and rename variables to var_2 cd $inpath use id wave nkids casmin egp using anchor2, clear // procedure analogous to anchor1 foreach x in wave nkids casmin egp { rename `x' `x'_2 } cd $outpath save anchor_2, replace * Merge prepared datasets (one line = one anchor) use anchor_1, clear // load anchor data wave 1 with renamed variables help merge // get detailed information about the merge command merge 1:1 id using anchor_2 // merge anchor data wave 1 to anchor data wave 2 by common key variable id // option ", keepusing (varlist)" specifies the variables you want to keep from the second dataset keep if _merge==3 // omit if you need an unbalanced dataset /* balanced dataset (keep only anchors that participated in wave 1 AND wave 2): keep if _merge==3 unbalanced dataset (keep all anchors): default option, no additional action is required */ drop _merge // optional: delete automatically generated variable _merge erase "anchor_1.dta" // delete temporal dataset erase "anchor_2.dta" // delete temporal dataset describe, short // describe merged dataset save "insert new dataset name here", replace // save merged dataset with new name ************************************************************ *** *** *** 1.1.2 Same actor - several waves *** *** ! WIDE-Format ! *** *** *** *** Example: partner (all waves) *** *** *** ************************************************************ * The procedure of 1.1.1 has to be repeated several times. A loop helps * to make the procedure more efficient. * Use partner 1, append partner1_DD and rename variables to var_1 cd $inpath use partner1, clear append using partner1_DD, nolabel // append DemoDiff cases if you decide to combine partner1 and partner1_DD keep pid psex psat1* // only keep specified variables foreach y in psex psat1* { // y specifies the variables which need to be renamed rename `y' `y'_1 // add the wave suffix to the variable name } cd $outpath save partner_1, replace // save the temporal data set * Use all partner data of all subsequent waves and rename variables forvalues x=2/$wave { // x specifies the waves which need to be merged cd $inpath use pid psex psat1* using partner`x', clear // load partner data from wave `x' with specified variables foreach y in psex psat1* { // y specifies the variables which need to be renamed rename `y' `y'_`x' // add the wave suffix to the variable name } cd $outpath save partner_`x', replace // save the temporal data set } * Merge prepared datasets (one line = one partner) cd $outpath use partner_1, clear // load partner data wave 1 with renamed variables help merge // get detailed information about the merge command forvalues x=2/$wave { // x specifies the waves which need to be added merge 1:1 pid using partner_`x' // merge specified partner variables of wave 2 to current wave to partner data wave 1 by key variable pid drop _merge // optional: delete automatically generated variable _merge erase "partner_`x'.dta" // delete temporal dataset } erase "partner_1.dta" // delete temporal dataset sort pid // optional: sort the data describe, short // describe merged dataset save "insert new dataset name here", replace // save merged dataset with new name ************************************************************ *** *** *** 1.2 Same actor - several waves *** *** ! LONG-Format ! *** *** *** *** Example: anchor (all waves) *** *** *** ************************************************************ * Use anchor1 and append DemoDiff of wave 1 cd $inpath use id wave cohort sex_gen relstat casmin using anchor1, clear // load anchor data wave 1 help append // get detailed information about the append command append using anchor1_DD, nolabel keep(id wave cohort sex_gen relstat casmin) // append anchor data wave 1, DemoDiff subsample * Append all subsequent waves (one line: one anchor-wave) forvalues x = 2/$wave { append using anchor`x', /// keep(id wave cohort sex_gen relstat casmin) // append data from anchor persons wave 2 to current wave } sort id wave // optional: sort data by id and wave describe, short // describe merged dataset cd $outpath save "insert new dataset name here", replace // save merged dataset with new name ************************************************************ *** *** *** 1.3 Parenting - several waves *** *** ! WIDE-Format ! *** *** *** *** Example: all waves since wave 2 *** *** *** ************************************************************ * Use parenting data from all waves since W2 and rename variables forvalues x = 2/$wave { // x specifies the waves which need to be merged cd $inpath use cid dropoffvers sdqpi* using parenting`x', clear // load parenting data from wave `x' with specified variables foreach y in sdqpi* { // y specifies the variables which need to be renamed rename `y' `y'_`x' // add the wave suffix to the variable name } cd $outpath save parenting_`x', replace // save the temporal data set } * Merge prepared datasets (one line: one parent) cd $outpath use parenting_2, clear // load parenting data wave 2 help merge // get detailed information about the merge command forvalues x = 3/$wave { // x specifies the waves which need to be added merge 1:1 cid dropoffvers using parenting_`x' // merge to parenting data wave 2 by key variable cid and dropoffvers drop _merge // optional: delete automatically generated variable _merge erase "parenting_`x'.dta" // delete temporal dataset } erase "parenting_2.dta" // delete temporal dataset describe, short // describe merged dataset save "insert new dataset name here", replace // save merged dataset with new name ************************************************************ *** *** *** 2.1 Different actors (anchor partner) *** *** ! WIDE-Format ! *** *** *** *** Example: wave 1 *** *** *** ************************************************************ * Use anchor1 cd $inpath use anchor1, clear // load anchor data wave 1 * Merge anchor1 and partner1 (one line: one anchor/partner) help merge // get detailed information about the merge command merge 1:1 id using partner1 // merge partner data wave 1 to anchor data wave 1 by common key variable id // partner variables are indicated by a "p"-prefix keep if _merge==3 // omit if you want to keep the anchors without partner, too drop _merge // optional: delete automatically generated variable _merge describe, short // describe merged dataset cd $outpath save "insert new dataset name here", replace // save merged dataset with new name ************************************************************ *** *** *** 2.2.1 Different actors (anchor partner) *** *** - maintain names of variables - *** *** ! LONG-Format ! *** *** *** *** Example: wave 1 *** *** *** ************************************************************ * Use anchor1 cd $inpath use anchor1, clear // load anchor data wave 1 * Append partner1 (one line: one anchor or one partner) help append // get detailed information about the append command append using partner1 // append data from partner wave 1 sort id // optional: in order to have the data of each couple one below the other describe, short // describe merged dataset cd $outpath save "insert new dataset name here", replace // save merged dataset with new name ************************************************************ *** *** *** 2.2.2 Different actors (anchor partner) *** *** - adjust names of variables - *** *** ! LONG-Format ! *** *** *** *** Example: wave 1 *** *** *** ************************************************************ ** VERSION 1: Start with partner dataset ** Note: Variable names for both actors are taken from anchor dataset (no prefix). * Use partner1 and rename variables from pvar to var cd $inpath use id pid psex psat1* using partner1, clear // load specified variables of partner data wave 1 foreach x in sex sat1i1 sat1i2 sat1i3 sat1i4 { // x specifies the variables which need to be renamed rename p`x' `x' // remove the p-prefix to adjust the variable names } * Append anchor1 help append // get detailed information about the append command append using anchor1, keep (id pid sex_gen sat1*) gen (anchor) // append specified variables from anchor persons wave 1 label var anchor "Data source: Anchor" // the new "anchor" variable indicates the data source lab def anchor 0"0 Partner" 1"1 Anchor" lab val anchor anchor replace sex=sex_gen if anchor==1 // anchor data set: variable is called "sex_gen" drop sex_gen sort id // optional: to display the data of each couepl one after the other order id pid anchor describe, short // describe merged dataset cd $outpath save "insert new dataset name here", replace // save merged dataset with new name ** VERSION 2: Start with anchor dataset ** Note: Variable names for both actors are taken from partner dataset (prefix "p"). * Use anchor1 and rename variables from var to pvar cd $inpath use id pid sex_gen sat1* using anchor1, clear // load specified variables of anchor data wave 1 foreach x in sat1i1 sat1i2 sat1i3 sat1i4 { // x specifies the variables which need to be renamed rename `x' p`x' // add the p-prefix to adjust the variable names } rename sex_gen psex // rename sex_gen so that it is compatible with partner data * Append partner1 help append // get detailed information about the append command append using partner1, keep (id pid psex psat1*) gen (partner) // append specified variables from partner persons wave 1 label var partner "Data source: Partner" // the new "partner" variable indicates the data source lab def partner 0"0 Anchor" 1"1 Partner" lab val partner partner sort id // optional: to display the data of each couepl one after the other order id pid partner describe, short // describe merged dataset cd $outpath save "insert new dataset name here", replace // save merged dataset with new name ************************************************************ *** *** *** 2.3 Different actors *** *** (anchor/partner + parent/child/parenting) *** *** ! WIDE-Format ! *** *** *** *** Example: anchor & child (wave 3) *** *** *** ************************************************************ ** VERSION 1 (start with anchor dataset) ** * Use anchor3 cd $inpath use id cohort sex_gen crn29* using anchor3, clear // load anchor data wave 3 * Merge anchor3 with child3 (one line: one child) help merge // get detailed information about the merge command merge 1:m id using child3, keepusing (cid cnr cpcr10i1 cpcr10i2 cpcr10i3) // merge specified child variables of wave 3 with anchor data wave 3 by key variable id keep if _merge==3 drop _merge // delete automatically generated variable _merge * Rename child variables and reshape to wide format (one line: one anchor) foreach x in cpcr10i1 cpcr10i2 cpcr10i3 { // rename variables for better clarity after reshaping (optional) rename `x' `x'_c } reshape wide cid cpcr10i1_c cpcr10i2_c cpcr10i3_c, i(id) j(cnr) // reshape to wide format describe, short // describe merged dataset cd $outpath save "insert new dataset name here", replace // save merged dataset with new name * Note: Dataset includes only merged observations. ** VERSION 2 (start with child dataset) ** * Use child3 cd $inpath use id cid cnr cpcr10i1 cpcr10i2 cpcr10i3 using child3, clear // load specified variable of child data wave 3 * Rename child variables and reshape to wide format foreach x in cpcr10i1 cpcr10i2 cpcr10i3 { // rename variables for better clarity after reshaping (optional) rename `x' `x'_c } reshape wide cid cpcr10i1_c cpcr10i2_c cpcr10i3_c, i(id) j(cnr) // reshape to wide format * Merge with anchor (by anchor identifier id) merge 1:1 id using anchor3, keepusing(id cohort sex_gen crn29*) // merge with anchor data wave 3 describe, short // describe merged dataset cd $outpath save "insert new dataset name here", replace // save merged dataset with new name * Note: Dataset includes all anchor observations. ************************************************************ *** *** *** 2.4.1 Different actors *** *** (anchor/partner + parent/child/parenting) *** *** - maintain names of variables - *** *** ! LONG-Format ! *** *** *** *** Example: anchor & parent (wave 2) *** *** *** ************************************************************ * Use anchor2 cd $inpath use id cohort sex_gen using anchor2, clear // load anchor data wave 2 * Merge anchor2 with parent2 (by anchor identifier id) help merge // get detailed information about the merge command merge 1:m id using parent2 // merge parent data wave 2 to anchor data wave 2 by common key variable id drop _merge // optional: delete automatically generated variable _merge sort id describe, short // describe merged dataset cd $outpath save "insert new dataset name here", replace // save merged dataset with new name ************************************************************ *** *** *** 2.4.2 Different actors *** *** (anchor/partner + parent/child/parenting) *** *** - adjust names of variables - *** *** ! LONG-Format ! *** *** *** *** Example: anchor & parent (wave 3) *** *** *** ************************************************************ * Use parent3 and rename parent variables from parvar to var cd $inpath use id parid parval1i1 parval1i2 parval1i3 using parent3, clear // load specified variables of parent data wave 3 foreach x in val1i1 val1i2 val1i3 { // x specifies the variables which need to be renamed rename par`x' `x' // remove the par-prefix to adjust the variable names } * Append anchor3 help append // get detailed information about the append command append using anchor3, keep (id val1i1 val1i2 val1i3) gen(anchor) // append specified variables from anchor persons wave 3 label var anchor "Data source: Anchor" // the new "anchor" variable indicates the data source lab def anchor 0"0 Parent" 1"1 Anchor" lab val anchor anchor sort id // optional: in order to have the data of each couple one below the other describe, short // describe merged dataset cd $outpath save "insert new dataset name here", replace // save merged dataset with new name ************************************************************ *** *** *** 2.5 Different actors *** *** (parent/child + child/parenting) *** *** ! WIDE-Format ! *** *** *** *** Example: parent & child (wave 2) *** *** *** ************************************************************ ** VERSION 1 (position does not correspond to parid)** * Use parent2 cd $inpath use id parid cid using parent2, clear // load specified variables of parent data wave 2 * Merge parent2 with child2 (one line: one grandparent) help merge // get detailed information about the merge command merge m:1 cid using child2, keepusing (cgp5 cgp6) // merge specified child variables of wave 2 to parent data wave 2 by key variable cid keep if _merge==3 // only keep grandparents with granchild data drop _merge // optional: delete automatically generated variable _merge * Rename variables and reshape to wide format (one line: one child) sort cid parid bysort cid: gen n=_n // running number of grandparent per grandchild foreach x in cgp5 cgp6 { // rename variables for better clarity after reshaping (optional) rename `x' `x'_par } reshape wide parid cgp5_par cgp6_par, i(cid) j(n) // rehape to wide format describe, short // describe merged dataset cd $outpath save "insert new dataset name here", replace // save merged dataset with new name ** VERSION 2 (position corresponds to parid) ** * Use parent2 and generate variable for grandparent position cd $inpath use id parid cid using parent2, clear // load specified variables of parent data wave 2 gen n=parid-id-300 // identify last character of variable "parid" for reshaping * Merge parent2 with child2 (one line: one grandparent) merge m:1 cid using child2, keepusing (cgp5 cgp6) // merge specified child variables of wave 2 to parent data wave 2 by key variable cid keep if _merge==3 // only keep grandparents with granchild data drop _merge // optional: delete automatically generated variable _merge * Rename variables and reshape to wide format (one line: one child) foreach x in cgp5 cgp6 { // rename variables for better clarity after reshaping (optional) rename `x' `x'_par } reshape wide parid cgp5_par cgp6_par, i(cid) j(n) // rehape to wide format describe, short // describe merged dataset cd $outpath save "insert new dataset name here", replace // save merged dataset with new name ************************************************************ *** *** *** 2.6 Different actors *** *** (parent/child + child/parenting) *** *** ! LONG-Format ! *** *** *** *** Example: parent & parenting (wave 3) *** *** *** ************************************************************ * Use parent3 cd $inpath use parent3, clear // load parent data wave 3 * Join by parenting3 (by child identifier cid) help joinby // get detailed information about the joinby command joinby cid using parenting3 // join parenting data wave 3 to parent data wave 3 by key variable cid sort parid cid // optional: in order to have the data of each couple one below the other order id parid cid dropoffvers // dropoffvers indicates respondent of parenting questionnaire (anchor or partner) describe, short // describe merged dataset cd $outpath save "insert new dataset name here", replace // save merged dataset with new name