Copyright (c) 2017
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NON INFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
Bug reports: miguel-angel.luque at lshtm.ac.uk
. clear . set more off . use http://www.stata-press.com/data/r14/cattaneo2.dta (Excerpt from Cattaneo (2010) Journal of Econometrics 155: 138-154) . describe Contains data from http://www.stata-press.com/data/r14/cattaneo2.dta obs: 4,642 Excerpt from Cattaneo (2010) Journal of Econometrics 155: 138-154 vars: 23 14 Jan 2015 09:49 size: 116,050 -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- storage display value variable name type format label variable label -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- bweight int %9.0g infant birthweight (grams) mmarried byte %10.0g mmarried 1 if mother married mhisp byte %9.0g 1 if mother hispanic fhisp byte %9.0g 1 if father hispanic foreign byte %9.0g 1 if mother born abroad alcohol byte %9.0g 1 if alcohol consumed during pregnancy deadkids byte %9.0g previous births where newborn died mage byte %9.0g mother's age medu byte %9.0g mother's education attainment fage byte %9.0g father's age fedu byte %9.0g father's education attainment nprenatal byte %9.0g number of prenatal care visits monthslb int %9.0g months since last birth order byte %9.0g order of birth of the infant msmoke byte %27.0g smoke2 cigarettes smoked during pregnancy mbsmoke byte %9.0g mbsmoke 1 if mother smoked mrace byte %9.0g 1 if mother is white frace byte %9.0g 1 if father is white prenatal byte %9.0g trimester of first prenatal care visit birthmonth byte %9.0g month of birth lbweight byte %9.0g 1 if low birthweight baby fbaby byte %9.0g YesNo 1 if first baby prenatal1 byte %9.0g YesNo 1 if first prenatal visit in 1 trimester -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Sorted by: . global Y bweight // Outcome . global A mbsmoke // Exposure or treatment . global C mmarried // Confounder . global W mmarried prenatal1 fbaby medu mage foreign mrace // Confounders . regr $Y $A $C // Naive approach Source | SS df MS Number of obs = 4,642 -------------+---------------------------------- F(2, 4639) = 132.55 Model | 84050074.2 2 42025037.1 Prob > F = 0.0000 Residual | 1.4708e+09 4,639 317058.453 R-squared = 0.0541 -------------+---------------------------------- Adj R-squared = 0.0536 Total | 1.5549e+09 4,641 335032.156 Root MSE = 563.08 ------------------------------------------------------------------------------ bweight | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- mbsmoke | -224.4212 21.85195 -10.27 0.000 -267.2614 -181.581 mmarried | 182.794 18.55405 9.85 0.000 146.4193 219.1688 _cons | 3275.55 16.68283 196.34 0.000 3242.844 3308.256 ------------------------------------------------------------------------------
. proportion $C Proportion estimation Number of obs = 4,642 -------------------------------------------------------------- | Logit | Proportion Std. Err. [95% Conf. Interval] -------------+------------------------------------------------ mmarried | notmarried | .3003016 .0067287 .2872778 .313656 married | .6996984 .0067287 .686344 .7127222 -------------------------------------------------------------- . matrix m=e(b) . gen mmarried1 = m[1,2] . sum mmarried1 Variable | Obs Mean Std. Dev. Min Max -------------+--------------------------------------------------------- mmarried1 | 4,642 .6996984 0 .6996984 .6996984 . gen mmarried2 = m[1,1] . sum mmarried2 Variable | Obs Mean Std. Dev. Min Max -------------+--------------------------------------------------------- mmarried2 | 4,642 .3003016 0 .3003016 .3003016
. sumup $Y, by($A $C) mbsmoke mmarried | Obs Missing Mean Std. Dev. Min Max -----------------+------------------------------------------------------------------ nonsmok notmarri | 939 0 3258.925 635.8025 454 4933 nonsmok married | 2839 0 3463.843 537.9523 340 5500 smoker notmarri | 455 0 3085.437 548.4611 397 5018 smoker married | 409 0 3195.756 569.468 482 4734 -----------------+------------------------------------------------------------------ Total | 4642 0 3361.68 578.8196 340 5500 ------------------------------------------------------------------------------------ . matrix y00 = r(Stat1) . matrix y01 = r(Stat2) . matrix y10 = r(Stat3) . matrix y11 = r(Stat4) . gen ATE = ((y11[3,1]-y01[3,1]))*mmarried1 + ((y10[3,1]-y00[3,1]))*mmarried2 . qui: sum ATE . display "The ATE is: " `r(mean)' The ATE is: -239.67882 . drop ATE . *Check Stata implementation results . teffects ra ($Y $C) ($A) Iteration 0: EE criterion = 1.549e-24 Iteration 1: EE criterion = 5.234e-26 Treatment-effects estimation Number of obs = 4,642 Estimator : regression adjustment Outcome model : linear Treatment model: none ---------------------------------------------------------------------------------------- | Robust bweight | Coef. Std. Err. z P>|z| [95% Conf. Interval] -----------------------+---------------------------------------------------------------- ATE | mbsmoke | (smoker vs nonsmoker) | -239.6788 23.14732 -10.35 0.000 -285.0467 -194.3109 -----------------------+---------------------------------------------------------------- POmean | mbsmoke | nonsmoker | 3402.306 9.51684 357.50 0.000 3383.653 3420.958 ----------------------------------------------------------------------------------------
. proportion $C if $A==1 Proportion estimation Number of obs = 864 -------------------------------------------------------------- | Logit | Proportion Std. Err. [95% Conf. Interval] -------------+------------------------------------------------ mmarried | notmarried | .5266204 .0169961 .4931927 .5598111 married | .4733796 .0169961 .4401889 .5068073 -------------------------------------------------------------- . matrix m=e(b) . gen mmarried1atet = m[1,2] . gen mmarried2atet = m[1,1] . gen ATT = ((y11[3,1]-y01[3,1]))*mmarried1atet + ((y10[3,1]-y00[3,1]))*mmarried2atet . qui: sum ATT . display "The ATT is: " `r(mean)' The ATT is: -218.26932 . *Check restuls Stata Implementation . teffects ra ($Y $C) ($A), atet Iteration 0: EE criterion = 1.503e-24 Iteration 1: EE criterion = 3.239e-26 Treatment-effects estimation Number of obs = 4,642 Estimator : regression adjustment Outcome model : linear Treatment model: none ---------------------------------------------------------------------------------------- | Robust bweight | Coef. Std. Err. z P>|z| [95% Conf. Interval] -----------------------+---------------------------------------------------------------- ATET | mbsmoke | (smoker vs nonsmoker) | -218.2693 22.4693 -9.71 0.000 -262.3083 -174.2303 -----------------------+---------------------------------------------------------------- POmean | mbsmoke | nonsmoker | 3355.929 12.41832 270.24 0.000 3331.59 3380.268 ---------------------------------------------------------------------------------------- . drop ATT
. program drop ATE . program define ATE, rclass 1. capture drop y1 2. capture drop y0 3. capture drop ATE 4. sumup $Y, by($A $C) 5. matrix y00 = r(Stat1) 6. matrix y01 = r(Stat2) 7. matrix y10 = r(Stat3) 8. matrix y11 = r(Stat4) 9. gen ATE = ((y11[3,1]-y01[3,1]))*mmarried1 + ((y10[3,1]-y00[3,1]))*mmarried2 10. qui sum ATE 11. return scalar ate = `r(mean)' 12. end . qui bootstrap r(ate), reps(1000): ATE . estat boot, all Bootstrap results Number of obs = 4,642 Replications = 1000 command: ATE _bs_1: r(ate) ------------------------------------------------------------------------------ | Observed Bootstrap | Coef. Bias Std. Err. [95% Conf. Interval] -------------+---------------------------------------------------------------- _bs_1 | -239.67882 .56238 23.21251 -285.1745 -194.1831 (N) | -287.5401 -193.4211 (P) | -291.1595 -197.7737 (BC) ------------------------------------------------------------------------------ (N) normal confidence interval (P) percentile confidence interval (BC) bias-corrected confidence interval . drop ATE
. program drop ATE . program define ATE, rclass 1. capture drop y1 2. capture drop y0 3. capture drop ATT 4. sumup $Y, by($A $C) 5. matrix y00 = r(Stat1) 6. matrix y01 = r(Stat2) 7. matrix y10 = r(Stat3) 8. matrix y11 = r(Stat4) 9. gen ATT = ((y11[3,1]-y01[3,1]))*mmarried1atet + ((y10[3,1]-y00[3,1]))*mmarried2atet 10. qui sum ATT 11. return scalar atet = `r(mean)' 12. end . qui bootstrap r(atet), reps(1000): ATE . estat boot, all Bootstrap results Number of obs = 4,642 Replications = 1000 command: ATE _bs_1: r(atet) ------------------------------------------------------------------------------ | Observed Bootstrap | Coef. Bias Std. Err. [95% Conf. Interval] -------------+---------------------------------------------------------------- _bs_1 | -218.26932 .1785775 21.729215 -260.8578 -175.6808 (N) | -261.1092 -177.6079 (P) | -261.2072 -177.6395 (BC) ------------------------------------------------------------------------------ (N) normal confidence interval (P) percentile confidence interval (BC) bias-corrected confidence interval . drop ATT
. regress $Y ibn.$A ibn.$A#c.($C), noconstant vce(robust) Linear regression Number of obs = 4,642 F(4, 4638) = 42409.94 Prob > F = 0.0000 R-squared = 0.9728 Root MSE = 562.86 ------------------------------------------------------------------------------------ | Robust bweight | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------------+---------------------------------------------------------------- mbsmoke | nonsmoker | 3258.925 20.74652 157.08 0.000 3218.252 3299.599 smoker | 3085.437 25.69505 120.08 0.000 3035.063 3135.812 | mbsmoke#c.mmarried | nonsmoker | 204.9171 23.0739 8.88 0.000 159.6813 250.1529 smoker | 110.3181 38.10346 2.90 0.004 35.61724 185.019 ------------------------------------------------------------------------------------ . margins $A, vce(unconditional) // Marginal probabilty for C Predictive margins Number of obs = 4,642 Expression : Linear prediction, predict() ------------------------------------------------------------------------------ | Unconditional | Margin Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- mbsmoke | nonsmoker | 3402.306 9.520943 357.35 0.000 3383.64 3420.971 smoker | 3162.627 21.15799 149.48 0.000 3121.147 3204.107 ------------------------------------------------------------------------------ . margins r.$A, contrast(nowald) // Check R meaning .... Contrasts of predictive margins Model VCE : Robust Expression : Linear prediction, predict() ------------------------------------------------------------------------ | Delta-method | Contrast Std. Err. [95% Conf. Interval] -----------------------+------------------------------------------------ mbsmoke | (smoker vs nonsmoker) | -239.6788 23.14854 -285.061 -194.2967 ------------------------------------------------------------------------
. regress $Y ibn.$A ibn.$A#c.($C), noconstant vce(robust) coeflegend Linear regression Number of obs = 4,642 F(4, 4638) = 42409.94 Prob > F = 0.0000 R-squared = 0.9728 Root MSE = 562.86 ------------------------------------------------------------------------------------ bweight | Coef. Legend -------------------+---------------------------------------------------------------- mbsmoke | nonsmoker | 3258.925 _b[0bn.mbsmoke] smoker | 3085.437 _b[1.mbsmoke] | mbsmoke#c.mmarried | nonsmoker | 204.9171 _b[0bn.mbsmoke#c.mmarried] smoker | 110.3181 _b[1.mbsmoke#c.mmarried] ------------------------------------------------------------------------------------ . predictnl ATE = (_b[1bn.mbsmoke] + _b[1bn.mbsmoke#c.mmarried]*mmarried) - (_b[0bn.mbsmoke] + _b[0bn.mbsmoke#c.mmarried]*mmarried) . qui: sum ATE . display "The ATE is: " `r(mean)' The ATE is: -239.67882 . drop ATE
. teffects ra ($Y $C) ($A) //Parametric G-Formula implementation in Stata Iteration 0: EE criterion = 1.549e-24 Iteration 1: EE criterion = 5.234e-26 Treatment-effects estimation Number of obs = 4,642 Estimator : regression adjustment Outcome model : linear Treatment model: none ---------------------------------------------------------------------------------------- | Robust bweight | Coef. Std. Err. z P>|z| [95% Conf. Interval] -----------------------+---------------------------------------------------------------- ATE | mbsmoke | (smoker vs nonsmoker) | -239.6788 23.14732 -10.35 0.000 -285.0467 -194.3109 -----------------------+---------------------------------------------------------------- POmean | mbsmoke | nonsmoker | 3402.306 9.51684 357.50 0.000 3383.653 3420.958 ---------------------------------------------------------------------------------------- . . regress $Y $C if $A==1 Source | SS df MS Number of obs = 864 -------------+---------------------------------- F(1, 862) = 8.40 Model | 2621288.44 1 2621288.44 Prob > F = 0.0038 Residual | 268879388 862 311925.044 R-squared = 0.0097 -------------+---------------------------------- Adj R-squared = 0.0085 Total | 271500676 863 314601.015 Root MSE = 558.5 ------------------------------------------------------------------------------ bweight | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- mmarried | 110.3181 38.05526 2.90 0.004 35.62633 185.0099 _cons | 3085.437 26.183 117.84 0.000 3034.047 3136.827 ------------------------------------------------------------------------------ . predict double y1hat (option xb assumed; fitted values) . regress $Y $C if $A==0 Source | SS df MS Number of obs = 3,778 -------------+---------------------------------- F(1, 3776) = 93.20 Model | 29629575.1 1 29629575.1 Prob > F = 0.0000 Residual | 1.2005e+09 3,776 317923.211 R-squared = 0.0241 -------------+---------------------------------- Adj R-squared = 0.0238 Total | 1.2301e+09 3,777 325683.776 Root MSE = 563.85 ------------------------------------------------------------------------------ bweight | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- mmarried | 204.9171 21.22641 9.65 0.000 163.3008 246.5334 _cons | 3258.925 18.40044 177.11 0.000 3222.85 3295.001 ------------------------------------------------------------------------------ . predict double y0hat (option xb assumed; fitted values) . mean y1hat y0hat Mean estimation Number of obs = 4,642 -------------------------------------------------------------- | Mean Std. Err. [95% Conf. Interval] -------------+------------------------------------------------ y1hat | 3162.627 .7422931 3161.172 3164.082 y0hat | 3402.306 1.378817 3399.602 3405.009 -------------------------------------------------------------- . lincom _b[y1hat] - _b[y0hat] ( 1) y1hat - y0hat = 0 ------------------------------------------------------------------------------ Mean | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- (1) | -239.6788 .6365241 -376.54 0.000 -240.9267 -238.4309 ------------------------------------------------------------------------------ . . * Bostrapping to get the SE and 95%CI for the ATE . capture program drop ATE . program define ATE, rclass 1. capture drop y1 2. capture drop y0 3. reg $Y $C if $A==1 4. predict double y1, xb 5. quiet sum y1 6. reg $Y $C if $A==0 7. predict double y0, xb 8. quiet sum y0 9. mean y1 y0 10. lincom _b[y1]-_b[y0] 11. return scalar ace =`r(estimate)' 12. end . qui bootstrap r(ace), reps(1000): ATE . estat boot, all Bootstrap results Number of obs = 4,642 Replications = 1000 command: ATE _bs_1: r(ace) ------------------------------------------------------------------------------ | Observed Bootstrap | Coef. Bias Std. Err. [95% Conf. Interval] -------------+---------------------------------------------------------------- _bs_1 | -239.67883 .1668729 23.365851 -285.4751 -193.8826 (N) | -284.6081 -194.4855 (P) | -284.5827 -194.4103 (BC) ------------------------------------------------------------------------------ (N) normal confidence interval (P) percentile confidence interval (BC) bias-corrected confidence interval
. clear . use http://www.stata-press.com/data/r14/cattaneo2.dta (Excerpt from Cattaneo (2010) Journal of Econometrics 155: 138-154) . . // Stata Implementation . teffects ra ($Y $W) ($A) Iteration 0: EE criterion = 3.484e-23 Iteration 1: EE criterion = 7.862e-26 Treatment-effects estimation Number of obs = 4,642 Estimator : regression adjustment Outcome model : linear Treatment model: none ---------------------------------------------------------------------------------------- | Robust bweight | Coef. Std. Err. z P>|z| [95% Conf. Interval] -----------------------+---------------------------------------------------------------- ATE | mbsmoke | (smoker vs nonsmoker) | -239.0757 25.1374 -9.51 0.000 -288.3441 -189.8073 -----------------------+---------------------------------------------------------------- POmean | mbsmoke | nonsmoker | 3406.643 9.442598 360.77 0.000 3388.136 3425.151 ---------------------------------------------------------------------------------------- . . // Non Parametric G-FORMULA with multiple variables . regress $Y ibn.$A ibn.$A#c.($W), noconstant vce(robust) Linear regression Number of obs = 4,642 F(16, 4626) = 11378.26 Prob > F = 0.0000 R-squared = 0.9737 Root MSE = 554.06 ------------------------------------------------------------------------------------- | Robust bweight | Coef. Std. Err. t P>|t| [95% Conf. Interval] --------------------+---------------------------------------------------------------- mbsmoke | nonsmoker | 3026.142 62.15547 48.69 0.000 2904.287 3147.996 smoker | 3034.277 126.282 24.03 0.000 2786.704 3281.85 | mbsmoke#c.mmarried | nonsmoker | 61.97948 27.7798 2.23 0.026 7.517833 116.4411 smoker | 88.85613 42.22493 2.10 0.035 6.07513 171.6371 | mbsmoke#c.prenatal1 | nonsmoker | 30.83669 26.80587 1.15 0.250 -21.7156 83.38897 smoker | -5.674673 41.07336 -0.14 0.890 -86.19804 74.8487 | mbsmoke#c.fbaby | nonsmoker | -90.14129 19.9636 -4.52 0.000 -129.2795 -51.00311 smoker | 24.82468 39.98648 0.62 0.535 -53.5679 103.2173 | mbsmoke#c.medu | nonsmoker | 4.654846 3.844754 1.21 0.226 -2.882706 12.1924 smoker | 8.144071 7.764126 1.05 0.294 -7.07732 23.36546 | mbsmoke#c.mage | nonsmoker | 1.863902 2.163618 0.86 0.389 -2.37782 6.105625 smoker | -7.128437 4.218267 -1.69 0.091 -15.39825 1.141379 | mbsmoke#c.foreign | nonsmoker | -22.88371 39.80293 -0.57 0.565 -100.9164 55.14901 smoker | 158.5796 114.2716 1.39 0.165 -65.44728 382.6064 | mbsmoke#c.mrace | nonsmoker | 291.0515 33.25878 8.75 0.000 225.8485 356.2546 smoker | 168.8261 54.49301 3.10 0.002 61.99386 275.6584 ------------------------------------------------------------------------------------- . margins $A, vce(unconditional) Predictive margins Number of obs = 4,642 Expression : Linear prediction, predict() ------------------------------------------------------------------------------ | Unconditional | Margin Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- mbsmoke | nonsmoker | 3406.643 9.458914 360.15 0.000 3388.099 3425.187 smoker | 3167.568 23.42163 135.24 0.000 3121.65 3213.485 ------------------------------------------------------------------------------ . margins r.$A, contrast(nowald) Contrasts of predictive margins Model VCE : Robust Expression : Linear prediction, predict() ------------------------------------------------------------------------ | Delta-method | Contrast Std. Err. [95% Conf. Interval] -----------------------+------------------------------------------------ mbsmoke | (smoker vs nonsmoker) | -239.0757 25.12453 -288.3318 -189.8197 ------------------------------------------------------------------------ . . // G-Computation or parametric G-Formula computation . regress $Y $W if $A==1 Source | SS df MS Number of obs = 864 -------------+---------------------------------- F(7, 856) = 4.05 Model | 8703761.59 7 1243394.51 Prob > F = 0.0002 Residual | 262796914 856 307005.741 R-squared = 0.0321 -------------+---------------------------------- Adj R-squared = 0.0241 Total | 271500676 863 314601.015 Root MSE = 554.08 ------------------------------------------------------------------------------ bweight | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- mmarried | 88.85613 43.55466 2.04 0.042 3.369693 174.3426 prenatal1 | -5.674673 44.6082 -0.13 0.899 -93.22893 81.87959 fbaby | 24.82468 42.24249 0.59 0.557 -58.08632 107.7357 medu | 8.144071 9.431183 0.86 0.388 -10.36688 26.65502 mage | -7.128437 4.075119 -1.75 0.081 -15.12683 .8699587 foreign | 158.5796 121.1829 1.31 0.191 -79.2709 396.43 mrace | 168.8261 52.84241 3.19 0.001 65.11027 272.542 _cons | 3034.277 133.0632 22.80 0.000 2773.108 3295.445 ------------------------------------------------------------------------------ . predict double y1hat (option xb assumed; fitted values) . regress $Y $W if $A==0 Source | SS df MS Number of obs = 3,778 -------------+---------------------------------- F(7, 3770) = 33.88 Model | 72795077.5 7 10399296.8 Prob > F = 0.0000 Residual | 1.1573e+09 3,770 306979.454 R-squared = 0.0592 -------------+---------------------------------- Adj R-squared = 0.0574 Total | 1.2301e+09 3,777 325683.776 Root MSE = 554.06 ------------------------------------------------------------------------------ bweight | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- mmarried | 61.97948 26.2361 2.36 0.018 10.54116 113.4178 prenatal1 | 30.83669 25.85499 1.19 0.233 -19.85444 81.52781 fbaby | -90.14129 19.50776 -4.62 0.000 -128.3881 -51.8945 medu | 4.654846 4.097495 1.14 0.256 -3.378676 12.68837 mage | 1.863902 2.024667 0.92 0.357 -2.105647 5.833452 foreign | -22.88371 39.00843 -0.59 0.557 -99.36337 53.59596 mrace | 291.0515 28.31813 10.28 0.000 235.5312 346.5719 _cons | 3026.142 59.78945 50.61 0.000 2908.919 3143.364 ------------------------------------------------------------------------------ . predict double y0hat (option xb assumed; fitted values) . mean y1hat y0hat Mean estimation Number of obs = 4,642 -------------------------------------------------------------- | Mean Std. Err. [95% Conf. Interval] -------------+------------------------------------------------ y1hat | 3167.568 1.353491 3164.914 3170.221 y0hat | 3406.643 2.051065 3402.622 3410.664 -------------------------------------------------------------- . lincom _b[y1hat] - _b[y0hat] ( 1) y1hat - y0hat = 0 ------------------------------------------------------------------------------ Mean | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- (1) | -239.0757 1.561876 -153.07 0.000 -242.1378 -236.0137 ------------------------------------------------------------------------------ . . * Bostrapping to get the SE and 95%CI for the ATE capture program drop ATE . program drop ATE . program define ATE, rclass 1. capture drop y1 2. capture drop y0 3. reg $Y $W if $A==1 4. predict double y1, xb 5. quiet sum y1 6. reg $Y $W if $A==0 7. predict double y0, xb 8. quiet sum y0 9. mean y1 y0 10. lincom _b[y1]-_b[y0] 11. return scalar ace =`r(estimate)' 12. end . qui bootstrap r(ace), reps(1000): ATE dots . estat boot, all Bootstrap results Number of obs = 4,642 Replications = 1000 command: ATE dots _bs_1: r(ace) ------------------------------------------------------------------------------ | Observed Bootstrap | Coef. Bias Std. Err. [95% Conf. Interval] -------------+---------------------------------------------------------------- _bs_1 | -239.07573 .9380744 24.832592 -287.7467 -190.4047 (N) | -289.9457 -192.0624 (P) | -293.7208 -193.5329 (BC) ------------------------------------------------------------------------------ (N) normal confidence interval (P) percentile confidence interval (BC) bias-corrected confidence interval . . // Possibilities of G-compuation in Stata . * Potential Outcomes . teffects ra ($Y $W) ($A), aequations Iteration 0: EE criterion = 3.484e-23 Iteration 1: EE criterion = 7.862e-26 Treatment-effects estimation Number of obs = 4,642 Estimator : regression adjustment Outcome model : linear Treatment model: none ---------------------------------------------------------------------------------------- | Robust bweight | Coef. Std. Err. z P>|z| [95% Conf. Interval] -----------------------+---------------------------------------------------------------- ATE | mbsmoke | (smoker vs nonsmoker) | -239.0757 25.1374 -9.51 0.000 -288.3441 -189.8073 -----------------------+---------------------------------------------------------------- POmean | mbsmoke | nonsmoker | 3406.643 9.442598 360.77 0.000 3388.136 3425.151 -----------------------+---------------------------------------------------------------- OME0 | mmarried | 61.97948 27.73188 2.23 0.025 7.625998 116.333 prenatal1 | 30.83669 26.75963 1.15 0.249 -21.61122 83.2846 fbaby | -90.14129 19.92916 -4.52 0.000 -129.2017 -51.08085 medu | 4.654846 3.838122 1.21 0.225 -2.867736 12.17743 mage | 1.863902 2.159886 0.86 0.388 -2.369396 6.097201 foreign | -22.88371 39.73427 -0.58 0.565 -100.7614 54.99404 mrace | 291.0515 33.20142 8.77 0.000 225.978 356.1251 _cons | 3026.142 62.04826 48.77 0.000 2904.529 3147.754 -----------------------+---------------------------------------------------------------- OME1 | mmarried | 88.85613 42.15209 2.11 0.035 6.23954 171.4727 prenatal1 | -5.674673 41.00251 -0.14 0.890 -86.03812 74.68877 fbaby | 24.82468 39.91751 0.62 0.534 -53.4122 103.0616 medu | 8.144071 7.750734 1.05 0.293 -7.047089 23.33523 mage | -7.128437 4.210991 -1.69 0.090 -15.38183 1.124954 foreign | 158.5796 114.0745 1.39 0.164 -65.00235 382.1615 mrace | 168.8261 54.39902 3.10 0.002 62.20603 275.4463 _cons | 3034.277 126.0642 24.07 0.000 2787.195 3281.358 ---------------------------------------------------------------------------------------- . * Relative Risk . teffects ra ($Y $W) ($A), coeflegend Iteration 0: EE criterion = 3.484e-23 Iteration 1: EE criterion = 7.862e-26 Treatment-effects estimation Number of obs = 4,642 Estimator : regression adjustment Outcome model : linear Treatment model: none ---------------------------------------------------------------------------------------- bweight | Coef. Legend -----------------------+---------------------------------------------------------------- ATE | mbsmoke | (smoker vs nonsmoker) | -239.0757 _b[ATE:r1vs0.mbsmoke] -----------------------+---------------------------------------------------------------- POmean | mbsmoke | nonsmoker | 3406.643 _b[POmean:0.mbsmoke] ---------------------------------------------------------------------------------------- . nlcom 100*_b[ATE:r1vs0.mbsmoke]/_b[POmean:0.mbsmoke] _nl_1: 100*_b[ATE:r1vs0.mbsmoke]/_b[POmean:0.mbsmoke] ------------------------------------------------------------------------------ bweight | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- _nl_1 | -7.017927 .7309724 -9.60 0.000 -8.450606 -5.585247 ------------------------------------------------------------------------------
. clear . use http://www.stata-press.com/data/r14/cattaneo2.dta (Excerpt from Cattaneo (2010) Journal of Econometrics 155: 138-154) . teffects ipw ($Y) ($A $W, logit), nolog vsquish Treatment-effects estimation Number of obs = 4,642 Estimator : inverse-probability weights Outcome model : weighted mean Treatment model: logit ---------------------------------------------------------------------------------------- | Robust bweight | Coef. Std. Err. z P>|z| [95% Conf. Interval] -----------------------+---------------------------------------------------------------- ATE | mbsmoke | (smoker vs nonsmoker) | -240.2212 24.82773 -9.68 0.000 -288.8826 -191.5597 -----------------------+---------------------------------------------------------------- POmean | mbsmoke | nonsmoker | 3408.235 9.440857 361.01 0.000 3389.732 3426.739 ---------------------------------------------------------------------------------------- . . * Model for the propensity score (note here the possibility of potential missepecification) . logit $A $W, vce(robust) nolog Logistic regression Number of obs = 4,642 Wald chi2(7) = 363.42 Prob > chi2 = 0.0000 Log pseudolikelihood = -2026.9688 Pseudo R2 = 0.0914 ------------------------------------------------------------------------------ | Robust mbsmoke | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- mmarried | -1.143264 .1041147 -10.98 0.000 -1.347325 -.9392025 prenatal1 | -.2891671 .0978237 -2.96 0.003 -.4808979 -.0974362 fbaby | -.5087053 .0904202 -5.63 0.000 -.6859256 -.331485 medu | -.1446855 .017873 -8.10 0.000 -.1797159 -.1096551 mage | .0019855 .0086558 0.23 0.819 -.0149795 .0189505 foreign | -1.192764 .2449577 -4.87 0.000 -1.672872 -.7126555 mrace | .5582083 .1190348 4.69 0.000 .3249045 .7915122 _cons | .9735576 .254319 3.83 0.000 .4751015 1.472014 ------------------------------------------------------------------------------ . predict double ps (option pr assumed; Pr(mbsmoke)) . . * Sempling weights based on Horvitz-Thompson estimator . generate double ipw1 = ($A==1)/ps . regress $Y [pw=ipw1] (sum of wgt is 4,506.74922423503) Linear regression Number of obs = 864 F(0, 863) = 0.00 Prob > F = . R-squared = 0.0000 Root MSE = 571.36 ------------------------------------------------------------------------------ | Robust bweight | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- _cons | 3168.014 23.35919 135.62 0.000 3122.167 3213.862 ------------------------------------------------------------------------------ . generate double ipw0 = ($A==0)/(1-ps) . regress $Y [pw=ipw0] (sum of wgt is 4,660.13481444053) Linear regression Number of obs = 3,778 F(0, 3777) = 0.00 Prob > F = . R-squared = 0.0000 Root MSE = 573.51 ------------------------------------------------------------------------------ | Robust bweight | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- _cons | 3408.235 9.523029 357.89 0.000 3389.565 3426.906 ------------------------------------------------------------------------------ . . * Bostrapping to get the SE and 95%CI for the ATE capture program drop ATE . program drop ATE . program define ATE, rclass 1. capture drop y1 2. capture drop y0 3. regress $Y [pw=ipw1] 4. matrix y1 = e(b) 5. gen double y1 = y1[1,1] 6. regress $Y [pw=ipw0] 7. matrix y0 = e(b) 8. gen double y0 = y0[1,1] 9. mean y1 y0 10. lincom _b[y1]-_b[y0] 11. return scalar ace = `r(estimate)' 12. end . qui bootstrap r(ace), reps(1000): ATE . estat boot, all Bootstrap results Number of obs = 4,642 Replications = 1000 command: ATE _bs_1: r(ace) ------------------------------------------------------------------------------ | Observed Bootstrap | Coef. Bias Std. Err. [95% Conf. Interval] -------------+---------------------------------------------------------------- _bs_1 | -240.22116 .2588013 24.989583 -289.1998 -191.2425 (N) | -287.8047 -192.4459 (P) | -287.6467 -192.4318 (BC) ------------------------------------------------------------------------------ (N) normal confidence interval (P) percentile confidence interval (BC) bias-corrected confidence interval . . // Test balance after adjustment . qui teffects ipw ($Y) ($A $W) . tebalance summarize Covariate balance summary Raw Weighted ----------------------------------------- Number of obs = 4,642 4,642.0 Treated obs = 864 2,282.2 Control obs = 3,778 2,359.8 ----------------------------------------- ----------------------------------------------------------------- |Standardized differences Variance ratio | Raw Weighted Raw Weighted ----------------+------------------------------------------------ mmarried | -.5953009 -.020219 1.335944 1.017057 prenatal1 | -.3242695 -.0186635 1.496155 1.0278 fbaby | -.1663271 .0217106 .9430944 1.005001 medu | -.5474357 -.1156001 .7315846 .5112843 mage | -.300179 -.0740731 .8818025 .7979432 foreign | -.1706164 -.034976 .4416089 .8643349 mrace | -.1029446 -.028183 1.198452 1.052116 -----------------------------------------------------------------
. qui: teffects ipw ($Y) ($A $W, logit), nolog vsquish . teffects overlap
. sort $A . by $A: summarize ps -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- -> mbsmoke = nonsmoker Variable | Obs Mean Std. Dev. Min Max -------------+--------------------------------------------------------- ps | 3,778 .169212 .1087 .0102683 .8271439 -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- -> mbsmoke = smoker Variable | Obs Mean Std. Dev. Min Max -------------+--------------------------------------------------------- ps | 864 .2600891 .1340289 .0358913 .7831799 . kdensity ps if $A==1, generate(x1pointsa d1A) nograph n(10000) (n() set to 4642) . kdensity ps if $A==0, generate(x0pointsa d0A) nograph n(10000) (n() set to 4642) . label variable d1A "density for mbsmoke=1" . label variable d0A "density for mbsmoke=0" . twoway (line d0A x0pointsa , yaxis(1))(line d1A x1pointsa, yaxis(2))
. clear . set more off . use http://www.stata-press.com/data/r14/cattaneo2.dta (Excerpt from Cattaneo (2010) Journal of Econometrics 155: 138-154) . teffects ipwra ($Y $W) ($A $W) Iteration 0: EE criterion = 1.485e-16 Iteration 1: EE criterion = 9.421e-25 Treatment-effects estimation Number of obs = 4,642 Estimator : IPW regression adjustment Outcome model : linear Treatment model: logit ---------------------------------------------------------------------------------------- | Robust bweight | Coef. Std. Err. z P>|z| [95% Conf. Interval] -----------------------+---------------------------------------------------------------- ATE | mbsmoke | (smoker vs nonsmoker) | -232.751 26.08798 -8.92 0.000 -283.8825 -181.6195 -----------------------+---------------------------------------------------------------- POmean | mbsmoke | nonsmoker | 3408.781 9.425782 361.64 0.000 3390.307 3427.255 ---------------------------------------------------------------------------------------- . nlcom 100*_b[r1vs0.mbsmoke]/_b[POmean:0.mbsmoke] _nl_1: 100*_b[r1vs0.mbsmoke]/_b[POmean:0.mbsmoke] ------------------------------------------------------------------------------ bweight | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- _nl_1 | -6.827984 .7588595 -9.00 0.000 -8.315321 -5.340647 ------------------------------------------------------------------------------ . . // Step One . * Inverse probability weights . logit $A $W Iteration 0: log likelihood = -2230.7484 Iteration 1: log likelihood = -2039.8238 Iteration 2: log likelihood = -2027.0399 Iteration 3: log likelihood = -2026.9688 Iteration 4: log likelihood = -2026.9688 Logistic regression Number of obs = 4,642 LR chi2(7) = 407.56 Prob > chi2 = 0.0000 Log likelihood = -2026.9688 Pseudo R2 = 0.0914 ------------------------------------------------------------------------------ mbsmoke | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- mmarried | -1.143264 .0984479 -11.61 0.000 -1.336218 -.9503091 prenatal1 | -.2891671 .0967775 -2.99 0.003 -.4788476 -.0994866 fbaby | -.5087053 .089107 -5.71 0.000 -.6833518 -.3340587 medu | -.1446855 .0178536 -8.10 0.000 -.1796779 -.1096932 mage | .0019855 .0085171 0.23 0.816 -.0147077 .0186787 foreign | -1.192764 .2434403 -4.90 0.000 -1.669898 -.7156296 mrace | .5582083 .1153203 4.84 0.000 .3321848 .7842319 _cons | .9735576 .2573016 3.78 0.000 .4692559 1.477859 ------------------------------------------------------------------------------ . predict double dps, pr . gen ipw = . (4,642 missing values generated) . replace ipw=($A==1)/dps if $A==1 (864 real changes made) . replace ipw=($A==0)/(1-dps) if $A==0 (3,778 real changes made) . sum ipw Variable | Obs Mean Std. Dev. Min Max -------------+--------------------------------------------------------- ipw | 4,642 1.97477 2.119163 1.010375 27.86188 . . * Stabilized weights . logit $A, vce(robust) nolog Logistic regression Number of obs = 4,642 Wald chi2(0) = . Prob > chi2 = . Log pseudolikelihood = -2230.7484 Pseudo R2 = 0.0000 ------------------------------------------------------------------------------ | Robust mbsmoke | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- _cons | -1.475377 .0377148 -39.12 0.000 -1.549297 -1.401458 ------------------------------------------------------------------------------ . predict double nps, pr . gen sws = . (4,642 missing values generated) . replace sws = nps/dps if $A==1 (864 real changes made) . replace sws = (1-nps)/(1-dps) if $A==0 (3,778 real changes made) . sum sws Variable | Obs Mean Std. Dev. Min Max -------------+--------------------------------------------------------- sws | 4,642 .9977565 .3239543 .2376551 5.185839 . . // Step Two . * GCOMP . reg $Y $W if $A==1 [pw=sws] (sum of wgt is 838.8262244164944) Linear regression Number of obs = 864 F(7, 856) = 3.88 Prob > F = 0.0004 R-squared = 0.0411 Root MSE = 561.78 ------------------------------------------------------------------------------ | Robust bweight | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- mmarried | 65.34198 45.13777 1.45 0.148 -23.25169 153.9356 prenatal1 | -7.480862 46.30099 -0.16 0.872 -98.35762 83.3959 fbaby | 40.47876 49.47901 0.82 0.414 -56.63562 137.5932 medu | 21.12795 11.91467 1.77 0.077 -2.257435 44.51334 mage | -8.216456 5.313706 -1.55 0.122 -18.64588 2.212963 foreign | 201.832 140.7535 1.43 0.152 -74.43046 478.0945 mrace | 211.021 65.76906 3.21 0.001 81.93349 340.1085 _cons | 2880.082 183.0892 15.73 0.000 2520.725 3239.438 ------------------------------------------------------------------------------ . predict double y1, xb . reg $Y $W if $A==0 [pw=sws] (sum of wgt is 3,792.75944876671) Linear regression Number of obs = 3,778 F(7, 3770) = 25.82 Prob > F = 0.0000 R-squared = 0.0607 Root MSE = 556.35 ------------------------------------------------------------------------------ | Robust bweight | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- mmarried | 54.60708 27.87333 1.96 0.050 -.0411953 109.2553 prenatal1 | 25.72218 27.33137 0.94 0.347 -27.86353 79.30789 fbaby | -89.76871 20.5927 -4.36 0.000 -130.1426 -49.3948 medu | 1.077116 3.674251 0.29 0.769 -6.126596 8.280828 mage | 3.183998 2.198838 1.45 0.148 -1.127029 7.495025 foreign | -32.60431 40.10512 -0.81 0.416 -111.2341 46.02552 mrace | 295.4244 33.41343 8.84 0.000 229.9142 360.9345 _cons | 3044.628 62.32354 48.85 0.000 2922.437 3166.819 ------------------------------------------------------------------------------ . predict double y0, xb . mean y1 y0 Mean estimation Number of obs = 4,642 -------------------------------------------------------------- | Mean Std. Err. [95% Conf. Interval] -------------+------------------------------------------------ y1 | 3176.03 1.698071 3172.701 3179.359 y0 | 3408.781 2.041572 3404.779 3412.784 -------------------------------------------------------------- . . *IPWRA (Problem SE) we need bootstrap . nlcom(_b[y1]-_b[y0]) _nl_1: _b[y1]-_b[y0] ------------------------------------------------------------------------------ Mean | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- _nl_1 | -232.751 1.757386 -132.44 0.000 -236.1954 -229.3066 ------------------------------------------------------------------------------ . . *Bostrapping SE ATE (Stabilized weights) . capture program drop ACE . program define ACE, rclass 1. capture drop y1 2. capture drop y0 3. reg $Y $W if $A==1 [pw=sws] 4. predict double y1, xb 5. quiet sum y1 6. return scalar y1=`r(mean)' 7. reg $Y $W if $A==0 [pw=sws] 8. predict double y0, xb 9. quiet sum y0 10. return scalar y0=`r(mean)' 11. mean y1 y0 12. lincom _b[y1]-_b[y0] 13. return scalar ace =`r(estimate)' 14. end . qui bootstrap r(ace), reps(1000): ACE . estat boot, all Bootstrap results Number of obs = 4,642 Replications = 1000 command: ACE _bs_1: r(ace) ------------------------------------------------------------------------------ | Observed Bootstrap | Coef. Bias Std. Err. [95% Conf. Interval] -------------+---------------------------------------------------------------- _bs_1 | -232.75103 -1.746588 26.592008 -284.8704 -180.6317 (N) | -287.9317 -181.1544 (P) | -284.0648 -178.2514 (BC) ------------------------------------------------------------------------------ (N) normal confidence interval (P) percentile confidence interval (BC) bias-corrected confidence interval . . teffects ipwra ($Y $W) ($A $W) Iteration 0: EE criterion = 1.485e-16 Iteration 1: EE criterion = 9.421e-25 Treatment-effects estimation Number of obs = 4,642 Estimator : IPW regression adjustment Outcome model : linear Treatment model: logit ---------------------------------------------------------------------------------------- | Robust bweight | Coef. Std. Err. z P>|z| [95% Conf. Interval] -----------------------+---------------------------------------------------------------- ATE | mbsmoke | (smoker vs nonsmoker) | -232.751 26.08798 -8.92 0.000 -283.8825 -181.6195 -----------------------+---------------------------------------------------------------- POmean | mbsmoke | nonsmoker | 3408.781 9.425782 361.64 0.000 3390.307 3427.255 ----------------------------------------------------------------------------------------
. // Stata AIPW implementation . clear . set more off . use http://www.stata-press.com/data/r14/cattaneo2.dta (Excerpt from Cattaneo (2010) Journal of Econometrics 155: 138-154) . teffects aipw ($Y $W) ($A $W, logit) Iteration 0: EE criterion = 1.485e-16 Iteration 1: EE criterion = 5.017e-25 Treatment-effects estimation Number of obs = 4,642 Estimator : augmented IPW Outcome model : linear by ML Treatment model: logit ---------------------------------------------------------------------------------------- | Robust bweight | Coef. Std. Err. z P>|z| [95% Conf. Interval] -----------------------+---------------------------------------------------------------- ATE | mbsmoke | (smoker vs nonsmoker) | -237.5415 25.66737 -9.25 0.000 -287.8486 -187.2344 -----------------------+---------------------------------------------------------------- POmean | mbsmoke | nonsmoker | 3409.018 9.421793 361.82 0.000 3390.552 3427.484 ---------------------------------------------------------------------------------------- . nlcom 100*_b[r1vs0.mbsmoke]/_b[POmean:0.mbsmoke] _nl_1: 100*_b[r1vs0.mbsmoke]/_b[POmean:0.mbsmoke] ------------------------------------------------------------------------------ bweight | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- _nl_1 | -6.968033 .7462114 -9.34 0.000 -8.430581 -5.505486 ------------------------------------------------------------------------------ . . // AIPTW implementation . qui sum $Y . gen double Y = ($Y -`r(min)')/(`r(max)' - `r(min)') . qui sum $Y . global cf = (`r(max)' - `r(min)') . * Step 1: prediction model for the outcome Q0 (g-computation) . qui glm Y $A $W, fam(bin) . predict double QAW, xb . qui glm Y $W if $A==1, fam(bin) . predict double Q1W, mu . qui glm Y $W if $A==0, fam(bin) . predict double Q0W, mu . . * Step 2: prediction model for the treatment g0 (IPTW) . * Inverse probability weights (Stabilized weights) . qui logit $A $W . predict double dps, pr . * Stabilized weights . qui logit $A . predict double nps, pr . gen sws = . (4,642 missing values generated) . replace sws = nps/dps if $A==1 (864 real changes made) . replace sws = (1-nps)/(1-dps) if $A==0 (3,778 real changes made) . sum sws Variable | Obs Mean Std. Dev. Min Max -------------+--------------------------------------------------------- sws | 4,642 .9977565 .3239543 .2376551 5.185839 . . * Step 3: Estimation Equation . gen double ATE = (sws*(Y-QAW) + (Q1W - Q0W)*$cf) . sum ATE Variable | Obs Mean Std. Dev. Min Max -------------+--------------------------------------------------------- ATE | 4,642 -238.8908 106.5089 -487.0504 195.8233 . * Marginal Structural Model . glm $Y $A [pw=sws],nolog Generalized linear models No. of obs = 4,642 Optimization : ML Residual df = 4,640 Scale parameter = 327735.3 Deviance = 1520691611 (1/df) Deviance = 327735.3 Pearson = 1520691611 (1/df) Pearson = 327735.3 Variance function: V(u) = 1 [Gaussian] Link function : g(u) = u [Identity] AIC = 15.50565 Log pseudolikelihood = -35986.62428 BIC = 1.52e+09 ------------------------------------------------------------------------------ | Robust bweight | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- mbsmoke | -240.2212 25.2155 -9.53 0.000 -289.6426 -190.7997 _cons | 3408.235 9.522794 357.90 0.000 3389.571 3426.9 ------------------------------------------------------------------------------
. drop ATE . capture program drop ACE . program define ACE, rclass 1. capture drop ATE 2. * Estimation Equation . gen double ATE = (sws*(Y-QAW) + (Q1W - Q0W)*$cf) 3. mean ATE 4. lincom _b[ATE] 5. return scalar ace =`r(estimate)' 6. end . qui bootstrap r(ace), reps(1000): ACE . estat boot, all Bootstrap results Number of obs = 4,642 Replications = 1000 command: ACE _bs_1: r(ace) ------------------------------------------------------------------------------ | Observed Bootstrap | Coef. Bias Std. Err. [95% Conf. Interval] -------------+---------------------------------------------------------------- _bs_1 | -238.89079 .0292854 1.5841525 -241.9957 -235.7859 (N) | -241.9429 -235.7569 (P) | -241.9064 -235.7392 (BC) ------------------------------------------------------------------------------ (N) normal confidence interval (P) percentile confidence interval (BC) bias-corrected confidence interval
. eltmle $Y $A $W, tmle D:\Dropbox\CAUSALITY\BOX\ShortCourseGranada\My-STATA-FILES . shell "C:\Program Files\R\R-3.3.2\bin\x64\R.exe" CMD BATCH SLS.R . end of do-file Variable | Obs Mean Std. Dev. Min Max -------------+--------------------------------------------------------- POM1 | 4,642 2839.505 111.259 2355.208 3111.317 POM0 | 4,642 3065.963 132.7883 2589.092 3202.195 WT | 4,642 -.044143 2.959656 -6.186227 32.51731 PS | 4,642 .1863172 .1229295 .025 .8383506 ACE (Additive Effect): -226.4587; Estimated Variance: 625.9183; p-value: 0.0000; 95%CI:( -275.49, -177.42) ACE (Risk Differences): -0.0439; SE: 0.00485; p-value: 0.0000; 95%CI:(-0.0534,-0.0344)
w1=Socieconomic status, w2=Age, w3=Cancer stage, w4=Comorbidities
Note: the interaction between age and comorbidities
. clear . set obs 1000 number of observations (_N) was 0, now 1,000 . set seed 777 . gen w1 = round(runiform(1, 5)) //Quintiles of Socioeconomic Deprivation . gen w2 = rbinomial(1, 0.45) //Binary: probability age >65 = 0.45 . gen w3 = round(runiform(0, 1) + 0.75*(w2) + 0.8*(w1)) //Stage . recode w3 (5/6=1) //Stage (TNM): categorical 4 levels (w3: 128 changes made) . gen w4 = round(runiform(0, 1) + 0.75*(w2) + 0.2*(w1)) //Comorbidites: categorical four levels . gen A = (rbinomial(1,invlogit(-1 - 0.15*(w4) + 1.5*(w2) + 0.75*(w3) + 0.25*(w1) + 0.8*(w2)*(w4)))) //Binary treatment . gen Y = (rbinomial(1,invlogit(-3 + A + 0.25*(w4) + 0.75*(w3) + 0.8*(w2)*(w4) + 0.05*(w1)))) //Binary outcome . // Potential outcomes . gen Yt1 = (invlogit(-3 + 1 + 0.25*(w4) + 0.75*(w3) + 0.8*(w2)*(w4) + 0.05*(w1))) . gen Yt0 = (invlogit(-3 + 0 + 0.25*(w4) + 0.75*(w3) + 0.8*(w2)*(w4) + 0.05*(w1))) . // True ATE . gen psi = Yt1-Yt0
TRUE SIMULATED ATE = 17.8%
. mean psi Mean estimation Number of obs = 1,000 -------------------------------------------------------------- | Mean Std. Err. [95% Conf. Interval] -------------+------------------------------------------------ psi | .1787412 .0020142 .1747886 .1826938 --------------------------------------------------------------
Note: We misspecified the models excluding the interaction between age and comorbidities
Regression adjustment ATE = 24.1%
. teffects ra (Y w1 w2 w3 w4) (A) Iteration 0: EE criterion = 8.701e-32 Iteration 1: EE criterion = 1.434e-32 Treatment-effects estimation Number of obs = 1,000 Estimator : regression adjustment Outcome model : linear Treatment model: none ------------------------------------------------------------------------------ | Robust Y | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- ATE | A | (1 vs 0) | .2410504 .1148066 2.10 0.036 .0160336 .4660672 -------------+---------------------------------------------------------------- POmean | A | 0 | .4739475 .1141366 4.15 0.000 .2502439 .697651 ------------------------------------------------------------------------------ . estimates store ra
Inverse probability weighting ATE = 37.9%
. teffects ipw (Y) (A w1 w2 w3 w4) Iteration 0: EE criterion = 3.522e-23 Iteration 1: EE criterion = 2.840e-33 Treatment-effects estimation Number of obs = 1,000 Estimator : inverse-probability weights Outcome model : weighted mean Treatment model: logit ------------------------------------------------------------------------------ | Robust Y | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- ATE | A | (1 vs 0) | .378604 .0807516 4.69 0.000 .2203337 .5368743 -------------+---------------------------------------------------------------- POmean | A | 0 | .3362695 .079327 4.24 0.000 .1807913 .4917476 ------------------------------------------------------------------------------ . estimates store ipw
IPTW-RA ATE = 31.9%
. teffects ipwra (Y w1 w2 w3 w4) (A w1 w2 w3 w4) Iteration 0: EE criterion = 3.522e-23 Iteration 1: EE criterion = 5.128e-33 Treatment-effects estimation Number of obs = 1,000 Estimator : IPW regression adjustment Outcome model : linear Treatment model: logit ------------------------------------------------------------------------------ | Robust Y | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- ATE | A | (1 vs 0) | .3188398 .0706036 4.52 0.000 .1804593 .4572202 -------------+---------------------------------------------------------------- POmean | A | 0 | .3958477 .0691749 5.72 0.000 .2602674 .5314281 ------------------------------------------------------------------------------ . estimates store ipwra
AIPTW ATE = 28.8%
. teffects aipw (Y w1 w2 w3 w4) (A w1 w2 w3 w4) Iteration 0: EE criterion = 3.522e-23 Iteration 1: EE criterion = 3.600e-33 Treatment-effects estimation Number of obs = 1,000 Estimator : augmented IPW Outcome model : linear by ML Treatment model: logit ------------------------------------------------------------------------------ | Robust Y | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- ATE | A | (1 vs 0) | .285314 .1152551 2.48 0.013 .0594183 .5112098 -------------+---------------------------------------------------------------- POmean | A | 0 | .4293636 .1144905 3.75 0.000 .2049664 .6537608 ------------------------------------------------------------------------------ . estimates store aipw
. quie reg psi . estimates store psi . estout psi ra ipw ipwra aipw ----------------------------------------------------------------------------- psi ra ipw ipwra aipw b b b b b ----------------------------------------------------------------------------- main r1vs0.A .2410504 .378604 .3188398 .285314 _cons .1787412 ----------------------------------------------------------------------------- POmean 0.A .4739475 .3362695 .3958477 .4293636 ----------------------------------------------------------------------------- OME0 w1 .0395478 .1119125 .0395478 w2 .1770299 -.098543 .1770299 w3 .1715455 .0389883 .1715455 w4 .1019039 .2425232 .1019039 _cons -.3331923 -.3427923 -.3331923 ----------------------------------------------------------------------------- OME1 w1 .0100658 .0071447 .0100658 w2 .1529128 .1510302 .1529128 w3 .0867786 .0859272 .0867786 w4 .1157346 .1193923 .1157346 _cons .2148979 .2211646 .2148979 ----------------------------------------------------------------------------- TME1 w1 .200913 .200913 .200913 w2 3.383105 3.383105 3.383105 w3 .6876172 .6876172 .6876172 w4 -.1261434 -.1261434 -.1261434 _cons -.85242 -.85242 -.85242 -----------------------------------------------------------------------------
Ensemble Learning Maximum Likelihood Estimation (ELTMLE) ATE = 22.1%
. eltmle Y A w1 w2 w3 w4, tmle D:\Dropbox\CAUSALITY\BOX\ShortCourseGranada\My-STATA-FILES . shell "C:\Program Files\R\R-3.3.2\bin\x64\R.exe" CMD BATCH SLS.R . end of do-file Variable | Obs Mean Std. Dev. Min Max -------------+--------------------------------------------------------- POM1 | 1,000 .7100199 .187985 .20515 .9727612 POM0 | 1,000 .4891232 .255392 .0454817 .9379262 WT | 1,000 .2563642 3.599528 -40 2.046998 PS | 1,000 .86845 .1383139 .4885203 .975 TMLE: Average Causal Effect ACE (Risk Differences): 0.2209; SE: 0.05458; p-value: 0.0002; 95%CI:(0.1139, 0.3279) TMLE: Causal Relative Risk RR: 1.4516; 95%CI:(1.1314, 1.8624)
RELATIVE BIAS IPTW
. display (0.38 - 0.18)/.38 .52631579
RELATIVE BIAS AIPTW
. display (0.28 - 0.18)/0.28 .35714286
RELATIVE BIAS ELTMLE
. display (0.22 - 0.18)/0.22 .18181818
. *! version 2.2.1 30.Jun.2017 . *! ELTMLE: Stata module for Ensemble Learning Targeted Maximum Likelihood Estimation . *! by Miguel Angel Luque-Fernandez [cre,aut] . *! Bug reports: . *! miguel-angel.luque at lshtm.ac.uk . . /* > Copyright (c) 2017 <Miguel Angel Luque-Fernandez> > > Permission is hereby granted, free of charge, to any person obtaining a copy > of this software and associated documentation files (the "Software"), to deal > in the Software without restriction, including without limitation the rights > to use, copy, modify, merge, publish, distribute, sublicense, and/or sell > copies of the Software, and to permit persons to whom the Software is > furnished to do so, subject to the following conditions: > > The above copyright notice and this permission notice shall be included in > all copies or substantial portions of the Software. > > THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR > IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, > FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE > AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER > LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, > OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN > THE SOFTWARE. > */ . . *************************************************************************** . ** MIGUEL ANGEL LUQUE FERNANDEZ . ** mluquefe@hsph.havard.edu // miguel-angel.luque@lshtm.ac.uk . ** TMLE ALGORITHM IMPLEMENTATION IN STATA FOR BINARY OR CONTINUOUS . ** OUTCOME AND BINARY TREATMENT FOR WINDOWS USERS . ** Improved AIPTW with Super Learner (Ensemble Learning) . ** This program requires R to be installed in your computer . ** JUNE 2017 . **************************************************************************** . . * Improved Imfluence curve estimation for the causal relative risk . * Improved display including potential outcomes and weights to check balance . * Included estimation for continuous outcomes . * Added additive causal effects for continuous outcome . * Fixed risk difference and causal relative risk for SLAIPW and SLAIPWBGAM . * Checked consistency of confidence intervals for RR . . capture program drop eltmle . program define eltmle 1. syntax varlist(min=3) [if] [pw] [, slaipw slaipwbgam tmle tmlebgam] 2. version 13.2 3. marksample touse 4. local var `varlist' if `touse' 5. tokenize `var' 6. local yvar = "`1'" 7. global flag = cond(`yvar'<=1,1,0) 8. qui sum `yvar' 9. global b = `r(max)' 10. global a = `r(min)' 11. qui replace `yvar' = (`yvar' - `r(min)') / (`r(max)' - `r(min)') if `yvar'>1 12. local dir `c(pwd)' 13. cd "`dir'" 14. qui export delimited `var' using "data.csv", nolabel replace 15. if "`slaipw'" == "" & "`slaipwbgam'" == "" & "`tmlebgam'" == "" { 16. tmle `varlist' 17. } 18. else if "`tmlebgam'" == "tmlebgam" { 19. tmlebgam `varlist' 20. } 21. else if "`slaipw'" == "slaipw" { 22. slaipw `varlist' 23. } 24. else if "`slaipwbgam'" == "slaipwbgam" { 25. slaipwbgam `varlist' 26. } 27. end . . program tmle 1. // Write R Code dependencies: foreign Surperlearner . set more off 2. qui: file close _all 3. qui: file open rcode using SLS.R, write replace 4. qui: file write rcode /// > `"set.seed(123)"' _newline /// > `"list.of.packages <- c("foreign","SuperLearner")"' _newline /// > `"new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[,"Package"])]"' _newline /// > `"if(length(new.packages)) install.packages(new.packages, repos='http://cran.us.r-project.org')"' _newline /// > `"library(SuperLearner)"' _newline /// > `"library(foreign)"' _newline /// > `"data <- read.csv("data.csv", sep=",")"' _newline /// > `"attach(data)"' _newline /// > `"SL.library <- c("SL.glm","SL.step","SL.glm.interaction")"' _newline /// > `"n <- nrow(data)"' _newline /// > `"nvar <- dim(data)[[2]]"' _newline /// > `"Y <- data[,1]"' _newline /// > `"A <- data[,2]"' _newline /// > `"X <- data[,2:nvar]"' _newline /// > `"W <- data[,3:nvar]"' _newline /// > `"X1 <- X0 <- X"' _newline /// > `"X1[,1] <- 1"' _newline /// > `"X0[,1] <- 0"' _newline /// > `"newdata <- rbind(X,X1,X0)"' _newline /// > `"Q <- try(SuperLearner(Y = data[,1] ,X = X, SL.library=SL.library, family=binomial(), newX=newdata, method="method.NNLS"), silent=TRUE)"' _newline /// > `"Q <- as.data.frame(Q[[4]])"' _newline /// > `"QAW <- Q[1:n,]"' _newline /// > `"Q1W <- Q[((n+1):(2*n)),]"' _newline /// > `"Q0W <- Q[((2*n+1):(3*n)),]"' _newline /// > `"g <- suppressWarnings(SuperLearner(Y = data[,2], X = W, SL.library = SL.library, family = binomial(), method = "method.NNLS"))"' _newline /// > `"ps <- g[[4]]"' _newline /// > `"ps[ps<0.025] <- 0.025"' _newline /// > `"ps[ps>0.975] <- 0.975"' _newline /// > `"data <- cbind(data,QAW,Q1W,Q0W,ps,Y,A)"' _newline /// > `"write.dta(data, "data2.dta")"' 5. qui: file close rcode 6. . // Write bacth file to find R.exe path and R version . set more off 7. qui: file close _all 8. qui: file open bat using setup.bat, write replace 9. qui: file write bat /// > `"@echo off"' _newline /// > `"SET PATHROOT=C:\Program Files\R\"' _newline /// > `"echo Locating path of R..."' _newline /// > `"echo."' _newline /// > `"if not exist "%PATHROOT%" goto:NO_R"' _newline /// > `"for /f "delims=" %%r in (' dir /b "%PATHROOT%R*" ') do ("' _newline /// > `"echo Found %%r"' _newline /// > `"echo shell "%PATHROOT%%%r\bin\x64\R.exe" CMD BATCH SLS.R > runr.do"' _newline /// > `"echo All set!"' _newline /// > `"goto:DONE"' _newline /// > `")"' _newline /// > `":NO_R"' _newline /// > `"echo R is not installed in your system."' _newline /// > `"echo."' _newline /// > `"echo Download it from https://cran.r-project.org/bin/windows/base/"' _newline /// > `"echo Install it and re-run this script"' _newline /// > `":DONE"' _newline /// > `"echo."' _newline /// > `"pause"' 10. qui: file close bat 11. . //Run batch . shell setup.bat 12. //Run R . do runr.do 13. . // Read Revised Data Back to Stata . clear 14. quietly: use "data2.dta", clear 15. . // Q to logit scale . gen logQAW = log(QAW / (1 - QAW)) 16. gen logQ1W = log(Q1W / (1 - Q1W)) 17. gen logQ0W = log(Q0W / (1 - Q0W)) 18. . // Clever covariate HAW . gen double HAW = (A / ps) - ((1 - A) / (1 - ps)) 19. gen double H1W = 1 / ps 20. gen double H0W = -1 / (1 - ps) 21. . // Estimation of the substitution parameter (Epsilon) . qui glm Y HAW, fam(binomial) offset(logQAW) robust noconstant 22. mat a= e(b) 23. gen double epsilon = a[1,1] 24. . // Targeted ATE, update from Q̅^0 (A,W) to Q̅^1 (A,W) . . gen double Qstar = exp(HAW*epsilon + logQAW)/(1 + exp(HAW*epsilon + logQAW)) 25. gen double Q0star = exp(H0W*epsilon + logQ0W)/(1 + exp(H0W*epsilon + logQ0W)) 26. gen double Q1star = exp(H1W*epsilon + logQ1W)/(1 + exp(H1W*epsilon + logQ1W)) 27. . global cf = $b - $a 28. . gen double POM1 = cond($flag==1,Q1star,Q1star*($cf),.) 29. gen double POM0 = cond($flag==1,Q0star,Q0star*($cf),.) 30. gen double PO = cond($flag==1,Qstar,Qstar*($cf),.) 31. . gen WT = HAW 32. gen PS = ps 33. summ POM1 POM0 WT PS 34. . // Estimating the updated targeted ATE binary outcome . gen double ATE = cond($flag==1,(Q1star - Q0star),(Q1star - Q0star)*($cf),.) 35. qui sum ATE 36. global ATEtmle = r(mean) 37. . // ATE continuous outcome . gen double ATEci = (Q1star - Q0star) 38. qui sum ATEci 39. global ATEci = r(mean) 40. . // Relative risk . qui sum Q1star 41. global Q1 = r(mean) 42. qui sum Q0star 43. global Q0 = r(mean) 44. global RRtmle = $Q1/$Q0 45. . // Statistical inference ATE . gen double IC = cond($flag==1,HAW*(Y - Qstar) + (Q1star - Q0star) - $ATEci,(HAW*(Y - Qstar) + (Q1star - Q0star) - $ATEci)*$cf,.) 46. . qui sum IC 47. global var = r(Var) 48. qui count 49. global n = r(N) 50. global varICtmle = $var/$n 51. . global pvalue = cond($flag==1,2*(normalden(abs($ATEci/sqrt($varICtmle)))),2*(normalden(abs($ATEtmle/sqrt($varICtmle)))),.) 52. . global LCIa = cond($flag==1,$ATEci -1.96*sqrt($varICtmle),$ATEtmle -1.96*sqrt($varICtmle),.) 53. global UCIa = cond($flag==1,$ATEci +1.96*sqrt($varICtmle),$ATEtmle +1.96*sqrt($varICtmle),.) 54. . // Statistical inference RR . gen double ICrr = 1/Q1star*((A/ps)*(Y - QAW) + (Q1W) - Q1star) - 1/Q0star*((1-A)/(1-ps)*(Y - QAW) + (Q0W) - Q0star) 55. qui sum ICrr 56. global varr = r(Var) 57. global varICrr = $varr/$n 58. . global LCIr = exp(log($RRtmle) - 1.96*sqrt($varICrr)) 59. global UCIr = exp(log($RRtmle) + 1.96*sqrt($varICrr)) 60. . // IC for Additive Risk differences . gen double ICrd = HAW*(Y - Qstar) + (Q1star - Q0star) - $ATEci 61. qui sum ICrd 62. global varrd = r(Var) 63. global varICrd = $varrd/$n 64. global LCIard = $ATEci -1.96*sqrt($varICrd) 65. global UCIard = $ATEci +1.96*sqrt($varICrd) 66. . // Display Results . local bin ""ACE (Risk Differences):" %10.4f $ATEtmle _col(5) "; SE:" %10.5f sqrt($varICtmle) _col(5) "; p-value:" %7.4f $pvalue _col(5) "; 95%CI:(" %5.4f $LCIa "," %7.4f $UCIa ")"" 67. local cont ""ACE (Additive Effect):" %10.4f $ATEtmle _col(5) "; Estimated Variance:" %10.4f $varICtmle _col(5) "; p-value:" %7.4f $pvalue _col(5) "; 95%CI:(" %8.2f $LCIa "," %9.2f $UCIa " > )"" 68. local contrd ""ACE (Risk Differences):" %10.4f $ATEci _col(5) "; SE:" %10.5f sqrt($varICrd) _col(5) "; p-value:" %7.4f $pvalue _col(5) "; 95%CI:(" %5.4f $LCIard "," %7.4f $UCIard ")"" 69. . if $flag==1 { 70. di _newline 71. di "TMLE: Average Causal Effect" _newline 72. di `bin' 73. } 74. else if $flag!=1{ 75. di _newline 76. di `cont' 77. di _newline 78. di `contrd' 79. } 80. . local rrbin ""RR:" %9.4f $RRtmle _col(5) "; 95%CI:(" %5.4f $LCIr "," %7.4f $UCIr ")"" 81. if $flag==1 { 82. di _newline 83. di "TMLE: Causal Relative Risk" _newline 84. display `rrbin' 85. } 86. else if $flag!=1{ 87. } 88. . drop ATEci ICrr ICrd logQAW logQ1W logQ0W HAW H1W H0W QAW Q1W Q0W Qstar Q1star Q0star ps Y A epsilon 89. . label var POM1 "Potential Outcome Y(1)" 90. label var POM0 "Potential Otucome Y(0)" 91. label var ATE "Average Treatment Effect" 92. label var WT "Inverse Probability Treatment Weights" 93. label var IC "Variance ATE" 94. label var PS "Propensity Score" 95. . // Clean up . quietly: rm SLS.R 96. //quietly: rm SLS.Rout . quietly: rm data2.dta 97. quietly: rm data.csv 98. quietly: rm .RData 99. quietly: rm runr.do 100. quietly: rm setup.bat 101. end