CIM.html

CIM-METHODS

TUTORIAL: CAUSAL INFERENCE METHODS MADE EASY FOR APPLIED RESEARCHERS/EPIDEMIOLOGISTS/STATISTICIANS

CSG-LSHTM, LONDON, 6th December 2017

Miguel Angel Luque Fernandez, PhD
Assistant Professor of Epidemiology
Cancer Survival Group, LSHTM, London, UK

Copyright (c) 2017 Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON INFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Bug reports: miguel-angel.luque at lshtm.ac.uk

Content

I) DATA: Nomenclature (Globals and Naive approach)

II) Non-parametric G-Formula

III) Parametric G-Formula (G-Computation)

IV) Inverse Probability Weigthing

V) Double Robust Inverse Probability Weighthing plus Regression Adjustment

VI) Double Robust Augemented Inverse Probability Weighthing plus Regression Adjustment

VII) Double Robust Targeted Maximum Likelihood Estimation

VIII) Conclusion: simulation

IX) Appendix: ELTMLE programme

I) DATA: Nomenclature (Globals and Naive approach)

. clear 

. set more off

. use http://www.stata-press.com/data/r14/cattaneo2.dta
(Excerpt from Cattaneo (2010) Journal of Econometrics 155: 138-154)

. describe

Contains data from http://www.stata-press.com/data/r14/cattaneo2.dta
  obs:         4,642                          Excerpt from Cattaneo (2010) Journal of Econometrics 155: 138-154
 vars:            23                          14 Jan 2015 09:49
 size:       116,050                          
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
bweight         int     %9.0g                 infant birthweight (grams)
mmarried        byte    %10.0g     mmarried   1 if mother married
mhisp           byte    %9.0g                 1 if mother hispanic
fhisp           byte    %9.0g                 1 if father hispanic
foreign         byte    %9.0g                 1 if mother born abroad
alcohol         byte    %9.0g                 1 if alcohol consumed during pregnancy
deadkids        byte    %9.0g                 previous births where newborn died
mage            byte    %9.0g                 mother's age
medu            byte    %9.0g                 mother's education attainment
fage            byte    %9.0g                 father's age
fedu            byte    %9.0g                 father's education attainment
nprenatal       byte    %9.0g                 number of prenatal care visits
monthslb        int     %9.0g                 months since last birth
order           byte    %9.0g                 order of birth of the infant
msmoke          byte    %27.0g     smoke2     cigarettes smoked during pregnancy
mbsmoke         byte    %9.0g      mbsmoke    1 if mother smoked
mrace           byte    %9.0g                 1 if mother is white
frace           byte    %9.0g                 1 if father is white
prenatal        byte    %9.0g                 trimester of first prenatal care visit
birthmonth      byte    %9.0g                 month of birth
lbweight        byte    %9.0g                 1 if low birthweight baby
fbaby           byte    %9.0g      YesNo      1 if first baby
prenatal1       byte    %9.0g      YesNo      1 if first prenatal visit in 1 trimester
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sorted by: 

. global Y bweight // Outcome

. global A mbsmoke // Exposure or treatment

. global C mmarried  // Confounder 

. global W mmarried prenatal1 fbaby medu mage foreign mrace // Confounders

. regr $Y $A $C // Naive approach

      Source |       SS           df       MS      Number of obs   =     4,642
-------------+----------------------------------   F(2, 4639)      =    132.55
       Model |  84050074.2         2  42025037.1   Prob > F        =    0.0000
    Residual |  1.4708e+09     4,639  317058.453   R-squared       =    0.0541
-------------+----------------------------------   Adj R-squared   =    0.0536
       Total |  1.5549e+09     4,641  335032.156   Root MSE        =    563.08

------------------------------------------------------------------------------
     bweight |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     mbsmoke |  -224.4212   21.85195   -10.27   0.000    -267.2614    -181.581
    mmarried |    182.794   18.55405     9.85   0.000     146.4193    219.1688
       _cons |    3275.55   16.68283   196.34   0.000     3242.844    3308.256
------------------------------------------------------------------------------

IIa) Non Parametric G-FORMULA by hand: one confounder

ATE: First compute the marginal probability for confounder C

. proportion $C

Proportion estimation             Number of obs   =      4,642

--------------------------------------------------------------
             |                                   Logit
             | Proportion   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
mmarried     |
  notmarried |   .3003016   .0067287      .2872778     .313656
     married |   .6996984   .0067287       .686344    .7127222
--------------------------------------------------------------

. matrix m=e(b)

. gen mmarried1 = m[1,2]

. sum mmarried1

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
   mmarried1 |      4,642    .6996984           0   .6996984   .6996984

. gen mmarried2 = m[1,1]

. sum mmarried2

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
   mmarried2 |      4,642    .3003016           0   .3003016   .3003016

ATE: Second compute the G-Formula by hand (generalization of standardization)

. sumup $Y, by($A $C)

mbsmoke mmarried |       Obs    Missing       Mean  Std. Dev.        Min        Max 
-----------------+------------------------------------------------------------------
nonsmok notmarri |        939          0   3258.925   635.8025        454       4933
nonsmok married  |       2839          0   3463.843   537.9523        340       5500
smoker  notmarri |        455          0   3085.437   548.4611        397       5018
smoker  married  |        409          0   3195.756    569.468        482       4734
-----------------+------------------------------------------------------------------
 Total           |       4642          0    3361.68   578.8196        340       5500
------------------------------------------------------------------------------------

. matrix y00 = r(Stat1)

. matrix y01 = r(Stat2)

. matrix y10 = r(Stat3)

. matrix y11 = r(Stat4)

. gen ATE = ((y11[3,1]-y01[3,1]))*mmarried1 + ((y10[3,1]-y00[3,1]))*mmarried2

. qui: sum ATE

. display "The ATE is:  " `r(mean)'
The ATE is:  -239.67882

. drop ATE 

. *Check Stata implementation results
. teffects ra ($Y $C) ($A)

Iteration 0:   EE criterion =  1.549e-24  
Iteration 1:   EE criterion =  5.234e-26  

Treatment-effects estimation                    Number of obs     =      4,642
Estimator      : regression adjustment
Outcome model  : linear
Treatment model: none
----------------------------------------------------------------------------------------
                       |               Robust
               bweight |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-----------------------+----------------------------------------------------------------
ATE                    |
               mbsmoke |
(smoker vs nonsmoker)  |  -239.6788   23.14732   -10.35   0.000    -285.0467   -194.3109
-----------------------+----------------------------------------------------------------
POmean                 |
               mbsmoke |
            nonsmoker  |   3402.306    9.51684   357.50   0.000     3383.653    3420.958
----------------------------------------------------------------------------------------

How about identififiability: ATET

. proportion $C if $A==1

Proportion estimation             Number of obs   =        864

--------------------------------------------------------------
             |                                   Logit
             | Proportion   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
mmarried     |
  notmarried |   .5266204   .0169961      .4931927    .5598111
     married |   .4733796   .0169961      .4401889    .5068073
--------------------------------------------------------------

. matrix m=e(b)

. gen mmarried1atet = m[1,2]

. gen mmarried2atet = m[1,1]

. gen ATT = ((y11[3,1]-y01[3,1]))*mmarried1atet + ((y10[3,1]-y00[3,1]))*mmarried2atet

. qui: sum ATT 

. display "The ATT is:  " `r(mean)'
The ATT is:  -218.26932

. *Check restuls Stata Implementation
. teffects ra ($Y $C) ($A), atet

Iteration 0:   EE criterion =  1.503e-24  
Iteration 1:   EE criterion =  3.239e-26  

Treatment-effects estimation                    Number of obs     =      4,642
Estimator      : regression adjustment
Outcome model  : linear
Treatment model: none
----------------------------------------------------------------------------------------
                       |               Robust
               bweight |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-----------------------+----------------------------------------------------------------
ATET                   |
               mbsmoke |
(smoker vs nonsmoker)  |  -218.2693    22.4693    -9.71   0.000    -262.3083   -174.2303
-----------------------+----------------------------------------------------------------
POmean                 |
               mbsmoke |
            nonsmoker  |   3355.929   12.41832   270.24   0.000      3331.59    3380.268
----------------------------------------------------------------------------------------

. drop ATT

Non-parametric G-Formula: Bostrapping for the ATE SE and 95%CI

. program drop ATE

. program define ATE, rclass
  1. capture drop y1
  2. capture drop y0
  3. capture drop ATE
  4. sumup $Y, by($A $C)
  5. matrix y00 = r(Stat1)
  6. matrix y01 = r(Stat2)
  7. matrix y10 = r(Stat3)
  8. matrix y11 = r(Stat4)
  9. gen ATE = ((y11[3,1]-y01[3,1]))*mmarried1 + ((y10[3,1]-y00[3,1]))*mmarried2
 10. qui sum ATE
 11. return scalar ate = `r(mean)'
 12. end

. qui bootstrap r(ate), reps(1000): ATE

. estat boot, all

Bootstrap results                               Number of obs     =      4,642
                                                Replications      =       1000

      command:  ATE
        _bs_1:  r(ate)

------------------------------------------------------------------------------
             |    Observed               Bootstrap
             |       Coef.       Bias    Std. Err.  [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _bs_1 |  -239.67882     .56238    23.21251   -285.1745  -194.1831   (N)
             |                                      -287.5401  -193.4211   (P)
             |                                      -291.1595  -197.7737  (BC)
------------------------------------------------------------------------------
(N)    normal confidence interval
(P)    percentile confidence interval
(BC)   bias-corrected confidence interval

. drop ATE

Non-parametric G-Formula: Bostrapping for the ATET SE and 95%CI

. program drop ATE

. program define ATE, rclass
  1. capture drop y1
  2. capture drop y0
  3. capture drop ATT
  4. sumup $Y, by($A $C)
  5. matrix y00 = r(Stat1)
  6. matrix y01 = r(Stat2)
  7. matrix y10 = r(Stat3)
  8. matrix y11 = r(Stat4)
  9. gen ATT = ((y11[3,1]-y01[3,1]))*mmarried1atet + ((y10[3,1]-y00[3,1]))*mmarried2atet
 10. qui sum ATT
 11. return scalar atet = `r(mean)'
 12. end

. qui bootstrap r(atet), reps(1000): ATE

. estat boot, all

Bootstrap results                               Number of obs     =      4,642
                                                Replications      =       1000

      command:  ATE
        _bs_1:  r(atet)

------------------------------------------------------------------------------
             |    Observed               Bootstrap
             |       Coef.       Bias    Std. Err.  [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _bs_1 |  -218.26932   .1785775   21.729215   -260.8578  -175.6808   (N)
             |                                      -261.1092  -177.6079   (P)
             |                                      -261.2072  -177.6395  (BC)
------------------------------------------------------------------------------
(N)    normal confidence interval
(P)    percentile confidence interval
(BC)   bias-corrected confidence interval

. drop ATT

IIb) Non-parametric implementation of the G-Formula using a fully saturated regression model

. regress $Y ibn.$A ibn.$A#c.($C), noconstant vce(robust)

Linear regression                               Number of obs     =      4,642
                                                F(4, 4638)        =   42409.94
                                                Prob > F          =     0.0000
                                                R-squared         =     0.9728
                                                Root MSE          =     562.86

------------------------------------------------------------------------------------
                   |               Robust
           bweight |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------------+----------------------------------------------------------------
           mbsmoke |
        nonsmoker  |   3258.925   20.74652   157.08   0.000     3218.252    3299.599
           smoker  |   3085.437   25.69505   120.08   0.000     3035.063    3135.812
                   |
mbsmoke#c.mmarried |
        nonsmoker  |   204.9171    23.0739     8.88   0.000     159.6813    250.1529
           smoker  |   110.3181   38.10346     2.90   0.004     35.61724     185.019
------------------------------------------------------------------------------------

. margins $A, vce(unconditional) // Marginal probabilty for C

Predictive margins                              Number of obs     =      4,642

Expression   : Linear prediction, predict()

------------------------------------------------------------------------------
             |            Unconditional
             |     Margin   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     mbsmoke |
  nonsmoker  |   3402.306   9.520943   357.35   0.000      3383.64    3420.971
     smoker  |   3162.627   21.15799   149.48   0.000     3121.147    3204.107
------------------------------------------------------------------------------

. margins r.$A, contrast(nowald) // Check R meaning ....

Contrasts of predictive margins
Model VCE    : Robust

Expression   : Linear prediction, predict()

------------------------------------------------------------------------
                       |            Delta-method
                       |   Contrast   Std. Err.     [95% Conf. Interval]
-----------------------+------------------------------------------------
               mbsmoke |
(smoker vs nonsmoker)  |  -239.6788   23.14854      -285.061   -194.2967
------------------------------------------------------------------------

Note about margins Uncoditional

. regress $Y ibn.$A ibn.$A#c.($C), noconstant vce(robust) coeflegend

Linear regression                               Number of obs     =      4,642
                                                F(4, 4638)        =   42409.94
                                                Prob > F          =     0.0000
                                                R-squared         =     0.9728
                                                Root MSE          =     562.86

------------------------------------------------------------------------------------
           bweight |      Coef.  Legend
-------------------+----------------------------------------------------------------
           mbsmoke |
        nonsmoker  |   3258.925  _b[0bn.mbsmoke]
           smoker  |   3085.437  _b[1.mbsmoke]
                   |
mbsmoke#c.mmarried |
        nonsmoker  |   204.9171  _b[0bn.mbsmoke#c.mmarried]
           smoker  |   110.3181  _b[1.mbsmoke#c.mmarried]
------------------------------------------------------------------------------------

. predictnl ATE = (_b[1bn.mbsmoke] + _b[1bn.mbsmoke#c.mmarried]*mmarried) - (_b[0bn.mbsmoke] + _b[0bn.mbsmoke#c.mmarried]*mmarried)

. qui: sum ATE

. display "The ATE is:  " `r(mean)'
The ATE is:  -239.67882

. drop ATE

III) G-Computation or parametric G-Formula

. teffects ra ($Y $C) ($A) //Parametric G-Formula implementation in Stata

Iteration 0:   EE criterion =  1.549e-24  
Iteration 1:   EE criterion =  5.234e-26  

Treatment-effects estimation                    Number of obs     =      4,642
Estimator      : regression adjustment
Outcome model  : linear
Treatment model: none
----------------------------------------------------------------------------------------
                       |               Robust
               bweight |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-----------------------+----------------------------------------------------------------
ATE                    |
               mbsmoke |
(smoker vs nonsmoker)  |  -239.6788   23.14732   -10.35   0.000    -285.0467   -194.3109
-----------------------+----------------------------------------------------------------
POmean                 |
               mbsmoke |
            nonsmoker  |   3402.306    9.51684   357.50   0.000     3383.653    3420.958
----------------------------------------------------------------------------------------

. 
. regress $Y $C if $A==1

      Source |       SS           df       MS      Number of obs   =       864
-------------+----------------------------------   F(1, 862)       =      8.40
       Model |  2621288.44         1  2621288.44   Prob > F        =    0.0038
    Residual |   268879388       862  311925.044   R-squared       =    0.0097
-------------+----------------------------------   Adj R-squared   =    0.0085
       Total |   271500676       863  314601.015   Root MSE        =     558.5

------------------------------------------------------------------------------
     bweight |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    mmarried |   110.3181   38.05526     2.90   0.004     35.62633    185.0099
       _cons |   3085.437     26.183   117.84   0.000     3034.047    3136.827
------------------------------------------------------------------------------

. predict double y1hat 
(option xb assumed; fitted values)

. regress $Y $C if $A==0 

      Source |       SS           df       MS      Number of obs   =     3,778
-------------+----------------------------------   F(1, 3776)      =     93.20
       Model |  29629575.1         1  29629575.1   Prob > F        =    0.0000
    Residual |  1.2005e+09     3,776  317923.211   R-squared       =    0.0241
-------------+----------------------------------   Adj R-squared   =    0.0238
       Total |  1.2301e+09     3,777  325683.776   Root MSE        =    563.85

------------------------------------------------------------------------------
     bweight |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    mmarried |   204.9171   21.22641     9.65   0.000     163.3008    246.5334
       _cons |   3258.925   18.40044   177.11   0.000      3222.85    3295.001
------------------------------------------------------------------------------

. predict double y0hat 
(option xb assumed; fitted values)

. mean y1hat y0hat

Mean estimation                   Number of obs   =      4,642

--------------------------------------------------------------
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
       y1hat |   3162.627   .7422931      3161.172    3164.082
       y0hat |   3402.306   1.378817      3399.602    3405.009
--------------------------------------------------------------

. lincom _b[y1hat] - _b[y0hat]

 ( 1)  y1hat - y0hat = 0

------------------------------------------------------------------------------
        Mean |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |  -239.6788   .6365241  -376.54   0.000    -240.9267   -238.4309
------------------------------------------------------------------------------

. 
. * Bostrapping to get the SE and 95%CI for the ATE 
. capture program drop ATE

. program define ATE, rclass
  1. capture drop y1
  2. capture drop y0
  3. reg $Y $C if $A==1 
  4. predict double y1, xb
  5. quiet sum y1
  6. reg $Y $C if $A==0 
  7. predict double y0, xb 
  8. quiet sum y0
  9. mean y1 y0 
 10. lincom _b[y1]-_b[y0]
 11. return scalar ace =`r(estimate)'
 12. end

. qui bootstrap r(ace), reps(1000): ATE

. estat boot, all

Bootstrap results                               Number of obs     =      4,642
                                                Replications      =       1000

      command:  ATE
        _bs_1:  r(ace)

------------------------------------------------------------------------------
             |    Observed               Bootstrap
             |       Coef.       Bias    Std. Err.  [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _bs_1 |  -239.67883   .1668729   23.365851   -285.4751  -193.8826   (N)
             |                                      -284.6081  -194.4855   (P)
             |                                      -284.5827  -194.4103  (BC)
------------------------------------------------------------------------------
(N)    normal confidence interval
(P)    percentile confidence interval
(BC)   bias-corrected confidence interval

III) G-Computation or parametric G-Formula for more than one confounder

. clear

. use http://www.stata-press.com/data/r14/cattaneo2.dta
(Excerpt from Cattaneo (2010) Journal of Econometrics 155: 138-154)

. 
. // Stata Implementation
. teffects ra ($Y $W) ($A)

Iteration 0:   EE criterion =  3.484e-23  
Iteration 1:   EE criterion =  7.862e-26  

Treatment-effects estimation                    Number of obs     =      4,642
Estimator      : regression adjustment
Outcome model  : linear
Treatment model: none
----------------------------------------------------------------------------------------
                       |               Robust
               bweight |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-----------------------+----------------------------------------------------------------
ATE                    |
               mbsmoke |
(smoker vs nonsmoker)  |  -239.0757    25.1374    -9.51   0.000    -288.3441   -189.8073
-----------------------+----------------------------------------------------------------
POmean                 |
               mbsmoke |
            nonsmoker  |   3406.643   9.442598   360.77   0.000     3388.136    3425.151
----------------------------------------------------------------------------------------

. 
. // Non Parametric G-FORMULA with multiple variables
. regress $Y ibn.$A       ibn.$A#c.($W), noconstant vce(robust)

Linear regression                               Number of obs     =      4,642
                                                F(16, 4626)       =   11378.26
                                                Prob > F          =     0.0000
                                                R-squared         =     0.9737
                                                Root MSE          =     554.06

-------------------------------------------------------------------------------------
                    |               Robust
            bweight |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------------+----------------------------------------------------------------
            mbsmoke |
         nonsmoker  |   3026.142   62.15547    48.69   0.000     2904.287    3147.996
            smoker  |   3034.277    126.282    24.03   0.000     2786.704     3281.85
                    |
 mbsmoke#c.mmarried |
         nonsmoker  |   61.97948    27.7798     2.23   0.026     7.517833    116.4411
            smoker  |   88.85613   42.22493     2.10   0.035      6.07513    171.6371
                    |
mbsmoke#c.prenatal1 |
         nonsmoker  |   30.83669   26.80587     1.15   0.250     -21.7156    83.38897
            smoker  |  -5.674673   41.07336    -0.14   0.890    -86.19804     74.8487
                    |
    mbsmoke#c.fbaby |
         nonsmoker  |  -90.14129    19.9636    -4.52   0.000    -129.2795   -51.00311
            smoker  |   24.82468   39.98648     0.62   0.535     -53.5679    103.2173
                    |
     mbsmoke#c.medu |
         nonsmoker  |   4.654846   3.844754     1.21   0.226    -2.882706     12.1924
            smoker  |   8.144071   7.764126     1.05   0.294     -7.07732    23.36546
                    |
     mbsmoke#c.mage |
         nonsmoker  |   1.863902   2.163618     0.86   0.389     -2.37782    6.105625
            smoker  |  -7.128437   4.218267    -1.69   0.091    -15.39825    1.141379
                    |
  mbsmoke#c.foreign |
         nonsmoker  |  -22.88371   39.80293    -0.57   0.565    -100.9164    55.14901
            smoker  |   158.5796   114.2716     1.39   0.165    -65.44728    382.6064
                    |
    mbsmoke#c.mrace |
         nonsmoker  |   291.0515   33.25878     8.75   0.000     225.8485    356.2546
            smoker  |   168.8261   54.49301     3.10   0.002     61.99386    275.6584
-------------------------------------------------------------------------------------

. margins $A, vce(unconditional)

Predictive margins                              Number of obs     =      4,642

Expression   : Linear prediction, predict()

------------------------------------------------------------------------------
             |            Unconditional
             |     Margin   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     mbsmoke |
  nonsmoker  |   3406.643   9.458914   360.15   0.000     3388.099    3425.187
     smoker  |   3167.568   23.42163   135.24   0.000      3121.65    3213.485
------------------------------------------------------------------------------

. margins r.$A, contrast(nowald)

Contrasts of predictive margins
Model VCE    : Robust

Expression   : Linear prediction, predict()

------------------------------------------------------------------------
                       |            Delta-method
                       |   Contrast   Std. Err.     [95% Conf. Interval]
-----------------------+------------------------------------------------
               mbsmoke |
(smoker vs nonsmoker)  |  -239.0757   25.12453     -288.3318   -189.8197
------------------------------------------------------------------------

. 
. // G-Computation or parametric G-Formula computation
. regress $Y $W if $A==1

      Source |       SS           df       MS      Number of obs   =       864
-------------+----------------------------------   F(7, 856)       =      4.05
       Model |  8703761.59         7  1243394.51   Prob > F        =    0.0002
    Residual |   262796914       856  307005.741   R-squared       =    0.0321
-------------+----------------------------------   Adj R-squared   =    0.0241
       Total |   271500676       863  314601.015   Root MSE        =    554.08

------------------------------------------------------------------------------
     bweight |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    mmarried |   88.85613   43.55466     2.04   0.042     3.369693    174.3426
   prenatal1 |  -5.674673    44.6082    -0.13   0.899    -93.22893    81.87959
       fbaby |   24.82468   42.24249     0.59   0.557    -58.08632    107.7357
        medu |   8.144071   9.431183     0.86   0.388    -10.36688    26.65502
        mage |  -7.128437   4.075119    -1.75   0.081    -15.12683    .8699587
     foreign |   158.5796   121.1829     1.31   0.191     -79.2709      396.43
       mrace |   168.8261   52.84241     3.19   0.001     65.11027     272.542
       _cons |   3034.277   133.0632    22.80   0.000     2773.108    3295.445
------------------------------------------------------------------------------

. predict double y1hat 
(option xb assumed; fitted values)

. regress $Y $W if $A==0 

      Source |       SS           df       MS      Number of obs   =     3,778
-------------+----------------------------------   F(7, 3770)      =     33.88
       Model |  72795077.5         7  10399296.8   Prob > F        =    0.0000
    Residual |  1.1573e+09     3,770  306979.454   R-squared       =    0.0592
-------------+----------------------------------   Adj R-squared   =    0.0574
       Total |  1.2301e+09     3,777  325683.776   Root MSE        =    554.06

------------------------------------------------------------------------------
     bweight |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    mmarried |   61.97948    26.2361     2.36   0.018     10.54116    113.4178
   prenatal1 |   30.83669   25.85499     1.19   0.233    -19.85444    81.52781
       fbaby |  -90.14129   19.50776    -4.62   0.000    -128.3881    -51.8945
        medu |   4.654846   4.097495     1.14   0.256    -3.378676    12.68837
        mage |   1.863902   2.024667     0.92   0.357    -2.105647    5.833452
     foreign |  -22.88371   39.00843    -0.59   0.557    -99.36337    53.59596
       mrace |   291.0515   28.31813    10.28   0.000     235.5312    346.5719
       _cons |   3026.142   59.78945    50.61   0.000     2908.919    3143.364
------------------------------------------------------------------------------

. predict double y0hat 
(option xb assumed; fitted values)

. mean y1hat y0hat

Mean estimation                   Number of obs   =      4,642

--------------------------------------------------------------
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
       y1hat |   3167.568   1.353491      3164.914    3170.221
       y0hat |   3406.643   2.051065      3402.622    3410.664
--------------------------------------------------------------

. lincom _b[y1hat] - _b[y0hat]

 ( 1)  y1hat - y0hat = 0

------------------------------------------------------------------------------
        Mean |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |  -239.0757   1.561876  -153.07   0.000    -242.1378   -236.0137
------------------------------------------------------------------------------

. 
. * Bostrapping to get the SE and 95%CI for the ATE capture program drop ATE
. program drop ATE

. program define ATE, rclass
  1. capture drop y1
  2. capture drop y0
  3. reg $Y $W if $A==1 
  4. predict double y1, xb
  5. quiet sum y1
  6. reg $Y $W if $A==0 
  7. predict double y0, xb 
  8. quiet sum y0
  9. mean y1 y0 
 10. lincom _b[y1]-_b[y0]
 11. return scalar ace =`r(estimate)'
 12. end

. qui bootstrap r(ace), reps(1000): ATE dots

. estat boot, all

Bootstrap results                               Number of obs     =      4,642
                                                Replications      =       1000

      command:  ATE dots
        _bs_1:  r(ace)

------------------------------------------------------------------------------
             |    Observed               Bootstrap
             |       Coef.       Bias    Std. Err.  [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _bs_1 |  -239.07573   .9380744   24.832592   -287.7467  -190.4047   (N)
             |                                      -289.9457  -192.0624   (P)
             |                                      -293.7208  -193.5329  (BC)
------------------------------------------------------------------------------
(N)    normal confidence interval
(P)    percentile confidence interval
(BC)   bias-corrected confidence interval

. 
. // Possibilities of G-compuation in Stata 
. * Potential Outcomes
. teffects ra ($Y $W) ($A), aequations

Iteration 0:   EE criterion =  3.484e-23  
Iteration 1:   EE criterion =  7.862e-26  

Treatment-effects estimation                    Number of obs     =      4,642
Estimator      : regression adjustment
Outcome model  : linear
Treatment model: none
----------------------------------------------------------------------------------------
                       |               Robust
               bweight |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-----------------------+----------------------------------------------------------------
ATE                    |
               mbsmoke |
(smoker vs nonsmoker)  |  -239.0757    25.1374    -9.51   0.000    -288.3441   -189.8073
-----------------------+----------------------------------------------------------------
POmean                 |
               mbsmoke |
            nonsmoker  |   3406.643   9.442598   360.77   0.000     3388.136    3425.151
-----------------------+----------------------------------------------------------------
OME0                   |
              mmarried |   61.97948   27.73188     2.23   0.025     7.625998     116.333
             prenatal1 |   30.83669   26.75963     1.15   0.249    -21.61122     83.2846
                 fbaby |  -90.14129   19.92916    -4.52   0.000    -129.2017   -51.08085
                  medu |   4.654846   3.838122     1.21   0.225    -2.867736    12.17743
                  mage |   1.863902   2.159886     0.86   0.388    -2.369396    6.097201
               foreign |  -22.88371   39.73427    -0.58   0.565    -100.7614    54.99404
                 mrace |   291.0515   33.20142     8.77   0.000      225.978    356.1251
                 _cons |   3026.142   62.04826    48.77   0.000     2904.529    3147.754
-----------------------+----------------------------------------------------------------
OME1                   |
              mmarried |   88.85613   42.15209     2.11   0.035      6.23954    171.4727
             prenatal1 |  -5.674673   41.00251    -0.14   0.890    -86.03812    74.68877
                 fbaby |   24.82468   39.91751     0.62   0.534     -53.4122    103.0616
                  medu |   8.144071   7.750734     1.05   0.293    -7.047089    23.33523
                  mage |  -7.128437   4.210991    -1.69   0.090    -15.38183    1.124954
               foreign |   158.5796   114.0745     1.39   0.164    -65.00235    382.1615
                 mrace |   168.8261   54.39902     3.10   0.002     62.20603    275.4463
                 _cons |   3034.277   126.0642    24.07   0.000     2787.195    3281.358
----------------------------------------------------------------------------------------

. * Relative Risk
. teffects ra ($Y $W) ($A), coeflegend

Iteration 0:   EE criterion =  3.484e-23  
Iteration 1:   EE criterion =  7.862e-26  

Treatment-effects estimation                    Number of obs     =      4,642
Estimator      : regression adjustment
Outcome model  : linear
Treatment model: none
----------------------------------------------------------------------------------------
               bweight |      Coef.  Legend
-----------------------+----------------------------------------------------------------
ATE                    |
               mbsmoke |
(smoker vs nonsmoker)  |  -239.0757  _b[ATE:r1vs0.mbsmoke]
-----------------------+----------------------------------------------------------------
POmean                 |
               mbsmoke |
            nonsmoker  |   3406.643  _b[POmean:0.mbsmoke]
----------------------------------------------------------------------------------------

. nlcom 100*_b[ATE:r1vs0.mbsmoke]/_b[POmean:0.mbsmoke]

       _nl_1:  100*_b[ATE:r1vs0.mbsmoke]/_b[POmean:0.mbsmoke]

------------------------------------------------------------------------------
     bweight |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _nl_1 |  -7.017927   .7309724    -9.60   0.000    -8.450606   -5.585247
------------------------------------------------------------------------------

IV) INVERSE PROBABILITY WEIGTHING (IPW) based on the Propensity Score

. clear

. use http://www.stata-press.com/data/r14/cattaneo2.dta
(Excerpt from Cattaneo (2010) Journal of Econometrics 155: 138-154)

. teffects ipw ($Y) ($A $W, logit), nolog vsquish

Treatment-effects estimation                    Number of obs     =      4,642
Estimator      : inverse-probability weights
Outcome model  : weighted mean
Treatment model: logit
----------------------------------------------------------------------------------------
                       |               Robust
               bweight |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-----------------------+----------------------------------------------------------------
ATE                    |
               mbsmoke |
(smoker vs nonsmoker)  |  -240.2212   24.82773    -9.68   0.000    -288.8826   -191.5597
-----------------------+----------------------------------------------------------------
POmean                 |
               mbsmoke |
            nonsmoker  |   3408.235   9.440857   361.01   0.000     3389.732    3426.739
----------------------------------------------------------------------------------------

. 
. * Model for the propensity score (note here the possibility of potential missepecification) 
. logit $A $W, vce(robust) nolog

Logistic regression                             Number of obs     =      4,642
                                                Wald chi2(7)      =     363.42
                                                Prob > chi2       =     0.0000
Log pseudolikelihood = -2026.9688               Pseudo R2         =     0.0914

------------------------------------------------------------------------------
             |               Robust
     mbsmoke |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    mmarried |  -1.143264   .1041147   -10.98   0.000    -1.347325   -.9392025
   prenatal1 |  -.2891671   .0978237    -2.96   0.003    -.4808979   -.0974362
       fbaby |  -.5087053   .0904202    -5.63   0.000    -.6859256    -.331485
        medu |  -.1446855    .017873    -8.10   0.000    -.1797159   -.1096551
        mage |   .0019855   .0086558     0.23   0.819    -.0149795    .0189505
     foreign |  -1.192764   .2449577    -4.87   0.000    -1.672872   -.7126555
       mrace |   .5582083   .1190348     4.69   0.000     .3249045    .7915122
       _cons |   .9735576    .254319     3.83   0.000     .4751015    1.472014
------------------------------------------------------------------------------

. predict double ps
(option pr assumed; Pr(mbsmoke))

. 
. * Sempling weights based on Horvitz-Thompson estimator
. generate double ipw1 = ($A==1)/ps

. regress $Y [pw=ipw1]
(sum of wgt is 4,506.74922423503)

Linear regression                               Number of obs     =        864
                                                F(0, 863)         =       0.00
                                                Prob > F          =          .
                                                R-squared         =     0.0000
                                                Root MSE          =     571.36

------------------------------------------------------------------------------
             |               Robust
     bweight |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |   3168.014   23.35919   135.62   0.000     3122.167    3213.862
------------------------------------------------------------------------------

. generate double ipw0 = ($A==0)/(1-ps)

. regress $Y [pw=ipw0]
(sum of wgt is 4,660.13481444053)

Linear regression                               Number of obs     =      3,778
                                                F(0, 3777)        =       0.00
                                                Prob > F          =          .
                                                R-squared         =     0.0000
                                                Root MSE          =     573.51

------------------------------------------------------------------------------
             |               Robust
     bweight |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |   3408.235   9.523029   357.89   0.000     3389.565    3426.906
------------------------------------------------------------------------------

. 
. * Bostrapping to get the SE and 95%CI for the ATE capture program drop ATE
. program drop ATE

. program define ATE, rclass
  1. capture drop y1
  2. capture drop y0
  3. regress $Y [pw=ipw1]
  4. matrix y1 = e(b)
  5. gen double y1 = y1[1,1]
  6. regress $Y [pw=ipw0]
  7. matrix y0 = e(b)
  8. gen double y0 = y0[1,1]
  9. mean y1 y0
 10. lincom _b[y1]-_b[y0]
 11. return scalar ace = `r(estimate)'
 12. end

. qui bootstrap r(ace), reps(1000): ATE

. estat boot, all

Bootstrap results                               Number of obs     =      4,642
                                                Replications      =       1000

      command:  ATE
        _bs_1:  r(ace)

------------------------------------------------------------------------------
             |    Observed               Bootstrap
             |       Coef.       Bias    Std. Err.  [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _bs_1 |  -240.22116   .2588013   24.989583   -289.1998  -191.2425   (N)
             |                                      -287.8047  -192.4459   (P)
             |                                      -287.6467  -192.4318  (BC)
------------------------------------------------------------------------------
(N)    normal confidence interval
(P)    percentile confidence interval
(BC)   bias-corrected confidence interval

. 
. // Test balance after adjustment
. qui teffects ipw ($Y) ($A $W)   

. tebalance summarize

  Covariate balance summary
                                                   Raw     Weighted
                          -----------------------------------------
                          Number of obs =        4,642      4,642.0
                          Treated obs   =          864      2,282.2
                          Control obs   =        3,778      2,359.8
                          -----------------------------------------

  -----------------------------------------------------------------
                  |Standardized differences          Variance ratio
                  |        Raw    Weighted           Raw   Weighted
  ----------------+------------------------------------------------
         mmarried |  -.5953009    -.020219      1.335944   1.017057
        prenatal1 |  -.3242695   -.0186635      1.496155     1.0278
            fbaby |  -.1663271    .0217106      .9430944   1.005001
             medu |  -.5474357   -.1156001      .7315846   .5112843
             mage |   -.300179   -.0740731      .8818025   .7979432
          foreign |  -.1706164    -.034976      .4416089   .8643349
            mrace |  -.1029446    -.028183      1.198452   1.052116
  -----------------------------------------------------------------

Checking positivity violations: Overlap

. qui: teffects ipw ($Y) ($A $W, logit), nolog vsquish

. teffects overlap

Figure 1. Probability density function of the propensity score by exposure status

Overlap by hand

. sort $A

. by $A: summarize ps

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-> mbsmoke = nonsmoker

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
          ps |      3,778     .169212       .1087   .0102683   .8271439

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-> mbsmoke = smoker

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
          ps |        864    .2600891    .1340289   .0358913   .7831799


. kdensity ps if $A==1, generate(x1pointsa d1A) nograph n(10000)
(n() set to 4642)

. kdensity ps if $A==0, generate(x0pointsa d0A) nograph n(10000)
(n() set to 4642)

. label variable d1A "density for mbsmoke=1"

. label variable d0A "density for mbsmoke=0"

. twoway (line d0A x0pointsa , yaxis(1))(line d1A x1pointsa, yaxis(2))

Figure 2. Probability density function of the propensity score by exposure status (hand calculation)

V) Inverse Probability Weighting plus regression adjustment (IPWRA)

. clear

. set more off

. use http://www.stata-press.com/data/r14/cattaneo2.dta
(Excerpt from Cattaneo (2010) Journal of Econometrics 155: 138-154)

. teffects ipwra ($Y $W) ($A $W)

Iteration 0:   EE criterion =  1.485e-16  
Iteration 1:   EE criterion =  9.421e-25  

Treatment-effects estimation                    Number of obs     =      4,642
Estimator      : IPW regression adjustment
Outcome model  : linear
Treatment model: logit
----------------------------------------------------------------------------------------
                       |               Robust
               bweight |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-----------------------+----------------------------------------------------------------
ATE                    |
               mbsmoke |
(smoker vs nonsmoker)  |   -232.751   26.08798    -8.92   0.000    -283.8825   -181.6195
-----------------------+----------------------------------------------------------------
POmean                 |
               mbsmoke |
            nonsmoker  |   3408.781   9.425782   361.64   0.000     3390.307    3427.255
----------------------------------------------------------------------------------------

. nlcom 100*_b[r1vs0.mbsmoke]/_b[POmean:0.mbsmoke]

       _nl_1:  100*_b[r1vs0.mbsmoke]/_b[POmean:0.mbsmoke]

------------------------------------------------------------------------------
     bweight |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _nl_1 |  -6.827984   .7588595    -9.00   0.000    -8.315321   -5.340647
------------------------------------------------------------------------------

. 
. // Step One
. * Inverse probability weights
. logit $A $W

Iteration 0:   log likelihood = -2230.7484  
Iteration 1:   log likelihood = -2039.8238  
Iteration 2:   log likelihood = -2027.0399  
Iteration 3:   log likelihood = -2026.9688  
Iteration 4:   log likelihood = -2026.9688  

Logistic regression                             Number of obs     =      4,642
                                                LR chi2(7)        =     407.56
                                                Prob > chi2       =     0.0000
Log likelihood = -2026.9688                     Pseudo R2         =     0.0914

------------------------------------------------------------------------------
     mbsmoke |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    mmarried |  -1.143264   .0984479   -11.61   0.000    -1.336218   -.9503091
   prenatal1 |  -.2891671   .0967775    -2.99   0.003    -.4788476   -.0994866
       fbaby |  -.5087053    .089107    -5.71   0.000    -.6833518   -.3340587
        medu |  -.1446855   .0178536    -8.10   0.000    -.1796779   -.1096932
        mage |   .0019855   .0085171     0.23   0.816    -.0147077    .0186787
     foreign |  -1.192764   .2434403    -4.90   0.000    -1.669898   -.7156296
       mrace |   .5582083   .1153203     4.84   0.000     .3321848    .7842319
       _cons |   .9735576   .2573016     3.78   0.000     .4692559    1.477859
------------------------------------------------------------------------------

. predict double dps, pr

. gen ipw = .
(4,642 missing values generated)

. replace ipw=($A==1)/dps if $A==1
(864 real changes made)

. replace ipw=($A==0)/(1-dps) if $A==0
(3,778 real changes made)

. sum ipw 

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
         ipw |      4,642     1.97477    2.119163   1.010375   27.86188

. 
. * Stabilized weights
. logit $A, vce(robust) nolog

Logistic regression                             Number of obs     =      4,642
                                                Wald chi2(0)      =          .
                                                Prob > chi2       =          .
Log pseudolikelihood = -2230.7484               Pseudo R2         =     0.0000

------------------------------------------------------------------------------
             |               Robust
     mbsmoke |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |  -1.475377   .0377148   -39.12   0.000    -1.549297   -1.401458
------------------------------------------------------------------------------

. predict double nps, pr

. gen sws = .
(4,642 missing values generated)

. replace sws = nps/dps if $A==1
(864 real changes made)

. replace sws = (1-nps)/(1-dps) if $A==0
(3,778 real changes made)

. sum sws

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
         sws |      4,642    .9977565    .3239543   .2376551   5.185839

. 
. // Step Two
. * GCOMP
. reg $Y $W if $A==1 [pw=sws]
(sum of wgt is 838.8262244164944)

Linear regression                               Number of obs     =        864
                                                F(7, 856)         =       3.88
                                                Prob > F          =     0.0004
                                                R-squared         =     0.0411
                                                Root MSE          =     561.78

------------------------------------------------------------------------------
             |               Robust
     bweight |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    mmarried |   65.34198   45.13777     1.45   0.148    -23.25169    153.9356
   prenatal1 |  -7.480862   46.30099    -0.16   0.872    -98.35762     83.3959
       fbaby |   40.47876   49.47901     0.82   0.414    -56.63562    137.5932
        medu |   21.12795   11.91467     1.77   0.077    -2.257435    44.51334
        mage |  -8.216456   5.313706    -1.55   0.122    -18.64588    2.212963
     foreign |    201.832   140.7535     1.43   0.152    -74.43046    478.0945
       mrace |    211.021   65.76906     3.21   0.001     81.93349    340.1085
       _cons |   2880.082   183.0892    15.73   0.000     2520.725    3239.438
------------------------------------------------------------------------------

. predict double y1, xb

. reg $Y $W if $A==0 [pw=sws]
(sum of wgt is 3,792.75944876671)

Linear regression                               Number of obs     =      3,778
                                                F(7, 3770)        =      25.82
                                                Prob > F          =     0.0000
                                                R-squared         =     0.0607
                                                Root MSE          =     556.35

------------------------------------------------------------------------------
             |               Robust
     bweight |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    mmarried |   54.60708   27.87333     1.96   0.050    -.0411953    109.2553
   prenatal1 |   25.72218   27.33137     0.94   0.347    -27.86353    79.30789
       fbaby |  -89.76871    20.5927    -4.36   0.000    -130.1426    -49.3948
        medu |   1.077116   3.674251     0.29   0.769    -6.126596    8.280828
        mage |   3.183998   2.198838     1.45   0.148    -1.127029    7.495025
     foreign |  -32.60431   40.10512    -0.81   0.416    -111.2341    46.02552
       mrace |   295.4244   33.41343     8.84   0.000     229.9142    360.9345
       _cons |   3044.628   62.32354    48.85   0.000     2922.437    3166.819
------------------------------------------------------------------------------

. predict double y0, xb 

. mean y1 y0

Mean estimation                   Number of obs   =      4,642

--------------------------------------------------------------
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
          y1 |    3176.03   1.698071      3172.701    3179.359
          y0 |   3408.781   2.041572      3404.779    3412.784
--------------------------------------------------------------

. 
. *IPWRA (Problem SE) we need bootstrap
. nlcom(_b[y1]-_b[y0])

       _nl_1:  _b[y1]-_b[y0]

------------------------------------------------------------------------------
        Mean |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _nl_1 |   -232.751   1.757386  -132.44   0.000    -236.1954   -229.3066
------------------------------------------------------------------------------

. 
. *Bostrapping SE ATE (Stabilized weights)
. capture program drop ACE

. program define ACE, rclass
  1. capture drop y1
  2. capture drop y0
  3. reg $Y $W if $A==1 [pw=sws]
  4. predict double y1, xb
  5. quiet sum y1
  6. return scalar y1=`r(mean)'
  7. reg $Y $W if $A==0 [pw=sws]
  8. predict double y0, xb 
  9. quiet sum y0
 10. return scalar y0=`r(mean)'
 11. mean y1 y0 
 12. lincom _b[y1]-_b[y0]
 13. return scalar ace =`r(estimate)'
 14. end

. qui bootstrap r(ace), reps(1000): ACE

. estat boot, all

Bootstrap results                               Number of obs     =      4,642
                                                Replications      =       1000

      command:  ACE
        _bs_1:  r(ace)

------------------------------------------------------------------------------
             |    Observed               Bootstrap
             |       Coef.       Bias    Std. Err.  [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _bs_1 |  -232.75103  -1.746588   26.592008   -284.8704  -180.6317   (N)
             |                                      -287.9317  -181.1544   (P)
             |                                      -284.0648  -178.2514  (BC)
------------------------------------------------------------------------------
(N)    normal confidence interval
(P)    percentile confidence interval
(BC)   bias-corrected confidence interval

. 
. teffects ipwra ($Y $W) ($A $W)

Iteration 0:   EE criterion =  1.485e-16  
Iteration 1:   EE criterion =  9.421e-25  

Treatment-effects estimation                    Number of obs     =      4,642
Estimator      : IPW regression adjustment
Outcome model  : linear
Treatment model: logit
----------------------------------------------------------------------------------------
                       |               Robust
               bweight |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-----------------------+----------------------------------------------------------------
ATE                    |
               mbsmoke |
(smoker vs nonsmoker)  |   -232.751   26.08798    -8.92   0.000    -283.8825   -181.6195
-----------------------+----------------------------------------------------------------
POmean                 |
               mbsmoke |
            nonsmoker  |   3408.781   9.425782   361.64   0.000     3390.307    3427.255
----------------------------------------------------------------------------------------

VI) Augmented Inverse Probability Weighting plus regression adjustment (IPWRA)

. // Stata AIPW implementation
. clear

. set more off

. use http://www.stata-press.com/data/r14/cattaneo2.dta
(Excerpt from Cattaneo (2010) Journal of Econometrics 155: 138-154)

. teffects aipw ($Y $W) ($A $W, logit)

Iteration 0:   EE criterion =  1.485e-16  
Iteration 1:   EE criterion =  5.017e-25  

Treatment-effects estimation                    Number of obs     =      4,642
Estimator      : augmented IPW
Outcome model  : linear by ML
Treatment model: logit
----------------------------------------------------------------------------------------
                       |               Robust
               bweight |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-----------------------+----------------------------------------------------------------
ATE                    |
               mbsmoke |
(smoker vs nonsmoker)  |  -237.5415   25.66737    -9.25   0.000    -287.8486   -187.2344
-----------------------+----------------------------------------------------------------
POmean                 |
               mbsmoke |
            nonsmoker  |   3409.018   9.421793   361.82   0.000     3390.552    3427.484
----------------------------------------------------------------------------------------

. nlcom 100*_b[r1vs0.mbsmoke]/_b[POmean:0.mbsmoke]

       _nl_1:  100*_b[r1vs0.mbsmoke]/_b[POmean:0.mbsmoke]

------------------------------------------------------------------------------
     bweight |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _nl_1 |  -6.968033   .7462114    -9.34   0.000    -8.430581   -5.505486
------------------------------------------------------------------------------

. 
. // AIPTW implementation
. qui sum $Y

. gen double Y = ($Y -`r(min)')/(`r(max)' - `r(min)')

. qui sum $Y 

. global cf = (`r(max)' - `r(min)')

. * Step 1: prediction model for the outcome Q0 (g-computation)
. qui glm Y $A $W, fam(bin) 

. predict double QAW, xb

. qui glm Y $W if $A==1, fam(bin) 

. predict double Q1W, mu

. qui glm Y $W if $A==0, fam(bin)

. predict double Q0W, mu

. 
. * Step 2: prediction model for the treatment g0 (IPTW)
. * Inverse probability weights (Stabilized weights)
. qui logit $A $W

. predict double dps, pr

. * Stabilized weights
. qui logit $A

. predict double nps, pr

. gen sws = .
(4,642 missing values generated)

. replace sws = nps/dps if $A==1
(864 real changes made)

. replace sws = (1-nps)/(1-dps) if $A==0
(3,778 real changes made)

. sum sws

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
         sws |      4,642    .9977565    .3239543   .2376551   5.185839

. 
. * Step 3: Estimation Equation
. gen double ATE = (sws*(Y-QAW) + (Q1W - Q0W)*$cf)

. sum ATE

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
         ATE |      4,642   -238.8908    106.5089  -487.0504   195.8233

. * Marginal Structural Model 
. glm $Y $A [pw=sws],nolog

Generalized linear models                         No. of obs      =      4,642
Optimization     : ML                             Residual df     =      4,640
                                                  Scale parameter =   327735.3
Deviance         =   1520691611                   (1/df) Deviance =   327735.3
Pearson          =   1520691611                   (1/df) Pearson  =   327735.3

Variance function: V(u) = 1                       [Gaussian]
Link function    : g(u) = u                       [Identity]

                                                  AIC             =   15.50565
Log pseudolikelihood = -35986.62428               BIC             =   1.52e+09

------------------------------------------------------------------------------
             |               Robust
     bweight |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     mbsmoke |  -240.2212    25.2155    -9.53   0.000    -289.6426   -190.7997
       _cons |   3408.235   9.522794   357.90   0.000     3389.571      3426.9
------------------------------------------------------------------------------

INTERESTING: Look at the 95%CI for the AIPTW estimator using BOOTSTRAP!!

. drop ATE

. capture program drop ACE

. program define ACE, rclass
  1. capture drop ATE
  2. * Estimation Equation
. gen double ATE = (sws*(Y-QAW) + (Q1W - Q0W)*$cf)
  3. mean ATE
  4. lincom _b[ATE]
  5. return scalar ace =`r(estimate)'
  6. end

. qui bootstrap r(ace), reps(1000): ACE

. estat boot, all

Bootstrap results                               Number of obs     =      4,642
                                                Replications      =       1000

      command:  ACE
        _bs_1:  r(ace)

------------------------------------------------------------------------------
             |    Observed               Bootstrap
             |       Coef.       Bias    Std. Err.  [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _bs_1 |  -238.89079   .0292854   1.5841525   -241.9957  -235.7859   (N)
             |                                      -241.9429  -235.7569   (P)
             |                                      -241.9064  -235.7392  (BC)
------------------------------------------------------------------------------
(N)    normal confidence interval
(P)    percentile confidence interval
(BC)   bias-corrected confidence interval

VII) Ensemble Learning Targeted Maximum Likelihood Estimation

. eltmle $Y $A $W, tmle
D:\Dropbox\CAUSALITY\BOX\ShortCourseGranada\My-STATA-FILES

. shell "C:\Program Files\R\R-3.3.2\bin\x64\R.exe" CMD BATCH SLS.R 

. 
end of do-file

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
        POM1 |      4,642    2839.505     111.259   2355.208   3111.317
        POM0 |      4,642    3065.963    132.7883   2589.092   3202.195
          WT |      4,642    -.044143    2.959656  -6.186227   32.51731
          PS |      4,642    .1863172    .1229295       .025   .8383506


ACE (Additive Effect): -226.4587; Estimated Variance:  625.9183; p-value: 0.0000; 95%CI:( -275.49,  -177.42)


ACE (Risk Differences):   -0.0439; SE:   0.00485; p-value: 0.0000; 95%CI:(-0.0534,-0.0344)

VII) SIMULATION

DATA GENERATION

w1=Socieconomic status, w2=Age, w3=Cancer stage, w4=Comorbidities

Note: the interaction between age and comorbidities

. clear

. set obs 1000
number of observations (_N) was 0, now 1,000

. set seed 777

. gen w1    = round(runiform(1, 5)) //Quintiles of Socioeconomic Deprivation

. gen w2    = rbinomial(1, 0.45) //Binary: probability age >65 = 0.45

. gen w3    = round(runiform(0, 1) + 0.75*(w2) + 0.8*(w1)) //Stage 

. recode w3 (5/6=1)       //Stage (TNM): categorical 4 levels
(w3: 128 changes made)

. gen w4    = round(runiform(0, 1) + 0.75*(w2) + 0.2*(w1)) //Comorbidites: categorical four levels

. gen A     = (rbinomial(1,invlogit(-1 -  0.15*(w4) + 1.5*(w2) + 0.75*(w3) + 0.25*(w1) + 0.8*(w2)*(w4)))) //Binary treatment

. gen Y     = (rbinomial(1,invlogit(-3 + A + 0.25*(w4) + 0.75*(w3) + 0.8*(w2)*(w4) + 0.05*(w1)))) //Binary outcome

. // Potential outcomes
. gen Yt1 = (invlogit(-3 + 1 + 0.25*(w4) + 0.75*(w3) + 0.8*(w2)*(w4) + 0.05*(w1)))

. gen Yt0 = (invlogit(-3 + 0 + 0.25*(w4) + 0.75*(w3) + 0.8*(w2)*(w4) + 0.05*(w1)))

. // True ATE
. gen psi = Yt1-Yt0

TRUE SIMULATED ATE = 17.8%

. mean psi

Mean estimation                   Number of obs   =      1,000

--------------------------------------------------------------
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
         psi |   .1787412   .0020142      .1747886    .1826938
--------------------------------------------------------------

II) ATE estimation

Note: We misspecified the models excluding the interaction between age and comorbidities

Regression adjustment ATE = 24.1%

. teffects ra (Y w1 w2 w3 w4) (A)

Iteration 0:   EE criterion =  8.701e-32  
Iteration 1:   EE criterion =  1.434e-32  

Treatment-effects estimation                    Number of obs     =      1,000
Estimator      : regression adjustment
Outcome model  : linear
Treatment model: none
------------------------------------------------------------------------------
             |               Robust
           Y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
ATE          |
           A |
   (1 vs 0)  |   .2410504   .1148066     2.10   0.036     .0160336    .4660672
-------------+----------------------------------------------------------------
POmean       |
           A |
          0  |   .4739475   .1141366     4.15   0.000     .2502439     .697651
------------------------------------------------------------------------------

. estimates store ra

Inverse probability weighting ATE = 37.9%

. teffects ipw (Y) (A w1 w2 w3 w4) 

Iteration 0:   EE criterion =  3.522e-23  
Iteration 1:   EE criterion =  2.840e-33  

Treatment-effects estimation                    Number of obs     =      1,000
Estimator      : inverse-probability weights
Outcome model  : weighted mean
Treatment model: logit
------------------------------------------------------------------------------
             |               Robust
           Y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
ATE          |
           A |
   (1 vs 0)  |    .378604   .0807516     4.69   0.000     .2203337    .5368743
-------------+----------------------------------------------------------------
POmean       |
           A |
          0  |   .3362695    .079327     4.24   0.000     .1807913    .4917476
------------------------------------------------------------------------------

. estimates store ipw

IPTW-RA ATE = 31.9%

. teffects ipwra (Y w1 w2 w3 w4) (A w1 w2 w3 w4)

Iteration 0:   EE criterion =  3.522e-23  
Iteration 1:   EE criterion =  5.128e-33  

Treatment-effects estimation                    Number of obs     =      1,000
Estimator      : IPW regression adjustment
Outcome model  : linear
Treatment model: logit
------------------------------------------------------------------------------
             |               Robust
           Y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
ATE          |
           A |
   (1 vs 0)  |   .3188398   .0706036     4.52   0.000     .1804593    .4572202
-------------+----------------------------------------------------------------
POmean       |
           A |
          0  |   .3958477   .0691749     5.72   0.000     .2602674    .5314281
------------------------------------------------------------------------------

. estimates store ipwra

AIPTW ATE = 28.8%

. teffects aipw (Y w1 w2 w3 w4) (A w1 w2 w3 w4)

Iteration 0:   EE criterion =  3.522e-23  
Iteration 1:   EE criterion =  3.600e-33  

Treatment-effects estimation                    Number of obs     =      1,000
Estimator      : augmented IPW
Outcome model  : linear by ML
Treatment model: logit
------------------------------------------------------------------------------
             |               Robust
           Y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
ATE          |
           A |
   (1 vs 0)  |    .285314   .1152551     2.48   0.013     .0594183    .5112098
-------------+----------------------------------------------------------------
POmean       |
           A |
          0  |   .4293636   .1144905     3.75   0.000     .2049664    .6537608
------------------------------------------------------------------------------

. estimates store aipw

III) RESULTS

. quie reg psi

. estimates store psi

. estout psi ra ipw ipwra aipw

-----------------------------------------------------------------------------
                      psi           ra          ipw        ipwra         aipw
                        b            b            b            b            b
-----------------------------------------------------------------------------
main                                                                         
r1vs0.A                       .2410504      .378604     .3188398      .285314
_cons            .1787412                                                    
-----------------------------------------------------------------------------
POmean                                                                       
0.A                           .4739475     .3362695     .3958477     .4293636
-----------------------------------------------------------------------------
OME0                                                                         
w1                            .0395478                  .1119125     .0395478
w2                            .1770299                  -.098543     .1770299
w3                            .1715455                  .0389883     .1715455
w4                            .1019039                  .2425232     .1019039
_cons                        -.3331923                 -.3427923    -.3331923
-----------------------------------------------------------------------------
OME1                                                                         
w1                            .0100658                  .0071447     .0100658
w2                            .1529128                  .1510302     .1529128
w3                            .0867786                  .0859272     .0867786
w4                            .1157346                  .1193923     .1157346
_cons                         .2148979                  .2211646     .2148979
-----------------------------------------------------------------------------
TME1                                                                         
w1                                          .200913      .200913      .200913
w2                                         3.383105     3.383105     3.383105
w3                                         .6876172     .6876172     .6876172
w4                                        -.1261434    -.1261434    -.1261434
_cons                                       -.85242      -.85242      -.85242
-----------------------------------------------------------------------------

Ensemble Learning Maximum Likelihood Estimation (ELTMLE) ATE = 22.1%

. eltmle Y A w1 w2 w3 w4, tmle
D:\Dropbox\CAUSALITY\BOX\ShortCourseGranada\My-STATA-FILES

. shell "C:\Program Files\R\R-3.3.2\bin\x64\R.exe" CMD BATCH SLS.R 

. 
end of do-file

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
        POM1 |      1,000    .7100199     .187985     .20515   .9727612
        POM0 |      1,000    .4891232     .255392   .0454817   .9379262
          WT |      1,000    .2563642    3.599528        -40   2.046998
          PS |      1,000      .86845    .1383139   .4885203       .975


TMLE: Average Causal Effect

ACE (Risk Differences):    0.2209; SE:   0.05458; p-value: 0.0002; 95%CI:(0.1139, 0.3279)


TMLE: Causal Relative Risk

RR:   1.4516; 95%CI:(1.1314, 1.8624)

RELATIVE BIAS IPTW

. display (0.38 - 0.18)/.38 
.52631579

RELATIVE BIAS AIPTW

. display (0.28 - 0.18)/0.28
.35714286

RELATIVE BIAS ELTMLE

. display (0.22 - 0.18)/0.22
.18181818

Appendix: ELTMLE programme

. *! version 2.2.1  30.Jun.2017
. *! ELTMLE: Stata module for Ensemble Learning Targeted Maximum Likelihood Estimation
. *! by Miguel Angel Luque-Fernandez [cre,aut]
. *! Bug reports:
. *! miguel-angel.luque at lshtm.ac.uk
. 
. /*
> Copyright (c) 2017  <Miguel Angel Luque-Fernandez>
> 
> Permission is hereby granted, free of charge, to any person obtaining a copy
> of this software and associated documentation files (the "Software"), to deal
> in the Software without restriction, including without limitation the rights
> to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> copies of the Software, and to permit persons to whom the Software is
> furnished to do so, subject to the following conditions:
> 
> The above copyright notice and this permission notice shall be included in
> all copies or substantial portions of the Software.
> 
> THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
> AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
> OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
> THE SOFTWARE.
> */
. 
. ***************************************************************************
. ** MIGUEL ANGEL LUQUE FERNANDEZ
. ** mluquefe@hsph.havard.edu // miguel-angel.luque@lshtm.ac.uk
. ** TMLE ALGORITHM IMPLEMENTATION IN STATA FOR BINARY OR CONTINUOUS 
. ** OUTCOME AND BINARY TREATMENT FOR WINDOWS USERS 
. ** Improved AIPTW with Super Learner (Ensemble Learning)
. ** This program requires R to be installed in your computer
. ** JUNE 2017
. ****************************************************************************
. 
. * Improved Imfluence curve estimation for the causal relative risk
. * Improved display including potential outcomes and weights to check balance
. * Included estimation for continuous outcomes 
. * Added additive causal effects for continuous outcome
. * Fixed risk difference and causal relative risk for SLAIPW and SLAIPWBGAM 
. * Checked consistency of confidence intervals for RR
. 
. capture program drop eltmle

. program define eltmle
  1.      syntax varlist(min=3) [if] [pw] [, slaipw slaipwbgam tmle tmlebgam] 
  2.          version 13.2
  3.          marksample touse
  4.          local var `varlist' if `touse'
  5.          tokenize `var'
  6.          local yvar = "`1'" 
  7.      global flag = cond(`yvar'<=1,1,0)
  8.          qui sum `yvar'
  9.          global b = `r(max)'
 10.          global a = `r(min)'
 11.          qui replace `yvar' = (`yvar' - `r(min)') / (`r(max)' - `r(min)') if `yvar'>1
 12.      local dir `c(pwd)'
 13.          cd "`dir'"
 14.          qui export delimited `var' using "data.csv", nolabel replace 
 15.          if "`slaipw'" == "" & "`slaipwbgam'" == "" & "`tmlebgam'" == "" {
 16.                 tmle `varlist'
 17.          }
 18.          else if "`tmlebgam'" == "tmlebgam" {
 19.                 tmlebgam `varlist'
 20.          }
 21.          else if "`slaipw'" == "slaipw" { 
 22.             slaipw `varlist'
 23.          }
 24.          else if "`slaipwbgam'" == "slaipwbgam" {
 25.                 slaipwbgam `varlist'
 26.          }
 27. end 

. 
. program tmle  
  1. // Write R Code dependencies: foreign Surperlearner 
. set more off
  2. qui: file close _all
  3. qui: file open rcode using SLS.R, write replace
  4. qui: file write rcode ///
>         `"set.seed(123)"' _newline ///
>         `"list.of.packages <- c("foreign","SuperLearner")"' _newline ///
>         `"new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[,"Package"])]"' _newline ///
>         `"if(length(new.packages)) install.packages(new.packages, repos='http://cran.us.r-project.org')"' _newline ///
>         `"library(SuperLearner)"' _newline ///
>         `"library(foreign)"' _newline ///
>         `"data <- read.csv("data.csv", sep=",")"' _newline ///
>         `"attach(data)"' _newline ///
>         `"SL.library <- c("SL.glm","SL.step","SL.glm.interaction")"' _newline ///
>         `"n <- nrow(data)"' _newline ///
>         `"nvar <- dim(data)[[2]]"' _newline ///
>         `"Y <- data[,1]"' _newline ///
>         `"A <- data[,2]"' _newline ///
>         `"X <- data[,2:nvar]"' _newline ///
>         `"W <- data[,3:nvar]"' _newline ///
>         `"X1 <- X0 <- X"' _newline /// 
>         `"X1[,1] <- 1"' _newline ///
>         `"X0[,1] <- 0"' _newline ///
>         `"newdata <- rbind(X,X1,X0)"' _newline /// 
>         `"Q <- try(SuperLearner(Y = data[,1] ,X = X, SL.library=SL.library, family=binomial(), newX=newdata, method="method.NNLS"), silent=TRUE)"' _newline ///
>         `"Q <- as.data.frame(Q[[4]])"' _newline ///
>         `"QAW <- Q[1:n,]"' _newline ///
>         `"Q1W <- Q[((n+1):(2*n)),]"' _newline ///
>         `"Q0W <- Q[((2*n+1):(3*n)),]"' _newline ///
>         `"g <- suppressWarnings(SuperLearner(Y = data[,2], X = W, SL.library = SL.library, family = binomial(), method = "method.NNLS"))"' _newline ///
>         `"ps <- g[[4]]"' _newline ///
>         `"ps[ps<0.025] <- 0.025"' _newline ///
>         `"ps[ps>0.975] <- 0.975"' _newline ///
>         `"data <- cbind(data,QAW,Q1W,Q0W,ps,Y,A)"' _newline ///
>         `"write.dta(data, "data2.dta")"'  
  5. qui: file close rcode
  6. 
. // Write bacth file to find R.exe path and R version
. set more off
  7. qui: file close _all
  8. qui: file open bat using setup.bat, write replace
  9. qui: file write bat ///
> `"@echo off"' _newline ///
> `"SET PATHROOT=C:\Program Files\R\"' _newline ///
> `"echo Locating path of R..."' _newline ///
> `"echo."' _newline ///
> `"if not exist "%PATHROOT%" goto:NO_R"' _newline ///
> `"for /f "delims=" %%r in (' dir /b "%PATHROOT%R*" ') do ("' _newline ///
>         `"echo Found %%r"' _newline ///
>         `"echo shell "%PATHROOT%%%r\bin\x64\R.exe" CMD BATCH SLS.R > runr.do"' _newline ///
>         `"echo All set!"' _newline ///  
>         `"goto:DONE"' _newline ///
> `")"' _newline ///
> `":NO_R"' _newline ///
> `"echo R is not installed in your system."' _newline ///
> `"echo."' _newline ///
> `"echo Download it from https://cran.r-project.org/bin/windows/base/"' _newline ///
> `"echo Install it and re-run this script"' _newline ///
> `":DONE"' _newline ///
> `"echo."' _newline ///
> `"pause"'
 10. qui: file close bat
 11. 
. //Run batch
. shell setup.bat 
 12. //Run R 
. do runr.do
 13. 
. // Read Revised Data Back to Stata
. clear
 14. quietly: use "data2.dta", clear
 15. 
. // Q to logit scale
. gen logQAW = log(QAW / (1 - QAW))
 16. gen logQ1W = log(Q1W / (1 - Q1W))
 17. gen logQ0W = log(Q0W / (1 - Q0W))
 18.  
. // Clever covariate HAW
. gen double HAW = (A / ps) - ((1 - A) / (1 - ps))
 19. gen double H1W = 1 / ps
 20. gen double H0W = -1 / (1 - ps)
 21. 
. // Estimation of the substitution parameter (Epsilon)
. qui glm Y HAW, fam(binomial) offset(logQAW) robust noconstant
 22. mat a= e(b)
 23. gen double epsilon = a[1,1]
 24. 
. // Targeted ATE, update from Q̅^0 (A,W) to Q̅^1 (A,W)
. 
. gen double  Qstar = exp(HAW*epsilon + logQAW)/(1 + exp(HAW*epsilon + logQAW))
 25. gen double Q0star = exp(H0W*epsilon + logQ0W)/(1 + exp(H0W*epsilon + logQ0W))
 26. gen double Q1star = exp(H1W*epsilon + logQ1W)/(1 + exp(H1W*epsilon + logQ1W))
 27. 
. global cf = $b - $a
 28. 
. gen double POM1 = cond($flag==1,Q1star,Q1star*($cf),.)
 29. gen double POM0 = cond($flag==1,Q0star,Q0star*($cf),.)
 30. gen double   PO = cond($flag==1,Qstar,Qstar*($cf),.)
 31. 
. gen    WT = HAW
 32. gen    PS = ps
 33. summ   POM1 POM0 WT PS
 34. 
. // Estimating the updated targeted ATE binary outcome
. gen double ATE = cond($flag==1,(Q1star - Q0star),(Q1star - Q0star)*($cf),.)
 35. qui sum ATE
 36. global ATEtmle = r(mean)
 37. 
. // ATE continuous outcome 
. gen double ATEci = (Q1star - Q0star)
 38. qui sum ATEci 
 39. global ATEci = r(mean)
 40. 
. // Relative risk
. qui sum Q1star
 41. global Q1 = r(mean)
 42. qui sum Q0star
 43. global Q0 = r(mean)
 44. global RRtmle = $Q1/$Q0
 45. 
. // Statistical inference ATE 
. gen double IC = cond($flag==1,HAW*(Y - Qstar) + (Q1star - Q0star) - $ATEci,(HAW*(Y - Qstar) + (Q1star - Q0star) - $ATEci)*$cf,.)
 46. 
. qui sum IC
 47. global var = r(Var)
 48. qui count
 49. global n = r(N)
 50. global varICtmle = $var/$n
 51. 
. global pvalue = cond($flag==1,2*(normalden(abs($ATEci/sqrt($varICtmle)))),2*(normalden(abs($ATEtmle/sqrt($varICtmle)))),.)
 52. 
. global LCIa =  cond($flag==1,$ATEci -1.96*sqrt($varICtmle),$ATEtmle -1.96*sqrt($varICtmle),.)
 53. global UCIa =  cond($flag==1,$ATEci +1.96*sqrt($varICtmle),$ATEtmle +1.96*sqrt($varICtmle),.)
 54. 
. // Statistical inference RR
. gen double ICrr = 1/Q1star*((A/ps)*(Y - QAW) + (Q1W) - Q1star) - 1/Q0star*((1-A)/(1-ps)*(Y - QAW) + (Q0W) - Q0star)
 55. qui sum ICrr
 56. global varr = r(Var)
 57. global varICrr = $varr/$n
 58. 
. global LCIr =  exp(log($RRtmle) - 1.96*sqrt($varICrr))
 59. global UCIr =  exp(log($RRtmle) + 1.96*sqrt($varICrr))
 60. 
. // IC for Additive Risk differences
. gen double ICrd = HAW*(Y - Qstar) + (Q1star - Q0star) - $ATEci
 61. qui sum ICrd
 62. global varrd = r(Var)
 63. global varICrd = $varrd/$n
 64. global LCIard =  $ATEci -1.96*sqrt($varICrd)
 65. global UCIard =  $ATEci +1.96*sqrt($varICrd)
 66. 
. // Display Results 
. local bin  ""ACE (Risk Differences):" %10.4f $ATEtmle _col(5) "; SE:" %10.5f sqrt($varICtmle) _col(5) "; p-value:" %7.4f $pvalue _col(5) "; 95%CI:("  %5.4f $LCIa ","   %7.4f $UCIa ")""
 67. local cont ""ACE (Additive Effect):" %10.4f $ATEtmle _col(5) "; Estimated Variance:" %10.4f $varICtmle _col(5) "; p-value:" %7.4f $pvalue _col(5) "; 95%CI:("  %8.2f $LCIa ","  %9.2f $UCIa "
> )""
 68. local contrd  ""ACE (Risk Differences):" %10.4f $ATEci _col(5) "; SE:" %10.5f sqrt($varICrd) _col(5) "; p-value:" %7.4f $pvalue _col(5) "; 95%CI:("  %5.4f $LCIard ","  %7.4f $UCIard ")""
 69. 
. if $flag==1 {
 70. di _newline
 71. di "TMLE: Average Causal Effect" _newline
 72. di `bin'
 73. }
 74. else if $flag!=1{
 75. di _newline
 76. di `cont'
 77. di _newline
 78. di `contrd'
 79. }
 80. 
. local rrbin ""RR:" %9.4f $RRtmle _col(5) "; 95%CI:(" %5.4f $LCIr "," %7.4f $UCIr ")""
 81. if $flag==1 {
 82. di _newline
 83. di "TMLE: Causal Relative Risk" _newline 
 84. display `rrbin'
 85. }
 86. else if $flag!=1{
 87. }
 88. 
. drop ATEci ICrr ICrd logQAW logQ1W logQ0W HAW H1W H0W QAW Q1W Q0W Qstar Q1star Q0star ps Y A epsilon
 89. 
. label var POM1 "Potential Outcome Y(1)"
 90. label var POM0 "Potential Otucome Y(0)"
 91. label var ATE "Average Treatment Effect"
 92. label var WT "Inverse Probability Treatment Weights"
 93. label var IC "Variance ATE"
 94. label var PS "Propensity Score"
 95. 
. // Clean up
. quietly: rm SLS.R
 96. //quietly: rm SLS.Rout
. quietly: rm data2.dta
 97. quietly: rm data.csv
 98. quietly: rm .RData
 99. quietly: rm runr.do
100. quietly: rm setup.bat 
101. end

TUTORIAL: CAUSAL INFERENCE METHODS MADE EASY FOR APPLIED RESEARCHERS/EPIDEMIOLOGISTS/STATISTICIANS CSG-LSHTM, LONDON, 6th December 2017 Miguel Angel Luque Fernandez, PhD Assistant Professor of Epidemiology Cancer Survival Group, LSHTM, London, UK