--- title: "Nonparametric and Cox-based estimation of average treatment effects in competing risks using 'causalCmprsk' package" author: "Bella Vakulenko-Lagun, Colin Magdamo, Marie-Laure Charpignon, Bang Zheng, Mark Albers, Sudeshna Das" date: "`r Sys.Date()`" output: bookdown::html_vignette2: toc: true number_sections: false base_format: rmarkdown::html_vignette bibliography: references.bib vignette: > %\VignetteIndexEntry{Nonparametric and Cox-based estimation of average treatment effects in competing risks using 'causalCmprsk' package} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- The `causalCmprsk` package is designed for estimation of average treatment effects (ATE) of point interventions on time-to-event outcomes with $K\geq 1$ competing events. We will illustrate how to use `causalCmprsk` package by analyzing data from an observational study on right heart catheterization (RHC) procedure, collected by @Connors. # Why `causalCmprsk`? *** - It implements both nonparametric estimation (based on the weighted Aalen-Johansen estimator) which does not assume any model for potential outcomes, and Cox-based estimation that relies on the proportional hazards assumption for potential outcomes. - It provides three measures of treatment effect on time-to-event outcomes: **cause-specific hazard ratios** which are time-dependent measures under a nonparametric model, risk-based measures such as **cause-specific risk differences** and **cause-specific risk ratios**, and **restricted-mean-time differences** which quantify how much time on average was lost (or gained) due to treatment by some specified time point. Restricted-mean-time difference is an intuitive summary of treatment effects that translates the treatment effect from the probability (risk) into the time domain and allows to quantify, for example, by how many days or months the treatment speeds up the recovery or postpones death. - It implements different weights representing different target populations. `ATE` and `stab.ATE` weights aim to estimate average treatment effects for the original population. `ATT` weights target average treatment effects for the population of treated, and `ATC` weights correspond to the population of untreated. `overlap` weights aim to estimate average treatment effects for the **overlap population**, that is the population with the largest equipoise regarding the treatment. - It uses Bayesian bootstrap for estimation of uncertainty. Bayesian bootstrap is considered a more stable alternative to the simple bootstrap for time-to-event data. We implemented parallel computing to speed up bootstrap replications. - It can be used for a Randomized Controlled Trials setting when there is no need for emulation of baseline randomization, by using `unadj` option in functions `fit.nonpar` and `fit.cox`. - It has a function `get.numAtRisk` returning the time-varying number-at-risk for both "unadjusted" (raw) data, and "weighted" pseudo-population. The number-at-risk statistic can be used for diagnostics of extremely influential weights. - Competing events can be dependent. - Data can be right-censored. # Analysis of RHC data *** ```{r} pckgs <- c("knitr", "tidyverse", "ggalt", "cobalt", "ggsci", "modEvA", "naniar", "DT", "Hmisc", "hrbrthemes", "summarytools", "survival", "causalCmprsk") # "kableExtra", ``` ```{r cran_packages0, echo=TRUE, include=TRUE, message=FALSE, warning=FALSE} ix=NULL for (package in pckgs) if (!requireNamespace(package, quietly = TRUE)) ix <- c(ix, package) if (length(ix)!=0) { pr <- "WARNING: Packages: " for (i in ix) pr <- paste(pr, i, sep="") pr <- paste(pr, "\nhave to be installed in order to successfully compile the rest of the vignette. Please install these packages and compile again.", sep="") cat(pr) knitr::knit_exit() } ``` ```{r cran_packages, echo=TRUE, include=TRUE, message=FALSE, warning=FALSE} for (package in pckgs) library(package, character.only=TRUE) opts_chunk$set(results = 'asis', # Can also be set at the chunk-level comment = NA, prompt = FALSE, cache = FALSE) summarytools::st_options(plain.ascii = FALSE, # Always use this option in Rmd documents style = "rmarkdown", # Always use this option in Rmd documents footnote = NA, # Makes html-rendered results more concise subtitle.emphasis = FALSE) # Improves layout with some rmardown themes ``` ## Reading, cleaning and preparing data First we read, reformat, reorder and clean the RHC data: ```{r code1, echo=TRUE} Hmisc::getHdata(rhc) # loading data using the Hmisc package rhc_raw <- rhc ``` We will not use variables `adld3p` and `urin1` due to the high percent of missing values: ```{r} miss.report <- rhc_raw %>% miss_var_summary() kable(miss.report[1:10,]) # %>% kable_styling(bootstrap_options = "striped", full_width = F) ``` We create dummy variables by breaking up all categorical covariates into sets of binary indicators. We add the time-to-event variable, `Time`, and the type-of-event variable, `E` which takes `1` for `"discharge alive"` and 2 for `"in-hospital death"`: ```{r} rhc_cleaning1 <- rhc_raw %>% mutate(RHC = as.numeric(swang1 == "RHC"), trt=swang1, E = ifelse(is.na(dthdte), 1, ifelse(dschdte==dthdte, 2, 1)), Time = dschdte - sadmdte, #=time-to-"Discharge" (either alive or due to death in a hospital) T.death = ifelse(is.na(dthdte), lstctdte - sadmdte, dthdte - sadmdte), #=min(Time to death before or after discharge, Time to a last follow-up visit) D = ifelse(death =="No", 0, 1), # death indicator sex_Male = as.numeric(sex == "Male"), race_black = as.numeric(race == "black"), race_other = as.numeric(race == "other"), income_11_25K = as.numeric(income == "$11-$25k"), income_25_50K = as.numeric(income == "$25-$50k"), income_50K = as.numeric(income == "> $50k"), ninsclas_Private_Medicare = as.numeric(ninsclas == "Private & Medicare"), ninsclas_Medicare = as.numeric(ninsclas == "Medicare"), ninsclas_Medicare_Medicaid = as.numeric(ninsclas == "Medicare & Medicaid"), ninsclas_Medicaid = as.numeric(ninsclas == "Medicaid"), ninsclas_No_Insurance = as.numeric(ninsclas == "No insurance"), # combine cat1 with cat2, i.e the primary disease category with the secondary disease category: cat_CHF = as.numeric(cat1 == "CHF" | (!is.na(cat2))&(cat2 == "CHF") ), cat_Cirrhosis = as.numeric(cat1 == "Cirrhosis" | (!is.na(cat2))&(cat2 == "Cirrhosis")), cat_Colon_Cancer = as.numeric(cat1 == "Colon Cancer" | (!is.na(cat2))&(cat2 == "Colon Cancer")), cat_Coma = as.numeric(cat1 == "Coma" | (!is.na(cat2))&(cat2 == "Coma")), cat_COPD = as.numeric(cat1 == "COPD" | (!is.na(cat2))&(cat2 == "COPD")), cat_Lung_Cancer = as.numeric(cat1 == "Lung Cancer" | (!is.na(cat2))&(cat2 == "Lung Cancer")), cat_MOSF_Malignancy = as.numeric(cat1 == "MOSF w/Malignancy" | (!is.na(cat2))&(cat2 == "MOSF w/Malignancy")), cat_MOSF_Sepsis = as.numeric(cat1 == "MOSF w/Sepsis" | (!is.na(cat2))&(cat2 == "MOSF w/Sepsis")), dnr1_Yes = as.numeric(dnr1 == "Yes"), card_Yes = as.numeric(card == "Yes"), gastr_Yes = as.numeric(gastr == "Yes"), hema_Yes = as.numeric(hema == "Yes"), meta_Yes = as.numeric(meta == "Yes"), neuro_Yes = as.numeric(neuro == "Yes"), ortho_Yes = as.numeric(ortho == "Yes"), renal_Yes = as.numeric(renal == "Yes"), resp_Yes = as.numeric(resp == "Yes"), seps_Yes = as.numeric(seps == "Yes"), trauma_Yes = as.numeric(trauma == "Yes"), ca_Yes = as.numeric(ca == "Yes"), ca_Metastatic = as.numeric(ca == "Metastatic") ) # variables selection and data reordering: rhc_full <- rhc_cleaning1 %>% select(ptid, RHC, trt, Time, T.death, E, D, sex_Male, age, edu, race_black, race_other, income_11_25K, income_25_50K, income_50K, ninsclas_Private_Medicare, ninsclas_Medicare, ninsclas_Medicare_Medicaid, ninsclas_Medicaid, ninsclas_No_Insurance, cat_CHF, cat_Cirrhosis, cat_Colon_Cancer, cat_Coma, cat_COPD, cat_Lung_Cancer, cat_MOSF_Malignancy, cat_MOSF_Sepsis, # diagnoses: dnr1_Yes, card_Yes, gastr_Yes, hema_Yes, meta_Yes, neuro_Yes, ortho_Yes, renal_Yes, resp_Yes, seps_Yes, trauma_Yes, ca_Yes, ca_Metastatic, # lab tests: wtkilo1, hrt1, meanbp1, resp1, temp1, aps1, das2d3pc, scoma1, surv2md1, alb1, bili1, crea1, hema1, paco21, pafi1, ph1, pot1, sod1, wblc1, # all variables with "hx" are preexisting conditions: amihx, cardiohx, chfhx, chrpulhx, dementhx, gibledhx, immunhx, liverhx, malighx, psychhx, renalhx, transhx, death, sadmdte, dschdte, dthdte, lstctdte) ``` There is one observation with a missing discharge date but a known date of death. That is its length-of-stay is interval-censored, therefore we will remove this observation from the analysis of the length-of-stay, but will include it in the analysis of 30-day survival. We will add two variables to the `rhc_full` dataset, `T.death` which is the time-to-death censored at 30 days, and a corresponding censoring indicator, `D.30`. ```{r} # omit 1 obs with missing discharge date for the length-of-stay analysis: rhc <- rhc_full[!is.na(rhc_raw$dschdte),] rhc_full$T.death.30 <- pmin(30, rhc_full$T.death) rhc_full$D.30 <- rhc_full$D*ifelse(rhc_full$T.death <=30, 1, 0) ``` In summary, we will use `rhc` dataset in our analysis of length-of-stay, and `rhc_full` dataset in the analysis of 30-day survival. The original data set `rhc_full` has `` `r nrow(rhc_full)` `` observations. Out of them `` `r sum(rhc_full$D)` `` died during the follow-up, and `` `r sum(rhc$E==2)` `` from them died while staying in the hospital. There are `` `r sum(rhc$E==1)` `` patients who discharged alive. The number of events of each type in the length-of-stay analysis is: ```{r, echo=TRUE} E <- as.factor(rhc$E) levels(E) <- c("discharge", "in-hospital death") t <- addmargins(table(E, useNA="no")) kable(t) # %>% kable_styling(bootstrap_options = "striped", full_width = F) ``` The number of deaths over the follow-up: ```{r, echo=TRUE} D <- as.factor(rhc_full$D) levels(D) <- c("censored", "died") t <- addmargins(table(D, useNA="no")) kable(t) # %>% kable_styling(bootstrap_options = "striped", full_width = F) ``` The number of deaths during the first 30 days following the admission to an ICU: ```{r, echo=TRUE} D.30 <- as.factor(rhc_full$D.30) levels(D.30) <- c("censored", "died") t <- addmargins(table(D.30, useNA="no")) kable(t) #%>% kable_styling(bootstrap_options = "striped", full_width = F) ``` The numbers of patients in two treatment arms are: ```{r, echo=TRUE} t <- addmargins(table(rhc_full$trt, useNA="no")) kable(t) #%>% kable_styling(bootstrap_options = "striped", full_width = F) ``` ## Fitting PS model and its diagnostics The previous analyses of RHC data showed that the distributions of covariates differ significantly across the treatment groups, therefore in order to make some causal conclusions we need to balance them. For the PS model, we used a logistic regression that includes the main effects of all available covariates, without any selection. A list of covariates we included in the PS model is the following: ```{r} covs.names <- c("age", "sex_Male", "edu", "race_black", "race_other", "income_11_25K", "income_25_50K", "income_50K", "ninsclas_Private_Medicare", "ninsclas_Medicare", "ninsclas_Medicare_Medicaid", "ninsclas_Medicaid", "ninsclas_No_Insurance", "cat_CHF", "cat_Cirrhosis", "cat_Colon_Cancer", "cat_Coma", "cat_COPD", "cat_Lung_Cancer", "cat_MOSF_Malignancy", "cat_MOSF_Sepsis", "dnr1_Yes", "wtkilo1", "hrt1", "meanbp1", "resp1", "temp1", "card_Yes", "gastr_Yes", "hema_Yes", "meta_Yes", "neuro_Yes", "ortho_Yes", "renal_Yes", "resp_Yes", "seps_Yes", "trauma_Yes", "ca_Yes", "ca_Metastatic", "amihx", "cardiohx", "chfhx", "chrpulhx", "dementhx", "gibledhx", "immunhx", "liverhx", "malighx", "psychhx", "renalhx", "transhx", "aps1", "das2d3pc", "scoma1", "surv2md1", "alb1", "bili1", "crea1", "hema1", "paco21", "pafi1", "ph1", "pot1", "sod1", "wblc1") ``` ### Testing goodness-of-fit of a PS model {#GOF} We use `get.weights` function in order to fit a PS model and estimate weights. The input argument `wtype="stab.ATE"` requests calculation of stabilized ATE weights (@sw_hernan): ```{r} form.txt <- paste0("RHC", " ~ ", paste0(covs.names, collapse = "+")) trt.formula <- as.formula(form.txt) ps.stab.ATE <- get.weights(formula=trt.formula, data=rhc, wtype = "stab.ATE") ``` The estimated propensity scores are saved in the `ps` field of the output list `ps.stab.ATE`, and the `summary.glm` field returns the summary of the fitted logistic regression. The requested weights are saved in the `w` field. We can verify the goodness-of-fit (GOF) of the treatment assignment model both graphically and using the formal test of @HL, where we check whether the model-based predicted probabilities of treatment assignment are similar to the corresponding empirical probabilities: ```{r, fig.align="center", fig.cap="Visual test of goodness-of-fit of a PS model and Hosmer-Lemeshow test.", fig.height=5, fig.width=5} gof <- HLfit(obs=rhc$RHC, pred=ps.stab.ATE$ps, n.bins=50, bin.method = "quantiles", plot.bin.size=FALSE) ``` Ideally, all the points should fall on the diagonal line. In our example, as the Hosmer and Lemeshow's test shows, the logistic regression model with all the main effects does not fit the data well ($pvalue<0.001$), although visually the estimates fluctuate around the reference line. For comparison, if we omit all the covariates that are insignificant at the 0.05 level from the original model, the new model perfectly fits the data: ```{r label=better, fig.align="center", fig.cap="Checking goodness-of-fit of a PS model with selection of significant covariates.", fig.height=5, fig.width=5} ind <- ps.stab.ATE$summary.glm$coefficients[,4]<0.05 form.txt <- paste0("RHC", " ~ ", paste0(covs.names[ind], collapse = "+")) trt.formula.1 <- as.formula(form.txt) ps.stab.ATE.2 <- get.weights(formula=trt.formula.1, data=rhc, wtype = "stab.ATE") gof <- HLfit(obs=rhc$RHC, pred=ps.stab.ATE.2$ps, n.bins=50, bin.method = "quantiles", plot.bin.size=FALSE) ``` However, we notice that: 1. Although the omission of all insignificant covariates resulted in a better fit, using this selective model for weighting can leave the excluded covariates imbalanced, and moreover this might even distort the initially existing balance in them, as we can see further on Figure \@ref(fig:figcompare) which corresponds to the PS model with only significant covariates. The GOF criterion optimizes the loss function which does not necessarily goes along with balancing of important confounders. 2. Another potential "danger" of keeping only strong predictors of the treatment assingment in the PS model is an issue of possible bias amplification - see, for example, @Ding. In summary, a *good* PS model cannot be built based only on the covariates-"treatment assignment" relationship. It should incorporate extra knowledge on the covariates-outcome relationship and other aspects of the data generating processes. In what follows, we will proceed with the PS model that includes all the available covariates. ### Checking overlap {#overlap} In order to see whether there is a non-overlap problem, we compare the distributions of estimated propensity scores in two treatment groups: ```{r, fig.align="center", label=fig9, fig.cap="Checking overlap in PS distributions.", fig.height=5, fig.width=5} rhc.ps <- rhc %>% mutate(ps = ps.stab.ATE$ps, stab.ATE.w = ps.stab.ATE$w) ggplot(rhc.ps, aes(x = ps, fill = trt, color=trt)) + geom_density(alpha = 0.5) + scale_fill_npg() + scale_color_npg()+ theme(plot.title = element_text(hjust = 0.5), legend.position = c(0.8, 0.8), legend.background=element_rect(fill="transparent")) ``` The plot reveals that the majority of patients who actually did not go through RHC procedure, had low (<0.5) propensities to get it. In general, there is no clear violation of overlap, and both groups appear near both edges of the support. However, this plot is not able to identify the strata responsible for the most extreme propensities. We will address this question [below](#target) by comparing between the overlap population and the original population mixes. ### Estimation of weights and balance diagnostics {#balance} In order to evaluate the attained mean covariate balance after weighting we will look at the absolute standardized differences, ASD (@fan_li_2, @Zaslavsky, @mao_li). For continuous covariates, we will also inspect the weighted marginal distributions. We will compare already estimated `stab.ATE` weights with the overlap weights, estimated by running: ```{r} ps.overlap <- get.weights(formula=trt.formula, data=rhc, wtype = "overlap") rhc.ps <- rhc.ps %>% mutate(overlap.w = ps.overlap$w) ``` A convenient way to summarize all the ASD of three weights (`stab.ATE`, `overlap` and original unadjusted population) is the `love.plot` from the `cobalt` package (@cobalt): ```{r, fig.align="center", label=fig10, fig.width=6.5, fig.height=8, message=FALSE, warning=FALSE, fig.cap="Testing covariate balance of the PS model with all covariates."} #fig.asp=1.1, covs <- subset(rhc.ps, select = covs.names) suppressWarnings(love.plot(RHC ~ covs, data=rhc.ps, weights = list(overlap=rhc.ps$overlap.w, stab.ATE=rhc.ps$stab.ATE.w), threshold = .1, abs=TRUE, binary = "std", stars="raw", s.d.denom="pooled", var.order = "unadjusted", title = "Comparing balancing weights") + scale_fill_npg() + scale_color_npg()) ``` We follow the established convention that "a standardised difference of less than 0.1 can be considered as adequate balance" (@PS_diag). The stabilized ATE weighting resulted in good average balance, and overlap weights estimated from a logistic regression, as expected, yielded exact average balance (@Zaslavsky). For comparison, we inspect the balance attained by [a "better-fitted" PS model](#GOF) which omitted all insignificant covariates: ```{r figcompare, label=figcompare, fig.align="center", fig.width=6.5, fig.height=8, message=FALSE, warning=FALSE, fig.cap="Testing covariate balance of the PS model with selection of significant covariates."} ps.overlap.2 <- get.weights(formula=trt.formula.1, data=rhc, wtype = "overlap") rhc.ps <- rhc.ps %>% mutate(ps.2 = ps.stab.ATE.2$ps, stab.ATE.w.2 = ps.stab.ATE.2$w, overlap.w.2 = ps.overlap.2$w) covs <- subset(rhc.ps, select = covs.names) suppressWarnings(love.plot(RHC ~ covs, data=rhc.ps, weights = list(overlap=rhc.ps$overlap.w.2, stab.ATE=rhc.ps$stab.ATE.w.2), threshold = .1, abs=TRUE, binary = "std", stars="raw", s.d.denom="pooled", var.order = "unadjusted", title = "Comparing balancing weights") + scale_fill_npg() + scale_color_npg()) ``` Our conclusion is that the exclusion of insignificant variables from the PS model might result in disturbing the previously existing balance in some of those variables, without necessarily improving the balance in included covariates. As an illustration of further balance diagnostics in continuous covariates, we chose two continuous variables with the least balance in the original population, and inspected how the weights performed in balancing the whole distribution. For the APACHE score (`aps1` variable): ```{r, fig.align="center", out.width=c('33%', '33%', '34%'), fig.show='hold', message=FALSE, warning=FALSE, fig.cap="Comparing distributions of `aps1` across two treatment arms for: (A) unadjusted, (B) stab.ATE, (C) overlap weights"} rhc.ps$stab.ATE.w.1 <- rhc.ps$overlap.w.1<- rep(NA, nrow(rhc.ps)) rhc.ps$stab.ATE.w.1[rhc.ps$trt=="RHC"] <- rhc.ps$stab.ATE.w[rhc.ps$trt=="RHC"]/sum(rhc.ps$stab.ATE.w[rhc.ps$trt=="RHC"]) rhc.ps$stab.ATE.w.1[rhc.ps$trt=="No RHC"] <- rhc.ps$stab.ATE.w[rhc.ps$trt=="No RHC"]/sum(rhc.ps$stab.ATE.w[rhc.ps$trt=="No RHC"]) rhc.ps$overlap.w.1[rhc.ps$trt=="RHC"] <- rhc.ps$overlap.w[rhc.ps$trt=="RHC"]/sum(rhc.ps$overlap.w[rhc.ps$trt=="RHC"]) rhc.ps$overlap.w.1[rhc.ps$trt=="No RHC"] <- rhc.ps$overlap.w[rhc.ps$trt=="No RHC"]/sum(rhc.ps$overlap.w[rhc.ps$trt=="No RHC"]) ggplot(rhc.ps, aes(x = aps1, fill = trt, color=trt)) + ggtitle("A")+ geom_density(alpha = 0.5) + scale_fill_npg() + scale_color_npg()+ theme(plot.title = element_text(hjust = 0.5), legend.position = c(0.7, 0.7), legend.background=element_rect(fill="transparent")) ggplot(rhc.ps, aes(x = aps1, fill = trt, color=trt, weight=stab.ATE.w.1)) + ggtitle("B")+ geom_density(alpha = 0.5) + scale_fill_npg() + scale_color_npg()+ theme(plot.title = element_text(hjust = 0.5), legend.position = c(0.7, 0.7), legend.background=element_rect(fill="transparent")) ggplot(rhc.ps, aes(x = aps1, fill = trt, color=trt, weight=overlap.w.1)) + ggtitle("C")+ geom_density(alpha = 0.5) + scale_fill_npg() + scale_color_npg()+ theme(plot.title = element_text(hjust = 0.5), legend.position = c(0.7, 0.7), legend.background=element_rect(fill="transparent")) ``` For the mean blood pressure (`meanbp1` variable): ```{r, fig.align="center", out.width=c('33%', '33%', '34%'), message=FALSE, warning=FALSE, fig.cap="Comparing distributions of `meanbp1` across two treatment arms for: (A) unadjusted, (B) stab.ATE, (C) overlap weights", fig.show='hold'} ggplot(rhc.ps, aes(x = meanbp1, fill = trt, color=trt)) + ggtitle("A")+ geom_density(alpha = 0.5) + scale_fill_npg() + scale_color_npg()+ theme(plot.title = element_text(hjust = 0.5), legend.position = c(0.7, 0.7), legend.background=element_rect(fill="transparent")) ggplot(rhc.ps, aes(x = meanbp1, fill = trt, color=trt, weight=stab.ATE.w.1)) + ggtitle("B")+ geom_density(alpha = 0.5) + scale_fill_npg() + scale_color_npg()+ theme(plot.title = element_text(hjust = 0.5), legend.position = c(0.7, 0.7), legend.background=element_rect(fill="transparent")) ggplot(rhc.ps, aes(x = meanbp1, fill = trt, color=trt, weight=overlap.w.1)) + ggtitle("C")+ geom_density(alpha = 0.5) + scale_fill_npg() + scale_color_npg()+ theme(plot.title = element_text(hjust = 0.5), legend.position = c(0.7, 0.7), legend.background=element_rect(fill="transparent")) ``` For both variables, the balance achieved by `stab.ATE` and `overlap` weights looks pretty adequate. ### Characterizing the target population {#target} It is very important to describe the baseline characteristics of the target population for which we are going to estimate the ATE, since it can help us to understand whether a pseudo-population created by weighting is clinically relevant and which randomized trial is best emulated by the chosen weights (@target_pop). In Table \@ref(tab:table1) we summarize the marginal covariate distributions for three populations: 1. original (unweighted) population, 2. the population created by `stab.ATE` weights and 3. the overlap population ```{r, label=table1} d <- rhc.ps[, covs.names] add.names <- function(x, a) { paste(x,a, sep="")} # original population: descr.orig <- round( cbind( apply(d, 2, wtd.mean, weights=rep(1, nrow(d))), sqrt(apply(d, 2, wtd.var, weights=rep(1, nrow(d)))), t(apply(d, 2, wtd.quantile, weights=rep(1, nrow(d)), probs=c(.25, .5, .75))) ), dig=2) descr.ATE <- round( cbind( apply(d, 2, wtd.mean, weights=rhc.ps$stab.ATE.w), sqrt(apply(d, 2, wtd.var, weights=rhc.ps$stab.ATE.w)), t(apply(d, 2, wtd.quantile, weights=rhc.ps$stab.ATE.w, probs=c(.25, .5, .75))) ), dig=2) descr.overlap <- round( cbind( apply(d, 2, wtd.mean, weights=rhc.ps$overlap.w), sqrt(apply(d, 2, wtd.var, weights=rhc.ps$overlap.w)), t(apply(d, 2, wtd.quantile, weights=rhc.ps$overlap.w, probs=c(.25, .5, .75))) ), dig=2) df3 <- data.frame(cbind(descr.orig, descr.ATE, descr.overlap)) row.names(df3) <- covs.names names(df3) <- rep( c("mean", "sd", "q1", "med", "q3"), times=3) kable(df3, caption="Comparing marginal distributions of covariates for three populations. q1, med and q3 correspond to 1st, 2nd and 3rd quartiles of covariates distributions for: (1) the original population; (2) stab.ATE population; (3) overlap population.") #%>% kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = FALSE) %>% # add_header_above(c("covariate "= 1, "original population" = 5, "stab.ATE population" = 5, "overlap population"=5)) %>% # column_spec(2, background = "lightgrey") %>% # column_spec(7, background = "lightgrey") %>% # column_spec(12, background = "lightgrey") %>% # scroll_box(width = "100%", height = "500px") ``` There is not observed any significant difference between the target populations (i.e., ATE and overlap populations), both of which are similar to the original population. The `stab.ATE` weighting is expected to produce a population which is similar to the original population on average, with respect to covariates included in the PS model. In `stab.ATE` weighting no observation is excluded from the analysis, i.e. no weights are close to zero. But there could be possibly extremely influential observations with relatively high weights. Those are the observations that are rarely represented in their assigned treatment group. Figure \@ref(fig:ATEwdistr) shows the distributions of `stab.ATE` weights. ```{r label=ATEwdistr, fig.height=5, fig.width=5, fig.align="center" , message=FALSE, warning=FALSE, fig.cap="Distribution of stab.ATE weights."} ggplot(rhc.ps, aes(x = stab.ATE.w, fill = trt, color=trt)) + geom_histogram(alpha = 0.4, bins=30) + scale_fill_npg() + scale_color_npg()+ theme(plot.title = element_text(hjust = 0.5), legend.position = c(0.8, 0.8), legend.background=element_rect(fill="transparent")) + ggtitle("Distribution of stab.ATE weights") ``` Table \@ref(tab:influential) lists observations with `stab.ATE` weights greater than 10, together with their overlap weights. Notice that these observations have the highest overlap weights as well, but since overlap weights are bounded between 0 and 1, their influence in overlap weighting will be diminished. ```{r, label=influential} kable(rhc.ps[rhc.ps$stab.ATE.w>10, c("trt", "stab.ATE.w", "overlap.w", "ps")], caption="The list of most influential observations in the stab.ATE weighting.") ``` In the overlap weighting, the "influential" observations are defined in a different way from the `stab.ATE` weighting. If a person with covariates $C=c$ has high probability of being assigned to his actual treatment group, he would have low probability of appearing in the opposite group, meaning that if he would appear in the opposite group he would be highly influential. That is why, this "potentially influential" person will get a near 0 overlap weight in order to reduce his *potential* influence. Those *potentially influential* observations are different from those *actually influential* observations in `stab.ATE` weighting. Figure \@ref(fig:overlapdistr) shows the distributions of overlap weights. ```{r, label=overlapdistr, fig.height=5, fig.width=5, fig.align="center" , message=FALSE, warning=FALSE, fig.cap="Distribution of overlap weights."} ggplot(rhc.ps, aes(x = overlap.w, fill = trt, color=trt)) + geom_histogram(alpha = 0.4, bins=30) + scale_fill_npg() + scale_color_npg()+ theme(plot.title = element_text(hjust = 0.5), legend.position = c(0.8, 0.8), legend.background=element_rect(fill="transparent")) + ggtitle("Distribution of overlap weights") ``` We will base our final estimates on stabilized ATE weights. ## Estimating the effect of RHC on the length-of-stay {#cmprsk} ### Graphical test of a proportional hazards assumption {#checkPH} We estimate all the parameters using both nonparametric and Cox-based approaches: ```{r} # overlap weights========================: # Nonparametric estimation: res.overlap <- fit.nonpar(df=rhc, X="Time", E="E", trt.formula=trt.formula, wtype="overlap", cens=0, conf.level=0.95, bs=TRUE, nbs.rep=50, seed=17, parallel = FALSE) # The small number of bootstrap replications (nbs.rep=50) was chosen for illustration purposes. # Cox-based estimation: res.cox.overlap <- fit.cox(df=rhc, X="Time", E="E", trt.formula=trt.formula, wtype="overlap", cens=0, conf.level=0.95, bs=TRUE, nbs.rep=50, seed=17, parallel = FALSE) # stab.ATE weights==========================: # Nonparametric estimation: res.stab.ATE <- fit.nonpar(df=rhc, X="Time", E="E", trt.formula=trt.formula, wtype="stab.ATE", cens=0, conf.level=0.95, bs=TRUE, nbs.rep=50, seed=17, parallel = FALSE) # Cox-based estimation: res.cox.stab.ATE <- fit.cox(df=rhc, X="Time", E="E", trt.formula=trt.formula, wtype="stab.ATE", cens=0, conf.level=0.95, bs=TRUE, nbs.rep=50, seed=17, parallel = FALSE) # unadjusted analysis========================: # Nonparametric estimation: res.un <- fit.nonpar(df=rhc, X="Time", E="E", trt.formula=trt.formula, wtype="unadj", cens=0, conf.level=0.95, bs=TRUE, nbs.rep=50, seed=17, parallel = FALSE) # Cox-based estimation: res.cox.un <- fit.cox(df=rhc, X="Time", E="E", trt.formula=trt.formula, wtype="unadj", cens=0, conf.level=0.95, bs=TRUE, nbs.rep=50, seed=17, parallel = FALSE) ``` We aggregate log-ratios of cumulative hazards from nonparametric and Cox-based estimation into one data frame for each type of weights: ```{r echo=TRUE, message=FALSE, warning=FALSE} get.logCumHR <- function(res, res.cox) { df1 <- rbind( data.frame(time=res$time, Outcome=1, CumHazR=res$trt.eff[[paste("Ev=", 1, sep="")]]$log.CumHazR, CIL.CumHazR=res$trt.eff[[paste("Ev=", 1, sep="")]]$log.CumHazR.CI.L, CIU.CumHazR=res$trt.eff[[paste("Ev=", 1, sep="")]]$log.CumHazR.CI.U), data.frame(time=res$time, Outcome=2, CumHazR=res$trt.eff[[paste("Ev=", 2, sep="")]]$log.CumHazR, CIL.CumHazR=res$trt.eff[[paste("Ev=", 2, sep="")]]$log.CumHazR.CI.L, CIU.CumHazR=res$trt.eff[[paste("Ev=", 2, sep="")]]$log.CumHazR.CI.U)) df2 <- data.frame(time=c(min(res.cox$time), max(res.cox$time)), Outcome=rep(3, 2), CumHazR=rep(res.cox$trt.eff[[paste("Ev=", 1, sep="")]]$log.CumHazR, 2), CIL.CumHazR=rep(NA, 2), CIU.CumHazR=rep(NA, 2)) df3 <- data.frame(time=c(min(res.cox$time), max(res.cox$time)), Outcome=rep(4, 2), CumHazR=rep(res.cox$trt.eff[[paste("Ev=", 2, sep="")]]$log.CumHazR, 2), CIL.CumHazR=rep(NA, 2), CIU.CumHazR=rep(NA, 2)) df <- rbind(df1, df2, df3) df$Outcome <- factor(df$Outcome) levels(df$Outcome) <- c("Discharge-nonpar", "Death-nonpar", paste("Discharge-Cox:", round(res.cox$trt.eff[[paste("Ev=", 1, sep="")]]$log.CumHazR,dig=2), ", 95% CI=[", round(res.cox$trt.eff[[paste("Ev=", 1, sep="")]]$log.CumHazR.CI.L,dig=2), ",", round(res.cox$trt.eff[[paste("Ev=", 1, sep="")]]$log.CumHazR.CI.U,dig=2), "]", sep=""), paste("Death-Cox:", round(res.cox$trt.eff[[paste("Ev=", 2, sep="")]]$log.CumHazR,dig=2), ", 95% CI=[", round(res.cox$trt.eff[[paste("Ev=", 2, sep="")]]$log.CumHazR.CI.L,dig=2), ",", round(res.cox$trt.eff[[paste("Ev=", 2, sep="")]]$log.CumHazR.CI.U,dig=2), "]", sep="")) return(df) } logCumHR.overlap <- get.logCumHR(res.overlap, res.cox.overlap) logCumHR.stab.ATE <- get.logCumHR(res.stab.ATE, res.cox.stab.ATE) logCumHR.unadj <- get.logCumHR(res.un, res.cox.un) ``` A standard way to test a proportional hazards assumption is by plotting the *log-ratio*, $\log \frac{H_k^1(t)}{H_k^0(t)}$, $k=1,2$, versus $t$, with $H_k^a(t)$, $a=0,1$ estimated nonparametrically. Under the Cox proportional hazards model, this *log-ratio* equals $\beta$ that does not depend on time: ```{r echo=TRUE, fig.align="center", fig.show='hold', fig.width=4.5, fig.height=4.5, message=FALSE, warning=FALSE, fig.cap="log(Cum.HR(t)) of RHC vs No-RHC for discharge and in-hospital death."} plot.logCumHazR <- function(df, title) { p <- ggplot(df, aes(x=time, y=CumHazR, color=Outcome, fill=Outcome, shape=Outcome)) + ggtitle(title) + geom_step(size=1.1) + geom_ribbon(aes(ymin=CIL.CumHazR, ymax=CIU.CumHazR), alpha=0.2, stat="stepribbon") + scale_fill_npg() + scale_color_npg() p <- p + xlab("time from admission to ICU (days)") + ylab("log.CumHR(t)") + theme(axis.text.x = element_text(face="bold", angle=45), axis.text.y = element_text(face="bold"), plot.title = element_text(hjust = 0.5))+ theme(legend.position = c(0.5, 0.3), legend.background=element_rect(fill="transparent"), panel.grid.minor = element_line(size = .5,colour = "gray92"), panel.grid.major = element_line(size = .5,colour = "#C0C0C0")) + geom_vline(xintercept=30, linetype="dashed") p } plot.logCumHazR(df=logCumHR.overlap, title="Overlap weights") plot.logCumHazR(df=logCumHR.stab.ATE, title="Stabilized ATE weights") plot.logCumHazR(df=logCumHR.unadj, title="Unadjusted") ``` We will focus on the stabilized ATE weighting analysis. ### Estimating cause-specific hazards, risk and RMT #### Comparing cause-specific hazards over 3 populations We aggregate hazards obtained from three weighting schemes, overlap, stab.ATE and unadjusted, into one data frame and compare the absolute cumulative hazards for three populations: ```{r echo=TRUE, message=FALSE, warning=FALSE} get.CumHaz <- function(res1, res2, res3, Ev) { df <- rbind( data.frame(time=res1$time, population=1, TRT=1, CumHaz=res1$trt.1[[paste("Ev=", Ev, sep="")]]$CumHaz, CIL.CumHaz=res1$trt.1[[paste("Ev=", Ev, sep="")]]$CumHaz.CI.L, CIU.CumHaz=res1$trt.1[[paste("Ev=", Ev, sep="")]]$CumHaz.CI.U), data.frame(time=res2$time, population=2, TRT=1, CumHaz=res2$trt.1[[paste("Ev=", Ev, sep="")]]$CumHaz, CIL.CumHaz=res2$trt.1[[paste("Ev=", Ev, sep="")]]$CumHaz.CI.L, CIU.CumHaz=res2$trt.1[[paste("Ev=", Ev, sep="")]]$CumHaz.CI.U), data.frame(time=res3$time, population=3, TRT=1, CumHaz=res3$trt.1[[paste("Ev=", Ev, sep="")]]$CumHaz, CIL.CumHaz=res3$trt.1[[paste("Ev=", Ev, sep="")]]$CumHaz.CI.L, CIU.CumHaz=res3$trt.1[[paste("Ev=", Ev, sep="")]]$CumHaz.CI.U), data.frame(time=res1$time, population=1, TRT=0, CumHaz=res1$trt.0[[paste("Ev=", Ev, sep="")]]$CumHaz, CIL.CumHaz=res1$trt.0[[paste("Ev=", Ev, sep="")]]$CumHaz.CI.L, CIU.CumHaz=res1$trt.0[[paste("Ev=", Ev, sep="")]]$CumHaz.CI.U), data.frame(time=res2$time, population=2, TRT=0, CumHaz=res2$trt.0[[paste("Ev=", Ev, sep="")]]$CumHaz, CIL.CumHaz=res2$trt.0[[paste("Ev=", Ev, sep="")]]$CumHaz.CI.L, CIU.CumHaz=res2$trt.0[[paste("Ev=", Ev, sep="")]]$CumHaz.CI.U), data.frame(time=res3$time, population=3, TRT=0, CumHaz=res3$trt.0[[paste("Ev=", Ev, sep="")]]$CumHaz, CIL.CumHaz=res3$trt.0[[paste("Ev=", Ev, sep="")]]$CumHaz.CI.L, CIU.CumHaz=res3$trt.0[[paste("Ev=", Ev, sep="")]]$CumHaz.CI.U) ) df$population <- factor(df$population) levels(df$population) <- c("overlap", "stab.ATE", "unadjusted") df } df.CumHaz <- list() E.set <- sort(unique(rhc$E)) E.set <- E.set[E.set!=0] # 0 is for censoring status for (k in E.set) df.CumHaz[[k]] <- get.CumHaz(res.overlap, res.stab.ATE, res.un, k) plot.CumHaz <- function(df, title, ymax) { p <- ggplot(df, aes(x=time, y=CumHaz, color=population, fill=population, shape=population)) + ggtitle(title) + geom_step(size=1.1) + # geom_ribbon(aes(ymin=CIL.CumHaz, ymax=CIU.CumHaz), alpha=0.2, stat="stepribbon") + scale_fill_npg() + scale_color_npg() p <- p + xlab("time from admission to ICU (days)") + ylab("Cumulative Hazard (t)") + ylim(0, ymax)+ theme(axis.text.x = element_text(face="bold", angle=45), axis.text.y = element_text(face="bold"), plot.title = element_text(hjust = 0.5))+ theme(legend.position = c(0.7, 0.3), legend.background=element_rect(fill="transparent"), panel.grid.minor = element_line(size = .5,colour = "gray92"), panel.grid.major = element_line(size = .5,colour = "#C0C0C0")) + geom_vline(xintercept=30, linetype="dashed") p } ``` In the following plot we compare the absolute cumulative hazards across three populations: ```{r echo=TRUE, out.width=c('45%', '45%'), fig.width=4.5, fig.height=4.5, fig.align="center" , message=FALSE, warning=FALSE, fig.cap="Cumulative hazards for discharge and in-hospital death. Comparison of 3 target populations.", fig.show='hold'} plot.CumHaz(df=df.CumHaz[[1]][df.CumHaz[[1]]$TRT==1,], title="RHC: CumHaz of Discharge", ymax=6) plot.CumHaz(df=df.CumHaz[[1]][df.CumHaz[[1]]$TRT==0,], title="No-RHC: CumHaz of Discharge", ymax=6) plot.CumHaz(df=df.CumHaz[[2]][df.CumHaz[[2]]$TRT==1,], title="RHC: CumHaz of Death", ymax=2.5) plot.CumHaz(df=df.CumHaz[[2]][df.CumHaz[[2]]$TRT==0,], title="No-RHC: CumHaz of Death", ymax=2.5) ``` #### Cause-specific cumulative hazards for stabilized ATE weights ```{r label=CumHaz4, echo=TRUE, fig.align="center", fig.width=4.5, fig.height=4.5, message=FALSE, warning=FALSE, fig.cap="Cause-specific cumulative hazards.", fig.show='hold'} df <- rbind(summary(res.stab.ATE, event=1, estimand="CumHaz"), summary(res.stab.ATE, event=2, estimand="CumHaz")) df$Event_TRT <- factor(2*(df$Event==2) + 1*(df$TRT==0)) levels(df$Event_TRT) <- c("Discharge-RHC", "Discharge-No RHC", "In-hospital death-RHC", "In-hospital death-No RHC") ggplot(df, aes(x=time, y=CumHaz, color=Event_TRT, fill=Event_TRT, shape=Event_TRT, linetype=Event_TRT)) + geom_step(linewidth=1.1) + geom_ribbon(aes(ymin=CIL.CumHaz, ymax=CIU.CumHaz), alpha=0.2, stat="stepribbon") + scale_color_brewer(palette = "Set1") + scale_fill_brewer(palette = "Set1") + xlab("time from admission to ICU (days)") + ylab("Cumulative hazard of event by time t") + theme(legend.position = c(0.41, 0.83), legend.text=element_text(size=12), legend.background=element_rect(fill="transparent"), legend.title=element_blank())+ geom_vline(xintercept=30, linetype="dashed") ``` #### Risk curves or cumulative incidence functions (CIF) for stabilized ATE weights ```{r label=Risk4, echo=TRUE, fig.align="center", fig.width=4.5, fig.height=4.5, message=FALSE, warning=FALSE, fig.cap="Risk curves", fig.show='hold'} df <- rbind(summary(res.stab.ATE, event=1, estimand="CIF"), summary(res.stab.ATE, event=2, estimand="CIF")) df$Event_TRT <- factor(2*(df$Event==2) + 1*(df$TRT==0)) levels(df$Event_TRT) <- c("Discharge-RHC", "Discharge-No RHC", "In-hospital death-RHC", "In-hospital death-No RHC") ggplot(df, aes(x=time, y=CIF, color=Event_TRT, fill=Event_TRT, shape=Event_TRT, linetype=Event_TRT)) + geom_step(linewidth=1.1) + xlim(0, 200) + geom_ribbon(aes(ymin=CIL.CIF, ymax=CIU.CIF), alpha=0.2, stat="stepribbon") + scale_color_brewer(palette = "Set1") + scale_fill_brewer(palette = "Set1") + xlab("time from admission to ICU (days)") + ylab("Probability of event by time t") + theme(legend.position = c(0.7, 0.25), legend.text=element_text(size=12), legend.background=element_rect(fill="transparent"), legend.title=element_blank())+ geom_vline(xintercept=30, linetype="dashed") ``` #### RMT for stabilized ATE weights An alternative way to summarize risk curves over time is to estimate the area under a risk curve. For a "good" outcome, e.g., discharge, the larger the area, the better the treatment. And, the opposite holds for a "bad" outcome, e.g. death, for which the smaller the area is, the better. ```{r label=RMT4, echo=TRUE, fig.align="center", fig.width=4.5, fig.height=4.5, message=FALSE, warning=FALSE, fig.cap="Restricted-mean-time-lost/gained due to specific event.", fig.show='hold'} df <- rbind(summary(res.stab.ATE, event=1, estimand="RMT"), summary(res.stab.ATE, event=2, estimand="RMT")) df$Event_TRT <- factor(2*(df$Event==2) + 1*(df$TRT==0)) levels(df$Event_TRT) <- c("Discharge-RHC", "Discharge-No RHC", "In-hospital death-RHC", "In-hospital death-No RHC") ggplot(df, aes(x=time, y=RMT, color=Event_TRT, fill=Event_TRT, shape=Event_TRT, linetype=Event_TRT)) + geom_line(linewidth=1.1) + geom_ribbon(aes(ymin=CIL.RMT, ymax=CIU.RMT), alpha=0.2) + scale_color_brewer(palette = "Set1") + scale_fill_brewer(palette = "Set1") + xlab("time from admission to ICU (days)") + ylab("average time lost due to death (gained by recovery)") + theme(legend.position = c(0.35, 0.8), legend.text=element_text(size=12), legend.background=element_rect(fill="transparent"), legend.title=element_blank())+ geom_vline(xintercept=30, linetype="dashed") ``` ### Estimating treatment effects for stabilized ATE weights ```{r label=fig16, echo=TRUE, fig.align="center", fig.width=4.5, fig.height=4.5, message=FALSE, warning=FALSE, fig.cap="Risk differences for discharge and in-hospital death.", fig.show='hold'} df <- rbind(summary(res.stab.ATE, event=1, estimand="RD"), summary(res.stab.ATE, event=2, estimand="RD")) df$Event <- as.factor(df$Event) levels(df$Event) <- c("Discharge", "In-hospital death") ggplot(df, aes(x=time, y=RD, color=Event, fill=Event, shape=Event, linetype=Event)) + geom_step(linewidth=1.1) + geom_ribbon(aes(ymin=CIL.RD, ymax=CIU.RD), alpha=0.2, stat="stepribbon") + scale_color_brewer(palette = "Set1") + scale_fill_brewer(palette = "Set1") + xlab("time from admission to ICU (days)") + ylab("Risk difference") + xlim(0, 200) + theme(legend.position = c(0.7, 0.6), legend.text=element_text(size=12), legend.background=element_rect(fill="transparent"), legend.title=element_blank())+ geom_vline(xintercept=30, linetype="dashed") ``` ```{r echo=TRUE, fig.align="center", fig.width=4.5, fig.height=4.5, message=FALSE, warning=FALSE, fig.cap="Difference in average time lost/gained due to treatment.", fig.show='hold'} df <- rbind(summary(res.stab.ATE, event=1, estimand="ATE.RMT"), summary(res.stab.ATE, event=2, estimand="ATE.RMT")) df$Event <- as.factor(df$Event) levels(df$Event) <- c("Discharge", "In-hospital death") ggplot(df, aes(x=time, y=ATE.RMT, color=Event, fill=Event, shape=Event, linetype=Event)) + geom_line(linewidth=1.1) + geom_ribbon(aes(ymin=CIL.ATE.RMT, ymax=CIU.ATE.RMT), alpha=0.2) + scale_color_brewer(palette = "Set1") + scale_fill_brewer(palette = "Set1") + xlab("time from admission to ICU (days)") + ylab("average time (days) lost/gained due to treatment") + theme(legend.position = c(0.7, 0.5), legend.text=element_text(size=12), legend.background=element_rect(fill="transparent"), legend.title=element_blank())+ geom_vline(xintercept=30, linetype="dashed") ``` ### Estimating treatment effect on the composite Length-of-Stay (LOS) for stabilized ATE weights From the cause-specific parameters we can derive an overall effect of RHC on the composite LOS. Here we focus on the the difference between the cumulative distribution functions of the LOS in two treatment arms: ```{r echo=TRUE, fig.align="center", fig.width=5, fig.height=5, message=FALSE, warning=FALSE, fig.cap="Cumulative distribution of length-of-stay in RHC vs No-RHC groups."} alpha=0.05 # trt.0: bs.CumPr.0 <- 1-exp(-res.stab.ATE$trt.0[[paste("Ev=", 1, sep="")]]$bs.CumHaz - res.stab.ATE$trt.0[[paste("Ev=", 2, sep="")]]$bs.CumHaz) CumPr.CIL.0 <- apply(bs.CumPr.0, 2, quantile, prob=alpha/2, na.rm=TRUE) CumPr.CIU.0 <- apply(bs.CumPr.0, 2, quantile, prob=1-alpha/2, na.rm=TRUE) # trt.1: bs.CumPr.1 <- 1-exp(-res.stab.ATE$trt.1[[paste("Ev=", 1, sep="")]]$bs.CumHaz - res.stab.ATE$trt.1[[paste("Ev=", 2, sep="")]]$bs.CumHaz) CumPr.CIL.1 <- apply(bs.CumPr.1, 2, quantile, prob=alpha/2, na.rm=TRUE) CumPr.CIU.1 <- apply(bs.CumPr.1, 2, quantile, prob=1-alpha/2, na.rm=TRUE) df <- rbind(data.frame(time=res.stab.ATE$time, TRT=1, CumPr=1-exp(-res.stab.ATE$trt.1[[paste("Ev=", 1, sep="")]]$CumHaz - res.stab.ATE$trt.1[[paste("Ev=", 2, sep="")]]$CumHaz), CumPr.CIL=CumPr.CIL.1, CumPr.CIU=CumPr.CIU.1), data.frame(time=res.stab.ATE$time, TRT=2, CumPr=1-exp(-res.stab.ATE$trt.0[[paste("Ev=", 1, sep="")]]$CumHaz - res.stab.ATE$trt.0[[paste("Ev=", 2, sep="")]]$CumHaz), CumPr.CIL=CumPr.CIL.0, CumPr.CIU=CumPr.CIU.0)) df$TRT <- as.factor(df$TRT) levels(df$TRT) <- c("RHC", "No-RHC") ggplot(df, aes(x=time, y=CumPr, color=TRT, fill=TRT, shape=TRT)) + geom_step(linewidth=1.1) + scale_color_brewer(palette = "Set1") + scale_fill_brewer(palette = "Set1") + xlim(0,100) + xlab("time (days)") + ylab("Cumulative distribution of length-of-stay") + geom_ribbon(aes(ymin=CumPr.CIL, ymax=CumPr.CIU), alpha=0.2, stat="stepribbon")+ theme(axis.text.x = element_text(face="bold", angle=45), axis.text.y = element_text(face="bold"), plot.title = element_text(hjust = 0.5))+ theme(legend.position = c(0.6, 0.3), legend.text=element_text(size=12), legend.background=element_rect(fill="transparent"), legend.title=element_blank())+ geom_vline(xintercept=30, linetype="dashed") ``` ## Estimating the effect of RHC on 30-day survival for stabilized ATE weights {#survival30} Here we would really like to know whether the RHC procedure improves short-term 30-day survival, either death happens in the hospital or at home. Our time outcome is the time-to-death, which might be censored at day 30. Below we illustrate how to use `causalCmprsk` in survival data with only one time-to-event outcome. ### Graphical test of a proportional hazards assumption ```{r} # nonparametric: res.30 <- fit.nonpar(df=rhc_full, X="T.death.30", E="D.30", trt.formula=trt.formula, wtype="stab.ATE", cens=0, conf.level=0.95, bs=TRUE, nbs.rep=80, seed=17, parallel = FALSE) # Cox-based estimation: res.cox.30 <- fit.cox(df=rhc_full, X="T.death.30", E="D.30", trt.formula=trt.formula, wtype="stab.ATE", cens=0, conf.level=0.95, bs=TRUE, nbs.rep=80, seed=17, parallel = FALSE) ``` ```{r echo=TRUE, fig.align="center", fig.width=5, fig.height=5, message=FALSE, warning=FALSE, fig.cap="log-ratio of cumulative hazards for 30-day mortality, using stabilized ATE weighting."} df1.30 <- cbind(summary(res.30, event=1, estimand="logHR") %>% select(time, logCumHazR, CIL.logCumHazR, CIU.logCumHazR), Est.method=1) HR.30 <- res.cox.30$trt.eff[[paste("Ev=", 1, sep="")]][c("log.CumHazR", "log.CumHazR.CI.L", "log.CumHazR.CI.U")] df2.30 <- data.frame(time=c(min(res.cox.30$time), max(res.cox.30$time)), logCumHazR=HR.30[[1]], CIL.logCumHazR=NA, CIU.logCumHazR=NA, Est.method=2) df.30 <- rbind(df1.30, df2.30) df.30$Est.method <- factor(df.30$Est.method) levels(df.30$Est.method) <- c("nonpar", paste("Cox: ", round(res.cox.30$trt.eff[[paste("Ev=", 1, sep="")]]$log.CumHazR,dig=2), ", 95% CI=[", round(res.cox.30$trt.eff[[paste("Ev=", 1, sep="")]]$log.CumHazR.CI.L,dig=2), ",", round(res.cox.30$trt.eff[[paste("Ev=", 1, sep="")]]$log.CumHazR.CI.U,dig=2), "]", sep="")) ggplot(df.30, aes(x=time, y=logCumHazR, color=Est.method, fill=Est.method, shape=Est.method)) + geom_step(linewidth=1.1) + geom_ribbon(aes(ymin=CIL.logCumHazR, ymax=CIU.logCumHazR), alpha=0.2, stat="stepribbon") + scale_color_brewer(palette = "Set1") + scale_fill_brewer(palette = "Set1") + xlab("time from admission to ICU (days)") + ylim(-0.1, 0.55)+ ylab("log-ratio of CumHaz(t)") + theme(axis.text.x = element_text(face="bold", angle=45), axis.text.y = element_text(face="bold"), plot.title = element_text(hjust = 0.5)) + theme(legend.position = c(0.6, 0.7), legend.title=element_blank(), legend.text=element_text(size=12), legend.background=element_rect(fill="transparent")) ``` ### Estimating risk difference ```{r label=fig162, echo=TRUE, fig.align="center", fig.width=4.5, fig.height=4.5, message=FALSE, warning=FALSE, fig.cap="Risk difference for 30-day mortality.", fig.show='hold'} ggplot(summary(res.30, event=1, estimand="RD"), aes(x=time, y=RD)) + geom_step(linewidth=1.1) + geom_ribbon(aes(ymin=CIL.RD, ymax=CIU.RD), alpha=0.2, stat="stepribbon") + xlab("time from admission to ICU (days)") + ylab("Risk difference") est30 <- get.pointEst(res.30, 30) cat("30-day risk difference=", round((est30$trt.eff[[1]]$RD), dig=4), ", 95% CI=[", round((est30$trt.eff[[1]]$RD.CI.L), dig=4), ", ", round((est30$trt.eff[[1]]$RD.CI.U), dig=4), "]\n", sep="") ``` ### Estimating risk ratio ```{r label=fig163, echo=TRUE, fig.align="center", fig.width=4.5, fig.height=4.5, message=FALSE, warning=FALSE, fig.cap="Risk ratio for 30-day mortality.", fig.show='hold'} ggplot(summary(res.30, event=1, estimand="RR"), aes(x=time, y=RR)) + geom_step(linewidth=1.1) + geom_ribbon(aes(ymin=CIL.RR, ymax=CIU.RR), alpha=0.2, stat="stepribbon") + xlab("time from admission to ICU (days)") + ylab("Risk ratio") cat("30-day risk ratio=", round((est30$trt.eff[[1]]$RR), dig=4), ", 95% CI=[", round((est30$trt.eff[[1]]$RR.CI.L), dig=4), ", ", round((est30$trt.eff[[1]]$RR.CI.U), dig=4), "]\n", sep="") ``` ### Estimating difference in RMT The above risk difference can be translated into the domain of **the average days lost** as a result from undergoing RHC: ```{r , label=figRMT30, echo=TRUE, fig.align="center", fig.width=4.5, fig.height=4.5, message=FALSE, warning=FALSE, fig.cap="Difference in average time lost due to RHC in the 30-day mortality.", fig.show='hold'} ggplot(summary(res.30, event=1, estimand="ATE.RMT"), aes(x=time, y=ATE.RMT)) + geom_line(linewidth=1.1) + geom_ribbon(aes(ymin=CIL.ATE.RMT, ymax=CIU.ATE.RMT), alpha=0.2) + xlab("time from admission to ICU (days)") + ylab("RMT difference (days)") cat("30-day RMT difference=", round((est30$trt.eff[[1]]$ATE.RMT), dig=4), ", 95% CI=[", round((est30$trt.eff[[1]]$ATE.RMT.CI.L), dig=4), ", ", round((est30$trt.eff[[1]]$ATE.RMT.CI.U), dig=4), "]\n", sep="") ``` # References ***