Zhewei Zhang
The sparcc package can be loaded locally using the devtools package.
## load the package
library(devtools)
load_all()
## other necessary packages
library(dplyr)
library(ggplot2)
library(statmod)

The sparcc package contains functions to analyze data with a randomly
right-censored covariate using the SPARCC estimator.
The methods implemented are introduced in the paper, “SPARCC: Semi-Parametric Robust Estimation in a Right-Censored Covariate Model,” which is currently under revision.
The code implemented in this package is specific to two scenarios: 1.
Below is a tutorial on how the SPIRE estimator can be used on a data set with a randomly right-censored covariate.
Using the observed data, we can fit four estimators (CC, IPW, MLE, and
SPIRE) for the parameter of interest ${\boldsymbol\beta}$.
To use the CC estimator, use the ‘cc’ function in the ‘spire’ package.
There are six arguments in total for the ‘cc’ function:
- y: Numeric vector of responses.
- w: Numeric vector of observed values of the censored covariate, $W=\min(C,X)$.
- delta: 0/1 vector: 1 if $X \leq C$, 0 otherwise.
- z: Numeric vector (one uncensored covariate) or matrix (multiple uncensored covariates) of covariates $Z$.
- beta_init: Numeric initial value for ${\boldsymbol\beta}$ (length $p$), which could be any reasonable estimate.
- S_beta_fun: Function: Score function based on the model of $Y|X,Z$, i.e., $\partial\log f_{Y|X,Z}(y,x,z;{\boldsymbol\beta})/\partial{\boldsymbol\beta}$.
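As a rough illustration of what these inputs could look like, the sketch below simulates a small data set with a randomly right-censored covariate and writes a score function for a normal linear model with unit error variance. The data-generating values, the layout of beta as (intercept, X slope, Z slope), and the argument order of S_beta_fun are assumptions made for this example; the argument list above does not specify them.

```r
set.seed(1)
n <- 200

# Hypothetical data-generating model: Y = 1 + 2*X - Z + N(0, 1)
z <- rnorm(n)
x <- rnorm(n, mean = 0.5 * z)        # covariate subject to censoring
c_time <- rnorm(n, mean = 0.5)       # censoring variable C
y <- 1 + 2 * x - z + rnorm(n)

# Observed data: X is only seen when X <= C
w <- pmin(x, c_time)                 # W = min(C, X)
delta <- as.integer(x <= c_time)     # 1 if X <= C, 0 otherwise

# Score of log f(y | x, z; beta) for the normal linear model with sd = 1.
# The argument order (beta, y, x, z) is an assumption for illustration.
S_beta_fun <- function(beta, y, x, z) {
  resid <- y - (beta[1] + beta[2] * x + beta[3] * z)
  resid * c(1, x, z)
}

# One reasonable starting value: least squares on the uncensored rows
beta_init <- unname(coef(lm(y ~ w + z, subset = delta == 1)))
```

On the uncensored rows w equals x, so the least-squares fit above regresses y on the true covariate values.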
The cc function returns a list with five items:
- coef: the estimated model coefficients (or parameter of interest) $\widehat{\boldsymbol\beta}$ (length $p$).
- cov: estimated covariance matrix of $\widehat{\boldsymbol\beta}$ ($p\times p$ matrix).
- se: standard errors of $\widehat{\boldsymbol\beta}$.
- PSI: $p\times n$ matrix of per-observation scores at $\widehat{\boldsymbol\beta}$.
- J_hat: $p\times p$ average Jacobian at $\widehat{\boldsymbol\beta}$.
We can then use the estimated coefficients and the standard errors to obtain a 95% confidence interval. For example,
## obtain the cc estimator from a given dataset
res_cc <- spire::cc(y=y, w=w, delta=delta, z=z, beta_init = beta, S_beta_fun = S.beta.f)
## calculate 95% confidence interval
cbind(res_cc$coef - qnorm(0.975)*res_cc$se, res_cc$coef + qnorm(0.975)*res_cc$se)

In addition, PSI and J_hat are used in the test for
noninformative covariate censoring, which is introduced
later.
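Because every estimator returns the same coef and se items, the interval calculation can be wrapped in a small helper. The function name wald_ci and the numbers in fit_demo are made up for illustration; they are not part of the package or output from a real fit.

```r
# Wald interval from any list that carries `coef` and `se`
wald_ci <- function(fit, level = 0.95) {
  crit <- qnorm(1 - (1 - level) / 2)
  cbind(estimate = fit$coef,
        lower = fit$coef - crit * fit$se,
        upper = fit$coef + crit * fit$se)
}

# Made-up numbers standing in for a fitted object
fit_demo <- list(coef = c(1.02, 1.95, -0.98), se = c(0.08, 0.07, 0.09))
ci <- wald_ci(fit_demo)
ci
```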
To use the IPW estimator, use the ‘ipw’ function in the ‘spire’ package.
In addition to the six arguments y, w, delta, z, beta_init,
S_beta_fun, which are already defined in the documentation of the cc
function, there are several more arguments in the ipw function:
- dc_cxz: Optional density $f_{C|X,Z}$. If provided (and pr_fun is not), the tail probability $P(C\geq x|x,z)$ is computed as integrate(dc_cxz, lower = x, upper = upper).
- pr_fun: Optional direct tail-probability function pr(x, z_row) $= P(C\geq x|x,z)$. If supplied, it takes precedence over dc_cxz.
- lower, upper: Numeric scalar integration bounds used when dc_cxz is provided. Defaults are -Inf and Inf; set finite bounds if your support is bounded.
You can plug in the true dc_cxz or any misspecified
version (or the corresponding tail probability via pr_fun), and the
estimator remains consistent.
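The relationship between dc_cxz and pr_fun can be checked numerically. The sketch below assumes an exponential working model for $C$ and assumes that dc_cxz takes the integration variable as its first argument, followed by x and z_row; the text above does not state that signature. The helper tail_from_density is hypothetical.

```r
# Hypothetical working model: C | X, Z ~ Exponential(rate = exp(0.2 * z))
# Assumed signature: the first argument is the integration variable c
dc_cxz <- function(c, x, z_row) dexp(c, rate = exp(0.2 * z_row[1]))

# Tail probability P(C >= x | x, z) by numerical integration of the density
tail_from_density <- function(x, z_row, upper = Inf) {
  integrate(dc_cxz, lower = x, upper = upper, x = x, z_row = z_row)$value
}

# The same tail probability in closed form, in the pr(x, z_row) shape
pr_fun <- function(x, z_row) {
  pexp(x, rate = exp(0.2 * z_row[1]), lower.tail = FALSE)
}

tail_from_density(1.5, 0.3)
pr_fun(1.5, 0.3)
```

When a closed form like this is available, passing it as pr_fun avoids one numerical integration per observation.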
The ipw function returns a list with the same five items as the cc
function.
To use the MLE, use the ‘mle’ function in the ‘spire’ package.
In addition to the six arguments y, w, delta, z, beta_init,
S_beta_fun, which are already defined in the documentation of the cc
function, there are several more arguments in the mle function:
- f_x_cz: Function (x, c, z_row) for density $f_{X|C,Z}$. Ignored when method = "KM".
- f_y_xz: Function (beta, y_i, x, z_row) for density $f_{Y|X,Z}$.
- method: “continuous”, “discrete”, or “KM” (default “continuous”).
- m: Integer, number of X grid points (only used for “discrete”). Default 30.
- x_grid_range: numeric(2): c(x_min, x_max) for the grid (only for “discrete”). If NULL, uses mean(w) ± 3*sd(w).
- upper: Numeric, upper limit for x-integrals when method="continuous" (default Inf).
- h: Positive bandwidth for the Gaussian kernel in method="KM". Required if method="KM".
- l: Integer grid length for z1 interpolation in method="KM" (default 20).
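For concreteness, here is a minimal sketch of the two density arguments using the signatures listed above. The normal models, the fixed standard deviations, and the layout of beta as (intercept, X slope, Z slopes) are assumptions chosen for the example.

```r
# f_{X|C,Z} with signature (x, c, z_row); in this example X does not depend on C
f_x_cz <- function(x, c, z_row) dnorm(x, mean = 0.5 * z_row[1], sd = 1)

# f_{Y|X,Z} with signature (beta, y_i, x, z_row)
# Assumed layout: beta = (intercept, X slope, Z slopes), error sd fixed at 1
f_y_xz <- function(beta, y_i, x, z_row) {
  mu <- beta[1] + beta[2] * x + sum(beta[-(1:2)] * z_row)
  dnorm(y_i, mean = mu, sd = 1)
}

# Both are vectorised in x, so they can be integrated or evaluated on a grid
integrate(f_x_cz, lower = -Inf, upper = Inf, c = 0, z_row = 0.3)$value
f_y_xz(beta = c(1, 2, -1), y_i = 3, x = c(0.5, 1, 1.5), z_row = 0)
```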
To explain the method argument more specifically:
- method="continuous" means you treat $X|C,Z$ as having a parametric density f_x_cz. For censored cases ($\delta=0$), the estimating equations integrate over $x\in [w_i,\text{upper}]$ using f_x_cz. Use it when you have a workable model for $f_{X|C,Z}$ and want full precision.
Requires: f_x_cz, upper.
- method="discrete" means you approximate the same integrals by a finite grid of $x$-values. The conditional density f_x_cz supplies weights on the grid. Use it when numerical integration is inconvenient or unstable, or when you prefer fixed-cost sums.
Requires: f_x_cz; optional m, x_grid_range to control the
grid.
- method="KM" means you use a nonparametric working model for $X|Z$: a Kaplan-Meier estimate of the distribution of $X|Z$ is built, and the KM mass function is then used in place of f_x_cz inside the censored-part scores. Use it when you are going to conduct the test for noninformative covariate censoring and have a $(Z_1, Z_2)$ structure where $Z_1$ is continuous and $Z_2$ is discrete.
Requires: h (bandwidth), l (
The mle function returns a list with the same five items as the cc
function.
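The difference between method="continuous" and method="discrete" can be seen on a single censored observation. The sketch below computes a conditional expectation over $x > w_i$ once by numerical integration and once by a weighted sum on a 30-point grid. It is a stand-alone illustration in base R, not the package's internal code; the standard normal working density, the function g, and the grid range are assumptions.

```r
# Hypothetical working density for X | C, Z (standard normal here)
f_x <- function(x) dnorm(x)
g <- function(x) x^2                 # stand-in for a score evaluated at x
w_i <- 0.4                           # censored observation: only X > w_i is known

# "continuous": numerical integration over [w_i, upper]
num <- integrate(function(x) g(x) * f_x(x), lower = w_i, upper = Inf)$value
den <- integrate(f_x, lower = w_i, upper = Inf)$value
cont <- num / den

# "discrete": m grid points, normalised density values as weights above w_i
m <- 30
x_grid <- seq(-3, 3, length.out = m) # plays the role of mean(w) -/+ 3*sd(w)
keep <- x_grid > w_i
wts <- f_x(x_grid[keep]) / sum(f_x(x_grid[keep]))
disc <- sum(wts * g(x_grid[keep]))

c(continuous = cont, discrete = disc)
```

The two numbers agree closely here; a coarser grid or a range that cuts off the tail of the density would widen the gap.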
To use the SPIRE estimator, use the ‘spire’ function in the ‘spire’ package.
In addition to the arguments y, w, delta, z, beta_init,
S_beta_fun, f_x_cz, f_y_xz, method, m, x_grid_range,
upper, l, which are already defined in the documentation of the mle
function, there are several more arguments in the spire function:
- h_z1: Bandwidth for KM weighting over $Z_1$ (required when method="KM").
- h_x: Bandwidth for smoothing the KM estimator of $f_{X|Z}$ to x_grid (required when method="KM").
The spire function returns a list with the same five items as the cc
function.
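To give a feel for what "KM weighting over $Z_1$" can mean, the sketch below computes a Kaplan-Meier estimate in which each observation is weighted by a Gaussian kernel in $Z_1$ (a Beran-type estimator). This is one standard construction, written in base R under the assumption that the bandwidth plays the role of h_z1. The helper weighted_km is hypothetical and is not claimed to match the package's implementation.

```r
# Kernel-weighted Kaplan-Meier estimate of the distribution of X given Z1 = z0
weighted_km <- function(w, delta, z1, z0, h) {
  k <- dnorm((z1 - z0) / h)                 # Gaussian kernel weights
  k <- k / sum(k)
  times <- sort(unique(w[delta == 1]))      # distinct uncensored values
  hazard <- vapply(times, function(t) {
    sum(k[w == t & delta == 1]) / sum(k[w >= t])
  }, numeric(1))
  surv <- cumprod(1 - hazard)
  data.frame(time = times, surv = surv,
             mass = c(1, surv[-length(surv)]) - surv)
}

w <- c(1, 2, 3, 4)
delta <- c(1, 0, 1, 1)

# Equal z1 values give equal weights, i.e. the ordinary Kaplan-Meier estimate
km_flat <- weighted_km(w, delta, z1 = rep(0, 4), z0 = 0, h = 1)

# A small bandwidth concentrates the weight on observations with z1 near z0
km_local <- weighted_km(w, delta, z1 = c(-1, 0, 1, 2), z0 = 2, h = 0.5)
km_flat
```

The mass column is the jump of the estimated distribution at each uncensored value, which is the kind of quantity that can stand in for a parametric density of $X$.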
To conduct the test for noninformative covariate censoring, please use
the noninfo_test function in the spire package.
In addition to the six arguments y, w, delta, z, beta_init,
S_beta_fun, which are already defined in the documentation of the cc
function, there are several more arguments in the noninfo_test function:
- mle_args: List of extra arguments that get passed to the mle function.
- cc_args: List of extra arguments that get passed to the cc function.
- ipw_args: List of extra arguments that get passed to the ipw function.
- spire_args: List of extra arguments that get passed to the spire function.
- compare_with: “cc”, “ipw”, or “spire”. Specifies which estimator is compared against the MLE when testing noninformative covariate censoring. “cc” compares the CC estimator with the MLE, “ipw” compares the IPW estimator with the MLE, and “spire” compares SPIRE with the MLE.
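The exact statistic computed by noninfo_test is not spelled out here. As a hedged illustration of how two estimators can be compared using their coef, PSI, and J_hat items, the sketch below builds a generic Hausman-type Wald statistic from influence-function approximations. The function hausman_sketch, the sign convention for the Jacobian, and the simulated inputs are all assumptions for illustration, not the package's method.

```r
# Generic Hausman-type comparison of two estimators of the same parameter
hausman_sketch <- function(coef_a, PSI_a, J_a, coef_b, PSI_b, J_b) {
  n <- ncol(PSI_a)
  # Influence-function approximations (p x n); a common sign flip in both
  # terms would cancel in the quadratic form below
  IF_a <- solve(J_a, PSI_a)
  IF_b <- solve(J_b, PSI_b)
  D <- IF_a - IF_b
  V <- tcrossprod(D) / n^2            # estimated covariance of coef_a - coef_b
  d <- coef_a - coef_b
  stat <- drop(t(d) %*% solve(V, d))
  list(statistic = stat, df = length(d),
       p_value = pchisq(stat, df = length(d), lower.tail = FALSE))
}

# Simulated stand-ins for two fitted objects (not real output)
set.seed(2)
p <- 2
n <- 500
PSI_a <- matrix(rnorm(p * n), p, n)
PSI_b <- matrix(rnorm(p * n), p, n)
J <- diag(p)
out <- hausman_sketch(c(1.00, 2.00), PSI_a, J, c(1.05, 1.90), PSI_b, J)
out
```

A small p-value indicates that the two estimates differ by more than sampling variability would explain.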
All the simulation studies for the paper accompanying this R package can
be reproduced using the code in simulations/. Specifically:
- sim1_run.R: An R script to run simulations comparing methods under misspecification in the normal setting.
- sim2_run.R: An R script to run simulations comparing methods under misspecification in the beta setting.
- sim3_run.R: An R script to run simulations calculating the empirical size and power of the test across different settings.
- sim4_run.R: An R script to run simulations calculating the empirical power of the test in the beta setting.
As mentioned in the paper accompanying this R package, the beta setting
corresponds to synthetic data based on the ENROLL-HD data set. Thus, sim2_run.R and
sim4_run.R provide a synthetic-data analysis of the ENROLL-HD data set.