

censboot(boot)                               R Documentation

_B_o_o_t_s_t_r_a_p _f_o_r _C_e_n_s_o_r_e_d _D_a_t_a

_D_e_s_c_r_i_p_t_i_o_n_:

     This function applies types of bootstrap resampling
     which have been suggested to deal with right-censored
     data.  It can also do model-based resampling using a
     Cox regression model.

_U_s_a_g_e_:

     censboot(data, statistic, R, F.surv, G.surv, strata=matrix(1,n,2),
          sim="ordinary", cox=NULL, index=c(1, 2), ...)

_A_r_g_u_m_e_n_t_s_:

    data: The data frame or matrix containing the data.  It
          must have at least two columns, one of which con-
          tains the times and the other the censoring indi-
          cators.  It is allowed to have as many other
          columns as desired (although efficiency is reduced
          for large numbers of columns) except for
          `sim="weird"' when it should only have two columns
          - the times and censoring indicators.  The columns
          of `data' referenced by the components of `index'
          are taken to be the times and censoring indica-
          tors.

statistic: A function which operates on the data frame and
          returns the required statistic.  Its first argu-
          ment must be the data. Any other arguments that it
          requires can be passed using the `...{}' argument.
          In the case of `sim="weird"', the data passed to
          `statistic' only contains the times and censoring
          indicator regardless of the actual number of
          columns in `data'. In all other cases the data
          passed to statistic will be of the same form as
          the original data.  When `sim="weird"', the actual
          number of observations in the resampled data sets
          may not be the same as the number in `data'.  For
          this reason, if `sim="weird"' and `strata' is sup-
          plied, `statistic' should also take a numeric vec-
          tor indicating the strata.  This allows the
          statistic to depend on the strata if required.

       R: The number of bootstrap replicates.

  F.surv: An object returned from a call to `survfit' giving
          the survivor function for the data. This is a
          required argument unless `sim="ordinary"' or
          `sim="model"' and `cox' is missing.

  G.surv: Another object returned from a call to `survfit'
          but with the censoring indicators reversed to give
          the product-limit estimate of the censoring dis-
          tribution.  Note that for consistency the uncen-
          sored times should be reduced by a small amount in
          the call to `survfit'.  This is a required argu-
          ment whenever `sim="cond"' or when `sim="model"'
          and `cox' is supplied.

  strata: The strata used in the calls to `survfit'.  It can
          be a vector or a matrix with 2 columns.  If it is
          a vector then it is assumed to be the strata for
          the survival distribution, and the censoring dis-
          tribution is assumed to be the same for all obser-
          vations.  If it is a matrix then the first column
          is the strata for the survival distribution and
          the second is the strata for the censoring distri-
          bution.  When `sim="weird"' only the strata for
          the survival distribution are used since the cen-
          soring times are considered fixed.  When
          `sim="ordinary"', only one set of strata is used
          to stratify the observations, this is taken to be
          the first column of `strata' when it is a matrix.

     sim: The simulation type.  Possible types are `"ordi-
          nary"' (case resampling), `"model"' (equivalent to
          `"ordinary"' if `cox' is missing, otherwise it is
          model based resampling), `"weird"' (the weird
          bootstrap - this cannot be used if `cox' is sup-
          plied), and `"cond"' (the conditional bootstrap,
          in which censoring times are resampled from the
          conditional censoring distribution).

     cox: An object returned from `coxph'.  If it is sup-
          plied, then `F.surv' should have been generated by
          a call of the form `survfit(cox)'.

   index: A vector of length two giving the positions of the
          columns in `data' which correspond to the times
          and censoring indicators respectively.

     ...: Any other arguments which are passed to `statis-
          tic'.

_D_e_t_a_i_l_s_:

     The various types of resampling are described in Davi-
     son and Hinkley (1997) in sections 3.5 and 7.3.  The
     simplest is case resampling which simply resamples with
     replacement from the observations.

     The conditional bootstrap simulates failure times from
     the estimate of the survival distribution.  Then, for
     each observation its simulated censoring time is equal
     to the observed censoring time if the observation was
     censored and generated from the estimated censoring
     distribution conditional on being greater than the
     observed failure time if the observation was uncen-
     sored.  If the largest value is censored then it is
     given a nominal failure time of `Inf' and conversely if
     it is uncensored it is given a nominal censoring time
     of `Inf'.  This is necessary to allow the largest
     observation to be in the resamples.

     If a Cox regression model is fitted to the data and
     supplied, then the failure times are generated from the
     survival distribution using that model.  In this case
     the censoring times can either be simulated from the
     estimated censoring distribution (`sim="model"') or
     from the conditional censoring distribution as in the
     previous paragraph (`sim="cond"').

     The weird bootstrap holds the censored observations as
     fixed and also the observed failure times.  It then
     generates the number of events at each failure time
     using a binomial distribution with mean 1 and denomina-
     tor the number of failures that could have occurred at
     that time in the original data set.  In our implementa-
     tion we insist that there is a least one simulated
     event in each stratum for every bootstrap dataset.

     When there are strata involved and `sim' is either
     `"model"' or `"cond"' the situation becomes more diffi-
     cult.  Since the strata for the survival and censoring
     distributions are not the same it is possible that for
     some observations both the simulated failure time and
     the simulated censoring time are infinite.  To see this
     consider an observation in stratum 1F for the survival
     distribution and stratum 1G for the censoring distribu-
     tion.  Now if the largest value in stratum 1F is cen-
     sored it is given a nominal failure time of `Inf', also
     if the largest value in stratum 1G is uncensored it is
     given a nominal censoring time of `Inf' and so both the
     simulated failure and censoring times could be infi-
     nite.  When this happens the simulated value is consid-
     ered to be a failure at the time of the largest
     observed failure time in the stratum for the survival
     distribution.

_V_a_l_u_e_:

     An object of class `"boot"' containing the following
     components:

      t0: The value of `statistic' when applied to the orig-
          inal data.

       t: A matrix of bootstrap replicates of the values of
          `statistic'.

       R: The number of bootstrap replicates performed.

     sim: The simulation type used.  This will usually be
          the input value of `sim' unless that was `"model"'
          but `cox' was not supplied, in which case it will
          be `"ordinary"'.

    data: The data used for the bootstrap. This will gener-
          ally be the input value of `data' unless
          `sim="weird"', in which case it will just be the
          columns containing the times and the censoring
          indicators.

    seed: The value of `.Random.seed' when `censboot' was
          called.

statistic: The input value of `statistic'.

  strata: The strata used in the resampling.  When
          `sim="ordinary"' this will be a vector which
          stratifies the observations, when `sim="weird"' it
          is the strata for the survival distribution and in
          all other cases it is a matrix containing the
          strata for the survival distribution and the cen-
          soring distribution.

    call: The original call to `censboot'.

_R_e_f_e_r_e_n_c_e_s_:

     Andersen, P.K., Borgan, O., Gill, R.D. and Keiding, N.
     (1993) Statistical Models Based on Counting Processes.
     Springer-Verlag.

     Burr, D. (1994) A comparison of certain bootstrap con-
     fidence intervals in the Cox model. Journal of the
     American Statistical Association, 89, 1290-1302.

     Davison, A.C. and Hinkley, D.V. (1997) Bootstrap Meth-
     ods and Their Application. Cambridge University Press.

     Efron, B. (1981) Censored data and the bootstrap.
     Journal of the  American Statistical Association, 76,
     312-319.

     Hjort, N.L. (1985) Bootstrapping Cox's regression
     model. Technical report NSF-241, Dept. of Statistics,
     Stanford University.

_S_e_e _A_l_s_o_:

     `boot', `boot.object', `coxph', `survfit'

_E_x_a_m_p_l_e_s_:

     library(survival4)
     # Example 3.9 of Davison and Hinkley (1997) does a bootstrap on some
     # remission times for patients with a type of leukaemia.  The patients
     # were divided into those who received maintenance chemotherapy and
     # those who did not.  Here we are interested in the median remission
     # time for the two groups.
     aml.fun <- function(data) {
          surv <- survfit(Surv(time, cens)~group, data=data)
          out <- NULL
          st <- 1
          for (s in 1:length(surv$strata)) {
               inds <- st:(st+surv$strata[s]-1)
               md <- min(surv$time[inds[1-surv$surv[inds]>=0.5]])
               st <- st+surv$strata[s]
               out <- c(out,md)
          }
     }
     data(aml)
     aml.case <- censboot(aml,aml.fun,R=499,strata=aml$group)

     # Now we will look at the same statistic using the conditional
     # bootstrap and the weird bootstrap.  For the conditional bootstrap
     # the survival distribution is stratified but the censoring
     # distribution is not.

     aml.s1 <- survfit(Surv(time,cens)~group, data=aml)
     aml.s2 <- survfit(Surv(time-0.001*cens,1-cens)~1, data=aml)
     aml.cond <- censboot(aml,aml.fun,R=499,strata=aml$group,
          F.surv=aml.s1,G.surv=aml.s2,sim="cond")

     # For the weird bootstrap we must redefine our function slightly since
     # the data will not contain the group number.
     aml.fun1 <- function(data,str) {
          surv <- survfit(Surv(data[,1],data[,2])~str)
          out <- NULL
          st <- 1
          for (s in 1:length(surv$strata)) {
               inds <- st:(st+surv$strata[s]-1)
               md <- min(surv$time[inds[1-surv$surv[inds]>=0.5]])
               st <- st+surv$strata[s]
               out <- c(out,md)
          }
     }
     aml.wei <- censboot(cbind(aml$time,aml$cens),aml.fun1,R=499,
          strata=aml$group, F.surv=aml.s1,sim="weird")

     # Now for an example where a cox regression model has been fitted
     # the data we will look at the melanoma data of Example 7.6 from
     # Davison and Hinkley (1997).  The fitted model assumes that there
     # is a different survival distribution for the ulcerated and
     # non-ulcerated groups but that the thickness of the tumour has a
     # common effect.  We will also assume that the censoring distribution
     # is different in different age groups.  The statistic of interest
     # is the linear predictor.  This is returned as the values at a
     # number of equally spaced points in the range of interest.
     data(melanoma)
     library(splines)
     library(modreg) # for smooth.spline
     mel.cox <- coxph(Surv(time,status==1)~ns(thickness,df=4)+strata(ulcer),
          data=melanoma)
     mel.surv <- survfit(mel.cox)
     agec <- cut(melanoma$age,c(0,39,49,59,69,100))
     mel.cens <- survfit(Surv(time-0.001*(status==1),status!=1)~
          strata(agec),data=melanoma)
     mel.fun <- function(d) {
          t1 <- ns(d$thickness,df=4)
          cox <- coxph(Surv(d$time,d$status==1) ~ t1+strata(d$ulcer))
          eta <- unique(cox$linear.predictors)
          u <- unique(d$thickness)
          sp <- smooth.spline(u,eta,df=20)
          th <- seq(from=0.25,to=10,by=0.25)
          predict.smooth.spline(sp,th)$y
     }
     mel.str<-cbind(melanoma$ulcer,agec)
     # this is slow!
     mel.mod <- censboot(melanoma,mel.fun,R=999,F.surv=mel.surv,
          G.surv=mel.cens,cox=mel.cox,strata=mel.str,sim="model")
     proc.time()
     # To plot the original predictor and a 95% pointwise envelope for it
     mel.env <- envelope(mel.mod)$point
     plot(seq(0.25,10,by=0.25),mel.env[1,], ylim=c(-2,2),
          xlab="thickness (mm)", ylab="linear predictor",type="n")
     lines(seq(0.25,10,by=0.25),mel.env[1,],lty=2)
     lines(seq(0.25,10,by=0.25),mel.env[2,],lty=2)
     lines(seq(0.25,10,by=0.25),mel.mod$t0,lty=1)

