

survfit(survival5)                           R Documentation

_C_o_m_p_u_t_e _a _S_u_r_v_i_v_a_l _C_u_r_v_e _f_o_r _C_e_n_s_o_r_e_d _D_a_t_a

_D_e_s_c_r_i_p_t_i_o_n_:

     Computes an estimate of a survival curve for censored
     data using either the Kaplan-Meier or the
     Fleming-Harrington method, or computes the predicted
     survivor function for a Cox proportional hazards model.
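
     To make the two estimators concrete, here is a minimal
     base-R sketch for unweighted, right-censored data
     (status 1 = death, 0 = censored). The helper km_fh() is
     hypothetical and not part of this package. The `km'
     column is the product-limit estimate; the `fh' column is
     exp(-H), where H accumulates d/n at each death time (the
     first of the two tie-handling variants described under
     Details).

```r
# Hypothetical helper: Kaplan-Meier and Fleming-Harrington estimates by hand,
# for unweighted right-censored data (status: 1 = death, 0 = censored).
km_fh <- function(time, status) {
  death.times <- sort(unique(time[status == 1]))
  # number at risk just before each death time (censored at t still count)
  n.risk <- sapply(death.times, function(t) sum(time >= t))
  # number of deaths at each death time
  n.event <- sapply(death.times, function(t) sum(time == t & status == 1))
  data.frame(time    = death.times,
             n.risk  = n.risk,
             n.event = n.event,
             km = cumprod(1 - n.event / n.risk),    # product-limit estimate
             fh = exp(-cumsum(n.event / n.risk)))   # exp(-cumulative hazard)
}

est <- km_fh(time   = c(1, 2, 2, 3, 4, 5),
             status = c(1, 1, 0, 1, 0, 1))
print(est)
```

     Because exp(-x) >= 1 - x, the `fh' column never falls
     below the `km' column. The sketch ignores weights and
     confidence intervals.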

_U_s_a_g_e_:

     survfit(formula, data=sys.parent(), weights, subset, na.action,
             newdata, individual=F, conf.int=.95, se.fit=T,
             type=c("kaplan-meier","fleming-harrington", "fh2"),
             error=c("greenwood","tsiatis"),
             conf.type=c("log","log-log","plain","none"),
             conf.lower=c("usual", "peto", "modified"))

_A_r_g_u_m_e_n_t_s_:

 formula: A formula object or a `coxph' object.  If a
          formula object is supplied it must have a `Surv'
          object as the response on the left of the `~'
          operator and, if desired, terms separated by `+'
          operators on the right.  One of the terms may be a
          `strata' object.  For a single survival curve the
          `"~ 1"' part of the formula is not required.

    data: a data frame in which to interpret the variables
          named in the formula, or in the `subset' and
          `weights' arguments.

 weights: An optional vector of weights.  The weights must be
          nonnegative, and it is strongly recommended that
          they be strictly positive, since zero weights are
          ambiguous; to exclude observations, use the
          `subset' argument instead.

  subset: an expression specifying the subset of the rows
          of the data to be used in the fit.

na.action: a missing-data filter function, applied to the
          model frame, after any `subset' argument has been
          used.  Default is `options()$na.action'.

 newdata: a data frame with the same variable names as those
          that appear in the `coxph' formula.  Only
          applicable when `formula' is a `coxph' object.  The
          curve(s) produced will be representative of a
          cohort whose covariates correspond to the values
          in `newdata'.  Default is the mean of the
          covariates used in the `coxph' fit.

individual: a logical value indicating whether the rows of
          `newdata' represent different time epochs for only
          one individual (T), or multiple individuals (F,
          the default).  If T, only one curve will be
          produced; if F, there will be one curve per row in
          `newdata'.

conf.int: the level for a two-sided confidence interval on
          the survival curve(s).  Default is 0.95.

  se.fit: a logical value indicating whether standard errors
          should be computed.  Default is `TRUE'.

    type: a character string specifying the type of survival
          curve.  Possible values are `"kaplan-meier"',
          `"fleming-harrington"' or `"fh2"' if a formula is
          given, and `"aalen"' or `"kaplan-meier"' if the
          first argument is a `coxph' object (only the first
          two characters are necessary).  The default is
          `"aalen"' when a `coxph' object is given, and
          `"kaplan-meier"' otherwise.

   error: either the string `"greenwood"' for the Greenwood
          formula or `"tsiatis"' for the Tsiatis formula
          (only the first character is necessary).  The
          default is `"tsiatis"' when a `coxph' object is
          given, and `"greenwood"' otherwise.

conf.type: One of `"none"', `"plain"', `"log"' (the
          default), or `"log-log"'.  Only enough of the
          string to uniquely identify it is necessary.  The
          first option causes confidence intervals not to be
          generated.  The second causes the standard
          intervals `curve +- k*se(curve)', where k is
          determined from `conf.int'.  The log option
          calculates intervals based on the cumulative
          hazard or log(survival).  The last option bases
          intervals on the log hazard or log(-log(survival)).
          Intervals from these last two options will never
          extend past 0 or 1.

conf.lower: One of `"usual"' (the default), `"peto"', or
          `"modified"'; controls modified lower limits to the
          curve.  The upper limit remains unchanged.  The
          modified lower limit is based on an 'effective n'
          argument: the usual variance is multiplied by a
          factor m/n, where n is the number currently at
          risk and m is the number at risk at the last death
          time.  The modified bands therefore agree with the
          usual calculation at each death time, but unlike
          the usual bands they become wider at each censored
          observation.  This is especially useful for
          survival curves with a long flat tail.

          The Peto lower limit is based on the same
          'effective n' argument as the modified limit, but
          also replaces the usual Greenwood variance term
          with a simple approximation.  It is known to be
          conservative.
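
     The 'effective n' factor m/n used by the modified lower
     limit can be sketched as follows. This is a minimal
     sketch, assuming one row per distinct time (death or
     censoring) of a single curve, in time order; the helper
     effective_n_factor() is hypothetical, and the standard
     errors are made-up values for illustration.

```r
# Hypothetical helper: the m/n inflation factor behind conf.lower="modified".
# n.risk, n.event: one row per distinct time of a single curve, in time order.
effective_n_factor <- function(n.risk, n.event) {
  # row index of the most recent death time at or before each row (0 = none yet)
  last.death <- cummax(ifelse(n.event > 0, seq_along(n.event), 0))
  # m = number at risk at that last death time (factor 1 before any death)
  m <- ifelse(last.death > 0, n.risk[pmax(last.death, 1)], n.risk)
  m / n.risk
}

n.risk  <- c(10, 9, 8, 7, 5, 4)
n.event <- c( 1, 0, 0, 2, 0, 1)
f <- effective_n_factor(n.risk, n.event)   # values: 1, 10/9, 10/8, 1, 7/5, 1

# Multiplying the variance by m/n multiplies the standard error by sqrt(m/n).
usual.se    <- c(0.10, 0.10, 0.10, 0.15, 0.15, 0.20)  # made-up values
modified.se <- usual.se * sqrt(f)
```

     The factor is exactly 1 at each death time and exceeds 1
     at censoring times in between, which is why the modified
     band only widens after censored observations.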

_D_e_t_a_i_l_s_:

     The estimates actually used are the Kalbfleisch-Prentice
     (Kalbfleisch and Prentice, 1980, p.86) and the
     Tsiatis/Link/Breslow estimates, which reduce to the
     Kaplan-Meier and Fleming-Harrington estimates,
     respectively, when the weights are unity.  When curves
     are fit for a Cox model, subject weights of
     `exp(sum(coef*(x-center)))' are used, ignoring any
     value for `weights' input by the user.  There is also
     an extra term in the variance of the curve, due to the
     variance of the coefficients and hence the variance in
     the computed weights.
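
     A minimal sketch of the Cox-model subject weights
     `exp(sum(coef*(x-center)))', using made-up coefficients
     and covariates. Taking `center' to be the column means
     of the covariates is an assumption for illustration; a
     subject lying exactly at the center receives weight 1.

```r
# Made-up coefficients and covariates (columns: age, score), illustration only.
beta <- c(age = 0.05, score = -0.35)
x <- rbind(c(age = 60, score = 2),
           c(age = 50, score = 4),
           c(age = 55, score = 3))
center <- colMeans(x)    # assumed centering: the covariate means (55, 3)

# exp(sum(coef * (x - center))) for each subject (row of x)
risk.weight <- exp(drop(sweep(x, 2, center) %*% beta))
risk.weight
```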

     The Greenwood formula for the variance is a sum of
     terms d/(n*(n-m)), where d is the number of deaths at a
     given time point, n is the sum of `weights' for all
     individuals still at risk at that time, and m is the
     sum of `weights' for the deaths at that time.  The
     justification is based on a binomial argument when
     weights are all equal to one; extension to the weighted
     case is ad hoc.  Tsiatis (1981) proposes a sum of terms
     d/(n*n), based on a counting process argument which
     includes the weighted case.
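
     The two variance formulas can be compared on a small
     example. This sketch assumes unit weights, so m equals d;
     the helper names and the counts are hypothetical.

```r
# Hypothetical helpers: per-time variance terms from the formulas above.
#   d = number of deaths, n = sum of weights at risk,
#   m = sum of weights of the deaths (m = d when all weights are 1)
greenwood_terms <- function(d, n, m = d) d / (n * (n - m))
tsiatis_terms   <- function(d, n)        d / (n * n)

d <- c(1, 2, 1)     # deaths at three successive death times
n <- c(10, 8, 5)    # number at risk at those times (unit weights)

# running sums of the terms over the death times
var.greenwood <- cumsum(greenwood_terms(d, n))
var.tsiatis   <- cumsum(tsiatis_terms(d, n))
rbind(greenwood = var.greenwood, tsiatis = var.tsiatis)
```

     Since n*(n-m) < n*n whenever 0 < m < n, each Greenwood
     term exceeds the corresponding Tsiatis term; the
     Greenwood term becomes infinite if everyone at risk dies
     (m = n).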

     The two variants of the F-H estimate differ in how
     ties are handled.  If there were 3 deaths out of 10 at
     risk, then the `"fleming-harrington"' method would
     increment the hazard by 3/10 and the `"fh2"' method by
     1/10 + 1/9 + 1/8.  For curves created after a Cox
     model these correspond to the Breslow and Efron
     estimates, respectively, and the proper choice is made
     automatically.  The `fh2' method will give results
     closer to the Kaplan-Meier.
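
     The tied-deaths arithmetic above, worked in base R, with
     the Kaplan-Meier step 1 - 3/10 included for comparison.
     The variable names are illustrative only.

```r
# Hazard increments for 3 tied deaths among 10 at risk.
n <- 10
d <- 3
inc.fh  <- d / n                      # "fleming-harrington": 3/10
inc.fh2 <- sum(1 / (n - 0:(d - 1)))   # "fh2": 1/10 + 1/9 + 1/8
km.step <- 1 - d / n                  # Kaplan-Meier factor, for comparison

# survival factor implied by each method at this time point
c(km = km.step, fh = exp(-inc.fh), fh2 = exp(-inc.fh2))
```

     Here exp(-3/10) is about 0.741 and exp(-(1/10 + 1/9 +
     1/8)) is about 0.715, so the fh2 step lands closer to
     the Kaplan-Meier value of 0.7.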

     Based on the work of Link (1984), the log transform is
     expected to produce the most accurate confidence
     intervals.  If there is heavy censoring, then based on
     the work of Dorey and Korn (1987) the modified estimate
     will give a more reliable confidence band for the tails
     of the curve.
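
     The interval types offered by `conf.type' can be
     sketched as below. This assumes the supplied standard
     error is that of the cumulative hazard H = -log(survival)
     and uses the delta method se(curve) = surv * se(H) for
     the plain interval. ci_limits() is a hypothetical helper,
     not this package's internal code, and the inputs are
     made up. Following the description above, the log upper
     limit is capped at 1, while plain limits are not clipped.

```r
# Hypothetical helper: pointwise limits for a survival estimate 'surv',
# given 'se.H', the standard error of the cumulative hazard H = -log(surv).
ci_limits <- function(surv, se.H, conf.int = 0.95,
                      conf.type = c("log", "log-log", "plain")) {
  conf.type <- match.arg(conf.type)
  z <- qnorm(1 - (1 - conf.int) / 2)   # k in curve +- k*se(curve)
  switch(conf.type,
    # curve +- z*se(curve), with se(curve) = surv * se.H; may leave [0, 1]
    plain = cbind(lower = surv - z * surv * se.H,
                  upper = surv + z * surv * se.H),
    # symmetric on log(survival); the upper limit is capped at 1
    log = cbind(lower = surv * exp(-z * se.H),
                upper = pmin(surv * exp(z * se.H), 1)),
    # symmetric on log(H) = log(-log(survival)); needs 0 < surv < 1
    "log-log" = {
      H <- -log(surv)
      cbind(lower = exp(-H * exp( z * se.H / H)),
            upper = exp(-H * exp(-z * se.H / H)))
    })
}

ci_limits(surv = c(0.9, 0.6, 0.2), se.H = c(0.05, 0.15, 0.40),
          conf.type = "log-log")
```

     The log-log limits are undefined at surv = 1 (H = 0),
     where the curve has no uncertainty to report.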

_V_a_l_u_e_:

     a `survfit' object; see the help on `survfit.object'
     for details. Methods defined for `survfit' objects are
     provided for `print', `plot', `lines', and `points'.

_R_e_f_e_r_e_n_c_e_s_:

     Terry Therneau, author of local function.

     Dorey, F. J. and Korn, E. L. (1987).  Effective sample
     sizes for confidence intervals for survival
     probabilities.  Statistics in Medicine 6, 679-87.

     Fleming, T. R. and Harrington, D. P. (1984).
     Nonparametric estimation of the survival distribution
     in censored data.  Comm. in Statistics 13, 2469-86.

     Kalbfleisch, J. D. and Prentice, R. L. (1980).  The
     Statistical Analysis of Failure Time Data.  Wiley, New
     York.

     Link, C. L. (1984).  Confidence intervals for the
     survival function using Cox's proportional hazards
     model with covariates.  Biometrics 40, 601-610.

     Tsiatis, A. (1981).  A large sample study of the
     estimate for the integrated hazard function in Cox's
     regression model for survival data.  Annals of
     Statistics 9, 93-108.

_E_x_a_m_p_l_e_s_:

     # attach the package (named survival5 in early R releases)
     library(survival)

     # fit Kaplan-Meier curves and plot them
     data(aml)
     fit <- survfit(Surv(time, status) ~ x, data=aml)
     plot(fit)

     # plot only 1 of the 2 curves from above
     plot(fit[2])

     # fit a Cox proportional hazards model and plot the
     # predicted survival curve
     data(ovarian)
     fit <- coxph(Surv(futime, fustat) ~ resid.ds + rx + ecog.ps, data=ovarian)
     plot(survfit(fit))

