

empinf(boot)                                 R Documentation

_E_m_p_i_r_i_c_a_l _I_n_f_l_u_e_n_c_e _V_a_l_u_e_s

_D_e_s_c_r_i_p_t_i_o_n_:

     This function calculates the empirical influence values
     for a statistic applied to a data set.  It offers four
     types of calculation: the infinitesimal jackknife (using
     numerical differentiation), the usual jackknife
     estimates, the "positive" jackknife estimates, and a
     method which estimates the empirical influence values by
     regressing bootstrap replicates of the statistic on the
     bootstrap frequency array.  All methods can be used with
     one or more samples.
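
     The two jackknife variants can be illustrated in a few
     lines of base R.  This is a sketch under stated
     assumptions, not the package's code: it handles a single
     sample stored as a vector, uses an index-form statistic
     (as with `stype="i"'), and applies the standard formulas
     (n-1)(t - t[-j]) for the leave-one-out jackknife and
     (n+1)(t[+j] - t) for the positive jackknife, where t[+j]
     is the statistic recomputed with observation j included
     twice.  The names `jack.sketch', `pos.sketch' and
     `mean.i' are hypothetical.

```r
# Hypothetical helpers (not part of boot): jackknife estimates of the
# empirical influence values for a single sample held in a vector.
jack.sketch <- function(data, statistic) {
    n <- NROW(data)
    t0 <- statistic(data, seq_len(n))
    # statistic recomputed with observation j left out
    t.minus <- sapply(seq_len(n), function(j) statistic(data, seq_len(n)[-j]))
    (n - 1) * (t0 - t.minus)
}

pos.sketch <- function(data, statistic) {
    n <- NROW(data)
    t0 <- statistic(data, seq_len(n))
    # statistic recomputed with observation j included twice
    t.plus <- sapply(seq_len(n), function(j) statistic(data, c(seq_len(n), j)))
    (n + 1) * (t.plus - t0)
}

x <- c(2.1, 3.4, 1.9, 5.0, 4.2)
mean.i <- function(d, i) mean(d[i])   # index-form sample mean
jack.sketch(x, mean.i)
pos.sketch(x, mean.i)
```

     For the sample mean both estimates reduce exactly to
     x - mean(x).  For nonlinear but smooth statistics they
     differ from each other and from the exact empirical
     influence values, by amounts that shrink as n grows.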

_U_s_a_g_e_:

     empinf(boot.out=NULL, data=NULL, statistic=NULL,
            type=<<see below>>, stype="w", index=1, t=NULL,
            strata=rep(1, n), eps=0.001, ...)

_A_r_g_u_m_e_n_t_s_:

boot.out=: A bootstrap object created by the function
          `boot'.  This argument is required if `type' is
          `"reg"' and optional for the other types.  If it is
          supplied, the values of `data', `statistic',
          `stype', and `strata' are taken from the
          components of `boot.out', and any values passed to
          `empinf' directly are ignored.

   data=: A vector, matrix or data frame containing the data
          for which empirical influence values are required.
          It is a required argument if `boot.out' is not
          supplied.  If `boot.out' is supplied then `data'
          is set to `boot.out$data' and any value supplied
          is ignored.

statistic=: The statistic for which empirical influence
          values are required.  It must be a function of at
          least two arguments, the data set and a vector of
          weights, frequencies or indices; the nature of the
          second argument is given by the value of `stype'.
          Any other arguments that it takes must be supplied
          to `empinf' and will be passed to `statistic'
          unchanged.  This argument is required if `boot.out'
          is not supplied; otherwise its value is taken from
          `boot.out' and any value supplied here is ignored.

   type=: The calculation type to be used for the empirical
          influence values.  Possible values of `type' are
          `"inf"' (infinitesimal jackknife), `"jack"' (usual
          jackknife), `"pos"' (positive jackknife), and
          `"reg"' (regression estimation).  The default
          value depends on the other arguments.  If `t' is
          supplied then the default value of `type' is
          `"reg"' and `boot.out' should be present so that
          its frequency array can be found.  If `t' is not
          supplied, the default is `"inf"' when `stype' is
          `"w"'; otherwise it is `"reg"' if `boot.out' is
          present and `"jack"' if it is not.  Note that it is
          an error for `type' to be `"reg"' if `boot.out' is
          missing or to be `"inf"' if `stype' is not `"w"'.

  stype=: A character string giving the nature of the
          second argument to `statistic'.  It can take three
          values: `"w"' (weights), `"f"' (frequencies), or
          `"i"' (indices).  If `boot.out' is supplied, the
          value of `stype' is set to `boot.out$stype' and
          any value supplied here is ignored.  Otherwise it
          is an optional argument which defaults to `"w"'.
          If `type' is `"inf"' then `stype' must be `"w"'.

  index=: An integer giving the position of the variable of
          interest in the output of `statistic'.

      t=: A vector of length `boot.out$R' which gives the
          bootstrap replicates of the statistic of interest.
          `t' is used only when `type' is `"reg"', and it
          defaults to `boot.out$t[,index]'.

 strata=: An integer vector or a factor specifying the
          strata for multi-sample problems.  If `boot.out'
          is supplied, the value of `strata' is set to
          `boot.out$strata'.  Otherwise it is an optional
          argument whose default corresponds to the
          single-sample case.

    eps=: This argument is used only if `type' is `"inf"'.
          In that case the value of epsilon to be used for
          numerical differentiation will be `eps' divided by
          the number of observations in `data'.

     ...: Any other arguments that `statistic' takes.  They
          will be passed unchanged to `statistic' every time
          that it is called.

_D_e_t_a_i_l_s_:

     If `type' is `"inf"' then numerical differentiation is
     used to approximate the empirical influence values.
     This makes sense only for statistics which are written
     in weighted form (i.e. `stype' is `"w"').  If `type' is
     `"jack"' then the usual leave-one-out jackknife
     estimates of the empirical influence are returned.  If
     `type' is `"pos"' then the positive (include-one-twice)
     jackknife values are used.  If `type' is `"reg"' then a
     bootstrap object must be supplied.  The regression
     method then works by regressing the bootstrap
     replicates of `statistic' on the frequency array from
     which they were derived.  The bootstrap frequency array
     is obtained through a call to `boot.array'.  Further
     details of the methods are given in Section 2.7 of
     Davison and Hinkley (1997).
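
     The regression method can be sketched for a single
     sample as follows.  Here the frequency array (the kind
     of array that `boot.array' returns) is simulated directly
     with `sample.int' and `tabulate', and the way the linear
     constraint is handled (dropping the last column, then
     centring so that the values sum to zero) is an assumption
     of this sketch rather than a description of the
     package's code.  The statistic is again the sample mean,
     for which the linear fit is exact.

```r
# Hypothetical base-R sketch of the regression estimate of empirical
# influence values for one sample (an illustration, not boot's code).
set.seed(1)
x <- c(2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7, 3.9, 2.5)
n <- length(x)
R <- 200

# Frequency array: row r counts how often each observation appears
# in bootstrap resample r.
f <- t(replicate(R, tabulate(sample.int(n, n, replace = TRUE), nbins = n)))
tstar <- drop(f %*% x) / n            # bootstrap replicates of the mean

# Each row of f sums to n, so the last column is collinear with the
# intercept; drop it and give that observation a coefficient of 0.
X <- f[, -n] / n
beta <- coef(lm(tstar ~ X))[-1]
l.reg <- c(beta, 0)
l.reg <- unname(l.reg - mean(l.reg))  # centre: influence values sum to zero
l.reg                                 # equals x - mean(x) up to rounding
```

     For a nonlinear statistic the regression no longer fits
     exactly, and the estimates carry simulation error that
     decreases as the number of replicates `R' increases.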

     Empirical influence values are used frequently in
     nonparametric bootstrap applications, so many other
     functions call `empinf' when they need them.  Examples
     include nonparametric delta estimates of variance, BCa
     intervals, and finding linear approximations to
     statistics for use as control variates.  They are also
     used for antithetic bootstrap resampling.
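
     As an example of such a use, the nonparametric delta
     estimate of variance for a single sample is
     v_L = sum(l_j^2)/n^2.  The sketch below applies it to the
     sample mean, whose empirical influence values are exactly
     x - mean(x); the data are made up, and for stratified
     problems the package function `var.linear' should be
     consulted rather than this single-sample formula.

```r
# Nonparametric delta-method variance from empirical influence values
# (single sample): v_L = sum(L^2) / n^2.
x <- c(2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7, 3.9, 2.5)
n <- length(x)
L <- x - mean(x)          # exact influence values of the sample mean
vL <- sum(L^2) / n^2
vL
# For the mean this is the plug-in variance divided by n,
# i.e. var(x) * (n - 1) / n^2.
all.equal(vL, var(x) * (n - 1) / n^2)   # TRUE
```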

_V_a_l_u_e_:

     A vector of the empirical influence values of
     `statistic' applied to `data'.  The values are in the
     same order as the observations in `data'.

_W_A_R_N_I_N_G_:

     All arguments to `empinf' must be passed using the
     `name=value' convention.  If this is not followed then
     unpredictable errors can occur.

_R_e_f_e_r_e_n_c_e_s_:

     Davison, A.C. and Hinkley, D.V. (1997) Bootstrap Methods
     and Their Application. Cambridge University Press.

     Efron, B. (1982) The Jackknife, the Bootstrap and Other
     Resampling Plans.  CBMS-NSF Regional Conference Series
     in Applied Mathematics, 38, SIAM.

     Fernholz, L.T. (1983) von Mises Calculus for Statistical
     Functionals.  Lecture Notes in Statistics, 19,
     Springer-Verlag.

_S_e_e _A_l_s_o_:

     `boot', `boot.array', `boot.ci', `control',
     `jack.after.boot', `linear.approx', `var.linear'

_E_x_a_m_p_l_e_s_:

     # The empirical influence values for the ratio of means in
     # the city data.
     library(boot)
     data(city)
     ratio <- function(d, w) sum(d$x * w)/sum(d$u * w)
     empinf(data = city, statistic = ratio)
     city.boot <- boot(city, ratio, R = 499, stype = "w")
     empinf(boot.out = city.boot, type = "reg")

     # A statistic that may be of interest in the difference of means
     # problem is the t-statistic for testing equality of means.  In
     # the bootstrap we get replicates of the difference of means and
     # the variance of that statistic and then want to use this output
     # to get the empirical influence values of the t-statistic.
     data(gravity)
     grav1 <- gravity[as.numeric(gravity[,2])>=7,]
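     grav.fun <- function(dat, w)
     {    # group index (1, 2, ...) of each observation, from column 2
          strata <- tapply(dat[, 2], as.numeric(dat[, 2]))
          d <- dat[, 1]
          ns <- tabulate(strata)
          # rescale the weights to sum to one within each stratum
          w <- w/tapply(w, strata, sum)[strata]
          mns <- tapply(d * w, strata, sum)       # weighted means
          mn2 <- tapply(d * d * w, strata, sum)   # weighted second moments
          # estimated variance of the difference of means
          s2hat <- sum((mn2 - mns^2)/ns)
          c(mns[2]-mns[1],s2hat)
     }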

     grav.boot <- boot(grav1, grav.fun, R=499, stype="w", strata=grav1[,2])

     # Since the statistic of interest is a function of the bootstrap
     # statistics, we must calculate the bootstrap replicates and pass
     # them to empinf using the t argument.
     grav.z <- (grav.boot$t[,1]-grav.boot$t0[1])/sqrt(grav.boot$t[,2])
     empinf(boot.out=grav.boot,t=grav.z)

