

density {base}                               R Documentation

_K_e_r_n_e_l _D_e_n_s_i_t_y _E_s_t_i_m_a_t_i_o_n

_D_e_s_c_r_i_p_t_i_o_n_:

     The function `density' computes kernel density esti-
     mates with the given kernel and bandwidth.

     The generic functions `plot' and `print' have methods
     for density objects.

_U_s_a_g_e_:

     density(x, bw, adjust = 1,
             kernel=c("gaussian", "epanechnikov", "rectangular", "triangular",
                      "biweight", "cosine", "optcosine"),
             window = kernel, width,
             give.Rkern = FALSE,
             n = 512, from, to, cut = 3, na.rm = FALSE)
     print(dobj)
     plot(dobj, main = NULL, xlab = NULL, ylab = "Density", type = "l",
          zero.line = TRUE, ...)

_A_r_g_u_m_e_n_t_s_:

       x: the data from which the estimate is to be com-
          puted.

      bw: the smoothing bandwidth to be used.  The kernels
          are scaled such that this is the standard devia-
          tion of the smoothing kernel.  It defaults to 0.9
          times the minimum of the standard deviation and
          the interquartile range divided by 1.34 times the
          sample size to the negative one-fifth power (=
          Silverman's ``rule of thumb'') unless the quar-
          tiles coincide where `bw > 0' will be guaranteed.
          The specified (or default) value of `bw' is multi-
          plied by `adjust'.

  adjust: the bandwidth used is actually `adjust*bw'.  This
          makes it easy to specify values like ``half the
          default'' bandwidth.

kernel,window: a character string giving the smoothing ker-
          nel to be used.  This must be one of `"gaussian"',
          `"rectangular"', `"triangular"', `"epanechnikov"',
          `"biweight"', `"cosine"' or `"optcosine"', with
          default `"gaussian"', and may be abbreviated to a
          unique prefix (single letter).

          `"cosine"' is smoother than `"optcosine"', which
          is the usual ``cosine'' kernel in the literature
          and almost MSE-efficient.

   width: this exists for compatibility with S; if given,
          and `bw' is not, will set `bw = width/4'.

give.Rkern: logical; if true, no density is estimated, and
          the ``canonical bandwidth'' of the chosen `kernel'
          is returned instead.

       n: the number of equally spaced points at which the
          density is to be estimated.  When `n > 512', it is
          rounded up to the next power of 2 for efficieny
          reasons (`fft').

 from,to: the left and right-most points of the grid at
          which the density is to be estimated.

     cut: by default, the values of `left' and `right' are
          `cut' bandwidths beyond the extremes of the data.
          This allows the estimated density to drop to
          approximately zero at the extremes.

   na.rm: logical; if `TRUE', missing values are removed
          from `x'. If `FALSE' any missing values cause an
          error.

    dobj: a ``density'' object.

main, xlab, ylab, type: plotting parameters with useful
          defaults.

     ...: further plotting parameters.

zero.line: logical; if `TRUE', add a base line at y = 0

_D_e_t_a_i_l_s_:

     The algorithm used in `density' disperses the mass of
     the empirical distribution function over a regular grid
     of at least 512 points and then uses the fast Fourier
     transform to convolve this approximation with a dis-
     cretized version of the kernel and then uses linear
     approximation to evaluate the density at the specified
     points.

     The statistical properties of a kernel are determined
     by sig^2 (K) = int(t^2 K(t) dt) which is always = 1 for
     our kernels (and hence the bandwidth `bw' is the stan-
     dard deviation of the kernel) and R(K) = int(K^2(t)
     dt).
     MSE-equivalent bandwidths (for different kernels) are
     proportional to sig(K) R(K) which is scale invariant
     and for our kernels equal to R(K).  This value is
     returned when `give.Rkern = TRUE'.  See the examples
     for using exact equivalent bandwidths.

     Infinite values in `x' are assumed to correspond to a
     point mass at `+/-Inf' and the density estimate is of
     the sub-density on `(-Inf, +Inf)'.

_V_a_l_u_e_:

     If `give.Rkern' is true, the number R(K), otherwise an
     object with class `"density"' whose underlying struc-
     ture is a list containing the following components.

       x: the `n' coordinates of the points where the den-
          sity is estimated.

       y: the estimated density values.

      bw: the bandwidth used.

       N: the sample size after elimination of missing val-
          ues.

    call: the call which produced the result.

data.name: the deparsed name of the `x' argument.

  has.na: logical, for compatibility (always FALSE).

_R_e_f_e_r_e_n_c_e_s_:

     Silverman, B. W. (1986).  Density Estimation.  London:
     Chapman and Hall.

     Venables, W. N. and B. D. Ripley (1994,7,9).  Modern
     Applied Statistics with S-PLUS.  New York: Springer.

     Scott, D. W. (1992).  Multivariate Density Estimation.
     Theory, Practice and Visualization.  New York: Wiley.

     Sheather, S. J. and M. C. Jones (1991).  ``A reliable
     data-based bandwidth selection method for kernel den-
     sity estimation.  J. Roy. Statist. Soc. B, 683-690.

_S_e_e _A_l_s_o_:

     `convolve', `hist'.

_E_x_a_m_p_l_e_s_:

     plot(density(c(-20,rep(0,98),20)),xlim=c(-4,4))# IQR = 0

     # The Old Faithful geyser data
     data(faithful)
     d <- density(faithful$eruptions, bw=0.15)
     d
     plot(d)

     plot(d, type="n")
     polygon(d, col="wheat")

     ## Missing values:
     x <- xx <- faithful$eruptions
     x[i.out <- sample(length(x), 10)] <- NA
     doR <- density(x, bw=0.15, na.rm = TRUE)
     lines(doR, col="blue")
     points(xx[i.out], rep(.01,10))

     (kernels <- eval(formals(density)$kernel))

     plot (density(0,bw=1))
     for(i in 2:length(kernels))
        lines(density(0,bw=1, kern= kernels[i]), col=i)
     mtext(side=3, "R's density() kernels with bw=1")
     legend(1.5,.4, leg=kernels, col=seq(kernels),lty=1, cex=.8, y.int=1)

     (RKs <- cbind(sapply(kernels, function(k)density(kern=k, give.Rkern=TRUE))))
     100*round(RKs["epanechnikov",]/RKs, 4) ## Efficiencies

     data(precip)
     plot(density(precip, n=2^13))
     for(i in 2:length(kernels))
        lines(density(precip, kern= kernels[i], n=2^13), col=i)
     mtext(side=3, "same scale bandwidths, 7 different kernels")

     ## Bandwidth Adjustment for "Exactly Equivalent Kernels"
     h.f <- sapply(kernels, function(k)density(kern=k, give.Rkern=TRUE))
     (h.f <- (h.f["gaussian"] / h.f)^ .2)
     ## -> 1, 1.01, .995, 1.007,... close to 1 => adjustment barely visible..

     plot(density(precip, n=2^13))
     for(i in 2:length(kernels))
        lines(density(precip, adjust = h.f[i], kern= kernels[i], n=2^13), col=i)
     mtext(side=3, "equivalent bandwidths, 7 different kernels")
     legend(55,.035, leg=kernels, col=seq(kernels),lty=1)

