density                 package:base                 R Documentation

_K_e_r_n_e_l _D_e_n_s_i_t_y _E_s_t_i_m_a_t_i_o_n

_D_e_s_c_r_i_p_t_i_o_n:

     The function `density' computes kernel density estimates with the
     given kernel and bandwidth.

     The generic functions `plot' and `print' have methods for density
     objects.

_U_s_a_g_e:

     density(x, bw, adjust = 1,
             kernel=c("gaussian", "epanechnikov", "rectangular", "triangular",
                      "biweight", "cosine", "optcosine"),
             window = kernel, width,
             give.Rkern = FALSE,
             n = 512, from, to, cut = 3, na.rm = FALSE)
     print(dobj)
     plot(dobj, main = NULL, xlab = NULL, ylab = "Density", type = "l",
          zero.line = TRUE, ...)

_A_r_g_u_m_e_n_t_s:

       x: the data from which the estimate is to be computed.

      bw: the smoothing bandwidth to be used.  The kernels are scaled
          such that this is the standard deviation of the smoothing
          kernel.  It defaults to `0.9 * min(s, IQR/1.34) * n^(-1/5)',
          where `s' is the sample standard deviation, `IQR' the
          interquartile range and `n' the sample size (Silverman's
          ``rule of thumb'').  If the quartiles coincide, `bw > 0' is
          still guaranteed.  The specified (or default) value of `bw'
          is multiplied by `adjust'.

  adjust: the bandwidth used is actually `adjust*bw'. This makes it
          easy to specify values like ``half the default'' bandwidth.

kernel,window: a character string giving the smoothing kernel to be
          used. This must be one of `"gaussian"', `"rectangular"',
          `"triangular"', `"epanechnikov"', `"biweight"', `"cosine"' or
          `"optcosine"', with default `"gaussian"', and may be
          abbreviated to a unique prefix (single letter).

          `"cosine"' is smoother than `"optcosine"', which is the usual
          ``cosine'' kernel in the literature and almost MSE-efficient. 

   width: this exists for compatibility with S; if `width' is given
          and `bw' is not, `bw' is set to `width/4'.

give.Rkern: logical; if true, no density is estimated, and the
          ``canonical bandwidth'' of the chosen `kernel' is returned
          instead.

       n: the number of equally spaced points at which the density is
          to be estimated.  When `n > 512', it is rounded up to the
          next power of 2 for efficiency reasons (`fft').

 from,to: the left and right-most points of the grid at which the
          density is to be estimated.

     cut: by default, the values of `from' and `to' are `cut'
          bandwidths beyond the extremes of the data. This allows the
          estimated density to drop to approximately zero at the
          extremes.

   na.rm: logical; if `TRUE', missing values are removed from `x'. If
          `FALSE' any missing values cause an error.

    dobj: a ``density'' object.

main, xlab, ylab, type: plotting parameters with useful defaults.

     ...: further plotting parameters.

zero.line: logical; if `TRUE', add a base line at y = 0.
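
     The following sketch illustrates how `bw', `adjust' and `cut'
     interact.  It assumes a current R installation and uses the
     `faithful' data; the names `bw_rot', `d' and `d_half' are chosen
     here for illustration only.

```r
data(faithful)
x <- faithful$eruptions

## Rule-of-thumb default, written out as described for `bw'
bw_rot <- 0.9 * min(sd(x), IQR(x) / 1.34) * length(x)^(-1/5)
d <- density(x)
c(manual = bw_rot, density = d$bw)

## `adjust' rescales the bandwidth that is actually used
d_half <- density(x, adjust = 0.5)
d_half$bw / d$bw

## With the default cut = 3, the grid runs 3 bandwidths past the data
rbind(grid     = range(d$x),
      expected = c(min(x) - 3 * d$bw, max(x) + 3 * d$bw))
```

     Halving `adjust' halves the reported bandwidth, which is a
     convenient way to explore under- and over-smoothing without
     computing a bandwidth by hand.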

_D_e_t_a_i_l_s:

     The algorithm used in `density' disperses the mass of the
     empirical distribution function over a regular grid of at least
     512 points.  It then uses the fast Fourier transform to convolve
     this approximation with a discretized version of the kernel, and
     finally uses linear approximation to evaluate the density at the
     specified points.
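
     The three steps can be sketched in a few lines of R.  This is an
     illustration under stated assumptions, not the code used inside
     `density': the padding of 7 bandwidths, the Gaussian kernel and
     all variable names are choices made here.

```r
data(faithful)
x  <- faithful$eruptions
bw <- 0.15
n  <- 512                      # half the length of the working grid
N  <- length(x)

## Working grid, padded well beyond the data so that the circular
## convolution below cannot wrap kernel mass around the ends.
lo    <- min(x) - 7 * bw
up    <- max(x) + 7 * bw
grid  <- seq(lo, up, length.out = 2 * n)
delta <- grid[2] - grid[1]

## Linear binning: share each observation's mass 1/N between the two
## neighbouring grid points, in proportion to proximity.
pos  <- (x - lo) / delta
left <- floor(pos)             # 0-based index of the grid point to the left
frac <- pos - left
mass <- numeric(2 * n)
for (j in seq_len(N)) {
    mass[left[j] + 1] <- mass[left[j] + 1] + (1 - frac[j]) / N
    mass[left[j] + 2] <- mass[left[j] + 2] + frac[j] / N
}

## Gaussian kernel evaluated at the grid lags, in wrap-around order.
lags <- delta * c(0:n, -(n - 1):-1)
kern <- dnorm(lags, sd = bw)

## Circular convolution via the FFT, then linear interpolation.
conv <- Re(fft(fft(mass) * fft(kern), inverse = TRUE)) / (2 * n)
xout <- seq(min(x), max(x), length.out = 50)
est  <- approx(grid, conv, xout)$y

## Reference: the kernel sum evaluated directly at the same points.
direct <- sapply(xout, function(t) mean(dnorm(t, mean = x, sd = bw)))
max(abs(est - direct))
```

     The binned estimate and the direct kernel sum should differ only
     by a small discretization error, which shrinks as the grid
     becomes finer.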

     The statistical properties of a kernel are determined by
     sig^2(K) = int(t^2 K(t) dt), which is always 1 for our kernels
     (hence the bandwidth `bw' is the standard deviation of the
     kernel), and by R(K) = int(K^2(t) dt).
     MSE-equivalent bandwidths (for different kernels) are proportional
     to sig(K) R(K), which is scale invariant and for our kernels equal
     to R(K).  This value is returned when `give.Rkern = TRUE'.  See
     the examples for using exact equivalent bandwidths.
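
     Both quantities can be checked numerically.  The sketch below
     does so for two kernels, assuming that the rectangular kernel
     with unit standard deviation is the uniform density on
     `[-sqrt(3), sqrt(3)]'; variable names are illustrative.

```r
## Gaussian kernel with standard deviation 1
var_gauss <- integrate(function(t) t^2 * dnorm(t), -Inf, Inf)$value
RK_gauss  <- integrate(function(t) dnorm(t)^2, -Inf, Inf)$value

## Rectangular kernel scaled to standard deviation 1
## (assumption: uniform on [-sqrt(3), sqrt(3)])
s <- sqrt(3)
var_rect <- integrate(function(t) t^2 * dunif(t, -s, s), -s, s)$value
RK_rect  <- integrate(function(t) dunif(t, -s, s)^2, -s, s)$value

## Compare with the values reported by density()
rbind(gaussian    = c(sig2 = var_gauss, RK = RK_gauss,
                      give.Rkern = density(kernel = "gaussian",
                                           give.Rkern = TRUE)),
      rectangular = c(sig2 = var_rect, RK = RK_rect,
                      give.Rkern = density(kernel = "rectangular",
                                           give.Rkern = TRUE)))
```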

     Infinite values in `x' are assumed to correspond to a point mass
     at `+/-Inf' and the density estimate is of the sub-density on
     `(-Inf, +Inf)'.
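
     A small simulated check of the sub-density behaviour, assuming a
     current R installation: with 10 of 100 values set to `Inf', the
     estimate should integrate to roughly 0.9 rather than 1.  The seed
     and sample sizes are arbitrary.

```r
set.seed(1)
x <- c(rnorm(90), rep(Inf, 10))   # 10% of the mass sits at +Inf
d <- density(x)

## Riemann sum of the estimate over its (equally spaced) grid
area <- sum(d$y) * (d$x[2] - d$x[1])
area
```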

_V_a_l_u_e:

     If `give.Rkern' is true, the number R(K), otherwise an object with
     class `"density"' whose underlying structure is a list containing
     the following components. 

       x: the `n' coordinates of the points where the density is
          estimated.

       y: the estimated density values.

      bw: the bandwidth used.

       n: the sample size after elimination of missing values.

    call: the call which produced the result.

data.name: the deparsed name of the `x' argument.

  has.na: logical, for compatibility (always FALSE).
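
     Because the result is a list underneath, its components can be
     inspected directly.  A brief sketch using the `faithful' data
     (the exact set of component names may differ between R versions):

```r
data(faithful)
d <- density(faithful$eruptions, bw = 0.15)

class(d)                 # "density"
names(d)                 # component names of the underlying list
d$bw                     # the bandwidth used
length(d$x)              # number of grid points (default n = 512)

## The first few grid points and their estimated density values
head(cbind(x = d$x, y = d$y))
```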

_R_e_f_e_r_e_n_c_e_s:

     Silverman, B. W. (1986) Density Estimation. London: Chapman and
     Hall.

     Venables, W. N. and Ripley, B. D. (1994, 1997, 1999) Modern
     Applied Statistics with S-PLUS. New York: Springer.

     Scott, D. W. (1992) Multivariate Density Estimation. Theory,
     Practice and Visualization. New York: Wiley.

     Sheather, S. J. and Jones, M. C. (1991) A reliable data-based
     bandwidth selection method for kernel density estimation. J. Roy.
     Statist. Soc. B, 683-690.

_S_e_e _A_l_s_o:

     `hist'.

_E_x_a_m_p_l_e_s:

     plot(density(c(-20, rep(0, 98), 20)), xlim = c(-4, 4))  # IQR = 0

     # The Old Faithful geyser data
     data(faithful)
     d <- density(faithful$eruptions, bw = 0.15)
     d
     plot(d)

     plot(d, type = "n")
     polygon(d, col = "wheat")

     ## Missing values:
     x <- xx <- faithful$eruptions
     x[i.out <- sample(length(x), 10)] <- NA
     doR <- density(x, bw = 0.15, na.rm = TRUE)
     lines(doR, col = "blue")
     points(xx[i.out], rep(.01,10))

     (kernels <- eval(formals(density)$kernel))

     ## All kernels for a single observation at 0, on a common scale
     plot(density(0, bw = 1))
     for(i in 2:length(kernels))
        lines(density(0, bw = 1, kernel = kernels[i]), col = i)
     mtext(side = 3, "R's density() kernels with bw = 1")
     legend(1.5, .4, legend = kernels, col = seq(kernels), lty = 1, cex = .8,
            y.intersp = 1)

     ## R(K) for each kernel, and efficiencies relative to Epanechnikov
     (RKs <- cbind(sapply(kernels,
                          function(k) density(kernel = k, give.Rkern = TRUE))))
     100*round(RKs["epanechnikov",]/RKs, 4) ## Efficiencies

     data(precip)
     plot(density(precip, n = 2^13))
     for(i in 2:length(kernels))
        lines(density(precip, kernel = kernels[i], n = 2^13), col = i)
     mtext(side = 3, "same scale bandwidths, 7 different kernels")

     ## Bandwidth Adjustment for "Exactly Equivalent Kernels"
     h.f <- sapply(kernels, function(k) density(kernel = k, give.Rkern = TRUE))
     (h.f <- (h.f["gaussian"] / h.f)^ .2)
     ## -> 1, 1.01, .995, 1.007,... close to 1 => adjustment barely visible..

     plot(density(precip, n = 2^13))
     for(i in 2:length(kernels))
        lines(density(precip, adjust = h.f[i], kernel = kernels[i], n = 2^13),
              col = i)
     mtext(side = 3, "equivalent bandwidths, 7 different kernels")
     legend(55, .035, legend = kernels, col = seq(kernels), lty = 1)

