aggregate                package:base                R Documentation

_C_o_m_p_u_t_e _S_u_m_m_a_r_y _S_t_a_t_i_s_t_i_c_s _o_f _D_a_t_a _S_u_b_s_e_t_s

_D_e_s_c_r_i_p_t_i_o_n:

     Splits the data into subsets, computes summary statistics for
     each, and returns the result in a convenient form.

_U_s_a_g_e:

     aggregate(x, ...)
     aggregate.default(x, ...)
     aggregate.data.frame(x, by, FUN, ...)
     aggregate.ts(x, nfrequency = 1, FUN = sum, ndeltat = 1)

_A_r_g_u_m_e_n_t_s:

       x: an R object.

      by: a list of grouping elements, each as long as the variables in
          `x'.  Default names are supplied for any grouping elements
          that are not named.

     FUN: a function returning a single value (a scalar), used to
          compute the summary statistics; it must be applicable to
          every data subset.

nfrequency: new number of observations per unit of time; must be a
          divisor of the frequency of `x'.

 ndeltat: new fraction of the sampling period between successive
          observations; must be a divisor of the sampling interval of
          `x'.

     ...: further arguments passed to the method used.
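
     As a small illustration of the `by' and `...' arguments, the
     sketch below groups by an unnamed list and forwards `na.rm = TRUE'
     to `mean'.  The data frame `x' and grouping vector `g' are made up
     for the example, and the default column name `Group.1' is what
     current R releases supply; older versions may differ.

```r
# Hypothetical toy data: one numeric column with a missing value.
x <- data.frame(v = c(1, NA, 3, 4))
g <- c("a", "a", "b", "b")

# The grouping list is unnamed, so a default name is supplied for it.
# na.rm = TRUE is not an argument of aggregate(); it travels through
# `...` to mean(), which is applied to each subset.
res <- aggregate(x, list(g), mean, na.rm = TRUE)
print(res)
```

     Without `na.rm = TRUE' the mean for group "a" would be `NA'.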

_D_e_t_a_i_l_s:

     `aggregate' is a generic function with methods for data frames
     and time series.

     The default method `aggregate.default' uses the time series method
     if `x' is a time series, and otherwise coerces `x' to a data frame
     and calls the data frame method.
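
     A minimal sketch of the coercion described above, using a made-up
     matrix `m' and grouping vector `g': a matrix is neither a data
     frame nor a time series, so the result is expected to come back
     as a data frame.

```r
# Hypothetical toy input: a 3 x 2 integer matrix with named columns.
m <- matrix(1:6, ncol = 2, dimnames = list(NULL, c("a", "b")))
g <- c("u", "v", "u")

# No aggregate method exists for matrices, so the default method
# handles the call: it coerces m to a data frame and then calls the
# data frame method.
res <- aggregate(m, list(grp = g), sum)
print(res)
```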

     `aggregate.data.frame' is the data frame method.  If `x' is not a
     data frame, it is coerced to one.  Then, each of the variables
     (columns) in `x' is split into subsets of cases (rows) of
     identical combinations of the components of `by', and `FUN' is
     applied to each such subset with further arguments in `...' passed
     to it. (I.e., `tapply(VAR, by, FUN, ..., simplify = FALSE)' is
     done for each variable `VAR' in `x', conveniently wrapped into one
     call to `lapply()'.) Empty subsets are removed, and the result is
     reformatted into a data frame containing the variables in `by' and
     `x'.  The variables arising from `by' contain the unique
     combinations of grouping values used for determining the subsets,
     and those arising from `x' contain the corresponding summary
     statistics for the subsets of the respective variables in `x'.
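
     The equivalence with `tapply' can be checked by hand.  The sketch
     below uses a made-up data frame `df' and grouping list `grp', and
     compares the per-column `tapply' results with the output of
     `aggregate'; it illustrates the description above and is not the
     actual implementation.

```r
# Hypothetical toy data: two numeric columns, one two-level grouping.
df <- data.frame(height = c(150, 160, 170, 180),
                 weight = c(50, 60, 70, 80))
grp <- list(sex = c("f", "m", "f", "m"))

# One tapply() per column, wrapped in a single lapply() call.
by_hand <- lapply(df, function(VAR)
    tapply(VAR, grp, mean, simplify = FALSE))

# The same computation through the data frame method.
agg <- aggregate(df, grp, mean)
print(agg)

# Both routes should give the same per-group means.
print(unlist(by_hand$height))
print(unlist(by_hand$weight))
```

     The `aggregate' result adds the grouping column `sex' and arranges
     the statistics as one row per group.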

     `aggregate.ts' is the time series method.  If `x' is not a time
     series, it is coerced to one.  Then, the variables in `x' are
     split into appropriate blocks of length `frequency(x) /
     nfrequency', and `FUN' is applied to each such block.  The result
     returned is a time series with frequency `nfrequency' holding the
     aggregated values.
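
     To make the block length concrete, the sketch below builds a
     made-up quarterly series `x' of two years and aggregates it to
     annual and to half-yearly sums.  With `frequency(x)' equal to 4,
     the blocks have length 4 and 2 respectively.

```r
# Hypothetical toy series: 8 quarterly observations starting in 2000.
x <- ts(1:8, start = c(2000, 1), frequency = 4)

# Annual totals: blocks of length frequency(x) / 1 = 4.
yearly <- aggregate(x, nfrequency = 1, FUN = sum)
print(yearly)

# Half-yearly totals: blocks of length frequency(x) / 2 = 2.
half <- aggregate(x, nfrequency = 2, FUN = sum)
print(half)
```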

_A_u_t_h_o_r(_s):

     Kurt Hornik

_S_e_e _A_l_s_o:

     `apply', `lapply', `tapply'.

_E_x_a_m_p_l_e_s:

     data(state)

     ## Compute the averages for the variables in `state.x77', grouped
     ## according to the region (Northeast, South, North Central, West) that
     ## each state belongs to.
     aggregate(state.x77, list(Region = state.region), mean)

     ## Compute the averages according to region and the occurrence of more
     ## than 130 days of frost.
     aggregate(state.x77,
               list(Region = state.region,
                    Cold = state.x77[,"Frost"] > 130),
               mean)
     ## (Note that no state in `South' is THAT cold.)

     data(presidents)
     ## Compute the average annual approval ratings for American presidents.
     aggregate(presidents, nfrequency = 1, FUN = mean)

