

aggregate {base}                             R Documentation

_C_o_m_p_u_t_e _S_u_m_m_a_r_y _S_t_a_t_i_s_t_i_c_s _o_f _D_a_t_a _S_u_b_s_e_t_s

_D_e_s_c_r_i_p_t_i_o_n_:

     Splits the data into subsets, computes summary statistics
     for each, and returns the result in a convenient form.

_U_s_a_g_e_:

     aggregate(x, ...)
     aggregate.default(x, ...)
     aggregate.data.frame(x, by, FUN, ...)
     aggregate.ts(x, nfrequency = 1, FUN = sum, ndeltat = 1)

_A_r_g_u_m_e_n_t_s_:

       x: an R object.

      by: a list of grouping elements, each as long as the
          variables in `x'.  Default names are generated for
          any grouping variables that are not named.

     FUN: a function that returns a scalar summary statistic
          and can be applied to every data subset.

nfrequency: new number of observations per unit of time;
          must be a divisor of the frequency of `x'.

 ndeltat: new fraction of the sampling period between
          successive observations; must be a divisor of the
          sampling interval of `x'.

     ...: further arguments passed to the method used.
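
     As an illustration of how `nfrequency' and `ndeltat' relate,
     the sketch below aggregates a small made-up quarterly series
     to half-yearly totals in two ways.  It assumes that `ndeltat'
     is treated as the reciprocal of `nfrequency' when only
     `ndeltat' is given; the series `x' is hypothetical.

```r
# Hypothetical quarterly series: 8 observations starting in 2000 Q1.
x <- ts(1:8, start = c(2000, 1), frequency = 4)

# Two observations per year, specified as a frequency ...
half_by_freq <- aggregate(x, nfrequency = 2, FUN = sum)

# ... or as the spacing between observations (half a year).
# Assumption: ndeltat = 0.5 is interpreted as nfrequency = 1 / 0.5 = 2.
half_by_delta <- aggregate(x, ndeltat = 0.5, FUN = sum)

print(half_by_freq)
print(half_by_delta)
```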

_D_e_t_a_i_l_s_:

     `aggregate' is a generic function with methods for
     data frames and time series.

     The default method `aggregate.default' uses the time
     series method if `x' is a time series, and otherwise
     coerces `x' to a data frame and calls the data frame
     method.
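
     The dispatch described above can be checked with a small
     sketch.  The matrix `m' and the grouping vector are made up
     for illustration; the point is only that a non-time-series
     object should give the same result as its data frame
     coercion.

```r
# Hypothetical 3 x 2 matrix: neither a data frame nor a time series.
m <- matrix(1:6, ncol = 2, dimnames = list(NULL, c("u", "v")))
g <- list(g = c("a", "a", "b"))

# Aggregating the matrix directly goes through the default method ...
res_matrix <- aggregate(m, g, sum)

# ... which should match aggregating the coerced data frame.
res_df <- aggregate(as.data.frame(m), g, sum)

print(res_matrix)
```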

     `aggregate.data.frame' is the data frame method.  If
     `x' is not a data frame, it is coerced to one.  Then,
     each of the variables (columns) in `x' is split into
     subsets of cases (rows) that share the same combination
     of values of the components of `by', and `FUN' is
     applied to each such subset, with any further arguments
     in `...' passed to it.  (I.e., `tapply(VAR, by, FUN, ...,
     simplify = FALSE)' is done for each variable `VAR' in
     `x', conveniently wrapped into one call to `lapply()'.)
     Empty subsets are removed, and the result is reformatted
     into a data frame containing the variables in `by' and
     `x'.  The variables arising from `by' contain the unique
     combinations of grouping values used to determine the
     subsets, and the variables arising from `x' contain the
     corresponding summary statistics for each subset of the
     respective variable in `x'.
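
     The correspondence with `tapply' can be sketched as follows.
     The data frame `df' and the grouping list `grp' are
     hypothetical, and the hand-written `lapply' call only
     approximates the splitting step; it does not reproduce the
     removal of empty subsets or the final reformatting.

```r
# Small made-up data set with two numeric variables.
df <- data.frame(height = c(150, 160, 170, 180),
                 weight = c(50, 60, 70, 80))
grp <- list(Group = c("a", "b", "a", "b"))

# One call to aggregate() computes the group means of every column.
agg <- aggregate(df, by = grp, FUN = mean)

# Roughly the same computation: one tapply() per column via lapply().
per_var <- lapply(df, function(v) tapply(v, grp, mean, simplify = FALSE))

print(agg)
print(unlist(per_var$height))
```

     Here `agg' has one row per group, with a `Group' column
     taken from `by' and one column of means per variable in
     `df'.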

     `aggregate.ts' is the time series method.  If `x' is
     not a time series, it is coerced to one.  Then, the
     variables in `x' are split into appropriate blocks of
     length `frequency(x) / nfrequency', and `FUN' is
     applied to each such block.  The result returned is a
     time series with frequency `nfrequency' holding the
     aggregated values.
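
     The blocking can be illustrated with a small made-up
     quarterly series.  With `nfrequency = 1' each block has
     length `frequency(x) / nfrequency' = 4, so the annual totals
     should match column sums of the values arranged four per
     column.  The series `x' and the by-hand comparison are
     assumptions for illustration only.

```r
# Hypothetical quarterly series: 8 observations starting in 2000 Q1.
x <- ts(1:8, start = c(2000, 1), frequency = 4)

# Annual totals: blocks of length frequency(x) / nfrequency = 4 / 1 = 4.
annual <- aggregate(x, nfrequency = 1, FUN = sum)

# The same blocks formed by hand: one column per year, then summed.
by_hand <- colSums(matrix(as.numeric(x), nrow = 4))

print(annual)
print(by_hand)
```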

_A_u_t_h_o_r_(_s_)_:

     Kurt Hornik

_S_e_e _A_l_s_o_:

     `apply', `lapply', `tapply'.

_E_x_a_m_p_l_e_s_:

     data(state)

     ## Compute the averages for the variables in `state.x77', grouped
     ## according to the region (Northeast, South, North Central, West) that
     ## each state belongs to.
     aggregate(state.x77, list(Region = state.region), mean)

     ## Compute the averages according to region and the occurrence of more
     ## than 130 days of frost.
     aggregate(state.x77,
               list(Region = state.region,
                    Cold = state.x77[,"Frost"] > 130),
               mean)
     ## (Note that no state in `South' is THAT cold.)

     data(presidents)
     ## Compute the average annual approval ratings for American presidents.
     aggregate(presidents, nfrequency = 1, FUN = mean)

