

cut {base}                                   R Documentation

_C_o_n_v_e_r_t _N_u_m_e_r_i_c _t_o _F_a_c_t_o_r

_D_e_s_c_r_i_p_t_i_o_n_:

     `cut' divides the range of `x' into intervals and codes
     the values in `x' according to the interval into which
     they fall.  The leftmost interval corresponds to level
     one, the next leftmost to level two, and so on.

_U_s_a_g_e_:

     cut(x, ...)
     cut.default(x, breaks, labels = NULL,
                 include.lowest = FALSE, right = TRUE, dig.lab = 3)

_A_r_g_u_m_e_n_t_s_:

       x: a numeric vector which is to be converted to a
          factor by cutting.

  breaks: either a vector of cut points or a number giving the
          number of intervals which `x' is to be cut into.

  labels: labels for the levels of the resulting category.
          By default, labels are constructed using `"(a,b]"'
          interval notation. If `labels = FALSE', simple
          integer codes are returned instead of a factor.

include.lowest: logical, indicating if an `x[i]' equal to
          the lowest (or highest, for `right = FALSE')
          `breaks' value should be included.

   right: logical, indicating if the intervals should be closed
          on the right (and open on the left) or vice versa.

 dig.lab: integer which is used when labels are not given.
          It determines the number of digits used in format-
          ting the break numbers.
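
     The interplay of `breaks', `right' and `include.lowest'
     can be sketched as follows. The vector `x' and the break
     points are made-up illustration values, not taken from
     this page.

```r
x <- c(0, 1, 2, 3, 4)

# Default (right = TRUE): the intervals are (0,2] and (2,4],
# so the value 0 falls outside and is coded as NA.
a <- cut(x, breaks = c(0, 2, 4))
is.na(a)

# include.lowest = TRUE closes the first interval on the left: [0,2]
b <- cut(x, breaks = c(0, 2, 4), include.lowest = TRUE)
levels(b)

# With right = FALSE the roles swap: the highest break is included
d <- cut(x, breaks = c(0, 2, 4), right = FALSE, include.lowest = TRUE)
levels(d)

# A single number asks for that many intervals
e <- cut(x, breaks = 3)
nlevels(e)
```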

_D_e_t_a_i_l_s_:

     If a `labels' parameter is specified, its values are
     used to name the factor levels. If none is specified,
     the factor level labels are constructed as `"(b1,
     b2]"', `"(b2, b3]"' etc. for `right=TRUE' and as `"[b1,
     b2)"', ... if `right=FALSE'.  In this case, `dig.lab'
     indicates how many digits should be used in formatting
     the numbers `b1', `b2', ....
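
     A short sketch of the label construction, assuming a
     current version of R. The values `x', `y' and `br' are
     illustrative only.

```r
x <- c(0.5, 1.5, 2.5)

# Automatic labels, closed on the right (the default)
lev_right <- levels(cut(x, breaks = 0:3))
lev_right    # "(0,1]" "(1,2]" "(2,3]"

# Automatic labels, closed on the left
lev_left <- levels(cut(x, breaks = 0:3, right = FALSE))
lev_left     # "[0,1)" "[1,2)" "[2,3)"

# User-supplied labels, one per interval
lev_named <- levels(cut(x, breaks = 0:3, labels = c("low", "mid", "high")))

# dig.lab controls how many significant digits the breaks get
y  <- c(1, 2)
br <- c(0, 1.23456, 2.5)
lev3 <- levels(cut(y, breaks = br))               # default dig.lab = 3
lev4 <- levels(cut(y, breaks = br, dig.lab = 4))
```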

_V_a_l_u_e_:

     A `factor' is returned, unless `labels = FALSE', in which
     case only the integer level codes are returned.
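
     The two kinds of return value can be compared directly.
     This is a minimal sketch with made-up data.

```r
x  <- c(1, 5, 9)
br <- c(0, 3, 6, 10)

# Default: a factor whose levels are the interval labels
f <- cut(x, breaks = br)
is.factor(f)

# labels = FALSE: plain integer codes, no factor attributes
codes <- cut(x, breaks = br, labels = FALSE)
codes

# The codes agree with the internal codes of the factor
identical(codes, as.integer(f))
```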

_N_o_t_e_:

     Instead of `table(cut(x, br))', `hist(x, br, plot =
     FALSE)' is more efficient and less memory hungry.
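
     The equivalence of the two approaches can be checked on
     simulated data, as in this sketch. One caveat: `hist'
     uses `include.lowest = TRUE' by default while `cut' does
     not, so the counts can differ when a value equals the
     lowest break. The data here are chosen to avoid that.

```r
set.seed(1)
x  <- runif(1000, min = 0, max = 10)   # values strictly inside (0, 10)
br <- 0:10

# Counts per interval via cut() and table()
counts_cut <- as.vector(table(cut(x, breaks = br)))

# The same counts via hist(), without drawing a plot
counts_hist <- hist(x, breaks = br, plot = FALSE)$counts

all(counts_cut == counts_hist)
```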

_S_e_e _A_l_s_o_:

     `split' for splitting a variable according to a group
     factor; `factor', `tabulate', `table'.

_E_x_a_m_p_l_e_s_:

     Z <- rnorm(10000)
     table(cut(Z, breaks = -6:6))
     system.time(print(sum(table(cut(Z, breaks = -6:6, labels = FALSE)))))
     system.time(print(sum(   hist  (Z, breaks = -6:6, plot = FALSE)$counts)))

     cut(rep(1,5),4)#-- dummy
     tx0 <- c(9, 4, 6, 5, 3, 10, 5, 3, 5)
     x <- rep(0:8, tx0)
     tx <- table(x)
     all(tx == tx0)
     table( cut(x, breaks = 8))
     table( cut(x, breaks = 3*(-2:5)))
     table( cut(x, breaks = 3*(-2:5), right = FALSE))

     ##--- some values OUTSIDE the breaks :
     table(cx  <- cut(x, breaks = 2*(0:4)))
     table(cxl <- cut(x, breaks = 2*(0:4), right = FALSE))
     which(is.na(cx));  x[is.na(cx)]  #-- the first 9  values  0
     which(is.na(cxl)); x[is.na(cxl)] #-- the last  5  values  8

     ## Label construction:
     y <- rnorm(100)
     table(cut(y, breaks = pi/3*(-3:3)))
     table(cut(y, breaks = pi/3*(-3:3), dig.lab=4))

     table(cut(y, breaks =  1*(-3:3), dig.lab=4))# extra digits don't "harm" here
     table(cut(y, breaks =  1*(-3:3), right = FALSE))#- the same, since no exact INT!

