

agnes(cluster)                               R Documentation

_A_g_g_l_o_m_e_r_a_t_i_v_e _N_e_s_t_i_n_g

_D_e_s_c_r_i_p_t_i_o_n_:

     Returns a list representing an agglomerative hierarchical
     clustering of the dataset.

_U_s_a_g_e_:

     agnes(x, diss = FALSE, metric = "euclidean", stand = FALSE, method = "average")

_A_r_g_u_m_e_n_t_s_:

       x: data matrix or data frame, or dissimilarity matrix,
          depending on the value of the `diss' argument.

          In the case of a matrix or data frame, each row
          corresponds to an observation, and each column
          corresponds to a variable. All variables must be
          numeric.  Missing values (NAs) are allowed.

          In the case of a dissimilarity matrix, `x' is
          typically the output of `daisy' or `dist'. A vector
          of length n*(n-1)/2 is also allowed (where n is
          the number of observations), and will be
          interpreted in the same way as the output of the
          above-mentioned functions. Missing values (NAs)
          are not allowed.

    diss: logical flag: if TRUE, then `x' will be considered
          as a dissimilarity matrix. If FALSE, then `x' will
          be considered as a matrix of observations by vari-
          ables.

  metric: character string specifying the metric to be used
          for calculating dissimilarities between
          observations.  The currently available options are
          "euclidean" and "manhattan".  Euclidean distances
          are the square root of the sum of squared
          differences, and manhattan distances are the sum of
          absolute differences.  If `x' is already a
          dissimilarity matrix, then this argument will be
          ignored.

   stand: logical flag: if TRUE, then the measurements in
          `x' are standardized before calculating the dis-
          similarities. Measurements are standardized for
          each variable (column), by subtracting the vari-
          able's mean value and dividing by the variable's
          mean absolute deviation.  If `x' is already a dis-
          similarity matrix, then this argument will be
          ignored.

  method: character string defining the clustering method.
          The five methods implemented are "average" (group
          average method), "single" (single linkage), "com-
          plete" (complete linkage), "ward" (Ward's method),
          and "weighted" (weighted average linkage).
          Default is "average".
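
     The `stand' and `metric' descriptions above can be checked
     by hand on a small example. The sketch below uses base R
     only; the toy matrix and the helper name `standardize' are
     made up for illustration, and the code mirrors the
     documented definitions rather than the internals of
     `agnes'.

```r
# Toy data: 4 observations (rows) by 2 variables (columns); values are made up.
x <- matrix(c(1, 2, 4, 7,
              10, 20, 20, 50),
            ncol = 2,
            dimnames = list(paste0("obs", 1:4), c("v1", "v2")))

# Hypothetical helper: standardize each column as described for `stand`,
# i.e. subtract the mean, then divide by the mean absolute deviation.
standardize <- function(m) {
  apply(m, 2, function(col) {
    centered <- col - mean(col)
    centered / mean(abs(centered))
  })
}
z <- standardize(x)

# Dissimilarity between observations 1 and 2 under each metric.
d_euclidean <- sqrt(sum((z[1, ] - z[2, ])^2))  # root of summed squared differences
d_manhattan <- sum(abs(z[1, ] - z[2, ]))       # sum of absolute differences

# Cross-check against base R's dist() on the same standardized data.
dz <- dist(z, method = "euclidean")
all.equal(d_euclidean, as.matrix(dz)[1, 2])
all.equal(d_manhattan, as.matrix(dist(z, method = "manhattan"))[1, 2])

# A dissimilarity object for n observations stores n*(n-1)/2 values.
n <- nrow(x)
length(dz) == n * (n - 1) / 2
```

     After standardization every column has mean 0 and mean
     absolute deviation 1, so variables measured on different
     scales contribute comparably to the dissimilarities.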

_D_e_t_a_i_l_s_:

     `agnes' is fully described in chapter 5 of Kaufman and
     Rousseeuw (1990).  Compared to other agglomerative
     clustering methods such as `hclust', `agnes' has the
     following features: (a) it yields the agglomerative
     coefficient (see `agnes.object') which measures the
     amount of clustering structure found; and (b) apart
     from the usual tree it also provides the banner, a
     novel graphical display (see `plot.agnes').

     The `agnes' algorithm constructs a hierarchy of
     clusterings.  At first, each observation is a small
     cluster by itself. Clusters are merged until only one
     large cluster remains, which contains all the
     observations. At each stage the two "nearest" clusters
     are combined to form one larger cluster. For
     `method'="average", the distance between two clusters
     is the average of the dissimilarities between the
     points in one cluster and the points in the other
     cluster. For `method'="single", it is the smallest
     dissimilarity between a point in the first cluster and
     a point in the second cluster (nearest neighbor
     method).  For `method'="complete", it is the largest
     dissimilarity between a point in the first cluster and
     a point in the second cluster (furthest neighbor
     method).
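
     The three between-cluster distances described above can be
     computed directly from a dissimilarity matrix. The sketch
     below uses base R and made-up points; the cluster
     memberships `A' and `B' are chosen by hand for
     illustration. It shows the definitions only and is not how
     `agnes' is implemented.

```r
# Five made-up points on a line.
pts <- c(0, 1, 4, 6, 9)
d <- as.matrix(dist(pts))    # full pairwise dissimilarity matrix

# Hypothetical clusters: A holds points 1-2, B holds points 3-5.
A <- 1:2
B <- 3:5

# All dissimilarities between a point in A and a point in B.
between <- d[A, B]

d_average  <- mean(between)  # method = "average": mean of all cross-cluster pairs
d_single   <- min(between)   # method = "single": nearest pair
d_complete <- max(between)   # method = "complete": furthest pair

c(average = d_average, single = d_single, complete = d_complete)
```

     For any pair of clusters the single-linkage distance is
     never larger than the average, which in turn is never
     larger than the complete-linkage distance.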

_V_a_l_u_e_:

     an object of class `"agnes"' representing the
     clustering.  See `agnes.object' for details.
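
     As a brief sketch of working with the returned object, the
     code below fits the default clustering on the
     `agriculture' data and reads a few components. It assumes
     the cluster package is installed; the variable name `ag'
     is arbitrary, and `agnes.object' remains the authoritative
     description of the components.

```r
library(cluster)
data(agriculture)

# Default settings: euclidean metric, no standardization, average linkage.
ag <- agnes(agriculture)

class(ag)          # the class vector includes "agnes"
ag$ac              # agglomerative coefficient, a value between 0 and 1
ag$order           # ordering of the observations used when plotting
length(ag$height)  # one merge height per merge: n - 1 values
```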

_B_A_C_K_G_R_O_U_N_D_:

     Cluster analysis divides a dataset into groups (clus-
     ters) of observations that are similar to each other.
     Hierarchical methods like `agnes', `diana', and `mona'
     construct a hierarchy of clusterings, with the number
     of clusters ranging from one to the number of observa-
     tions. Partitioning methods like `pam', `clara', and
     `fanny' require that the number of clusters be given by
     the user.

_R_e_f_e_r_e_n_c_e_s_:

     Kaufman, L. and Rousseeuw, P.J. (1990).  Finding Groups
     in Data: An Introduction to Cluster Analysis.  Wiley,
     New York.

     Struyf, A., Hubert, M. and Rousseeuw, P.J. (1997).
     Integrating Robust Clustering Techniques in S-PLUS,
     Computational Statistics and Data Analysis, 26, 17-37.

_S_e_e _A_l_s_o_:

     `agnes.object', `daisy', `diana', `dist', `hclust',
     `plot.agnes', `twins.object'.

_E_x_a_m_p_l_e_s_:

     data(votes.repub)
     agn1 <- agnes(votes.repub, metric = "manhattan", stand = TRUE)
     print(agn1)
     plot(agn1)
     agn2 <- agnes(daisy(votes.repub), diss = TRUE, method = "complete")
     plot(agn2)

     data(agriculture)
     ## Plot similar to Figure 7 in ref
     plot(agnes(agriculture), ask = TRUE)


