

diana(cluster)                               R Documentation

_D_i_v_i_s_i_v_e _A_n_a_l_y_s_i_s

_D_e_s_c_r_i_p_t_i_o_n_:

     Returns a list representing a divisive hierarchical
     clustering of the dataset.

_U_s_a_g_e_:

     diana(x, diss = FALSE, metric = "euclidean", stand = FALSE)

_A_r_g_u_m_e_n_t_s_:

       x: data matrix or data frame, or dissimilarity matrix,
          depending on the value of the `diss' argument.

          In case of a matrix or data frame, each row
          corresponds to an observation, and each column
          corresponds to a variable. All variables must be
          numeric.  Missing values (NAs) are allowed.

          In case of a dissimilarity matrix, `x' is typically
          the output of `daisy' or `dist'. A vector of length
          n*(n-1)/2 is also allowed (where n is the number of
          observations), and will be interpreted in the same
          way as the output of the above-mentioned functions.
          Missing values (NAs) are not allowed.

    diss: logical flag: if TRUE, then `x' will be considered
          as a dissimilarity matrix. If FALSE, then `x' will
          be considered as a matrix of observations by
          variables.

  metric: character string specifying the metric to be used
          for calculating dissimilarities between
          observations.  The currently available options are
          "euclidean" and "manhattan".  Euclidean distances
          are the square root of the sum of squared
          differences, and manhattan distances are the sum
          of absolute differences.  If `x' is already a
          dissimilarity matrix, then this argument will be
          ignored.

   stand: logical flag: if TRUE, then the measurements in
          `x' are standardized before calculating the
          dissimilarities. Measurements are standardized for
          each variable (column), by subtracting the
          variable's mean value and dividing by the
          variable's mean absolute deviation.  If `x' is
          already a dissimilarity matrix, then this argument
          will be ignored.
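
     As an illustration of the `metric' and `stand' arguments,
     the sketch below reproduces the described standardization
     and both metrics in base R.  The helper `standardize' and
     the toy matrix are assumptions made for this example and
     are not part of the package; `dist' is the base R function
     mentioned above.

```r
## Toy data (an assumption for this sketch): 3 observations, 2 variables.
x <- matrix(c(1, 2, 4,
              10, 20, 60), ncol = 2)

## Hypothetical helper: centre each column on its mean and divide by
## its mean absolute deviation, as described for stand = TRUE.
standardize <- function(x) {
  apply(x, 2, function(v) {
    centered <- v - mean(v)
    centered / mean(abs(centered))
  })
}

z <- standardize(x)

## Euclidean: square root of the sum of squared differences.
d_euc <- dist(z, method = "euclidean")
## Manhattan: sum of absolute differences.
d_man <- dist(z, method = "manhattan")

n <- nrow(x)
length(d_man) == n * (n - 1) / 2   # one entry per pair of observations
```

     An object such as `d_man' holds one dissimilarity per pair
     of observations and is the kind of input that `diana'
     accepts when `diss = TRUE'.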

_D_e_t_a_i_l_s_:

     `diana' is fully described in chapter 6 of Kaufman and
     Rousseeuw (1990).  It computes a divisive hierarchy,
     whereas most other software for hierarchical clustering
     is agglomerative.  Moreover, `diana' provides (a) the
     divisive coefficient (see `diana.object'), which
     measures the amount of clustering structure found; and
     (b) the banner, a novel graphical display (see
     `plot.diana').

     The `diana' algorithm constructs a hierarchy of
     clusterings, starting with one large cluster containing
     all n observations. Clusters are divided until each
     cluster contains only a single observation.  At each
     stage, the cluster with the largest diameter is
     selected.  (The diameter of a cluster is the largest
     dissimilarity between any two of its observations.)  To
     divide the selected cluster, the algorithm first looks
     for its most disparate observation (i.e., the one with
     the largest average dissimilarity to the other
     observations of the selected cluster). This observation
     initiates the "splinter group". In subsequent steps,
     the algorithm reassigns observations that are closer to
     the "splinter group" than to the "old party". The
     result is a division of the selected cluster into two
     new clusters.
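
     A single division step can be sketched in base R as
     follows.  This is an illustration only, not the package's
     implementation: the helper names `cluster_diameter' and
     `split_cluster' are hypothetical, and "closer to the
     splinter group than to the old party" is read here as a
     comparison of average dissimilarities, moving one
     observation at a time.

```r
## Hypothetical helpers, not part of the cluster package.
## d       : full symmetric dissimilarity matrix with zero diagonal
## members : indices of the observations in the selected cluster

## Diameter: largest dissimilarity between any two members.
cluster_diameter <- function(d, members) {
  max(d[members, members, drop = FALSE])
}

split_cluster <- function(d, members) {
  m <- length(members)
  if (m < 2) stop("need at least two observations to split")
  sub <- d[members, members, drop = FALSE]

  ## Most disparate observation: largest average dissimilarity
  ## to the other observations of the cluster.
  avg <- rowSums(sub) / (m - 1)
  splinter <- which.max(avg)
  old <- setdiff(seq_len(m), splinter)

  repeat {
    if (length(old) < 2) break
    ## Average dissimilarity to the rest of the old party minus
    ## average dissimilarity to the splinter group.
    gain <- sapply(old, function(i) {
      mean(sub[i, setdiff(old, i)]) - mean(sub[i, splinter])
    })
    ## Stop when no observation is closer to the splinter group.
    if (max(gain) <= 0) break
    move <- old[which.max(gain)]
    splinter <- c(splinter, move)
    old <- setdiff(old, move)
  }
  list(splinter = members[splinter], old = members[old])
}

## Five points on a line: three close together, two far away.
d <- as.matrix(dist(c(0, 1, 2, 10, 11)))
res <- split_cluster(d, 1:5)
res$splinter   # observations that left the cluster
res$old        # observations that stayed
```

     Applying `split_cluster' repeatedly to whichever cluster
     currently has the largest `cluster_diameter' would give
     the full hierarchy under these assumptions.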

_V_a_l_u_e_:

     an object of class `"diana"' representing the
     clustering.  See `diana.object' for details.
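
     The returned object can be inspected as sketched below.
     The component names are those documented in
     `diana.object'; the conversion with `as.hclust' followed
     by `cutree' assumes a version of the cluster package that
     provides an `as.hclust' method for these objects, which
     may not hold for older releases.

```r
library(cluster)
data(agriculture)

da <- diana(agriculture)

class(da)    # includes "diana"
da$dc        # divisive coefficient
da$order     # ordering of the observations used for plotting
da$height    # diameters of the clusters before each split

## Assumption: as.hclust() has a method for diana objects in the
## installed version of cluster; cutree() then yields a flat
## partition into k groups.
groups <- cutree(as.hclust(da), k = 2)
table(groups)
```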

_B_A_C_K_G_R_O_U_N_D_:

     Cluster analysis divides a dataset into groups
     (clusters) of observations that are similar to each
     other.  Hierarchical methods like `agnes', `diana', and
     `mona' construct a hierarchy of clusterings, with the
     number of clusters ranging from one to the number of
     observations. Partitioning methods like `pam', `clara',
     and `fanny' require that the number of clusters be
     given by the user.

_R_e_f_e_r_e_n_c_e_s_:

     Kaufman, L. and Rousseeuw, P.J. (1990).  Finding Groups
     in Data: An Introduction to Cluster Analysis.  Wiley,
     New York.

     Struyf, A., Hubert, M. and Rousseeuw, P.J. (1997).
     Integrating Robust Clustering Techniques in S-PLUS.
     Computational Statistics and Data Analysis, 26, 17-37.

_S_e_e _A_l_s_o_:

     `agnes', `diana.object', `daisy', `dist', `plot.diana',
     `twins.object'.

_E_x_a_m_p_l_e_s_:

     data(votes.repub)
     dv <- diana(votes.repub, metric = "manhattan", stand = TRUE)
     print(dv)
     plot(dv)

     data(agriculture)
     ## Plot similar to Figure 8 in ref
     plot(diana(agriculture), ask = TRUE)


