

hclust {mva}                                 R Documentation

_H_i_e_r_a_r_c_h_i_c_a_l _C_l_u_s_t_e_r_i_n_g

_D_e_s_c_r_i_p_t_i_o_n_:

     Performs a hierarchical cluster analysis on a set of
     dissimilarities.

_U_s_a_g_e_:

     hclust(d, method = "complete")

     plot.hclust(hclust.obj, labels, hang = 0.1, ...)

_A_r_g_u_m_e_n_t_s_:

       d: a dissimilarity structure as produced by `dist'.

  method: the agglomeration method to be used. This should
          be (an unambiguous abbreviation of) one of
          `"ward"', `"single"', `"complete"', `"average"',
          `"mcquitty"', `"median"' or `"centroid"'.

hclust.obj: an object of the type produced by `hclust'.

    hang: The fraction of the plot height which labels
          should hang below the rest of the plot.  A nega-
          tive value will cause the labels to hang down from
          0.

  labels: A character vector of labels for the leaves of
          the tree. By default the row names or row numbers
          of the original data are used. If `labels=FALSE'
          no labels at all are plotted.

_D_e_t_a_i_l_s_:

     This function performs a hierarchical cluster analysis
     using a set of dissimilarities for the n objects being
     clustered.  Initially, each object is assigned to its
     own cluster and then the algorithm proceeds itera-
     tively, at each stage joining the two most similar
     clusters, continuing until there is just a single clus-
     ter.  At each stage distances between clusters are
     recomputed by the Lance-Williams dissimilarity update
     formula according to the particular clustering method
     being used.
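     A minimal sketch of this agglomerative process, using five
     points on a line so the merge heights can be checked by hand
     (the points and method here are illustrative, not from the
     Examples section):

```r
library(mva)

# Five points on a line; pairwise distances are just the gaps.
x <- c(1, 2, 6, 7, 20)
hc <- hclust(dist(x), method = "single")

# Each row of hc$merge records one agglomeration step, and
# hc$height the distance at which that merge took place
# (here the heights are 1, 1, 4 and 13).
cbind(hc$merge, height = hc$height)
```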

     A number of different clustering methods are provided.
     Ward's minimum variance method aims at finding compact,
     spherical clusters.  The complete linkage method finds
     similar clusters. The single linkage method (which is
     closely related to the minimal spanning tree) adopts a
     `friends of friends' clustering strategy.  The other
     methods can be regarded as aiming for clusters with
     characteristics somewhere between the single and com-
     plete link methods.
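     One way to see the practical difference between methods is to
     compare the merge heights they produce from the same
     dissimilarities; a sketch using the `USArrests' data:

```r
library(mva)
data(USArrests)
d <- dist(USArrests)

# Single linkage chains clusters together through near neighbours,
# so its merge heights stay small; complete linkage requires every
# pair of points across two clusters to be close, giving larger
# merge heights on the same data.
hc.single   <- hclust(d, method = "single")
hc.complete <- hclust(d, method = "complete")
range(hc.single$height)
range(hc.complete$height)
```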

     In hierarchical cluster displays, a decision is needed
     at each merge to specify which subtree should go on the
     left and which on the right.  Since for n observations
     there are n-1 merges, there are 2^{(n-1)} possible
     orderings for the leaves in a cluster tree, or dendro-
     gram.  The algorithm used in `hclust' is to order the
     subtree so that the tighter cluster is on the left (the
     last, i.e. most recent, merge of the left subtree is at
     a lower value than the last merge of the right sub-
     tree).  Single observations are the tightest clusters
     possible, and merges involving two observations place
     them in order by their observation sequence number.
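     The resulting left-to-right arrangement can be inspected
     through the `order' component of the fitted object; a small
     sketch with made-up points on a line:

```r
library(mva)

x <- c(1, 2, 6, 7, 20)
hc <- hclust(dist(x), method = "single")

# `order' is the permutation of the observations used by plot();
# drawing the dendrogram leaves in this order produces no
# crossing branches.
hc$order
plot(hc, labels = as.character(x))
```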

_V_a_l_u_e_:

     An object of class `hclust' which describes the tree
     produced by the clustering process.  The object is a
     list with components:

   merge: an n-1 by 2 matrix.  Row i of `merge' describes
          the merging of clusters at step i of the cluster-
          ing.  If an element j in the row is negative, then
          observation -j was merged at this stage.  If j is
          positive then the merge was with the cluster
          formed at the (earlier) stage j of the algorithm.
          Thus negative entries in `merge' indicate agglom-
          erations of singletons, and positive entries indi-
          cate agglomerations of non-singletons.

  height: a set of n-1 non-decreasing real values.  The
          clustering height: that is, the value of the cri-
          terion associated with the clustering `method' for
          the particular agglomeration.

   order: a vector giving the permutation of the original
          observations suitable for plotting, in the sense
          that a cluster plot using this ordering and matrix
          `merge' will not have crossings of the branches.

  labels: labels for each of the objects being clustered.
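     These components can be examined directly on a fitted object;
     for instance, with the `USArrests' data and average linkage:

```r
library(mva)
data(USArrests)
hc <- hclust(dist(USArrests), method = "average")

# merge: negative entries denote singleton observations,
# positive entries refer to clusters formed at earlier steps.
hc$merge[1:3, ]

# height: the n-1 merge criterion values, non-decreasing
# for this method.
hc$height[1:3]

# labels: here taken from the row names of USArrests.
hc$labels[1:5]
```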

_A_u_t_h_o_r_(_s_)_:

     The `hclust' function is based on Fortran code con-
     tributed to STATLIB by F. Murtagh.

_R_e_f_e_r_e_n_c_e_s_:

     Everitt, B. (1974).  Cluster Analysis.  London: Heine-
     mann Educ. Books.

     Hartigan, J. A. (1975).  Clustering  Algorithms.  New
     York: Wiley.

     Sneath, P. H. A. and R. R. Sokal (1973).  Numerical
     Taxonomy.  San Francisco: Freeman.

     Anderberg, M. R. (1973).  Cluster Analysis for Applica-
     tions.  New York: Academic Press.

     Gordon, A. D. (1981).  Classification.  London: Chapman
     and Hall.

     Murtagh, F. (1985).  ``Multidimensional Clustering
     Algorithms'', in COMPSTAT Lectures 4.  Wuerzburg: Phys-
     ica-Verlag (for algorithmic details of algorithms
     used).

_S_e_e _A_l_s_o_:

     `kmeans'.

_E_x_a_m_p_l_e_s_:

     library(mva)
     data(USArrests)
     hc <- hclust(dist(USArrests), "ave")
     plot(hc, hang=-1)
     plot(hc)
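     A fitted tree is commonly cut into a fixed number of groups
     with `cutree' (also in the `mva' package); a short sketch
     continuing the example above:

```r
library(mva)
data(USArrests)
hc <- hclust(dist(USArrests), "ave")

# Cut the dendrogram to obtain, say, four cluster memberships,
# one per state.
memb <- cutree(hc, k = 4)
table(memb)
```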

