

fanny(cluster)                               R Documentation

_F_u_z_z_y _A_n_a_l_y_s_i_s

_D_e_s_c_r_i_p_t_i_o_n_:

     Returns a list representing a fuzzy clustering of the
     data into `k' clusters.

_U_s_a_g_e_:

     fanny(x, k, diss = FALSE, metric = "euclidean", stand = FALSE)

_A_r_g_u_m_e_n_t_s_:

       x: data matrix or data frame, or dissimilarity matrix,
          depending on the value of the `diss' argument.

          If `x' is a matrix or data frame, each row
          corresponds to an observation and each column to a
          variable.  All variables must be numeric.  Missing
          values (NAs) are allowed.

          If `x' is a dissimilarity matrix, it is typically
          the output of `daisy' or `dist'.  A vector of
          length n*(n-1)/2 is also allowed (where n is the
          number of observations) and is interpreted in the
          same way as the output of those functions.  Missing
          values (NAs) are not allowed.
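
          As an illustration of the vector form, the sketch
          below uses only base R; the names `obs', `d' and
          `d_vec' are made up for this example.  It shows
          that `dist' on n observations yields exactly
          n*(n-1)/2 dissimilarities.

```r
# Five observations on two variables (illustrative data).
obs <- matrix(c(0, 0,
                1, 0,
                0, 1,
                5, 5,
                6, 5), ncol = 2, byrow = TRUE)
n <- nrow(obs)

# dist() stores the lower triangle of the dissimilarity matrix,
# column by column, as an object of length n*(n-1)/2.
d <- dist(obs)
length(d) == n * (n - 1) / 2

# The same values as a plain numeric vector.
d_vec <- as.vector(d)
```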

       k: integer, the number of clusters.  It is required
          that 0 < k < n/2 where n is the number of observa-
          tions.

    diss: logical flag: if TRUE, then `x' will be considered
          as a dissimilarity matrix. If FALSE, then `x' will
          be considered as a matrix of observations by vari-
          ables.

  metric: character string specifying the metric to be used
          for calculating dissimilarities between observa-
          tions.  The currently available options are
          "euclidean" and "manhattan".  Euclidean distances
          are root sum-of-squares of differences, and man-
          hattan distances are the sum of absolute differ-
          ences.  If `x' is already a dissimilarity matrix,
          then this argument will be ignored.
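
          The two metrics can be compared on a single pair of
          points.  This is a minimal sketch in base R; the
          vectors `a' and `b' are illustrative, and `dist'
          from the stats package stands in for whatever
          `fanny' does internally.

```r
a <- c(1, 2)
b <- c(4, 6)

# Euclidean: root sum-of-squares of the differences.
euclid <- sqrt(sum((a - b)^2))   # 5

# Manhattan: sum of the absolute differences.
manhat <- sum(abs(a - b))        # 7

# dist() gives the same values for the corresponding methods.
d_e <- as.vector(dist(rbind(a, b), method = "euclidean"))
d_m <- as.vector(dist(rbind(a, b), method = "manhattan"))
```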

   stand: logical flag: if TRUE, then the measurements in
          `x' are standardized before calculating the dis-
          similarities. Measurements are standardized for
          each variable (column), by subtracting the vari-
          able's mean value and dividing by the variable's
          mean absolute deviation.  If `x' is already a dis-
          similarity matrix, then this argument will be
          ignored.
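
          The standardization described above can be sketched
          in base R as follows.  The helper `standardize_mad'
          is hypothetical (it is not part of the package) and
          assumes the mean absolute deviation is taken about
          the variable's mean.

```r
# Hypothetical helper: centre each column on its mean, then divide
# by the column's mean absolute deviation about that mean.
standardize_mad <- function(x) {
  x <- as.matrix(x)
  centre  <- colMeans(x, na.rm = TRUE)
  centred <- sweep(x, 2, centre, "-")
  mad_mean <- colMeans(abs(centred), na.rm = TRUE)
  sweep(centred, 2, mad_mean, "/")
}

# Illustrative data on two very different scales.
m <- cbind(height = c(150, 160, 170, 180),
           weight = c(50, 60, 80, 90))
z <- standardize_mad(m)

# After standardization every column has mean 0 and
# mean absolute deviation 1.
colMeans(z)
colMeans(abs(z))
```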

_D_e_t_a_i_l_s_:

     In a fuzzy clustering, each observation is "spread out"
     over the various clusters.  Denote by u(i,v) the
     membership of observation i in cluster v.  The
     memberships are nonnegative, and for a fixed observation
     i they sum to 1.  The particular method `fanny' stems
     from chapter 4 of Kaufman and Rousseeuw (1990).
     Compared to other fuzzy clustering methods, `fanny' has
     the following features: (a) it also accepts a
     dissimilarity matrix; (b) it is more robust to
     departures from the `spherical cluster' assumption;
     (c) it provides a novel graphical display, the
     silhouette plot (see `plot.partition').

     `fanny' aims to minimize the objective function

          SUM_v (SUM_(i,j) u(i,v)^2 u(j,v)^2 d(i,j)) / (2 SUM_j u(j,v)^2)

     where the sums over i and j run over the n observations,
     v runs over the k clusters, and d(i,j) is the
     dissimilarity between observations i and j.
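
     To make the objective concrete, the sketch below
     evaluates it for a given membership matrix.  The function
     `fanny_objective' is hypothetical and is not how the
     package computes or minimizes the criterion; it assumes
     that SUM_(i,j) runs over all ordered pairs (i,j), and
     that `u' is an n by k matrix whose rows sum to 1.

```r
# Hypothetical helper: value of the objective for memberships u
# (n x k matrix) and dissimilarities d (a "dist" object or n x n matrix).
fanny_objective <- function(u, d) {
  d <- as.matrix(d)
  u2 <- u^2
  total <- 0
  for (v in seq_len(ncol(u))) {
    w <- u2[, v]
    # Assumption: the inner sum covers all ordered pairs (i, j).
    numerator <- sum(outer(w, w) * d)
    total <- total + numerator / (2 * sum(w))
  }
  total
}

# Four points on a line forming two obvious groups (illustrative data).
d <- dist(c(0, 1, 10, 11))

# Crisp memberships that match the two groups.
u_crisp <- rbind(c(1, 0), c(1, 0), c(0, 1), c(0, 1))

# Completely fuzzy memberships: every point split evenly.
u_even <- matrix(0.5, nrow = 4, ncol = 2)

obj_crisp <- fanny_objective(u_crisp, d)   # 1
obj_even  <- fanny_objective(u_even, d)    # 5.25
```

     On this toy data the memberships that respect the two
     groups give the smaller value, which is the direction in
     which the criterion is minimized.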

_V_a_l_u_e_:

     An object of class `"fanny"' representing the
     clustering.  See `fanny.object' for details.

_B_A_C_K_G_R_O_U_N_D_:

     Cluster analysis divides a dataset into groups (clus-
     ters) of observations that are similar to each other.
     Partitioning methods like `pam', `clara', and `fanny'
     require that the number of clusters be given by the
     user.  Hierarchical methods like `agnes', `diana', and
     `mona' construct a hierarchy of clusterings, with the
     number of clusters ranging from one to the number of
     observations.

_R_e_f_e_r_e_n_c_e_s_:

     Kaufman, L. and Rousseeuw, P.J. (1990).  Finding Groups
     in Data: An Introduction to Cluster Analysis.  Wiley,
     New York.

     Struyf, A., Hubert, M. and Rousseeuw, P.J. (1996).
     Clustering in an Object-Oriented Environment.  Journal
     of Statistical Software, 1.  <URL:
     http://www.stat.ucla.edu/journals/jss/>

     Struyf, A., Hubert, M. and Rousseeuw, P.J. (1997).
     Integrating Robust Clustering Techniques in S-PLUS,
     Computational Statistics and Data Analysis, 26, 17-37.

_S_e_e _A_l_s_o_:

     `fanny.object', `daisy', `partition.object', `plot.par-
     tition', `dist'.

_E_x_a_m_p_l_e_s_:

     ## generate 25 objects, divided into two clusters, and 3 objects lying
     ## between those clusters.
     x <- rbind(cbind(rnorm(10,0,0.5), rnorm(10,0,0.5)),
                cbind(rnorm(15,5,0.5), rnorm(15,5,0.5)),
                cbind(rnorm(3,3.5,0.5), rnorm(3,3.5,0.5)))
     fannyx <- fanny(x, 2)
     fannyx
     summary(fannyx)
     plot(fannyx)

     data(ruspini)
     ## Plot similar to Figure 6 in Struyf et al. (1996)
     plot(fanny(ruspini, 5))

