

mona(cluster)                                R Documentation

_M_o_n_o_t_h_e_t_i_c _A_n_a_l_y_s_i_s

_D_e_s_c_r_i_p_t_i_o_n_:

     Returns a list representing a divisive hierarchical
     clustering of a dataset with binary variables only.

_U_s_a_g_e_:

     mona(x)

_A_r_g_u_m_e_n_t_s_:

       x: data matrix or data frame in which each row corresponds
          to an observation, and each column corresponds to a
          variable. All variables must be binary. A limited
          number of missing values (NAs) is allowed. Every
          observation must have at least one value different
          from NA. No variable should have half of its values
          missing. There must be at least one variable which has
          no missing values. A variable whose non-missing values
          are all identical is not allowed.
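
     The requirements on `x' can be checked before calling `mona'.
     The following is only a sketch: `check_mona_input' is a
     hypothetical helper, not part of the package; it assumes the
     data are coded as 0/1 and reads "half of its values missing"
     as "half or more".

```r
# Hypothetical pre-check of the conditions listed for `x`.
# Assumptions: 0/1 coding; "half missing" means a proportion >= 0.5.
check_mona_input <- function(x) {
  x <- as.matrix(x)
  stopifnot(
    all(x %in% c(0, 1, NA)),               # binary variables only
    all(rowSums(!is.na(x)) >= 1),          # no observation is all NA
    all(colMeans(is.na(x)) < 0.5),         # no variable half missing
    any(colSums(is.na(x)) == 0),           # one complete variable exists
    all(apply(x, 2, function(v)            # no constant variable
      length(unique(v[!is.na(v)])) > 1))
  )
  invisible(TRUE)
}

good <- cbind(c(1, 0, NA, 1), c(0, 0, 1, 1))
bad  <- cbind(c(1, 1, 1, 1),  c(0, 0, 1, 1))   # first column is constant

check_mona_input(good)                          # passes silently
tryCatch(check_mona_input(bad),
         error = function(e) "rejected")        # constant column fails
```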

_D_e_t_a_i_l_s_:

     `mona' is fully described in chapter 7 of Kaufman and
     Rousseeuw (1990).  It is "monothetic" in the sense that
     each division is based on a single (well-chosen) vari-
     able, whereas most other hierarchical methods (includ-
     ing `agnes' and `diana') are "polythetic", i.e. they
     use all variables together.

     The `mona'-algorithm constructs a hierarchy of cluster-
     ings, starting with one large cluster. Clusters are
     divided until all observations in the same cluster have
     identical values for all variables.  At each stage, all
     clusters are divided according to the values of one
     variable. A cluster is divided into one cluster with
     all observations having value 1 for that variable, and
     another cluster with all observations having value 0
     for that variable.
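
     A single division step, as described above, might be
     sketched as follows. The helper names `split_cluster' and
     `is_homogeneous' are hypothetical and the data are assumed
     to be a 0/1 matrix without missing values; this is not the
     package's implementation.

```r
# Hypothetical helper: divide one cluster (row indices of the 0/1
# matrix x) on variable j into a value-1 group and a value-0 group.
split_cluster <- function(x, rows, j) {
  list(ones  = rows[x[rows, j] == 1],
       zeros = rows[x[rows, j] == 0])
}

# A cluster needs no further division once all of its rows are identical.
is_homogeneous <- function(x, rows) {
  nrow(unique(x[rows, , drop = FALSE])) == 1
}

x <- rbind(c(1, 0), c(1, 1), c(0, 1), c(0, 1))
parts <- split_cluster(x, rows = 1:4, j = 1)
parts$ones                       # rows 1 and 2
parts$zeros                      # rows 3 and 4
is_homogeneous(x, parts$zeros)   # TRUE: rows 3 and 4 are both (0, 1)
is_homogeneous(x, parts$ones)    # FALSE: rows 1 and 2 differ on variable 2
```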

     The variable used for splitting a cluster is the vari-
     able with the maximal total association to the other
     variables, according to the observations in the cluster
     to be split. The association between variables f and
     g is given by a(f,g)*d(f,g) - b(f,g)*c(f,g), where
     a(f,g), b(f,g), c(f,g), and d(f,g) are the numbers in
     the contingency table of f and g.  [That is, a(f,g)
     (resp. d(f,g)) is the number of observations for which
     f and g both have value 0 (resp. value 1); b(f,g)
     (resp. c(f,g)) is the number of observations for which
     f has value 0 (resp. 1) and g has value 1 (resp. 0).]
     The total association of a variable f is the sum of its
     associations to all variables.
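
     The association measure and the choice of splitting variable
     can be illustrated with a short sketch. `assoc' and
     `total_assoc' are hypothetical names. Two points are
     assumptions, since the text does not spell them out: the
     total is taken over the other variables only, and absolute
     values are summed so that negative associations count as
     much as positive ones.

```r
# Association a*d - b*c between two complete 0/1 vectors f and g.
assoc <- function(f, g) {
  a  <- sum(f == 0 & g == 0)   # both 0
  d  <- sum(f == 1 & g == 1)   # both 1
  b  <- sum(f == 0 & g == 1)   # f = 0, g = 1
  cc <- sum(f == 1 & g == 0)   # f = 1, g = 0
  as.numeric(a) * d - as.numeric(b) * cc
}

# Total association of every column of a 0/1 matrix.
# Assumption: sum of absolute associations to the *other* columns.
total_assoc <- function(x) {
  p <- ncol(x)
  sapply(seq_len(p), function(j) {
    others <- setdiff(seq_len(p), j)
    sum(vapply(others, function(k) abs(assoc(x[, j], x[, k])), numeric(1)))
  })
}

x <- cbind(c(1, 1, 0, 0),
           c(1, 1, 0, 0),
           c(1, 0, 1, 0))
assoc(x[, 1], x[, 2])        # 4: the two variables agree everywhere
assoc(x[, 1], x[, 3])        # 0: no association
total_assoc(x)               # 4 4 0
which.max(total_assoc(x))    # 1: first variable with maximal total
```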

     This algorithm does not work with missing values, so
     the data are revised first, i.e. all missing values are
     filled in. To do this, the same measure of association
     between variables is used as in the algorithm.
     When variable f has missing values, the variable g with
     the largest absolute association to f is looked up.
     When the association between f and g is positive, any
     missing value of f is replaced by the value of g for
     the same observation. If the association between f and
     g is negative, then any missing value of f is replaced
     by the value of 1-g for the same observation.
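
     The fill-in rule can be sketched as below. `assoc' and
     `fill_missing' are hypothetical names, and the sketch makes
     simplifying assumptions that the text does not state: the
     association is computed on the observations where f is
     observed, only variables without missing values are
     considered as candidates for g, and a zero association is
     treated like a positive one.

```r
# Association a*d - b*c, using only rows where both f and g are observed.
assoc <- function(f, g) {
  ok <- !is.na(f) & !is.na(g)
  f <- f[ok]; g <- g[ok]
  a  <- sum(f == 0 & g == 0)
  d  <- sum(f == 1 & g == 1)
  b  <- sum(f == 0 & g == 1)
  cc <- sum(f == 1 & g == 0)
  as.numeric(a) * d - as.numeric(b) * cc
}

# Fill each incomplete column from the complete column with the largest
# absolute association (assumption: candidates are complete columns only).
fill_missing <- function(x) {
  complete <- which(colSums(is.na(x)) == 0)
  out <- x
  for (j in which(colSums(is.na(x)) > 0)) {
    s    <- vapply(complete, function(k) assoc(x[, j], x[, k]), numeric(1))
    best <- which.max(abs(s))
    g    <- x[, complete[best]]
    miss <- is.na(x[, j])
    # Positive (or zero, by assumption) association: copy g; negative: 1 - g.
    out[miss, j] <- if (s[best] >= 0) g[miss] else 1 - g[miss]
  }
  out
}

x <- cbind(f = c(1, 1, 0, 0, NA),
           g = c(0, 0, 1, 1, 1),
           h = c(1, 0, 1, 0, 0))
filled <- fill_missing(x)
filled[5, "f"]   # 0, i.e. 1 - g[5], because f and g are negatively associated
```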

_V_a_l_u_e_:

     an object of class `"mona"' representing the cluster-
     ing.  See `mona.object' for details.

_B_A_C_K_G_R_O_U_N_D_:

     Cluster analysis divides a dataset into groups (clus-
     ters) of observations that are similar to each other.
     Hierarchical methods like `agnes', `diana', and `mona'
     construct a hierarchy of clusterings, with the number
     of clusters ranging from one to the number of observa-
     tions. Partitioning methods like `pam', `clara', and
     `fanny' require that the number of clusters be given by
     the user.

_R_e_f_e_r_e_n_c_e_s_:

     Kaufman, L. and Rousseeuw, P.J. (1990).  Finding Groups
     in Data: An Introduction to Cluster Analysis.  Wiley,
     New York.

     Struyf, A., Hubert, M. and Rousseeuw, P.J. (1996).
     Clustering in an Object-Oriented Environment.  Journal
     of Statistical Software, 1.  <URL:
     http://www.stat.ucla.edu/journals/jss/>

     Struyf, A., Hubert, M. and Rousseeuw, P.J. (1997).
     Integrating Robust Clustering Techniques in S-PLUS,
     Computational Statistics and Data Analysis, 26, 17-37.

_S_e_e _A_l_s_o_:

     `mona.object', `plot.mona'.

_E_x_a_m_p_l_e_s_:

     library(cluster)   # provides mona() and the animals data
     data(animals)
     ma <- mona(animals)
     ma
     ## Plot similar to Figure 10 in Struyf et al. (1996)
     plot(ma)

