

ca(multiv)                                   R Documentation

_C_o_r_r_e_s_p_o_n_d_e_n_c_e _A_n_a_l_y_s_i_s

_D_e_s_c_r_i_p_t_i_o_n_:

     Finds a new coordinate system for multivariate data
     such that the first coordinate has maximal inertia, the
     second coordinate has maximal inertia subject to being
     orthogonal to the first, etc.  Compared to Principal
     Components Analysis, each row and column point has an
     associated mass (related to the row or column totals);
     and the chi-squared distance takes the place of the
     Euclidean distance.  The issue of how to code the input
     data is important: this takes the place of input data
     transformation in PCA.

_U_s_a_g_e_:

     ca(a, nf=7)

_A_r_g_u_m_e_n_t_s_:

       a: data matrix to be decomposed, the rows represent-
          ing observations and the columns variables.

      nf: number of factors or axes to be sought; default 7.

Value:

     A list with the following components:

   rproj: projections of row points on the factors.

   cproj: projections of column points on the factors.

   evals: eigenvalues associated with the new factors. These
          provide figures of merit for the "inertia
          explained" by the factors.  They are usually
          quoted in terms of percentage of the total, or in
          terms of cumulative percentage of the total.

   evecs: definition of the factors in terms of the original
          variables.  The first column is the linear combi-
          nation of columns of `a' defining the first fac-
          tor, etc.

   rcntr: contributions of observations to the factors.  The
          contributions are mass times projection (on the
          factor) squared.  Since contributions take account
          of the mass, they more accurately indicate influ-
          ential observations for the interpretation of the
          factor, compared to the projections alone.

   ccntr: contributions of variables to the factors. See
          above remark concerning row contributions.

_N_O_T_E_:

     Very small negative eigenvalues, if they arise, are an
     artifact of the SVD algorithm used, and are to be
     treated as zero.
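
     As a minimal sketch of this clean-up, together with the
     percentage and cumulative percentage of inertia that
     are usually quoted, the following uses made-up
     eigenvalues (an assumption for illustration, not output
     of `ca'):

```r
# Illustrative eigenvalues only; the last one mimics a tiny
# negative value produced by rounding error.
evals <- c(0.31, 0.12, 0.04, -2e-17)

# Treat very small negative eigenvalues as zero.
evals <- pmax(evals, 0)

# Percentage and cumulative percentage of total inertia.
pct     <- 100 * evals / sum(evals)
cum_pct <- cumsum(pct)
round(cbind(evals, pct, cum_pct), 2)
```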

_M_E_T_H_O_D_:

     A singular value decomposition is carried out.
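
     One common way to set up such a decomposition is
     sketched below in base R.  It is not necessarily how
     `ca' is implemented: the helper name `ca_sketch', its
     default `nf', and the choice of decomposing the
     standardized residuals are assumptions made for
     illustration.  The sketch also assumes a non-negative
     matrix with no empty rows or columns.

```r
# Hypothetical helper for illustration; this is not the package's ca().
ca_sketch <- function(a, nf = 2) {
  P     <- a / sum(a)              # correspondence matrix
  rmass <- rowSums(P)              # row masses
  cmass <- colSums(P)              # column masses
  E     <- outer(rmass, cmass)     # proportions expected under independence
  dec   <- svd((P - E) / sqrt(E))  # SVD of the standardized residuals
  k     <- seq_len(min(nf, length(dec$d)))
  # Projections: singular vectors rescaled by masses and singular values.
  rproj <- sweep(dec$u[, k, drop = FALSE] / sqrt(rmass), 2, dec$d[k], "*")
  cproj <- sweep(dec$v[, k, drop = FALSE] / sqrt(cmass), 2, dec$d[k], "*")
  list(evals = dec$d[k]^2,         # inertia carried by each factor
       rproj = rproj, cproj = cproj,
       rcntr = rmass * rproj^2,    # mass times squared projection
       ccntr = cmass * cproj^2)
}

# Small made-up contingency table.
tab <- matrix(c(20, 10,  5,
                10, 15, 10,
                 5, 10, 25), nrow = 3, byrow = TRUE)
res <- ca_sketch(tab, nf = 2)
res$evals
```

     In this sketch the contributions to a factor, summed
     over rows (or over columns), give back that factor's
     eigenvalue.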

_B_A_C_K_G_R_O_U_N_D_:

     Correspondence analysis defines the axis which provides
     the best fit to both the row points and the column
     points.  A second axis is determined which best fits
     the data subject to being orthogonal to the first.
     Third and subsequent axes are similarly found.  Best
     fit is in the least squares sense, relative to the chi-
     squared distance.  This can be viewed as a weighted
     Euclidean distance between `profiles'.
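
     The weighted Euclidean view of the chi-squared distance
     can be sketched as follows.  The helper name
     `chisq_dist' and the small table are assumptions for
     illustration; each row is first turned into a profile
     (divided by its total), and each column is then
     weighted by the inverse of its mass.

```r
# Hypothetical helper: chi-squared distances between row profiles.
chisq_dist <- function(tab) {
  prof  <- tab / rowSums(tab)          # row profiles (each row sums to 1)
  cmass <- colSums(tab) / sum(tab)     # column masses
  # Dividing column j by sqrt(cmass[j]) turns the ordinary Euclidean
  # distance into the chi-squared distance.
  dist(sweep(prof, 2, sqrt(cmass), "/"))
}

tab <- matrix(c(20, 10,  5,
                10, 15, 10,
                 5, 10, 25), nrow = 3, byrow = TRUE)
d <- chisq_dist(tab)
d
```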

     The question of `coding' of input data is an important
     one.  For instance, in a matrix of scores, one might
     wish to adjoin extra columns to the input matrix such
     that both the initial score, and the maximum score
     minus it, are included in the observation's set of val-
     ues.  Note that this has the effect that all row masses
     are equal.  Hence the variables alone are differen-
     tially weighted.  This is known as `doubling' the
     observations.  In the case of binary data, such coding
     is known as `complete disjunctive form'.
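
     A minimal sketch of doubling, assuming a made-up
     matrix of scores on a scale whose maximum, 5, is known
     in advance (the object names are hypothetical):

```r
# Made-up scores: 4 observations on 3 items, each scored 0 to 5.
scores <- matrix(c(3, 1, 4,
                   0, 5, 2,
                   5, 5, 1,
                   2, 0, 0), nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("obs", 1:4), c("q1", "q2", "q3")))
max_score <- 5                          # assumed upper bound of the scale

pos <- scores                           # initial scores
neg <- max_score - scores               # maximum score minus initial score
colnames(pos) <- paste0(colnames(scores), "+")
colnames(neg) <- paste0(colnames(scores), "-")
doubled <- cbind(pos, neg)

# Every row now has the same total, hence the same mass.
rowSums(doubled)
rowSums(doubled) / sum(doubled)
```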

     Other forms of input data for which correspondence
     analysis can be used include frequencies, or
     contingency-type data.  In this case, the mass-weighted
     sum of the squared chi-squared distances of all (row or
     column) points from the origin equals the familiar
     chi-squared statistic divided by the grand total of the
     table.  Hence the graphical output of correspondence
     analysis allows assessment of departure from a null
     hypothesis of no dependence of rows and columns.
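
     The link with the chi-squared statistic can be checked
     numerically.  The sketch below uses a made-up table and
     hypothetical object names; it weights each row's
     squared chi-squared distance from the origin (the
     average profile) by the row mass, and compares the
     total with the chi-squared statistic divided by the
     grand total.

```r
tab <- matrix(c(20, 10,  5,
                10, 15, 10,
                 5, 10, 25), nrow = 3, byrow = TRUE)
n     <- sum(tab)
rmass <- rowSums(tab) / n               # row masses
cmass <- colSums(tab) / n               # column masses (the average profile)
prof  <- tab / rowSums(tab)             # row profiles

# Squared chi-squared distance of each row profile from the origin.
d2 <- rowSums(sweep(sweep(prof, 2, cmass, "-")^2, 2, cmass, "/"))
inertia <- sum(rmass * d2)              # mass-weighted total

# Pearson chi-squared statistic computed directly.
expected <- outer(rowSums(tab), colSums(tab)) / n
chisq    <- sum((tab - expected)^2 / expected)

all.equal(inertia, chisq / n)
```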

     Supplementary rows or columns are projected into the
     factor space, after carrying out a correspondence anal-
     ysis.  That is to say, such row or column profiles are
     assumed to have zero mass, and their projections are to
     be found under such an assumption.  Functions `supplr'
     and `supplc' may be used for this purpose.
     Supplementary rows or columns are either of a different
     nature from the data analyzed (e.g. sex in the context
     of a questionnaire), or are rows or columns which, one
     suspects, would unduly influence the definition of the
     factors.
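
     The projection of a zero-mass row can be sketched in
     base R as below.  This does not call `supplr', whose
     interface is not described here; the helper
     `project_row', the table, and the new row are all
     assumptions for illustration.  The row's profile is
     projected onto axes already fixed by the analysis of
     the active table, so it cannot alter them.

```r
tab <- matrix(c(20, 10,  5,
                10, 15, 10,
                 5, 10, 25), nrow = 3, byrow = TRUE)
P     <- tab / sum(tab)
rmass <- rowSums(P)
cmass <- colSums(P)
E     <- outer(rmass, cmass)
dec   <- svd((P - E) / sqrt(E))         # analysis of the active table only
k     <- 1:2

# Column coordinates scaled to unit inertia on each axis.
cstd  <- dec$v[, k] / sqrt(cmass)
# Projections of the active rows, for comparison.
rproj <- sweep(dec$u[, k] / sqrt(rmass), 2, dec$d[k], "*")

# Hypothetical helper: project a row's profile onto the existing axes.
project_row <- function(x, cstd) drop((x / sum(x)) %*% cstd)

new_row <- c(8, 12, 30)                 # made-up supplementary row
project_row(new_row, cstd)
```

     Applied to an active row, the same formula reproduces
     that row's own projection, which is a convenient check
     on the sketch.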

_R_e_f_e_r_e_n_c_e_s_:

     Extensive works of J.-P. Benzecri, including
     Correspondence Analysis Handbook, Marcel Dekker, Basel,
     1992.

     M.J. Greenacre, Theory and Applications of
     Correspondence Analysis, Academic Press, New York, 1984.

     L. Lebart, A. Morineau and K.M. Warwick, Multivariate
     Descriptive Statistical Analysis, Wiley, New York, 1984.

     S. Nishisato, Analysis of Categorical Data: Dual
     Scaling and Its Applications, University of Toronto
     Press, Toronto, 1980.

     (An extensive annotated bibliography is to be found in
     Greenacre.)

_S_e_e _A_l_s_o_:

     Supplementary rows and columns: `supplr', `supplc'.
     Initial data coding: `flou', `logique'.  Other related
     functions: `pca', `prcomp', `cancor', `sammon',
     `cmdscale'.  Plotting tool: `plaxes'.

_E_x_a_m_p_l_e_s_:

     ###
     ### WARNING: Examples cannot be executed!!!
     ###
     # correspondence analysis of the breakfast cereal data,
     # in complete disjunctive form:
     bfpos <- t(cereal.attitude)
     bfneg <- max(bfpos) - bfpos
     bfposneg <- cbind(bfpos, bfneg)
     corr <- ca(bfposneg)
     # plot of first and second factors
     plot(corr$rproj[,1], corr$rproj[,2],type="n")
     text(corr$rproj[,1], corr$rproj[,2], labels=dimnames(bfposneg)[[1]])
     # Place additional axes through x=0 and y=0:
     plaxes(corr$rproj[,1], corr$rproj[,2])
     # check of row contributions
     corr$rcntr
     #
     # Fuzzy coding of input variables, `a', `b', `c':
     a.fuzz <- flou(a)
     b.fuzz <- flou(b)
     c.fuzz <- flou(c)
     newdata <- cbind(a.fuzz, b.fuzz, c.fuzz)
     ca.newdata <- ca(newdata)

