

pca(multiv)                                  R Documentation

_P_r_i_n_c_i_p_a_l _C_o_m_p_o_n_e_n_t_s _A_n_a_l_y_s_i_s

_D_e_s_c_r_i_p_t_i_o_n_:

     Finds a new coordinate system for multivariate data
     such that the first coordinate has maximal variance,
     the second coordinate has maximal variance subject to
     being orthogonal to the first, etc.
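     In base R terms, this amounts to a singular value decomposition
     of the (centered, and possibly scaled) data matrix.  A minimal
     sketch of the idea, using a made-up matrix `X' rather than the
     `pca' implementation itself:

```r
# Hypothetical data: 20 observations on 5 variables.
X <- matrix(rnorm(100), nrow = 20, ncol = 5)
Z <- scale(X)                    # center and scale the columns
s <- svd(Z)                      # Z = U D V'
rproj <- Z %*% s$v               # projections of rows on the new axes
evals <- s$d^2 / (nrow(X) - 1)   # variance along each new axis
# The variances are non-increasing, as the definition requires.
all(diff(evals) <= 0)
```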

_U_s_a_g_e_:

     pca(a, method=3)

_A_r_g_u_m_e_n_t_s_:

       a: data matrix to be decomposed, the rows representing
          observations and the columns variables.  Missing
          values are not supported.

  method: integer between 1 and 8, selecting the transformation
          applied to the data matrix before the singular value
          decomposition (SVD):
          1: no transformation; the SVD is carried out on the
             sums of squares and cross-products matrix.
          2: observations centered to zero mean; the SVD is
             carried out on a variance-covariance matrix.
          3: (default) observations centered to zero mean and
             additionally reduced to unit standard deviation,
             i.e. standardized; the SVD is carried out on a
             correlation matrix.
          4: observations normalized by being range-divided;
             the variance-covariance matrix is then used.
          5: the SVD is carried out on a Kendall (rank-order)
             correlation matrix.
          6: the SVD is carried out on a Spearman (rank-order)
             correlation matrix.
          7: the SVD is carried out on the sample covariance
             matrix.
          8: the SVD is carried out on the sample correlation
             matrix.
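     As an illustration, the preprocessing implied by `method' = 3
     can be reproduced with base R's `scale', after which the SVD is
     equivalent to an eigendecomposition of the correlation matrix
     (a sketch using a made-up matrix `X', not the `pca' code):

```r
X <- matrix(rnorm(60), nrow = 12, ncol = 5)   # hypothetical data
Z <- scale(X, center = TRUE, scale = TRUE)    # `method' = 3 preprocessing
s <- svd(Z)
evals <- s$d^2 / (nrow(X) - 1)
# These agree with the eigenvalues of the correlation matrix of X:
max(abs(evals - eigen(cor(X))$values))        # effectively zero
```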

_V_a_l_u_e_:

     list describing the principal components analysis:

   rproj: projections of row points on the new axes.

   cproj: projections of column points on the new axes.

   evals: eigenvalues associated with the new axes. These
          provide figures of merit for the `variance
          explained' by the new axes.  They are usually
          quoted in terms of percentage of the total, or in
          terms of cumulative percentage of the total.

   evecs: eigenvectors associated with the new axes. This
          orthogonal matrix describes the rotation.  The
          first column is the linear combination of columns
          of `a' defining the first principal component,
          etc.

_S_i_d_e _E_f_f_e_c_t_s_:

     When carrying out a PCA of a hierarchy object, the par-
     tition is specified bt `lev'.  The level plus the asso-
     ciated number of groups equals the number of observa-
     tions, at all times.

_N_O_T_E_:

     In the case of `method' = 3, if any column point has
     zero standard deviation, then a value of 1 is substi-
     tuted for the standard deviation.

     Up to 7 principal axes are determined.  The inherent
     dimensionality of either of the dual spaces is
     ordinarily `min(n,m)', where `n' and `m' are
     respectively the numbers of rows and columns of `a'.
     The centering transformation which is part of `methods'
     2 and 3 introduces a linear dependency, causing the
     inherent dimensionality to be `min(n-1,m)'.  Hence the
     number of columns returned in `rproj', `cproj', and
     `evecs' will be the lesser of this inherent
     dimensionality and 7.
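     The rank reduction caused by centering can be checked directly
     in base R (a sketch; `Y' is a made-up matrix with more columns
     than rows):

```r
Y <- matrix(rnorm(15), nrow = 3, ncol = 5)    # n = 3 rows, m = 5 columns
W <- scale(Y, center = TRUE, scale = FALSE)   # column means removed
# Centering makes every column of W sum to zero, so the rows are
# linearly dependent: the rank drops from min(n,m) to min(n-1,m).
qr(W)$rank                                    # min(n-1, m) = 2
```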

     In the case of `methods' 1 to 4, very small negative
     eigenvalues, if they arise, are an artifact of the SVD
     algorithm used, and may be treated as zero.  In the
     case of PCA using rank-order correlations (`methods' 5
     and 6), negative eigenvalues indicate that a Euclidean
     representation of the data is not possible.  The
     approximate Euclidean representation given by the axes
     associated with the positive eigenvalues can often be
     quite adequate for practical interpretation of the
     data.

     Routine `prcomp' is identical, to within small
     numerical precision differences, to `method' = 7 here.
     The examples below show how to relate the outputs of
     the present implementation (`pca') to the outputs of
     `prcomp'.

     Note that a very large number of columns in the input
     data matrix will cause dynamic memory problems: the
     matrix to be diagonalized requires O(m^2) storage,
     where `m' is the number of variables.

_M_E_T_H_O_D_:

     A singular value decomposition is carried out.

_B_A_C_K_G_R_O_U_N_D_:

     Principal components analysis defines the axis which
     provides the best fit to both the row points and the
     column points.  A second axis is determined which best
     fits the data subject to being orthogonal to the first.
     Third and subsequent axes are similarly found.  Best
     fit is in the least squares sense.  The criterion which
     optimizes the fit of the axes to the points is, by
     virtue of Pythagoras' theorem, simultaneously a crite-
     rion which optimizes the variance of projections on the
     axes.
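     Numerically, the two criteria coincide because an orthogonal
     rotation preserves the total sum of squares: what is not
     captured as variance along the retained axes is exactly the
     least-squares residual.  A sketch with a made-up matrix:

```r
X <- scale(matrix(rnorm(80), nrow = 16, ncol = 5), scale = FALSE)
s <- svd(X)
proj <- X %*% s$v            # projections on the principal axes
# The total sum of squares is unchanged by the rotation, so
# maximizing the variance of the projections is the same as
# minimizing the least-squares residual of the fit.
c(sum(X^2), sum(proj^2))     # equal up to rounding
```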

     Principal components analysis is often used as a data
     reduction technique.  In the pattern recognition field,
     it is often termed the Karhunen-Loeve expansion since
     the data matrix `a' may be written as a series expan-
     sion using the eigenvectors and eigenvalues found.

_R_e_f_e_r_e_n_c_e_s_:

     Many multivariate statistics and data analysis books
     include a discussion of principal components analysis.
     Below are a few examples:

     C. Chatfield and A.J. Collins, `Introduction to Multi-
     variate Analysis', Chapman and Hall, 1980 (a good, all-
     round introduction);

     M. Kendall, `Multivariate Analysis', Griffin, 1980
     (dated in relation to computing techniques, but excep-
     tionally clear and concise in the treatment of many
     practical aspects);

     F.H.C. Marriott, `The Interpretation of Multiple Obser-
     vations', Academic, 1974 (a short, very readable text-
     book);

     L. Lebart, A. Morineau, and K.M. Warwick, `Multivariate
     Descriptive Statistical Analysis', Wiley, 1984 (an
     excellent geometric treatment of PCA);

     I.T. Jolliffe, `Principal Component Analysis', Springer,
     1986.

_S_e_e _A_l_s_o_:

     `svd', `prcomp', `cancor'.

_E_x_a_m_p_l_e_s_:

     # principal components of the prim4 data
     pcprim <- pca(prim4)
     # plot of first and second principal components
     plot(pcprim$rproj[,1], pcprim$rproj[,2])
     # To label the points, use `plot' with parameter `type="n"', followed by
     # `text'.
     # Place additional axes through x=0 and y=0:
     plaxes(pcprim$rproj[,1], pcprim$rproj[,2])
     # variance explained by the principal components
     pcprim$evals*100.0/sum(pcprim$evals)
     # In the implementation of the S function `prcomp', different results are
     # produced.  Here is how to obtain these results, using the function `pca'.
     # Consider the following result of `prcomp':
     old <- prcomp(prim4)
     # With `pca', one would do the following:
     new <- pca(prim4, method=7)
     # The data structures of `prcomp' relate to those of `pca' as
     # follows (to within numerical precision):
     n <- nrow(prim4)
     # old$sdev      corresponds to  sqrt(new$evals/(n-1))
     # old$rotation  corresponds to  new$evecs
     center <- apply(old$x, 2, mean)
     # new$rproj[1,] corresponds to  old$x[1,] - center
     # One remark: the rotation matrix satisfies
     # old$x == prim4 %*% old$rotation
     # up to numerical precision.  However, only up to 7 principal
     # components are determined here.

