

lda(MASS)                                    R Documentation

_L_i_n_e_a_r _D_i_s_c_r_i_m_i_n_a_n_t _A_n_a_l_y_s_i_s

_D_e_s_c_r_i_p_t_i_o_n_:

     Linear discriminant analysis.

_U_s_a_g_e_:

     lda(formula, data, prior = proportions, tol = 1.0e-4,
                        subset, na.action = na.fail,
                        method, CV = FALSE, nu)
     lda(x,   grouping, prior = proportions, tol = 1.0e-4,
                        subset, na.action = na.fail,
                        method, CV = FALSE, nu)

_A_r_g_u_m_e_n_t_s_:

 formula: A formula of the form `groups ~ x1 + x2 + ...'.
          That is, the response is the grouping factor and
          the right hand side specifies the (non-factor)
          discriminators.

    data: Data frame from which variables specified in
          `formula' are preferentially to be taken.

       x: (required if no formula is given as the principal
          argument.)  A matrix, data frame, or Matrix
          containing the explanatory variables.

grouping: (required if no formula principal argument is
          given.)  a factor specifying the class for each
          observation.

   prior: the prior probabilities of class membership.  If
          unspecified, the class proportions for the
          training set are used.  If present, the
          probabilities should be specified in the order of
          the factor levels.

     tol: A tolerance to decide if a matrix is singular; it
          will reject variables and linear combinations of
          unit-variance variables whose variance is less
          than `tol^2'.

  subset: An index vector specifying the cases to be used in
          the training sample.  (NOTE: If given, this
          argument must be named.)

na.action: A function to specify the action to be taken if
          `NA's are found.  The default action is for the
          procedure to fail.  An alternative is na.omit,
          which leads to rejection of cases with missing
          values on any required variable.  (NOTE: If given,
          this argument must be named.)

  method: `"moment"' for standard estimators of the mean and
          variance, `"mle"' for MLEs, `"mve"' to use
          `cov.mve', or `"t"' for robust estimates based on
          a t distribution.

      CV: If true, returns results (classes and posterior
          probabilities) for leave-one-out cross-validation.
          Note that if the prior is estimated, the
          proportions in the whole dataset are used.

      nu: degrees of freedom for `method = "t"'.
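
     As a rough illustration of the two calling conventions
     described above, the following sketch fits the same model
     through the formula interface and through the matrix
     interface.  It uses the built-in `iris' data frame rather
     than anything from this page, and the name `odd_rows' is a
     hypothetical helper chosen for the example.

```r
library(MASS)  # lda() is provided by the MASS package

# Formula interface: grouping factor on the left-hand side,
# discriminators on the right-hand side.
fit_formula <- lda(Species ~ ., data = iris)

# Matrix interface: explanatory variables first, grouping factor second.
fit_matrix <- lda(as.matrix(iris[, 1:4]), grouping = iris$Species)

# Both calls describe the same model, so the group means should agree.
all.equal(unname(fit_formula$means), unname(fit_matrix$means))

# Optional arguments.  subset and na.action are written out by name,
# and prior follows the order of levels(iris$Species).
odd_rows <- seq(1, nrow(iris), by = 2)   # hypothetical training subset
fit_subset <- lda(Species ~ ., data = iris,
                  prior = c(1, 1, 1) / 3,
                  subset = odd_rows,
                  na.action = na.omit)
fit_subset$N   # number of observations actually used
```

     Here `fit_subset$N' counts only the rows selected by
     `subset'.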

_D_e_t_a_i_l_s_:

     The function tries hard to detect if the within-class
     covariance matrix is singular. If any variable has
     within-group variance less than `tol^2' it will stop
     and report the variable as constant.  This could result
     from poor scaling of the problem, but is more likely to
     result from constant variables.
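
     The check for constant variables can be seen in a small
     sketch.  The data frame `bad' and its column `const' are
     made up for this illustration: the column takes a single
     value within each species, so its within-group variance
     is zero and the fit is expected to stop with an error.

```r
library(MASS)

# Hypothetical data: a column that is constant within each group.
bad <- iris
bad$const <- as.numeric(bad$Species)

# Capture the error message instead of letting the script stop.
result <- tryCatch(
  lda(Species ~ ., data = bad),
  error = function(e) conditionMessage(e)
)
print(result)
```

     If the error is instead caused by poor scaling, rescaling
     the offending variable is likely a better remedy than
     relaxing `tol'.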

     Specifying the `prior' will affect the classification
     unless over-ridden in `predict.lda'. Unlike in most
     statistical packages, it will also affect the rotation
     of the linear discriminants within their space, as a
     weighted between-groups covariance matrix is used. Thus
     the first few linear discriminants emphasize the
     differences between groups with the weights given by
     the prior, which may differ from their prevalence in
     the dataset.
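
     A minimal sketch of this effect, assuming the built-in
     `iris' data: the same data are fitted with two different
     priors, and the discriminant coefficients are compared.
     The prior values are arbitrary choices for illustration.

```r
library(MASS)

fit_equal  <- lda(Species ~ ., data = iris, prior = c(1, 1, 1) / 3)
fit_skewed <- lda(Species ~ ., data = iris, prior = c(0.8, 0.1, 0.1))

# The group means do not depend on the prior ...
all.equal(fit_equal$means, fit_skewed$means)

# ... but the coefficients of the linear discriminants do.
fit_equal$scaling
fit_skewed$scaling

# The prior used for classification can be replaced at prediction time.
pred <- predict(fit_skewed, iris, prior = c(1, 1, 1) / 3)
head(pred$posterior)
```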

_V_a_l_u_e_:

     an object of class `lda' containing the following
     components:

   prior: the prior probabilities used.

   means: the group means.

 scaling: a matrix which transforms observations to
          discriminant functions, normalized so that the
          within-groups covariance matrix is spherical.

     svd: the singular values, which give the ratio of the
          between- and within-group standard deviations on
          the linear discriminant variables.  Their squares
          are the canonical F-statistics.

       N: The number of observations used.

    call: The (matched) function call.

     unless `CV=TRUE', when the return value is a list with
     components:

   class: The MAP (maximum a posteriori) classification (a
          factor).

posterior: posterior probabilities for the classes
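
     The components listed above can be inspected directly.
     The sketch below again assumes the built-in `iris' data;
     the ratio computed from `svd' is one common way to
     summarize how much of the between-group separation each
     discriminant carries, and is shown as an illustration
     only.

```r
library(MASS)

fit <- lda(Species ~ ., data = iris)
fit$prior     # prior probabilities used
fit$means     # group means
fit$scaling   # coefficients of the linear discriminants
fit$N         # number of observations used

# Share of the between-group separation carried by each discriminant.
fit$svd^2 / sum(fit$svd^2)

# With CV = TRUE the result is a list, not an object of class "lda".
cv <- lda(Species ~ ., data = iris, CV = TRUE)
head(cv$posterior)
table(actual = iris$Species, predicted = cv$class)
```

     Because the `CV = TRUE' result holds classifications
     rather than a fitted model, it is not meant to be passed
     to `predict'.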

_N_o_t_e_:

     This function may be called giving either a formula and
     optional data frame, or a matrix and grouping factor as
     the first two arguments.  All other arguments are
     optional, but `subset=' and `na.action=', if required,
     must be fully named.

     If a formula is given as the principal argument the
     object may be modified using `update()' in the usual
     way.

_S_e_e _A_l_s_o_:

     `predict.lda', `qda', `predict.qda'

_E_x_a_m_p_l_e_s_:

     data(iris3)
     Iris <- data.frame(rbind(iris3[,,1], iris3[,,2], iris3[,,3]),
                        Sp = rep(c("s","c","v"), rep(50,3)))
     train <- sample(1:150, 75)
     table(Iris$Sp[train])
     ## your answer may differ
     ##  c  s  v
     ## 22 23 30
     ## fit on the training half with equal priors
     z <- lda(Sp ~ ., Iris, prior = c(1,1,1)/3, subset = train)
     predict(z, Iris[-train, ])$class
     ##  [1] s s s s s s s s s s s s s s s s s s s s s s s s s s s c c c
     ## [31] c c c c c c c v c c c c v c c c c c c c c c c c c v v v v v
     ## [61] v v v v v v v v v v v v v v v
     ## refit without the Petal.W. variable
     z1 <- update(z, . ~ . - Petal.W.)

