

mda(mda)                                     R Documentation

_M_i_x_t_u_r_e _D_i_s_c_r_i_m_i_n_a_n_t _A_n_a_l_y_s_i_s

_U_s_a_g_e_:

     mda(formula, data, subclasses, sub.df, tot.df, dimension, eps,
         iter, weights, method, keep.fitted, trace, ...)

_A_r_g_u_m_e_n_t_s_:

 formula: of the form `y ~ x'; describes the response and
          the predictors. The formula can be more
          complicated, such as `y ~ log(x) + z' (type
          `?formula' for more details). The response should
          be a factor or category representing the response
          variable, or any vector that can be coerced to
          such (such as a logical variable).

    data: data frame containing the variables in the formula
          (optional).

subclasses: Number of subclasses per class; the default is
          3. Can be a vector giving a number for each class.
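
          For example (a sketch; `iris' has three classes,
          so the vector form gives one entry per class):

              fit1 <- mda(Species ~ ., data = iris,
                          subclasses = 3)          # same for all classes
              fit2 <- mda(Species ~ ., data = iris,
                          subclasses = c(2, 3, 2)) # one per class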

  sub.df: If subclass centroid shrinking is performed, the
          effective degrees of freedom of the centroids per
          class. Can be a scalar, in which case the same
          number is used for each class, or else a vector.

  tot.df: The total df for all the centroids can be speci-
          fied rather than separately per class.

dimension: The dimension of the reduced model. If we know
          our final model will be confined to a discriminant
          subspace (of the subclass centroids), we can spec-
          ify this in advance and have the EM algorithm
          operate in this subspace.
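
          For example (a sketch):

              ## restrict the fit to a 2-dimensional
              ## discriminant subspace
              fit <- mda(Species ~ ., data = iris,
                         dimension = 2)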

     eps: A numerical threshold for automatically truncating
          the dimension.

    iter: A limit on the total number of iterations -
          default is 5.

 weights: NOT observation weights! This is a special weight
          structure which, for each observation in a class,
          gives the prior probability of belonging to each
          of that class's subclasses. The default is
          provided by a call to `mda.start(x, g, subclasses,
          trace, ...)' (by this time x and g are known); see
          the help for `mda.start()'. Arguments for
          `mda.start()' can be provided via the `...'
          argument to `mda', so the `weights' argument need
          never be accessed directly. A previously fit `mda'
          object can be supplied instead, in which case its
          final subclass `responsibility' weights are used
          as `weights'. This allows the iterations from a
          previous fit to be continued.
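
          A sketch of continuing a previous fit (assuming
          the same formula and data are used in both calls):

              fit1 <- mda(Species ~ ., data = iris, iter = 3)
              ## continue the EM iterations where fit1 left off
              fit2 <- mda(Species ~ ., data = iris,
                          weights = fit1, iter = 10)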

  method: regression method used in optimal scaling. Default
          is linear regression via the function `polyreg',
          resulting in the usual mixture model. Other possi-
          bilities are `mars' and `bruto'. For penalized
          mixture discriminant models `gen.ridge' is appro-
          priate.

keep.fitted: a logical variable, which determines whether
          the (sometimes large) component `"fitted.values"'
          of the `"fit"' component of the returned `mda'
          object should be kept. The default is `TRUE' if
          `n * dimension < 1000'.

   trace: if `TRUE', iteration information is printed. Note
          that the deviance reported is for the posterior
          class likelihood, and not the full likelihood,
          which is used to drive the EM algorithm under mda.
          In general the latter is not available.

     ...: additional arguments to `mda.start()' and to
          `method()'.

_V_a_l_u_e_:

     An object of class `c("mda","fda")'. The most useful
     extractor is `predict', which can make many types of
     predictions from this object. It can also be plotted,
     and any functions useful for `"fda"' objects will work
     here too, such as `confusion' and `coef'.

     The object has the following components:

percent.explained: the percent of between-group variance
          explained by each dimension (relative to the total
          explained).

  values: optimal scaling regression sum-of-squares for
          each dimension (see reference).

   means: subclass means in the discriminant space. These
          are also scaled versions of the final theta's or
          class scores, and can be used in a subsequent call
          to `mda()' (this only makes sense if some columns
          of theta are omitted--see the references).

theta.mod: (internal) a class scoring matrix which allows
          predict to work properly.

dimension: dimension of discriminant space

sub.prior: subclass membership priors, computed in the fit.
          No effort is currently spent in trying to keep
          these above a threshold.

   prior: class proportions for the training data

     fit: fit object returned by "method"

    call: the call that created this object (allowing it to
          be `update()'-able)

confusion: confusion matrix when classifying the training
          data

 weights: These are the subclass membership probabilities
          for each member of the training set; see the
          weights argument.

assign.theta: a pointer list which identifies which elements
          of certain lists belong to individual classes.

deviance: The multinomial log-likelihood of the fit. Even
          though the full log-likelihood drives the
          iterations, we cannot in general compute it
          because of the flexibility of the `method' used.
          The deviance can increase with the iterations, but
          generally does not.

          The `method' functions are required to take
          arguments `x' and `y', where both can be matrices,
          and should produce a matrix of `fitted.values' the
          same size as `y'. They can take an additional
          argument `weights', and should all have a `...'
          argument for safety's sake. Any arguments to
          `method()' can be passed on via the `...' argument
          of `mda()'. The default method `polyreg()' has a
          `degree' argument which allows polynomial
          regression of the required total degree. See the
          documentation for `predict.fda()' for further
          requirements of `method'.
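
          A minimal sketch of such a method (the name `wls'
          is hypothetical; a matching `predict' method, as
          described in `predict.fda', would also be needed):

              wls <- function(x, y, weights = rep(1, nrow(x)), ...) {
                ## weighted least squares; x and y may be matrices,
                ## and `fitted.values' must match dim(y)
                fit <- lm.wfit(cbind(Intercept = 1, as.matrix(x)),
                               as.matrix(y), w = weights)
                structure(list(fitted.values = fit$fitted.values,
                               coefficients = fit$coefficients),
                          class = "wls")
              }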

          The function `mda.start()' creates the starting
          weights; it takes additional arguments which can
          be passed in via the `...' argument to `mda'.
          See the documentation for `mda.start'.

_N_o_t_e_:

     This software is not well-tested; we would like to
     hear of any bugs.

_A_u_t_h_o_r_(_s_)_:

     Trevor Hastie and Robert Tibshirani

_R_e_f_e_r_e_n_c_e_s_:

     ``Flexible Discriminant Analysis by Optimal Scoring'' by
     Hastie, Tibshirani and Buja, 1994, JASA, 1255-1270.

     ``Penalized Discriminant Analysis'' by Hastie, Buja and
     Tibshirani, Annals of Statistics, 1995 (in press).

     ``Discriminant Analysis by Gaussian Mixtures'' by
     Hastie and Tibshirani, 1994, JRSS-B (in press).

_S_e_e _A_l_s_o_:

     `predict.mda', `mars', `bruto', `polyreg', `gen.ridge',
     `softmax', `confusion'

_E_x_a_m_p_l_e_s_:

     library(mda)
     data(iris)
     irisfit <- mda(Species ~ ., data = iris)
     irisfit
     ## Call:
     ## mda(formula = Species ~ ., data = iris)
     ##
     ## Dimension: 4
     ##
     ## Percent Between-Group Variance Explained:
     ##     v1     v2     v3     v4
     ##  96.02  98.55  99.90 100.00
     ##
     ## Degrees of Freedom (per dimension): 5
     ##
     ## Training Misclassification Error: 0.02 ( N = 150 )
     ##
     ## Deviance: 15.102

     data(glass)
     # random sample of size 100
     samp <- c(1, 3, 4, 11, 12, 13, 14, 16, 17, 18, 19, 20, 27, 28, 31, 38,
     42, 46, 47, 48, 49, 52, 53, 54, 55, 57, 62, 63, 64, 65, 67, 68,
     69, 70, 72, 73, 78, 79, 83, 84, 85, 87, 91, 92, 94, 99, 100,
     106, 107, 108, 111, 112, 113, 115, 118, 121, 123, 124, 125, 126,
     129, 131, 133, 136, 139, 142, 143, 145, 147, 152, 153, 156, 159,
     160, 161, 164, 165, 166, 168, 169, 171, 172, 173, 174, 175, 177,
     178, 181, 182, 185, 188, 189, 192, 195, 197, 203, 205, 211, 212, 214)
     glass.train <- glass[samp,]
     glass.test <- glass[-samp,]
     glass.mda <- mda(Type ~ ., data = glass.train)
     predict(glass.mda, glass.test, type="post") # abbreviations are allowed
     confusion(glass.mda,glass.test)

