

fda(mda)                                     R Documentation

_F_l_e_x_i_b_l_e _D_i_s_c_r_i_m_i_n_a_n_t _A_n_a_l_y_s_i_s

_U_s_a_g_e_:

     fda(formula, data, weights, theta, dimension, eps, method,
         keep.fitted, ...)

_A_r_g_u_m_e_n_t_s_:

 formula: of the form `y~x'; it describes the response and
          the predictors. The formula can be more
          complicated, such as `y~log(x)+z' (type `?formula'
          for more details). The response should be a factor
          or category representing the response variable, or
          any vector that can be coerced to such (such as a
          logical variable).

    data: data frame containing the variables in the formula
          (optional).

 weights: an optional vector of observation weights.

   theta: an optional matrix of class scores, typically with
          fewer than `J-1' columns.

dimension: the dimension of the solution, no greater than
          `J-1', where `J' is the number of classes. Default
          is `J-1'.

     eps: a threshold for small singular values, used to
          exclude discriminant variables; default is
          `.Machine$double.eps'.

  method: regression method used in optimal scaling. Default
          is linear regression via the function `polyreg',
          resulting in linear discriminant analysis. Other
          possibilities are `mars' and `bruto'. For penalized
          discriminant analysis, `gen.ridge' is appropriate.

keep.fitted: a logical variable, which determines whether
          the (sometimes large) component `"fitted.values"'
          of the `"fit"' component of the returned `fda'
          object should be kept. The default is `TRUE' if
          `n * dimension < 1000'.

     ...: additional arguments to `method()'.

_V_a_l_u_e_:

     an object of class `"fda"'. Use `predict' to extract
     discriminant variables, posterior probabilities or
     predicted class memberships. Other extractor functions
     are `coef', `confusion' and `plot'.

     The object has the following components:

percent.explained: the percent of between-group variance
          explained by each dimension (relative to the total
          explained).

  values: optimal scaling regression sum-of-squares for
          each dimension (see the references). The usual
          discriminant analysis eigenvalues are given by
          `values/(1-values)', which are used to define
          `percent.explained'.

   means: class means in the discriminant space. These are
          also scaled versions of the final thetas (class
          scores), and can be used in a subsequent call to
          `fda()' (this only makes sense if some columns of
          theta are omitted; see the references).

theta.mod: (internal) a class scoring matrix which allows
          predict to work properly.

dimension: dimension of discriminant space

   prior: class proportions for the training data

     fit: fit object returned by `method'

    call: the call that created this object (allowing it to
          be `update()'-able)

confusion: confusion matrix when classifying the training
          data
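
     As an illustration of how `values' relates to the eigenvalues
     and to `percent.explained', the sketch below converts a
     hypothetical `values' vector using `values/(1-values)'. The
     numbers are invented for illustration, and the cumulative form
     of the percentages is an assumption suggested by the iris
     output in the Examples (99.12, 100.00); it is not stated in
     the component descriptions.

```r
# Hypothetical optimal-scaling sums of squares, one per dimension.
# These numbers are invented for illustration; they are not fda() output.
values <- c(0.9, 0.5)

# Usual discriminant-analysis eigenvalues, as given in the text.
eigen.values <- values / (1 - values)

# Assumed definition: cumulative percent of the total explained.
pct <- 100 * cumsum(eigen.values) / sum(eigen.values)

print(eigen.values)
print(pct)
```

     With a fitted object, the same arithmetic could be applied to
     its `values' component and compared with `percent.explained'.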

     The `method' functions are required to take arguments
     `x' and `y', where both can be matrices, and should
     produce a matrix of `fitted.values' the same size as
     `y'. They can take an additional argument `weights' and
     should all have a `...' argument for safety's sake. Any
     arguments to `method()' can be passed on via the `...'
     argument of `fda()'. The default method `polyreg()' has
     a `degree' argument which allows polynomial regression
     of the required total degree. See the documentation for
     `predict.fda()' for further requirements of `method'.
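
     The fitting contract for a `method' function might be
     sketched as follows. The function `my.linreg' is
     hypothetical and not part of the package; it uses only
     `lm.wfit' from base R. It covers only the requirements
     listed here (arguments `x', `y', optional `weights', `...',
     and a `fitted.values' matrix the same size as `y'), not the
     further requirements described under `predict.fda()'.

```r
# Hypothetical method function, for illustration only.
my.linreg <- function(x, y, weights = rep(1, nrow(x)), ...) {
    x <- as.matrix(x)
    y <- as.matrix(y)
    # Weighted least squares with an explicit intercept column.
    fit <- lm.wfit(cbind(Intercept = 1, x), y, weights)
    list(fitted.values = as.matrix(fit$fitted.values),
         coefficients = fit$coefficients)
}

# Check the contract on simulated data:
# the fitted values must have the same dimensions as y.
set.seed(1)
x <- matrix(rnorm(40), 20, 2)
y <- matrix(rnorm(40), 20, 2)
out <- my.linreg(x, y)
print(dim(out$fitted.values))
```

     Whether such a function can be used directly as the
     `method' argument also depends on the prediction
     requirements, which this sketch does not address.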

_N_o_t_e_:

     This software is not well tested; we would like to
     hear of any bugs.

_A_u_t_h_o_r_(_s_)_:

     Trevor Hastie and Robert Tibshirani

_R_e_f_e_r_e_n_c_e_s_:

     ``Flexible Discriminant Analysis by Optimal Scoring''
     by Hastie, Tibshirani and Buja, 1994, JASA, 1255-1270.

     ``Penalized Discriminant Analysis'' by Hastie, Buja and
     Tibshirani, Annals of Statistics, 1995 (in press).

_S_e_e _A_l_s_o_:

     `predict.fda', `mars', `bruto', `polyreg', `softmax',
     `confusion'

_E_x_a_m_p_l_e_s_:

     data(iris)
     irisfit <- fda(Species ~ ., data = iris)
     irisfit
     ## fda(formula = Species ~ ., data = iris)
     ##
     ## Dimension: 2
     ##
     ## Percent Between-Group Variance Explained:
     ##     v1     v2
     ##  99.12 100.00
     ##
     ## Degrees of Freedom (per dimension): 5
     ##
     ## Training Misclassification Error: 0.02 ( N = 150 )

     confusion(irisfit, iris)
     ##            Setosa Versicolor Virginica
     ##     Setosa     50          0         0
     ## Versicolor      0         48         1
     ##  Virginica      0          2        49
     ## attr(, "error"):
     ## [1] 0.02

     plot(irisfit)

     coef(irisfit)
     ##           [,1]        [,2]
     ## [1,] -2.126479 -6.72910343
     ## [2,] -0.837798  0.02434685
     ## [3,] -1.550052  2.18649663
     ## [4,]  2.223560 -0.94138258
     ## [5,]  2.838994  2.86801283

     marsfit <- fda(Species ~ ., data = iris, method = mars)
     marsfit2 <- update(marsfit, degree = 2)
     marsfit3 <- update(marsfit, theta = marsfit$means[, 1:2])
     ## this refits the model, using the fitted means (scaled theta's)
     ## from marsfit to start the iterations

