

stepAIC(MASS)                                R Documentation

_C_h_o_o_s_e _a _m_o_d_e_l _b_y _A_I_C _i_n _a _S_t_e_p_w_i_s_e _A_l_g_o_r_i_t_h_m

_D_e_s_c_r_i_p_t_i_o_n_:

     Performs stepwise model selection by exact AIC.

_U_s_a_g_e_:

     stepAIC(object, scope, scale, direction=c("both", "backward", "forward"),
             trace=1, keep=NULL, steps=1000, use.start=FALSE, k=2, ...)
     extractAIC(fit, scale, k=2, ...)

_A_r_g_u_m_e_n_t_s_:

object, fit: an object representing a model of an appropriate
          class.  For `stepAIC' this is used as the initial
          model in the stepwise search.

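   scope: defines the range of models examined in the stepwise
          search.  This should be either a single formula, or
          a list containing components `upper' and `lower',
          both formulae.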

   scale: used in the definition of the AIC statistic for
          selecting the models, currently only for `lm',
          `aov' and `glm' models.

direction: the mode of stepwise search, can be one of
          `"both"', `"backward"', or `"forward"', with a
          default of `"both"'.  If the `scope' argument is
          missing, the default for `direction' is
          `"backward"'.

   trace: if positive, information is printed during the
          running of `stepAIC()'.  Larger values may give
          more information on the fitting process.

    keep: a filter function whose input is a fitted model
          object and the associated `AIC' statistic, and
          whose output is arbitrary.  Typically `keep' will
          select a subset of the components of the object
          and return them. The default is not to keep
          anything.

   steps: the maximum number of steps to be considered.  The
          default is 1000 (essentially as many as required).
          It is typically used to stop the process early.

use.start: if true the updated fits are done starting at the
          linear predictor for the currently selected model.
          This may speed up the iterative calculations for
          `glm' (and other fits), but it can also slow them
          down.

       k: the multiple of the number of degrees of freedom
          used for the penalty.  Only `k=2' gives the genuine
          AIC: `k = log(n)' is sometimes referred to as
          BIC or SBC.

     ...: any additional arguments to `extractAIC'. (None
          are currently used.)
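
     As an illustration of the `scope', `k' and `keep' arguments
     described above, here is a minimal sketch.  The starting
     model, the two-way upper scope and the helper `keep_fn' are
     assumptions chosen for the example, not part of the package.

```r
library(MASS)   # provides stepAIC() and the quine data
data(quine)

# Start from a main-effects model (an illustrative choice).
fit0 <- lm(log(Days + 2.5) ~ Eth + Sex + Age + Lrn, data = quine)
n <- nrow(quine)   # number of observations, used for the BIC-style penalty

# Hypothetical keep function: it receives each fitted model and its AIC
# and returns only the residual degrees of freedom and that AIC.
keep_fn <- function(model, aic) list(df.resid = model$df.residual, AIC = aic)

fit_bic <- stepAIC(fit0,
                   scope = list(upper = ~ (Eth + Sex + Age + Lrn)^2,
                                lower = ~ 1),
                   direction = "both",
                   k = log(n),      # BIC/SBC-style penalty instead of k = 2
                   keep = keep_fn,
                   trace = 0)

fit_bic$anova   # the steps taken in the search
fit_bic$keep    # what keep_fn returned, one column per model visited
```

     With `k = log(n)' the quantity minimised is no longer the
     genuine AIC, although it is still reported in the `AIC'
     column.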

_D_e_t_a_i_l_s_:

     `stepAIC' differs from `step' and especially `step.glm'
     in using the exact AIC rather than potentially misleading
     one-step approximations.  It is also much more widely
     applicable: all that is required is a method for
     `extractAIC', which should return a vector `c(modeldf,
     AIC)'.  The default method handles linear models (`lm',
     `aov' and `glm' of family `gaussian' with identity
     link) using `addterm.lm' and `dropterm.lm': for these
     the results are similar to `step.glm' except that the
     AIC quoted is Akaike's not Hastie's. (The additive
     constant is chosen so that in that case AIC is identical
     to Mallows' Cp if the scale is known.)
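
     The two-element return value of `extractAIC' and the
     Mallows' Cp correspondence can be checked with a short
     sketch.  It assumes the `lm' method found in current
     versions of R, which may differ in detail from the version
     documented here; the model and the value `s2' are
     hypothetical.

```r
library(MASS)   # quine data; extractAIC() itself lives in 'stats' in current R
data(quine)

fit <- lm(log(Days + 2.5) ~ Eth + Sex, data = quine)
n   <- nrow(quine)
rss <- deviance(fit)          # residual sum of squares for an lm
edf <- n - fit$df.residual    # number of estimated coefficients

# Scale unknown: c(edf, n * log(RSS / n) + k * edf) with k = 2.
ea <- extractAIC(fit)
stopifnot(isTRUE(all.equal(ea, c(edf, n * log(rss / n) + 2 * edf))))

# Scale known (s2 is a made-up value): the second element is
# RSS / s2 - n + 2 * edf, which is Mallows' Cp.
s2 <- 0.5
cp <- extractAIC(fit, scale = s2)
stopifnot(isTRUE(all.equal(cp[2], rss / s2 - n + 2 * edf)))
```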

     There is a potential problem in using `glm' fits with a
     variable `scale', as in that case the deviance is not
     simply related to the maximized log-likelihood. The
     function `extractAIC.glm' makes the appropriate adjust-
     ment for a `gaussian' family, but may need to be
     amended for other cases. (The `binomial' and `poisson'
     families have fixed `scale' by default and do not cor-
     respond to a particular maximum-likelihood problem for
     variable `scale'.)

     Where a conventional deviance exists (e.g. for `lm',
     `aov' and `glm' fits) this is quoted in the analysis of
     variance table: it is the unscaled deviance.

_V_a_l_u_e_:

     The stepwise-selected model is returned, with up to two
     additional components.  There is an `"anova"' component
     corresponding to the steps taken in the search, as well
     as a `"keep"' component if the `keep=' argument was
     supplied in the call. The `"Resid. Dev"' column of the
     analysis of deviance table refers to a constant minus
     twice the maximized log likelihood: it will be a
     deviance only in cases where a saturated model is well-
     defined (thus excluding `lm', `aov' and `survreg' fits,
     for example).
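
     The `"Resid. Dev"' column can be inspected directly.  The
     sketch below uses a binomial `glm', where a saturated model
     is well-defined; the choice of predictors is an assumption
     made for the example.

```r
library(MASS)   # stepAIC() and the birthwt data
data(birthwt)

# Illustrative starting model; the predictors are chosen arbitrarily.
fit0 <- glm(low ~ age + lwt + smoke + ht, family = binomial, data = birthwt)
fit1 <- stepAIC(fit0, trace = 0)   # scope missing, so the search is backward

fit1$anova                         # one row per step, starting model first
rd <- fit1$anova[["Resid. Dev"]]

# For this fit the column holds the ordinary residual deviance of the
# starting model (first row) and of the selected model (last row).
stopifnot(isTRUE(all.equal(rd[1], deviance(fit0))),
          isTRUE(all.equal(rd[length(rd)], deviance(fit1))))

# With a 0/1 response the saturated log-likelihood is zero, so the
# deviance is exactly minus twice the maximized log-likelihood.
stopifnot(isTRUE(all.equal(deviance(fit0), -2 * as.numeric(logLik(fit0)))))
```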

_S_e_e _A_l_s_o_:

     `addterm', `dropterm', `step'

_E_x_a_m_p_l_e_s_:

     library(MASS)
     data(quine)
     quine.hi <- aov(log(Days + 2.5) ~ .^4, quine)
     quine.nxt <- update(quine.hi, . ~ . - Eth:Sex:Age:Lrn)
     quine.stp <- stepAIC(quine.nxt,
         scope = list(upper = ~Eth*Sex*Age*Lrn, lower = ~1),
         trace = FALSE)
     quine.stp$anova

     data(cpus)
     cpus1 <- cpus
     # bin the selected numeric predictors at their quartiles
     for(v in names(cpus)[2:6])
       cpus1[[v]] <- cut(cpus[[v]], unique(quantile(cpus[[v]])),
                         include.lowest = TRUE)
     cpus0 <- cpus1[, 2:8]  # excludes names, authors' predictions
     cpus.samp <- sample(1:209, 100)  # random subset of 100 rows
     cpus.lm <- lm(log10(perf) ~ ., data=cpus0[cpus.samp, ])
     cpus.lm2 <- stepAIC(cpus.lm, trace=FALSE)
     cpus.lm2$anova

     example(birthwt)  # run for its side effect: creates the data frame bwt
     birthwt.glm <- glm(low ~ ., family=binomial, data=bwt)
     birthwt.step <- stepAIC(birthwt.glm, trace=FALSE)
     birthwt.step$anova
     birthwt.step2 <- stepAIC(birthwt.glm, ~ .^2 + I(scale(age)^2)
         + I(scale(lwt)^2), trace=FALSE)
     birthwt.step2$anova

     quine.nb <- glm.nb(Days ~ .^4, data=quine)
     quine.nb2 <- stepAIC(quine.nb)
     quine.nb2$anova

