step                  package:base                  R Documentation

_C_h_o_o_s_e _a _m_o_d_e_l _b_y _A_I_C _i_n _a _S_t_e_p_w_i_s_e _A_l_g_o_r_i_t_h_m

_D_e_s_c_r_i_p_t_i_o_n:

     Select a formula-based model by AIC.

_U_s_a_g_e:

     step(object, scope, scale = 0,
          direction = c("both", "backward", "forward"), 
          trace = 1, keep = NULL, steps = 1000, k = 2, ...)

_A_r_g_u_m_e_n_t_s:

  object: an object representing a model of an appropriate class. This
          is used as the initial model in the stepwise search. 

   scope: defines the range of models examined in the stepwise search. 

   scale: used in the definition of the AIC statistic for selecting the
          models, currently only for `lm', `aov' and `glm' models. 

direction: the mode of stepwise search, one of `"both"',
          `"backward"', or `"forward"', with a default of `"both"'.  If
          the `scope' argument is missing, the default for `direction'
          is `"backward"'.

   trace: if positive, information is printed during the running of
          `step'. Larger values may give more detailed information. 

    keep: a filter function whose input is a fitted model object and
          the associated `AIC' statistic, and whose output is
          arbitrary.  Typically `keep' will select a subset of the
          components of the object and return them. The default is not
          to keep anything.

   steps: the maximum number of steps to be considered.  The default is
          1000 (essentially as many as required).  It is typically used
          to stop the process early. 

       k: the multiple of the number of degrees of freedom used for the
          penalty. Only `k = 2' gives the genuine AIC: `k = log(n)' is
          sometimes referred to as BIC or SBC. 

     ...: any additional arguments to `extractAIC'. 
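
     The arguments above can be combined as in the following sketch.
     It assumes the built-in `swiss' data set and that a single
     formula passed as `scope' is treated as the largest model
     allowed, as in current R releases; this page does not spell out
     the accepted forms of `scope'. The object names are illustrative.

```r
# Illustrative starting points: intercept-only and full models.
null_fit <- lm(Fertility ~ 1, data = swiss)
full_fit <- lm(Fertility ~ ., data = swiss)

# Forward search from the null model. Assumption: a single formula
# given as `scope` acts as the upper limit of the search.
fwd <- step(null_fit, scope = formula(full_fit),
            direction = "forward", trace = 0)

# Backward search (the default when `scope` is missing) with the
# BIC-style penalty k = log(n) instead of k = 2.
n <- nrow(swiss)
bwd_bic <- step(full_fit, k = log(n), trace = 0)

formula(fwd)
formula(bwd_bic)
```

     Setting `trace = 0' only suppresses the printed progress; the
     search itself is unchanged.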

_D_e_t_a_i_l_s:

     `step' uses `add1' and `drop1' repeatedly; it will work for any
     class of model for which they work, which in turn requires a
     valid method for `extractAIC'. When the additive constant can be
     chosen so that AIC is equal to Mallows' Cp, this is done and the
     tables are labelled appropriately.
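
     A single backward step can be imitated by hand, which may help in
     reading the tables printed by `step'. This is only a sketch on the
     built-in `swiss' data, not the internal code of `step'; the
     variable names are illustrative, and the Cp labelling for a
     positive `scale' is assumed to behave as in current R releases.

```r
fit <- lm(Fertility ~ ., data = swiss)

# extractAIC() returns c(equivalent degrees of freedom, AIC).
extractAIC(fit)
extractAIC(fit, k = log(nrow(swiss)))   # BIC-style penalty

# One table of single-term deletions, as step() computes at each stage.
tab <- drop1(fit)
tab

# The row with the smallest AIC is the candidate move; "<none>" means stop.
best <- rownames(tab)[which.min(tab$AIC)]
if (best != "<none>") {
  fit2 <- update(fit, as.formula(paste(". ~ . -", best)))
}

# Assumption: with a known positive `scale`, the criterion column of
# the table for an lm fit is labelled Cp rather than AIC.
drop1(fit, scale = summary(fit)$sigma^2)
```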

     There is a potential problem in using `glm' fits with a variable
     `scale', as in that case the deviance is not simply related to the
     maximized log-likelihood. The function `extractAIC.glm' makes the
     appropriate adjustment for a `gaussian' family, but may need to be
     amended for other cases. (The `binomial' and `poisson' families
     have fixed `scale' by default and do not correspond to a
     particular maximum-likelihood problem for variable `scale'.)

_V_a_l_u_e:

     The stepwise-selected model is returned, with up to two additional
     components.  There is an `"anova"' component corresponding to the
     steps taken in the search, as well as a `"keep"' component if the
     `keep=' argument was supplied in the call. The `"Resid. Dev"'
     column of the analysis of deviance table refers to a constant
     minus twice the maximized log likelihood: it will be a deviance
     only in cases where a saturated model is well-defined (thus
     excluding `lm', `aov' and `survreg' fits, for example).
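
     The two extra components can be inspected as in this sketch. The
     filter function `keep_fun' is hypothetical and only shows the
     calling convention described under `keep'; the built-in `swiss'
     data are assumed.

```r
fit <- lm(Fertility ~ ., data = swiss)

# Hypothetical filter: receives the fitted model and its AIC at each
# step and returns whatever should be remembered about it.
keep_fun <- function(model, aic) {
  list(aic = aic, n_coef = length(coef(model)))
}

sel <- step(fit, keep = keep_fun, trace = 0)

sel$anova   # one row per step taken, including the "Resid. Dev" column
sel$keep    # the values returned by keep_fun, one column per step
```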

_W_a_r_n_i_n_g:

     Every model in the search must be fitted to the same dataset.
     This may be a problem if there are missing values and R's default
     of `na.action = na.omit' is used, since dropping a term can change
     which rows are omitted. We suggest you remove the missing values
     first.
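
     One way to follow this advice is sketched below. The missing
     values are introduced artificially into a copy of the built-in
     `swiss' data for illustration only.

```r
# Illustrative only: a copy of `swiss` with two values set to NA.
sw <- swiss
sw$Education[c(3, 17)] <- NA

# Drop incomplete rows once, so every candidate model sees the same rows.
sw_complete <- na.omit(sw)

fit <- lm(Fertility ~ ., data = sw_complete)
sel <- step(fit, trace = 0)

# The selected model uses exactly the rows of the cleaned data.
length(residuals(sel)) == nrow(sw_complete)
```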

_N_o_t_e:

     This function differs considerably from the function in S, which
     uses a number of approximations and does not compute the correct
     AIC.

_A_u_t_h_o_r(_s):

     B. D. Ripley

_S_e_e _A_l_s_o:

     `add1', `drop1'

_E_x_a_m_p_l_e_s:

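     ## running the lm examples creates the fit lm.D9 used below
     example(lm)
     step(lm.D9)

     ## backward selection from the full model on the swiss data
     data(swiss)
     summary(lm1 <- lm(Fertility ~ ., data = swiss))
     slm1 <- step(lm1)
     summary(slm1)
     slm1$anova   # the steps taken in the search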

