

ppr {modreg}                                 R Documentation

_P_r_o_j_e_c_t_i_o_n _P_u_r_s_u_i_t _R_e_g_r_e_s_s_i_o_n

_D_e_s_c_r_i_p_t_i_o_n_:

     Fit a projection pursuit regression model.

_U_s_a_g_e_:

     ppr(formula, data = sys.parent(), weights,
         subset, na.action, contrasts = NULL,
         ww = rep(1,q), nterms, max.terms=nterms, optlevel = 2,
         sm.method = c("supsmu", "spline", "gcvspline"),
         bass = 0, span = 0, df = 5, gcvpen = 1)

     ppr(x, y, weights = rep(1,n),
         ww = rep(1,q), nterms, max.terms = nterms, optlevel = 2,
         sm.method = c("supsmu", "spline", "gcvspline"),
         bass = 0, span = 0, df = 5, gcvpen = 1)

_A_r_g_u_m_e_n_t_s_:

 formula: a regression formula specifying one or more
          response variables and the explanatory variables.

       x: matrix of explanatory variables.  Rows represent
          observations, and columns represent variables.
          Missing values are not accepted.

  nterms: number of terms to include in the final model.

    data: data frame from which variables specified in
          `formula' are preferentially to be taken.

 weights: a vector of weights for each case.

      ww: a vector of weights for each response, so the fit
          criterion is the sum over case `i' and responses
          `j' of `w_i ww_j (y_ij - fit_ij)^2' divided by the
          sum of `w_i'.

  subset: An index vector specifying the cases to be used in
          the training sample.  (NOTE: If given, this argu-
          ment must be named.)

na.action: A function to specify the action to be taken if
          `NA's are found. The default action is for the
          procedure to fail.  An alternative is `na.omit',
          which leads to rejection of cases with missing
          values on any required variable.  (NOTE: If given,
          this argument must be named.)

contrasts: the contrasts to be used when any factor explana-
          tory variables are coded.

max.terms: maximum number of terms to choose from when
          building the model.

optlevel: integer from 0 to 3 which determines the thorough-
          ness of an optimization routine in the SMART pro-
          gram. See the Details section.

sm.method: the method used for smoothing the ridge functions.
          The default is to use Friedman's super smoother
          `supsmu'.  The alternatives are to use the
          smoothing spline code underlying `smooth.spline',
          either with a specified (equivalent) degrees of
          freedom for each ridge function, or to allow the
          smoothness to be chosen by GCV.

    bass: super smoother bass tone control used with auto-
          matic span selection (see `supsmu'); the range of
          values is 0 to 10, with larger values resulting in
          increased smoothing.

    span: super smoother span control (see `supsmu').  The
          default, `0', results in automatic span selection
          by local cross validation. `span' can also take a
          value in `(0, 1]'.

      df: if `sm.method' is `"spline"', specifies the
          smoothness of each ridge term via the requested
          equivalent degrees of freedom.

  gcvpen: if `sm.method' is `"gcvspline"', this is the
          penalty used in the GCV selection for each degree
          of freedom used.
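
     The fit criterion described under `ww' can be written out
     directly.  The following is only an illustrative sketch:
     the helper name `ppr_criterion' and its argument names are
     hypothetical and are not part of the package.

```r
# Hypothetical helper (not part of the package): the weighted
# criterion  sum_i sum_j w_i * ww_j * (y_ij - fit_ij)^2 / sum_i w_i
ppr_criterion <- function(y, fit,
                          w  = rep(1, nrow(y)),   # case weights
                          ww = rep(1, ncol(y))) { # response weights
  y   <- as.matrix(y)     # n cases by q responses
  fit <- as.matrix(fit)
  sq  <- (y - fit)^2
  # outer(w, ww)[i, j] is w_i * ww_j
  sum(outer(w, ww) * sq) / sum(w)
}

# Two cases, two responses, all fitted values zero
y   <- matrix(c(1, 2, 3, 4), nrow = 2)
fit <- matrix(0, nrow = 2, ncol = 2)
ppr_criterion(y, fit)
ppr_criterion(y, fit, w = c(1, 3), ww = c(2, 1))
```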

_D_e_t_a_i_l_s_:

     The basic method is given by Friedman (1984), and is
     essentially the same code used by S-PLUS's `ppreg'.
     This code is extremely sensitive to the compiler used.

     The algorithm first adds up to `max.terms' ridge terms
     one at a time; it will use fewer if it is unable to
     find a term to add that makes sufficient difference.
     It then removes the least "important" term at each
     step until `nterms' terms are left.

     The levels of optimization (argument `optlevel') differ
     in how thoroughly the models are refitted during this
     process.  At level 0 the existing ridge terms are not
     refitted.  At level 1 the projection directions are not
     refitted, but the ridge functions and the regression
     coefficients are.  Levels 2 and 3 refit all the terms
     and are equivalent for one response; level 3 is more
     careful to re-balance the contributions from each
     regressor at each step and so is a little less likely
     to converge to a saddle point of the sum of squares
     criterion.
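
     As a minimal sketch of the forward and backward steps
     described above, the call below lets the algorithm grow up
     to five terms before pruning back to two.  It assumes `ppr'
     and the `rock' data are available in the session; the data
     frame name `rock1' and the rescaled columns are
     illustrative choices.

```r
data(rock)
# Rescale two predictors; `rock1' is an illustrative name
rock1 <- transform(rock, area1 = area / 10000, peri1 = peri / 10000)

# Grow up to max.terms = 5 ridge terms, then prune back to nterms = 2,
# refitting all terms at each step (optlevel = 2, the default)
fit <- ppr(log(perm) ~ area1 + peri1 + shape, data = rock1,
           nterms = 2, max.terms = 5, optlevel = 2)

fit$ml     # the `max.terms' argument
fit$gofn   # residual sum of squares against the number of terms
fit$gof    # residual sum of squares for the selected model
```

     Numerical values are not shown here because they may differ
     between platforms.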

_V_a_l_u_e_:

     A list with the following components, many of which are
     for use by the method functions.

    call: the matched call

       p: the number of explanatory variables (after any
          coding)

       q: the number of response variables

      ml: the argument `max.terms'

     gof: the overall residual (weighted) sum of squares for
          the selected model

    gofn: the overall residual (weighted) sum of squares
          against the number of terms, up to `max.terms'.
          Entries for fewer than `nterms' terms are invalid
          (and zero).

      df: the argument `df'

     edf: if `sm.method' is `"spline"' or `"gcvspline"', the
          equivalent number of degrees of freedom for each
          ridge term used.

  xnames: the names of the explanatory variables

  ynames: the names of the response variables

   alpha: a matrix of the projection directions, with a col-
          umn for each ridge term

    beta: a matrix of the coefficients applied for each
          response to the ridge terms: the rows are the
          responses and the columns the ridge terms

      yb: the weighted means of each response

      ys: the overall scale factor used: internally the
          responses are divided by `ys' to have unit total
          weighted sum of squares.

fitted.values: the fitted values, as a matrix if `q > 1'

residuals: the residuals, as a matrix if `q > 1'

    smod: internal work array, which includes the ridge
          functions evaluated at the training set points.
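
     The components can be inspected directly.  The sketch
     below uses the `x, y' interface and projects the
     explanatory variables onto the fitted directions in
     `alpha'; the object names `x', `y', `fit' and `z' are
     illustrative, and the `rock' data are assumed to be
     available.

```r
data(rock)
# Matrix of explanatory variables: rows are observations
x <- cbind(area1 = rock$area / 10000,
           peri1 = rock$peri / 10000,
           shape = rock$shape)
y <- log(rock$perm)

fit <- ppr(x, y, nterms = 2, max.terms = 3)

fit$p        # number of explanatory variables
fit$q        # number of response variables
fit$alpha    # projection directions, one column per ridge term
fit$beta     # coefficients applied to the ridge terms

# Project each observation onto each direction: one column per term
z <- x %*% fit$alpha
dim(z)
```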

_R_e_f_e_r_e_n_c_e_s_:

     Friedman, J. H. and Stuetzle, W. (1981) Projection
     pursuit regression.  Journal of the American
     Statistical Association, 76, 817-823.

     Friedman, J. H. (1984) SMART User's Guide.  Laboratory
     for Computational Statistics, Stanford University
     Technical Report No. 1.

_S_e_e _A_l_s_o_:

     `plot.ppr', `supsmu', `smooth.spline'

_E_x_a_m_p_l_e_s_:

     # Note: your numerical values may differ
     data(rock)
     attach(rock)
     area1 <- area/10000; peri1 <- peri/10000
     rock.ppr <- ppr(log(perm) ~ area1 + peri1 + shape,
                     data=rock, nterms=2, max.terms=5)
     rock.ppr
     # Call:
     # ppr.formula(formula = log(perm) ~ area1 + peri1 + shape, data = rock,
     #     nterms = 2, max.terms = 5)
     #
     # Goodness of fit:
     #  2 terms  3 terms  4 terms  5 terms
     # 8.737806 5.289517 4.745799 4.490378

     summary(rock.ppr)
     # .....  (same as above)
     # .....
     #
     # Projection direction vectors:
     #       term 1      term 2
     # area1  0.34357179  0.37071027
     # peri1 -0.93781471 -0.61923542
     # shape  0.04961846  0.69218595
     #
     # Coefficients of ridge terms:
     #    term 1    term 2
     # 1.6079271 0.5460971

     par(mfrow=c(3,2))# maybe: , pty="s")
     plot(rock.ppr, main="ppr(log(perm)~ ., nterms=2, max.terms=5)")
     plot(update(rock.ppr, bass=5), main = "update(..., bass = 5)")
     plot(update(rock.ppr, sm.method="gcv", gcvpen=2),
          main = "update(..., sm.method=

