ppr                  package:modreg                  R Documentation

_P_r_o_j_e_c_t_i_o_n _P_u_r_s_u_i_t _R_e_g_r_e_s_s_i_o_n

_D_e_s_c_r_i_p_t_i_o_n:

     Fit a projection pursuit regression model.

_U_s_a_g_e:

     ppr(formula, data = sys.parent(), weights,
         subset, na.action, contrasts = NULL,
         ww = rep(1,q), nterms, max.terms=nterms, optlevel = 2,
         sm.method = c("supsmu", "spline", "gcvspline"),
         bass = 0, span = 0, df = 5, gcvpen = 1)

     ppr(x, y, weights = rep(1,n),
         ww = rep(1,q), nterms, max.terms = nterms, optlevel = 2,
         sm.method = c("supsmu", "spline", "gcvspline"),
         bass = 0, span = 0, df = 5, gcvpen = 1)

_A_r_g_u_m_e_n_t_s:

 formula: a regression formula specifying one or more response
          variables and the explanatory variables. 

       x: matrix of explanatory variables.  Rows represent
          observations, and columns represent variables.  Missing
          values are not accepted. 

  nterms: number of terms to include in the final model.

    data: Data frame from which variables specified in `formula' are
          preferentially to be taken. 

 weights: a vector of weights `w_i' for each case.

      ww: a vector of weights for each response, so the fit criterion
          is the sum over case `i' and responses `j' of `w_i ww_j (y_ij
          - fit_ij)^2' divided by the sum of `w_i'. 

  subset: An index vector specifying the cases to be used in the
          training sample.  (NOTE: If given, this argument must be
          named.) 

na.action: A function to specify the action to be taken if `NA's are
          found. The default action is for the procedure to fail.  An
          alternative is `na.omit', which leads to rejection of cases
          with missing values on any required variable.  (NOTE: If
          given, this argument must be named.) 

contrasts: the contrasts to be used when any factor explanatory
          variables are coded. 

max.terms: maximum number of terms to choose from when building the
          model. 

optlevel: integer from 0 to 3 which determines the thoroughness of an
          optimization routine in the SMART program. See the Details
          section. 

sm.method: the method used for smoothing the ridge functions.  The
          default is to use Friedman's super smoother `supsmu'.  The
          alternatives use the smoothing spline code underlying
          `smooth.spline', either with a specified (equivalent) degrees
          of freedom for each ridge function (`"spline"'), or with the
          smoothness chosen by GCV (`"gcvspline"'). 

    bass: super smoother bass tone control used with automatic span
          selection (see `supsmu'); the range of values is 0 to 10,
          with larger values resulting in increased smoothing. 

    span: super smoother span control (see `supsmu').  The default,
          `0', results in automatic span selection by local cross
          validation. `span' can also take a value in `(0, 1]'. 

      df: if `sm.method' is `"spline"' specifies the smoothness of each
          ridge term via the requested equivalent degrees of freedom. 

  gcvpen: if `sm.method' is `"gcvspline"' this is the penalty used in
          the GCV selection for each degree of freedom used. 
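
     The fit criterion described under `ww' can be written out
     directly in R.  This is only a minimal sketch of the formula, not
     the internal implementation; the function name `ppr_criterion'
     and the small matrices are hypothetical and chosen for
     illustration.

```r
# Hypothetical helper (not part of the package): the sum over cases i
# and responses j of w_i * ww_j * (y_ij - fit_ij)^2, divided by sum(w_i).
ppr_criterion <- function(y, fit, w = rep(1, nrow(y)), ww = rep(1, ncol(y))) {
  y <- as.matrix(y)
  fit <- as.matrix(fit)
  # outer(w, ww)[i, j] is w_i * ww_j, matching the layout of y and fit
  sum(outer(w, ww) * (y - fit)^2) / sum(w)
}

y   <- matrix(c(1, 2, 3, 4), nrow = 2)   # 2 cases (rows), 2 responses (columns)
fit <- matrix(c(0, 2, 3, 2), nrow = 2)   # made-up fitted values
crit <- ppr_criterion(y, fit, w = c(1, 3), ww = c(2, 1))
print(crit)   # (1*2*1 + 3*1*4) / (1 + 3) = 3.5
```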

_D_e_t_a_i_l_s:

     The basic method is given by Friedman (1984), and is essentially
     the same code used by S-PLUS's `ppreg'.  This code is extremely
     sensitive to the compiler used, so numerical results may differ
     between platforms.

     The algorithm first adds up to `max.terms' ridge terms one at a
     time; it will use fewer if it is unable to find a term to add that
     makes sufficient difference.  It then removes the least
     "important" term at each step until `nterms' terms are left.
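
     The effect of adding terms and then pruning them can be inspected
     through the `gofn' component.  The sketch below assumes a current
     R installation in which `ppr' is available from the `stats'
     package (this page documents it in `modreg'); the data frame name
     `rock1' is introduced here only for illustration.

```r
# Assumption: ppr() is found on the search path (package 'stats' in
# current R; package 'modreg' in the version documented here).
data(rock)
rock1 <- transform(rock, area1 = area / 10000, peri1 = peri / 10000)

fit <- ppr(log(perm) ~ area1 + peri1 + shape, data = rock1,
           nterms = 2, max.terms = 5)

# Goodness of fit for each model size from nterms up to max.terms;
# entries for sizes below nterms are not meaningful.
sizes <- fit$mu:fit$ml
gof_by_size <- fit$gofn[sizes]
names(gof_by_size) <- paste(sizes, "terms")
print(gof_by_size)
```

     Comparing these values is one way to judge whether a larger
     `nterms' is worth the extra ridge terms.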

     The levels of optimization (argument `optlevel') differ in how
     thoroughly the models are refitted during this process. At level 0
     the existing ridge terms are not refitted.  At level 1 the
     projection directions are not refitted, but the ridge functions
     and the regression coefficients are. Levels 2 and 3 refit all the
     terms and are equivalent for one response; level 3 is more careful
     to re-balance the contributions from each regressor at each step
     and so is a little less likely to converge to a saddle point of
     the sum of squares criterion.
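
     To see how `optlevel' affects a fit, the same model can be
     refitted at each level and the `gof' components compared.  This is
     a sketch under the assumption that `ppr' is on the search path
     (`stats' in current R); `rock1' and `gof_by_level' are
     illustrative names, and the values obtained will vary by platform.

```r
# Assumption: ppr() is found on the search path (package 'stats' in
# current R; package 'modreg' in the version documented here).
data(rock)
rock1 <- transform(rock, area1 = area / 10000, peri1 = peri / 10000)

# Refit the same model at each optimization level and collect the
# residual sum of squares of the selected model.
gof_by_level <- sapply(0:3, function(lev) {
  ppr(log(perm) ~ area1 + peri1 + shape, data = rock1,
      nterms = 2, max.terms = 5, optlevel = lev)$gof
})
names(gof_by_level) <- paste("optlevel", 0:3)
print(gof_by_level)
```

     With a single response, as here, the text above implies that
     levels 2 and 3 should give the same fit.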

_V_a_l_u_e:

     A list with the following components, many of which are for use by
     the method functions.

    call: the matched call

       p: the number of explanatory variables (after any coding)

       q: the number of response variables

      mu: the argument `nterms'

      ml: the argument `max.terms'

     gof: the overall residual (weighted) sum of squares for the
          selected model

    gofn: the overall residual (weighted) sum of squares against the
          number of terms, up to `max.terms'.  Will be invalid (and
          zero) for fewer than `nterms' terms.

      df: the argument `df'

     edf: if `sm.method' is `"spline"' or `"gcvspline"' the equivalent
          number of degrees of freedom for each ridge term used.

  xnames: the names of the explanatory variables

  ynames: the names of the response variables

   alpha: a matrix of the projection directions, with a column for each
          ridge term

    beta: a matrix of the coefficients applied for each response to the
          ridge terms: the rows are the responses and the columns the
          ridge terms

      yb: the weighted means of each response

      ys: the overall scale factor used: internally the responses are
          divided by `ys' to have unit total weighted sum of squares.

fitted.values: the fitted values, as a matrix if `q > 1'

residuals: the residuals, as a matrix if `q > 1'

    smod: internal work array, which includes the ridge functions
          evaluated at the training set points.
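
     A few of these components can be examined directly on a fitted
     object.  The sketch below assumes `ppr' is on the search path
     (`stats' in current R) and uses the illustrative data frame
     `rock1'; it only reads the components listed above.

```r
# Assumption: ppr() is found on the search path (package 'stats' in
# current R; package 'modreg' in the version documented here).
data(rock)
rock1 <- transform(rock, area1 = area / 10000, peri1 = peri / 10000)
fit <- ppr(log(perm) ~ area1 + peri1 + shape, data = rock1,
           nterms = 2, max.terms = 5)

# Projection directions: one row per explanatory variable,
# one column per ridge term.
print(fit$alpha)
dir_sq_len <- colSums(fit$alpha^2)   # squared length of each direction
print(dir_sq_len)

# Coefficients applied to the ridge terms for the single response.
print(fit$beta)

# Fitted values plus residuals recover the response.
recon <- as.numeric(fit$fitted.values + fit$residuals)
print(all.equal(recon, log(rock1$perm)))
```

     In the printed example on this page each column of `alpha' has
     squared length close to 1, so `dir_sq_len' is expected to be near
     1 for both terms.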

_R_e_f_e_r_e_n_c_e_s:

     Friedman, J. H. and Stuetzle, W. (1981) Projection pursuit
     regression. Journal of the American Statistical Association, 76,
     817-823.

     Friedman, J. H. (1984) SMART User's Guide. Laboratory for
     Computational Statistics, Stanford University Technical Report No.
     1.

_S_e_e _A_l_s_o:

     `plot.ppr', `supsmu', `smooth.spline'

_E_x_a_m_p_l_e_s:

     # Note: your numerical values may differ
     data(rock)
     attach(rock)
     area1 <- area/10000; peri1 <- peri/10000
     rock.ppr <- ppr(log(perm) ~ area1 + peri1 + shape,
                     data = rock, nterms = 2, max.terms = 5)
     rock.ppr
     # Call:
     # ppr.formula(formula = log(perm) ~ area1 + peri1 + shape, data = rock,
     #     nterms = 2, max.terms = 5)
     #
     # Goodness of fit:
     #  2 terms  3 terms  4 terms  5 terms
     # 8.737806 5.289517 4.745799 4.490378

     summary(rock.ppr)
     # .....  (same as above)
     # .....
     #
     # Projection direction vectors:
     #       term 1      term 2
     # area1  0.34357179  0.37071027
     # peri1 -0.93781471 -0.61923542
     # shape  0.04961846  0.69218595
     #
     # Coefficients of ridge terms:
     #    term 1    term 2
     # 1.6079271 0.5460971

     par(mfrow=c(3,2))# maybe: , pty="s")
     plot(rock.ppr, main="ppr(log(perm)~ ., nterms=2, max.terms=5)")
     plot(update(rock.ppr, bass=5), main = "update(..., bass = 5)")
     plot(update(rock.ppr, sm.method="gcv", gcvpen=2),
          main = "update(..., sm.method=\"gcv\", gcvpen=2)")

