

loglm(MASS)                                  R Documentation

_F_i_t _L_o_g_-_L_i_n_e_a_r _M_o_d_e_l_s _b_y _I_t_e_r_a_t_i_v_e _P_r_o_p_o_r_t_i_o_n_a_l _S_c_a_l_i_n_g

_D_e_s_c_r_i_p_t_i_o_n_:

     This function provides a front-end to the standard
     function, `loglin', to allow log-linear models to be
     specified and fitted in a manner similar to that of
     other fitting functions, such as `glm'.

_U_s_a_g_e_:

     loglm(formula, data=sys.parent(), subset, na.action, ...)

_A_r_g_u_m_e_n_t_s_:

 formula: A linear model formula specifying the log-linear
          model.

          If the left-hand side is empty, the `data' argu-
          ment is required and must be a (complete) array of
          frequencies.  In this case the variables on the
          right-hand side may be the names of the `dimnames'
          attribute of the frequency array, or may be the
          positive integers: 1, 2, 3, ... used as alternative
          names for the 1st, 2nd, 3rd, ... dimension
          (classifying factor).  If the left-hand side is
          not empty it specifies a vector of frequencies.
          In this case the data argument, if present, must
          be a data frame from which the left-hand side vec-
          tor and the classifying factors on the right-hand
          side are (preferentially) obtained.  The usual
          abbreviation of a `.' to stand for "all other
          variables in the data frame" is allowed.  Any non-
          factors on the right-hand side of the formula are
          coerced to factor.

    data: Numeric array or data frame.  In the first case it
          specifies the array of frequencies; in the second
          it provides the data frame from which the vari-
          ables occurring in the formula are preferentially
          obtained in the usual way.

          This argument may be the result of a call to
          `crosstabs' (in R, `xtabs').

  subset: Specifies a subset of the rows in the data frame
          to be used.  The default is to take all rows.

na.action: Specifies a method for handling missing observa-
          tions.  The default is to fail if missing values
          are present.

keep.frequencies: If `TRUE', specifies that the (possibly
          constructed) array of frequencies is to be
          retained as part of the fitted model object.  The
          default is to use the same value as that given
          for the `fit' argument (see `loglin').

     ...: May supply other arguments to the function
          `loglin'.
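
     As a minimal sketch of the two naming conventions described
     for `formula', the same independence model can be written
     with dimnames names or with positive integers.  The
     `HairEyeColor' table (from the base R `datasets' package) is
     an assumption of this sketch and is not part of the original
     examples.

```r
library(MASS)

# HairEyeColor is a 3-way frequency table whose dimnames are
# named Hair, Eye and Sex (illustrative data, not from this page).
by_name   <- loglm(~ Hair + Eye + Sex, data = HairEyeColor)
by_number <- loglm(~ 1 + 2 + 3, data = HairEyeColor)

# Both formulas fix the same one-way margins, so the two fits
# should have the same deviance and degrees of freedom.
all.equal(deviance(by_name), deviance(by_number))
c(by_name$df, by_number$df)
```

     The left-hand side is empty in both calls, so `data' must be
     the complete frequency array.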

_D_e_t_a_i_l_s_:

     If the left-hand side of the formula is empty the
     `data' argument supplies the frequency array and the
     right-hand side of the formula is used to construct the
     list of fixed faces as required by `loglin'.  Struc-
     tural zeros may be specified by giving a `start' argu-
     ment with those entries set to zero, as described in
     the help information for `loglin'.

     If the left-hand side is not empty, all variables on
     the right-hand side are regarded as classifying factors
     and an array of frequencies is constructed.  If some
     cells in the complete array are not specified they are
     treated as structural zeros.  The right-hand side of
     the formula is again used to construct the list of
     faces on which the observed and fitted totals must
     agree, as required by `loglin'.  Hence terms such as
     `a:b', `a*b' and `a/b' are all equivalent.
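
     The equivalence of `a:b', `a*b' and `a/b' can be checked
     with a short sketch.  It again assumes the `HairEyeColor'
     table from base R, which does not appear in the original
     examples; the list name `fits' is likewise illustrative.

```r
library(MASS)

# Three spellings of "fit the Hair x Eye two-way margin and the
# Sex one-way margin".  Only the fixed faces matter to loglin.
fits <- list(
  colon = loglm(~ Hair:Eye + Sex, data = HairEyeColor),
  star  = loglm(~ Hair*Eye + Sex, data = HairEyeColor),
  slash = loglm(~ Hair/Eye + Sex, data = HairEyeColor)
)

# Deviances and residual degrees of freedom for each spelling.
devs <- sapply(fits, deviance)
dfs  <- sapply(fits, function(f) f$df)
devs
dfs
```

     All three formulas lead to the same list of faces, so the
     deviances and degrees of freedom should coincide.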

_V_a_l_u_e_:

     An object of class `loglm' conveying the results of the
     fitted log-linear model.  Methods exist for the generic
     functions `print', `summary', `deviance', `fitted',
     `coef', `resid', `anova' and `update', which perform
     the expected tasks.  Only log-likelihood ratio tests
     are allowed using `anova'.

     The deviance is simply an alternative name for the log-
     likelihood ratio statistic for testing the current
     model within a saturated model, in accordance with
     standard usage in generalized linear models.
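
     To illustrate the statement about the deviance, the
     following sketch recomputes the likelihood ratio statistic
     2 * sum(observed * log(observed / fitted)) by hand and
     compares it with `deviance'.  The `HairEyeColor' table and
     the variable names are assumptions made for illustration.

```r
library(MASS)

# fit = TRUE is passed on so that the fitted values are kept.
fm <- loglm(~ Hair + Eye + Sex, data = HairEyeColor, fit = TRUE)

obs <- as.vector(HairEyeColor)
fit <- as.vector(fitted(fm))

# Cells with zero observed count contribute nothing to the sum.
pos <- obs > 0
g2  <- 2 * sum(obs[pos] * log(obs[pos] / fit[pos]))

# The hand-computed statistic should match the reported deviance.
all.equal(g2, deviance(fm))
```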

_W_A_R_N_I_N_G_:

     If structural zeros are present, the calculation of
     degrees of freedom may not be correct.  `loglin' itself
     takes no action to allow for structural zeros.  `loglm'
     deducts one degree of freedom for each structural zero,
     but cannot make allowance for gains in error degrees of
     freedom due to loss of dimension in the model space.
     (This would require checking the rank of the model
     matrix, but since iterative proportional scaling meth-
     ods are developed largely to avoid constructing the
     model matrix explicitly, the computation is at least
     difficult.)

     When structural zeros (or zero fitted values) are pre-
     sent the estimated coefficients will not be available
     due to infinite estimates.  The deviances will normally
     continue to be correct, though.

_S_e_e _A_l_s_o_:

     `loglin'

_E_x_a_m_p_l_e_s_:

     # The data frames Cars93, minn38 and quine are available
     # in the MASS package.

     # Case 1: frequencies specified as an array.
     data(minn38)
     sapply(minn38, function(x) length(levels(x)))
     ## hs phs fol sex f
     ##  3   4   7   2 0
     minn38a <- array(0, c(3,4,7,2), lapply(minn38[, -5], levels))
     minn38a[data.matrix(minn38[,-5])] <- minn38$f
     fm <- loglm(~1 + 2 + 3 + 4, minn38a)  # numerals as names.
     deviance(fm)
     ##[1] 3711.9
     fm1 <- update(fm, .~.^2)
     fm2 <- update(fm, .~.^3, print = TRUE)
     ## 5 iterations: deviation 0.0750732
     anova(fm, fm1, fm2)
     ## LR tests for hierarchical log-linear models
     ##
     ## Model 1:
     ##   ~  1 + 2 + 3 + 4
     ## Model 2:
     ##  .  ~  1 + 2 + 3 + 4 + 1:2 + 1:3 + 1:4 + 2:3 + 2:4 + 3:4
     ## Model 3:
     ##  .  ~  1 + 2 + 3 + 4 + 1:2 + 1:3 + 1:4 + 2:3 + 2:4 + 3:4 +
     ##         1:2:3 + 1:2:4 + 1:3:4 + 2:3:4
     ##
     ##           Deviance  df Delta(Dev) Delta(df) P(> Delta(Dev)
     ##   Model 1 3711.915 155
     ##   Model 2  220.043 108   3491.873        47        0.00000
     ##   Model 3   47.745  36    172.298        72        0.00000
     ## Saturated    0.000   0     47.745        36        0.09114

     # Case 1. An array generated with crosstabs.
     # (`crosstabs' is the S-PLUS function; in R the equivalent
     # call uses `xtabs'.)

     loglm(~Type + Origin, crosstabs(~Type + Origin, Cars93))
     ## Call:
     ## loglm(formula =  ~ Type + Origin, data = crosstabs( ~ Type +
     ##         Origin, Cars93))
     ##
     ## Statistics:
     ##                     X^2 df  P(> X^2)
     ## Likelihood Ratio 18.362  5 0.0025255
     ##          Pearson 14.080  5 0.0151101

     # Case 2.  Frequencies given as a vector in a data frame
     data(quine)
     names(quine)
     ## [1] "Eth"  "Sex"  "Age"  "Lrn"  "Days"
     fm <- loglm(Days ~ .^2, quine)
     gm <- glm(Days ~ .^2, poisson, quine)  # check glm.
     c(deviance(fm), deviance(gm))          # deviances agree
     ## [1] 1368.7 1368.7
     c(fm$df, gm$df.residual)               # resid df do not!
     ## [1] 127 128
     # The loglm residual degrees of freedom are wrong because of
     # a non-detectable redundancy in the model matrix.

