cor                   package:base                   R Documentation

_C_o_r_r_e_l_a_t_i_o_n, _V_a_r_i_a_n_c_e _a_n_d _C_o_v_a_r_i_a_n_c_e (_M_a_t_r_i_c_e_s)

_D_e_s_c_r_i_p_t_i_o_n:

     `var', `cov' and `cor' compute the variance of `x' and the
     covariance or correlation of `x' and `y' if these are vectors.  If
     `x' and `y' are matrices, the covariances (correlations) between
     the columns of `x' and the columns of `y' are computed.
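
     A minimal sketch of the column-wise rule, on small made-up
     matrices `x' and `y' (illustrative data only): entry [i, j] of
     `cov(x, y)' should equal the covariance of column i of `x' with
     column j of `y'.  `use' is passed explicitly so that the sketch
     does not rely on its default.

```r
## Illustrative data: 4 observations of 2 and of 3 variables.
x <- cbind(a = c(1, 2, 4, 7), b = c(2, 1, 0, 3))
y <- cbind(u = c(5, 3, 4, 1), v = c(0, 2, 2, 6), w = c(1, 1, 2, 3))

## Rows of the result follow the columns of x, columns follow those of y.
C <- cov(x, y, use = "all.obs")
dim(C)                                                      # 2 3

## Entry ["b", "w"] is the covariance of column b of x with column w of y.
all.equal(C["b", "w"], cov(x[, "b"], y[, "w"], use = "all.obs"))  # TRUE
```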

_U_s_a_g_e:

     var(x, y = NULL, na.rm = FALSE, use)
     cor(x, y = NULL, use = "all.obs")
     cov(x, y = NULL, use = "all.obs")

_A_r_g_u_m_e_n_t_s:

       x: a numeric vector, matrix or data frame.

       y: `NULL' (default) or a vector, matrix or data frame with
          dimensions compatible with `x'.  The default is equivalent to
          `y = x' (but more efficient).

     use: an optional character string giving a method for computing
          covariances in the presence of missing values.  This must be
          (an abbreviation of) one of the strings `"all.obs"',
          `"complete.obs"' or `"pairwise.complete.obs"'.

_D_e_t_a_i_l_s:

     `var' is just another interface to `cov', where `na.rm' is used to
     determine the default for `use' when that is unspecified.  If
     `na.rm' is `TRUE' then the complete observations (rows) are used
     (`use = "complete"') to compute the variance.  Otherwise (`use =
     "all"'), `var' will give an error if there are missing values.

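     A small check of the `na.rm' rule, using a made-up vector `z' with
     one missing value: with `na.rm = TRUE', `var' should agree with
     `use = "complete.obs"' and with dropping the `NA' by hand, while
     asking for `use = "all.obs"' explicitly signals an error (caught
     here with `try').

```r
z <- c(2, 4, NA, 8, 10)            # illustrative data, one missing value

v1 <- var(z, na.rm = TRUE)         # complete observations only
v2 <- var(z, use = "complete.obs")
v3 <- var(z[!is.na(z)])            # drop the NA by hand
c(v1, v2, v3)                      # 13.33333 13.33333 13.33333

## With use = "all.obs" the missing value is an error; try() catches it.
res <- try(var(z, use = "all.obs"), silent = TRUE)
inherits(res, "try-error")         # TRUE
```
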
     If `use' is `"all.obs"', then the presence of missing observations
     will produce an error.  If `use' is `"complete.obs"', missing
     values are handled by casewise deletion: every observation (row)
     with a missing value is dropped before any entry is computed.
     Finally, if `use' has the value `"pairwise.complete.obs"', the
     covariance (correlation) between each pair of variables is
     computed using all complete pairs of observations on those two
     variables.  Because different entries can then be based on
     different subsets of rows, the resulting covariance or correlation
     matrices need not be positive semidefinite.
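
     To make the two deletion rules concrete, the sketch below uses a
     made-up 5 x 3 matrix `m' with missing values in rows 2 and 4.
     Under `"complete.obs"' every entry is computed from rows 1, 3 and
     5 only; under `"pairwise.complete.obs"' the ["p", "q"] entry uses
     every row in which both `p' and `q' are observed (rows 1, 3, 4
     and 5).

```r
m <- cbind(p = c(1, 2, 3, 4, 5),
           q = c(2, NA, 5, 4, 7),
           r = c(9, 7, 6, NA, 2))          # illustrative data

## Casewise deletion: rows 2 and 4 are dropped for every entry.
Cc <- cov(m, use = "complete.obs")
all.equal(Cc, cov(m[c(1, 3, 5), ], use = "all.obs"))     # TRUE

## Pairwise deletion: ["p", "q"] uses the rows where p and q are both present.
Cp <- cov(m, use = "pairwise.complete.obs")
ok <- complete.cases(m[, c("p", "q")])                   # rows 1, 3, 4, 5
all.equal(Cp["p", "q"], cov(m[ok, "p"], m[ok, "q"]))     # TRUE
```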

     The denominator n - 1 is used, which gives an unbiased estimator of
     the (co)variance for i.i.d. observations.  These functions return
     `NA' when there is only one observation, since n - 1 is then zero.
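
     Written out by hand, the sample covariance is
     sum((x - mean(x)) * (y - mean(y))) / (n - 1), and the correlation
     divides it by sd(x) * sd(y).  The sketch below compares that
     formula with `cov' and `cor' on two made-up vectors and then shows
     the single-observation case.

```r
x <- c(1, 3, 4, 8)                 # illustrative data
y <- c(2, 2, 5, 9)
n <- length(x)

## Sample covariance with the n - 1 denominator.
s_xy <- sum((x - mean(x)) * (y - mean(y))) / (n - 1)
all.equal(s_xy, cov(x, y))                    # TRUE

## Correlation = covariance scaled by both standard deviations.
all.equal(s_xy / (sd(x) * sd(y)), cor(x, y))  # TRUE

## A single observation leaves n - 1 = 0, so the result is NA.
var(5)                                        # NA
```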

_S_e_e _A_l_s_o:

     `cov.wt' for weighted covariance computation.
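
     As a hedged illustration, `cov.wt' with its default equal weights
     should reproduce `cov' on the same matrix; `X' and the unequal
     weights in the last line are arbitrary values chosen only to show
     how weights are passed.

```r
X <- cbind(a = c(1, 2, 4, 7, 9), b = c(3, 1, 4, 1, 5))   # illustrative data

## Default weights are equal (1/n each), so the estimate matches cov().
w <- cov.wt(X)
all.equal(w$cov, cov(X), check.attributes = FALSE)       # TRUE

## Arbitrary unequal weights (they sum to 1).
cov.wt(X, wt = c(0.1, 0.1, 0.2, 0.3, 0.3))$cov
```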

_E_x_a_m_p_l_e_s:

     var(1:10)        # 9.166667

     var(1:5, 1:5)    # 2.5

     ## Two simple vectors
     cor(1:10, 2:11)  # == 1

     ## var() & cov() are "really the same":
     stopifnot(var(1:5, 0:4) == cov(1:5, 0:4))

     ## Correlation Matrix of Multivariate sample:
     data(longley)
     (Cl <- cor(longley))
     ## Graphical Correlation Matrix:
     symnum(Cl) # highly correlated

     ##--- Missing value treatment:
     data(swiss)
     C1 <- cov(swiss)
     range(eigen(C1, only.values = TRUE)$values) # 6.19  1921
     swiss[1,2] <- swiss[7,3] <- swiss[25,5] <- NA # create 3 "missing"

     try(cov(swiss)) # Error: missing observations (use = "all.obs")

     C2 <- cov(swiss, use = "complete")
     range(eigen(C2, only.values = TRUE)$values) # 6.46  1930
     C3 <- cov(swiss, use = "pairwise")
     range(eigen(C3, only.values = TRUE)$values) # 6.19  1938

