cor {stats} | R Documentation |
Correlation, Variance and Covariance (Matrices)
Description
var, cov and cor compute the variance of x and the covariance or
correlation of x and y if these are vectors. If x and y are matrices
then the covariances (or correlations) between the columns of x and the
columns of y are computed.
cov2cor scales a covariance matrix into the corresponding correlation
matrix efficiently.
Usage
var(x, y = NULL, na.rm = FALSE, use)
cov(x, y = NULL, use = "all.obs",
    method = c("pearson", "kendall", "spearman"))
cor(x, y = NULL, use = "all.obs",
    method = c("pearson", "kendall", "spearman"))
cov2cor(V)
Arguments
x |
a numeric vector, matrix or data frame. |
y |
NULL (default) or a vector, matrix or data frame with dimensions compatible with x. |
na.rm |
logical. Should missing values be removed? |
use |
an optional character string giving a method for computing covariances
in the presence of missing values. This must be (an abbreviation of) one
of the strings "all.obs", "complete.obs" or "pairwise.complete.obs". |
method |
a character string indicating which correlation coefficient (or
covariance) is to be computed. One of "pearson" (default), "kendall" or
"spearman". |
V |
symmetric numeric matrix, usually positive definite such as a covariance matrix. |
Details
For cov and cor one must either give a matrix or data frame for x, or
give both x and y.
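As a rough illustration of the two calling conventions, the sketch below uses made-up matrices X and Y (random data with an arbitrary seed, not part of the original examples). It shows the matrix-only form, the two-argument form, and the error raised when a bare vector is passed to cor without y.

```r
set.seed(42)  # arbitrary seed, only so the sketch is reproducible
X <- matrix(rnorm(30), ncol = 3, dimnames = list(NULL, c("a", "b", "c")))
Y <- matrix(rnorm(20), ncol = 2, dimnames = list(NULL, c("u", "v")))

cXX <- cor(X)      # columns of X against each other: 3 x 3
cXY <- cor(X, Y)   # columns of X against columns of Y: 3 x 2

# A single entry of the cross-correlation matrix is the vector-vector result
one <- cor(X[, "a"], Y[, "u"])

# A lone vector is neither a matrix nor an (x, y) pair, so cor() refuses it
res <- try(cor(X[, "a"]), silent = TRUE)
```

Note that var, unlike cor, accepts a single vector, as in var(1:10).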
var is just another interface to cov, where na.rm is used to determine
the default for use when that is unspecified. If na.rm is TRUE then the
complete observations (rows) are used (use = "complete") to compute the
variance. Otherwise (use = "all"), var will give an error if there are
missing values.
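A minimal sketch of the na.rm = TRUE case, using an invented vector x with one missing value. It only checks that var with na.rm = TRUE agrees with cov on the complete observations; behaviour with na.rm = FALSE is deliberately not shown.

```r
x <- c(2, 4, NA, 8, 10)   # illustrative data with one missing value

v_narm  <- var(x, na.rm = TRUE)              # missing value dropped
v_cov   <- cov(x, x, use = "complete.obs")   # same computation via cov()
v_clean <- var(x[!is.na(x)])                 # dropping the NA by hand

# Remaining values 2, 4, 8, 10 have mean 6 and squared deviations
# 16 + 4 + 4 + 16 = 40, so the variance is 40 / 3.
```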
If use is "all.obs", then the presence of missing observations will
produce an error. If use is "complete.obs" then missing values are
handled by casewise deletion. Finally, if use has the value
"pairwise.complete.obs" then the correlation between each pair of
variables is computed using all complete pairs of observations on those
variables. This can result in covariance or correlation matrices which
are not positive semidefinite.
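The following sketch contrasts the three settings on invented data. The data frame d and the matrix m are hypothetical, and m is contrived so that each pair of columns overlaps on a different set of rows, which makes the pairwise result visibly indefinite.

```r
# Casewise deletion: only rows with no NA at all are used
d <- data.frame(p = c(1, 2, 3, 4, 5),
                q = c(2, 1, 4, NA, 5),
                r = c(5, 3, NA, 2, 1))
r_complete <- cor(d, use = "complete.obs")
r_omit     <- cor(na.omit(d))           # same rows, dropped by hand

# "all.obs" refuses data containing missing values
err <- try(cor(d, use = "all.obs"), silent = TRUE)

# Pairwise deletion: each pair of columns uses its own complete pairs.
# Here a~b and b~c are perfectly positive, a~c perfectly negative.
m <- cbind(a = c(1, 2, 3, NA, NA, NA, 1, 2, 3),
           b = c(1, 2, 3, 1, 2, 3, NA, NA, NA),
           c = c(NA, NA, NA, 1, 2, 3, 3, 2, 1))
r_pair  <- cor(m, use = "pairwise.complete.obs")
min_eig <- min(eigen(r_pair, symmetric = TRUE, only.values = TRUE)$values)
# min_eig is negative, so r_pair is not positive semidefinite
```

No single data set could produce the three pairwise correlations in r_pair at once, which is why the matrix has a negative eigenvalue.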
The denominator n - 1 is used, which gives an unbiased estimator of the
(co)variance for i.i.d. observations.
These functions return NA when there is only one observation (whereas
S-PLUS has been returning NaN), and fail if x has length zero.
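To make the n - 1 denominator concrete, here is a small hand computation on made-up vectors x and y, compared against var and cov. The last line shows the single-observation case, where n - 1 is zero and the result is NA.

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)   # illustrative data
y <- c(1, 3, 2, 5, 4, 6, 8, 7)
n <- length(x)

# Sums of squared deviations and cross products, divided by n - 1
var_manual <- sum((x - mean(x))^2) / (n - 1)
cov_manual <- sum((x - mean(x)) * (y - mean(y))) / (n - 1)

var_builtin <- var(x)
cov_builtin <- cov(x, y)

single <- var(5)   # one observation only: NA
```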
For cor(), if method is "kendall" or "spearman", Kendall's tau or
Spearman's rho statistic is used to estimate a rank-based measure of
association. These are more robust and have been recommended if the data
do not necessarily come from a bivariate normal distribution.
For cov(), a non-Pearson method is unusual but available for the sake of
completeness. Note that "spearman" basically computes cor(R(x), R(y))
(or cov(.,.)) where R(u) := rank(u, na.last="keep"). Notice also that
the ranking is (currently) done removing only cases that are missing on
the variable itself, which may not be what you expect if you let use be
"complete.obs" or "pairwise.complete.obs".
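The rank-based definitions can be checked by hand on a small example. The sketch below uses invented vectors without ties or missing values. The Kendall part is a simplified tie-free formula written for illustration, not the routine that cor uses internally.

```r
x <- c(3, 1, 4, 15, 9, 2, 6)     # illustrative data, no ties, no NAs
y <- c(2, 7, 1, 8, 28, 18, 5)

# Spearman's rho: Pearson correlation of the ranks
R <- function(u) rank(u, na.last = "keep")
rho_builtin <- cor(x, y, method = "spearman")
rho_ranks   <- cor(R(x), R(y))

# Kendall's tau without ties: sum of sign agreements over all ordered
# pairs (i, j), divided by the number of ordered pairs n * (n - 1)
n <- length(x)
s <- sum(sign(outer(x, x, "-")) * sign(outer(y, y, "-")))
tau_manual  <- s / (n * (n - 1))
tau_builtin <- cor(x, y, method = "kendall")
```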
Scaling a covariance matrix into a correlation matrix can be achieved in
many ways: mathematically most appealing is multiplication by a diagonal
matrix from the left and the right; more efficient is using
sweep(.., FUN = "/") twice. The cov2cor function is even a bit more
efficient, and is provided mostly for didactic reasons.
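The three routes just described can be compared directly. This is a sketch on a small invented covariance matrix V, with s holding the standard deviations taken from its diagonal.

```r
V <- cov(cbind(a = c(1, 2, 3, 4, 5),     # illustrative data
               b = c(2, 1, 4, 3, 6),
               c = c(5, 3, 4, 1, 2)))
s <- sqrt(diag(V))                        # standard deviations

# 1. Diagonal matrix applied from the left and from the right
r_diag <- diag(1 / s) %*% V %*% diag(1 / s)

# 2. Divide rows, then columns, by the standard deviations
r_sweep <- sweep(sweep(V, 1, s, FUN = "/"), 2, s, FUN = "/")

# 3. The dedicated helper
r_c2c <- cov2cor(V)
```

The values agree up to floating-point error. The dimnames may differ between the approaches, so comparisons are best made on unname()d matrices.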
Value
For r <- cor(*, use = "all.obs"), it is now guaranteed that all(r <= 1).
References
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
See Also
cor.test for confidence intervals (and tests).
cov.wt for weighted covariance computation.
sd for standard deviation (vectors).
Examples
var(1:10)       # 9.166667
var(1:5, 1:5)   # 2.5
## Two simple vectors
cor(1:10, 2:11) # == 1
## Correlation Matrix of Multivariate sample:
(Cl <- cor(longley))
## Graphical Correlation Matrix:
symnum(Cl) # highly correlated
## Spearman's rho and Kendall's tau
symnum(clS <- cor(longley, method = "spearman"))
symnum(clK <- cor(longley, method = "kendall"))
## How much do they differ?
i <- lower.tri(Cl)
cor(cbind(P = Cl[i], S = clS[i], K = clK[i]))
## cov2cor() scales a covariance matrix by its diagonal
## to become the correlation matrix.
cov2cor # see the function definition {and learn ..}
stopifnot(all.equal(Cl, cov2cor(cov(longley))),
          all.equal(cor(longley, method = "kendall"),
                    cov2cor(cov(longley, method = "kendall"))))
##--- Missing value treatment:
C1 <- cov(swiss)
range(eigen(C1, only.values = TRUE)$values) # 6.19 1921
swM <- swiss
swM[1,2] <- swM[7,3] <- swM[25,5] <- NA # create 3 "missing"
try(cov(swM)) # Error: missing obs...
C2 <- cov(swM, use = "complete")
range(eigen(C2, only.values = TRUE)$values) # 6.46 1930
C3 <- cov(swM, use = "pairwise")
range(eigen(C3, only.values = TRUE)$values) # 6.19 1938
(scM <- symnum(cor(swM, method = "kendall", use = "complete")))
## Kendall's tau doesn't change much: identical symnum codings!
identical(scM, symnum(cor(swiss, method = "kendall")))
all.equal(cov2cor(cov(swM, method = "kendall", use = "pairwise")),
          cor(swM, method = "kendall", use = "pairwise"))