density {base} | R Documentation |
Kernel Density Estimation
Description
The function density
computes kernel density estimates
with the given kernel and bandwidth.
The generic functions plot
and print
have
methods for density objects.
Usage
density(x, bw, adjust = 1,
kernel=c("gaussian", "epanechnikov", "rectangular", "triangular",
"biweight", "cosine", "optcosine"),
window = kernel, width,
give.Rkern = FALSE,
n = 512, from, to, cut = 3, na.rm = FALSE)
print(dobj)
plot(dobj, main = NULL, xlab = NULL, ylab = "Density", type = "l",
zero.line = TRUE, ...)
Arguments
x |
the data from which the estimate is to be computed. |
bw |
the smoothing bandwidth to be used. The kernels are scaled
such that this is the standard deviation of the smoothing kernel.
It defaults to 0.9 times the
minimum of the standard deviation and the interquartile range divided by
1.34 times the sample size to the negative one-fifth power
(= Silverman's “rule of thumb”) unless the quartiles
coincide where |
adjust |
the bandwidth used is actually |
kernel , window |
a character string giving the smoothing kernel to be used.
This must be one of
|
width |
this exists for compatibility with S; if given, and
|
give.Rkern |
logical; if true, no density is estimated, and
the “canonical bandwidth” of the chosen |
n |
the number of equally spaced points at which the density
is to be estimated. When |
from , to |
the left and right-most points of the grid at which the density is to be estimated. |
cut |
by default, the values of |
na.rm |
logical; if |
dobj |
a “density” object. |
main , xlab , ylab , type |
plotting parameters with useful defaults. |
... |
further plotting parameters. |
zero.line |
logical; if |
Details
The algorithm used in density
disperses the mass of the
empirical distribution function over a regular grid of at least 512
points and then uses the fast Fourier transform to convolve this
approximation with a discretized version of the kernel and then uses
linear approximation to evaluate the density at the specified points.
The statistical properties of a kernel are determined by
\sigma^2_K = \int t^2 K(t) dt
which is always = 1
for our kernels (and hence the bandwidth
bw
is the standard deviation of the kernel) and
R(K) = \int K^2(t) dt
.
MSE-equivalent bandwidths (for different kernels) are proportional to
\sigma_K R(K)
which is scale invariant and for our
kernels equal to R(K)
. This value is returned when
give.Rkern = TRUE
. See the examples for using exact equivalent
bandwidths.
Infinite values in x
are assumed to correspond to a point mass at
+/-Inf
and the density estimate is of the sub-density on
(-Inf, +Inf)
.
Value
If give.Rkern
is true, the number R(K)
, otherwise
an object with class "density"
whose
underlying structure is a list containing the following components.
x |
the |
y |
the estimated density values. |
bw |
the bandwidth used. |
N |
the sample size after elimination of missing values. |
call |
the call which produced the result. |
data.name |
the deparsed name of the |
has.na |
logical, for compatibility (always FALSE). |
References
Silverman, B. W. (1986) Density Estimation. London: Chapman and Hall.
Venables, W. N. and B. D. Ripley (1994, 7, 9) Modern Applied Statistics with S-PLUS. New York: Springer.
Scott, D. W. (1992) Multivariate Density Estimation. Theory, Practice and Visualization. New York: Wiley.
Sheather, S. J. and Jones M. C. (1991) A reliable data-based bandwidth selection method for kernel density estimation. J. Roy. Statist. Soc. B, 683–690.
See Also
hist
.
Examples
plot(density(c(-20,rep(0,98),20)), xlim = c(-4,4))# IQR = 0
# The Old Faithful geyser data
data(faithful)
d <- density(faithful$eruptions, bw = 0.15)
d
plot(d)
plot(d, type = "n")
polygon(d, col = "wheat")
## Missing values:
x <- xx <- faithful$eruptions
x[i.out <- sample(length(x), 10)] <- NA
doR <- density(x, bw = 0.15, na.rm = TRUE)
lines(doR, col = "blue")
points(xx[i.out], rep(.01,10))
(kernels <- eval(formals(density)$kernel))
plot (density(0,bw = 1))
for(i in 2:length(kernels))
lines(density(0,bw = 1, kern = kernels[i]), col = i)
mtext(side = 3, "R's density() kernels with bw = 1")
legend(1.5,.4, leg = kernels, col = seq(kernels),lty = 1, cex = .8, y.int = 1)
(RKs <- cbind(sapply(kernels, function(k)density(kern = k, give.Rkern = TRUE))))
100*round(RKs["epanechnikov",]/RKs, 4) ## Efficiencies
data(precip)
plot(density(precip, n = 2^13))
for(i in 2:length(kernels))
lines(density(precip, kern = kernels[i], n = 2^13), col = i)
mtext(side = 3, "same scale bandwidths, 7 different kernels")
## Bandwidth Adjustment for "Exactly Equivalent Kernels"
h.f <- sapply(kernels, function(k)density(kern = k, give.Rkern = TRUE))
(h.f <- (h.f["gaussian"] / h.f)^ .2)
## -> 1, 1.01, .995, 1.007,... close to 1 => adjustment barely visible..
plot(density(precip, n = 2^13))
for(i in 2:length(kernels))
lines(density(precip, adjust = h.f[i], kern = kernels[i], n = 2^13),
col = i)
mtext(side = 3, "equivalent bandwidths, 7 different kernels")
legend(55,.035, leg = kernels, col = seq(kernels), lty = 1)