quantile {stats} | R Documentation |
The generic function quantile produces sample quantiles corresponding to the given probabilities. The smallest observation corresponds to a probability of 0 and the largest to a probability of 1.
quantile(x, ...)
## Default S3 method:
quantile(x, probs = seq(0, 1, 0.25), na.rm = FALSE,
names = TRUE, type = 7, ...)
x | numeric vector whose sample quantiles are wanted. |
probs | numeric vector of probabilities with values in [0, 1]. |
na.rm | logical; if true, any NA and NaN values are removed from x before the quantiles are computed. |
names | logical; if true, the result has a names attribute. |
type | an integer between 1 and 9 selecting one of the nine quantile algorithms detailed below to be used. |
... | further arguments passed to or from other methods. |
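As a brief illustration of how these arguments interact, the sketch below uses a small made-up vector containing one missing value. The data and variable names are assumptions chosen only for the example.

```r
# Made-up data with one missing value (illustrative only).
x <- c(2.5, NA, 7.1, 3.3, 9.8, 1.2)

# With na.rm = FALSE (the default) missing values in x raise an error,
# so their removal is requested explicitly here.
q <- quantile(x, probs = c(0.1, 0.5, 0.9), na.rm = TRUE)
print(q)

# names = FALSE returns the same values without the percentage labels.
q_plain <- quantile(x, probs = c(0.1, 0.5, 0.9), na.rm = TRUE, names = FALSE)
print(q_plain)

# type selects one of the nine algorithms; 7 is the default.
q6 <- quantile(x, probs = 0.5, na.rm = TRUE, type = 6)
print(q6)
```

Dropping the names is mainly useful when the result is fed straight into further arithmetic.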
A vector of length length(probs) is returned; if names = TRUE, it has a names attribute.
NA and NaN values in probs are propagated to the result.
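A minimal check of the returned value, assuming a small made-up input: the result has one element per entry of probs, and a missing probability yields a missing quantile.

```r
# Illustrative data; the third probability is deliberately missing.
x <- c(10, 20, 30, 40, 50)
probs <- c(0, 0.5, NA, 1)

q <- quantile(x, probs)
print(q)

# One result per requested probability, with NA propagated.
print(length(q) == length(probs))
print(is.na(q[[3]]))
```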
quantile returns estimates of underlying distribution quantiles based on one or two order statistics from the supplied elements in x at probabilities in probs. One of the nine quantile algorithms discussed in Hyndman and Fan (1996), selected by type, is employed.
Sample quantiles of type $i$ are defined by
$$Q_{i}(p) = (1 - \gamma) x_{j} + \gamma x_{j+1}$$
where $1 \le i \le 9$, $\frac{j - m}{n} \le p < \frac{j - m + 1}{n}$, $x_{j}$ is the $j$th order statistic, $n$ is the sample size, and $m$ is a constant determined by the sample quantile type. Here $\gamma$ depends on the fractional part of $g = np + m - j$.
For the continuous sample quantile types (4 through 9), the sample quantiles can be obtained by linear interpolation between the points $(p(k), x_{k})$, where $x_{k}$ is the $k$th order statistic and
$$p(k) = \frac{k - \alpha}{n - \alpha - \beta + 1}$$
where $\alpha$ and $\beta$ are constants determined by the type. Further, $m = \alpha + p \left( 1 - \alpha - \beta \right)$, and $\gamma = g$.
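The definitions above translate fairly directly into code. The following is a minimal sketch for the continuous types, not the implementation used by quantile: quantile_ab is a hypothetical helper name, alpha and beta are passed in by the caller, and the sketch omits the floating-point safeguards a production implementation would likely want. It computes m, takes j as the integer part of np + m and g as the fractional part, then interpolates between the jth and (j+1)th order statistics, clamping to the smallest and largest observations.

```r
# Hypothetical helper (not part of the stats package): continuous sample
# quantiles from the constants alpha and beta.
quantile_ab <- function(x, p, alpha, beta) {
  xs <- sort(x)
  n <- length(xs)
  m <- alpha + p * (1 - alpha - beta)
  h <- n * p + m               # real-valued position n*p + m
  j <- floor(h)                # (j - m)/n <= p < (j - m + 1)/n
  g <- h - j                   # g = n*p + m - j; gamma = g here
  # Clamp indices so positions outside 1..n return the extremes.
  lo <- pmin(pmax(j, 1), n)
  hi <- pmin(pmax(j + 1, 1), n)
  (1 - g) * xs[lo] + g * xs[hi]
}

# Made-up data for a comparison with the built-in function.
x <- c(3.1, 0.4, 7.9, 5.5, 2.2, 9.6, 1.8, 6.3, 4.7, 8.0)
p <- c(0, 0.1, 0.25, 0.5, 0.9, 1)

# alpha = beta = 1 gives p(k) = (k - 1)/(n - 1), the default type 7.
mine <- quantile_ab(x, p, alpha = 1, beta = 1)
ref  <- quantile(x, p, type = 7, names = FALSE)
print(all.equal(mine, ref))
```

The comparison uses all.equal rather than exact equality because the two computations may differ by rounding error.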
Discontinuous sample quantile types 1, 2, and 3
Type 1: Inverse of empirical distribution function.
Type 2: Similar to type 1 but with averaging at discontinuities.
Type 3: SAS definition: nearest even order statistic.
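The three discontinuous types can be told apart on a tiny sample. The sketch below uses four made-up values and two probabilities chosen so that the types disagree; the data and labels are assumptions for illustration only.

```r
# Four made-up observations; np is 2 for p = 0.5 and 2.5 for p = 0.625.
x <- c(10, 20, 30, 40)
p <- c(0.5, 0.625)

# One column per type, one row per probability.
res <- sapply(1:3, function(type) quantile(x, p, type = type, names = FALSE))
dimnames(res) <- list(paste0("p=", p), paste0("type", 1:3))
print(res)
```

At p = 0.5 the empirical distribution function has a step exactly at the requested probability, so type 2 averages the two neighbouring observations while types 1 and 3 return an observed value. At p = 0.625 type 3 picks the even-numbered order statistic, which here differs from types 1 and 2.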
Continuous sample quantile types 4 through 9
Type 4: $p(k) = \frac{k}{n}$. That is, linear interpolation of the empirical cdf.
Type 5: $p(k) = \frac{k - 0.5}{n}$. That is a piecewise linear function where the knots are the values midway through the steps of the empirical cdf. This is popular amongst hydrologists.
Type 6: $p(k) = \frac{k}{n + 1}$. Thus $p(k) = \mbox{E}[F(x_{k})]$. This is used by Minitab and by SPSS.
Type 7: $p(k) = \frac{k - 1}{n - 1}$. In this case, $p(k) = \mbox{mode}[F(x_{k})]$. This is used by S.
Type 8: $p(k) = \frac{k - \frac{1}{3}}{n + \frac{1}{3}}$. Then $p(k) \approx \mbox{median}[F(x_{k})]$. The resulting quantile estimates are approximately median-unbiased regardless of the distribution of x.
Type 9: $p(k) = \frac{k - \frac{3}{8}}{n + \frac{1}{4}}$. The resulting quantile estimates are approximately unbiased for the expected order statistics if x is normally distributed.
Hyndman and Fan (1996) recommend type 8. The default method is type 7, as used by S and by R < 2.0.0.
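One way to see what the p(k) formulas mean is to check that, for each continuous type, the quantile requested at p(k) is the kth order statistic itself. The sketch below does this for a small made-up sample; the alpha and beta values in the table are read off the p(k) formulas listed for types 4 through 9, and the variable names are assumptions.

```r
# alpha and beta per type, derived from the listed p(k) formulas.
ab <- rbind(
  "4" = c(alpha = 0,   beta = 1),
  "5" = c(alpha = 1/2, beta = 1/2),
  "6" = c(alpha = 0,   beta = 0),
  "7" = c(alpha = 1,   beta = 1),
  "8" = c(alpha = 1/3, beta = 1/3),
  "9" = c(alpha = 3/8, beta = 3/8)
)

x <- c(1.2, 2.5, 3.3, 7.1, 9.8)   # made-up sample, already sorted
n <- length(x)
k <- seq_len(n)

# For each type, evaluate the quantile at the plotting positions p(k).
hits <- sapply(4:9, function(type) {
  a <- ab[as.character(type), "alpha"]
  b <- ab[as.character(type), "beta"]
  pk <- (k - a) / (n - a - b + 1)
  qk <- quantile(x, pk, type = type, names = FALSE)
  cat("type", type, "p(k):", format(round(pk, 3)), "\n")
  isTRUE(all.equal(qk, x))        # does Q(p(k)) recover x_k?
})
names(hits) <- 4:9
print(hits)
```

The types therefore differ only in where they place the knots p(k) on the probability axis, which is why they disagree most in small samples and in the tails.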
Authors of the version used in R >= 2.0.0: Ivan Frohne and Rob J Hyndman.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
Hyndman, R. J. and Fan, Y. (1996) Sample quantiles in statistical packages, American Statistician, 50, 361–365.
See also ecdf for empirical distributions of which quantile is an inverse; boxplot.stats and fivenum for computing other versions of quartiles, etc.
x <- rnorm(1001)
quantile(x)  # Extremes & Quartiles by default
quantile(x, probs = c(0.1, 0.5, 1, 2, 5, 10, 50, NA) / 100)

### Compare different types
p <- c(0.1, 0.5, 1, 2, 5, 10, 50) / 100
res <- matrix(NA_real_, 9, 7)  # one row per type, one column per probability
for (type in 1:9) res[type, ] <- y <- quantile(x, p, type = type)
dimnames(res) <- list(1:9, names(y))
round(res, 3)