R: Sample Quantiles

quantile {stats}

R Documentation

Sample Quantiles

Description

The generic function quantile produces sample quantiles corresponding to the given probabilities. The smallest observation corresponds to a probability of 0 and the largest to a probability of 1.

Usage

quantile(x, ...)

## Default S3 method:
quantile(x, probs = seq(0, 1, 0.25), na.rm = FALSE,
         names = TRUE, type = 7, ...)

Arguments

`x`	numeric vector whose sample quantiles are wanted. `NA` and `NaN` values are not allowed unless `na.rm` is `TRUE`.
`probs`	numeric vector of probabilities with values in `[0,1]`. (As of R 2.8.0, values up to 2e-14 outside that range are accepted and moved to the nearby endpoint.)
`na.rm`	logical; if true, any `NA` and `NaN` values are removed from `x` before the quantiles are computed.
`names`	logical; if true, the result has a `names` attribute. Set to `FALSE` for speedup with many `probs`.
`type`	an integer between 1 and 9 selecting one of the nine quantile algorithms detailed below to be used.
`...`	further arguments passed to or from other methods.
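
A brief sketch of how `na.rm` and `names` behave, using a small made-up vector (the data and variable names are illustrative only):

```r
x <- c(3, 1, 4, 1, 5, 9, 2, 6, NA)

# With the default na.rm = FALSE, a missing value in x raises an error.
res <- tryCatch(quantile(x), error = function(e) "error")

# na.rm = TRUE drops NA/NaN before the quantiles are computed.
q <- quantile(x, probs = c(0.1, 0.5, 0.9), na.rm = TRUE)
names(q)        # percentage labels built from probs

# names = FALSE returns a bare numeric vector.
q_plain <- quantile(x, probs = c(0.1, 0.5, 0.9), na.rm = TRUE, names = FALSE)
names(q_plain)  # NULL
```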

Details

A vector of length length(probs) is returned; if names = TRUE, it has a names attribute.

NA and NaN values in probs are propagated to the result.
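
For illustration, a short check of the result length and of `NA` propagation (the input values are arbitrary):

```r
x <- c(10, 20, 30, 40, 50)
probs <- c(0, 0.5, NA, 1)

q <- quantile(x, probs = probs)

length(q) == length(probs)  # one result per requested probability
is.na(q[3])                 # the NA in probs shows up as NA in the result
```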

Types

quantile returns estimates of underlying distribution quantiles based on one or two order statistics from the supplied elements in x at probabilities in probs. One of the nine quantile algorithms discussed in Hyndman and Fan (1996), selected by type, is employed.

Sample quantiles of type i are defined by

Q_{i}(p) = (1 - \gamma)x_{j} + \gamma x_{j+1}

where 1 \le i \le 9, \frac{j - m}{n} \le p < \frac{j - m + 1}{n}, x_{j} is the jth order statistic, n is the sample size, and m is a constant determined by the sample quantile type. Here \gamma depends on the fractional part of g = np+m-j.

For the continuous sample quantile types (4 through 9), the sample quantiles can be obtained by linearly interpolating the points (p(k), x_{k}), where x_{k} is the kth order statistic and p(k) is defined by:

p(k) = \frac{k - \alpha} {n - \alpha - \beta + 1}

where \alpha and \beta are constants determined by the type. Further, m = \alpha + p \left( 1 - \alpha - \beta \right), and \gamma = g.
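
The definitions above can be transcribed almost literally into R. The following is a minimal sketch for the continuous types only, not the code used by `quantile` itself; the function name `manual_quantile` and the clamping of j at the ends of the sample are assumptions made for this example. Setting \alpha = \beta = 1 gives p(k) = (k - 1)/(n - 1) (type 7), and \alpha = \beta = 1/3 gives p(k) = (k - 1/3)/(n + 1/3) (type 8).

```r
# Hypothetical helper: Q(p) = (1 - gamma) * x_j + gamma * x_{j+1}, with gamma = g.
manual_quantile <- function(x, p, alpha, beta) {
  xs <- sort(x)
  n <- length(xs)
  m <- alpha + p * (1 - alpha - beta)
  j <- floor(n * p + m)
  g <- n * p + m - j
  # Assumption: indices below 1 or above n are clamped to the sample extremes.
  lo <- xs[pmin(pmax(j, 1), n)]
  hi <- xs[pmin(pmax(j + 1, 1), n)]
  (1 - g) * lo + g * hi
}

x <- c(2, 4, 7, 11, 16)
p <- c(0, 0.1, 0.25, 0.5, 0.9, 1)

all.equal(manual_quantile(x, p, 1, 1),
          quantile(x, p, type = 7, names = FALSE))
all.equal(manual_quantile(x, p, 1/3, 1/3),
          quantile(x, p, type = 8, names = FALSE))
```

Because the continuous estimators are continuous in p, a rounding error in `floor` shifts j and g together and leaves the result unchanged to numerical precision; the discontinuous types would need more careful handling.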

Discontinuous sample quantile types 1, 2, and 3

Type 1: Inverse of empirical distribution function.
Type 2: Similar to type 1 but with averaging at discontinuities.
Type 3: SAS definition: nearest even order statistic.
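
A small illustration of how the three discontinuous types can differ on the same data (the vector and probabilities are chosen only to expose the differences):

```r
x <- c(1, 2, 3, 4)
p <- c(0.5, 0.625)

# One column per type (1, 2, 3), one row per probability.
disc <- sapply(1:3, function(t) quantile(x, p, type = t, names = FALSE))
disc
# p = 0.5:   type 1 gives 2, type 2 averages to 2.5, type 3 gives 2
# p = 0.625: types 1 and 2 give 3, type 3 gives 2
```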

Continuous sample quantile types 4 through 9

Type 4: p(k) = \frac{k}{n}. That is, linear interpolation of the empirical cdf.
Type 5: p(k) = \frac{k - 0.5}{n}. That is a piecewise linear function where the knots are the values midway through the steps of the empirical cdf. This is popular amongst hydrologists.
Type 6: p(k) = \frac{k}{n + 1}. Thus p(k) = \mbox{E}[F(x_{k})]. This is used by Minitab and by SPSS.
Type 7: p(k) = \frac{k - 1}{n - 1}. In this case, p(k) = \mbox{mode}[F(x_{k})]. This is used by S.
Type 8: p(k) = \frac{k - \frac{1}{3}}{n + \frac{1}{3}}. Then p(k) \approx \mbox{median}[F(x_{k})]. The resulting quantile estimates are approximately median-unbiased regardless of the distribution of x.
Type 9: p(k) = \frac{k - \frac{3}{8}}{n + \frac{1}{4}}. The resulting quantile estimates are approximately unbiased for the expected order statistics if x is normally distributed.
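
As a rough check of the interpolation view, the p(k) formulas listed above can be fed to `approx` and compared with `quantile`. This is a sketch on arbitrary data; using `rule = 2` to return the sample extremes outside the range of p(k) is an assumption of the example.

```r
x <- c(2, 4, 7, 11, 16)
n <- length(x)
k <- seq_len(n)

# Plotting positions p(k) transcribed from the list of types 4 through 9.
pk <- list(
  "4" = k / n,
  "5" = (k - 0.5) / n,
  "6" = k / (n + 1),
  "7" = (k - 1) / (n - 1),
  "8" = (k - 1/3) / (n + 1/3),
  "9" = (k - 3/8) / (n + 1/4)
)

p <- c(0.1, 0.25, 0.5, 0.75, 0.9)

# Linear interpolation through the points (p(k), x_k), clamped at the extremes.
by_interp <- sapply(pk, function(pos) approx(pos, sort(x), xout = p, rule = 2)$y)
by_quantile <- sapply(4:9, function(t) quantile(x, p, type = t, names = FALSE))

all.equal(unname(by_interp), by_quantile)
```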

Hyndman and Fan (1996) recommend type 8. The default method is type 7, as used by S and by R < 2.0.0.

Author(s)

of the version used in R >= 2.0.0, Ivan Frohne and Rob J Hyndman.

References

Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.

Hyndman, R. J. and Fan, Y. (1996) Sample quantiles in statistical packages, American Statistician, 50, 361–365.

Examples

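x <- rnorm(1001)
quantile(x)  # extremes and quartiles by default
# Percentages divided by 100; the NA in probs is propagated to the result
quantile(x, probs = c(0.1, 0.5, 1, 2, 5, 10, 50, NA) / 100)

### Compare different types
p <- c(0.1, 0.5, 1, 2, 5, 10, 50) / 100
# One row per type, one column per probability
res <- matrix(NA_real_, nrow = 9, ncol = length(p))
for (type in 1:9) res[type, ] <- y <- quantile(x, p, type = type)
dimnames(res) <- list(1:9, names(y))
round(res, 3)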

[Package stats version 2.9.0]