| findInterval {base} | R Documentation |
Find Interval Numbers or Indices
Description
Find the indices of x in vec, where vec must be
sorted (non-decreasingly); i.e., if i <- findInterval(x,v),
we have v_{i_j} \le x_j < v_{i_j + 1}
where v_0 := -\infty,
v_{N+1} := +\infty, and N <- length(vec).
At the two boundaries, the returned index may differ by 1, depending
on the optional arguments rightmost.closed and all.inside.
Usage
findInterval(x, vec, rightmost.closed = FALSE, all.inside = FALSE)
Arguments
x |
numeric. |
vec |
numeric, sorted (weakly) increasingly, of length N. |
rightmost.closed |
logical; if true, the rightmost interval,
vec[N-1] .. vec[N], is treated as closed; see Details. |
all.inside |
logical; if true, the returned indices are coerced
into 1:(N-1); see Value. |
Details
The function findInterval finds the index of one vector x in
another, vec, where the latter must be non-decreasing. Although
this is trivial, being equivalent to apply( outer(x, vec, ">="), 1, sum),
the internal algorithm uses interval search,
ensuring O(n \log N) complexity where
n <- length(x) (and N <- length(vec)). For (almost)
sorted x, it will be even faster, basically O(n).
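The equivalence with the brute-force count can be checked on a small case. The following is only an illustrative sketch; the values of x and vec are made up for the purpose and are not part of the original examples.

```r
vec <- c(0, 1, 2.5, 4)               # sorted break points, N = 4
x   <- c(-1, 0, 0.5, 2.5, 3, 4, 10)  # values to locate

i <- findInterval(x, vec)
i                                    # 0 1 1 3 3 4 4

## Brute force: for each x[j], count the vec[k] with vec[k] <= x[j]
brute <- apply(outer(x, vec, ">="), 1, sum)
stopifnot(i == brute)
```

An index of 0 means x[j] lies below vec[1], and an index of N means x[j] is at or above vec[N].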
This is the same computation as for the empirical distribution
function, and indeed, findInterval(t, sort(X)) is
identical to n F_n(t; X_1,\dots,X_n) where F_n is the empirical distribution
function of X_1,\dots,X_n.
When rightmost.closed = TRUE, the result
for x[j] = vec[N] ( = \max(vec)), is N - 1 as for
all other values in the last interval.
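A minimal sketch of the effect of rightmost.closed, again with made-up values: only the point exactly at max(vec) changes its index, while points beyond it still get N.

```r
vec <- c(0, 1, 2.5, 4)               # N = 4
x   <- c(2.5, 3, 4, 4.1)

open   <- findInterval(x, vec)                           # 3 3 4 4
closed <- findInterval(x, vec, rightmost.closed = TRUE)  # 3 3 3 4

## Only x == max(vec) is affected
x[open != closed]                                        # 4
```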
Value
vector of length length(x) with values in 0:N where
N <- length(vec), or values coerced to 1:(N-1) iff
all.inside = TRUE (equivalently coercing all x values
inside the intervals).
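To illustrate the two possible ranges of the result, here is a small sketch (the inputs are hypothetical): with all.inside = TRUE, an index of 0 becomes 1 and an index of N becomes N - 1, so every result can be used to address an interval vec[i] .. vec[i+1].

```r
vec <- c(0, 1, 2.5, 4)               # N = 4
x   <- c(-1, 0.5, 3, 4, 10)

raw    <- findInterval(x, vec)                     # 0 1 3 4 4
inside <- findInterval(x, vec, all.inside = TRUE)  # 1 1 3 3 3

## With all.inside = TRUE both interval end points always exist
cbind(x, lower = vec[inside], upper = vec[inside + 1])
```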
Author(s)
Martin Maechler
See Also
approx(*, method = "constant"), which is a
generalization of findInterval(); ecdf, for
computing the empirical distribution function, which is (up to a factor
of n) basically the same as findInterval(.).
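The relation to approx() can be sketched as follows. This assumes strictly increasing vec (no ties) and uses the yleft and yright arguments to supply the values 0 and N outside the range of vec. The inputs are made up for illustration.

```r
vec <- c(0, 1, 2.5, 4)               # strictly increasing, N = 4
x   <- c(-1, 0, 0.5, 2.5, 3, 4, 10)
N   <- length(vec)

## Right-continuous step function taking the value k on [vec[k], vec[k+1])
a <- approx(vec, seq_along(vec), xout = x, method = "constant",
            yleft = 0, yright = N)$y

stopifnot(a == findInterval(x, vec))
```

Replacing seq_along(vec) by other y values gives a general step function, which is the sense in which approx() generalizes findInterval().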
Examples
N <- 100
X <- sort(round(rt(N, df = 2), 2))
tt <- c(-100, seq(-2, 2, length.out = 201), +100)
it <- findInterval(tt, X)
tt[it < 1 | it >= N] # only first and last are outside range(X)
## See that this is N * Fn(.) :
tt <- c(tt, X)
eps <- 100 * .Machine$double.eps
## ecdf() is in package 'stats' in current versions of R (formerly in 'stepfun')
stopifnot(it[c(1, 203)] == c(0, 100),
          all.equal(N * ecdf(X)(tt),
                    findInterval(tt, X), tolerance = eps),
          findInterval(tt, X) == apply(outer(tt, X, ">="), 1, sum))