agrep {base} | R Documentation |
Searches for approximate matches to pattern (the first argument) within the string x (the second argument) using the Levenshtein edit distance.
agrep(pattern, x, ignore.case = FALSE, value = FALSE,
max.distance = 0.1, useBytes = FALSE)
pattern |
a non-empty character string to be matched (not
a regular expression!). Coerced by |
x |
character vector where matches are sought. Coerced by
|
ignore.case |
if |
value |
if |
max.distance |
Maximum distance allowed for a match. Expressed either as an integer, or as a fraction of the pattern length (which is replaced by the smallest integer not less than that fraction of the pattern length), or as a list with possible components
If |
useBytes |
logical. In a multibyte locale, should the comparison be character-by-character (the default) or byte-by-byte? |
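The rounding rule for a fractional max.distance can be sketched as follows. This is only an illustration of the documented rule ("the smallest integer not less than the corresponding fraction of the pattern length"); allowed_edits is a hypothetical helper introduced here, not a function provided by agrep or base R.

```r
# Hypothetical helper: number of edits allowed when max.distance is
# given as a fraction of the pattern length (documented rounding rule).
allowed_edits <- function(pattern, fraction) {
  ceiling(fraction * nchar(pattern))
}

allowed_edits("laysy", 0.1)   # 0.1 * 5 = 0.5, rounded up to 1
allowed_edits("laysy", 0.5)   # 0.5 * 5 = 2.5, rounded up to 3
```

Because of the rounding up, the default of 0.1 allows one edit for any pattern shorter than ten characters.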
The Levenshtein edit distance is used as the measure of approximateness: it is the minimal total number of insertions, deletions and substitutions required to transform one string into another.
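To make the definition concrete, here is a minimal sketch of the textbook dynamic-programming computation of the Levenshtein distance between two whole strings. lev_dist is a hypothetical helper written for illustration; it is not the apse code used by agrep, and agrep itself looks for an approximate match of pattern anywhere within x rather than comparing whole strings.

```r
# Hypothetical helper, not part of base R: Levenshtein distance between
# two single strings, computed by dynamic programming.
lev_dist <- function(a, b) {
  s <- strsplit(a, "")[[1]]
  t <- strsplit(b, "")[[1]]
  n <- length(s)
  m <- length(t)
  # d[i + 1, j + 1] is the distance between the first i characters of a
  # and the first j characters of b.
  d <- matrix(0L, n + 1, m + 1)
  d[, 1] <- 0:n   # delete all i characters
  d[1, ] <- 0:m   # insert all j characters
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      cost <- if (s[i] == t[j]) 0L else 1L
      d[i + 1, j + 1] <- min(d[i, j + 1] + 1L,  # deletion
                             d[i + 1, j] + 1L,  # insertion
                             d[i, j] + cost)    # substitution or match
    }
  }
  d[n + 1, m + 1]
}

lev_dist("lasy", "lazy")    # 1: substitute s -> z
lev_dist("laysy", "lazy")   # 2: delete y, substitute s -> z
```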
The function is a simple interface to the apse library developed by Jarkko Hietaniemi (also used in the Perl String::Approx module), modified to work with multibyte character sets. To save space it only supports the first 65536 characters of UTF-8 (where all the characters for human languages lie). Note that it can be quite slow in UTF-8, and useBytes = TRUE will be much faster.
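A minimal sketch of switching to byte-wise comparison is shown below. The input vector is illustrative data, not taken from the original examples, and is deliberately ASCII-only, which is the case where the two modes would be expected to agree; no timing is claimed here.

```r
x <- c("1 lazy 2", "nothing here")   # illustrative ASCII-only input

by_char <- agrep("lasy", x)                    # character-by-character (default)
by_byte <- agrep("lasy", x, useBytes = TRUE)   # byte-by-byte

identical(by_char, by_byte)
```

For input containing multibyte characters the two modes can differ, since a single character may then count as several bytes.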
Either a vector giving the indices of the elements that yielded a match, or, if value is TRUE, the matched elements (after coercion, preserving names but no other attributes).
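The two return forms can be compared with a short sketch. The names given to the elements of x are made up for illustration; the strings themselves follow the examples on this page.

```r
# Element names are hypothetical, added only to show that they are kept.
x <- c(first = "1 lazy", second = "1", third = "1 LAZY")

idx  <- agrep("laysy", x, max.distance = 2)                # indices of matches
vals <- agrep("laysy", x, max.distance = 2, value = TRUE)  # matched elements

idx
vals
```

Under these assumptions, vals contains the same strings as x[idx], so value = TRUE saves the extra subsetting step.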
Original version by David Meyer, based on C code by Jarkko Hietaniemi.
grep
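# Default max.distance = 0.1, i.e. ceiling(0.1 * 4) = 1 edit allowed
agrep("lasy", "1 lazy 2")
# Disallow substitutions
agrep("lasy", c(" 1 lazy 2", "1 lasy 2"), max.distance = list(sub = 0))
# Allow up to 2 edits; return indices, then the matched elements
agrep("laysy", c("1 lazy", "1", "1 LAZY"), max.distance = 2)
agrep("laysy", c("1 lazy", "1", "1 LAZY"), max.distance = 2, value = TRUE)
# Same search, ignoring case
agrep("laysy", c("1 lazy", "1", "1 LAZY"), max.distance = 2, ignore.case = TRUE)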