This help topic is for R version 1.7.1. For the current version of R, try https://stat.ethz.ch/R-manual/R-patched/library/base/html/agrep.html
agrep {base} R Documentation

Approximate String Matching (Fuzzy Matching)

Description

Searches for approximate matches to pattern (the first argument) within each element of the character vector x (the second argument) using the Levenshtein edit distance.

Usage

agrep(pattern, x, ignore.case = FALSE, value = FALSE, max.distance = 0.1)

Arguments

pattern

a non-empty character string to be matched (not a regular expression!)

x

character vector where matches are sought.

ignore.case

if FALSE, the pattern matching is case sensitive and if TRUE, case is ignored during matching.

value

if FALSE, a vector containing the (integer) indices of the matching elements is returned; if TRUE, a vector containing the matching elements themselves is returned.

max.distance

Maximum distance allowed for a match. Expressed either as an integer, or as a fraction of the pattern length (which is replaced by the smallest integer not less than the corresponding fraction), or as a list with possible components

all:

maximal (overall) distance

insertions:

maximum number/fraction of insertions

deletions:

maximum number/fraction of deletions

substitutions:

maximum number/fraction of substitutions

If all is missing, it is set to 10%; the other components default to all. The component names can be abbreviated.
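
As a rough illustration of the rules above, the following sketch computes the integer limit implied by a fractional max.distance and builds a list-valued limit. The variable names (pattern, frac, limit, no_subs) and the input strings are chosen for this example only, and the use of ceiling() is an assumption about how the "smallest integer not less than" rounding can be reproduced, not a statement about the internal implementation.

```r
pattern <- "lasy"
frac <- 0.1                                # the default max.distance
# Assumed equivalent of "smallest integer not less than the fraction":
# 0.1 * 4 characters = 0.4, which rounds up to 1 allowed edit.
limit <- ceiling(frac * nchar(pattern))
limit

# List form: at most 2 edits overall, none of them substitutions.
# Components that are not given (insertions, deletions) default to 'all'.
no_subs <- list(all = 2, substitutions = 0)
agrep(pattern, c("1 lazy 2", "1 lasy 2"), max.distance = no_subs, value = TRUE)
```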

Details

The Levenshtein edit distance is used as the measure of approximateness: it is the total number of insertions, deletions and substitutions required to transform one string into another.
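
The definition can be made concrete with a small dynamic-programming sketch. The function lev_dist below is a hypothetical helper written for this illustration; it is not part of base R and is not the code agrep uses. It also compares two whole strings, whereas agrep looks for an approximate occurrence of pattern within each element of x.

```r
# Hypothetical helper (not part of base R): Levenshtein distance
# between two complete strings.
lev_dist <- function(a, b) {
  a <- strsplit(a, "")[[1]]
  b <- strsplit(b, "")[[1]]
  n <- length(a)
  m <- length(b)
  # d[i + 1, j + 1] holds the distance between the first i characters
  # of a and the first j characters of b.
  d <- matrix(0, n + 1, m + 1)
  d[, 1] <- 0:n   # delete all i characters of a
  d[1, ] <- 0:m   # insert all j characters of b
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      cost <- if (a[i] == b[j]) 0 else 1
      d[i + 1, j + 1] <- min(d[i, j + 1] + 1,   # deletion
                             d[i + 1, j] + 1,   # insertion
                             d[i, j] + cost)    # substitution (or match)
    }
  }
  d[n + 1, m + 1]
}

lev_dist("lasy", "lazy")    # 1: substitute "s" with "z"
lev_dist("laysy", "lazy")   # 2: delete one "y", substitute "s" with "z"
```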

The function is a simple interface to the apse library developed by Jarkko Hietaniemi (also used in the Perl String::Approx module).

Value

Either a vector giving the indices of the elements that yielded a match, or, if value is TRUE, the matched elements.

Author(s)

David Meyer David.Meyer@ci.tuwien.ac.at (based on C code by Jarkko Hietaniemi); modifications by Kurt Hornik

See Also

grep

Examples

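# Default max.distance = 0.1, a fraction of the pattern length
agrep("lasy", "1 lazy 2")
# Disallow substitutions ('max' and 'sub' abbreviate max.distance and substitutions)
agrep("lasy", "1 lazy 2", max = list(sub = 0))
# Allow up to 2 edits; returns the indices of the matching elements
agrep("laysy", c("1 lazy", "1", "1 LAZY"), max = 2)
# Same search, returning the matching elements themselves
agrep("laysy", c("1 lazy", "1", "1 LAZY"), max = 2, value = TRUE)
# Same search, ignoring case
agrep("laysy", c("1 lazy", "1", "1 LAZY"), max = 2, ignore.case = TRUE)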

[Package base version 1.7.1 ]