grep {base} | R Documentation |
grep searches for matches to pattern (its first argument) within the character vector x (second argument). regexpr does too, but returns more detail in a different format.
sub and gsub perform replacement of matches determined by regular expression matching.
grep(pattern, x, ignore.case=FALSE, extended=TRUE, perl=FALSE, value=FALSE)
sub(pattern, replacement, x,
    ignore.case=FALSE, extended=TRUE, perl=FALSE)
gsub(pattern, replacement, x,
     ignore.case=FALSE, extended=TRUE, perl=FALSE)
regexpr(pattern, text, extended=TRUE, perl=FALSE)
pattern |
character string containing a regular expression to be matched in the given character vector. |
x , text |
a character vector where matches are sought. |
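ignore.case |
if FALSE (the default), the pattern matching is case sensitive; if TRUE, case is ignored during matching. |
extended |
if TRUE (the default), extended regular expression matching is used; if FALSE, basic regular expressions are used. |
perl |
logical. Should perl-compatible regexps be used if available? Has priority over extended. |
value |
if FALSE (the default), a vector containing the integer indices of the matching elements is returned; if TRUE, a vector containing the matching elements themselves is returned. |
replacement |
a replacement for the matched pattern in sub and gsub. |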
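As a rough illustration of how the ignore.case and value arguments change what grep returns, here is a minimal sketch. The vector fruits is hypothetical sample data, not part of the original page.

```r
# Hypothetical sample data, used only for illustration
fruits <- c("Apple", "banana", "apricot", "Cherry")

# Default: case-sensitive matching, integer indices returned
idx <- grep("^a", fruits)

# ignore.case = TRUE also lets "^a" match the capitalised "Apple"
idx_ci <- grep("^a", fruits, ignore.case = TRUE)

# value = TRUE returns the matching elements rather than their indices
hits <- grep("^a", fruits, ignore.case = TRUE, value = TRUE)

print(idx)     # 3
print(idx_ci)  # 1 3
print(hits)    # "Apple" "apricot"
```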
The two *sub functions differ only in that sub replaces only the first occurrence of a pattern whereas gsub replaces all occurrences.
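The difference can be seen in a short sketch; the input string s is an illustrative assumption.

```r
# Illustrative input string (not from the original page)
s <- "one fish, two fish, red fish"

# sub() replaces only the first match in each element
first_only <- sub("fish", "cat", s)

# gsub() replaces every match in each element
all_of_them <- gsub("fish", "cat", s)

print(first_only)   # "one cat, two fish, red fish"
print(all_of_them)  # "one cat, two cat, red cat"
```

Both functions work element by element, so a character vector of length n gives back a character vector of length n.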
The regular expressions used are those specified by POSIX 1003.2, either extended or basic, depending on the value of the extended argument, unless perl = TRUE, when they are those of PCRE, ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/.
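To show where the choice of regular expression flavour matters, the sketch below uses a lookahead, which is PCRE syntax and so needs perl = TRUE. It assumes an R build with PCRE support; the vector x is hypothetical sample data.

```r
# Hypothetical sample data
x <- c("foobar", "foobaz", "barfoo")

# A plain anchored pattern works with the default (POSIX) matching
posix_hits <- grep("^foo", x)

# Lookahead "(?=...)" is PCRE syntax: match "foo" only when "bar" follows.
# Assumes R was built with PCRE, so that perl = TRUE is available.
perl_hits <- grep("foo(?=bar)", x, perl = TRUE)

# The lookahead is not consumed, so only "foo" itself is replaced
replaced <- sub("foo(?=bar)", "X", x, perl = TRUE)

print(posix_hits)  # 1 2
print(perl_hits)   # 1
print(replaced)    # "Xbar" "foobaz" "barfoo"
```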
For grep, a vector giving either the indices of the elements of x that yielded a match or, if value is TRUE, the matched elements.
For sub and gsub, a character vector of the same length as the original.
For regexpr, an integer vector of the same length as text giving the starting position of the first match, or -1 if there is none, with attribute "match.length" giving the length of the matched text (or -1 for no match).
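One way to use the regexpr result is to combine the start positions with the "match.length" attribute to pull out the matched text. This is a minimal sketch; words and the helper variable names are assumptions made for illustration.

```r
# Hypothetical sample data
words <- c("tent", "garden", "sky")

m <- regexpr("en", words)

# Start of the first match in each element, -1 where there is none
starts <- as.integer(m)

# Length of each match, stored as an attribute on the result
lens <- attr(m, "match.length")

# Extract the matched text for the elements that did match
found <- starts > 0
matched <- substring(words[found], starts[found],
                     starts[found] + lens[found] - 1)

print(starts)   # 2 5 -1
print(lens)     # 2 2 -1
print(matched)  # "en" "en"
```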
perl=TRUE will only be available if R was compiled against PCRE: this is detected at configure time. All Unix and Windows systems should have it.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole (grep).
agrep for approximate matching.
tolower, toupper and chartr for character translations. charmatch, pmatch, match. apropos uses regexps and has nice examples.
grep("[a-z]", letters)
txt <- c("arm","foot","lefroo", "bafoobar")
if(any(i <- grep("foo",txt)))
    cat("`foo' appears at least once in\n\t",txt,"\n")
i # 2 and 4
txt[i]
## Double all 'a' or 'b's; "\" must be escaped, i.e. `doubled'
gsub("([ab])", "\\1_\\1_", "abc and ABC")
txt <- c("The", "licenses", "for", "most", "software", "are",
         "designed", "to", "take", "away", "your", "freedom",
         "to", "share", "and", "change", "it.",
         "", "By", "contrast,", "the", "GNU", "General", "Public", "License",
         "is", "intended", "to", "guarantee", "your", "freedom", "to",
         "share", "and", "change", "free", "software", "--",
         "to", "make", "sure", "the", "software", "is",
         "free", "for", "all", "its", "users")
( i <- grep("[gu]", txt) ) # indices
stopifnot( txt[i] == grep("[gu]", txt, value = TRUE) )
(ot <- sub("[b-e]",".", txt))
txt[ot != gsub("[b-e]",".", txt)]#- gsub does "global" substitution
txt[gsub("g","#", txt) !=
    gsub("g","#", txt, ignore.case = TRUE)] # the "G" words
regexpr("en", txt)
## trim trailing white space
str <- 'Now is the time '
sub(' +$', '', str) ## spaces only
sub('[[:space:]]+$', '', str) ## white space, POSIX-style
if(capabilities("PCRE"))
    sub('\\s+$', '', str, perl = TRUE) ## perl-style white space