strsplit {base}

R Documentation

Split the Elements of a Character Vector

Description

Split the elements of a character vector x into substrings according to the presence of substring split within them.

Usage

strsplit(x, split, extended = TRUE, fixed = FALSE, perl = FALSE)

Arguments

x

character vector, each element of which is to be split.

split

character vector containing regular expression(s) (unless fixed = TRUE) to use as “split”. If empty matches occur, in particular if split has length 0, x is split into single characters. If split has length greater than 1, it is recycled along x.

extended

logical. If TRUE, extended regular expression matching is used, and if FALSE, basic regular expressions are used.

fixed

logical. If TRUE, match split exactly as a literal string; otherwise use regular expressions.

perl

logical. Should Perl-compatible regular expressions be used? Takes priority over extended.
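
As a minimal sketch of how split is recycled along x, the following pairs each string with its own separator. The input strings and separators are made up for illustration. The code assumes a recent R release, so it leaves out the extended argument.

```r
# Illustrative inputs (not taken from this page): each element of x
# is paired with a separator from 'seps'.
x <- c("2004-10-04", "a_b_c", "one-two")
seps <- c("-", "_")   # length 2, so it is recycled along the 3 elements of x

# Element 1 is split on "-", element 2 on "_", element 3 on "-" again.
# fixed = TRUE treats each separator as a literal string, not a regexp.
parts <- strsplit(x, seps, fixed = TRUE)
print(parts)
```

Using fixed = TRUE here is a design choice for the sketch: it sidesteps any question of whether a separator has a special meaning as a regular expression.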

Details

Arguments x and split will be coerced to character, so you will see uses with split = NULL to mean split = character(0), including in the examples below.

Note that splitting into single characters can be done via split=character(0) or split=""; the two are equivalent as of R 1.9.0.
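
The equivalence described above can be checked directly. This is a small sketch with an illustrative input string; it assumes an R version of 1.9.0 or later, as stated in the text.

```r
s <- "split me"   # illustrative input

# Three ways of asking for single characters.
by_null  <- strsplit(s, NULL)[[1]]          # NULL is coerced to character(0)
by_empty <- strsplit(s, character(0))[[1]]
by_blank <- strsplit(s, "")[[1]]

print(by_null)
# All three results should be the same character vector.
print(identical(by_null, by_empty) && identical(by_empty, by_blank))
```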

A missing value of split does not split the corresponding element(s) of x at all.
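
A short sketch of the missing-value case, using made-up input. The second element of split is NA_character_, so by the rule above the second element of x is expected to come back unsplit.

```r
x <- c("a b", "c d")   # illustrative input

# The first element is split on a space; the second is paired with a
# missing split value and so should be returned whole.
res <- strsplit(x, c(" ", NA_character_))
print(res)
```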

Value

A list of length length(x), the i-th element of which contains the vector of splits of x[i].
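
Because the result is a list with one element per input string, it is usually post-processed with sapply or lapply. The sketch below uses illustrative input and shows one way to count the pieces and to pick out the first piece of each element.

```r
x <- c("alpha beta gamma", "delta epsilon")   # illustrative input
parts <- strsplit(x, " ")

# One list element per element of x.
print(length(parts) == length(x))

# Number of pieces in each element, and the first piece of each.
n_pieces <- sapply(parts, length)
first    <- sapply(parts, "[", 1)
print(n_pieces)
print(first)
```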

Warning

The standard regular expression code has been reported to be very slow, or to give errors, when applied to extremely long character strings (tens of thousands of characters or more): the code used when perl=TRUE seems faster and more reliable for such inputs.
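
For a long string of the kind the warning describes, a call with perl=TRUE might look like the sketch below. The test string is synthetic and the sizes are arbitrary; the example only shows the call pattern and makes no claim about timings.

```r
# Build a synthetic string of roughly 60,000 characters:
# 20000 copies of "ab" joined by commas.
long <- paste(rep("ab", 20000), collapse = ",")
print(nchar(long))

# Split on the comma using the Perl-compatible matching code.
pieces <- strsplit(long, ",", perl = TRUE)[[1]]
print(length(pieces))
```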

Examples

noquote(strsplit("A text I want to display with spaces", NULL)[[1]])

x <- c(as = "asfef", qu = "qwerty", "yuiop[", "b", "stuff.blah.yech")
# split x on the letter e
strsplit(x, "e")

unlist(strsplit("a.b.c", "."))
## [1] "" "" "" "" ""
## Note that 'split' is a regexp!
## If you really want to split on '.', use
unlist(strsplit("a.b.c", "\\."))
## [1] "a" "b" "c"
## or
unlist(strsplit("a.b.c", ".", fixed = TRUE))

## a useful function: rev() for strings
strReverse <- function(x)
    sapply(lapply(strsplit(x, NULL), rev), paste, collapse = "")
strReverse(c("abc", "Statistics"))

## get the first names of the members of R-core
a <- readLines(file.path(R.home(),"AUTHORS"))[-(1:8)]
a <- a[(0:2)-length(a)]
(a <- sub(" .*","", a))
# and reverse them
strReverse(a)

[Package base version 2.0.0]