| iconv {base} | R Documentation |
Convert Character Vector between Encodings
Description
This uses system facilities to convert a character vector between encodings: the ‘i’ stands for ‘internationalization’.
Usage
iconv(x, from = "", to = "", sub = NA)
iconvlist()
Arguments
x |
A character vector, or an object to be converted to a character
vector by as.character. |
from |
A character string describing the current encoding. |
to |
A character string describing the target encoding. |
sub |
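A character string. If not NA, it is used to replace any non-convertible bytes in the input. If "byte", each such byte is shown as "<xx>", where xx is its hex code. |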
Details
The names of encodings, and which ones are available (if any),
are platform-dependent. On all systems that support
iconv you can use "" for the encoding of the current
locale, as well as "latin1" and "UTF-8".
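As a minimal sketch of the three portable names, the following converts a Latin-1 string to UTF-8, to the current locale, and back again. The variable names are illustrative only, and the result of converting to "" depends on the locale in which the code is run.

```r
# A Latin-1 string: byte 0xE7 is c-cedilla in Latin-1.
x <- "fa\xE7ile"
Encoding(x) <- "latin1"

# Explicit target encoding.
utf8 <- iconv(x, from = "latin1", to = "UTF-8")

# "" means the encoding of the current locale; this is NA if the
# locale's encoding cannot represent the character.
native <- iconv(x, from = "latin1", to = "")

# Round trip: converting back should reproduce the original bytes.
back <- iconv(utf8, from = "UTF-8", to = "latin1")
identical(charToRaw(back), charToRaw(x))
```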
On many platforms iconvlist provides an alphabetical list of
the supported encodings. On others, the information is on the man
page for iconv(5) or elsewhere in the man pages (and beware
that the system command iconv may not support the same set of
encodings as the C functions R calls).
Unfortunately, encoding names are rarely the same across platforms.
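Because names differ between platforms, it can help to test several spellings against iconvlist() before converting. The sketch below assumes iconvlist() returns a character vector on the platform in use. The helper first_available is hypothetical (not part of R) and matches names exactly.

```r
# Hypothetical helper (not part of R): return the first candidate name
# that appears in the list of known encodings, or NA if none does.
first_available <- function(candidates, known = iconvlist()) {
  hit <- candidates[candidates %in% known]
  if (length(hit) > 0L) hit[[1L]] else NA_character_
}

# Two spellings of Latin-2 used in the examples on this page.
# The result depends on the platform's iconv implementation.
first_available(c("ISO_8859-2", "LATIN2"))
```

The known argument defaults to iconvlist() but can be supplied explicitly, which makes the helper easy to check without relying on the platform.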
Elements of x which cannot be converted (perhaps because they
are invalid or because they cannot be represented in the target
encoding) will be returned as NA unless sub is specified.
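One way to use the NA return value, sketched under the assumption that the input itself contains no NA: convert once without sub to find the elements that fail, then convert only those elements with a substitute. The variable names are illustrative.

```r
x <- c("plain", "fa\xE7ile")
Encoding(x) <- "latin1"

# First pass: elements that cannot be represented in ASCII become NA.
res <- iconv(x, from = "latin1", to = "ASCII")

# Elements that were not NA on input but are NA after conversion.
bad <- is.na(res) & !is.na(x)

# Second pass: replace each non-convertible byte with "?".
res[bad] <- iconv(x[bad], from = "latin1", to = "ASCII", sub = "?")
res
```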
Some versions of iconv allow transliteration by appending
//TRANSLIT to the encoding name given as to: see the examples.
Value
A character vector of the same length and the same attributes as
x (after conversion).
The elements of the result have a declared encoding if to is
"latin1" or "UTF-8", or if to = "" and the
current locale's charset is detected as Latin-1 or UTF-8.
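The declared encoding of a result can be inspected with Encoding(). The following is a small sketch, run against a current version of R; note that pure ASCII strings are never marked with a declared encoding.

```r
x <- "fa\xE7ile"
Encoding(x) <- "latin1"

y <- iconv(x, from = "latin1", to = "UTF-8")

Encoding(x)  # "latin1"
Encoding(y)  # "UTF-8"

# ASCII-only strings carry no declared encoding.
z <- iconv("abc", from = "latin1", to = "UTF-8")
Encoding(z)  # "unknown"
```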
Note
Not all platforms support these functions. See also
capabilities("iconv").
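A minimal sketch of guarding a conversion with capabilities("iconv"). Returning the input unchanged when iconv is unavailable is an assumption made for illustration, not behaviour prescribed by R.

```r
x <- "abc"

# capabilities() returns a named logical vector; extract the single value.
if (capabilities("iconv")[["iconv"]]) {
  out <- iconv(x, from = "latin1", to = "UTF-8")
} else {
  out <- x  # assumed fallback: leave the input unconverted
}
out
```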
See Also
localeToCharset, file.
Examples
head(iconvlist(), n = 50)
## Not run:
## convert from Latin-2 to UTF-8: two of the glibc iconv variants.
iconv(x, "ISO_8859-2", "UTF-8")
iconv(x, "LATIN2", "UTF-8")
## End(Not run)
## Both x below are in latin1 and will only display correctly in a
## latin1 locale.
x <- "fa\xE7ile"
Encoding(x) <- "latin1"
x
charToRaw(xx <- iconv(x, "latin1", "UTF-8"))
xx
iconv(x, "latin1", "ASCII") # NA
iconv(x, "latin1", "ASCII", "?") # "fa?ile"
iconv(x, "latin1", "ASCII", "") # "faile"
iconv(x, "latin1", "ASCII", "byte") # "fa<e7>ile"
# Extracts from R help files
x <- c("Ekstr\xf8m", "J\xf6reskog", "bi\xdfchen Z\xfcrcher")
Encoding(x) <- "latin1"
x
try(iconv(x, "latin1", "ASCII//TRANSLIT"))
iconv(x, "latin1", "ASCII", sub="byte")