iconv {base} | R Documentation |
This uses system facilities to convert a character vector between encodings: the ‘i’ stands for ‘internationalization’.
iconv(x, from = "", to = "", sub = NA)
iconvlist()
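x | A character vector, or an object to be converted to a character vector by as.character. |
from | A character string describing the current encoding. |
to | A character string describing the target encoding. |
sub | A character string. If not NA, it is used to replace any non-convertible bytes in the input; the special value "byte" replaces each such byte with "<xx>", its hex code (see the examples). |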
The names of encodings, and which ones are available (if any are), are platform-dependent. On all systems that support iconv you can use "" for the encoding of the current locale, as well as "latin1" and "UTF-8". On most systems (including those using glibc or libiconv, Mac OS X and Windows) case is ignored when specifying an encoding.
On many platforms iconvlist provides an alphabetical list of the supported encodings. On others, the information is on the man page for iconv(5) or elsewhere in the man pages (and beware that the system command iconv may not support the same set of encodings as the C functions R calls). Unfortunately, the names are rarely common across platforms.
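Because names differ between platforms, it can help to look a name up before relying on it. The following is only a sketch: encoding_listed is a hypothetical helper (not part of R), and it assumes that a case-insensitive match against iconvlist() is a reasonable approximation of what the platform accepts.

```r
# Hypothetical helper: is 'name' among the encodings reported by iconvlist()?
# Returns FALSE when iconvlist() is unavailable on this platform.
encoding_listed <- function(name) {
  known <- tryCatch(iconvlist(), error = function(e) character(0))
  # Compare case-insensitively, since most platforms ignore case.
  toupper(name) %in% toupper(known)
}

encoding_listed("ISO_8859-2")
encoding_listed("LATIN2")
```

A FALSE result does not prove that the name is unsupported, since the list may be incomplete or missing on some systems.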
Elements of x which cannot be converted (perhaps because they are invalid or because they cannot be represented in the target encoding) will be returned as NA unless sub is specified.
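As an illustration of this behaviour, the sketch below first converts without sub, uses the NA results to locate the elements that failed, and then retries only those with a placeholder. The variable names (y, failed) are chosen for the example only.

```r
# Two latin1 inputs: the first is pure ASCII, the second contains byte E7.
x <- c("plain", "fa\xE7ile")
Encoding(x) <- "latin1"

y <- iconv(x, "latin1", "ASCII")   # no 'sub', so failures are returned as NA
failed <- is.na(y) & !is.na(x)     # TRUE only where the conversion failed

# Retry just the failed elements with a visible placeholder.
y[failed] <- iconv(x[failed], "latin1", "ASCII", sub = "?")
y
```

Checking for NA first makes it possible to report which inputs were lossy before substituting anything.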
Most versions of iconv will allow transliteration by appending //TRANSLIT to the encoding given as to (for example "ASCII//TRANSLIT"): see the examples.
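Since transliteration is not available everywhere and its output varies, one defensive pattern is to try it and fall back to sub for anything it could not handle. This is a minimal sketch: to_ascii is a hypothetical helper, and the choice of "?" as the fallback placeholder is an assumption.

```r
# Hypothetical helper: transliterate to ASCII where the platform allows it,
# otherwise replace non-convertible bytes with "?".
to_ascii <- function(x, from = "latin1") {
  # Treat an unsupported "//TRANSLIT" conversion as a failure for all elements.
  y <- tryCatch(
    iconv(x, from, "ASCII//TRANSLIT"),
    error = function(e) rep(NA_character_, length(x))
  )
  bad <- is.na(y) & !is.na(x)
  y[bad] <- iconv(x[bad], from, "ASCII", sub = "?")
  y
}

x <- c("Ekstr\xf8m", "plain")
Encoding(x) <- "latin1"
to_ascii(x)   # the first element is platform-dependent
```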
Any encoding bits (see Encoding) on elements of x are ignored: they will always be translated as if from the encoding given by from, even if declared otherwise.
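To make this concrete, the sketch below converts the same bytes twice, once with a correct declared encoding and once with a deliberately wrong one. If the declaration is ignored as described, both conversions should yield the same bytes. The variable names are chosen for the example only.

```r
right <- "fa\xE7ile"
Encoding(right) <- "latin1"   # matches the actual bytes

wrong <- "fa\xE7ile"
Encoding(wrong) <- "UTF-8"    # deliberately incorrect declaration

# Only the 'from' argument determines how the bytes are interpreted.
y_right <- iconv(right, from = "latin1", to = "UTF-8")
y_wrong <- iconv(wrong, from = "latin1", to = "UTF-8")

identical(charToRaw(y_right), charToRaw(y_wrong))
```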
"UTF8" will be accepted as meaning the (more correct) "UTF-8".
A character vector of the same length and the same attributes as x (after conversion).
The elements of the result have a declared encoding if from is "latin1" or "UTF-8", or if from = "" and the current locale's encoding is detected as Latin-1 or UTF-8.
Not all platforms support these functions, although almost all support iconv. See also capabilities("iconv").
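Code that must run on platforms without iconv support can check for it first. The following is a sketch only: to_utf8_if_possible is a hypothetical wrapper, and returning the input unchanged when iconv is unavailable is an assumption about what the caller wants.

```r
# Hypothetical wrapper: convert to UTF-8 only when iconv is available.
to_utf8_if_possible <- function(x, from = "latin1") {
  if (!isTRUE(capabilities("iconv"))) {
    return(x)   # no conversion facilities: leave the input untouched
  }
  iconv(x, from, "UTF-8")
}

x <- "fa\xE7ile"
Encoding(x) <- "latin1"
xx <- to_utf8_if_possible(x)
```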
localeToCharset, file.
## not all systems have iconvlist
try(utils::head(iconvlist(), n = 50))
## Not run:
## convert from Latin-2 to UTF-8: two of the glibc iconv variants.
iconv(x, "ISO_8859-2", "UTF-8")
iconv(x, "LATIN2", "UTF-8")
## End(Not run)
## Both x below are in latin1 and will only display correctly in a
## locale that can represent and display latin1.
x <- "fa\xE7ile"
Encoding(x) <- "latin1"
x
charToRaw(xx <- iconv(x, "latin1", "UTF-8"))
xx
iconv(x, "latin1", "ASCII") # NA
iconv(x, "latin1", "ASCII", "?") # "fa?ile"
iconv(x, "latin1", "ASCII", "") # "faile"
iconv(x, "latin1", "ASCII", "byte") # "fa<e7>ile"
# Extracts from R help files
x <- c("Ekstr\xf8m", "J\xf6reskog", "bi\xdfchen Z\xfcrcher")
Encoding(x) <- "latin1"
x
try(iconv(x, "latin1", "ASCII//TRANSLIT")) # platform-dependent
iconv(x, "latin1", "ASCII", sub = "byte")