| icuSetCollate {base} | R Documentation |
Setup Collation by ICU
Description
Controls the way collation is done by ICU (an optional part of the R build).
Usage
icuSetCollate(...)
Arguments
...: Named arguments, see ‘Details’.
Details
Optionally, R can be built to collate character strings by ICU
(http://site.icu-project.org). For such systems,
icuSetCollate can be used to tune the way collation is done.
On other builds calling this function does nothing, with a warning.
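Whether the running build has ICU can be checked before calling icuSetCollate, which avoids the warning on other builds. The following is a minimal sketch, assuming a recent version of R in which capabilities() reports an "ICU" entry; the variable name has_icu is only illustrative.

```r
# Assumption: capabilities() has an "ICU" element (true for recent R versions).
# unname() drops the element name so that isTRUE() sees a bare logical.
has_icu <- isTRUE(unname(capabilities("ICU")))

if (has_icu) {
  message("ICU collation is available; icuSetCollate() settings will be used.")
} else {
  message("ICU collation is not available; icuSetCollate() would only warn.")
}
```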
Possible arguments are
locale: A character string such as "da_DK" giving the country whose collation rules are to be used. If present, this should be the first argument.
case_first: "upper", "lower" or "default", asking for upper- or lower-case characters to be sorted first. The default is usually lower-case first, but not in all languages (see the Danish example).
alternate_handling: Controls the handling of ‘variable’ characters (mainly punctuation and symbols). Possible values are "non_ignorable" (primary strength) and "shifted" (quaternary strength).
strength: Which components should be used? Possible values "primary", "secondary", "tertiary" (default), "quaternary" and "identical".
french_collation: In a French locale the way accents affect collation is from right to left, whereas in most other locales it is from left to right. Possible values "on", "off" and "default".
normalization: Should strings be normalized? Possible values are "on" and "off" (default). This affects the collation of composite characters.
case_level: An additional level between secondary and tertiary, used to distinguish large and small Japanese Kana characters. Possible values "on" and "off" (default).
hiragana_quaternary: Possible values "on" (sort Hiragana first at quaternary level) and "off".
Only the first three are likely to be of interest except to those with a detailed understanding of collation and specialized requirements.
Some examples are the combination case_level="on", strength="primary" to ignore
accent differences, and alternate_handling="shifted" to ignore
space and punctuation characters.
Note that these settings have no effect if collation is set to the
C locale, unless locale is specified.
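The settings mentioned above can be tried as in the following sketch. It assumes an ICU-enabled build on which the "en_US" locale is available; the test vector and the names by_accent and by_shifted are illustrative only. No expected output is shown because the ordering depends on the ICU version and locale data.

```r
# Illustrative strings: accent variants and punctuation/space variants.
x <- c("resume", "r\u00e9sum\u00e9", "co-op", "coop", "co op")

if (isTRUE(unname(capabilities("ICU")))) {
  # locale is given first, so the settings also apply when collation
  # is otherwise set to the C locale.
  # Ignore accent differences (combination taken from the text above).
  icuSetCollate(locale = "en_US", case_level = "on", strength = "primary")
  by_accent <- sort(x)

  # Ignore space and punctuation characters; the attributes changed
  # above are set back to their documented defaults explicitly.
  icuSetCollate(locale = "en_US", alternate_handling = "shifted",
                strength = "tertiary", case_level = "off")
  by_shifted <- sort(x)

  print(by_accent)
  print(by_shifted)
} else {
  # Without ICU the calls would only warn, so fall back to a plain sort.
  by_accent <- by_shifted <- sort(x)
}
```

The settings remain in effect for later calls to sort in the same session, so a script that changes them may want to call icuSetCollate again afterwards to restore the values it needs.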
Note
As from R 2.9.0, ICU is used by default wherever it is available: this includes Mac OS >= 10.4 and many Linux installations.
See Also
Comparison, sort
The ICU user guide chapter on collation (http://userguide.icu-project.org/collation).
Examples
## these examples depend on having ICU available, and on the locale
x <- c("Aarhus", "aarhus", "safe", "test", "Zoo")
sort(x)
icuSetCollate(case_first="upper"); sort(x)
icuSetCollate(case_first="lower"); sort(x)
icuSetCollate(locale="da_DK", case_first="default"); sort(x)
icuSetCollate(locale="et_EE"); sort(x)