factor {base} | R Documentation |
The function factor is used to encode a vector as a factor (the names category and enumerated type are also used for factors). If ordered is TRUE, the factor levels are assumed to be ordered. For compatibility with S there is also a function ordered.
is.factor, is.ordered, as.factor and as.ordered are the membership and coercion functions for these classes.
factor(x, levels = sort(unique(x), na.last = TRUE), labels = levels,
       exclude = NA, ordered = is.ordered(x))
ordered(x, ...)
is.factor(x)
is.ordered(x)
as.factor(x)
as.ordered(x)
x |
a vector of data, usually taking a small number of distinct values |
levels |
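an optional vector of the values that x might have taken. The default is the set of values taken by x, sorted into increasing order. |
labels |
either an optional vector of labels for the levels (in the same order as levels after removing those in exclude), or a character string of length 1. |
exclude |
a vector of values to be excluded when forming the set of levels. This should be of the same type as x, and will be coerced if necessary. |
ordered |
logical flag to determine if the levels should be regarded as ordered (in the order given). |
... |
(in ordered) any of the above, apart from ordered itself. |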
The type of the vector x is not restricted.
Ordered factors differ from factors only in their class, but methods and the model-fitting functions treat the two classes quite differently.
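As a small illustration of that difference, the sketch below builds the same data as a plain factor and as an ordered factor. The variable names (sizes, lev, u, o) are made up for this example. The codes are identical, but only the ordered factor supports order comparisons and max.

```r
sizes <- c("small", "large", "medium")
lev   <- c("small", "medium", "large")   # the intended ordering

u <- factor(sizes, levels = lev)                  # unordered
o <- factor(sizes, levels = lev, ordered = TRUE)  # ordered

# Same underlying integer codes, different class.
identical(as.integer(u), as.integer(o))
class(u)
class(o)

# Order comparisons and max() are meaningful only for the ordered factor.
o[1] < o[2]            # "small" < "large"
as.character(max(o))
```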
The encoding of the vector happens as follows. First all the values in exclude are removed from levels. If x[i] equals levels[j], then the i-th element of the result is j. If no match is found for x[i] in levels, then the i-th element of the result is set to NA.
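The matching rule above can be reproduced by hand with match. This is only a sketch: the vector x and the level set lev are invented for the example, and "z" is deliberately left out of the levels to show the NA case.

```r
x   <- c("b", "a", "c", "a", "z")
lev <- c("a", "b", "c")            # "z" is not among the levels

f <- factor(x, levels = lev)

# Position of each x[i] within lev; unmatched values give NA.
codes_by_hand <- match(x, lev)

as.integer(f)     # 2 1 3 1 NA
codes_by_hand     # 2 1 3 1 NA
```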
Normally the ‘levels’ used as an attribute of the result are the reduced set of levels after removing those in exclude, but this can be altered by supplying labels. This should either be a set of new labels for the levels, or a character string, in which case the levels are that character string with a sequence number appended.
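Both forms of labels can be seen in a short sketch. The data and the label strings ("Low", "grp", and so on) are arbitrary choices for illustration.

```r
x   <- c("lo", "hi", "mid", "hi")
lev <- c("lo", "mid", "hi")

# One label per level, in the same order as the levels.
f1 <- factor(x, levels = lev, labels = c("Low", "Medium", "High"))
levels(f1)          # "Low" "Medium" "High"
as.character(f1)    # "Low" "High" "Medium" "High"

# A single string: each level becomes the string plus a sequence number.
f2 <- factor(x, levels = lev, labels = "grp")
levels(f2)          # "grp1" "grp2" "grp3"
as.character(f2)    # "grp1" "grp3" "grp2" "grp3"
```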
factor(x) applied to a factor is a no-operation unless there are unused levels: in that case, a factor with the reduced level set is returned. If exclude is used it should also be a factor with the same level set as x or a set of codes for the levels to be excluded.
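A minimal sketch of the level-dropping behaviour, using made-up data: subsetting a factor keeps the full level set, and a second call to factor removes the levels that no longer occur.

```r
f   <- factor(c("a", "b", "c"))
sub <- f[1:2]               # subsetting keeps all three levels

levels(sub)                 # "a" "b" "c"
levels(factor(sub))         # "a" "b"

# With no unused levels, factor() leaves the factor unchanged.
identical(factor(f), f)
```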
The codes of a factor may contain NA. For a numeric x, set exclude=NULL to make NA an extra level ("NA"), by default the last level.
If "NA" is a level, the way to set a code to be missing is to use is.na on the left-hand-side of an assignment. Under those circumstances missing values are printed as <NA>.
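The effect of exclude = NULL on a numeric vector can be checked as follows. This sketch inspects only the number of levels and the integer codes, since how the extra level is labelled and printed may differ between R versions.

```r
x <- c(1, 2, NA)

f_default <- factor(x)                  # NA is excluded from the levels
f_na      <- factor(x, exclude = NULL)  # NA becomes an extra, last level

nlevels(f_default)        # 2
as.integer(f_default)     # 1 2 NA

nlevels(f_na)             # 3
as.integer(f_na)          # 1 2 3
```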
factor returns an object of class "factor" which has a set of numeric codes the length of x with a "levels" attribute of mode character. If ordered is true (or ordered is used) the result has class c("ordered", "factor").
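The structure of the returned object can be inspected directly. The sketch below uses an arbitrary numeric vector to show that the levels are stored as character strings even when the input is numeric.

```r
x <- c(3, 1, 2, 1)

f <- factor(x)
class(f)                    # "factor"
levels(f)                   # "1" "2" "3"
mode(levels(f))             # "character"
length(unclass(f)) == length(x)

# Both routes to an ordered factor give the same class vector.
o1 <- factor(x, ordered = TRUE)
o2 <- ordered(x)
class(o1)                   # "ordered" "factor"
identical(class(o1), class(o2))
```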
is.factor returns TRUE or FALSE depending on whether its argument is of type factor or not. Correspondingly, is.ordered returns TRUE when its argument is ordered and FALSE otherwise.
as.factor coerces its argument to a factor. It is an abbreviated form of factor.
as.ordered(x) returns x if this is ordered, and ordered(x) otherwise.
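The membership and coercion functions can be exercised together in a short sketch; the input values are arbitrary.

```r
v <- c("b", "a", "b")

f <- as.factor(v)
o <- as.ordered(f)

is.factor(v)    # FALSE: a plain character vector
is.factor(f)    # TRUE
is.ordered(f)   # FALSE
is.factor(o)    # TRUE: ordered factors are also factors
is.ordered(o)   # TRUE

# as.ordered() returns an already ordered factor unchanged.
identical(as.ordered(o), o)
```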
The interpretation of a factor depends on both the codes and the "levels" attribute. Be careful to compare only factors with the same set of levels (in the same order). In particular, as.numeric applied to a factor returns the internal codes rather than the original values, so the result is rarely meaningful, and this may happen silently by implicit coercion.
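The pitfall with as.numeric is easiest to see on a factor built from numbers. In this sketch (with invented data), the codes are positions in the sorted level set, not the original values; converting through the level labels recovers them.

```r
x <- c(10, 20, 10, 5)
f <- factor(x)

levels(f)                       # "5" "10" "20"

# The codes, not the data: positions within levels(f).
as.numeric(f)                   # 2 3 2 1

# Two ways to get the original numbers back.
as.numeric(as.character(f))     # 10 20 10 5
as.numeric(levels(f))[f]        # 10 20 10 5
```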
The levels of a factor are by default sorted, but the sort order may well depend on the locale at the time of creation, and should not be assumed to be ASCII.
gl for construction of “balanced” factors and C for factors with specified contrasts. levels and nlevels for accessing the levels, and codes to get integer codes.
ff <- factor(substring("statistics", 1:10, 1:10), levels=letters)
ff
codes(ff)
factor(ff)  # drops the levels that do not occur
factor(factor(letters[7:10])[2:3])  # exercise indexing and reduction
factor(letters[1:20], labels = "letter")
class(ordered(4:1))  # "ordered", inheriting from "factor"
## suppose you want "NA" as a level, and to allow missing values.
(x <- factor(c(1, 2, "NA"), exclude = ""))
is.na(x)[2] <- TRUE
x # [1] 1 <NA> NA, <NA> used because NA is a level.
is.na(x)
# [1] FALSE TRUE FALSE