Sometimes it's useful to separate out common elements of x. dissect() chops x, but puts common elements of x ("spikes") into separate categories.
Usage
dissect(
  x,
  breaks,
  ...,
  n = NULL,
  prop = NULL,
  spike_labels = "{{{l}}}",
  exclude_spikes = FALSE
)
tab_dissect(x, breaks, ..., n = NULL, prop = NULL)
Arguments
- x, breaks, ...
Passed to chop().
- n, prop
Scalar. Provide either n, a number of values, or prop, a proportion of length(x). Values of x which occur at least this often will get their own singleton break.
- spike_labels
Glue string for spike labels. Use "{l}" for the spike value.
- exclude_spikes
Logical. Exclude spikes before chopping x? This can affect the location of data-dependent breaks.
Value
dissect() returns the result of chop(), but with common values put into separate factor levels.
tab_dissect() returns a contingency table().
Details
Unlike chop_spikes(), dissect() doesn't break up intervals which contain a spike. As a result, unlike chop_* functions, dissect() does not chop x into disjoint intervals. See the examples.
If breaks are data-dependent, their labels may be misleading after common elements have been removed. See the example below. To get round this, set exclude_spikes to TRUE. Then breaks will be calculated after removing spikes from the data.
Levels of the result are ordered by the minimum element in each level. As a result, if drop = FALSE, empty levels will be placed last.
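As a rough illustration of the level ordering, the sketch below passes drop = FALSE through ... to chop() and then inspects the levels of the result. The breaks c(0, 2, 4) are an assumption chosen only so that one interval, [0, 2), contains no data; they are not taken from the package's own examples.

```r
library(santoku)

x <- c(2, 3, 3, 3, 4)

# Illustrative breaks: no element of x falls in [0, 2), so that level is empty.
# drop = FALSE is passed through `...` to chop(), which keeps the empty level.
res <- dissect(x, c(0, 2, 4), n = 3, drop = FALSE)

# Inspect the level order; per the paragraph above, the empty level
# should appear after the non-empty ones.
levels(res)
```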
See also
chop_spikes() for a different approach.
Examples
x <- c(2, 3, 3, 3, 4)
dissect(x, c(2, 4), n = 3)
#> [1] [2, 4] {3} {3} {3} [2, 4]
#> Levels: [2, 4] {3}
dissect(x, brk_width(2), prop = 0.5)
#> [1] [2, 4] {3} {3} {3} [2, 4]
#> Levels: [2, 4] {3}
set.seed(42)
x <- runif(40, 0, 10)
x <- sample(x, 200, replace = TRUE)
# Compare:
table(dissect(x, brk_width(2, 0), prop = 0.05))
#>
#>  [0, 2)  [2, 4)  [4, 6)  [6, 8) [8, 10] {9.057}
#>      30      24      36      40      59      11
# Versus:
tab_spikes(x, brk_width(2, 0), prop = 0.05)
#>      [0, 2)      [2, 4)      [4, 6)      [6, 8)  [8, 9.057)     {9.057}
#>          30          24          36          40          22          11
#> (9.057, 10]
#>          37
# Potentially confusing data-dependent breaks:
set.seed(42)
x <- rnorm(99)
x[1:9] <- x[1]
tab_quantiles(x, 1:2/3)
#>     [0%, 33.33%) [33.33%, 66.67%)   [66.67%, 100%]
#>               33               33               33
tab_dissect(x, brk_quantiles(1:2/3), n = 9)
#>     [0%, 33.33%) [33.33%, 66.67%)   [66.67%, 100%]          {1.371}
#>               33               33               24                9
# Calculate quantiles excluding spikes:
tab_dissect(x, brk_quantiles(1:2/3), n = 9, exclude_spikes = TRUE)
#>     [0%, 33.33%) [33.33%, 66.67%)   [66.67%, 100%]          {1.371}
#>               30               30               30                9