santoku is a versatile cutting tool for R. It provides chop(), a replacement for base::cut().

Here are some advantages of santoku:

• By default, chop() always covers the whole range of the data, so you won’t get unexpected NA values.

• chop() can handle single values as well as intervals. For example, chop(x, breaks = c(1, 2, 2, 3)) will create a separate factor level for values exactly equal to 2.

• Flexible labelling, including easy ways to label intervals by numerals or letters.

• Convenience functions for creating quantile intervals, evenly-spaced intervals or equal-sized groups.

• Convenience functions for quickly tabulating chopped data.

These advantages make santoku especially useful for exploratory analysis, where you may not know the range of your data in advance.
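Here is a small comparison with base::cut() that illustrates the first point. It assumes santoku is installed. The vector x and the breaks are made-up values, chosen so that one value lies above the breaks.

```r
library(santoku)

# Illustrative data: 10 lies above the last break
x <- c(1, 5, 10)

# base::cut() only bins values inside the breaks, so 10 becomes NA
cut(x, breaks = c(0, 5))

# chop() extends the breaks to cover the data by default, so no NA appears
chop(x, breaks = c(0, 5))
```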

## Examples

```r
library(santoku)
```

chop() returns a factor:

```r
chop(1:8, c(3, 5, 7))
#> [1] [1, 3) [1, 3) [3, 5) [3, 5) [5, 7) [5, 7) [7, 8] [7, 8]
#> Levels: [1, 3) [3, 5) [5, 7) [7, 8]
```
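The result is an ordinary factor, so base R tools such as levels() and table() work on it directly. santoku's tab() chops and tabulates in one call. The sketch below reuses the data and breaks from the example above.

```r
library(santoku)

f <- chop(1:8, c(3, 5, 7))

# chop() returns a factor, so base R functions apply
is.factor(f)
levels(f)
table(f)

# tab() chops and tabulates in one step
tab(1:8, c(3, 5, 7))
```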

Include a number twice to match it exactly:

```r
chop(1:8, c(3, 5, 5, 7))
#> [1] [1, 3) [1, 3) [3, 5) [3, 5) {5}    (5, 7) [7, 8] [7, 8]
#> Levels: [1, 3) [3, 5) {5} (5, 7) [7, 8]
```
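Repeated breaks work the same way with non-integer data. This sketch uses the breaks c(1, 2, 2, 3) from the introduction with a made-up data vector. Only the value exactly equal to 2 should land in the singleton level. Its neighbours 1.5 and 2.5 fall into the intervals on either side.

```r
library(santoku)

# Illustrative data around the repeated break 2
x <- c(1, 1.5, 2, 2.5, 3)

# The repeated break 2 creates a level {2} holding only exact matches
f <- chop(x, breaks = c(1, 2, 2, 3))
f
```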

Customize labels with the lbl_* functions:

```r
chop(1:8, c(3, 5, 7), labels = lbl_dash())
#> [1] 1—3 1—3 3—5 3—5 5—7 5—7 7—8 7—8
#> Levels: 1—3 3—5 5—7 7—8
```
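The introduction mentions labelling intervals by letters or numerals. lbl_seq() does this, and its start argument sets the first label of the sequence. A plain character vector also works, with one label per interval. The label names "low", "mid", "high" and "top" below are only illustrative.

```r
library(santoku)

# Letters: lbl_seq() starts at "a" by default
chop(1:8, c(3, 5, 7), labels = lbl_seq())

# Numerals: change the start of the sequence
chop(1:8, c(3, 5, 7), labels = lbl_seq("1"))

# A character vector gives one label per interval (4 intervals here)
chop(1:8, c(3, 5, 7), labels = c("low", "mid", "high", "top"))
```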

Chop into fixed-width intervals:

```r
chop_width(runif(10), 0.1)
#>  [1] [0.8278, 0.9278)  [0.8278, 0.9278)  [0.8278, 0.9278)  [0.3278, 0.4278)
#>  [5] [0.7278, 0.8278)  [0.2278, 0.3278)  [0.9278, 1.028)   [0.02781, 0.1278)
#>  [9] [0.9278, 1.028)   [0.02781, 0.1278)
#> 6 Levels: [0.02781, 0.1278) [0.2278, 0.3278) ... [0.9278, 1.028)
```
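The output above starts its intervals at the smallest value, and it changes on every run because runif() is random. The sketch below fixes the seed so the draw is reproducible. It then uses chop_width()'s start argument to align the intervals to round numbers. chop_evenly() is shown for comparison: it splits the range into a fixed number of equal-width intervals. The seed value is arbitrary.

```r
library(santoku)

set.seed(1)  # arbitrary seed, only for reproducibility
x <- runif(10)

# start = 0 aligns the intervals to 0, 0.1, 0.2, ... instead of min(x)
chop_width(x, 0.1, start = 0)

# Split the range of x into 5 equal-width intervals
chop_evenly(x, 5)
```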

Or into fixed-size groups:

```r
chop_n(1:10, 5)
#>  [1] [1, 6)  [1, 6)  [1, 6)  [1, 6)  [1, 6)  [6, 10] [6, 10] [6, 10] [6, 10]
#> [10] [6, 10]
#> Levels: [1, 6) [6, 10]
```
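chop_n() fixes the number of elements per group. Two related convenience functions from the introduction are sketched below. chop_equally() fixes the number of groups. chop_quantiles() cuts at chosen sample quantiles. For 1:10 the quartiles are 3.25, 5.5 and 7.75 (R's default quantile type), so the groups should hold 3, 2, 2 and 3 values.

```r
library(santoku)

# Two groups with equal counts
chop_equally(1:10, 2)

# Cut at the sample quartiles
chop_quantiles(1:10, c(0.25, 0.5, 0.75))
```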

Chop dates by calendar month, then tabulate:

```r
library(lubridate)
#>
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#>
#>     date, intersect, setdiff, union
```

```r
tab_width(as.Date("2021-12-31") + 1:90, months(1),
  labels = lbl_discrete(fmt = "%d %b")
)
#> 01 Jan—31 Jan 01 Feb—28 Feb 01 Mar—31 Mar
#>            31            28            31
```
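tab_width() should give the same counts as calling table() on the result of chop_width(). That two-step form is useful when you also want to keep the factor, for example to add it to a data frame. The sketch assumes lubridate is installed, because months(1) comes from it. The variable names d and month_groups are illustrative.

```r
library(santoku)
library(lubridate)

# The 90 days of January to March 2022
d <- as.Date("2021-12-31") + 1:90

# Keep the factor, then tabulate it separately
month_groups <- chop_width(d, months(1), labels = lbl_discrete(fmt = "%d %b"))
table(month_groups)
```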