santoku is a versatile cutting tool for R. It provides chop(), a replacement for base::cut().

Advantages

Here are some advantages of santoku:

  • By default, chop() always covers the whole range of the data, so you won’t get unexpected NA values.

  • chop() can handle single values as well as intervals. For example, chop(x, breaks = c(1, 2, 2, 3)) will create a separate factor level for values exactly equal to 2.

  • Flexible labelling, including easy ways to label intervals by numerals or letters.

  • Convenience functions for creating quantile intervals, evenly-spaced intervals or equal-sized groups.

  • Convenience functions for quickly tabulating chopped data.

These advantages make santoku especially useful for exploratory analysis, where you may not know the range of your data in advance.
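
As a rough illustration of the points above, the sketch below compares chop() with base::cut() on a small made-up vector. The vector x and the variable names are assumptions chosen for this example, and the calls use default arguments only, so treat it as a starting point rather than a full tour of the API.

```r
library(santoku)

# Hypothetical data: some values fall outside the breaks 1, 2, 3.
x <- c(0.5, 1, 2, 2.5, 3, 10)

# base::cut() returns NA for anything outside the breaks:
base_result <- cut(x, breaks = c(1, 2, 3))
anyNA(base_result)

# chop() extends the outer intervals to cover the whole range of x:
chopped <- chop(x, breaks = c(1, 2, 3))
anyNA(chopped)

# Repeating a break gives values exactly equal to it their own level:
exact <- chop(x, breaks = c(1, 2, 2, 3))
levels(exact)

# Label intervals by letter instead of by endpoints:
lettered <- chop(x, breaks = c(1, 2, 3), labels = lbl_seq("a"))
levels(lettered)

# Convenience functions: chop at quantiles, then tabulate fixed breaks.
chop_quantiles(x, c(0.25, 0.75))
tab(x, breaks = c(1, 2, 3))
```

The comparison of anyNA() on the two results is the quickest way to see the default range extension at work on your own data.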

Usage

library(santoku)

# chop returns a factor:
chop(1:10, c(3, 5, 7))
#>  [1] [1, 3)  [1, 3)  [3, 5)  [3, 5)  [5, 7)  [5, 7)  [7, 10] [7, 10] [7, 10]
#> [10] [7, 10]
#> Levels: [1, 3) [3, 5) [5, 7) [7, 10]

# Include a number twice to match it exactly;
# Use `labels = lbl_discrete()` for integer data:
chop(1:10, c(3, 5, 5, 7), labels = lbl_discrete())
#>  [1] 1 - 2  1 - 2  3 - 4  3 - 4  5      6      7 - 10 7 - 10 7 - 10 7 - 10
#> Levels: 1 - 2 3 - 4 5 6 7 - 10

# Load lubridate so that months(1) below gives a one-month period:
loadNamespace("lubridate")
#> <environment: namespace:lubridate>

# chop dates by calendar month, then tabulate:
tab_width(
  as.Date("2021-12-31") + 1:90,
  months(1),
  labels = lbl_discrete(fmt = "%d %b")
)
#> x
#> 01 Jan - 31 Jan 01 Feb - 28 Feb 01 Mar - 31 Mar 
#>              31              28              31

For more information, see the vignette.