santoku is a versatile cutting tool for R. It provides chop(), a replacement for base::cut().

Advantages

Here are some advantages of santoku:

  • By default, chop() always covers the whole range of the data, so you won’t get unexpected NA values.

  • chop() can handle single values as well as intervals. For example, chop(x, breaks = c(1, 2, 2, 3)) will create a separate factor level for values exactly equal to 2.

  • Flexible labelling, including easy ways to label intervals by numerals or letters.

  • Convenience functions for creating quantile intervals, evenly-spaced intervals or equal-sized groups.

  • Convenience functions for quickly tabulating chopped data.

These advantages make santoku especially useful for exploratory analysis, where you may not know the range of your data in advance.
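The first two advantages can be seen side by side with base::cut(). This is a minimal sketch, assuming santoku is installed; the data vector is made up for illustration.

```r
library(santoku)

x <- c(1, 3, 6, 9)

# base::cut() returns NA for values outside the breaks (here 1 and 9)
cut(x, breaks = c(2, 5, 8))

# chop() extends the breaks to cover the data, so no value becomes NA
chop(x, breaks = c(2, 5, 8))

# A repeated break creates a separate level for values exactly equal to 2
chop(c(1, 2, 2.5, 3), breaks = c(1, 2, 2, 3))
```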

Installation

You can install the development version from GitHub with:
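A typical installation looks like the following, assuming the GitHub repository is hughjonesd/santoku and that the remotes package is available. The CRAN line is an alternative for the released version.

```r
# Released version from CRAN
install.packages("santoku")

# Development version from GitHub (requires the remotes package)
# install.packages("remotes")
remotes::install_github("hughjonesd/santoku")
```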

More ways to chop

To chop into fixed-width intervals, starting at the minimum value, use chop_width():
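A short sketch, assuming santoku is installed; the data are made up. The width is passed positionally so the example does not depend on the argument's name.

```r
library(santoku)

x <- c(1.2, 2.5, 3.1, 4.8, 6.0, 7.7, 9.4)

# Intervals of width 2, starting at min(x) = 1.2
chopped <- chop_width(x, 2)
table(chopped)
```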

To chop into a fixed number of equal-width intervals (set by the groups argument), use chop_evenly():
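For example, splitting the range 1 to 10 into three intervals of equal width. This assumes santoku is installed; the number of intervals is passed positionally.

```r
library(santoku)

x <- 1:10

# Three intervals of equal width spanning the range of x
chopped <- chop_evenly(x, 3)
table(chopped)
```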

To chop into groups with a fixed number of members, use chop_n():
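A sketch with ten distinct values and groups of five members each, assuming santoku is installed. With duplicated values or a length that is not a multiple of n, group sizes may differ.

```r
library(santoku)

x <- 1:10

# Each group holds 5 values, in sorted order
chopped <- chop_n(x, 5)
table(chopped)
```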

To chop into a fixed number of equal-sized groups, use chop_equally():
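For instance, four groups of roughly equal size. This is a minimal sketch assuming santoku is installed; the data are illustrative.

```r
library(santoku)

x <- 1:12

# Four groups with (roughly) the same number of members each
chopped <- chop_equally(x, 4)
table(chopped)
```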

To chop data up by quantiles, use chop_quantiles():
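A sketch splitting at the quartiles, assuming santoku is installed and the probabilities are given as the second argument.

```r
library(santoku)

x <- 1:12

# Breaks at the 25%, 50% and 75% quantiles of x; the data range is
# covered by extending past the outer quantiles
chopped <- chop_quantiles(x, c(0.25, 0.5, 0.75))
table(chopped)
```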

To chop data by standard deviations around the mean, use chop_mean_sd():
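A sketch on simulated normal data, assuming santoku is installed. It relies on the function's default number of standard deviations rather than setting it explicitly.

```r
library(santoku)

set.seed(1)
x <- rnorm(100)

# Intervals bounded by the mean plus or minus multiples of the standard deviation
chopped <- chop_mean_sd(x)
table(chopped)
```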

tab() chops the data with chop() and then calls table() on the result. tab_n(), tab_width(), tab_evenly() and tab_mean_sd() work the same way, calling the corresponding chop_ function and then table().
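A quick illustration of tabulating in one step, assuming santoku is installed; the data are made up.

```r
library(santoku)

x <- 1:10

# Chop at 3 and 6, then count the values in each interval
tab(x, c(3, 6))

# Same idea with fixed-width intervals of width 2
tab_width(x, 2)
```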

Advanced usage

You can change factor labels with the labels argument:

You need as many labels as there are intervals: one fewer than length(breaks) if your data lies within the breaks, and one more than length(breaks) if the data extends beyond the breaks at both ends (each extension adds one interval).
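Here the data extend beyond the breaks at both ends, so three breaks produce four intervals and four labels are needed. This assumes santoku is installed; the values are illustrative.

```r
library(santoku)

x <- c(1, 3, 6, 9)

# 3 breaks + 1 extension below + 1 extension above = 4 intervals
chopped <- chop(x, c(2, 5, 8), labels = c("Lowest", "Low", "Higher", "Highest"))
levels(chopped)
```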

To label intervals with a dash, use lbl_dash():
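A minimal sketch, assuming santoku is installed and using lbl_dash() with its default dash symbol.

```r
library(santoku)

x <- c(1, 3, 6, 9)

# Labels of the form "lower-upper" instead of interval notation
chopped <- chop(x, c(2, 5, 8), labels = lbl_dash())
levels(chopped)
```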

For arbitrary formatting, use lbl_format() with sprintf-style format strings:
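lbl_format() itself is not called here, since its arguments are not shown in this README and may vary between santoku versions. Instead, this base R sketch uses sprintf() to show how a format string such as "%s to %s" turns pairs of endpoints into labels, assuming the lower endpoint fills the first %s and the upper endpoint the second.

```r
# Lower and upper endpoints of two intervals (illustrative values)
lower <- c(2, 5)
upper <- c(5, 8)

# sprintf() is vectorised: each %s is filled from the matching argument
labels <- sprintf("%s to %s", lower, upper)
labels
```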

To label intervals in order use lbl_seq():
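A sketch with the default sequence, assuming santoku is installed; the data are made up.

```r
library(santoku)

x <- c(1, 3, 6, 9)

# Intervals labelled a, b, c, ... from lowest to highest
chopped <- chop(x, c(2, 5, 8), labels = lbl_seq())
levels(chopped)
```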

You can also use numerals, or even Roman numerals:
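This sketch assumes lbl_seq() accepts a template string whose first element sets the sequence type ("1" for numerals, "i" for lowercase Roman numerals), with any surrounding characters kept in every label. It assumes santoku is installed.

```r
library(santoku)

x <- c(1, 3, 6, 9)

# Numerals in parentheses: (1), (2), ...
numbered <- chop(x, c(2, 5, 8), labels = lbl_seq("(1)"))
levels(numbered)

# Lowercase Roman numerals followed by a dot: i., ii., ...
roman <- chop(x, c(2, 5, 8), labels = lbl_seq("i."))
levels(roman)
```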

By default, chop() extends breaks if necessary. If you don’t want that, set extend = FALSE:

Data outside the range of breaks will become NA.
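For example, with breaks 2, 5 and 8, the values 1 and 9 fall outside and become NA when extension is turned off. This assumes santoku is installed; the data are illustrative.

```r
library(santoku)

x <- c(1, 3, 6, 9)

# No extension: only [2,5) and [5,8] exist, so 1 and 9 are NA
chopped <- chop(x, c(2, 5, 8), extend = FALSE)
chopped
```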

By default, intervals are closed on the left, i.e. they include their left endpoints. If you want right-closed intervals, use brk_right():
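The distinction between left- and right-closed intervals can be illustrated with base::cut(), which santoku replaces. This sketch uses base R only, so it shows the interval convention rather than santoku's brk_right() itself. In cut(), right = FALSE gives left-closed intervals and right = TRUE (the default) gives right-closed ones.

```r
x <- c(2, 5)
breaks <- c(2, 5, 8)

# right = FALSE: intervals [2,5) and [5,8) include their left endpoints
left_closed <- cut(x, breaks, right = FALSE)

# right = TRUE: intervals (2,5] and (5,8] include their right endpoints,
# so 2 falls outside every interval and becomes NA
right_closed <- cut(x, breaks, right = TRUE)

levels(left_closed)
levels(right_closed)
```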

By default, the last finite interval is also closed on the right (or, if you use brk_right(), the first finite interval is also closed on the left), so the outermost break is included. If you don’t want that, use brk_left() explicitly and set close_end = FALSE:
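Base R has a similar switch: in cut(), include.lowest = TRUE closes the outermost end of the intervals. This is an analogy in base R only, not santoku's close_end argument itself.

```r
x <- c(2, 8)
breaks <- c(2, 5, 8)

# Left-closed intervals [2,5) and [5,8): the top break 8 is excluded, so 8 is NA
open_end <- cut(x, breaks, right = FALSE)

# include.lowest = TRUE closes the outer end: [2,5) and [5,8]
closed_end <- cut(x, breaks, right = FALSE, include.lowest = TRUE)

levels(open_end)
levels(closed_end)
```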

If you want to chop repeatedly with the same arguments, create your own knife:
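A knife is a reusable chopping function with its arguments fixed in advance. The sketch below builds one by hand with a plain R closure around chop(); make_knife() and my_chop() are hypothetical names for illustration, not part of santoku. It assumes santoku is installed.

```r
library(santoku)

# Hypothetical helper: fix every argument except the data, and return a
# function that chops new data with those arguments
make_knife <- function(...) {
  args <- list(...)
  function(x) do.call(chop, c(list(x), args))
}

my_chop <- make_knife(breaks = c(2, 5, 8), labels = lbl_seq())

# Reuse the same breaks and labels on different data
my_chop(c(1, 3, 6, 9))
my_chop(c(4, 7))
```

A closure keeps the chosen arguments together in one place, so changing the breaks for every later call only needs one edit.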