santoku - a visual introduction

David Hugh-Jones

2023-10-11

Santoku

A Japanese kitchen knife.

{santoku}

An R package for cutting data.

Some data

head(pts)
##      x   y
## 1  581 326
## 2  724 271
## 3 1244 291
## 4 1314 283
## 5  697 169
## 6  696 119
x <- pts$x

Some data

plot_the_fish()

chop()

chop() is a replacement for base R’s cut() function.

chop()

chop(x, c(300, 600, 900))
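To see how this differs from base R, here is a minimal sketch on a small made-up vector (`small` is hypothetical, not the `pts` data above). It assumes santoku is installed and relies only on default behaviour: `cut()` returns `NA` for values outside the breaks, while `chop()` extends the breaks to cover the data.

```r
library(santoku)

# Hypothetical toy vector, not the fish data from the slides
small <- c(1, 3, 5, 7, 9)

# Base R: right-closed intervals; 1, 3 and 9 fall outside (3, 7] and become NA
base_result <- cut(small, c(3, 7))

# santoku: left-closed intervals, extended to cover the whole range of the data
chop_result <- chop(small, c(3, 7))

table(base_result, useNA = "ifany")
table(chop_result, useNA = "ifany")
```

With `cut()` three of the five values are lost to `NA`; with `chop()` every value lands in one of three intervals.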

extend = FALSE

chop(x, c(300, 600, 900), extend = FALSE)
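A small sketch of what `extend = FALSE` does, again on a hypothetical toy vector rather than the slide data: values outside the outermost breaks are no longer given their own interval and come back as `NA`.

```r
library(santoku)

# Hypothetical toy vector: 1 and 9 lie outside the breaks 3 and 7
small <- c(1, 3, 5, 9)

# With extend = FALSE, only values between the breaks get an interval
no_extend <- chop(small, c(3, 7), extend = FALSE)

is.na(no_extend)
```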

Chopping by a single value

chop(x, c(300, 500, 500, 800))
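Repeating a break, as 500 is repeated above, creates an interval that contains just that one value. The sketch below shows the same idea on a hypothetical toy vector, where the break 3 is repeated.

```r
library(santoku)

# Hypothetical toy vector
small <- c(1, 2, 3, 3.5, 4, 5)

# Repeating the break 3 creates a singleton interval holding only the value 3
single <- chop(small, c(2, 3, 3, 4))

as.character(single)
```

The third element, which equals 3 exactly, gets the singleton label `{3}`; 3.5 falls in the open interval that follows it.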

chop_width()

Chops fixed-width intervals

chop_width(x, width = 200)

chop_evenly()

Chops equal-width intervals

chop_evenly(x, intervals = 5)

chop_proportions()

Chops intervals by proportions of the data range

chop_proportions(x, proportions = c(0.2, 0.8))

chop_equally()

Chops intervals with an equal number of elements

chop_equally(x, groups = 5)

chop_n()

Chops intervals with a fixed number of elements

chop_n(x, 50)

chop_quantiles()

Chops intervals by quantiles of the data

chop_quantiles(x, c(0.2, 0.8))

Summary

                       Size means:
Chop by:               number of elements    width
Fixed size             chop_n()              chop_width()
Fixed no. of groups    chop_equally()        chop_evenly()
Specific sizes         chop_quantiles()      chop_proportions()
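The two columns of this summary give different results when the data is skewed. The sketch below uses a hypothetical vector (`skewed`) with one large outlier and asks for two groups in each sense of "size".

```r
library(santoku)

# Hypothetical skewed data: nine small values and one large one
skewed <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)

# Two groups by width: each interval spans half of the data range
by_width <- chop_evenly(skewed, intervals = 2)

# Two groups by number of elements: each interval holds half of the data
by_count <- chop_equally(skewed, groups = 2)

table(by_width)
table(by_count)
```

Chopping by width puts nine values in the first interval and one in the second; chopping by number of elements puts five in each.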

chop_mean_sd()

chop_mean_sd(x)

Quick tables

tab(x, c(300, 600, 900))
##   [70, 300)  [300, 600)  [600, 900) [900, 1390] 
##          10          55          59         128
tab_mean_sd(x)
## [-3 sd, -2 sd) [-2 sd, -1 sd)  [-1 sd, 0 sd)   [0 sd, 1 sd)   [1 sd, 2 sd) 
##              4             51             66             76             55

Changing labels

You need one more label than breaks:

chop(x, c(300, 600, 900), labels = LETTERS[1:4])
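As a quick check of the counting, here is a sketch on a hypothetical toy vector: two breaks inside the data range give three intervals, so three labels are supplied.

```r
library(santoku)

# Hypothetical toy vector with values below, between and above the breaks
small <- c(1, 3, 5, 7, 9)

# Two breaks, extended on both sides, make three intervals: three labels
named <- chop(small, c(3, 7), labels = c("low", "mid", "high"))

table(named)
```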

Changing labels

Not sure how many intervals you will have?

Use a lbl_* function.

chop_width(x, 200, labels = lbl_seq())

chop_width(x, 200, labels = lbl_seq("(i)"))

chop_width(x, 200, labels = lbl_dash())
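To compare these labelling functions side by side, the sketch below applies each one to the same hypothetical vector (`values`), which `chop_width()` splits into two intervals of width 5.

```r
library(santoku)

# Hypothetical toy vector: 0 to 9, chopped into two intervals of width 5
values <- 0:9

letters_lbl <- chop_width(values, 5, labels = lbl_seq())       # a, b, ...
roman_lbl   <- chop_width(values, 5, labels = lbl_seq("(i)"))  # (i), (ii), ...
dash_lbl    <- chop_width(values, 5, labels = lbl_dash())      # endpoints joined by a dash

levels(letters_lbl)
levels(roman_lbl)
levels(dash_lbl)
```

Because the labels are generated from the intervals, the same call keeps working if the data or the width changes.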

Left-closed and right-closed

Breaks are closed on the left by default.

chop(x, c(200, 500, 800))

Left-closed and right-closed

For right-closed breaks use left = FALSE:

chop(x, c(200, 500, 800), left = FALSE)
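The difference only shows up for values that sit exactly on a break. In this sketch, on a hypothetical toy vector, 3 and 7 are both breaks and data points, so they move to the lower interval when `left = FALSE`.

```r
library(santoku)

# Hypothetical toy vector: 3 and 7 sit exactly on the breaks
small <- c(1, 3, 5, 7, 9)

left_closed  <- chop(small, c(3, 7))                # 3 and 7 start an interval
right_closed <- chop(small, c(3, 7), left = FALSE)  # 3 and 7 end an interval

# Compare which interval (1st, 2nd or 3rd) each value falls into
data.frame(
  value = small,
  left  = as.integer(left_closed),
  right = as.integer(right_closed)
)
```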

Errors

Sometimes it’s impossible to create the breaks you want.

chop_quantiles(c(-Inf, Inf), c(0.25, 0.75))
## [1] [-Inf, Inf ] [-Inf, Inf ]
## Levels: [-Inf, Inf ]

When the problem comes from the data (x), santoku will try to carry on (e.g. by returning a single interval).

When the problem comes from other parameters, e.g. breaks or extend, santoku will give an error.

chop_quantiles(1:5, c(0.25, NA))
## Error: probs contains 1 missing values
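If a script should keep running when a parameter turns out to be invalid, one option is to wrap the call in base R's `tryCatch()`. This is a sketch of one way to handle the error above, not something santoku requires; returning `NULL` as the fallback is an arbitrary choice.

```r
library(santoku)

# Catch the error from the invalid probs argument so the script can carry on
result <- tryCatch(
  chop_quantiles(1:5, c(0.25, NA)),
  error = function(e) {
    message("Could not chop: ", conditionMessage(e))
    NULL  # fallback value, chosen for illustration
  }
)

is.null(result)
```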

Happy chopping!

https://hughjonesd.github.io/santoku

devtools::install_github("hughjonesd/santoku")
