Dealing with masses of data

Sometimes there is a lot of data to present. For example, you might have a big table of regressions, or sales figures from several business units. You want to show the data, while highlighting certain key elements.

As an example, we’ll use the mtcars dataset of 32 1970s automobiles.

##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

I will use the huxtable library to create a formatted table, and the dplyr pipe command to simplify transforming data.

Model mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21   6 160   110 3.9  2.62 16.5 0 1 4 4
Mazda RX4 Wag 21   6 160   110 3.9  2.88 17   0 1 4 4
Datsun 710 22.8 4 108   93 3.85 2.32 18.6 1 1 4 1
Hornet 4 Drive 21.4 6 258   110 3.08 3.21 19.4 1 0 3 1
Hornet Sportabout 18.7 8 360   175 3.15 3.44 17   0 0 3 2
Valiant 18.1 6 225   105 2.76 3.46 20.2 1 0 3 1
Duster 360 14.3 8 360   245 3.21 3.57 15.8 0 0 3 4
Merc 240D 24.4 4 147   62 3.69 3.19 20   1 0 4 2
Merc 230 22.8 4 141   95 3.92 3.15 22.9 1 0 4 2
Merc 280 19.2 6 168   123 3.92 3.44 18.3 1 0 4 4
Merc 280C 17.8 6 168   123 3.92 3.44 18.9 1 0 4 4
Merc 450SE 16.4 8 276   180 3.07 4.07 17.4 0 0 3 3
Merc 450SL 17.3 8 276   180 3.07 3.73 17.6 0 0 3 3
Merc 450SLC 15.2 8 276   180 3.07 3.78 18   0 0 3 3
Cadillac Fleetwood 10.4 8 472   205 2.93 5.25 18   0 0 3 4
Lincoln Continental 10.4 8 460   215 3    5.42 17.8 0 0 3 4
Chrysler Imperial 14.7 8 440   230 3.23 5.34 17.4 0 0 3 4
Fiat 128 32.4 4 78.7 66 4.08 2.2  19.5 1 1 4 1
Honda Civic 30.4 4 75.7 52 4.93 1.61 18.5 1 1 4 2
Toyota Corolla 33.9 4 71.1 65 4.22 1.83 19.9 1 1 4 1
Toyota Corona 21.5 4 120   97 3.7  2.46 20   1 0 3 1
Dodge Challenger 15.5 8 318   150 2.76 3.52 16.9 0 0 3 2
AMC Javelin 15.2 8 304   150 3.15 3.44 17.3 0 0 3 2
Camaro Z28 13.3 8 350   245 3.73 3.84 15.4 0 0 3 4
Pontiac Firebird 19.2 8 400   175 3.08 3.85 17.1 0 0 3 2
Fiat X1-9 27.3 4 79   66 4.08 1.94 18.9 1 1 4 1
Porsche 914-2 26   4 120   91 4.43 2.14 16.7 0 1 5 2
Lotus Europa 30.4 4 95.1 113 3.77 1.51 16.9 1 1 5 2
Ford Pantera L 15.8 8 351   264 4.22 3.17 14.5 0 1 5 4
Ferrari Dino 19.7 6 145   175 3.62 2.77 15.5 0 1 5 6
Maserati Bora 15   8 301   335 3.54 3.57 14.6 0 1 5 8
Volvo 142E 21.4 4 121   109 4.11 2.78 18.6 1 1 4 2

This dataset contains many numbers and columns. It’s easy to get lost. To start, let’s give alternate rows a different background. That will help readers to navigate through the data.

Huxtable 4.3.0 introduces map_ commands. These let you map different data cells to different display properties. Let’s use map_background_color to set the background colour.

Model mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4  21   160  110  3.9  2.62 16.5 4
Mazda RX4 Wag  21   160  110  3.9  2.88 17   4
Datsun 710  22.8 108  93  3.85 2.32 18.6 1
Hornet 4 Drive  21.4 258  110  3.08 3.21 19.4 1
Hornet Sportabout  18.7 360  175  3.15 3.44 17   2
Valiant  18.1 225  105  2.76 3.46 20.2 1
Duster 360  14.3 360  245  3.21 3.57 15.8 4
... ...  ... ... ... ...   ...   ...  ... ... ... ...

Huxtable’s built-in theme_grey will do that for us, and also set a few other sensible cosmetic defaults. We’ll use that as a basis from now on.

Model mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4  21   160  110  3.9  2.62 16.5 4
Mazda RX4 Wag  21   160  110  3.9  2.88 17   4
Datsun 710  22.8 108  93  3.85 2.32 18.6 1
Hornet 4 Drive  21.4 258  110  3.08 3.21 19.4 1
Hornet Sportabout  18.7 360  175  3.15 3.44 17   2
Valiant  18.1 225  105  2.76 3.46 20.2 1
Duster 360  14.3 360  245  3.21 3.57 15.8 4
... ...  ... ... ... ...   ...   ...  ... ... ... ...

Mapping data with map_ commands

map_ commands take a huxtable, an optional row and column specifier, and a “mapping function” which states how to map table data to output. Mapping functions start with by_. For example, by_rows just says: apply this output to rows in sequence.

Let’s use another mapping function to highlight the cars with the best and worst fuel efficiency.

Model mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21   6 160   110 3.9  2.62 16.5 0 1 4 4
Mazda RX4 Wag 21   6 160   110 3.9  2.88 17   0 1 4 4
Datsun 710 22.8 4 108   93 3.85 2.32 18.6 1 1 4 1
Hornet 4 Drive 21.4 6 258   110 3.08 3.21 19.4 1 0 3 1
Hornet Sportabout 18.7 8 360   175 3.15 3.44 17   0 0 3 2
Valiant 18.1 6 225   105 2.76 3.46 20.2 1 0 3 1
Duster 360 14.3 8 360   245 3.21 3.57 15.8 0 0 3 4
Merc 240D 24.4 4 147   62 3.69 3.19 20   1 0 4 2
Merc 230 22.8 4 141   95 3.92 3.15 22.9 1 0 4 2
Merc 280 19.2 6 168   123 3.92 3.44 18.3 1 0 4 4
Merc 280C 17.8 6 168   123 3.92 3.44 18.9 1 0 4 4
Merc 450SE 16.4 8 276   180 3.07 4.07 17.4 0 0 3 3
Merc 450SL 17.3 8 276   180 3.07 3.73 17.6 0 0 3 3
Merc 450SLC 15.2 8 276   180 3.07 3.78 18   0 0 3 3
Cadillac Fleetwood 10.4 8 472   205 2.93 5.25 18   0 0 3 4
Lincoln Continental 10.4 8 460   215 3    5.42 17.8 0 0 3 4
Chrysler Imperial 14.7 8 440   230 3.23 5.34 17.4 0 0 3 4
Fiat 128 32.4 4 78.7 66 4.08 2.2  19.5 1 1 4 1
Honda Civic 30.4 4 75.7 52 4.93 1.61 18.5 1 1 4 2
Toyota Corolla 33.9 4 71.1 65 4.22 1.83 19.9 1 1 4 1
Toyota Corona 21.5 4 120   97 3.7  2.46 20   1 0 3 1
Dodge Challenger 15.5 8 318   150 2.76 3.52 16.9 0 0 3 2
AMC Javelin 15.2 8 304   150 3.15 3.44 17.3 0 0 3 2
Camaro Z28 13.3 8 350   245 3.73 3.84 15.4 0 0 3 4
Pontiac Firebird 19.2 8 400   175 3.08 3.85 17.1 0 0 3 2
Fiat X1-9 27.3 4 79   66 4.08 1.94 18.9 1 1 4 1
Porsche 914-2 26   4 120   91 4.43 2.14 16.7 0 1 5 2
Lotus Europa 30.4 4 95.1 113 3.77 1.51 16.9 1 1 5 2
Ford Pantera L 15.8 8 351   264 4.22 3.17 14.5 0 1 5 4
Ferrari Dino 19.7 6 145   175 3.62 2.77 15.5 0 1 5 6
Maserati Bora 15   8 301   335 3.54 3.57 14.6 0 1 5 8
Volvo 142E 21.4 4 121   109 4.11 2.78 18.6 1 1 4 2

by_quantiles takes a vector of quantiles, and a vector of colours. Here, we’ve coloured the bottom 20% of data red, and the top 20% green. Notice that we used one more colour name than quantile.

This example also showed how we can select rows or columns to map, using the row and column specifier arguments to a map_ command. The row specifier was everywhere, which is just huxtable shorthand for 1:nrow(car_hux). The column specifier was the column name.

Map numbers in a range with by_colorspace

To map a continuous range of numbers, use by_colorspace. This takes a vector of colours, then maps the data continuously between them. Here, we’ll highlight the “qsec” column, which reports the car’s time to reach a quarter of a mile. Lower is better (if you’re an adrenalin junkie), so let’s make low numbers pop out with a yellow background:

Model mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21   6 160   110 3.9  2.62 16.5 0 1 4 4
Mazda RX4 Wag 21   6 160   110 3.9  2.88 17   0 1 4 4
Datsun 710 22.8 4 108   93 3.85 2.32 18.6 1 1 4 1
Hornet 4 Drive 21.4 6 258   110 3.08 3.21 19.4 1 0 3 1
Hornet Sportabout 18.7 8 360   175 3.15 3.44 17   0 0 3 2
Valiant 18.1 6 225   105 2.76 3.46 20.2 1 0 3 1
Duster 360 14.3 8 360   245 3.21 3.57 15.8 0 0 3 4
Merc 240D 24.4 4 147   62 3.69 3.19 20   1 0 4 2
Merc 230 22.8 4 141   95 3.92 3.15 22.9 1 0 4 2
Merc 280 19.2 6 168   123 3.92 3.44 18.3 1 0 4 4
Merc 280C 17.8 6 168   123 3.92 3.44 18.9 1 0 4 4
Merc 450SE 16.4 8 276   180 3.07 4.07 17.4 0 0 3 3
Merc 450SL 17.3 8 276   180 3.07 3.73 17.6 0 0 3 3
Merc 450SLC 15.2 8 276   180 3.07 3.78 18   0 0 3 3
Cadillac Fleetwood 10.4 8 472   205 2.93 5.25 18   0 0 3 4
Lincoln Continental 10.4 8 460   215 3    5.42 17.8 0 0 3 4
Chrysler Imperial 14.7 8 440   230 3.23 5.34 17.4 0 0 3 4
Fiat 128 32.4 4 78.7 66 4.08 2.2  19.5 1 1 4 1
Honda Civic 30.4 4 75.7 52 4.93 1.61 18.5 1 1 4 2
Toyota Corolla 33.9 4 71.1 65 4.22 1.83 19.9 1 1 4 1
Toyota Corona 21.5 4 120   97 3.7  2.46 20   1 0 3 1
Dodge Challenger 15.5 8 318   150 2.76 3.52 16.9 0 0 3 2
AMC Javelin 15.2 8 304   150 3.15 3.44 17.3 0 0 3 2
Camaro Z28 13.3 8 350   245 3.73 3.84 15.4 0 0 3 4
Pontiac Firebird 19.2 8 400   175 3.08 3.85 17.1 0 0 3 2
Fiat X1-9 27.3 4 79   66 4.08 1.94 18.9 1 1 4 1
Porsche 914-2 26   4 120   91 4.43 2.14 16.7 0 1 5 2
Lotus Europa 30.4 4 95.1 113 3.77 1.51 16.9 1 1 5 2
Ford Pantera L 15.8 8 351   264 4.22 3.17 14.5 0 1 5 4
Ferrari Dino 19.7 6 145   175 3.62 2.77 15.5 0 1 5 6
Maserati Bora 15   8 301   335 3.54 3.57 14.6 0 1 5 8
Volvo 142E 21.4 4 121   109 4.11 2.78 18.6 1 1 4 2

Highlight text with by_regex

map_text_color is another map_ command. There is one of these commands for every huxtable cell property. Another one is map_bold. Let’s use this to pick out the Mercedes models.

For character data, we can use by_regex to match certain cells:

Model mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4  21   160  110  3.9  2.62 16.5 4
Mazda RX4 Wag  21   160  110  3.9  2.88 17   4
Datsun 710  22.8 108  93  3.85 2.32 18.6 1
Hornet 4 Drive  21.4 258  110  3.08 3.21 19.4 1
Hornet Sportabout  18.7 360  175  3.15 3.44 17   2
Valiant  18.1 225  105  2.76 3.46 20.2 1
Duster 360  14.3 360  245  3.21 3.57 15.8 4
Merc 240D  24.4 147  62  3.69 3.19 20   2
Merc 230  22.8 141  95  3.92 3.15 22.9 2
Merc 280  19.2 168  123  3.92 3.44 18.3 4
Merc 280C  17.8 168  123  3.92 3.44 18.9 4
Merc 450SE  16.4 276  180  3.07 4.07 17.4 3
Merc 450SL  17.3 276  180  3.07 3.73 17.6 3
Merc 450SLC  15.2 276  180  3.07 3.78 18   3
... ...  ... ... ... ...   ...   ...  ... ... ... ...

The syntax of by_regex is "match1" = value2, "match2" = value2, .... Each “match” is a regular expression. You can read more about them in ?regexpr. Here, we just used a simple regular expression, which matched every cell containing the string "Merc".

Highlight a column with set_background_color and set_outer_borders

Sometimes you might want to highlight an entire column of data, for example as you flip through a report. Here’s one way to do this:

Model mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4  21   160  110  3.9  2.62 16.5 4
Mazda RX4 Wag  21   160  110  3.9  2.88 17   4
Datsun 710  22.8 108  93  3.85 2.32 18.6 1
Hornet 4 Drive  21.4 258  110  3.08 3.21 19.4 1
Hornet Sportabout  18.7 360  175  3.15 3.44 17   2
Valiant  18.1 225  105  2.76 3.46 20.2 1
Duster 360  14.3 360  245  3.21 3.57 15.8 4
... ...  ... ... ... ...   ...   ...  ... ... ... ...

While map_ commands map different cells to different values, set_ commands set all the cells in a range to the same value. Here again, we’ve used a row specifier everywhere to select all rows, and picked out the horsepower column “hp” by name.

We can do the same trick to highlight rows. Here, we’ll highlight the “Hornet” cars:

Model mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4  21   160  110  3.9  2.62 16.5 4
Mazda RX4 Wag  21   160  110  3.9  2.88 17   4
Datsun 710  22.8 108  93  3.85 2.32 18.6 1
Hornet 4 Drive  21.4 258  110  3.08 3.21 19.4 1
Hornet Sportabout  18.7 360  175  3.15 3.44 17   2
Valiant  18.1 225  105  2.76 3.46 20.2 1
Duster 360  14.3 360  245  3.21 3.57 15.8 4
... ...  ... ... ... ...   ...   ...  ... ... ... ...

Another way to pick out a column is to set a border around it. We can do this using the set_outer_borders function.

Model mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4  21   160  110  3.9  2.62 16.5 4
Mazda RX4 Wag  21   160  110  3.9  2.88 17   4
Datsun 710  22.8 108  93  3.85 2.32 18.6 1
Hornet 4 Drive  21.4 258  110  3.08 3.21 19.4 1
Hornet Sportabout  18.7 360  175  3.15 3.44 17   2
Valiant  18.1 225  105  2.76 3.46 20.2 1
Duster 360  14.3 360  245  3.21 3.57 15.8 4
... ...  ... ... ... ...   ...   ...  ... ... ... ...

set_outer_borders sets borders around a square of cells. One wrinkle is that there is not (yet) a set_outer_border_colors function. Instead, we had to set borders to 0 in the "disp" column, then set all border colours to dark red.

More ways to map

To recap, here are some ways to highlight data using huxtable:

  • Use by_rows to set properties on alternating rows.
  • Use by_quantiles to highlight top and bottom numbers.
  • by_regex picks out cells matching certain text.
  • set_background_color and set_outer_borders can highlight whole rows and columns.

There are several other ways to map data. For example, by_cols sets cell properties by columns. by_cases and by_function are more general solutions to pick out cells using more complex criteria.

Huxtable 4.3.0 is available on CRAN. Happy mapping!