Regression Tables with huxreg

Regression tables with `huxreg`

Huxtable includes the function huxreg to build a table of regressions.

You call huxreg with a list of models. These models can be of any class which has a tidy method defined in the broom package. The method should return a list of regression coefficients with names term, estimate, std.error and p.value. That covers most standard regression packages.

Let’s start by running some regressions to predict a diamond’s price.

data(diamonds, package = "ggplot2")
diamonds <- diamonds[1:100,]

lm1 <- lm(price ~ carat + depth, diamonds)
lm2 <- lm(price ~ depth + factor(color, ordered = FALSE), diamonds)
lm3 <- lm(log(price) ~ carat + depth, diamonds)

Now, we use huxreg to display the regression output side by side.

huxreg(lm1, lm2, lm3)

	(1)	(2)	(3)
(Intercept)	981.607	900.067	6.269 ***
	(720.175)	(2431.815)	(0.782)
carat	4328.324 ***		3.531 ***
	(136.755)		(0.149)
depth	-27.785 *	-6.804	-0.019
	(11.656)	(39.293)	(0.013)
factor(color, ordered = FALSE)E		449.490
		(239.388)
factor(color, ordered = FALSE)F		391.705
		(290.880)
factor(color, ordered = FALSE)G		583.111
		(308.513)
factor(color, ordered = FALSE)H		126.916
		(256.367)
factor(color, ordered = FALSE)I		-47.220
		(253.092)
factor(color, ordered = FALSE)J		-123.430
		(269.157)
N	100	100	100
R2	0.912	0.123	0.854
logLik	-675.703	-790.788	6.822
AIC	1359.405	1599.576	-5.644
* p < 0.001; p < 0.01; * p < 0.05.

The basic output includes estimates, standard errors and summary statistics.

Some of those variable names are hard to read. We can change them by providing a named vector of variables in the coefs argument.

color_names <- grep("factor", names(coef(lm2)), value = TRUE)
names(color_names) <- gsub(".*)(.)", "Color: \\1", color_names)

huxreg(lm1, lm2, lm3, coefs = c("Carat" = "carat", "Depth" = "depth", color_names))

	(1)	(2)	(3)
Carat	4328.324 ***		3.531 ***
	(136.755)		(0.149)
Depth	-27.785 *	-6.804	-0.019
	(11.656)	(39.293)	(0.013)
Color: E		449.490
		(239.388)
Color: F		391.705
		(290.880)
Color: G		583.111
		(308.513)
Color: H		126.916
		(256.367)
Color: I		-47.220
		(253.092)
Color: J		-123.430
		(269.157)
N	100	100	100
R2	0.912	0.123	0.854
logLik	-675.703	-790.788	6.822
AIC	1359.405	1599.576	-5.644
* p < 0.001; p < 0.01; * p < 0.05.

Or, since the output from huxreg is just a huxtable, we could just edit its contents directly.

diamond_regs <- huxreg(lm1, lm2, lm3)
diamond_regs[seq(8, 18, 2), 1] <- paste("Color:", LETTERS[5:10])

# prints the same as above

Of course, we aren’t limited to just changing names. We can also make our table prettier. Let’s put our footnote in italic, add a caption, and highlight the cell background of significant coefficients. All of these are just standard huxtable commands.

suppressPackageStartupMessages(library(dplyr))

diamond_regs |> 
      map_background_color(-1, -1, by_regex(
        "\\*" = "yellow"
      )) |> 
      set_italic(final(1), 1) |> 
      set_caption("Linear regressions of diamond prices")

Linear regressions of diamond prices
	(1)	(2)	(3)
(Intercept)	981.607	900.067	6.269 ***
	(720.175)	(2431.815)	(0.782)
carat	4328.324 ***		3.531 ***
	(136.755)		(0.149)
depth	-27.785 *	-6.804	-0.019
	(11.656)	(39.293)	(0.013)
Color: E		449.490
		(239.388)
Color: F		391.705
		(290.880)
Color: G		583.111
		(308.513)
Color: H		126.916
		(256.367)
Color: I		-47.220
		(253.092)
Color: J		-123.430
		(269.157)
N	100	100	100
R2	0.912	0.123	0.854
logLik	-675.703	-790.788	6.822
AIC	1359.405	1599.576	-5.644
* p < 0.001; p < 0.01; * p < 0.05.

By default, standard errors are shown below coefficient estimates. To display them in a column to the right, use error_pos = "right":

huxreg(lm1, lm3, error_pos = "right")

	(1)		(2)
(Intercept)	981.607	(720.175)	6.269 ***	(0.782)
carat	4328.324 ***	(136.755)	3.531 ***	(0.149)
depth	-27.785 *	(11.656)	-0.019	(0.013)
N	100		100
R2	0.912		0.854
logLik	-675.703		6.822
AIC	1359.405		-5.644
* p < 0.001; p < 0.01; * p < 0.05.

This will give column headings a column span of 2.

To display standard errors in the same cell as estimates, use error_pos = "same":

huxreg(lm1, lm3, error_pos = "same")

	(1)	(2)
(Intercept)	981.607 (720.175)	6.269 *** (0.782)
carat	4328.324 *** (136.755)	3.531 *** (0.149)
depth	-27.785 * (11.656)	-0.019 (0.013)
N	100	100
R2	0.912	0.854
logLik	-675.703	6.822
AIC	1359.405	-5.644
* p < 0.001; p < 0.01; * p < 0.05.

You can change the default column headings by naming the model arguments:

huxreg("Price" = lm1, "Log price" = lm3)

	Price	Log price
(Intercept)	981.607	6.269 ***
	(720.175)	(0.782)
carat	4328.324 ***	3.531 ***
	(136.755)	(0.149)
depth	-27.785 *	-0.019
	(11.656)	(0.013)
N	100	100
R2	0.912	0.854
logLik	-675.703	6.822
AIC	1359.405	-5.644
* p < 0.001; p < 0.01; * p < 0.05.

To display a particular row of summary statistics, use the statistics parameter. This should be a character vector. Valid values are anything returned from your models by broom::glance:

gl <- as_hux(broom::glance(lm1))

gl |> 
      restack_down(cols = 3, on_remainder = "fill") |> 
      set_bold(odds, everywhere)

r.squared	adj.r.squared	sigma
0.912	0.91	211
statistic	p.value	df
504	5.65e-52	2
logLik	AIC	BIC
-676	1.36e+03	1.37e+03
deviance	df.residual	nobs
4.33e+06	97	100

Another value you can use is "nobs", which returns the number of observations from the regression. If the statistics vector has names, these will be used for row headings:

huxreg(lm1, lm3, statistics = c("N. obs." = "nobs", 
      "R squared" = "r.squared", "F statistic" = "statistic",
      "P value" = "p.value"))

	(1)	(2)
(Intercept)	981.607	6.269 ***
	(720.175)	(0.782)
carat	4328.324 ***	3.531 ***
	(136.755)	(0.149)
depth	-27.785 *	-0.019
	(11.656)	(0.013)
N. obs.	100	100
R squared	0.912	0.854
F statistic	504.082	283.881
P value	0.000	0.000
* p < 0.001; p < 0.01; * p < 0.05.

By default, huxreg displays significance stars. You can alter the symbols used and significance levels with the stars parameter, or set stars = NULL to turn off significance stars completely.

huxreg(lm1, lm3, stars = c(`*` = 0.1, `**` = 0.05, `***` = 0.01)) # a little boastful?

	(1)	(2)
(Intercept)	981.607	6.269 ***
	(720.175)	(0.782)
carat	4328.324 ***	3.531 ***
	(136.755)	(0.149)
depth	-27.785 **	-0.019
	(11.656)	(0.013)
N	100	100
R2	0.912	0.854
logLik	-675.703	6.822
AIC	1359.405	-5.644
* p < 0.1; p < 0.05; * p < 0.01.

You aren’t limited to displaying standard errors of the estimates. If you prefer, you can display t statistics or p values, using the error_format option. Any column from tidy can be used by putting it in curly brackets:

# Another useful column: p.value
huxreg(
        lm1, lm3, 
        error_format = "[{statistic}]", 
        note         = "{stars}. T statistics in brackets."
      )

	(1)	(2)
(Intercept)	981.607	6.269 ***
	[1.363]	[8.016]
carat	4328.324 ***	3.531 ***
	[31.650]	[23.773]
depth	-27.785 *	-0.019
	[-2.384]	[-1.499]
N	100	100
R2	0.912	0.854
logLik	-675.703	6.822
AIC	1359.405	-5.644
* p < 0.001; p < 0.01; * p < 0.05. T statistics in brackets.

Here we also changed the footnote, using note. If note contains the string "{stars}" it will be replaced by a description of the significance stars used. If you don’t want a footnote, just set note = NULL.

Alternatively, you can display confidence intervals. Use ci_level to set the confidence level for the interval, then use {conf.low} and {conf.high} in error_format:

huxreg(lm1, lm3, ci_level = .99, error_format = "({conf.low} -- {conf.high})")

	(1)	(2)
(Intercept)	981.607	6.269 ***
	(-910.629 -- 2873.844)	(4.214 -- 8.324)
carat	4328.324 ***	3.531 ***
	(3969.004 -- 4687.643)	(3.140 -- 3.921)
depth	-27.785 *	-0.019
	(-58.411 -- 2.842)	(-0.052 -- 0.014)
N	100	100
R2	0.912	0.854
logLik	-675.703	6.822
AIC	1359.405	-5.644
* p < 0.001; p < 0.01; * p < 0.05.

To change number formatting, set the number_format parameter. This works the same as the number_format property for a huxtable - if it is numeric, numbers will be rounded to that many decimal places; if it is character, it will be taken as a format to the base R sprintf function. huxreg tries to be smart and to format summary statistics like nobs as integers.

huxreg(lm1, lm3, number_format = 2)

	(1)	(2)
(Intercept)	981.61	6.27 ***
	(720.17)	(0.78)
carat	4328.32 ***	3.53 ***
	(136.75)	(0.15)
depth	-27.78 *	-0.02
	(11.66)	(0.01)
N	100	100
R2	0.91	0.85
logLik	-675.70	6.82
AIC	1359.41	-5.64
* p < 0.001; p < 0.01; * p < 0.05.

Lastly, if you want to bold all significant coefficients, set the parameter bold_signif to a maximum significance level:

huxreg(lm1, lm3, bold_signif = 0.05)

	(1)	(2)
(Intercept)	981.607	6.269 ***
	(720.175)	(0.782)
carat	4328.324 ***	3.531 ***
	(136.755)	(0.149)
depth	-27.785 *	-0.019
	(11.656)	(0.013)
N	100	100
R2	0.912	0.854
logLik	-675.703	6.822
AIC	1359.405	-5.644
* p < 0.001; p < 0.01; * p < 0.05.

Altering data

Sometimes, you want to report different statistics for a model. For example, you might want to use robust standard errors.

One way to do this is to pass a tidy-able test object into huxreg. The function coeftest in the “lmtest” package has tidy methods defined:

library(lmtest)

## Loading required package: zoo

## 
## Attaching package: 'zoo'

## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric

library(sandwich)
lm_robust <- coeftest(lm1, vcov = vcovHC, save = TRUE)
huxreg("Normal SEs" = lm1, "Robust SEs" = lm_robust)

	Normal SEs	Robust SEs
(Intercept)	981.607	981.607
	(720.175)	(1117.654)
carat	4328.324 ***	4328.324 ***
	(136.755)	(293.929)
depth	-27.785 *	-27.785
	(11.656)	(17.995)
N	100	100
R2	0.912	0.912
logLik	-675.703	-675.703
AIC	1359.405	1359.405
* p < 0.001; p < 0.01; * p < 0.05.

If that is not possible, you can compute statistics yourself and add them to your model using the tidy_override function:

lm_fixed <- tidy_override(lm1, p.value = c(0.5, 0.2, 0.06))
huxreg("Normal p values" = lm1, "Supplied p values" = lm_fixed)

	Normal p values	Supplied p values
(Intercept)	981.607	981.607
	(720.175)	(720.175)
carat	4328.324 ***	4328.324
	(136.755)	(136.755)
depth	-27.785 *	-27.785
	(11.656)	(11.656)
N	100	100
R2	0.912	0.912
logLik	-675.703	-675.703
AIC	1359.405	1359.405
* p < 0.001; p < 0.01; * p < 0.05.

You can override any statistics returned by tidy or glance.

If you want to completely replace the output of tidy, use the tidy_replace() function. For example, here’s how to print different coefficients for a multinomial model.

mnl <- nnet::multinom(gear ~ mpg, mtcars)

## # weights:  9 (4 variable)
## initial  value 35.155593 
## iter  10 value 23.131901
## final  value 23.129234 
## converged

tidied <- broom::tidy(mnl)
models <- list()
models[["4 gears"]] <- tidy_replace(mnl, tidied[tidied$y.level == 4, ])
models[["5 gears"]] <- tidy_replace(mnl, tidied[tidied$y.level == 5, ])
huxreg(models, statistics = "AIC")

	4 gears	5 gears
(Intercept)	-9.502 **	-7.691 *
	(3.262)	(3.232)
mpg	0.475 **	0.358 *
	(0.168)	(0.168)
AIC	54.258	54.258
* p < 0.001; p < 0.01; * p < 0.05.

Regression tables with huxreg

Altering data

Regression tables with `huxreg`