huxreg
Huxtable includes the function huxreg
to build a table
of regressions.
You call huxreg
with a list of models. These models can
be of any class which has a tidy
method defined in the broom package. The
method should return a list of regression coefficients with names
term
, estimate
, std.error
and
p.value
. That covers most standard regression packages.
Let’s start by running some regressions to predict a diamond’s price.
data(diamonds, package = "ggplot2")
diamonds <- diamonds[1:100,]
lm1 <- lm(price ~ carat + depth, diamonds)
lm2 <- lm(price ~ depth + factor(color, ordered = FALSE), diamonds)
lm3 <- lm(log(price) ~ carat + depth, diamonds)
Now, we use huxreg
to display the regression output side
by side.
huxreg(lm1, lm2, lm3)
(1) | (2) | (3) | |
---|---|---|---|
(Intercept) | 981.607 | 900.067 | 6.269 *** |
(720.175) | (2431.815) | (0.782) | |
carat | 4328.324 *** | 3.531 *** | |
(136.755) | (0.149) | ||
depth | -27.785 * | -6.804 | -0.019 |
(11.656) | (39.293) | (0.013) | |
factor(color, ordered = FALSE)E | 449.490 | ||
(239.388) | |||
factor(color, ordered = FALSE)F | 391.705 | ||
(290.880) | |||
factor(color, ordered = FALSE)G | 583.111 | ||
(308.513) | |||
factor(color, ordered = FALSE)H | 126.916 | ||
(256.367) | |||
factor(color, ordered = FALSE)I | -47.220 | ||
(253.092) | |||
factor(color, ordered = FALSE)J | -123.430 | ||
(269.157) | |||
N | 100 | 100 | 100 |
R2 | 0.912 | 0.123 | 0.854 |
logLik | -675.703 | -790.788 | 6.822 |
AIC | 1359.405 | 1599.576 | -5.644 |
*** p < 0.001; ** p < 0.01; * p < 0.05. |
The basic output includes estimates, standard errors and summary statistics.
Some of those variable names are hard to read. We can change them by
providing a named vector of variables in the coefs
argument.
color_names <- grep("factor", names(coef(lm2)), value = TRUE)
names(color_names) <- gsub(".*)(.)", "Color: \\1", color_names)
huxreg(lm1, lm2, lm3, coefs = c("Carat" = "carat", "Depth" = "depth", color_names))
(1) | (2) | (3) | |
---|---|---|---|
Carat | 4328.324 *** | 3.531 *** | |
(136.755) | (0.149) | ||
Depth | -27.785 * | -6.804 | -0.019 |
(11.656) | (39.293) | (0.013) | |
Color: E | 449.490 | ||
(239.388) | |||
Color: F | 391.705 | ||
(290.880) | |||
Color: G | 583.111 | ||
(308.513) | |||
Color: H | 126.916 | ||
(256.367) | |||
Color: I | -47.220 | ||
(253.092) | |||
Color: J | -123.430 | ||
(269.157) | |||
N | 100 | 100 | 100 |
R2 | 0.912 | 0.123 | 0.854 |
logLik | -675.703 | -790.788 | 6.822 |
AIC | 1359.405 | 1599.576 | -5.644 |
*** p < 0.001; ** p < 0.01; * p < 0.05. |
Or, since the output from huxreg
is just a huxtable, we
could just edit its contents directly.
diamond_regs <- huxreg(lm1, lm2, lm3)
diamond_regs[seq(8, 18, 2), 1] <- paste("Color:", LETTERS[5:10])
# prints the same as above
Of course, we aren’t limited to just changing names. We can also make our table prettier. Let’s put our footnote in italic, add a caption, and highlight the cell background of significant coefficients. All of these are just standard huxtable commands.
suppressPackageStartupMessages(library(dplyr))
diamond_regs |>
map_background_color(-1, -1, by_regex(
"\\*" = "yellow"
)) |>
set_italic(final(1), 1) |>
set_caption("Linear regressions of diamond prices")
(1) | (2) | (3) | |
---|---|---|---|
(Intercept) | 981.607 | 900.067 | 6.269 *** |
(720.175) | (2431.815) | (0.782) | |
carat | 4328.324 *** | 3.531 *** | |
(136.755) | (0.149) | ||
depth | -27.785 * | -6.804 | -0.019 |
(11.656) | (39.293) | (0.013) | |
Color: E | 449.490 | ||
(239.388) | |||
Color: F | 391.705 | ||
(290.880) | |||
Color: G | 583.111 | ||
(308.513) | |||
Color: H | 126.916 | ||
(256.367) | |||
Color: I | -47.220 | ||
(253.092) | |||
Color: J | -123.430 | ||
(269.157) | |||
N | 100 | 100 | 100 |
R2 | 0.912 | 0.123 | 0.854 |
logLik | -675.703 | -790.788 | 6.822 |
AIC | 1359.405 | 1599.576 | -5.644 |
*** p < 0.001; ** p < 0.01; * p < 0.05. |
By default, standard errors are shown below coefficient estimates. To
display them in a column to the right, use
error_pos = "right"
:
huxreg(lm1, lm3, error_pos = "right")
(1) | (2) | |||
---|---|---|---|---|
(Intercept) | 981.607 | (720.175) | 6.269 *** | (0.782) |
carat | 4328.324 *** | (136.755) | 3.531 *** | (0.149) |
depth | -27.785 * | (11.656) | -0.019 | (0.013) |
N | 100 | 100 | ||
R2 | 0.912 | 0.854 | ||
logLik | -675.703 | 6.822 | ||
AIC | 1359.405 | -5.644 | ||
*** p < 0.001; ** p < 0.01; * p < 0.05. |
This will give column headings a column span of 2.
To display standard errors in the same cell as estimates, use
error_pos = "same"
:
huxreg(lm1, lm3, error_pos = "same")
(1) | (2) | |
---|---|---|
(Intercept) | 981.607 (720.175) | 6.269 *** (0.782) |
carat | 4328.324 *** (136.755) | 3.531 *** (0.149) |
depth | -27.785 * (11.656) | -0.019 (0.013) |
N | 100 | 100 |
R2 | 0.912 | 0.854 |
logLik | -675.703 | 6.822 |
AIC | 1359.405 | -5.644 |
*** p < 0.001; ** p < 0.01; * p < 0.05. |
You can change the default column headings by naming the model arguments:
huxreg("Price" = lm1, "Log price" = lm3)
Price | Log price | |
---|---|---|
(Intercept) | 981.607 | 6.269 *** |
(720.175) | (0.782) | |
carat | 4328.324 *** | 3.531 *** |
(136.755) | (0.149) | |
depth | -27.785 * | -0.019 |
(11.656) | (0.013) | |
N | 100 | 100 |
R2 | 0.912 | 0.854 |
logLik | -675.703 | 6.822 |
AIC | 1359.405 | -5.644 |
*** p < 0.001; ** p < 0.01; * p < 0.05. |
To display a particular row of summary statistics, use the
statistics
parameter. This should be a character vector.
Valid values are anything returned from your models by
broom::glance
:
gl <- as_hux(broom::glance(lm1))
gl |>
restack_down(cols = 3, on_remainder = "fill") |>
set_bold(odds, everywhere)
r.squared | adj.r.squared | sigma |
---|---|---|
0.912 | 0.91 | 211 |
statistic | p.value | df |
504 | 5.65e-52 | 2 |
logLik | AIC | BIC |
-676 | 1.36e+03 | 1.37e+03 |
deviance | df.residual | nobs |
4.33e+06 | 97 | 100 |
Another value you can use is "nobs"
, which returns the
number of observations from the regression. If the
statistics
vector has names, these will be used for row
headings:
huxreg(lm1, lm3, statistics = c("N. obs." = "nobs",
"R squared" = "r.squared", "F statistic" = "statistic",
"P value" = "p.value"))
(1) | (2) | |
---|---|---|
(Intercept) | 981.607 | 6.269 *** |
(720.175) | (0.782) | |
carat | 4328.324 *** | 3.531 *** |
(136.755) | (0.149) | |
depth | -27.785 * | -0.019 |
(11.656) | (0.013) | |
N. obs. | 100 | 100 |
R squared | 0.912 | 0.854 |
F statistic | 504.082 | 283.881 |
P value | 0.000 | 0.000 |
*** p < 0.001; ** p < 0.01; * p < 0.05. |
By default, huxreg
displays significance stars. You can
alter the symbols used and significance levels with the
stars
parameter, or set stars = NULL
to turn
off significance stars completely.
huxreg(lm1, lm3, stars = c(`*` = 0.1, `**` = 0.05, `***` = 0.01)) # a little boastful?
(1) | (2) | |
---|---|---|
(Intercept) | 981.607 | 6.269 *** |
(720.175) | (0.782) | |
carat | 4328.324 *** | 3.531 *** |
(136.755) | (0.149) | |
depth | -27.785 ** | -0.019 |
(11.656) | (0.013) | |
N | 100 | 100 |
R2 | 0.912 | 0.854 |
logLik | -675.703 | 6.822 |
AIC | 1359.405 | -5.644 |
*** p < 0.01; ** p < 0.05; * p < 0.1. |
You aren’t limited to displaying standard errors of the estimates. If
you prefer, you can display t statistics or p values, using the
error_format
option. Any column from tidy
can
be used by putting it in curly brackets:
# Another useful column: p.value
huxreg(
lm1, lm3,
error_format = "[{statistic}]",
note = "{stars}. T statistics in brackets."
)
(1) | (2) | |
---|---|---|
(Intercept) | 981.607 | 6.269 *** |
[1.363] | [8.016] | |
carat | 4328.324 *** | 3.531 *** |
[31.650] | [23.773] | |
depth | -27.785 * | -0.019 |
[-2.384] | [-1.499] | |
N | 100 | 100 |
R2 | 0.912 | 0.854 |
logLik | -675.703 | 6.822 |
AIC | 1359.405 | -5.644 |
*** p < 0.001; ** p < 0.01; * p < 0.05. T statistics in brackets. |
Here we also changed the footnote, using note
. If
note
contains the string "{stars}"
it will be
replaced by a description of the significance stars used. If you don’t
want a footnote, just set note = NULL
.
Alternatively, you can display confidence intervals. Use
ci_level
to set the confidence level for the interval, then
use {conf.low}
and {conf.high}
in
error_format
:
huxreg(lm1, lm3, ci_level = .99, error_format = "({conf.low} -- {conf.high})")
(1) | (2) | |
---|---|---|
(Intercept) | 981.607 | 6.269 *** |
(-910.629 -- 2873.844) | (4.214 -- 8.324) | |
carat | 4328.324 *** | 3.531 *** |
(3969.004 -- 4687.643) | (3.140 -- 3.921) | |
depth | -27.785 * | -0.019 |
(-58.411 -- 2.842) | (-0.052 -- 0.014) | |
N | 100 | 100 |
R2 | 0.912 | 0.854 |
logLik | -675.703 | 6.822 |
AIC | 1359.405 | -5.644 |
*** p < 0.001; ** p < 0.01; * p < 0.05. |
To change number formatting, set the number_format
parameter. This works the same as the number_format
property for a huxtable - if it is numeric, numbers will be rounded to
that many decimal places; if it is character, it will be taken as a
format to the base R sprintf
function. huxreg
tries to be smart and to format summary statistics like
nobs
as integers.
huxreg(lm1, lm3, number_format = 2)
(1) | (2) | |
---|---|---|
(Intercept) | 981.61 | 6.27 *** |
(720.17) | (0.78) | |
carat | 4328.32 *** | 3.53 *** |
(136.75) | (0.15) | |
depth | -27.78 * | -0.02 |
(11.66) | (0.01) | |
N | 100 | 100 |
R2 | 0.91 | 0.85 |
logLik | -675.70 | 6.82 |
AIC | 1359.41 | -5.64 |
*** p < 0.001; ** p < 0.01; * p < 0.05. |
Lastly, if you want to bold all significant coefficients, set the
parameter bold_signif
to a maximum significance level:
huxreg(lm1, lm3, bold_signif = 0.05)
(1) | (2) | |
---|---|---|
(Intercept) | 981.607 | 6.269 *** |
(720.175) | (0.782) | |
carat | 4328.324 *** | 3.531 *** |
(136.755) | (0.149) | |
depth | -27.785 * | -0.019 |
(11.656) | (0.013) | |
N | 100 | 100 |
R2 | 0.912 | 0.854 |
logLik | -675.703 | 6.822 |
AIC | 1359.405 | -5.644 |
*** p < 0.001; ** p < 0.01; * p < 0.05. |
Sometimes, you want to report different statistics for a model. For example, you might want to use robust standard errors.
One way to do this is to pass a tidy
-able test object
into huxreg
. The function coeftest
in the
“lmtest” package has tidy
methods defined:
library(lmtest)
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
library(sandwich)
lm_robust <- coeftest(lm1, vcov = vcovHC, save = TRUE)
huxreg("Normal SEs" = lm1, "Robust SEs" = lm_robust)
Normal SEs | Robust SEs | |
---|---|---|
(Intercept) | 981.607 | 981.607 |
(720.175) | (1117.654) | |
carat | 4328.324 *** | 4328.324 *** |
(136.755) | (293.929) | |
depth | -27.785 * | -27.785 |
(11.656) | (17.995) | |
N | 100 | 100 |
R2 | 0.912 | 0.912 |
logLik | -675.703 | -675.703 |
AIC | 1359.405 | 1359.405 |
*** p < 0.001; ** p < 0.01; * p < 0.05. |
If that is not possible, you can compute statistics yourself and add
them to your model using the tidy_override
function:
lm_fixed <- tidy_override(lm1, p.value = c(0.5, 0.2, 0.06))
huxreg("Normal p values" = lm1, "Supplied p values" = lm_fixed)
Normal p values | Supplied p values | |
---|---|---|
(Intercept) | 981.607 | 981.607 |
(720.175) | (720.175) | |
carat | 4328.324 *** | 4328.324 |
(136.755) | (136.755) | |
depth | -27.785 * | -27.785 |
(11.656) | (11.656) | |
N | 100 | 100 |
R2 | 0.912 | 0.912 |
logLik | -675.703 | -675.703 |
AIC | 1359.405 | 1359.405 |
*** p < 0.001; ** p < 0.01; * p < 0.05. |
You can override any statistics returned by tidy
or
glance
.
If you want to completely replace the output of tidy, use the
tidy_replace()
function. For example, here’s how to print
different coefficients for a multinomial model.
mnl <- nnet::multinom(gear ~ mpg, mtcars)
## # weights: 9 (4 variable)
## initial value 35.155593
## iter 10 value 23.131901
## final value 23.129234
## converged
tidied <- broom::tidy(mnl)
models <- list()
models[["4 gears"]] <- tidy_replace(mnl, tidied[tidied$y.level == 4, ])
models[["5 gears"]] <- tidy_replace(mnl, tidied[tidied$y.level == 5, ])
huxreg(models, statistics = "AIC")
4 gears | 5 gears | |
---|---|---|
(Intercept) | -9.502 ** | -7.691 * |
(3.262) | (3.232) | |
mpg | 0.475 ** | 0.358 * |
(0.168) | (0.168) | |
AIC | 54.258 | 54.258 |
*** p < 0.001; ** p < 0.01; * p < 0.05. |