Skip to contents

Regression tables with huxreg

Huxtable includes the function huxreg to build a table of regressions.

You call huxreg with a list of models. These models can be of any class which has a tidy method defined in the broom package. The method should return a list of regression coefficients with names term, estimate, std.error and p.value. That covers most standard regression packages.

Let’s start by running some regressions to predict a diamond’s price.

data(diamonds, package = "ggplot2")
diamonds <- diamonds[1:100,]

lm1 <- lm(price ~ carat + depth, diamonds)
lm2 <- lm(price ~ depth + factor(color, ordered = FALSE), diamonds)
lm3 <- lm(log(price) ~ carat + depth, diamonds)

Now, we use huxreg to display the regression output side by side.

huxreg(lm1, lm2, lm3)
(1) (2) (3)
(Intercept) 981.607     900.067  6.269 ***
(720.175)    (2431.815) (0.782)   
carat 4328.324 ***       3.531 ***
(136.755)          (0.149)   
depth -27.785 *   -6.804  -0.019    
(11.656)    (39.293) (0.013)   
factor(color, ordered = FALSE)E          449.490          
         (239.388)         
factor(color, ordered = FALSE)F          391.705          
         (290.880)         
factor(color, ordered = FALSE)G          583.111          
         (308.513)         
factor(color, ordered = FALSE)H          126.916          
         (256.367)         
factor(color, ordered = FALSE)I          -47.220          
         (253.092)         
factor(color, ordered = FALSE)J          -123.430          
         (269.157)         
N 100         100      100        
R2 0.912     0.123  0.854    
logLik -675.703     -790.788  6.822    
AIC 1359.405     1599.576  -5.644    
*** p < 0.001; ** p < 0.01; * p < 0.05.

The basic output includes estimates, standard errors and summary statistics.

Some of those variable names are hard to read. We can change them by providing a named vector of variables in the coefs argument.

color_names <- grep("factor", names(coef(lm2)), value = TRUE)
names(color_names) <- gsub(".*)(.)", "Color: \\1", color_names)

huxreg(lm1, lm2, lm3, coefs = c("Carat" = "carat", "Depth" = "depth", color_names))
(1) (2) (3)
Carat 4328.324 ***       3.531 ***
(136.755)          (0.149)   
Depth -27.785 *   -6.804  -0.019    
(11.656)    (39.293) (0.013)   
Color: E          449.490          
         (239.388)         
Color: F          391.705          
         (290.880)         
Color: G          583.111          
         (308.513)         
Color: H          126.916          
         (256.367)         
Color: I          -47.220          
         (253.092)         
Color: J          -123.430          
         (269.157)         
N 100         100      100        
R2 0.912     0.123  0.854    
logLik -675.703     -790.788  6.822    
AIC 1359.405     1599.576  -5.644    
*** p < 0.001; ** p < 0.01; * p < 0.05.

Or, since the output from huxreg is just a huxtable, we could just edit its contents directly.

diamond_regs <- huxreg(lm1, lm2, lm3)
diamond_regs[seq(8, 18, 2), 1] <- paste("Color:", LETTERS[5:10])

# prints the same as above

Of course, we aren’t limited to just changing names. We can also make our table prettier. Let’s put our footnote in italic, add a caption, and highlight the cell background of significant coefficients. All of these are just standard huxtable commands.

suppressPackageStartupMessages(library(dplyr))

diamond_regs |> 
      map_background_color(-1, -1, by_regex(
        "\\*" = "yellow"
      )) |> 
      set_italic(final(1), 1) |> 
      set_caption("Linear regressions of diamond prices")
Linear regressions of diamond prices
(1) (2) (3)
(Intercept) 981.607     900.067  6.269 ***
(720.175)    (2431.815) (0.782)   
carat 4328.324 ***       3.531 ***
(136.755)          (0.149)   
depth -27.785 *   -6.804  -0.019    
(11.656)    (39.293) (0.013)   
Color: E          449.490          
         (239.388)         
Color: F          391.705          
         (290.880)         
Color: G          583.111          
         (308.513)         
Color: H          126.916          
         (256.367)         
Color: I          -47.220          
         (253.092)         
Color: J          -123.430          
         (269.157)         
N 100         100      100        
R2 0.912     0.123  0.854    
logLik -675.703     -790.788  6.822    
AIC 1359.405     1599.576  -5.644    
*** p < 0.001; ** p < 0.01; * p < 0.05.

By default, standard errors are shown below coefficient estimates. To display them in a column to the right, use error_pos = "right":

huxreg(lm1, lm3, error_pos = "right")
(1) (2)
(Intercept) 981.607     (720.175) 6.269 *** (0.782)
carat 4328.324 *** (136.755) 3.531 *** (0.149)
depth -27.785 *   (11.656) -0.019     (0.013)
N 100               100              
R2 0.912           0.854          
logLik -675.703           6.822          
AIC 1359.405           -5.644          
*** p < 0.001; ** p < 0.01; * p < 0.05.

This will give column headings a column span of 2.

To display standard errors in the same cell as estimates, use error_pos = "same":

huxreg(lm1, lm3, error_pos = "same")
(1) (2)
(Intercept) 981.607 (720.175) 6.269 *** (0.782)
carat 4328.324 *** (136.755) 3.531 *** (0.149)
depth -27.785 * (11.656) -0.019 (0.013)
N 100      100     
R2 0.912  0.854 
logLik -675.703  6.822 
AIC 1359.405  -5.644 
*** p < 0.001; ** p < 0.01; * p < 0.05.

You can change the default column headings by naming the model arguments:

huxreg("Price" = lm1, "Log price" = lm3)
Price Log price
(Intercept) 981.607     6.269 ***
(720.175)    (0.782)   
carat 4328.324 *** 3.531 ***
(136.755)    (0.149)   
depth -27.785 *   -0.019    
(11.656)    (0.013)   
N 100         100        
R2 0.912     0.854    
logLik -675.703     6.822    
AIC 1359.405     -5.644    
*** p < 0.001; ** p < 0.01; * p < 0.05.

To display a particular row of summary statistics, use the statistics parameter. This should be a character vector. Valid values are anything returned from your models by broom::glance:

gl <- as_hux(broom::glance(lm1))

gl |> 
      restack_down(cols = 3, on_remainder = "fill") |> 
      set_bold(odds, everywhere)
r.squared adj.r.squared sigma
0.912    0.91     211       
statistic p.value df
504        5.65e-52 2       
logLik AIC BIC
-676        1.36e+03 1.37e+03
deviance df.residual nobs
4.33e+06 97 100

Another value you can use is "nobs", which returns the number of observations from the regression. If the statistics vector has names, these will be used for row headings:

huxreg(lm1, lm3, statistics = c("N. obs." = "nobs", 
      "R squared" = "r.squared", "F statistic" = "statistic",
      "P value" = "p.value"))
(1) (2)
(Intercept) 981.607     6.269 ***
(720.175)    (0.782)   
carat 4328.324 *** 3.531 ***
(136.755)    (0.149)   
depth -27.785 *   -0.019    
(11.656)    (0.013)   
N. obs. 100         100        
R squared 0.912     0.854    
F statistic 504.082     283.881    
P value 0.000     0.000    
*** p < 0.001; ** p < 0.01; * p < 0.05.

By default, huxreg displays significance stars. You can alter the symbols used and significance levels with the stars parameter, or set stars = NULL to turn off significance stars completely.

huxreg(lm1, lm3, stars = c(`*` = 0.1, `**` = 0.05, `***` = 0.01)) # a little boastful?
(1) (2)
(Intercept) 981.607     6.269 ***
(720.175)    (0.782)   
carat 4328.324 *** 3.531 ***
(136.755)    (0.149)   
depth -27.785 **  -0.019    
(11.656)    (0.013)   
N 100         100        
R2 0.912     0.854    
logLik -675.703     6.822    
AIC 1359.405     -5.644    
* p < 0.1; ** p < 0.05; *** p < 0.01.

You aren’t limited to displaying standard errors of the estimates. If you prefer, you can display t statistics or p values, using the error_format option. Any column from tidy can be used by putting it in curly brackets:

# Another useful column: p.value
huxreg(
        lm1, lm3, 
        error_format = "[{statistic}]", 
        note         = "{stars}. T statistics in brackets."
      )
(1) (2)
(Intercept) 981.607     6.269 ***
[1.363]    [8.016]   
carat 4328.324 *** 3.531 ***
[31.650]    [23.773]   
depth -27.785 *   -0.019    
[-2.384]    [-1.499]   
N 100         100        
R2 0.912     0.854    
logLik -675.703     6.822    
AIC 1359.405     -5.644    
*** p < 0.001; ** p < 0.01; * p < 0.05. T statistics in brackets.

Here we also changed the footnote, using note. If note contains the string "{stars}" it will be replaced by a description of the significance stars used. If you don’t want a footnote, just set note = NULL.

Alternatively, you can display confidence intervals. Use ci_level to set the confidence level for the interval, then use {conf.low} and {conf.high} in error_format:

huxreg(lm1, lm3, ci_level = .99, error_format = "({conf.low} -- {conf.high})")
(1) (2)
(Intercept) 981.607     6.269 ***
(-910.629 -- 2873.844)    (4.214 -- 8.324)   
carat 4328.324 *** 3.531 ***
(3969.004 -- 4687.643)    (3.140 -- 3.921)   
depth -27.785 *   -0.019    
(-58.411 -- 2.842)    (-0.052 -- 0.014)   
N 100         100        
R2 0.912     0.854    
logLik -675.703     6.822    
AIC 1359.405     -5.644    
*** p < 0.001; ** p < 0.01; * p < 0.05.

To change number formatting, set the number_format parameter. This works the same as the number_format property for a huxtable - if it is numeric, numbers will be rounded to that many decimal places; if it is character, it will be taken as a format to the base R sprintf function. huxreg tries to be smart and to format summary statistics like nobs as integers.

huxreg(lm1, lm3, number_format = 2)
(1) (2)
(Intercept) 981.61     6.27 ***
(720.17)    (0.78)   
carat 4328.32 *** 3.53 ***
(136.75)    (0.15)   
depth -27.78 *   -0.02    
(11.66)    (0.01)   
N 100        100       
R2 0.91     0.85    
logLik -675.70     6.82    
AIC 1359.41     -5.64    
*** p < 0.001; ** p < 0.01; * p < 0.05.

Lastly, if you want to bold all significant coefficients, set the parameter bold_signif to a maximum significance level:

huxreg(lm1, lm3, bold_signif = 0.05)
(1) (2)
(Intercept) 981.607     6.269 ***
(720.175)    (0.782)   
carat 4328.324 *** 3.531 ***
(136.755)    (0.149)   
depth -27.785 *   -0.019    
(11.656)    (0.013)   
N 100         100        
R2 0.912     0.854    
logLik -675.703     6.822    
AIC 1359.405     -5.644    
*** p < 0.001; ** p < 0.01; * p < 0.05.

Altering data

Sometimes, you want to report different statistics for a model. For example, you might want to use robust standard errors.

One way to do this is to pass a tidy-able test object into huxreg. The function coeftest in the “lmtest” package has tidy methods defined:

library(lmtest)
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
library(sandwich)
lm_robust <- coeftest(lm1, vcov = vcovHC, save = TRUE)
huxreg("Normal SEs" = lm1, "Robust SEs" = lm_robust)
Normal SEs Robust SEs
(Intercept) 981.607     981.607    
(720.175)    (1117.654)   
carat 4328.324 *** 4328.324 ***
(136.755)    (293.929)   
depth -27.785 *   -27.785    
(11.656)    (17.995)   
N 100         100        
R2 0.912     0.912    
logLik -675.703     -675.703    
AIC 1359.405     1359.405    
*** p < 0.001; ** p < 0.01; * p < 0.05.

If that is not possible, you can compute statistics yourself and add them to your model using the tidy_override function:

lm_fixed <- tidy_override(lm1, p.value = c(0.5, 0.2, 0.06))
huxreg("Normal p values" = lm1, "Supplied p values" = lm_fixed)
Normal p values Supplied p values
(Intercept) 981.607     981.607 
(720.175)    (720.175)
carat 4328.324 *** 4328.324 
(136.755)    (136.755)
depth -27.785 *   -27.785 
(11.656)    (11.656)
N 100         100     
R2 0.912     0.912 
logLik -675.703     -675.703 
AIC 1359.405     1359.405 
*** p < 0.001; ** p < 0.01; * p < 0.05.

You can override any statistics returned by tidy or glance.

If you want to completely replace the output of tidy, use the tidy_replace() function. For example, here’s how to print different coefficients for a multinomial model.

mnl <- nnet::multinom(gear ~ mpg, mtcars)
## # weights:  9 (4 variable)
## initial  value 35.155593 
## iter  10 value 23.131901
## final  value 23.129234 
## converged
tidied <- broom::tidy(mnl)
models <- list()
models[["4 gears"]] <- tidy_replace(mnl, tidied[tidied$y.level == 4, ])
models[["5 gears"]] <- tidy_replace(mnl, tidied[tidied$y.level == 5, ])
huxreg(models, statistics = "AIC")
4 gears 5 gears
(Intercept) -9.502 ** -7.691 *
(3.262)   (3.232) 
mpg 0.475 ** 0.358 *
(0.168)   (0.168) 
AIC 54.258    54.258  
*** p < 0.001; ** p < 0.01; * p < 0.05.