'LaTeX' equation for R models — extract

Extract the variable names from a model to produce a 'LaTeX' equation. Supports any model for which there is a broom::tidy() method. This is a generic function with methods for lmerMod objects obtained with lme4::lmer(), glmerMod objects obtained with lme4::glmer(), forecast_ARIMA objects obtained with forecast::Arima(), and a default method. The latter covers most "base" R models implemented in broom::tidy(), such as lm objects from stats::lm(), glm objects from stats::glm(), or polr objects from MASS::polr(). The default method also supports clm objects obtained with ordinal::clm().

extract_eq(
  model,
  intercept = "alpha",
  greek = "beta",
  greek_colors = NULL,
  subscript_colors = NULL,
  var_colors = NULL,
  var_subscript_colors = NULL,
  raw_tex = FALSE,
  swap_var_names = NULL,
  swap_subscript_names = NULL,
  ital_vars = FALSE,
  label = NULL,
  index_factors = FALSE,
  show_distribution = FALSE,
  wrap = FALSE,
  terms_per_line = 4,
  operator_location = "end",
  align_env = "aligned",
  use_coefs = FALSE,
  coef_digits = 2,
  fix_signs = TRUE,
  font_size = NULL,
  mean_separate = NULL,
  return_variances = FALSE,
  se_subscripts = FALSE,
  ...
)

Arguments

model: A fitted model
intercept: How should the intercept be displayed? Default is "alpha", but can also accept "beta", in which case it will be displayed as beta zero.
greek: What notation should be used for coefficients? Currently only accepts "beta" (with plans for future development). Can be used in combination with raw_tex to use any notation, e.g., "\hat{\beta}".
greek_colors: The colors of the greek notation in the equation. Must be a single color (named or HTML hex code) or a vector of colors (which will be recycled if smaller than the number of terms in the model). When rendering to PDF, I suggest using HTML hex codes, as not all named colors are recognized by LaTeX, but equatiomatic will internally create the color definitions for you if HTML codes are supplied. Note that this is not yet implemented for mixed effects models (lme4).
subscript_colors: The colors of the subscripts for the greek notation. The argument structure is equivalent to greek_colors (i.e., see above for more detail).
var_colors: The color of the variable names. This takes a named vector of the form c("variable" = "color"). For example c("bill_length_mm" = "#00d4fa", "island" = "#00fa85"). Colors can be names (e.g., "red") or HTML hex codes, as shown in the example.
var_subscript_colors: The colors of the factor subscripts for categorical variables. The interface for this is equivalent to var_colors, and all subscripts for a given variable will be displayed in the provided color. For example, the code c("island" = "green") would result in the subscripts for "Dream" and "Torgersen" being green (assuming "Biscoe" was the reference group).
raw_tex: Logical. Is the greek code being passed to denote coefficients raw tex code?
swap_var_names: A vector of the form c("old_var_name" = "new name"). For example: c("bill_length_mm" = "Bill Length (MM)").
swap_subscript_names: A vector of the form c("old_subscript_name" = "new name"). For example: c("f" = "Female").
ital_vars: Logical, defaults to FALSE. Should the variable names not be wrapped in the \operatorname{} command?
label: A label for the equation, which can then be used for in-text references. See example here. Note that this only works for PDF output. The in-text references also must match the label exactly, and must be formatted as \ref{eq: label}, where label is a place holder for the specific label. Notice the space after the colon before the label. This also must be there, or the cross-reference will fail.
index_factors: Logical, defaults to FALSE. Should the factors be indexed, rather than using subscripts to display all levels?
show_distribution: Logical. When fitting a logistic or probit regression, should the binomial distribution be displayed? Defaults to FALSE.
wrap: Logical, defaults to FALSE. Should the terms on the right-hand side of the equation be split into multiple lines? This is helpful with models with many terms.
terms_per_line: Integer, defaults to 4. The number of right-hand side terms to include per line. Used only when wrap is TRUE.
operator_location: Character, one of “end” (the default) or “start”. When terms are split across multiple lines, they are split at mathematical operators like +. If set to “end”, each line will end with a trailing operator (+ or -). If set to “start”, each line will begin with an operator.
align_env: TeX environment to wrap around equation. Must be one of aligned, aligned*, align, or align*. Defaults to aligned.
use_coefs: Logical, defaults to FALSE. Should the actual model estimates be included in the equation instead of math symbols?
coef_digits: Integer, defaults to 2. The number of decimal places to round to when displaying model estimates.
fix_signs: Logical, defaults to TRUE. If disabled, coefficient estimates that are negative are preceded with a "+" (e.g. 5(x) + -3(z)). If enabled, the "+ -" is replaced with a "-" (e.g. 5(x) - 3(z)).
font_size: The font size of the equation. Defaults to the default of the output format. Takes any of the standard LaTeX arguments (see here).
mean_separate: Currently only supported for lmer models. Should the mean structure be inside or separated from the normal distribution? Defaults to NULL, in which case it will become TRUE if there are more than three fixed-effect parameters. If TRUE, the equation will be displayed as, for example, outcome ~ N(mu, sigma); mu = alpha + beta_1(wave). If FALSE, this same equation would be outcome ~ N(alpha + beta_1(wave), sigma).
return_variances: Logical. When use_coefs = TRUE with a mixed effects model (e.g., lme4::lmer()), should the variances and co-variances be returned? If FALSE (the default) standard deviations and correlations are returned instead.
se_subscripts: Logical. If se_subscripts = TRUE then the equation will include the standard errors below each coefficient. This is supported for lm and glm models.
...: Additional arguments (for future development; not currently used).
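
As a rough illustration of how the renaming and coloring arguments above might be used, the sketch below fits a small model on the penguins data shipped with equatiomatic and passes the example values from the argument descriptions. It assumes equatiomatic is installed; the model formula and the object names (mod, eq_renamed, eq_colored) are chosen only for this illustration.

```r
library(equatiomatic)

# Penguins data bundled with equatiomatic
data("penguins", package = "equatiomatic")

# Illustrative model: one continuous and one categorical predictor
mod <- lm(body_mass_g ~ bill_length_mm + island, data = penguins)

# Replace a variable name with a more readable label
eq_renamed <- extract_eq(
  mod,
  swap_var_names = c("bill_length_mm" = "Bill Length (MM)")
)

# Color the variable names and the factor-level subscripts
eq_colored <- extract_eq(
  mod,
  var_colors = c("bill_length_mm" = "#00d4fa", "island" = "#00fa85"),
  var_subscript_colors = c("island" = "green")
)

eq_renamed
eq_colored
```

The two calls are kept separate so that each argument's effect can be inspected on its own.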

Value

A character vector of class “equation”.
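
A minimal sketch of inspecting the returned object, assuming equatiomatic is installed. The object names mod and eq are illustrative only.

```r
library(equatiomatic)

mod <- lm(mpg ~ cyl + disp, data = mtcars)
eq <- extract_eq(mod)

# The result is character data that also carries the "equation" class
class(eq)
inherits(eq, "equation")
is.character(eq)
```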

Details

The different methods all use the same arguments, but not all arguments are suitable for all models. Check the argument descriptions above to determine whether a feature is implemented for a given model.
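
To make the model-specific arguments more concrete, the sketch below uses a mixed effects model, for which mean_separate and return_variances are documented above. It assumes both equatiomatic and lme4 are installed. The sleepstudy data comes with lme4, and the model formula and object names are chosen only for illustration.

```r
library(equatiomatic)
library(lme4)

# sleepstudy ships with lme4; random intercept and slope for each subject
fm <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)

# mean_separate is documented as supported for lmer models only
eq_mixed <- extract_eq(fm, mean_separate = TRUE)

# return_variances applies when use_coefs = TRUE with a mixed effects model
eq_var <- extract_eq(fm, use_coefs = TRUE, return_variances = TRUE)

eq_mixed
eq_var
```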

Examples

# Simple model
mod1 <- lm(mpg ~ cyl + disp, mtcars)
extract_eq(mod1)
#> $$
#> \operatorname{mpg} = \alpha + \beta_{1}(\operatorname{cyl}) + \beta_{2}(\operatorname{disp}) + \epsilon
#> $$

# Include all variables
mod2 <- lm(mpg ~ ., mtcars)
extract_eq(mod2)
#> $$
#> \operatorname{mpg} = \alpha + \beta_{1}(\operatorname{cyl}) + \beta_{2}(\operatorname{disp}) + \beta_{3}(\operatorname{hp}) + \beta_{4}(\operatorname{drat}) + \beta_{5}(\operatorname{wt}) + \beta_{6}(\operatorname{qsec}) + \beta_{7}(\operatorname{vs}) + \beta_{8}(\operatorname{am}) + \beta_{9}(\operatorname{gear}) + \beta_{10}(\operatorname{carb}) + \epsilon
#> $$

# Works for categorical variables too, putting levels as subscripts
data("penguins", package = "equatiomatic")
mod3 <- lm(body_mass_g ~ bill_length_mm + species, penguins)
extract_eq(mod3)
#> $$
#> \operatorname{body\_mass\_g} = \alpha + \beta_{1}(\operatorname{bill\_length\_mm}) + \beta_{2}(\operatorname{species}_{\operatorname{Chinstrap}}) + \beta_{3}(\operatorname{species}_{\operatorname{Gentoo}}) + \epsilon
#> $$

set.seed(8675309)
d <- data.frame(
  cat1 = rep(letters[1:3], 100),
  cat2 = rep(LETTERS[1:3], each = 100),
  cont1 = rnorm(300, 100, 1),
  cont2 = rnorm(300, 50, 5),
  out = rnorm(300, 10, 0.5)
)
mod4 <- lm(out ~ ., d)
extract_eq(mod4)
#> $$
#> \operatorname{out} = \alpha + \beta_{1}(\operatorname{cat1}_{\operatorname{b}}) + \beta_{2}(\operatorname{cat1}_{\operatorname{c}}) + \beta_{3}(\operatorname{cat2}_{\operatorname{B}}) + \beta_{4}(\operatorname{cat2}_{\operatorname{C}}) + \beta_{5}(\operatorname{cont1}) + \beta_{6}(\operatorname{cont2}) + \epsilon
#> $$

# Don't italicize terms (ital_vars = FALSE is the default)
extract_eq(mod1, ital_vars = FALSE)
#> $$
#> \operatorname{mpg} = \alpha + \beta_{1}(\operatorname{cyl}) + \beta_{2}(\operatorname{disp}) + \epsilon
#> $$

# Wrap equations in an "aligned" environment
extract_eq(mod2, wrap = TRUE)
#> $$
#> \begin{aligned}
#> \operatorname{mpg} &= \alpha + \beta_{1}(\operatorname{cyl}) + \beta_{2}(\operatorname{disp}) + \beta_{3}(\operatorname{hp})\ + \\
#> &\quad \beta_{4}(\operatorname{drat}) + \beta_{5}(\operatorname{wt}) + \beta_{6}(\operatorname{qsec}) + \beta_{7}(\operatorname{vs})\ + \\
#> &\quad \beta_{8}(\operatorname{am}) + \beta_{9}(\operatorname{gear}) + \beta_{10}(\operatorname{carb}) + \epsilon
#> \end{aligned}
#> $$

# Set the number of terms per line explicitly (4 is the default)
extract_eq(mod2, wrap = TRUE, terms_per_line = 4)
#> $$
#> \begin{aligned}
#> \operatorname{mpg} &= \alpha + \beta_{1}(\operatorname{cyl}) + \beta_{2}(\operatorname{disp}) + \beta_{3}(\operatorname{hp})\ + \\
#> &\quad \beta_{4}(\operatorname{drat}) + \beta_{5}(\operatorname{wt}) + \beta_{6}(\operatorname{qsec}) + \beta_{7}(\operatorname{vs})\ + \\
#> &\quad \beta_{8}(\operatorname{am}) + \beta_{9}(\operatorname{gear}) + \beta_{10}(\operatorname{carb}) + \epsilon
#> \end{aligned}
#> $$

# Include model estimates instead of Greek letters
extract_eq(mod2, wrap = TRUE, terms_per_line = 2, use_coefs = TRUE)
#> $$
#> \begin{aligned}
#> \operatorname{\widehat{mpg}} &= 12.3 - 0.11(\operatorname{cyl})\ + \\
#> &\quad 0.01(\operatorname{disp}) - 0.02(\operatorname{hp})\ + \\
#> &\quad 0.79(\operatorname{drat}) - 3.72(\operatorname{wt})\ + \\
#> &\quad 0.82(\operatorname{qsec}) + 0.32(\operatorname{vs})\ + \\
#> &\quad 2.52(\operatorname{am}) + 0.66(\operatorname{gear})\ - \\
#> &\quad 0.2(\operatorname{carb})
#> \end{aligned}
#> $$

# Don't fix doubled-up "+ -" signs
extract_eq(mod2, wrap = TRUE, terms_per_line = 4, use_coefs = TRUE, fix_signs = FALSE)
#> $$
#> \begin{aligned}
#> \operatorname{\widehat{mpg}} &= 12.3 + -0.11(\operatorname{cyl}) + 0.01(\operatorname{disp}) + -0.02(\operatorname{hp})\ + \\
#> &\quad 0.79(\operatorname{drat}) + -3.72(\operatorname{wt}) + 0.82(\operatorname{qsec}) + 0.32(\operatorname{vs})\ + \\
#> &\quad 2.52(\operatorname{am}) + 0.66(\operatorname{gear}) + -0.2(\operatorname{carb})
#> \end{aligned}
#> $$

# Use indices for factors instead of subscripts
extract_eq(mod2, wrap = TRUE, terms_per_line = 4, index_factors = TRUE)
#> $$
#> \begin{aligned}
#> \operatorname{mpg} &= \alpha + \operatorname{cyl} + \operatorname{disp} + \operatorname{hp}\ + \\
#> &\quad \operatorname{drat} + \operatorname{wt} + \operatorname{qsec} + \operatorname{vs}\ + \\
#> &\quad \operatorname{am} + \operatorname{gear} + \operatorname{carb} + \epsilon
#> \end{aligned}
#> $$

# Use other model types, like glm
set.seed(8675309)
d <- data.frame(
  out = sample(0:1, 100, replace = TRUE),
  cat1 = rep(letters[1:3], 100),
  cat2 = rep(LETTERS[1:3], each = 100),
  cont1 = rnorm(300, 100, 1),
  cont2 = rnorm(300, 50, 5)
)
mod5 <- glm(out ~ ., data = d, family = binomial(link = "logit"))
extract_eq(mod5, wrap = TRUE)
#> $$
#> \begin{aligned}
#> \log\left[ \frac { P( \operatorname{out} = \operatorname{1} ) }{ 1 - P( \operatorname{out} = \operatorname{1} ) } \right] &= \alpha + \beta_{1}(\operatorname{cat1}_{\operatorname{b}}) + \beta_{2}(\operatorname{cat1}_{\operatorname{c}}) + \beta_{3}(\operatorname{cat2}_{\operatorname{B}})\ + \\
#> &\quad \beta_{4}(\operatorname{cat2}_{\operatorname{C}}) + \beta_{5}(\operatorname{cont1}) + \beta_{6}(\operatorname{cont2})
#> \end{aligned}
#> $$
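
Two documented arguments are not exercised in the examples above: show_distribution and operator_location. The following is a hedged sketch of how they might be called, assuming equatiomatic is installed. The models and object names are illustrative, and no output is shown because it was not verified here.

```r
library(equatiomatic)

# Logistic regression on a built-in dataset
mod_logit <- glm(am ~ wt, data = mtcars, family = binomial(link = "logit"))

# Also display the binomial distribution of the outcome
eq_dist <- extract_eq(mod_logit, show_distribution = TRUE)

# Start each wrapped line with the operator instead of ending with it
mod_all <- lm(mpg ~ ., data = mtcars)
eq_start <- extract_eq(mod_all, wrap = TRUE, operator_location = "start")

eq_dist
eq_start
```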