## Dynamic variable names in R regressions

Being aware of the danger of using dynamic variable names, I am trying to loop over various regression models in which different variable specifications are chosen. Usually `!!rlang::sym()` solves this kind of problem for me just fine, but it somehow fails in regressions. A minimal example would be the following:

```
y = runif(1000)
x1 = runif(1000)
x2 = runif(1000)

df2= data.frame(y,x1,x2)
summary(lm(y ~ x1+x2, data=df2)) ## works

var = "x1"
summary(lm(y ~ !!rlang::sym(var) + x2, data=df2)) # gives an error
```

My understanding was that `!!rlang::sym(var)` takes the value of `var` (namely `x1`) and puts it into the code in such a way that R treats it as a variable (not a character string). But I seem to be wrong. Can anyone enlighten me?

Personally, I like to do this with some computing on the language. For me, a combination of `bquote` with `eval` is easiest (to remember).

```
var <- as.symbol(var)
eval(bquote(summary(lm(y ~ .(var) + x2, data = df2))))
#Call:
#lm(formula = y ~ x1 + x2, data = df2)
#
#Residuals:
#     Min       1Q   Median       3Q      Max
#-0.49298 -0.26248 -0.00046  0.24111  0.51988
#
#Coefficients:
#            Estimate Std. Error t value Pr(>|t|)
#(Intercept)  0.50244    0.02480  20.258   <2e-16 ***
#x1          -0.01468    0.03161  -0.464    0.643
#x2          -0.01635    0.03227  -0.507    0.612
#---
#Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
#Residual standard error: 0.2878 on 997 degrees of freedom
#Multiple R-squared:  0.0004708,    Adjusted R-squared:  -0.001534
#F-statistic: 0.2348 on 2 and 997 DF,  p-value: 0.7908
```

I find this superior to any approach that doesn't show the same call as `summary(lm(y ~ x1+x2, data=df2))`.
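
Since the question is about looping over specifications, the same `bquote`/`eval` idea can be wrapped in `lapply`. This is only a sketch under assumptions: the column `x3` and the names `vars` and `fits` are made up for illustration and do not appear in the question.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
df2 <- data.frame(y = runif(100), x1 = runif(100), x2 = runif(100),
                  x3 = runif(100))  # x3 is a hypothetical extra regressor

vars <- c("x1", "x3")  # names of the variables to swap in
fits <- lapply(vars, function(v) {
  # .(as.symbol(v)) splices the symbol into the call before it is evaluated
  eval(bquote(lm(y ~ .(as.symbol(v)) + x2, data = df2)))
})
names(fits) <- vars

fits$x3$call  # the stored call has x3 spelled out in the formula
```

Each element of `fits` is an ordinary `lm` object, so `summary(fits$x1)` and friends should work as usual.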

The bang-bang operator `!!` only works with "tidy" functions. It's not a part of the core R language: base R parses `!!x` as two logical negations, so a base R function like `lm()` has no idea how to expand such operators. Instead, you need to wrap the code in a function that can do the expansion. `rlang::expr` is one such example:

```
rlang::expr(summary(lm(y ~ !!rlang::sym(var) + x2, data=df2)))
# summary(lm(y ~ x1 + x2, data = df2))
```

Then you need to use `rlang::eval_tidy` to actually evaluate it:

```
rlang::eval_tidy(rlang::expr(summary(lm(y ~ !!rlang::sym(var) + x2, data=df2))))

# Call:
# lm(formula = y ~ x1 + x2, data = df2)
#
# Residuals:
#     Min       1Q   Median       3Q      Max
# -0.49178 -0.25482  0.00027  0.24566  0.50730
#
# Coefficients:
#               Estimate Std. Error t value Pr(>|t|)
# (Intercept)  0.4953683  0.0242949  20.390   <2e-16 ***
# x1          -0.0006298  0.0314389  -0.020    0.984
# x2          -0.0052848  0.0318073  -0.166    0.868
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 0.2882 on 997 degrees of freedom
# Multiple R-squared:  2.796e-05,   Adjusted R-squared:  -0.001978
# F-statistic: 0.01394 on 2 and 997 DF,  p-value: 0.9862
```

You can see this version preserves the expanded formula in the model object.
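
As a small base-R illustration of why `lm()` cannot expand `!!` on its own, the sketch below inspects how R parses the operator. The names `e` and `f` are only illustrative.

```r
# Base R parses `!!x` as two nested calls to the negation operator `!`
e <- quote(!!x)
identical(e[[1]], as.name("!"))  # the outer call is logical negation
identical(e[[2]], quote(!x))     # its argument is another negation

!!TRUE  # TRUE: negating twice gives back the original value

# In a plain formula nothing is substituted: `var` stays a literal symbol
f <- y ~ !!var + x2
all.vars(f)
```

Only functions that implement tidy evaluation (such as `rlang::expr`) give `!!` its injection meaning.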

1) Just use `lm(df2)`, which treats the first column as the response and the remaining columns as predictors. If `df2` has additional columns beyond those shown in the question but we just want to regress on `x1` and `x2`, then:

```
df3 <- df2[c("y", var, "x2")]
lm(df3)
```

The following are optional and only apply if it is important that the formula appear in the output as if it had been explicitly given. Compute the formula `fo` using the first line below and then run `lm` as in the second line:

```
fo <- formula(model.frame(df3))
fm <- do.call("lm", list(fo, quote(df3)))
```

or just run `lm` as in the first line below and then write the formula and data into its call as in the remaining lines:

```
fm <- lm(df3)
fm$call$formula <- formula(model.frame(df3))
fm$call$data <- quote(df3)
```

Either one gives this:

```
> fm
Call:
lm(formula = y ~ x1 + x2, data = df3)

Coefficients:
(Intercept)           x1           x2
0.44752      0.04278      0.05011
```
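
As a quick check on approach 1, the sketch below compares fitting on the subsetted data frame with fitting via an explicit formula. The extra column `z` and the object names `fit_df` and `fit_fo` are assumptions added for illustration.

```r
set.seed(123)  # arbitrary seed, only for reproducibility
df2 <- data.frame(y = runif(50), x1 = runif(50), x2 = runif(50),
                  z = runif(50))  # z is a hypothetical unwanted column

var <- "x1"
df3 <- df2[c("y", var, "x2")]  # keep the response first, then the regressors

fit_df <- lm(df3)                      # first column is taken as the response
fit_fo <- lm(y ~ x1 + x2, data = df2)  # explicit formula for comparison

all.equal(coef(fit_df), coef(fit_fo))
```

The column order in `df3` matters here, because the first column becomes the left-hand side of the implied formula.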

2) Character string. `lm` accepts a character string for the formula, so this also works. Prefixing the call with `fn$` causes substitution to occur in the character arguments.

```
library(gsubfn)

fn$lm("y ~ $var + x2", quote(df2))
```

or at the expense of more involved code, without gsubfn:

```
do.call("lm", list(sprintf("y ~ %s + x2", var), quote(df2)))
```

or if you don't care that the formula displays without `var` substituted then just:

```
lm(sprintf("y ~ %s + x2", var), df2)
```
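
To make the difference between the last two variants concrete, here is a minimal sketch that compares the stored calls. The object names `fm_plain` and `fm_docall` and the smaller sample size are assumptions for illustration.

```r
set.seed(42)  # arbitrary seed, only for reproducibility
df2 <- data.frame(y = runif(50), x1 = runif(50), x2 = runif(50))
var <- "x1"

# Direct call: the unevaluated sprintf() expression is stored in the call
fm_plain <- lm(sprintf("y ~ %s + x2", var), df2)

# do.call: arguments are evaluated first, so the finished string is stored
fm_docall <- do.call("lm", list(sprintf("y ~ %s + x2", var), quote(df2)))

fm_plain$call
fm_docall$call

# The fitted coefficients agree; only the recorded call differs
all.equal(coef(fm_plain), coef(fm_docall))
```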

