dplyr mutate new dynamic variables with case_when

dplyr::case_when multiple conditions
case when with multiple conditions r
case_when str_detect
case_when with vector
case when in r tidyverse
when statement in r
r tidyverse switch
case_when else true

I'm aware of similar questions here and here, but I haven't been able to figure out the right solution for my specific situation. Some of what I'm finding are solutions which use mutate_, etc but I understand these are now obsolete. I'm new to dynamic usages of dplyr.

I have a dataframe which includes some variables with two different prefixes, alpha and beta:

df <- data.frame(alpha.num = c(1, 3, 5, 7),
             alpha.char = c("a", "c", "e", "g"),
             beta.num = c(2, 4, 6, 8),
             beta.char = c("b", "d", "f", "h"),
             which.to.use = c("alpha", "alpha", "beta", "beta"))

I want to create new variables with the prefix "chosen." which are copies of either the "alpha" or "beta" columns depending on which is named for that row in the "which.to.use" column. The desired output would be:

desired.df <- data.frame(alpha.num = c(1, 3, 5, 7),
                     alpha.char = c("a", "c", "e", "g"),
                     beta.num = c(2, 4, 6, 8),
                     beta.char = c("b", "d", "f", "h"),
                     which.to.use = c("alpha", "alpha", "beta", "beta"),
                     chosen.num = c(1, 3, 6, 8),
                     chosen.char = c("a", "c", "f", "h"))

My failed attempt:

varnames <- c("num", "char")
df %<>%
  mutate(as.name(paste0("chosen.", varnames)) := case_when(
    which.to.use == "alpha" ~ paste0("alpha.", varnames),
    which.to.use == "beta" ~ pasteo("beta.", varnames)
  ))

I'd prefer a pure dplyr solution, and even better would be one which could be included in a longer pipe modifying the df (i.e. no need to stop to create "varnames"). Thanks for your help.

This is a nest()/map() strategy that should be pretty fast. It stays in the tidyverse, but doesn't go into rlang land.

library(tidyverse)

df %>% 
    nest(-which.to.use) %>%
    mutate(new_data = map2(data, which.to.use,
                       ~ select(..1, matches(..2)) %>%
                           rename_all(funs(gsub(".*\\.", "choosen.", .) )))) %>%
    unnest()

  which.to.use alpha.num alpha.char beta.num beta.char choosen.num choosen.char
1        alpha         1          a        2         b           1            a
2        alpha         3          c        4         d           3            c
3         beta         5          e        6         f           6            f
4         beta         7          g        8         h           8            h

It grabs all columns, not just num and char, that are not which.to.use. But that seems like what you (I) would want IRL. You could add a select(matches('(var1|var2|etc')) line before you call nest() if you wanted to pull only specific variables.

EDIT: My original suggestion of using select() to drop unneeded columns would result in doing a join to bring them back later. If instead you adjust the nest parameters, you can acheive this on only certain columns.

I added new bool columns here, but they will be ignored for the "choosen" selection:

new_df <- data.frame(alpha.num = c(1, 3, 5, 7),
                 alpha.char = c("a", "c", "e", "g"),
                 alpha.bool = FALSE,
                 beta.num = c(2, 4, 6, 8),
                 beta.char = c("b", "d", "f", "h"),
                 beta.bool = TRUE,
                 which.to.use = c("alpha", "alpha", "beta", "beta"),
                 stringsAsFactors = FALSE)

new_df %>% 
    nest(matches("num|char")) %>% # only columns that match this pattern get nested, allows you to save others
    mutate(new_data = map2(data, which.to.use,
                           ~ select(..1, matches(..2)) %>%
                               rename_all(funs(gsub(".*\\.", "choosen.", .) )))) %>%
    unnest()

  alpha.bool beta.bool which.to.use alpha.num alpha.char beta.num beta.char choosen.num choosen.char
1      FALSE      TRUE        alpha         1          a        2         b           1            a
2      FALSE      TRUE        alpha         3          c        4         d           3            c
3      FALSE      TRUE         beta         5          e        6         f           6            f
4      FALSE      TRUE         beta         7          g        8         h           8            h

r - dplyr mutate new dynamic variables with case_when, This is a nest()/map() strategy that should be pretty fast. It stays in the tidyverse , but doesn't go into rlang land. library(tidyverse) df  The name of each argument will be the name of a new variable, and the value will be its corresponding value. Use a NULL value in mutate to drop a variable. New variables overwrite existing variables of the same name. The arguments in are automatically quoted and evaluated in the context of the data frame. They support unquoting and splicing.

Using some fun rlang stuff & purrr:

library(rlang)
library(purrr)
library(dplyr)

df <- data.frame(alpha.num = c(1, 3, 5, 7),
                 alpha.char = c("a", "c", "e", "g"),
                 beta.num = c(2, 4, 6, 8),
                 beta.char = c("b", "d", "f", "h"),
                 which.to.use = c("alpha", "alpha", "beta", "beta"),
                 stringsAsFactors = F)

c("num", "char") %>% 
    map(~ mutate(df, !!sym(paste0("chosen.", .x)) := 
      case_when(
          which.to.use == "alpha" ~ !!sym(paste0("alpha.", .x)),
          which.to.use == "beta" ~ !!sym(paste0("beta.", .x))
                ))) %>% 
    reduce(full_join)

Result:

  alpha.num alpha.char beta.num beta.char which.to.use chosen.num chosen.char
1         1          a        2         b        alpha          1           a
2         3          c        4         d        alpha          3           c
3         5          e        6         f         beta          6           f
4         7          g        8         h         beta          8           h

Without reduce(full_join):

c("num", "char") %>% 
  map_dfc(~ mutate(df, !!sym(paste0("chosen.", .x)) := 
                 case_when(
                   which.to.use == "alpha" ~ !!sym(paste0("alpha.", .x)),
                   which.to.use == "beta" ~ !!sym(paste0("beta.", .x))
                 ))) %>% 
  select(-ends_with("1"))



alpha.num alpha.char beta.num beta.char which.to.use chosen.num chosen.char
1         1          a        2         b        alpha          1           a
2         3          c        4         d        alpha          3           c
3         5          e        6         f         beta          6           f
4         7          g        8         h         beta          8           h

Explanation: (Note: I do not fully or even kind of get rlang. Maybe others can give a better explanation ;).)

Using paste0 by itself produces a string, when we need a bare name for mutate to know it is referring to a variable name.

If we wrap paste0 in sym, it evaluates to a bare name:

> x <- varrnames[1]
> sym(paste0("alpha.", x))
  alpha.num

But mutate does not know to evaluate and instead read it as a symbol:

> typeof(sym(paste0("alpha.", x)))
[1] "symbol"

The "bang bang" !! operator evaluates the sym function. Compare:

> expr(mutate(df, var = sym(paste0("alpha.", x))))
mutate(df, var = sym(paste0("alpha.", x)))

> expr(mutate(df, var = !!sym(paste0("alpha.", x))))
mutate(df, var = alpha.num)

So with !!sym we can use paste to dynamically called variable names with dplyr.

A general vectorised if, It is an R equivalent of the SQL CASE WHEN statement. is particularly useful inside mutate when you want to # create a new variable that relies on a complex​  It is an R equivalent of the SQL CASE WHEN statement. If no cases match, NA is returned. case_when: A general vectorised if in dplyr: A Grammar of Data Manipulation

A base R approach using apply with margin = 1 where we select columns for each row based on the value in which.to.use column and get the value from corresponding column for the row.

df[c("chosen.num", "chosen.char")] <- 
          t(apply(df, 1, function(x) x[grepl(x["which.to.use"], names(df))]))

df
#  alpha.num alpha.char beta.num beta.char which.to.use chosen.num chosen.char
#1         1          a        2         b        alpha          1           a
#2         3          c        4         d        alpha          3           c
#3         5          e        6         f         beta          6           f
#4         7          g        8         h         beta          8           h

Create or transform variables, Both functions preserve the number of rows of the input. New variables overwrite existing variables of the same name. mutate(.data, ) transmute  # case_when is particularly useful inside mutate when you want to # create a new variable that relies on a complex combination of existing # variables starwars %>% select (name: mass, gender, species) %>% mutate ( type = case_when ( height > 200 | mass > 200 ~ "large", species == "Droid" ~ "robot", TRUE ~ "other") )

You can also try a gather/spread approach

df %>% 
  rownames_to_column() %>% 
  gather(k,v,-which.to.use,-rowname) %>% 
  separate(k,into = c("k1", "k2"), sep="[.]") %>% 
  filter(which.to.use == k1) %>% 
  mutate(k1="chosen") %>% 
  unite(k, k1, k2,sep=".") %>% 
  spread(k,v) %>%
  select(.,chosen.num, chosen.char) %>% 
  bind_cols(df, .)
    alpha.num alpha.char beta.num beta.char which.to.use chosen.num chosen.char
 1         1          a        2         b        alpha          1           a
 2         3          c        4         d        alpha          3           c
 3         5          e        6         f         beta          6           f
 4         7          g        8         h         beta          8           h

How can you use dplyr::case_when progromatically with wrapr::let , I want to be able use dplyr::case_when to dynamically cut a database return(​function(data){ dplyr::mutate(data, !!s_new_column_name  This function allows you to vectorise multiple if and else if statements. It is an R equivalent of the SQL CASE WHEN statement. A sequence of two-sided formulas. The left hand side (LHS) determines which values match this case. The right hand side (RHS) provides the replacement value. The LHS must evaluate to a logical vector.

dplyr/case_when.R at master · tidyverse/dplyr · GitHub, manipulation. Contribute to tidyverse/dplyr development by creating an account on GitHub. <[`dynamic-dots`][rlang::dyn-dots]> A sequence of two-​sided formulas. The left hand case_when is particularly useful inside mutate when you want to. #' # create a new variable that relies on a complex combination of existing. Adding New Variables in R. The following functions from the dplyr library can be used to add new variables to a data frame: mutate() – adds new variables to a data frame while preserving existing variables. transmute() – adds new variables to a data frame and drops existing variables.

dplyr recode variables if a condition met - tidyverse, Then I want to create 5 new variables. I use this code: step2 <- step1 %>% mutate( Dealer.Rec.Score = case_when( A2 >= 10 & A2 <= 80 ~ 0,  However, since mutate thinks varname is a literal variable name, the loop only creates one new variable (called varname) instead of four (called petal.2 - petal.5). How can I get mutate() to use my dynamic name as variable name?

dplyr - mutate: use dynamic variable names, To dynamically assign variable names using the dplyr package, you can do the following: Create a function to dynamically assign variable  Creating new variables is often required for statistical modeling. A teacher, for example, may have a data frame with numeric variables (quiz scores, final grade, etc.) but wants to perform a logistic regression model with a binary variable. mutate () and ifelse () will get it done. mutate () is a basic verb from the dplyr package, and numerous

Comments
  • I only want to do this on specific variables. Adding a select statement would reduce the df, so I'd have to do a join later to bring back the other variables (if I understand correctly). There must be a better way, right?
  • I made an edit to show a better way with just adjusting the nest() parameters, let me know if that's what you had in mind
  • Awesome! Works great. I don't quite understand everything going on here, but that's homework for me. Thanks.
  • Thanks. My actual data are very large (both tall and wide), and the full_join brings things to a halt... I'm wondering if there's a way to do it using mutate_at or mutate_if...
  • check out my nest()/map() answer for a faster tidyverse option
  • @Nate nice. Using map_dfc lets you get rid of the full_join.