Passing strings as arguments in dplyr verbs

I would like to be able to define arguments for dplyr verbs as strings, such as:

condition <- "dist > 50"

and then use these strings in dplyr functions:

library(dplyr)  # provides filter() and %>%; cars ships with base R
ds <- cars
ds1 <- ds %>%
   filter(eval(condition))
ds1

But it throws an error:

Error: filter condition does not evaluate to a logical vector. 

The code should evaluate as:

  ds1 <- ds %>%
     filter(dist > 50)
  ds1

Resulting in:

ds1

   speed dist
1     14   60
2     14   80
3     15   54
4     18   56
5     18   76
6     18   84
7     19   68
8     20   52
9     20   56
10    20   64
11    22   66
12    23   54
13    24   70
14    24   92
15    24   93
16    24  120
17    25   85

Question:

How to pass a string as an argument in a dplyr verb?
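A short sketch of why the attempt above fails: eval() applied to a character string simply returns that same string, so filter() receives a character vector rather than a logical one. The text has to be parsed into an expression first; the base-R parse(text = ) route is used here only to illustrate the point.

```r
condition <- "dist > 50"

# eval() of a character string returns the string unchanged; nothing is parsed
res <- eval(condition)
print(class(res))

# parse() turns the text into an expression, which can then be evaluated
# with the columns of cars visible as variables
expr <- parse(text = condition)[[1]]
keep <- eval(expr, envir = cars)
print(sum(keep))  # number of rows with dist > 50
```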

Since the original 2014 answers, two new ways have become possible using rlang's quasiquotation.

Conventional hard-coded filter statement. For the sake of comparison, the statement dist > 50 is included directly in dplyr::filter().

library(magrittr)

# The filter statement is hard-coded inside the function.
cars_subset_0 <- function( ) {
  cars %>%
    dplyr::filter(dist > 50)
}
cars_subset_0()

results:

   speed dist
1     14   60
2     14   80
3     15   54
4     18   56
...
17    25   85

rlang approach with NSE (nonstandard evaluation). As described in the Programming with dplyr vignette, the statement dist > 50 is processed by rlang::enquo(), which "uses some dark magic to look at the argument, see what the user typed, and return that value as a quosure". Then rlang's !! unquotes the input "so that it’s evaluated immediately in the surrounding context".

# The filter statement is evaluated with NSE.
cars_subset_1 <- function( filter_statement ) {
  filter_statement_en <- rlang::enquo(filter_statement)
  # as_label() turns the captured quosure into readable text
  message("filter statement: `", rlang::as_label(filter_statement_en), "`.")

  cars %>%
    dplyr::filter(!!filter_statement_en)
}
cars_subset_1(dist > 50)

results:

filter statement: `dist > 50`.
   speed dist
1     14   60
2     14   80
3     15   54
4     18   56
...
17    25   85
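In rlang 0.4.0 and later, the embrace operator {{ }} combines rlang::enquo() and !! in a single step. A minimal sketch of the same function written that way; the name cars_subset_1b is hypothetical and only used for illustration.

```r
library(dplyr)

# Hypothetical variant of cars_subset_1: {{ filter_statement }} captures the
# caller's expression and unquotes it, like enquo() followed by !!
cars_subset_1b <- function(filter_statement) {
  cars %>%
    filter({{ filter_statement }})
}

res <- cars_subset_1b(dist > 50)
print(nrow(res))
```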

rlang approach passing a string. The statement "dist > 50" is passed to the function as an explicit string, and parsed as an expression by rlang::parse_expr(), then unquoted by !!.

# The filter statement is passed a string.
cars_subset_2 <- function( filter_statement ) {
  filter_statement_expr <- rlang::parse_expr(filter_statement)
  # expr_text() deparses the parsed expression back into readable text
  message("filter statement: `", rlang::expr_text(filter_statement_expr), "`.")

  cars %>%
    dplyr::filter(!!filter_statement_expr)
}
cars_subset_2("dist > 50")

results:

filter statement: `dist > 50`.
   speed dist
1     14   60
2     14   80
3     15   54
4     18   56
...
17    25   85
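When only the column name arrives as a string, rather than the whole condition, no parsing is needed: the .data pronoun looks up a column by name inside the data mask. A sketch assuming dplyr 0.7 or later; the helper cars_above() and its arguments are hypothetical.

```r
library(dplyr)

# Hypothetical helper: keep rows where the column named by `col` exceeds `value`
cars_above <- function(col, value) {
  cars %>%
    filter(.data[[col]] > value)
}

res <- cars_above("dist", 50)
print(nrow(res))
```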

Things are simpler with dplyr::select(): because it uses tidy selection, which understands column names given as strings, an explicit string needs only !!.

# The select statement is passed a string.
cars_subset_2b <- function( select_statement ) {
  cars %>%
    dplyr::select(!!select_statement)
}
cars_subset_2b("dist")
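In dplyr 1.0.0 and later, the tidyselect helper all_of() is an explicit way to select columns named in a character vector, and it raises an error if any name is missing. A sketch; cars_subset_2c is a hypothetical name.

```r
library(dplyr)

# Hypothetical variant of cars_subset_2b: all_of() takes a character vector
# of column names and errors on unknown names
cars_subset_2c <- function(select_statement) {
  cars %>%
    select(all_of(select_statement))
}

res <- cars_subset_2c(c("dist", "speed"))
print(names(res))
```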

A comment on the underscore verbs such as filter_(), quoting dplyr's documentation: "… Their purpose was to make it possible to program with dplyr. However, dplyr now uses tidy evaluation semantics. NSE verbs still capture their arguments, but you can now unquote parts of these arguments. This offers full programmability with NSE verbs. Thus, the underscored versions are now superfluous." – Arthur Yip Apr 14 '18 at 13:04

In the next version of dplyr, it will probably work like this:

condition <- quote(dist > 50)

cars %>%
   filter_(condition)
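filter_() and the other underscore verbs have since been deprecated. With current dplyr, a quoted expression like the one above can be unquoted directly into filter() with !!. A minimal sketch:

```r
library(dplyr)

condition <- quote(dist > 50)

# !! inserts the quoted expression into the call before filter() evaluates it
res <- cars %>%
  filter(!!condition)
print(nrow(res))
```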

As the Programming with dplyr vignette puts it, most dplyr verbs use tidy evaluation in some way; to determine whether a function argument uses data masking or tidy selection, look at its documentation. For strings, parse_quosure() parses the supplied string and converts it into a quosure, and !! unquotes a quosure so it can be evaluated by tidyeval verbs. Note that parse_quosure() was soft-deprecated and renamed to parse_quo() in rlang 0.2.0, per its documentation.
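A sketch of the parse_quo() route, assuming rlang 0.2.0 or later. parse_quo() needs the environment of the resulting quosure passed as env; the global environment is used here, which is only appropriate when the string refers to columns or to global objects.

```r
library(dplyr)

condition <- "dist > 50"

# parse_quo() parses the string and pairs the expression with an environment
condition_quo <- rlang::parse_quo(condition, env = rlang::global_env())

# !! unquotes the quosure so filter() evaluates it in the data mask
res <- cars %>%
  filter(!!condition_quo)
print(nrow(res))
```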

While they're working on that, here is a workaround using if:

library(dplyr)
library(magrittr)

ds <- data.frame(attend = c(1:5,NA,7:9,NA,NA,12))

filter_na <- FALSE

filtertest <- function(x,filterTF = filter_na){
  if(filterTF) x else !(x)
}

ds %>%
  filter(attend %>% is.na %>% filtertest)

  attend
1      1
2      2
3      3
4      4
5      5
6      7
7      8
8      9
9     12

filter_na <- TRUE
ds %>%
  filter(attend %>% is.na %>% filtertest)

  attend
1     NA
2     NA
3     NA

Dynamic column/variable names with dplyr using standard evaluation: the SE versions of dplyr verbs always end with an underscore (e.g. group_by_() and summarise_()). To use them, you pass strings to your function, which you then need to turn into symbols, or pass a list of strings to the special .dots argument. To parameterise the argument of summarise_(), you need interp(), which is defined in the lazyeval package.
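With tidy evaluation, the underscore verbs and lazyeval::interp() are no longer needed for this: a string can be turned into a symbol with rlang::sym() and unquoted with !!. A sketch; the function summarise_by() and its arguments are hypothetical.

```r
library(dplyr)

# Hypothetical helper: group by one column and average another,
# with both column names supplied as strings
summarise_by <- function(data, group_col, value_col) {
  group_sym <- rlang::sym(group_col)
  value_sym <- rlang::sym(value_col)
  data %>%
    group_by(!!group_sym) %>%
    summarise(mean_value = mean(!!value_sym))
}

res <- summarise_by(cars, "speed", "dist")
print(res)
```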

Comments
  • As I understand it, this is a work in progress
  • And now it's completed and part of the standard dplyr installation.
  • can't wait. dplyr keeps surprising me with its intuitiveness and intelligence. Thanks, Hadley!
  • And what if one wants to pass multiple arguments? Passing a list like list("dist > 50", "speed > 10") returns Error: Can't convert a list to a quosure
  • EDIT: found it: paste(list('dist > 50', 'speed > 10'), collapse=" & ")
  • thanks, @AndrewMacDonald! Sorry for not offering a reproducible example earlier
  • great, thanks @AndrewMacDonald, this works and on top gives me a simple example of using a function with dplyr - something I wanted to have for reference. Thanks again!
  • glad it was useful! I edited it above very slightly (one shouldn't use $ within filter!)
  • I didn't know about $ and didn't notice any malfunction (or didn't realize it was due to that?). Thanks for the heads up, I'll keep this in mind.
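On the question in the comments about passing several conditions: besides pasting the strings together with &, each string can be parsed separately with rlang::parse_exprs() and spliced into filter() with !!!; filter() combines multiple arguments with AND. A sketch assuming a reasonably recent rlang:

```r
library(dplyr)

conditions <- c("dist > 50", "speed > 10")

# parse_exprs() returns a list of expressions, one per string;
# !!! splices them into filter() as separate arguments, which are ANDed
res_splice <- cars %>%
  filter(!!!rlang::parse_exprs(conditions))

# The approach from the comment: one combined string, parsed as one expression
res_paste <- cars %>%
  filter(!!rlang::parse_expr(paste(conditions, collapse = " & ")))

print(nrow(res_splice))
```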