Pass arguments to dplyr functions

I want to parameterise the following computation using dplyr that finds which values of Sepal.Length are associated with more than one value of Sepal.Width:

library(dplyr)

iris %>%
    group_by(Sepal.Length) %>%
    summarise(n.uniq=n_distinct(Sepal.Width)) %>%
    filter(n.uniq > 1)

Normally I would write something like this:

not.uniq.per.group <- function(data, group.var, uniq.var) {
    data %>%
        group_by(group.var) %>%
        summarise(n.uniq=n_distinct(uniq.var)) %>%
        filter(n.uniq > 1)
}

However, this approach throws errors because dplyr uses non-standard evaluation. How should this function be written?

You need to use the standard evaluation versions of the dplyr functions (just append '_' to the function names, i.e. group_by_ & summarise_) and pass strings to your function, which you then need to turn into symbols. To parameterise the argument of summarise_, you need to use interp(), which is defined in the lazyeval package. Concretely:

library(dplyr)
library(lazyeval)

not.uniq.per.group <- function(df, grp.var, uniq.var) {
    df %>%
        group_by_(grp.var) %>%
        summarise_( n_uniq=interp(~n_distinct(v), v=as.name(uniq.var)) ) %>%
        filter(n_uniq > 1)
}

not.uniq.per.group(iris, "Sepal.Length", "Sepal.Width")

Note that in recent versions of dplyr the standard evaluation versions of the dplyr functions have been soft-deprecated in favor of tidy evaluation (see the answers below).

See the Programming with dplyr vignette for more information on working with non-standard evaluation.


Like the old dplyr versions up to 0.5, the new dplyr has facilities for both standard evaluation (SE) and non-standard evaluation (NSE), but they are expressed differently than before.

If you want an NSE function, you pass bare expressions and use enquo to capture them as quosures. If you want an SE function, just pass quosures (or symbols) directly, then unquote them in the dplyr calls. Here is the SE solution to the question:

library(tidyverse)
library(rlang)

f1 <- function(df, grp.var, uniq.var) {
    df %>%
        group_by(!!grp.var) %>%
        summarise(n_uniq = n_distinct(!!uniq.var)) %>%
        filter(n_uniq > 1)
}

a <- f1(iris, quo(Sepal.Length), quo(Sepal.Width))
b <- f1(iris, sym("Sepal.Length"), sym("Sepal.Width"))
identical(a, b)
#> [1] TRUE

Note how the SE version enables you to work with string arguments - just turn them into symbols first using sym(). For more information, see the programming with dplyr vignette.
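For example (a quick sketch; the variables grp and unq are made up for illustration), if the column names arrive as strings, e.g. from a list or character vector, you can convert them before the call:

grp <- "Sepal.Length"
unq <- "Sepal.Width"

# turn the strings into symbols and pass them to the SE function defined above
f1(iris, sym(grp), sym(unq))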


In the development version of dplyr (soon to be released as 0.6.0), we can also use a slightly different syntax for passing the variables.

f1 <- function(df, grp.var, uniq.var) {
    grp.var <- enquo(grp.var)
    uniq.var <- enquo(uniq.var)

    df %>%
        group_by(!!grp.var) %>%
        summarise(n_uniq = n_distinct(!!uniq.var)) %>%
        filter(n_uniq > 1)
}

res2 <- f1(iris, Sepal.Length, Sepal.Width) 
res1 <- not.uniq.per.group(iris, "Sepal.Length", "Sepal.Width")
identical(res1, res2)
#[1] TRUE

Here enquo() captures the function argument and returns it as a quosure (similar to substitute() in base R) without evaluating it; inside summarise() we then unquote it with !! (or UQ()) so that it gets evaluated.
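To illustrate the difference between quoting your own expression and quoting the caller's expression (a minimal sketch, not from the original answer; the toy functions f_quo() and f_enquo() are made up for illustration):

library(rlang)

f_quo <- function(x) quo(x)       # quo() quotes the expression you typed: the symbol `x`
f_enquo <- function(x) enquo(x)   # enquo() quotes the expression the *caller* supplied

f_quo(Sepal.Length)    # quosure wrapping `x`
f_enquo(Sepal.Length)  # quosure wrapping `Sepal.Length`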


In the current version of dplyr (0.7.4) the use of the standard evaluation versions of the functions (with '_' appended to the function name, e.g. group_by_) is deprecated. Instead you should rely on tidy evaluation (tidyeval) when writing functions.

Here's an example of how your function would look then:

# definition of your function
not.uniq.per.group <- function(data, group.var, uniq.var) {
  # enquotes variables to be used with dplyr-functions
  group.var <- enquo(group.var)
  uniq.var <- enquo(uniq.var)

  # use '!!' before parameter names in dplyr-functions
  data %>%
    group_by(!!group.var) %>%
    summarise(n.uniq=n_distinct(!!uniq.var)) %>%
    filter(n.uniq > 1)
}

# call of your function
not.uniq.per.group(iris, Sepal.Length, Sepal.Width)
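As an extension of this pattern (a sketch, not part of the original answer; it assumes quo_name() and the := operator from rlang), the name of the summary column can be derived from the quoted argument as well:

library(dplyr)
library(rlang)

not.uniq.per.group2 <- function(data, group.var, uniq.var) {
  group.var <- enquo(group.var)
  uniq.var <- enquo(uniq.var)
  # build the output column name from the quoted variable
  out.name <- paste0("n.uniq.", quo_name(uniq.var))

  data %>%
    group_by(!!group.var) %>%
    summarise(!!out.name := n_distinct(!!uniq.var)) %>%
    filter(.data[[out.name]] > 1)
}

not.uniq.per.group2(iris, Sepal.Length, Sepal.Width)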

If you want to learn all about the details, there's an excellent vignette by the dplyr-team on how this works.


Here's the way to do it from rlang 0.4 onwards, using the curly-curly {{ }} pseudo-operator:

library(dplyr)

not.uniq.per.group <- function(data, group.var, uniq.var) {
  data %>%
    group_by({{group.var}}) %>%
    summarise(n.uniq=n_distinct({{uniq.var}})) %>%
    filter(n.uniq > 1)
}

iris %>% not.uniq.per.group(Sepal.Length, Sepal.Width)
#> # A tibble: 25 x 2
#>    Sepal.Length n.uniq
#>           <dbl>  <int>
#>  1          4.4      3
#>  2          4.6      4
#>  3          4.8      3
#>  4          4.9      5
#>  5          5        8
#>  6          5.1      6
#>  7          5.2      4
#>  8          5.4      4
#>  9          5.5      6
#> 10          5.6      5
#> # ... with 15 more rows
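If the column names arrive as strings instead of bare names, {{ }} doesn't apply directly; a string-based variant (a sketch; the function name not.uniq.per.group.chr is just for illustration) can index into the .data pronoun instead:

not.uniq.per.group.chr <- function(data, group.var, uniq.var) {
  data %>%
    group_by(.data[[group.var]]) %>%
    summarise(n.uniq = n_distinct(.data[[uniq.var]])) %>%
    filter(n.uniq > 1)
}

iris %>% not.uniq.per.group.chr("Sepal.Length", "Sepal.Width")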

Comments
  • As a matter of style, I would advise against using the dot in names in modern R, except in S3 generics. It’s terribly confusing. The naming convention used (amongst others) by dplyr is much nicer: names_with_underscores.
  • I'm aware that Hadley Wickham's style guide recommends the underscore notation, but the Google R style guide promotes the period (although not for functions, which I have done here). In other languages the period is used for member access (e.g. myArray.length in JavaScript); is there another conflict in R?
  • Google style guides are often terrible. In this particular instance, the problem is that it leads to ambiguities with S3 methods: is some.class.method a method some of class class.method, or is it a method some.class of class method? Furthermore, it leads to inconsistent names when parts of your code are implemented in C(++), since that doesn't support dots in names, necessitating mapping the backend function names to different R names.
  • If I am getting Sepal.Length and Sepal.Width from a list, it doesn't work, as they will be in the form of the strings "Sepal.Length" and "Sepal.Width". What can I do then?
  • @KillerSnail You should post a new question as this solution is specific to the problem mentioned in the OP's post
  • The question above from @KillerSnail is essentially the question I just asked here: stackoverflow.com/questions/46310123/…