filtering within the summarise function of dplyr

I am struggling a little with dplyr because I want to do two things at once and wonder whether that is possible.

I want to calculate the mean of values and, at the same time, the mean of only those values that have a specific value in another column.

library(dplyr)
set.seed(1234)
df <- data.frame(id=rep(1:10, each=14),
                 tp=letters[1:14],
                 value_type=sample(LETTERS[1:3], 140, replace=TRUE),
                 values=runif(140))

df %>%
  group_by(id, tp) %>%
  summarise(
    all_mean=mean(values),
    A_mean=mean(values), # Only the values with value_type A
    value_count=sum(value_type == 'A')
  )

So the A_mean column should calculate the mean of values where value_type == 'A'.

I would normally run two separate commands and merge the results afterwards, but I suspect there is a handier way that I am just not seeing.

Thanks in advance.
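For comparison, the two-commands-and-merge workflow mentioned in the question might look roughly like this. It is a sketch on a small hypothetical data frame (toy, with a made-up grouping column g) rather than the question's df, so the numbers are easy to check by hand.

```r
library(dplyr)

# Hypothetical toy data: group "x" has two "A" rows, group "y" has none
toy <- data.frame(g = c("x", "x", "x", "y"),
                  value_type = c("A", "B", "A", "B"),
                  values = c(1, 2, 3, 4))

# Command 1: overall mean per group
all_means <- toy %>%
  group_by(g) %>%
  summarise(all_mean = mean(values))

# Command 2: filter to the "A" rows first, then summarise
a_means <- toy %>%
  filter(value_type == "A") %>%
  group_by(g) %>%
  summarise(A_mean = mean(values), value_count = n())

# Merge: groups without any "A" rows get NA from the join
res <- left_join(all_means, a_means, by = "g")
print(res)
```

Because group y disappears after the filter, the join leaves NA in both A_mean and value_count for it, whereas a single summarise() call can report a count of 0.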

We can subset values inside the mean() call:

df %>%
  group_by(id, tp) %>%
  summarise(all_mean = mean(values),
            A_mean = mean(values[value_type == "A"]),  # only the "A" rows; NaN if the group has none
            value_count = sum(value_type == "A"))

You can do this with two summary steps:

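df %>%
  group_by(id, tp, value_type) %>%
  # step 1: mean and row count for each value_type within each (id, tp)
  summarise(type_mean = mean(values), n = n()) %>%
  # step 2: output is now grouped by (id, tp) only; combine the per-type results
  summarise(all_mean = weighted.mean(type_mean, n),           # weighting by n recovers the overall mean
            A_mean = sum(type_mean * (value_type == "A")),    # 0, not NaN, if there are no "A" rows
            value_count = sum(n[value_type == "A"]))          # number of "A" rows

The first summary calculates the mean and the number of rows for each value_type; the second weights those means by their row counts to get the overall mean and "sums" only the mean where value_type == "A".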
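Whether the second step reproduces the plain overall mean depends on how the per-type means are combined. The sketch below uses a hypothetical single-group data set with unequal numbers of rows per type (three "A" rows, one "B" row) to compare an unweighted mean of the per-type means with one weighted by row counts. It assumes dplyr 1.0.0 or later for the .groups argument.

```r
library(dplyr)

# Hypothetical toy data: overall mean is (1 + 2 + 3 + 10) / 4 = 4
toy <- data.frame(g = "x",
                  value_type = c("A", "A", "A", "B"),
                  values = c(1, 2, 3, 10))

res <- toy %>%
  group_by(g, value_type) %>%
  # per-type mean and row count; drop_last leaves the result grouped by g
  summarise(type_mean = mean(values), n = n(), .groups = "drop_last") %>%
  summarise(unweighted = mean(type_mean),            # (2 + 10) / 2 = 6
            all_mean = weighted.mean(type_mean, n),  # (3 * 2 + 1 * 10) / 4 = 4
            A_mean = sum(type_mean * (value_type == "A")))
print(res)
```

With equal row counts per type the two versions agree; with unequal counts only the weighted one matches mean(values).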

You can also give the following function a try:

?summarise_if

(it is documented together with its siblings under ?summarise_all; since dplyr 1.0.0 these scoped variants are superseded by across())

Example

The dplyr documentation provides a good example of this, I think:

# The _if() variants apply a predicate function (a function that
# returns TRUE or FALSE) to determine the relevant subset of
# columns. Here we apply mean() to the numeric columns:

starwars %>%
  summarise_if(is.numeric, mean, na.rm = TRUE)

#> # A tibble: 1 x 3
#>   height  mass birth_year
#>    <dbl> <dbl>      <dbl>
#> 1   174.  97.3       87.6

The interesting part here is the predicate function: it is the rule that decides which columns get summarised. Note that it selects columns, not rows, so on its own it does not restrict a mean to the rows where value_type == "A".
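In dplyr 1.0.0 and later the same predicate-based column selection is usually written with across() and where(). The sketch below, on a hypothetical toy data frame with a made-up extra column weight, combines that column selection with the row subsetting from the first answer, so every numeric column gets both an overall mean and an "A"-only mean. It assumes a dplyr version in which lambdas inside across() can refer to other columns such as value_type.

```r
library(dplyr)

# Hypothetical toy data; only `values` and `weight` are numeric
toy <- data.frame(g = c("x", "x", "x", "y"),
                  value_type = c("A", "B", "A", "B"),
                  values = c(1, 2, 3, 4),
                  weight = c(10, 20, 30, 40))

res <- toy %>%
  group_by(g) %>%
  summarise(across(where(is.numeric),                      # predicate picks the columns
                   list(all = mean,                        # overall mean
                        A = ~ mean(.x[value_type == "A"])  # "A" rows only
                   )))
# Default names follow "{.col}_{.fn}": values_all, values_A, weight_all, weight_A
print(res)
```

The predicate handles "which columns" and the logical index handles "which rows", which is the combination the question is after.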

Comments
  • Nice and easy solution!
  • Can you do an example of this?