Which is the simplest way to aggregate rows (sum) by column values for the following type of data frame in R?

index   type.x  type.y   col3   col4
1        a        m      20      25
2        b        m      30      28
3        a        m      15      555
3        a        n      20      555
4        a        m      666     10
4        b        m      666     20

I have tried aggregate (keeping the index) and group_by without success. This is the shape I am trying to get:

index   col3   col4
1        20      25
2        30      28
3        35      555
4        666     30

If you are using base R, the following code may help

r <- aggregate(df[4:5],by = df[1],function(v) sum(unique(v)))

which gives

> r
  index col3 col4
1     1   20   25
2     2   30   28
3     3   35  555
4     4  666   30
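
To see what the anonymous function does inside aggregate, here is a minimal reproducible sketch. The data frame is rebuilt from the table in the question; the names groups and col3_sums are only illustrative, and the split/sapply step is an equivalent way of looking at the grouping, not something aggregate requires you to write.

```r
# Rebuild the question's data frame (values copied from the table in the question)
df <- data.frame(
  index  = c(1, 2, 3, 3, 4, 4),
  type.x = c("a", "b", "a", "a", "a", "b"),
  type.y = c("m", "m", "m", "n", "m", "m"),
  col3   = c(20, 30, 15, 20, 666, 666),
  col4   = c(25, 28, 555, 555, 10, 20)
)

# Split col3 into one vector per index value
groups <- split(df$col3, df$index)

# For each group: drop repeated values, then add up what is left
col3_sums <- sapply(groups, function(v) sum(unique(v)))

# The one-liner from the answer applies the same function to col3 and col4 at once
r <- aggregate(df[4:5], by = df[1], function(v) sum(unique(v)))
print(r)
```

For index 4, col3 holds 666 twice, so unique() keeps a single 666, while col4 holds 10 and 20, which are summed to 30.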

I assume you want the first element when all values within a group are identical, and the sum otherwise:

library(dplyr)

df %>% 
   group_by(index) %>% 
   # n_distinct(x) is equivalent to length(unique(x))
   # Or, using @Thomas's idea: list(~sum(unique(.), na.rm = TRUE))
   summarise_at(vars(col3, col4), list(~if_else(n_distinct(.) == 1, .[1], sum(., na.rm = TRUE))))

# A tibble: 4 x 3
  index  col3  col4
  <int> <int> <int>
1     1    20    25
2     2    30    28
3     3    35   555
4     4   666    30

We can also use

df %>% 
  group_by(index) %>%
  summarise_at(vars(starts_with('col')), ~ sum(unique(.x)))
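
In dplyr 1.0.0 and later, summarise_at is superseded by across(). As a sketch, assuming a recent dplyr is installed and using a trimmed copy of the question's data, the same sum-of-unique-values aggregation could be written like this (res is just an illustrative name):

```r
library(dplyr)

# Trimmed copy of the question's data: only the grouping and value columns
df <- data.frame(
  index = c(1, 2, 3, 3, 4, 4),
  col3  = c(20, 30, 15, 20, 666, 666),
  col4  = c(25, 28, 555, 555, 10, 20)
)

# across() applies the lambda to every column whose name starts with "col"
res <- df %>%
  group_by(index) %>%
  summarise(across(starts_with("col"), ~ sum(unique(.x))))

print(res)
```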

Making the same assumption as A. Suliman's dplyr answer (that you want to sum the unique values), I would suggest using data.table:

my_agg_function <- function(x) {
  x <- unique(x)  # drop duplicated values within the group
  sum(x)          # then sum what remains
}

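The data.table call that used this helper is cut off in this copy of the answer, so what follows is only a guess at the general shape, not the answer's own code. Assuming the data.table package is installed, a per-index sum of unique values might be sketched like this; the helper name sum_unique, the object names dt and res, and the .SDcols selection are all assumptions made for illustration.

```r
library(data.table)

# Trimmed copy of the question's data as a data.table
dt <- data.table(
  index = c(1, 2, 3, 3, 4, 4),
  col3  = c(20, 30, 15, 20, 666, 666),
  col4  = c(25, 28, 555, 555, 10, 20)
)

# Hypothetical helper: sum of the distinct values in a vector
sum_unique <- function(x) sum(unique(x))

# Apply the helper to col3 and col4 within each index group
res <- dt[, lapply(.SD, sum_unique), by = index, .SDcols = c("col3", "col4")]

print(res)
```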

  • Have you tried the merge command?
  • Could you please precisely define what kind of aggregation you wish, because now we can only guess when you sum up the values and when you don't.
  • Pardon me, I edited to add the missing info
  • Did you mean that there is a faster or simpler way to do it with another package? Thanks a lot, Thomas, that's a nice way.
  • @OrlandoStivenJaramilloPiza I have no idea whether base R is more efficient than other packages, but aggregate is powerful enough to solve the problem, so I see no need to use functions from other packages
  • Alright, may I ask how "function(v) sum(unique(v))" works? I think it is an anonymous function, but I don't quite get how it works with the "unique" function for the aggregation part. I will read any docs you have. Thanks again.
  • @OrlandoStivenJaramilloPiza sum(unique(v)) works like this: for each group of values, it removes the duplicates and then sums what is left
  • Is there any book or documentation where I can learn to write different functions in this "function(v) sum(unique(v))" style? Thanks for the 1000th time!
  • I think your code would break if col3 contains an additional row with e.g. index=4 and col3=1, then you will sum up the two 666s. (However, it is not clear which kind of aggregation is desired)
  • @Volokh could you please provide this scenario using dput. Thanks
  • Something like this: df <- structure(list(index = c(1, 2, 3, 3, 4, 4, 4), col3 = c(20, 30, 15, 20, 666, 666, 111), col4 = c(25, 28, 555, 555, 10, 20, 11 )), class = "data.frame", row.names = c(NA, -7L))
  • The problem is only with the duplicated index values, which are duplicated because other columns have different values for the same index, and I can't come up with a simple way to handle them.
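
The edge case raised in the comments can be checked directly. The sketch below uses the dput data posted above and compares the two rules suggested in the answers; the function names sum_unique and first_or_sum are illustrative labels, not names from the thread.

```r
# Data from the comment: index 4 now has col3 values 666, 666 and 111
df <- structure(list(index = c(1, 2, 3, 3, 4, 4, 4),
                     col3 = c(20, 30, 15, 20, 666, 666, 111),
                     col4 = c(25, 28, 555, 555, 10, 20, 11)),
                class = "data.frame", row.names = c(NA, -7L))

# Rule A: sum of the distinct values in each group
sum_unique <- function(v) sum(unique(v))

# Rule B: first value if all values are identical, otherwise the plain sum
first_or_sum <- function(v) if (length(unique(v)) == 1) v[1] else sum(v)

a <- aggregate(df[c("col3", "col4")], by = df["index"], FUN = sum_unique)
b <- aggregate(df[c("col3", "col4")], by = df["index"], FUN = first_or_sum)

# Compare col3 for index 4 under both rules
print(a[a$index == 4, ])
print(b[b$index == 4, ])
```

For index 4, rule A gives 666 + 111 = 777 in col3, while rule B adds both 666s and gives 1443. Which one is right depends on the aggregation the question actually intends.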