R: Collapse rows and sum the values in a column


I have the following dataframe (df1):

ID    someText    PSM OtherValues
ABC   c   2   qwe
CCC   v   3   wer
DDD   b   56  ert
EEE   m   78  yu
FFF   sw  1   io
GGG   e   90  gv
CCC   r   34  scf
CCC   t   21  fvb
KOO   y   45  hffd
EEE   u   2   asd
LLL   i   4   dlm
ZZZ   i   8   zzas

I would like to collapse the rows that share the same ID and add up the corresponding PSM values, to get the following output:

ID  Sum PSM
ABC 2
CCC 58
DDD 56
EEE 80
FFF 1
GGG 90
KOO 45
LLL 4
ZZZ 8

It seems doable with the aggregate function, but I don't know the syntax. Any help is really appreciated! Thanks.


In base:

aggregate(PSM ~ ID, data = df1, FUN = sum)
##    ID PSM
## 1 ABC   2
## 2 CCC  58
## 3 DDD  56
## 4 EEE  80
## 5 FFF   1
## 6 GGG  90
## 7 KOO  45
## 8 LLL   4
## 9 ZZZ   8
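
For anyone who wants to run this end to end, here is a minimal self-contained sketch in base R. It rebuilds df1 from the values in the question; the output column name Sum_PSM and the tapply() variant are illustrative additions, not part of the original answer.

```r
# Rebuild the data frame from the question (values copied from the post)
df1 <- data.frame(
  ID          = c("ABC", "CCC", "DDD", "EEE", "FFF", "GGG",
                  "CCC", "CCC", "KOO", "EEE", "LLL", "ZZZ"),
  someText    = c("c", "v", "b", "m", "sw", "e", "r", "t", "y", "u", "i", "i"),
  PSM         = c(2, 3, 56, 78, 1, 90, 34, 21, 45, 2, 4, 8),
  OtherValues = c("qwe", "wer", "ert", "yu", "io", "gv",
                  "scf", "fvb", "hffd", "asd", "dlm", "zzas"),
  stringsAsFactors = FALSE
)

# One row per ID, with PSM summed within each ID
res <- aggregate(PSM ~ ID, data = df1, FUN = sum)
names(res)[2] <- "Sum_PSM"   # hypothetical name, chosen to resemble the requested header
print(res)

# Alternative: tapply() returns the same sums as a named vector
sums <- tapply(df1$PSM, df1$ID, sum)
print(sums)
```

aggregate() returns a data frame, which is usually what you want for further processing; tapply() is handy when a quick named lookup such as sums[["CCC"]] is enough.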



Example using dplyr, the next iteration of plyr:

library(dplyr)

df2 <- df1 %>%
  group_by(ID) %>%
  summarize(Sum_PSM = sum(PSM))

The %>% operator is the "pipe": it takes the result of the expression on its left and passes it as the first argument to the function on its right.
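
To make the piping idea concrete, the sketch below writes the same computation twice, once with the pipe and once as ordinary nested calls. The small data frame is a made-up subset of the question's CCC and EEE rows, and the names piped and nested are only for illustration.

```r
library(dplyr)

# Small subset of the question's data (CCC and EEE rows only)
df1 <- data.frame(
  ID  = c("CCC", "EEE", "CCC", "EEE", "CCC"),
  PSM = c(3, 78, 34, 2, 21)
)

# Piped form: read left to right, "take df1, then group, then summarize"
piped <- df1 %>%
  group_by(ID) %>%
  summarize(Sum_PSM = sum(PSM))

# Equivalent nested form: the left-hand side becomes the first argument
nested <- summarize(group_by(df1, ID), Sum_PSM = sum(PSM))

print(piped)
print(all.equal(piped, nested))
```

Both forms do the same work; the pipe mainly helps readability once more than two steps are chained.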



This is super easy using the plyr package:

library(plyr)
ddply(df1, .(ID), summarize, Sum=sum(PSM))



Using the aggregate function seems better than dplyr if you just want to keep the original column names and operate on one column at a time. It also avoids a pitfall of the summarize function.

Note from the summarize function documentation:

Be careful when using existing variable names; the corresponding columns will be immediately updated with the new data and this can affect subsequent operations referring to those variables.

For instance

## modified example from aggregate documentation with character variables and NAs
testDF <- data.frame(v1 = c(1,3,5,7,8,3,5,NA,4,5,7,9),
                 v2 = c(11,33,55,77,88,33,55,NA,44,55,77,99) )
by1 <- c("red", "blue", 1, 2, NA, "big", 1, 2, "red", 1, NA, 12)

aggregate(x = testDF, by = list(by1), FUN = "sum")
Group.1 v1  v2
1       1 15 165
2      12  9  99
3       2 NA  NA
4     big  3  33
5    blue  3  33
6     red  5  55

You get what you want, but with summarize and ddply you have to name every output column explicitly. So if you have many columns, aggregate is more convenient.

library(plyr)
testDF$ID <- by1
ddply(testDF, .(ID), summarize, v1 = sum(v1), v2 = sum(v2))
ID v1  v2
1    1 15 165
2   12  9  99
3    2 NA  NA
4  big  3  33
5 blue  3  33
6  red  5  55
7 <NA> 15 165
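
The two outputs above differ in one respect: ddply keeps a row for the NA group, while aggregate omits rows whose grouping value is missing, as its documentation describes. If you want aggregate to keep that group and to ignore NA values inside the sums, one possible sketch is below. The "(missing)" label and the names ids and agg are hypothetical choices for this example.

```r
# Same test data as above
testDF <- data.frame(v1 = c(1, 3, 5, 7, 8, 3, 5, NA, 4, 5, 7, 9),
                     v2 = c(11, 33, 55, 77, 88, 33, 55, NA, 44, 55, 77, 99))
by1 <- c("red", "blue", 1, 2, NA, "big", 1, 2, "red", 1, NA, 12)

# Replace missing group labels with an explicit placeholder so that
# aggregate() does not drop those rows ("(missing)" is an arbitrary label)
ids <- ifelse(is.na(by1), "(missing)", by1)

# na.rm = TRUE is passed through to sum(), so NA values inside a group
# are skipped instead of turning the whole group sum into NA
agg <- aggregate(x = testDF, by = list(ID = ids), FUN = sum, na.rm = TRUE)
print(agg)
```

Whether dropping or keeping the missing group is right depends on the data; the point is only that the behaviour is a choice you can control.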

To see the effect of summarize immediately updating the columns, compare the following examples:

ddply(testDF, .(ID), summarize, v1=max(v1,v2), v2=min(v1,v2) )
ID v1 v2
1    1 55 55
2   12 99 99
3    2 NA NA
4  big 33 33
5 blue 33 33
6  red 44 11
7 <NA> 88 77

ddply(testDF, .(ID), summarize, v1=min(v1,v2), v2=min(v1,v2) )
ID v1 v2
1    1  5  5
2   12  9  9
3    2 NA NA
4  big  3  3
5 blue  3  3
6  red  1  1
7 <NA>  7  7

Note that when v1 is computed with max, the column has already been updated by the time v2 is calculated. For instance, for ID = 1 the min in v2 cannot return 5, because v1 now holds 55.



Using data.table

library(data.table)
setDT(df1)[, lapply(.SD, sum), by = ID, .SDcols = "PSM"]
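
In the call above, .SD stands for the subset of columns named in .SDcols within each ID group, so lapply(.SD, sum) sums each of those columns per group. When only one column is involved, a shorter form is possible. The sketch below uses a made-up subset of the question's data, and the output name Sum_PSM is an illustrative choice.

```r
library(data.table)

# Small subset of the question's data (CCC and EEE rows only)
dt <- data.table(
  ID  = c("CCC", "EEE", "CCC", "EEE", "CCC"),
  PSM = c(3, 78, 34, 2, 21)
)

# .() builds the output columns directly; by = ID groups the rows
res <- dt[, .(Sum_PSM = sum(PSM)), by = ID]
print(res)
```

The lapply(.SD, ...) form becomes more useful once several columns need the same summary, since they can all be listed in .SDcols.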
