Calculating subtotals in R

I have a data frame with 900,000 rows and 11 columns in R. The column names and types are as follows:

column name: date / mcode / mname / ycode / yname / yissue  / bsent   / breturn / tsent   / treturn / csales
type:        Date / Char  / Char  / Char  / Char  / Numeric / Numeric / Numeric / Numeric / Numeric / Numeric

I want to calculate the subtotals. For example, I want to calculate the sums at each change in yname, and add subtotal to all numerical variables. There are 160 distinct ynames, so the resulting table should tell me the subtotal of each yname. I haven't sorted the data yet, but this is not a problem because I can sort the data in whatever way I want. Below is a snippet from my data:

             date     mcode mname            ycode    yname   yissue bsent breturn tsent treturn csales
417572 2010-07-28     45740 ENDPOINT A        5772    XMAG  20100800     7       0     7       0      0
417573 2010-07-31     45740 ENDPOINT A        5772    XMAG  20100800     0       0     0       0      1
417574 2010-08-04     45740 ENDPOINT A        5772    XMAG  20100800     0       0     0       0      1
417575 2010-08-14     45740 ENDPOINT A        5772    XMAG  20100800     0       0     0       0      1
417576 2010-08-26     45740 ENDPOINT A        5772    XMAG  20100800     0       4     0       0      0
417577 2010-07-28     45741 ENDPOINT L        5772    XMAG  20100800     2       0     2       0      0
417578 2010-08-04     45741 ENDPOINT L        5772    XMAG  20100800     2       0     2       0      0
417579 2010-08-26     45741 ENDPOINT L        5772    XMAG  20100800     0       4     0       0      0
417580 2010-07-28     46390 ENDPOINT R        5772    XMAG  20100800     3       0     3       0      1
417581 2010-07-29     46390 ENDPOINT R        5772    XMAG  20100800     0       0     0       0      2
417582 2010-08-01     46390 ENDPOINT R        5779    YMAG  20100800     3       0     3       0      0
417583 2010-08-11     46390 ENDPOINT R        5779    YMAG  20100800     0       0     0       0      1
417584 2010-08-20     46390 ENDPOINT R        5779    YMAG  20100800     0       0     0       0      1
417585 2010-08-24     46390 ENDPOINT R        5779    YMAG  20100800     2       0     2       0      1
417586 2010-08-26     46390 ENDPOINT R        5779    YMAG  20100800     0       2     0       2      0
417587 2010-07-28     46411 ENDPOINT D        5779    YMAG  20100800     6       0     6       0      0
417588 2010-08-08     46411 ENDPOINT D        5779    YMAG  20100800     0       0     0       0      1
417589 2010-08-11     46411 ENDPOINT D        5779    YMAG  20100800     0       0     0       0      1
417590 2010-08-26     46411 ENDPOINT D        5779    YMAG  20100800     0       4     0       4      0

What function should I use here? Maybe something like SQL group by?

OK. Assuming your data are in a data frame named foo:

> head(foo)
             date mcode      mname ycode yname   yissue bsent breturn tsent
417572 2010/07/28 45740 ENDPOINT A  5772  XMAG 20100800     7       0     7
417573 2010/07/31 45740 ENDPOINT A  5772  XMAG 20100800     0       0     0
417574 2010/08/04 45740 ENDPOINT A  5772  XMAG 20100800     0       0     0
417575 2010/08/14 45740 ENDPOINT A  5772  XMAG 20100800     0       0     0
417576 2010/08/26 45740 ENDPOINT A  5772  XMAG 20100800     0       4     0
417577 2010/07/28 45741 ENDPOINT L  5772  XMAG 20100800     2       0     2
       treturn csales
417572       0      0
417573       0      1
417574       0      1
417575       0      1
417576       0      0
417577       0      0

Then this will do the aggregation of the numeric columns in your data:

> aggregate(cbind(bsent, breturn, tsent, treturn, csales) ~ yname, data = foo, 
+           FUN = sum)
  yname bsent breturn tsent treturn csales
1  XMAG    14       8    14       0      6
2  YMAG    11       6    11       6      5

That was using the snippet of data you included in your question. I used the formula interface to aggregate(), which is a bit nicer in this instance because you don't need all the foo$ bits on the names of the variables you wish to aggregate. If you have missing data (NA) in your full data set, note that the formula interface drops every row containing an NA by default (na.action = na.omit). To keep those rows and ignore only the missing values, set na.action = na.pass and add na.rm = TRUE, which gets passed on to sum(), like so:

> aggregate(cbind(bsent, breturn, tsent, treturn, csales) ~ yname, data = foo, 
+           FUN = sum, na.rm = TRUE, na.action = na.pass)
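To see how missing values affect the subtotals, here is a minimal sketch on made-up data (foo_na and its values are hypothetical, not taken from the question). It compares the default na.action of the formula interface with na.action = na.pass:

```r
# Toy data (hypothetical, not from the question): one NA in bsent.
foo_na <- data.frame(
  yname  = c("XMAG", "XMAG", "YMAG", "YMAG"),
  bsent  = c(7, NA, 3, 2),
  csales = c(0, 1, 0, 1)
)

# Default na.action = na.omit: the whole second row is dropped before
# summing, so its csales value (1) is missing from the XMAG subtotal.
dropped <- aggregate(cbind(bsent, csales) ~ yname, data = foo_na,
                     FUN = sum, na.rm = TRUE)

# na.action = na.pass keeps the row; na.rm = TRUE then skips only the NA.
kept <- aggregate(cbind(bsent, csales) ~ yname, data = foo_na,
                  FUN = sum, na.rm = TRUE, na.action = na.pass)

print(dropped)
print(kept)
```

Which behaviour you want depends on whether a row with one missing field should still contribute its other fields to the subtotal.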

Or use the plyr package, which extends easily to other data classes (here df is assumed to be a data frame with a numeric column a and a grouping column b):

> library(plyr)
> result.2 <- ddply(df$a, .(df$b), sum)
> result.2
  df.b V1
1 down 30
2   up 25

You can also use xtabs or tapply:

xtabs(cbind(bsent, breturn, tsent, treturn, csales) ~ yname, data)

tapply(data$bsent, data$yname, sum)
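tapply() handles one column at a time, so getting subtotals for several numeric columns needs a loop over them. A minimal sketch, assuming a toy data frame dat with made-up values standing in for the question's data:

```r
# Toy stand-in for the question's data (values are made up).
dat <- data.frame(
  yname   = c("XMAG", "XMAG", "YMAG", "YMAG", "YMAG"),
  bsent   = c(7, 2, 3, 0, 6),
  breturn = c(0, 4, 0, 2, 4)
)

num_cols <- c("bsent", "breturn")

# One tapply() per numeric column; sapply() binds the results into a
# matrix with one row per yname and one column per variable.
subtotals <- sapply(dat[num_cols], function(col) tapply(col, dat$yname, sum))
print(subtotals)
```

The result is a matrix rather than a data frame; wrap it in as.data.frame() if a data frame is needed downstream.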

If your data are large and speed matters, I would recommend the base R function rowsum(), which is a lot faster. I applied the three methods suggested in the other answers (f1 = aggregate, f2 = ddply, f3 = tapply) and compared them with f4 = rowsum. Here is what I found:

   test replications elapsed relative
4 f4()          100   0.033     1.00
3 f3()          100   0.046     1.39
1 f1()          100   0.165     5.00
2 f2()          100   0.605    18.33

I have added my code below in case someone wants to explore in more detail.

library(plyr)        # ddply()
library(rbenchmark)  # benchmark()

# Toy data: 50 random values in 5 groups of 10
val  = rnorm(50)
name = rep(letters[1:5], each = 10)
data = data.frame(val, name)

# One wrapper per method so that all four are timed the same way
f1 = function(){aggregate(data$val, by=list(data$name), FUN=sum)}
f2 = function(){ddply(data, .(name), summarise, sum = sum(val))}
f3 = function(){tapply(data$val, data$name, sum)}
f4 = function(){rowsum(x = data$val, group = data$name)}

# Run each wrapper 100 times and sort the results by relative speed
benchmark(f1(), f2(), f3(), f4(),
          columns=c("test", "replications", "elapsed", "relative"),
          order="relative", replications=100)
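The benchmark above sums a single column, but rowsum() also accepts a data frame, so it can produce subtotals for several numeric columns in one call. A minimal sketch, assuming a toy data frame dat with made-up values in place of the question's data:

```r
# Toy stand-in for the question's data (values are made up).
dat <- data.frame(
  yname  = c("XMAG", "XMAG", "YMAG", "YMAG", "YMAG"),
  bsent  = c(7, 2, 3, 0, 6),
  csales = c(0, 1, 0, 1, 1)
)

num_cols <- c("bsent", "csales")

# rowsum() sums every column of its first argument within each group;
# na.rm = TRUE skips missing values instead of returning NA.
subtotals <- rowsum(dat[num_cols], group = dat$yname, na.rm = TRUE)
print(subtotals)
```

The group labels end up as row names of the result rather than as a yname column.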

You can use aggregate().

For instance, say that you have

val = rnorm(50)
name = rep(letters[1:5], each=10)
data <- data.frame(val, name)

Then you can do

aggregate(data$val, by=list(data$name), FUN=sum)

Comments
  • It would be much easier if you gave us a snippet of data or more details of what you actually want. There is a degree of irony in your request for examples from us when you don't provide one yourself! Seriously though, it is highly likely that most R Gurus will not use Excel and probably haven't done so for a very very long time, so don't presume we know what you mean when you say "do it like Excel". Is yname sorted? Because then it would be an aggregation task in R (i.e. you want the sums of the numeric for the groups defined by yname).
  • @Gavin Simpson: I have updated my question in accordance with your comment.
  • here are some related questions: stackoverflow.com/search?q=%5Br%5D+%22group+by%22