Getting net values as a proportion from a dataframe in R

I have a dataframe in R (p2.df) that has aggregated a range of values into the following (there are many more columns; this is just an abridged version):

genre       rating  cc      dd      ee
Adventure   FAILURE 140393  20865   358806
Adventure   SUCCESS 197182  32872   492874
Fiction     FAILURE 140043  14833   308602
Fiction     SUCCESS 197725  28848   469879
Sci-fi      FAILURE 8681    1682    24259
Sci-fi      SUCCESS 7439    1647    22661

I want to get the net values of the proportions for each column, which I can compute in a spreadsheet but haven't managed to in RStudio.

The formula in the spreadsheet follows the pattern:

net_cc = cc(SUCCESS) / (cc(SUCCESS) + dd(SUCCESS) + ee(SUCCESS)) - cc(FAILURE) / (cc(FAILURE) + dd(FAILURE) + ee(FAILURE))

What I want to get out in R is this table that I can get from the spreadsheet:

genre       net_cc          net_dd          net_ee
Adventure   0.002801373059  0.005350579467  -0.008151952526
Fiction     -0.01825346696  0.009417699223  0.008835767735
Sci-fi      -0.01641517271  0.003297091109  0.0131180816
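For comparison, the spreadsheet formula can be translated fairly directly into base R, with no extra packages. This is a sketch assuming p2.df is exactly the six-row summary shown above (rebuilt here with read.table so the snippet runs on its own):

```r
p2.df <- read.table(text = "
genre       rating  cc      dd      ee
Adventure   FAILURE 140393  20865   358806
Adventure   SUCCESS 197182  32872   492874
Fiction     FAILURE 140043  14833   308602
Fiction     SUCCESS 197725  28848   469879
Sci-fi      FAILURE 8681    1682    24259
Sci-fi      SUCCESS 7439    1647    22661",
header = TRUE, stringsAsFactors = FALSE)

vals   <- c("cc", "dd", "ee")
# turn each row into proportions of its own row total
shares <- p2.df[vals] / rowSums(p2.df[vals])
# FAILURE and SUCCESS alternate within each genre, so after subsetting
# the two halves line up by genre
s <- shares[p2.df$rating == "SUCCESS", ]
f <- shares[p2.df$rating == "FAILURE", ]
net <- data.frame(genre = unique(p2.df$genre),
                  setNames(s - f, paste0("net_", vals)))
net
```

This mirrors the spreadsheet cell-by-cell logic, at the cost of relying on the row ordering; the dplyr answers below are more robust to reordering.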

Any ideas how? If it's any use I created the p2.df by summarising a previous table as:

library(dplyr)

p2.df<- s2.df %>% group_by(genre,rating) %>% summarise_all(sum)  

using tidyverse:

library(tidyverse)
df %>%
  gather(key, value, 3:5) %>%
  spread(rating, value) %>%
  group_by(genre) %>%
  transmute(key, net = SUCCESS / sum(SUCCESS) - FAILURE / sum(FAILURE)) %>%
  ungroup() %>%
  spread(key, net)

# # A tibble: 3 x 4
# genre           cc      dd       ee
#   <chr>        <dbl>   <dbl>    <dbl>
# 1 Adventure  0.00280 0.00535 -0.00815
# 2 Fiction   -0.0183  0.00942  0.00884
# 3 Sci-fi    -0.0164  0.00330  0.0131 
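On tidyr 1.0 or later, gather() and spread() have been superseded by pivot_longer() and pivot_wider(); a sketch of the same pipeline in the newer idiom, rebuilding df inline so it runs standalone:

```r
library(tidyverse)

df <- read.table(text = "
genre       rating  cc      dd      ee
Adventure   FAILURE 140393  20865   358806
Adventure   SUCCESS 197182  32872   492874
Fiction     FAILURE 140043  14833   308602
Fiction     SUCCESS 197725  28848   469879
Sci-fi      FAILURE 8681    1682    24259
Sci-fi      SUCCESS 7439    1647    22661",
header = TRUE, stringsAsFactors = FALSE)

res <- df %>%
  pivot_longer(cc:ee, names_to = "key") %>%             # long form: one row per genre/rating/column
  pivot_wider(names_from = rating, values_from = value) %>%  # FAILURE and SUCCESS side by side
  group_by(genre) %>%
  transmute(key, net = SUCCESS / sum(SUCCESS) - FAILURE / sum(FAILURE)) %>%
  ungroup() %>%
  pivot_wider(names_from = key, values_from = net)      # back to one row per genre
res
```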


It's always better to work on data in long format. But if the OP doesn't want to transform the data to long format due to some constraint (e.g. a large number of columns, which would produce a very large number of rows in long format), then a solution using dplyr::summarise_at can be achieved as:

library(dplyr)

df %>% mutate(rowSum = rowSums(.[,names(df)[3:5]])) %>%
  group_by(genre) %>%
  summarise_at(vars(names(df)[3:5]),
              funs(net = .[rating == "SUCCESS"]/rowSum[rating == "SUCCESS"] - 
                         .[rating == "FAILURE"]/rowSum[rating == "FAILURE"] )) %>%
  as.data.frame()

#       genre       cc_net      dd_net       ee_net
# 1 Adventure  0.002801373 0.005350579 -0.008151953
# 2   Fiction -0.018253467 0.009417699  0.008835768
# 3    Sci-fi -0.016415173 0.003297091  0.013118082

Data:

df <- read.table(text="
genre       rating  cc      dd      ee
Adventure   FAILURE 140393  20865   358806
Adventure   SUCCESS 197182  32872   492874
Fiction     FAILURE 140043  14833   308602
Fiction     SUCCESS 197725  28848   469879
Sci-fi      FAILURE 8681    1682    24259
Sci-fi      SUCCESS 7439    1647    22661",
header = TRUE, stringsAsFactors = FALSE)


My answer is very close to @MKR's answer; however, I just wish to point out that we can make use of a recoded rating variable (SUCCESS = 1 and FAILURE = -1) to avoid the subsetting in the last part:

df %>% 
  mutate(rating = (rating == "SUCCESS")*2 - 1, denom = rowSums(.[3:5])) %>%
  group_by(genre) %>%
  summarise_at(vars(cc:ee), funs(sum(rating * . / denom)))

   #   A tibble: 3 x 4
   #   genre           cc      dd       ee
   #   <chr>        <dbl>   <dbl>    <dbl>
   # 1 Adventure  0.00280 0.00535 -0.00815
   # 2 Fiction   -0.0183  0.00942  0.00884
   # 3 Sci-fi    -0.0164  0.00330  0.0131 
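The same ±1 idea also works in base R with aggregate(), which can serve as a quick cross-check without dplyr (df repeated here so the snippet runs on its own):

```r
df <- read.table(text = "
genre       rating  cc      dd      ee
Adventure   FAILURE 140393  20865   358806
Adventure   SUCCESS 197182  32872   492874
Fiction     FAILURE 140043  14833   308602
Fiction     SUCCESS 197725  28848   469879
Sci-fi      FAILURE 8681    1682    24259
Sci-fi      SUCCESS 7439    1647    22661",
header = TRUE, stringsAsFactors = FALSE)

sgn <- ifelse(df$rating == "SUCCESS", 1, -1)   # +1 for SUCCESS rows, -1 for FAILURE rows
# signed row-wise shares, then summed within each genre
net <- aggregate(sgn * df[c("cc", "dd", "ee")] / rowSums(df[c("cc", "dd", "ee")]),
                 by = list(genre = df$genre), FUN = sum)
net
```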


Comments
  • Thanks all, I selected Moody's as the answer as it was the simplest (I couldn't get utubun's neater one to work) but MKR's also worked.
  • ...and then it stopped working. I think it's because I used 'summarise_at' to get the above dataframe and it doesn't like working with groups.
  • that's probably because you created your data set with data.frame() or read it with read.csv(), which by default convert strings to factors. I wrote my example using data with rating and genre converted to character; that's the default for tibble and read_csv from readr. Please see the data @MKR used in his answer (last row: stringsAsFactors = FALSE).
  • Thanks, yes you were right: the table had groupings, so I added as.data.frame() and that fixed it.
  • it's a neat, intuitive solution, but you can clean it up a bit further: you could use just rowSums(.[,3:5]) on the 1st line and then summarise_at(3:5,...
  • @Moody_Mudskipper That's an elegant suggestion. I had done the same at first, but the problem was that summarise_at expected 2:4, since one column was taken out for grouping. Hence I thought it would be easier to relate if I used 3:5 in both places.
  • It could be something related to different versions; for me it works with 3:5 and returns an error with 2:4. I'm using dplyr_0.7.5
  • see: stackoverflow.com/questions/45883513/…
  • Thanks for your help on this. You say it's better to work in long format, which I'm happy to do, as these solutions seem to be behaving temperamentally for me: when I expand them into my own summary table, sometimes it works and sometimes it tells me objects (column names) are not found, or "Column rowSum must be length 2 (the group size) or one, not 16". Perhaps I should try the long form rather than working from the summary table.