Getting net values as a proportion from a dataframe in R
r frequency table by group
r calculate proportion by group
r frequency table multiple categorical variables
r create table from data frame
count in r
calculate percentage in r
proportion of variable in r
I have a dataframe in R (p2.df) that has aggregated a range of values into the following (there are many more columns this is just an abridge version):
genre rating cc dd ee Adventure FAILURE 140393 20865 358806 Adventure SUCCESS 197182 32872 492874 Fiction FAILURE 140043 14833 308602 Fiction SUCCESS 197725 28848 469879 Sci-fi FAILURE 8681 1682 24259 Sci-fi SUCCESS 7439 1647 22661
I want to get the net values of the proportions for each column, which I can get in a spreadsheet but can't in R studio.
The formula in the spreadsheet follows the pattern:
net_cc = (cc(success)/(cc(success)+dd(success)+ee(success)) - (cc(fail)/(cc(fail)+dd(fail)+ee(fail))
What I want to get out in R is this table that I can get from the spreadsheet:
genre net_cc net_dd net_ee Adventure 0.002801373059 0.005350579467 -0.008151952526 Fiction -0.01825346696 0.009417699223 0.008835767735 Sci-fi -0.01641517271 0.003297091109 0.0131180816
Any ideas how? If it's any use I created the p2.df by summarising a previous table as:
library(dplyr) p2.df<- s2.df %>% group_by(genre,rating) %>% summarise_all(sum)
library(tidyverse) df %>% gather(,,3:5) %>% spread(rating,value) %>% group_by(genre) %>% transmute(key,net = SUCCESS/sum(SUCCESS) - FAILURE/sum(FAILURE)) %>% ungroup %>% spread(key,net) # # A tibble: 3 x 4 # genre cc dd ee # <chr> <dbl> <dbl> <dbl> # 1 Adventure 0.00280 0.00535 -0.00815 # 2 Fiction -0.0183 0.00942 0.00884 # 3 Sci-fi -0.0164 0.00330 0.0131
Getting net values as a proportion from a dataframe in R (part 2 , I recently got help with calculating net proportions for a table in R, but trying to make a summary of that hasn't work. I recently got help with calculating net proportions for a table in R, but trying to make a summary of that hasn't worked and as I selected an answer I need to post a new question. Here is my ori
It's always better to work on data in long format. But if OP doesnt want to transform data in long format due to any constraint (e.g. number of columns are more which will lead to large number of rows in long format etc) then a solution in using
dplyr::summarise_at can be achieved as:
library(dplyr) df %>% mutate(rowSum = rowSums(.[,names(df)[3:5]])) %>% group_by(genre) %>% summarise_at(vars(names(df)[3:5]), funs(net = .[rating == "SUCCESS"]/rowSum[rating == "SUCCESS"] - .[rating == "FAILURE"]/rowSum[rating == "FAILURE"] )) %>% as.data.frame() # genre cc_net dd_net ee_net # 1 Adventure 0.002801373 0.005350579 -0.008151953 # 2 Fiction -0.018253467 0.009417699 0.008835768 # 3 Sci-fi -0.016415173 0.003297091 0.013118082
df <- read.table(text=" genre rating cc dd ee Adventure FAILURE 140393 20865 358806 Adventure SUCCESS 197182 32872 492874 Fiction FAILURE 140043 14833 308602 Fiction SUCCESS 197725 28848 469879 Sci-fi FAILURE 8681 1682 24259 Sci-fi SUCCESS 7439 1647 22661", header = TRUE, stringsAsFactors = FALSE)
Basic Statistical Analysis Using the R Statistical Package, R can be downloaded from the Internet site of the Comprehensive R Archive Network (CRAN) In R, these values can be represented as a column vector (as a data set, these The 'read.csv' command creates an object (dataframe) for the entire data set 1.8 Finding frequencies and proportions for categorical variables In categorical data analysis, many R techniques use the marginal totals of the table in the calculations. The marginal totals are the total counts of the cases over the categories of interest. For example, the marginal totals for behavior would be the sum over the rows of the table trial.table.
My answer is very close to @MKR answer, however, I just wish to point out, that we can make use of decoded
SUCESS = 1 and
FAILURE = -1`) variable to avoid subsetting in the last part:
df %>% mutate(rating = (rating == "SUCCESS")*2 - 1, denom = rowSums(.[3:5])) %>% group_by(genre) %>% summarise_at(vars(cc:ee), funs(sum(rating * . / denom))) # A tibble: 3 x 4 # genre cc dd ee # <chr> <dbl> <dbl> <dbl> # 1 Adventure 0.00280 0.00535 -0.00815 # 2 Fiction -0.0183 0.00942 0.00884 # 3 Sci-fi -0.0164 0.00330 0.0131
Frequencies, Learn how to use R to create frequency and contingency tables from using the table( ) function, tables of proportions using the prop.table( ) function, and marginal expected and residual values (pearson, standardized, adjusted standardized), code on the Rhelp mailing list for converting a table back into a data frame. After you have the data table with the counts, you can use R to easily calculate the proportion of each count to the total simply by dividing the table by the total counts. To calculate the proportion of manual and automatic gearboxes in the dataset cars, you can use the following code: > amtable/sum(amtable) auto […]
How can I calculate the frequency of specific words for each row in , Then I would like to add these values to my dataframe as a new variable in R. For example: Let's say, I have a list of words remove spaces between words and get frequency If you want to proportion instead of counts, you can use: for(i in To know whether seat belts made a difference in the chances of surviving, you can carry out a proportion test. This test tells how probable it is that both proportions are the same. A low p-value tells you that both proportions probably differ from each other. To test this in R, you can use the prop.test() function on the preceding matrix:
Which R package will calculate absolute & relative change?, Absolute change ( B - F) and percentage change [(B - F )/ B], where B and F are If you have a data frame where B and F corresponds to separate columns, then were single values and you'd calculate the change for these single values. my response variable and model, I get a message from R telling me 'singular fit'. So the sum of a vector of 1’s and 0’s will be the total number of 1’s! Divided by the length then gives you the proportion. As a side note, you might realise that you can use sum() instead of mean() if you want to calculate the frequency.
[R] Calculating proportions from a data frame rather than a table, I had to scrap my column that contained character values. 1 0 0 I had tried the following and was getting nowhere > as.table(coinfection) Error This splits the data.frame yourdata by unique combinations of the variables sex and age. Then, for each of those chunks (referred to as x ), it calculates the number of people who belong to that group ( n ), how many of them are married ( ever.married.n ), and what proportion of them are married ( ever.married.prop ).
- Thanks all, I selected Moody's as the answer as it was the simplest (I couldn't get utubun's neater one to work) but MKR's also worked.
- ...and then it stopped working. I think it's because I used 'summarise_at' to get the above dataframe and it doesn't like working with groups.
- that's probably because you created your data set by
data.frame()or read it by
read.csv()which by default convert strings to factors. I wrote my example using data with
character, that's a default for
readr. Please see the data @MKR used in his answer (last row -
stringsAsFactors = FALSE).
- Thanks yes you were right, the table had groupings so I added as.data.frame() that fixed it.
- it's a neat intuitive solution, but you can clean it a bit further, you could use just
rowSums(.[,3:5])on 1st line and then
- @Moody_Mudskipper Thats elegant suggestion. I had done same at first. But, the problem was that for
summarise_atit was expected to be as
2:4since one column was out for grouping. Hence, I thought it would be easier to relate if I use
3:5at both places.
- It could be something related to different versions, for me it works with
3:5and returns an error with
2:4. I'm using dplyr_0.7.5
- see: stackoverflow.com/questions/45883513/…
- Thanks for your help on this, you say it's better to work in long format which I'm happy to do as these solutions seem to be behaving temperamentally on me - when I expand them into my own summary table sometimes it works and sometimes it'll tell me objects (column names) are not found or Column
rowSummust be length 2 (the group size) or one, not 16. Perhaps I should try the long form rather than from the summary table.