Transforming an R data frame by applying a function rowwise and creating (possibly) larger columns
I'm trying to transform a dataframe (tibble) by using each row as function arguments and create a new column out of it, which is possibly bigger than the number of arguments. Consider the following example, where I have some sample observations:
    library(dplyr)
    library(stringi)

    observations <- c("110", "11011", "1100010")

    df <- tibble(obs = observations) %>%
      transmute(
        Failure = stri_count(obs, fixed = "0"),
        Success = stri_count(obs, fixed = "1")
      )
df is then:
    # A tibble: 3 x 2
      Failure Success
        <int>   <int>
    1       1       2
    2       1       4
    3       4       3
I would like to take every row and use that for calculating a bunch of values, and save each result vector in a new column. For example I would like to do:
    p_values <- seq(from = 0, to = 1, length.out = 11)

    df %>%
      rowwise() %>%
      transmute(
        p = p_values,
        likelihood = dbinom(Success, size = Failure + Success, prob = p_values)
      )

    Error: Column `p` must be length 1 (the group size), not 11
And get something like:
    # A tibble: 11 x 4
       p_values likelihood_1 likelihood_2 likelihood_3
          <dbl>        <dbl>        <dbl>        <dbl>
     1      0            ...          ...          ...
     2      0.1          ...          ...          ...
    ...     ...          ...          ...          ...
    10      0.9          ...          ...          ...
    11      1            ...          ...          ...
I would actually switch to purrr for this. The function pmap() will iterate by row. We use ..1 and ..2 to signify the first and second inputs, respectively. Using pmap_dfc() will bind the results by columns (dfc = data frame columns).
    library(purrr)
    library(tibble)

    df %>%
      pmap_dfc(~ dbinom(..2, size = ..1 + ..2, prob = p_values)) %>%
      set_names(paste0("likelihood_", seq_along(.))) %>%
      add_column(p_values = p_values, .before = 1)
    # A tibble: 11 x 4
       p_values likelihood_1 likelihood_2 likelihood_3
          <dbl>        <dbl>        <dbl>        <dbl>
     1      0          0          0            0
     2      0.1        0.027      0.00045      0.0230
     3      0.2        0.096      0.0064       0.115
     4      0.3        0.189      0.0284       0.227
     5      0.4        0.288      0.0768       0.290
     6      0.5        0.375      0.156        0.273
     7      0.6        0.432      0.259        0.194
     8      0.7        0.441      0.360        0.0972
     9      0.8        0.384      0.410        0.0287
    10      0.9        0.243      0.328        0.00255
    11      1          0          0            0
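As a cross-check that does not depend on purrr, the same table can probably be built in base R with mapply(). This is only a sketch, not part of the answer above: the names `counts`, `lik`, and `result` are made up for illustration, and the counts are re-typed from the question's df so the block runs on its own.

```r
# Base R sketch: one likelihood column per observation (no purrr required).
p_values <- seq(from = 0, to = 1, length.out = 11)

# Same counts as `df` in the question, re-created here (assumption: copied by hand).
counts <- data.frame(Failure = c(1L, 1L, 4L), Success = c(2L, 4L, 3L))

# mapply() walks both columns in parallel. Each call returns one value per
# element of p_values, so the simplified result is an 11 x 3 matrix.
lik <- mapply(
  function(failure, success) {
    dbinom(success, size = failure + success, prob = p_values)
  },
  counts$Failure,
  counts$Success
)
colnames(lik) <- paste0("likelihood_", seq_len(ncol(lik)))

# Put the grid of probabilities in front, as in the desired output.
result <- data.frame(p_values = p_values, lik)
```

Because each column comes from one row of the input, the number of output columns follows the number of observations without any further changes.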
This sort of workflow can be somewhat awkward with a tidyverse approach, as the data is not in a 'tidy' format.
I would come at it from the other angle, starting with the p_values:
    likelihoods <- tibble(p = p_values) %>%
      mutate(
        likelihood_1 = dbinom(df[1, ]$Success, size = df[1, ]$Failure + df[1, ]$Success, prob = p),
        likelihood_2 = dbinom(df[2, ]$Success, size = df[2, ]$Failure + df[2, ]$Success, prob = p),
        likelihood_3 = dbinom(df[3, ]$Success, size = df[3, ]$Failure + df[3, ]$Success, prob = p)
      )
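The three mutate() terms differ only in the row index, so the same idea could be generalised with a loop over the rows of df. The following is a minimal sketch under that assumption, not the answerer's code; df and p_values are re-created from the question so the block is self-contained.

```r
library(tibble)

# Inputs re-created from the question (assumption: counts copied by hand).
p_values <- seq(from = 0, to = 1, length.out = 11)
df <- tibble(Failure = c(1L, 1L, 4L), Success = c(2L, 4L, 3L))

# Start from the grid of probabilities, then add one column per row of df.
likelihoods <- tibble(p = p_values)
for (i in seq_len(nrow(df))) {
  column_name <- paste0("likelihood_", i)
  likelihoods[[column_name]] <- dbinom(
    df$Success[i],
    size = df$Failure[i] + df$Success[i],
    prob = likelihoods$p
  )
}
```

This keeps the "start from p" shape of the approach while avoiding one hand-written term per observation.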
The issue is that mutate expects the number of elements to be the same as the number of rows (or, if the data is grouped, the number of rows in that group). Here we use rowwise, which is basically grouping by each row, so the expected n() is 1, whereas the output has the length of 'p_values'. One option is to wrap the output in a list, then unnest, and reshape to 'wide' format with pivot_wider (if needed):
    library(dplyr)
    library(tidyr)
    library(stringr)

    df %>%
      mutate(grp = str_c('likelihood_', row_number())) %>%
      rowwise() %>%
      transmute(
        grp,
        p = list(p_values),
        likelihood = list(dbinom(Success, size = Failure + Success, prob = p_values))
      ) %>%
      unnest(c(p, likelihood)) %>%
      pivot_wider(names_from = grp, values_from = likelihood)

    # A tibble: 11 x 4
    #       p likelihood_1 likelihood_2 likelihood_3
    #   <dbl>        <dbl>        <dbl>        <dbl>
    # 1   0          0          0            0
    # 2   0.1        0.027      0.00045      0.0230
    # 3   0.2        0.096      0.0064       0.115
    # 4   0.3        0.189      0.0284       0.227
    # 5   0.4        0.288      0.0768       0.290
    # 6   0.5        0.375      0.156        0.273
    # 7   0.6        0.432      0.259        0.194
    # 8   0.7        0.441      0.360        0.0972
    # 9   0.8        0.384      0.410        0.0287
    #10   0.9        0.243      0.328        0.00255
    #11   1          0          0            0
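Since dbinom() is vectorised over its arguments, the list columns might be avoided altogether by first pairing every row of df with every value of p and only then computing the likelihood. This is a sketch of that alternative rather than the method above: it assumes tidyr's crossing() is acceptable here, and the names `long` and `wide` are illustrative.

```r
library(dplyr)
library(tidyr)

# Inputs re-created from the question (assumption: counts copied by hand).
p_values <- seq(from = 0, to = 1, length.out = 11)
df <- tibble(Failure = c(1L, 1L, 4L), Success = c(2L, 4L, 3L))

# Long format: one row per (observation, p) pair, so mutate() sees
# equal-length vectors and no rowwise() or list() is needed.
long <- df %>%
  mutate(grp = paste0("likelihood_", row_number())) %>%
  crossing(p = p_values) %>%
  mutate(likelihood = dbinom(Success, size = Failure + Success, prob = p)) %>%
  select(grp, p, likelihood)

# Wide format: one column per observation, one row per value of p.
wide <- long %>%
  pivot_wider(names_from = grp, values_from = likelihood)
```

The long table is already tidy, so the final pivot_wider() step is only needed when the wide layout from the question is required.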