How to divide each row of a dataframe by rows of corresponding columns in a dataframe while recycling values

I have a dataframe that looks like:

Gene.names=c("ESR", "ESR.1", "ESR.2", "ESR.3", "PKB", "PKB.1", "PKB.2", "PKB.3")
mean_0.x = c(3,2,5,9,2,4,6,7)
mean_1.x = c(6,2,5,1,9,1,1,9)
mean_2.x = c(3,2,9,9,6,7,3,3)
mean_0.y = c(1,NA,NA,NA,6,NA,NA,NA)
mean_1.y = c(1,NA,NA,NA,3,NA,NA,NA)
mean_2.y = c(6,NA,NA,NA,4,NA,NA,NA)

df = cbind.data.frame(Gene.names, mean_0.x, mean_1.x, mean_2.x, mean_0.y, mean_1.y, mean_2.y)

My desired output:

Gene.names = c("ESR", "ESR.1", "ESR.2", "ESR.3", "PKB", "PKB.1", "PKB.2", "PKB.3")
mean_0_diff = c(3,2,5,9,0.33,0.66,1,1.16)
mean_1_diff = c(6,2,5,1,3,0.33,0.33,3)
mean_2_diff = c(0.5,0.33,1.5,1.5,1.5,1.75,0.75,0.75)

df_out = cbind.data.frame(Gene.names, mean_0_diff, mean_1_diff, mean_2_diff)

  1. My dataframe contains thousands of rows and >50 columns.
  2. I want to divide corresponding columns, e.g., mean_0.x/mean_0.y, mean_1.x/mean_1.y, mean_2.x/mean_2.y, and so on.
  3. I want to recycle the row values in the *.y columns: in this example, each value in mean_0.y is used 4 times on mean_0.x. In my real dataset, however, the number of times each value has to be recycled is unknown.
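Requirement 3 amounts to repeating each non-missing .y value until the next non-missing one appears, so the number of repeats is just the gap between consecutive non-NA rows. A minimal base-R sketch for a single column pair, assuming (as in the example) that the first row holds a value; the names pos, times and y_recycled are illustrative only:

```r
# mean_0.y and mean_0.x from the example above
y <- c(1, NA, NA, NA, 6, NA, NA, NA)
x <- c(3, 2, 5, 9, 2, 4, 6, 7)

pos   <- which(!is.na(y))              # rows that hold a real value: 1, 5
times <- diff(c(pos, length(y) + 1))   # how often each value is reused: 4, 4
y_recycled <- rep(y[pos], times)       # 1 1 1 1 6 6 6 6

# Assumes y[1] is not NA; otherwise y_recycled would be shorter than x.
x / y_recycled
```

Because the repeat counts are derived from the data, the same code works whether a value has to be recycled 4 times or any other number of times.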

Using the tidyverse:

library(tidyverse)
res <- cbind(df[1],
             `/`(df %>% select(ends_with('x')),
                 df %>% select(ends_with('y')) %>% 
                   fill(everything())))

#   Gene.names  mean_0.x  mean_1.x  mean_2.x
# 1        ESR 3.0000000 6.0000000 0.5000000
# 2      ESR.1 2.0000000 2.0000000 0.3333333
# 3      ESR.2 5.0000000 5.0000000 1.5000000
# 4      ESR.3 9.0000000 1.0000000 1.5000000
# 5        PKB 0.3333333 3.0000000 1.5000000
# 6      PKB.1 0.6666667 0.3333333 1.7500000
# 7      PKB.2 1.0000000 0.3333333 0.7500000
# 8      PKB.3 1.1666667 3.0000000 0.7500000

And this would be the idiomatic way:

df %>%
  fill(ends_with('y')) %>%
  gather(key, value, -Gene.names) %>%
  separate(key,c("key","xy"),sep="\\.") %>%
  spread(xy,value) %>%
  transmute(Gene.names,key, value=x /y) %>%
  spread(key,value) 

#   Gene.names    mean_0    mean_1    mean_2
# 1        ESR 3.0000000 6.0000000 0.5000000
# 2      ESR.1 2.0000000 2.0000000 0.3333333
# 3      ESR.2 5.0000000 5.0000000 1.5000000
# 4      ESR.3 9.0000000 1.0000000 1.5000000
# 5        PKB 0.3333333 3.0000000 1.5000000
# 6      PKB.1 0.6666667 0.3333333 1.7500000
# 7      PKB.2 1.0000000 0.3333333 0.7500000
# 8      PKB.3 1.1666667 3.0000000 0.7500000 
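gather() and spread() still work, but tidyr 1.0.0 superseded them with pivot_longer() and pivot_wider(). A sketch of the same reshaping with the newer verbs, assuming tidyr >= 1.0.0: the special ".value" entry in names_to keeps x and y as separate columns, so the intermediate spread() is no longer needed.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  Gene.names = c("ESR", "ESR.1", "ESR.2", "ESR.3", "PKB", "PKB.1", "PKB.2", "PKB.3"),
  mean_0.x = c(3, 2, 5, 9, 2, 4, 6, 7),
  mean_1.x = c(6, 2, 5, 1, 9, 1, 1, 9),
  mean_2.x = c(3, 2, 9, 9, 6, 7, 3, 3),
  mean_0.y = c(1, NA, NA, NA, 6, NA, NA, NA),
  mean_1.y = c(1, NA, NA, NA, 3, NA, NA, NA),
  mean_2.y = c(6, NA, NA, NA, 4, NA, NA, NA)
)

res <- df %>%
  fill(ends_with(".y")) %>%                  # carry each .y value down
  pivot_longer(-Gene.names,
               names_to = c("key", ".value"), # "mean_0.x" -> key "mean_0", column x
               names_sep = "\\.") %>%
  mutate(diff = x / y) %>%
  select(Gene.names, key, diff) %>%
  pivot_wider(names_from = key, values_from = diff,
              names_glue = "{key}_diff")     # mean_0_diff, mean_1_diff, ...
res
```

The split relies on each column name containing exactly one dot (the .x/.y suffix); names with additional dots would need a different names_sep or names_pattern.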

The following uses the function na.locf from the zoo package to carry the last non-missing value of each .y column forward.

inx.x <- grep("x$", names(df))
inx.y <- grep("y$", names(df))

df[inx.y] <- lapply(df[inx.y], zoo::na.locf)

df_out2 <- df[1]
df_out2 <- cbind(df_out2, df[inx.x]/df[inx.y])

nms <- sub("\\.x$", "", names(df[inx.x]))
names(df_out2)[-1] <- paste(nms, "diff", sep = "_")

df_out2
#  Gene.names mean_0_diff mean_1_diff mean_2_diff
#1        ESR   3.0000000   6.0000000   0.5000000
#2      ESR.1   2.0000000   2.0000000   0.3333333
#3      ESR.2   5.0000000   5.0000000   1.5000000
#4      ESR.3   9.0000000   1.0000000   1.5000000
#5        PKB   0.3333333   3.0000000   1.5000000
#6      PKB.1   0.6666667   0.3333333   1.7500000
#7      PKB.2   1.0000000   0.3333333   0.7500000
#8      PKB.3   1.1666667   3.0000000   0.7500000

Note that the results are not exactly equal, because the values in your expected output are approximated to two decimal places:

all.equal(df_out, df_out2)
#[1] "Component “mean_0_diff”: Mean relative difference: 0.007751938"
#[2] "Component “mean_1_diff”: Mean relative difference: 0.01010101" 
#[3] "Component “mean_2_diff”: Mean relative difference: 0.01010101"
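If adding the zoo dependency is undesirable, the fill step can be written in base R. This is a sketch under stated assumptions, not part of the answer above: fill_down() and divide_pairs() are hypothetical helpers, and the columns are paired by name (mean_0.x with mean_0.y) rather than by position, which guards against the .x and .y columns appearing in different orders among 50+ columns.

```r
# Hypothetical helper: replace each NA with the most recent non-NA value above it.
# Leading NAs (before the first value) stay NA.
fill_down <- function(v) {
  idx <- cumsum(!is.na(v))   # index of the latest non-NA value seen so far
  idx[idx == 0] <- NA        # rows before the first value have nothing to copy
  v[!is.na(v)][idx]
}

# Hypothetical helper: divide every *.x column by its *.y partner, matched by name.
divide_pairs <- function(df) {
  x_cols <- grep("\\.x$", names(df), value = TRUE)
  y_cols <- sub("\\.x$", ".y", x_cols)
  stopifnot(all(y_cols %in% names(df)))   # every .x column needs a matching .y
  out <- df[1]                            # keep Gene.names
  for (i in seq_along(x_cols)) {
    new_name <- sub("\\.x$", "_diff", x_cols[i])
    out[[new_name]] <- df[[x_cols[i]]] / fill_down(df[[y_cols[i]]])
  }
  out
}

df <- data.frame(
  Gene.names = c("ESR", "ESR.1", "ESR.2", "ESR.3", "PKB", "PKB.1", "PKB.2", "PKB.3"),
  mean_0.x = c(3, 2, 5, 9, 2, 4, 6, 7),
  mean_1.x = c(6, 2, 5, 1, 9, 1, 1, 9),
  mean_2.x = c(3, 2, 9, 9, 6, 7, 3, 3),
  mean_0.y = c(1, NA, NA, NA, 6, NA, NA, NA),
  mean_1.y = c(1, NA, NA, NA, 3, NA, NA, NA),
  mean_2.y = c(6, NA, NA, NA, 4, NA, NA, NA)
)

df_out3 <- divide_pairs(df)
df_out3
```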

Another option is to work on the data directly in wide format. A dplyr-based solution using mutate_at can be written as:

library(dplyr)

# Group the data on the base name of `Gene.names` first.
df %>% group_by(Gene = gsub("(^\\w+)\\..*","\\1", Gene.names)) %>%
  # For each column ending with .x, divide by the corresponding column ending with .y
  mutate_at(vars(ends_with(".x")), 
            funs(diff = ./get(sub("\\.x",".y",quo_name(quo(.))))[1] )) %>%
  ungroup() %>%
  select(Gene.names, ends_with("diff"))


# # A tibble: 8 x 4
# Gene.names   mean_0.x_diff mean_1.x_diff mean_2.x_diff
# <fctr>             <dbl>         <dbl>         <dbl>
# 1 ESR                3.00          6.00          0.500
# 2 ESR.1              2.00          2.00          0.333
# 3 ESR.2              5.00          5.00          1.50 
# 4 ESR.3              9.00          1.00          1.50 
# 5 PKB                0.333         3.00          1.50 
# 6 PKB.1              0.667         0.333         1.75 
# 7 PKB.2              1.00          0.333         0.750
# 8 PKB.3              1.17          3.00          0.750
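funs() has been deprecated since dplyr 0.8.0. A sketch of the same group-based idea with across(), assuming dplyr >= 1.0.0: within each gene group, every .y column is replaced by its first value, and the division is then done by matching column names. This is a swapped-in formulation rather than a line-by-line port of the code above; like the original, it assumes the first row of each group holds the .y values.

```r
library(dplyr)

df <- data.frame(
  Gene.names = c("ESR", "ESR.1", "ESR.2", "ESR.3", "PKB", "PKB.1", "PKB.2", "PKB.3"),
  mean_0.x = c(3, 2, 5, 9, 2, 4, 6, 7),
  mean_1.x = c(6, 2, 5, 1, 9, 1, 1, 9),
  mean_2.x = c(3, 2, 9, 9, 6, 7, 3, 3),
  mean_0.y = c(1, NA, NA, NA, 6, NA, NA, NA),
  mean_1.y = c(1, NA, NA, NA, 3, NA, NA, NA),
  mean_2.y = c(6, NA, NA, NA, 4, NA, NA, NA)
)

res <- df %>%
  group_by(Gene = sub("\\..*$", "", Gene.names)) %>%  # "ESR.1" -> "ESR"
  mutate(across(ends_with(".y"), first)) %>%          # recycle each group's first .y value
  ungroup() %>%
  as.data.frame()

# Pair columns by name, then divide all pairs at once.
x_cols <- grep("\\.x$", names(res), value = TRUE)
y_cols <- sub("\\.x$", ".y", x_cols)
df_out4 <- cbind(res["Gene.names"],
                 setNames(res[x_cols] / res[y_cols],
                          sub("\\.x$", "_diff", x_cols)))
df_out4
```

Unlike the fill-based answers, grouping does not depend on the rows being sorted by Gene.names, but it does depend on the first row of each group carrying the .y values.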

Comments
  • I removed the .x, .y with str_replace
  • The solution will work nicely as long as data is sorted on Gene.names.
  • @MKR Yes, I noticed that. I believe that the OP posted data similar to what needs to be processed.
  • Thanks. With the funs operation I am getting Error in (function (classes, fdef, mtable) : unable to find an inherited method for function ‘funs’ for signature ‘"numeric"’. The solution from @Moody_Mudskipper allows me to avoid that
  • @ip2018 Are you getting that error with the data frame shared in the question? If yes, then please restart RStudio and try again. If you are getting the error with another data.frame, then please share the output of dput(head(df)) so that I can investigate further.