## Multiply each row of one dataframe by all rows of a second dataframe

Am struggling with operation as my datasets are very large and i have provided an example of what i want.

I have two dataframes.

df1 - contains sampling-derived iterations for each parameter of a variable defined as the column name (10,000 rows)

df2 - contains the actual value of each of the variable defined as the column name (4,000 rows)

I want a df3 which is effectively the multiplication of each row of df2 by df1 and would therefore be 4000*10000 rows

As a short example i have provided a minimal example of df1 and df2. I have provided the output that i would be looking at shown in df3.

df1 <- structure(list(intercept = c(3.4, 3.6, 3.7), age = c(0.08, 0.05, 0.06), male = c(0.07, 0.06, 0.07)), class = "data.frame", row.names = c(NA, -3L)) df2 <- structure(list(id = structure(1:2, .Label = c("a", "b"), class = "factor"), intercept = c(1L, 1L), age = c(40L, 45L), male = 1:0), class = "data.frame", row.names = c(NA, -2L)) df3 <- structure(list(id = structure(c(1L, 1L, 1L, 2L, 2L, 2L), .Label = c("a", "b"), class = "factor"), intercept = c(3.4, 3.6, 3.7, 3.4, 3.6, 3.7), age = c(3.2, 2, 2.4, 3.6, 2.25, 2.7), male = c(0.07, 0.06, 0.07, 0, 0, 0)), class = "data.frame", row.names = c(NA, -6L))

Can somebody point me to an efficient way to do this in R?

Another idea via base R using `outer`

,

data.frame(id = rep(df2$id, each = nrow(df1)), mapply(function(x, y)c(outer(x, y, `*`)), df1, df2[-1]) )

which gives,

id intercept age male 1 a 3.4 3.20 0.07 2 a 3.6 2.00 0.06 3 a 3.7 2.40 0.07 4 b 3.4 3.60 0.00 5 b 3.6 2.25 0.00 6 b 3.7 2.70 0.00

**pandas.DataFrame.multiply — pandas 1.1.1 documentation,** Parameters. otherscalar, sequence, Series, or DataFrame. Any single or multiple element data structure, or list-like object. axis{0 or 'index', 1 or 'columns'}. DataFrame.multiply (other, axis = 'columns', level = None, fill_value = None) [source] ¶ Get Multiplication of dataframe and other, element-wise (binary operator mul ). Equivalent to dataframe * other , but with support to substitute a fill_value for missing data in one of the inputs.

You can perform row-wise Kronecker product (from package `MGLM`

) like below

out <- data.frame(id = rep(df2$id,each=nrow(df1)), t(MGLM::kr(t(df2[-1]),t(df1))))

such that

> out id intercept age male 1 a 3.4 3.20 0.07 2 a 3.6 2.00 0.06 3 a 3.7 2.40 0.07 4 b 3.4 3.60 0.00 5 b 3.6 2.25 0.00 6 b 3.7 2.70 0.00

**Benchmarking** (so far the approach by @Sotos is the winner)

df1 <- do.call(rbind,replicate(500,structure(list(intercept = c(3.4, 3.6, 3.7), age = c(0.08, 0.05, 0.06), male = c(0.07, 0.06, 0.07)), class = "data.frame", row.names = c(NA, -3L)),simplify = F)) df2 <- do.call(rbind,replicate(100,structure(list(id = structure(1:2, .Label = c("a", "b"), class = "factor"), intercept = c(1L, 1L), age = c(40L, 45L), male = 1:0), class = "data.frame", row.names = c(NA, -2L)),simplify = F)) library(MGLM) library(purrr) f_ThomasIsCoding <- function() { data.frame(id = rep(df2$id,each=nrow(df1)), t(MGLM::kr(t(df2[-1]),t(df1)))) } f_tmfmnk_1 <- function() { map_dfr(.x = asplit(df2[-1], 1), ~ sweep(df1, 2, FUN = `*`, .x)) } f_tmfmnk_2 <- function() { data.frame(do.call(rbind, lapply(asplit(df2[-1], 1), function(x) sweep(df1, 2, FUN = `*`, x))), id = rep(df2$id, each = nrow(df1))) } f_RonakShah <- function() { new1 <- df1[rep(seq(nrow(df1)), nrow(df2)), ] new2 <- df2[rep(seq(nrow(df2)), each = nrow(df1)),] out <- cbind(new2[1], new1 * new2[-1]) rownames(out) <- NULL out } f_Sotos <- function() { data.frame(id = rep(df2$id, each = nrow(df1)), mapply(function(x, y)c(outer(x, y, `*`)), df1, df2[-1]) ) } bmk <- microbenchmark(times = 20, unit = "relative", f_ThomasIsCoding(), f_tmfmnk_1(), f_tmfmnk_2(), f_RonakShah(), f_Sotos())

which gives

> bmk Unit: relative expr min lq mean median uq max neval f_ThomasIsCoding() 1.186124 1.218201 1.197346 1.321731 1.042721 1.077854 20 f_tmfmnk_1() 7.594520 7.572723 4.539698 7.297610 2.437621 3.446436 20 f_tmfmnk_2() 9.670286 12.212220 6.583183 11.888061 3.370593 4.088534 20 f_RonakShah() 28.918724 28.861437 16.707258 27.889563 8.403161 11.668252 20 f_Sotos() 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 20

**pandas.DataFrame.dot — pandas 1.1.1 documentation,** In addition, the column names of DataFrame and the index of other must contain the same values, as they will be aligned prior to the multiplication. The dot method� Apply a function to each row or column in Dataframe using pandas.apply() Apply a function to single or selected columns or rows in Pandas Dataframe; Ways to apply an if condition in Pandas DataFrame; Ways to apply an if condition in Pandas DataFrame; Apply uppercase to a column in Pandas dataframe; Highlight Pandas DataFrame's specific columns

You could repeat rows in both the dataframes based on number of rows in other dataframe and multiply them directly

df1[rep(seq(nrow(df1)), nrow(df2)),] * df2[rep(seq(nrow(df2)), each = nrow(df1)),-1] # intercept age male #1 3.4 3.20 0.07 #2 3.6 2.00 0.06 #3 3.7 2.40 0.07 #1.1 3.4 3.60 0.00 #2.1 3.6 2.25 0.00 #3.1 3.7 2.70 0.00

To also get `id`

column

new1 <- df1[rep(seq(nrow(df1)), nrow(df2)), ] new2 <- df2[rep(seq(nrow(df2)), each = nrow(df1)),] out <- cbind(new2[1], new1 * new2[-1]) rownames(out) <- NULL out # id intercept age male #1 a 3.4 3.20 0.07 #2 a 3.6 2.00 0.06 #3 a 3.7 2.40 0.07 #4 b 3.4 3.60 0.00 #5 b 3.6 2.25 0.00 #6 b 3.7 2.70 0.00

**Python,** Pandas is one of those packages and makes importing and analyzing data much easier. Pandas dataframe.mul() function return multiplication of dataframe and Example #2: Use mul() function to find the multiplication of two datframes. Notice, all the missing value cells has been filled with 100 before� Now, to apply this lambda function to each row in dataframe, pass the lambda function as first argument and also pass axis=1 as second argument in Dataframe.apply () with above created dataframe object i.e. # Apply a lambda function to each row by adding 5 to each value in each column modDfObj = dfObj.apply(lambda x: x + 5, axis=1)

One option involving `purrr`

could be:

map_dfr(.x = asplit(df2[-1], 1), ~ sweep(df1, 2, FUN = `*`, .x)) intercept age male 1 3.4 3.20 0.07 2 3.6 2.00 0.06 3 3.7 2.40 0.07 4 3.4 3.60 0.00 5 3.6 2.25 0.00 6 3.7 2.70 0.00

If also the id column is important:

data.frame(map_dfr(.x = asplit(df2[-1], 1), ~ sweep(df1, 2, FUN = `*`, .x)), id = rep(df2$id, each = nrow(df1))) intercept age male id 1 3.4 3.20 0.07 a 2 3.6 2.00 0.06 a 3 3.7 2.40 0.07 a 4 3.4 3.60 0.00 b 5 3.6 2.25 0.00 b 6 3.7 2.70 0.00 b

The same with `base R`

:

do.call(rbind, lapply(asplit(df2[-1], 1), function(x) sweep(df1, 2, FUN = `*`, x)))

Or:

data.frame(do.call(rbind, lapply(asplit(df2[-1], 1), function(x) sweep(df1, 2, FUN = `*`, x))), id = rep(df2$id, each = nrow(df1)))

**How to multiply or divide a column by a fixed number in a Pandas ,** You can can do that either by just multiplying or dividing the columns by a number Go to my Profile and you can find all about Dental and Gum Disease material there. for what you want I would probably use the second method I described here. to merge multiple rows into 1 row separated by "|" in pandas, DataFrame? It yields an iterator which can can be used to iterate over all the rows of a dataframe in tuples. For each row it returns a tuple containing the index label and row contents as series. Let’s iterate over all the rows of above created dataframe using iterrows() i.e.

**How to multiply a column in a pandas DataFrame by a scalar in Python,** Multiplying a column in a pandas DataFrame by a scalar will make each data entry in the column equal to the product of it and the scalar. Select a column of DataFrame df using syntax df["column_name"] and set it equal to n col 0 1 1 2 2 3. Let’s convert our matrices to data frames using the function data.frame. c1 = data.frame(c) x1 = data.frame(x) Now let’s look at our data. Note that there is an extra column of numbers from 1 to 3 for both c1 and x1. This is just a feature of the data frame output in R, where it is counting the rows 1 through 3.

You may use the following syntax to sum each column and row in Pandas DataFrame: (1) Sum each column: df.sum(axis=0) (2) Sum each row: df.sum(axis=1) In the next section, you’ll see how to apply the above syntax using a simple example. Steps to Sum each Column and Row in Pandas DataFrame Step 1: Prepare your Data

To call a function for each row in an R data frame, we shall use R apply function. apply ( data_frame , 1 , function , arguments_to_function_if_any ) The second argument 1 represents rows, if it is 2 then the function would apply on columns.