## Dividing columns by colSums in R

I am trying to scale the values in a matrix so that each column adds up to one. I have tried:

    m = matrix(c(1:9),nrow=3, ncol=3, byrow=T)
         [,1] [,2] [,3]
    [1,]    1    2    3
    [2,]    4    5    6
    [3,]    7    8    9

    colSums(m)
    [1] 12 15 18

    m = m/colSums(m)
               [,1]      [,2] [,3]
    [1,] 0.08333333 0.1666667 0.25
    [2,] 0.26666667 0.3333333 0.40
    [3,] 0.38888889 0.4444444 0.50

    colSums(m)
    [1] 0.7388889 0.9444444 1.1500000

So obviously this doesn't work. I then tried this:

    m = m/matrix(rep(colSums(m),3), nrow=3, ncol=3, byrow=T)
               [,1]      [,2]      [,3]
    [1,] 0.08333333 0.1333333 0.1666667
    [2,] 0.33333333 0.3333333 0.3333333
    [3,] 0.58333333 0.5333333 0.5000000

    colSums(m)
    [1] 1 1 1

So this works, but it feels like I'm missing something here. This can't be how it is routinely done. I'm certain I am being stupid here. Any help you can give would be appreciated.

Cheers, Davy

See `?sweep`, e.g.:

    > sweep(m,2,colSums(m),`/`)
               [,1]      [,2]      [,3]
    [1,] 0.08333333 0.1333333 0.1666667
    [2,] 0.33333333 0.3333333 0.3333333
    [3,] 0.58333333 0.5333333 0.5000000

Or you can transpose the matrix, so that `colSums(m)` gets recycled correctly. Don't forget to transpose again afterwards, like this:

    > t(t(m)/colSums(m))
               [,1]      [,2]      [,3]
    [1,] 0.08333333 0.1333333 0.1666667
    [2,] 0.33333333 0.3333333 0.3333333
    [3,] 0.58333333 0.5333333 0.5000000
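To see why the transpose is needed, it may help to look at how R recycles a vector against a matrix: it runs down the columns, so in `m / colSums(m)` every element of row `i` gets divided by the `i`-th column sum. The following is a minimal sketch using the 3x3 matrix from the question; the names `cs`, `by_row` and `scaled` are only for illustration.

```r
m  <- matrix(1:9, nrow = 3, byrow = TRUE)
cs <- colSums(m)                 # 12 15 18

# Recycling runs down the columns, so row i ends up divided by cs[i]
by_row <- m / cs
stopifnot(isTRUE(all.equal(by_row[2, ], m[2, ] / cs[2])))

# After transposing, the original columns are rows, so the recycling lines up;
# the outer t() restores the original orientation
scaled <- t(t(m) / cs)
colSums(scaled)                  # each column sums to 1 (up to floating point)
```

For a square matrix the wrong version runs without any warning, which is what makes this mistake easy to miss.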

Or you can use the function `prop.table()` to do basically the same:

    > prop.table(m,2)
               [,1]      [,2]      [,3]
    [1,] 0.08333333 0.1333333 0.1666667
    [2,] 0.33333333 0.3333333 0.3333333
    [3,] 0.58333333 0.5333333 0.5000000

The time differences are rather small. The `sweep()` function and the `t()` trick are the most flexible solutions; `prop.table()` is only for this particular case.
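As a rough illustration of that flexibility, the sketch below uses `sweep()` with a different margin and a different operator, next to `prop.table()` for the column-proportion case. The variable names are made up for the example.

```r
m <- matrix(1:9, nrow = 3, byrow = TRUE)

# Same idea along the other margin: each row sums to one
row_scaled <- sweep(m, 1, rowSums(m), `/`)

# Different operator: subtract the column means to centre each column
centered <- sweep(m, 2, colMeans(m), `-`)

# prop.table() covers the proportion case only (margin 2 = columns)
col_scaled <- prop.table(m, 2)

rowSums(row_scaled)
colMeans(centered)
colSums(col_scaled)
```

Any binary function can be passed as the fourth argument of `sweep()`, which is why it generalises beyond proportions.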

Per usual, Joris has a great answer. Two others that came to mind:

    #Essentially your answer
    f1 <- function() m / rep(colSums(m), each = nrow(m))
    #Two calls to transpose
    f2 <- function() t(t(m) / colSums(m))
    #Joris
    f3 <- function() sweep(m,2,colSums(m),`/`)

Joris' answer is the fastest on my machine:

    > m <- matrix(rnorm(1e7), ncol = 10000)
    > library(rbenchmark)
    > benchmark(f1,f2,f3, replications=1e5, order = "relative")
      test replications elapsed relative user.self sys.self user.child sys.child
    3   f3       100000   0.386   1.0000     0.385    0.001          0         0
    1   f1       100000   0.421   1.0907     0.382    0.002          0         0
    2   f2       100000   0.465   1.2047     0.386    0.003          0         0
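One caveat worth checking before reading much into these numbers: `benchmark(f1,f2,f3, ...)` is given the function names rather than calls, so each replication only evaluates the symbol and never runs the division. With rbenchmark the expressions would need to be `f1()`, `f2()` and `f3()`. Below is a minimal sketch that times the actual calls with base R's `system.time()`. The smaller matrix, the repetition count and the helper `time_it` are assumptions made so the example runs quickly; they are not part of the original benchmark, and the resulting ranking may differ by machine.

```r
set.seed(1)
m <- matrix(rnorm(1e6), ncol = 1000)   # smaller than the 1e7-element matrix above

f1 <- function() m / rep(colSums(m), each = nrow(m))
f2 <- function() t(t(m) / colSums(m))
f3 <- function() sweep(m, 2, colSums(m), `/`)

# All three approaches should produce the same matrix
stopifnot(isTRUE(all.equal(f1(), f2())), isTRUE(all.equal(f1(), f3())))

# Hypothetical helper: elapsed seconds for `reps` real calls of f
time_it <- function(f, reps = 20) {
  system.time(for (i in seq_len(reps)) f())[["elapsed"]]
}

timings <- c(f1 = time_it(f1), f2 = time_it(f2), f3 = time_it(f3))
sort(timings)
```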

Certainly late, but I just used `adorn_percentages(table.with.value, denominator = "col")` from the janitor package.

More information at the following link: https://rdrr.io/cran/janitor/man/adorn_percentages.html

