## R: rank individual column data in a large dataframe or matrix

I have a large file of patient data that I want to rank based on column values (without changing the order of the rows). For example:

    patient <- c("a", "b", "c", "d", "e", "f")
    gene1 <- c(500, 490, 500, 750, 550, 500)
    gene2 <- c(200, 470, 1000, 50, 720, 1100)
    x <- data.frame(patient, gene1, gene2)
    x
      patient gene1 gene2
    1       a   500   200
    2       b   490   470
    3       c   500  1000
    4       d   750    50
    5       e   550   720
    6       f   500  1100

I want to get something like this...

    x
      patient gene1 gene2
    1       a     2     2
    2       b     1     3
    3       c     6     5
    4       d     5     1
    5       e     4     4
    6       f     3     6

I can do this for individual columns using something similar to the below code, but I have thousands of columns worth of patient data to deal with, so this is unrealistic.

    x <- read.csv("data.csv", row.names = "Patient")
    order.scores <- order(x$gene1, x)
    x$rank <- NA
    x$rank[order.scores] <- 1:nrow(x)

Can anyone suggest a suitable function? Thanks!

Here's one way using the `dplyr` package. This will rank all columns from the 2nd to the last, assuming the first column is always `patient`.

Also, you need to pass the `ties.method = "first"` argument to `rank`, which means that ties are broken by whichever value appears first.

    library(dplyr)
    x %>% mutate_at(2:ncol(.), rank, ties.method = "first")
      patient gene1 gene2
    1       a     2     2
    2       b     1     3
    3       c     3     5
    4       d     6     1
    5       e     5     4
    6       f     4     6
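In dplyr 1.0.0 and later, `mutate_at()` is superseded by `across()`. A minimal sketch of the same ranking in that style, assuming the example data frame from the question and that `patient` is the only non-gene column (the name `ranked` is just for illustration):

```r
library(dplyr)

# Example data from the question
patient <- c("a", "b", "c", "d", "e", "f")
gene1 <- c(500, 490, 500, 750, 550, 500)
gene2 <- c(200, 470, 1000, 50, 720, 1100)
x <- data.frame(patient, gene1, gene2)

# Rank every column except `patient`; ties go to the value that appears first
ranked <- x %>%
  mutate(across(-patient, ~ rank(.x, ties.method = "first")))
```

Selecting with `-patient` rather than `2:ncol(.)` avoids relying on column positions.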

This code would allow you to loop through the columns:

    for (i in 2:length(colnames(x))) {
      x[,i] <- rank(x[,i])
    }

and yields this result:

      patient gene1 gene2
    1       a     3     2
    2       b     1     3
    3       c     3     5
    4       d     6     1
    5       e     5     4
    6       f     3     6

Or

    for (i in 2:length(colnames(x))) {
      x[,i] <- order(x[,i])
    }

yields

      patient gene1 gene2
    1       a     2     4
    2       b     1     1
    3       c     3     2
    4       d     6     5
    5       e     5     3
    6       f     4     6
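Note that `rank()` and `order()` answer different questions: `rank()` gives each element's position in the sorted sequence, while `order()` gives the indices that would sort the vector. A small base R sketch of the difference, using the `gene2` values from the question (which have no ties):

```r
gene2 <- c(200, 470, 1000, 50, 720, 1100)

# order(): indices of the elements, smallest value first
ord <- order(gene2)          # 4 1 2 5 3 6

# rank(): where each element falls in the sorted sequence
rnk <- rank(gene2)           # 2 3 5 1 4 6

# Without ties, applying order() twice reproduces the ranks
rnk2 <- order(order(gene2))  # 2 3 5 1 4 6
```

So for a per-patient ranking that keeps the rows in place, `rank()` is most likely what is wanted; the `order()` loop returns sorting indices instead.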

Try out:

    library(dplyr)
    x %>% mutate_at(vars(starts_with("gene")), rank, ties.method = "first")
    # or
    x %>% mutate_at(vars(contains("gene")), rank, ties.method = "first")
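The same column selection can be done without extra packages. A minimal base R sketch, assuming the gene columns all have names beginning with `gene` (the names `gene_cols`, `ranked`, `m` and `ranked_m` are illustrative):

```r
# Example data from the question
patient <- c("a", "b", "c", "d", "e", "f")
gene1 <- c(500, 490, 500, 750, 550, 500)
gene2 <- c(200, 470, 1000, 50, 720, 1100)
x <- data.frame(patient, gene1, gene2)

# Positions of the columns whose names start with "gene"
gene_cols <- grep("^gene", names(x))

# Rank each selected column; rows stay in their original order
ranked <- x
ranked[gene_cols] <- lapply(x[gene_cols], rank, ties.method = "first")

# For a numeric matrix, apply() over columns (MARGIN = 2) does the same job
m <- as.matrix(x[gene_cols])
ranked_m <- apply(m, 2, rank, ties.method = "first")
```

This scales to thousands of columns without naming any of them in the code, since only the pattern in `grep()` has to match.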

##### Comments

- Your ranks for `gene1` seem wrong.
- Great, thank you! In my data I had actually specified `row.names = "patient"`, so I just switched the code to `1:ncol(.)`
- In that case you can use `mutate_all()`
- Thanks, that worked! My gene list is only 33 genes, so working that into the code is no problem. It would be useful in other circumstances (such as large gene lists) to use code that ranks all genes without having to specify the gene names within the code. Any ideas?
- You can `mutate_at` variables starting with or containing `gene`