How to randomize (or permute) a dataframe rowwise and columnwise?

I have a dataframe (df1) like this.

     f1   f2   f3   f4   f5
d1   1    0    1    1    1  
d2   1    0    0    1    0
d3   0    0    0    1    1
d4   0    1    0    0    1

The d1...d4 column holds the row names, and the f1...f5 row holds the column names.

If I call sample(df1), I get a new dataframe with the same count of 1s as df1. So the count of 1s is conserved for the whole dataframe, but not for each row or each column.

Is it possible to do the randomization row-wise or column-wise?

I want to randomize df1 column-wise for each column, i.e. the number of 1s in each column remains the same, and each column needs to be changed in at least one entry. For example, I may have a randomized df2 like this (note that the count of 1s in each column remains the same, but the count of 1s in each row is different):

     f1   f2   f3   f4   f5
d1   1    0    0    0    1  
d2   0    1    0    1    1
d3   1    0    0    1    1
d4   0    0    1    1    0

Likewise, I also want to randomize df1 row-wise for each row, i.e. the number of 1s in each row remains the same, and each row needs to be changed (but the number of changed entries can differ between rows). For example, a randomized df3 could look like this:

     f1   f2   f3   f4   f5
d1   0    1    1    1    1  <- two entries are different
d2   0    0    1    0    1  <- four entries are different
d3   1    0    0    0    1  <- two entries are different
d4   0    0    1    0    1  <- two entries are different

PS. Many thanks for the help from Gavin Simpson, Joris Meys and Chase for the previous answers to my previous question on randomizing two columns.

Given the R data.frame:

> df1
  a b c
1 1 1 0
2 1 0 0
3 0 1 0
4 0 0 0

Shuffle row-wise:

> df2 <- df1[sample(nrow(df1)),]
> df2
  a b c
3 0 1 0
4 0 0 0
2 1 0 0
1 1 1 0

By default, sample() randomly reorders the elements passed as its first argument, so the default size is the length of that input. Because replace=FALSE is the default, sampling is done without replacement: sample(nrow(df1)) returns a random permutation of the row indices, and indexing df1 with it accomplishes a row-wise shuffle.

Shuffle column-wise:

> df3 <- df1[,sample(ncol(df1))]
> df3
  c a b
1 0 1 1
2 0 1 0
3 0 0 1
4 0 0 0
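
Note that the indexing above reorders whole rows or whole columns. If the goal is instead to permute the entries inside each column (or each row) independently, as the question describes, a minimal base R sketch could look like the following. The names df_col and df_row are illustrative, df1 is rebuilt from the question's table, and set.seed() is only there to make the sketch reproducible. It does not enforce the "every column must change" requirement.

```r
set.seed(1)  # assumption: fixed seed, only for reproducibility of the sketch

# The question's example data, rebuilt column by column.
df1 <- data.frame(
  f1 = c(1, 1, 0, 0),
  f2 = c(0, 0, 0, 1),
  f3 = c(1, 0, 0, 0),
  f4 = c(1, 1, 1, 0),
  f5 = c(1, 0, 1, 1),
  row.names = c("d1", "d2", "d3", "d4")
)

# Column-wise: permute the entries of each column independently,
# so the number of 1s per column is conserved.
df_col <- df1
df_col[] <- lapply(df1, sample)

# Row-wise: permute the entries of each row independently,
# so the number of 1s per row is conserved.
df_row <- df1
df_row[] <- t(apply(as.matrix(df1), 1, sample))

colSums(df_col)  # same as colSums(df1)
rowSums(df_row)  # same as rowSums(df1)
```

Assigning into df_col[] and df_row[] keeps the original row and column names; only the cell values move.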

Another way to shuffle the data.frame is with the dplyr package:

library(dplyr)

df2 <- slice(df1, sample(1:n()))                # shuffle rows
df2 <- sample_frac(df1, 1L)                     # shuffle rows (sample 100% of them)
df2 <- select(df1, one_of(sample(names(df1))))  # shuffle columns

Take a look at permatswap() in the vegan package. Here is an example maintaining both row and column totals, but you can relax that and fix only one of the row or column sums.

library(vegan)
mat <- matrix(c(1,1,0,0,0,0,0,1,1,0,0,0,1,1,1,0,1,0,1,1), ncol = 5)
out <- permatswap(mat, times = 99, burnin = 20000, thin = 500, mtype = "prab")

This gives:

R> out$perm[[1]]
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    0    1    1    1
[2,]    0    1    0    1    0
[3,]    0    0    0    1    1
[4,]    1    0    0    0    1
R> out$perm[[2]]
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    1    0    1    1
[2,]    0    0    0    1    1
[3,]    1    0    0    1    0
[4,]    0    0    1    0    1

To explain the call:

out <- permatswap(mat, times = 99, burnin = 20000, thin = 500, mtype = "prab")
  1. times is the number of randomised matrices you want, here 99
  2. burnin is the number of swaps made before we start taking random samples. This allows the matrix from which we sample to be quite random before we start taking each of our randomised matrices.
  3. thin says to take a random draw only every thin swaps.
  4. mtype = "prab" says treat the matrix as presence/absence, i.e. binary 0/1 data.

A couple of things to note: this doesn't guarantee that any particular column or row has been randomised, but if burnin is long enough there should be a good chance of that having happened. Also, you could draw more random matrices than you need and discard the ones that don't match all your requirements.

Your requirement to have different numbers of changes per row isn't covered here either. Again, you could sample more matrices than you want and then discard the ones that don't meet this requirement.
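
The "draw extra and discard" step could be sketched as below. This is only an illustration: all_changed() and keep_changed() are hypothetical helper names, not part of vegan, and the candidate list is built by hand so the snippet runs without the package. With the call above, the candidates would be out$perm.

```r
# Hypothetical helper: TRUE if every row and every column of `m`
# differs from `original` in at least one cell.
all_changed <- function(m, original) {
  changed <- m != original
  all(rowSums(changed) > 0) && all(colSums(changed) > 0)
}

# Hypothetical helper: discard candidates that fail the check.
keep_changed <- function(perms, original) {
  Filter(function(m) all_changed(m, original), perms)
}

# Stand-in candidates so the sketch runs without vegan; in practice
# `perms` would be out$perm from the permatswap() call.
mat <- matrix(c(1,1,0,0,0,0,0,1,1,0,0,0,1,1,1,0,1,0,1,1), ncol = 5)
perms <- list(
  mat,                   # identical to the original: rejected
  mat[c(2, 1, 3, 4), ],  # rows 3 and 4 untouched: rejected
  mat[4:1, ]             # every row and column differs: kept
)

kept <- keep_changed(perms, mat)
length(kept)
```

The same pattern extends to other requirements (for example, a minimum number of changed entries per row) by adding conditions to all_changed().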

You can also use the randomizeMatrix function in the R package picante:


> library(picante)
> test <- matrix(c(1,1,0,1,0,1,0,0,1,0,0,1,0,1,0,0),nrow=4,ncol=4)
> test
     [,1] [,2] [,3] [,4]
[1,]    1    0    1    0
[2,]    1    1    0    1
[3,]    0    0    0    0
[4,]    1    0    1    0

> randomizeMatrix(test,null.model = "frequency",iterations = 1000)

     [,1] [,2] [,3] [,4]
[1,]    0    1    0    1
[2,]    1    0    0    0
[3,]    1    0    1    0
[4,]    1    0    1    0

> randomizeMatrix(test,null.model = "richness",iterations = 1000)

     [,1] [,2] [,3] [,4]
[1,]    1    0    0    1
[2,]    1    1    0    1
[3,]    0    0    0    0
[4,]    1    0    1    0

The option null.model="frequency" maintains column sums, and null.model="richness" maintains row sums. Although the function is mainly used for randomizing species presence/absence datasets in community ecology, it works well here.

This function has other null model options as well; see the picante documentation (page 36) for more details.

Of course you can sample each row:

sapply(1:4, function(row) df1[row, ] <<- sample(df1[row, ]))

This shuffles the entries within each row, so the number of 1s in each row doesn't change. With small changes it also works for columns, but that is an exercise for the reader :-P
