## Numpy shuffle multidimensional array by row only, keep column order unchanged

How can I shuffle a multidimensional array by row only in Python (so that the columns are not shuffled)?

I am looking for the most efficient solution, because my matrix is very large. Is it also possible to do this efficiently on the original array, in place (to save memory)?

Example:

```python
import numpy as np
X = np.random.random((6, 2))
print(X)
Y = ...  # ??? shuffle by rows only, not columns ???
print(Y)
```

For example, given the original matrix:

```
[[ 0.48252164  0.12013048]
 [ 0.77254355  0.74382174]
 [ 0.45174186  0.8782033 ]
 [ 0.75623083  0.71763107]
 [ 0.26809253  0.75144034]
 [ 0.23442518  0.39031414]]
```

The output should have the rows shuffled but not the columns, e.g.:

```
[[ 0.45174186  0.8782033 ]
 [ 0.48252164  0.12013048]
 [ 0.77254355  0.74382174]
 [ 0.75623083  0.71763107]
 [ 0.23442518  0.39031414]
 [ 0.26809253  0.75144034]]
```

That's what `numpy.random.shuffle()` is for: it reorders the rows (the first axis) in place and leaves the contents of each row unchanged:

```python
>>> X = np.random.random((6, 2))
>>> X
array([[ 0.9818058 ,  0.67513579],
       [ 0.82312674,  0.82768118],
       [ 0.29468324,  0.59305925],
       [ 0.25731731,  0.16676408],
       [ 0.27402974,  0.55215778],
       [ 0.44323485,  0.78779887]])

>>> np.random.shuffle(X)
>>> X
array([[ 0.9818058 ,  0.67513579],
       [ 0.44323485,  0.78779887],
       [ 0.82312674,  0.82768118],
       [ 0.29468324,  0.59305925],
       [ 0.25731731,  0.16676408],
       [ 0.27402974,  0.55215778]])
```
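Because `np.random.shuffle` works in place, it returns `None`; writing `Y = np.random.shuffle(X)` leaves `Y` as `None`. The sketch below shows that, plus a seeded variant and a variant that does not modify `X`. It assumes a reasonably recent NumPy (1.18 or later, to our knowledge) for the `Generator` methods `shuffle(..., axis=0)` and `permutation(..., axis=0)`; the demo matrix and seed are arbitrary.

```python
import numpy as np

X = np.arange(12, dtype=float).reshape(6, 2)  # small demo matrix with distinct rows
original = X.copy()

# Legacy API: shuffles the rows in place and returns None.
result = np.random.shuffle(X)
print(result)  # None, so do not write Y = np.random.shuffle(X)

# Generator API with a seed, for reproducible shuffles.
rng = np.random.default_rng(seed=42)
rng.shuffle(X, axis=0)  # in place, along the row axis

# Not in place: returns a row-shuffled copy and leaves X untouched.
Y = rng.permutation(X, axis=0)

# Every row survives intact; only the order of the rows changes.
print(sorted(map(tuple, Y)) == sorted(map(tuple, original)))  # True
```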

You can also use `np.random.permutation` to generate a random permutation of the row indices and then index into the rows of `X` using `np.take` with `axis=0`. `np.take` also lets you write the result back into the input array `X` itself with the `out=` option, which would save us memory. Thus, the implementation would look like this -

```python
np.take(X, np.random.permutation(X.shape[0]), axis=0, out=X)
```
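A quick sanity check of the `np.take` approach on a small array (size and seed are arbitrary choices for illustration). It confirms that the shuffled rows end up in `X`'s own buffer and are a permutation of the original rows. As far as we know this is safe even though input and output are the same array: the `np.take` documentation notes that `out` is always buffered with the default `mode='raise'`.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((6, 2))
original = X.copy()
buffer_address = X.ctypes.data  # address of X's data buffer before the shuffle

# Gather the rows in a random order and write the result back into X.
np.take(X, np.random.permutation(X.shape[0]), axis=0, out=X)

# X still uses the same buffer ...
print(X.ctypes.data == buffer_address)  # True
# ... and holds the same rows, only reordered.
print(sorted(map(tuple, X)) == sorted(map(tuple, original)))  # True
```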

Sample run -

```
In : X
Out:
array([[ 0.60511059,  0.75001599],
       [ 0.30968339,  0.09162172],
       [ 0.14673218,  0.09089028],
       [ 0.31663128,  0.10000309],
       [ 0.0957233 ,  0.96210485],
       [ 0.56843186,  0.36654023]])

In : np.take(X, np.random.permutation(X.shape[0]), axis=0, out=X);

In : X
Out:
array([[ 0.14673218,  0.09089028],
       [ 0.31663128,  0.10000309],
       [ 0.30968339,  0.09162172],
       [ 0.56843186,  0.36654023],
       [ 0.0957233 ,  0.96210485],
       [ 0.60511059,  0.75001599]])
```

Here's a trick to speed up `np.random.permutation(X.shape[0])` with `np.argsort()` -

```python
np.random.rand(X.shape[0]).argsort()
```
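Why the `argsort` trick yields a valid shuffle: `np.random.rand(n)` draws `n` independent uniform keys, and the indices that would sort those keys form a random permutation of `0..n-1` (ties are vanishingly unlikely with float64 keys). A small check, with `n` chosen arbitrarily:

```python
import numpy as np

n = 10
keys = np.random.rand(n)  # n independent uniform keys in [0, 1)
perm = keys.argsort()     # positions that would sort the keys

# perm contains every index 0..n-1 exactly once, i.e. it is a permutation.
print(np.array_equal(np.sort(perm), np.arange(n)))  # True
```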

Speedup results -

```
In : X = np.random.random((6000, 2000))

In : %timeit np.random.permutation(X.shape[0])
1000 loops, best of 3: 510 µs per loop

In : %timeit np.random.rand(X.shape[0]).argsort()
1000 loops, best of 3: 297 µs per loop
```

Thus, the shuffling solution could be modified to -

```python
np.take(X, np.random.rand(X.shape[0]).argsort(), axis=0, out=X)
```

Runtime tests -

These tests include the two approaches listed in this post and the `np.random.shuffle`-based one from @Kasramvd's solution.

```
In : X = np.random.random((6000, 2000))

In : %timeit np.random.shuffle(X)
10 loops, best of 3: 25.2 ms per loop

In : %timeit np.take(X, np.random.permutation(X.shape[0]), axis=0, out=X)
10 loops, best of 3: 53.3 ms per loop

In : %timeit np.take(X, np.random.rand(X.shape[0]).argsort(), axis=0, out=X)
10 loops, best of 3: 53.2 ms per loop
```

So it seems the `np.take`-based approaches are worth using only if memory is a concern; otherwise the `np.random.shuffle`-based solution looks like the way to go.

After a bit of experimenting, I found that the most memory- and time-efficient way to shuffle an nd-array row-wise is to shuffle an index array and then get the data using the shuffled index:

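```python
import numpy as np

rand_num2 = np.random.randint(5, size=(6000, 2000))
perm = np.arange(rand_num2.shape[0])
np.random.shuffle(perm)      # shuffle only the 1-D index array
rand_num2 = rand_num2[perm]  # reorder the rows using the shuffled index
```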
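One detail about the index approach: `rand_num2[perm]` is advanced (fancy) indexing, which always returns a new array rather than a view, so the original and the shuffled array exist side by side until the reassignment releases the old one. Keeping `perm` around also lets you undo the shuffle with its inverse, `np.argsort(perm)`. A small sketch (array size is arbitrary):

```python
import numpy as np

a = np.random.randint(5, size=(6, 3))
perm = np.arange(a.shape[0])
np.random.shuffle(perm)   # shuffle only the 1-D index array

shuffled = a[perm]        # fancy indexing builds a new array

print(np.shares_memory(a, shuffled))  # False: a copy was made

# argsort(perm) is the inverse permutation, so it restores the original order.
print(np.array_equal(shuffled[np.argsort(perm)], a))  # True
```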

In more detail: here I am using memory_profiler to measure memory usage and Python's built-in `time` module to record time, comparing all the previous answers:

```python
import time

import numpy as np


# Decorate main() with @profile (from memory_profiler) to get a memory report.
def main():
    # Shuffle the data itself
    rand_num = np.random.randint(5, size=(6000, 2000))
    start = time.time()
    np.random.shuffle(rand_num)
    print('Time for direct shuffle: {0}'.format(time.time() - start))

    # Shuffle an index array and get the data from the shuffled index
    rand_num2 = np.random.randint(5, size=(6000, 2000))
    start = time.time()
    perm = np.arange(rand_num2.shape[0])
    np.random.shuffle(perm)
    rand_num2 = rand_num2[perm]
    print('Time for shuffling index: {0}'.format(time.time() - start))

    # Using np.take() with the argsort trick, writing back into the input
    rand_num3 = np.random.randint(5, size=(6000, 2000))
    start = time.time()
    np.take(rand_num3, np.random.rand(rand_num3.shape[0]).argsort(), axis=0, out=rand_num3)
    print("Time taken by np.take, {0}".format(time.time() - start))


if __name__ == '__main__':
    main()
```

Timing results:

```
Time for direct shuffle: 0.03345608711242676   # 33.4msec
Time for shuffling index: 0.019818782806396484 # 19.8msec
Time taken by np.take, 0.06726956367492676     # 67.2msec
```

memory_profiler results:

```
Line #    Mem usage    Increment   Line Contents
================================================
39  117.422 MiB    0.000 MiB   @profile
40                             def main():
41                                 # shuffle data itself
42  208.977 MiB   91.555 MiB       rand_num = np.random.randint(5, size=(6000, 2000))
43  208.977 MiB    0.000 MiB       start = time.time()
44  208.977 MiB    0.000 MiB       np.random.shuffle(rand_num)
45  208.977 MiB    0.000 MiB       print('Time for direct shuffle: {0}'.format((time.time() - start)))
46
47                                 # Shuffle index and get data from shuffled index
48  300.531 MiB   91.555 MiB       rand_num2 = np.random.randint(5, size=(6000, 2000))
49  300.531 MiB    0.000 MiB       start = time.time()
50  300.535 MiB    0.004 MiB       perm = np.arange(rand_num2.shape[0])
51  300.539 MiB    0.004 MiB       np.random.shuffle(perm)
52  300.539 MiB    0.000 MiB       rand_num2 = rand_num2[perm]
53  300.539 MiB    0.000 MiB       print('Time for shuffling index: {0}'.format((time.time() - start)))
54
55                                 # using np.take()
56  392.094 MiB   91.555 MiB       rand_num3 = np.random.randint(5, size=(6000, 2000))
57  392.094 MiB    0.000 MiB       start = time.time()
58  392.242 MiB    0.148 MiB       np.take(rand_num3, np.random.rand(rand_num3.shape[0]).argsort(), axis=0, out=rand_num3)
59  392.242 MiB    0.000 MiB       print("Time taken by np.take, {0}".format((time.time() - start)))
```

From the documentation of `numpy.random.shuffle()`: "Modify a sequence in-place by shuffling its contents. This function only shuffles the array along the first axis of a multi-dimensional array. The order of sub-arrays is changed but their contents remains the same."
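To see what "along the first axis" means for a 3-D array, here is a small example (the shape is arbitrary): whole `(3, 4)` sub-arrays are reordered, but nothing inside any of them moves.

```python
import numpy as np

arr = np.arange(24).reshape(2, 3, 4)
blocks_before = [b.copy() for b in arr]  # the two (3, 4) sub-arrays

np.random.shuffle(arr)                   # only reorders arr[0] and arr[1]

# Each sub-array after the shuffle equals one of the original sub-arrays.
print(all(any(np.array_equal(b, orig) for orig in blocks_before) for b in arr))  # True
```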

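You can shuffle the values within each row of a two-dimensional array `A` using the `np.vectorize()` function. Note that this applies `np.random.permutation` to every row separately, so it reorders the entries of each row independently and leaves the row order unchanged; it is not a row-order shuffle: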

```python
import numpy as np

A = np.random.random((6, 2))  # example input

shuffle = np.vectorize(np.random.permutation, signature='(n)->(n)')

A_shuffled = shuffle(A)
```
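A quick check of what this `np.vectorize` call actually does, on a small illustrative `A` with distinct, ascending values: every row keeps its position and its set of values, and only the order of the values inside each row changes. If that per-row behaviour is what you want, `Generator.permuted(A, axis=1)` (NumPy 1.20 or later, to our knowledge) shuffles each row independently without a Python-level loop.

```python
import numpy as np

A = np.arange(12).reshape(3, 4)  # each row is already sorted ascending

shuffle = np.vectorize(np.random.permutation, signature='(n)->(n)')
A_within_rows = shuffle(A)       # permutes the entries inside each row

# Row i still holds exactly the values of A[i]: the row order is unchanged.
print(np.array_equal(np.sort(A_within_rows, axis=1), A))  # True

# Same per-row behaviour with the Generator API.
rng = np.random.default_rng(1)
A_permuted = rng.permuted(A, axis=1)
print(np.array_equal(np.sort(A_permuted, axis=1), A))  # True
```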

I tried many solutions, and in the end I used this simple one:

```python
import numpy as np
from sklearn.utils import shuffle

x = np.array([[1, 2],
              [3, 4],
              [5, 6]])
print(shuffle(x, random_state=0))
```

Output:

```
[[5 6]
 [3 4]
 [1 2]]
```

If you have a 3D array, loop through the first axis (axis=0) and apply this function to each 2D item, like:

```python
np.array([shuffle(item) for item in arr_3d])
```
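For large 3-D arrays that list comprehension loops in Python. A vectorized sketch of the same idea, which is our own suggestion and swaps sklearn's `shuffle` for NumPy's `argsort` trick: draw one set of random keys per 2-D item, argsort them to get an independent row permutation for each item, and gather with advanced indexing. The names `arr_3d`, `row_idx` and `shuffled` are illustrative.

```python
import numpy as np

arr_3d = np.arange(24).reshape(2, 3, 4)  # 2 items, each 3 rows x 4 columns
n_items, n_rows = arr_3d.shape[:2]

# One independent permutation of the row indices for every item.
row_idx = np.random.rand(n_items, n_rows).argsort(axis=1)

# Pair item i with its own row permutation; the columns stay untouched.
shuffled = arr_3d[np.arange(n_items)[:, None], row_idx]

print(shuffled.shape)  # (2, 3, 4)
```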

You could also use the `numpy.random.permutation` function to generate the index array and use it to shuffle multiple arrays the same way.
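A minimal sketch of that idea with a hypothetical helper `randomize`: one permutation of the row indices is generated and applied to both arrays, so corresponding rows stay aligned. It assumes both arrays have the same length along axis 0 and returns shuffled copies.

```python
import numpy as np


def randomize(a, b):
    """Hypothetical helper: shuffle a and b along axis 0 with the same permutation."""
    if a.shape[0] != b.shape[0]:
        raise ValueError("a and b must have the same number of rows")
    permutation = np.random.permutation(a.shape[0])  # one shared index order
    return a[permutation], b[permutation]


features = np.arange(10).reshape(5, 2)  # row i is [2*i, 2*i + 1]
labels = np.arange(5)                   # label i belongs to row i

f_shuf, l_shuf = randomize(features, labels)
print(np.array_equal(f_shuf[:, 0] // 2, l_shuf))  # True: pairs still line up
```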

• Option 2: shuffle the array in place with `np.random.shuffle(x)`. The docs state that "this function only shuffles the array along the first index of a multi-dimensional array", which is good enough for you, right? Obviously some time is taken at startup, but from then on it's as fast as the original matrix.
• Compared to `np.random.shuffle(x)`, shuffling an index array for the nd-array and getting the data from the shuffled index is a more efficient way to solve this problem. For a more detailed comparison, refer to my answer below.
• I completely agree. I just realized that you are using `np.random` instead of the Python `random` module which also contains a shuffle function. I'm sorry for causing confusion.