## Fastest way of generating numpy arrays of randomly distributed 0s and 1s

I need to generate dropout masks for a specific neural network. I am looking for the fastest possible way to do this using numpy (CPU only).

I have tried:

    def gen_mask_1(size, p=0.75):
        return np.random.binomial(1, p, size)

    def gen_mask_2(size, p=0.75):
        mask = np.random.rand(size)
        mask[mask>p]=0
        mask[mask!=0]=1
        return mask

where `p` is the probability of having `1`.

The speed of these two approaches is comparable.

    %timeit gen_mask_1(size=2048)
    %timeit gen_mask_2(size=2048)

    45.9 µs ± 575 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
    47.4 µs ± 372 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Are there faster methods?

**UPDATE**

Following the suggestions so far, I have tested a few extra implementations. I couldn't get `@njit` to work with `parallel=True` (`TypingError: Failed in nopython mode pipeline (step: convert to parfors)`). It works without it but, I think, less efficiently.

I have found a Python binding for Intel's `mkl_random` (thank you @MatthieuBrucher for the tip!) here: https://github.com/IntelPython/mkl_random

So far, using `mkl_random` together with @nxpnsv's approach gives the best result.

    @njit
    def gen_mask_3(size, p=0.75):
        mask = np.random.rand(size)
        mask[mask>p]=0
        mask[mask!=0]=1
        return mask

    def gen_mask_4(size, p=0.75):
        return (np.random.rand(size) < p).astype(int)

    def gen_mask_5(size):
        return np.random.choice([0, 1, 1, 1], size=size)

    def gen_mask_6(size, p=0.75):
        return (mkl_random.rand(size) < p).astype(int)

    def gen_mask_7(size):
        return mkl_random.choice([0, 1, 1, 1], size=size)

    %timeit gen_mask_4(size=2048)
    %timeit gen_mask_5(size=2048)
    %timeit gen_mask_6(size=2048)
    %timeit gen_mask_7(size=2048)

    22.2 µs ± 145 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
    25.8 µs ± 336 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
    7.64 µs ± 133 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
    29.6 µs ± 1.18 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

As I said in a comment on the question, the implementation

    def gen_mask_2(size, p=0.75):
        mask = np.random.rand(size)
        mask[mask>p]=0
        mask[mask!=0]=1
        return mask

can be improved by using the fact that a comparison yields a `bool` array, which can then be converted to `int`. This removes the two comparisons with masked assignments you would otherwise have, and it makes for a pretty one-liner :)

    def gen_mask_2(size, p=0.75):
        return (np.random.rand(size) < p).astype(int)
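The same comparison trick can also be written against NumPy's newer `Generator` API. This is only a minimal sketch: the name `gen_mask_rng`, the module-level generator and the choice of `uint8` as the output dtype are my assumptions, and I have not timed it against the numbers in the question.

```python
import numpy as np

# Hypothetical helper (not from the question): same idea as gen_mask_2,
# but drawing from a Generator and storing the mask in one byte per element.
_rng = np.random.default_rng()

def gen_mask_rng(size, p=0.75, rng=_rng):
    # rng.random draws uniform floats in [0, 1); comparing them with p
    # gives a boolean array that is True with probability p.
    return (rng.random(size) < p).astype(np.uint8)

mask = gen_mask_rng(2048)
print(mask.dtype, mask.mean())
```

Whether the smaller dtype helps depends on what the rest of the network code expects, so treat it as something to benchmark rather than a given.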

You can use the Numba compiler to make things faster by applying the `njit` decorator to your functions. Below is an example for a very large `size`:

    import numpy as np
    from numba import njit

    def gen_mask_1(size, p=0.75):
        return np.random.binomial(1, p, size)

    @njit(parallel=True)
    def gen_mask_2(size, p=0.75):
        mask = np.random.rand(size)
        mask[mask>p]=0
        mask[mask!=0]=1
        return mask

    %timeit gen_mask_1(size=100000)
    %timeit gen_mask_2(size=100000)

    2.33 ms ± 215 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    512 µs ± 25.1 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
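If `parallel=True` raises a typing error on your setup, one thing to try is a plain `@njit` function that fills the array in an explicit loop instead of using boolean-mask assignments. This is a sketch only: `gen_mask_loop`, the `uint8` dtype and the pure-Python fallback used when Numba is not installed are my assumptions, and I have not timed it against the results above.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption for this sketch only: without Numba, run as plain (slow) Python.
    def njit(func):
        return func

@njit
def gen_mask_loop(size, p=0.75):
    # Hypothetical helper: one uniform draw per element, no temporary arrays.
    mask = np.empty(size, dtype=np.uint8)
    for i in range(size):
        # Each element is 1 with probability p, 0 otherwise.
        mask[i] = 1 if np.random.random() < p else 0
    return mask

mask = gen_mask_loop(100000)
print(mask.mean())
```

Note that the first call includes Numba's compilation time, so time the second and later calls when benchmarking.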

Another option is `numpy.random.choice`, with an input of 0s and 1s where the proportion of 1s is `p`. For example, for `p = 0.75`, use `np.random.choice([0, 1, 1, 1], size=n)`:

    In [303]: np.random.choice([0, 1, 1, 1], size=16)
    Out[303]: array([1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0])

This is faster than using `np.random.binomial`:

    In [304]: %timeit np.random.choice([0, 1, 1, 1], size=10000)
    71.8 µs ± 368 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

    In [305]: %timeit np.random.binomial(1, 0.75, 10000)
    174 µs ± 348 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

To handle an arbitrary value for `p`, you can use the `p` option of `np.random.choice`, but then the code is slower than `np.random.binomial`:

    In [308]: np.random.choice([0, 1], p=[0.25, 0.75], size=16)
    Out[308]: array([1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0])

    In [309]: %timeit np.random.choice([0, 1], p=[0.25, 0.75], size=10000)
    227 µs ± 781 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
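When `p` is (close to) a simple fraction, the `[0, 1, 1, 1]` trick can be generalized by building the pool of 0s and 1s programmatically, which avoids the `p` option. The sketch below is an illustration only: `gen_mask_pool` and `max_denominator` are names I made up, `p` is rounded to the nearest fraction with a small denominator, and I have not measured whether it keeps the speed advantage shown above.

```python
from fractions import Fraction

import numpy as np

def gen_mask_pool(size, p=0.75, max_denominator=100):
    # Hypothetical helper: approximate p by a fraction k/n, then sample
    # uniformly from a pool holding k ones and n - k zeros.
    frac = Fraction(p).limit_denominator(max_denominator)
    k, n = frac.numerator, frac.denominator
    pool = np.array([0] * (n - k) + [1] * k)
    return np.random.choice(pool, size=size)

mask = gen_mask_pool(10000, p=0.3)
print(mask.mean())
```

For `p=0.75` this builds the same four-element pool as the example above; for values of `p` that are not well approximated by a small fraction, the resulting probability is only approximate.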

