Fastest way of generating numpy arrays of randomly distributed 0s and 1s

I need to generate dropout masks for a specific neural network. I am looking for the fastest possible way to do this using numpy (CPU only).

I have tried:

import numpy as np

def gen_mask_1(size, p=0.75):
    return np.random.binomial(1, p, size)


def gen_mask_2(size, p=0.75):
    mask = np.random.rand(size)
    mask[mask>p]=0
    mask[mask!=0]=1
    return mask

where p is the probability of drawing a 1.

The speed of these two approaches is comparable.

%timeit gen_mask_1(size=2048)
%timeit gen_mask_2(size=2048)

45.9 µs ± 575 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
47.4 µs ± 372 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Are there faster methods?

UPDATE

Following the suggestions so far, I have tested a few extra implementations. I couldn't get @njit to work with parallel=True (TypingError: Failed in nopython mode pipeline (step: convert to parfors)); it works without it but, I think, less efficiently. I found a Python binding for Intel's mkl_random (thank you @MatthieuBrucher for the tip!) here: https://github.com/IntelPython/mkl_random. So far, using mkl_random together with @nxpnsv's approach gives the best result.

import mkl_random
from numba import njit

@njit
def gen_mask_3(size, p=0.75):
    mask = np.random.rand(size)
    mask[mask>p]=0
    mask[mask!=0]=1
    return mask

def gen_mask_4(size, p=0.75):
    return (np.random.rand(size) < p).astype(int)

def gen_mask_5(size):
    return np.random.choice([0, 1, 1, 1], size=size)

def gen_mask_6(size, p=0.75):
    return (mkl_random.rand(size) < p).astype(int)

def gen_mask_7(size):
    return mkl_random.choice([0, 1, 1, 1], size=size)

%timeit gen_mask_4(size=2048)
%timeit gen_mask_5(size=2048)
%timeit gen_mask_6(size=2048)
%timeit gen_mask_7(size=2048)

22.2 µs ± 145 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
25.8 µs ± 336 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
7.64 µs ± 133 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
29.6 µs ± 1.18 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
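For readers working outside IPython, the %timeit measurements above can be approximated with the standard-library timeit module. The following is a minimal sketch, not the method used to produce the numbers in this post: best_time_us is a hypothetical helper introduced here, and the repeat and number values are arbitrary choices. Results will differ from machine to machine.

```python
import timeit

import numpy as np


def gen_mask_1(size, p=0.75):
    return np.random.binomial(1, p, size)


def gen_mask_4(size, p=0.75):
    return (np.random.rand(size) < p).astype(int)


def best_time_us(func, size=2048, repeat=5, number=200):
    # Hypothetical helper: best observed time per call, in microseconds.
    timer = timeit.Timer(lambda: func(size))
    return min(timer.repeat(repeat=repeat, number=number)) / number * 1e6


timings = {f.__name__: best_time_us(f) for f in (gen_mask_1, gen_mask_4)}
for name, t in timings.items():
    print(f"{name}: {t:.1f} µs per call")
```

Taking the minimum over the repeats reduces noise from other processes, whereas %timeit reports a mean and standard deviation, so the two are not directly comparable.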

As I said in a comment on the question, the implementation

def gen_mask_2(size, p=0.75):
    mask = np.random.rand(size)
    mask[mask>p]=0
    mask[mask!=0]=1
    return mask

can be improved by using the fact that the comparison yields a bool array, which can then be converted to int. This removes the two comparisons with masked assignments you would otherwise have, and it makes for a pretty one-liner :)

def gen_mask_2(size, p=0.75):
    return (np.random.rand(size) < p).astype(int)
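
The same comparison trick also works with NumPy's newer Generator API (NumPy 1.17 or later). The sketch below is an assumption-laden variation, not something benchmarked in this thread: it draws float32 samples and keeps the mask as bool, or reinterprets it as uint8 without copying. The function names gen_mask_bool and gen_mask_u8 are made up for illustration, and whether this is faster on a given machine needs to be timed.

```python
import numpy as np

# Seeded only to make the example reproducible.
rng = np.random.default_rng(seed=0)


def gen_mask_bool(size, p=0.75):
    # Comparing uniform samples against p gives True with probability p.
    return rng.random(size, dtype=np.float32) < p


def gen_mask_u8(size, p=0.75):
    # bool and uint8 are both one byte wide, so view() reinterprets the
    # buffer as 0/1 integers without an extra astype() copy.
    return gen_mask_bool(size, p).view(np.uint8)


mask = gen_mask_u8(100_000)
print(mask.dtype, mask.mean())
```

A bool or uint8 mask can multiply a float activation array directly, so the conversion to a wider int type may not be needed at all.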

You can use the Numba compiler to speed things up by applying the njit decorator to your functions. Below is an example for a very large size:

import numpy as np
from numba import njit

def gen_mask_1(size, p=0.75):
    return np.random.binomial(1, p, size)

@njit(parallel=True)
def gen_mask_2(size, p=0.75):
    mask = np.random.rand(size)
    mask[mask>p]=0
    mask[mask!=0]=1
    return mask

%timeit gen_mask_1(size=100000)
%timeit gen_mask_2(size=100000)

2.33 ms ± 215 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
512 µs ± 25.1 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

Another option is numpy.random.choice, with an input of 0s and 1s where the proportion of 1s is p. For example, for p = 0.75, use np.random.choice([0, 1, 1, 1], size=n):

In [303]: np.random.choice([0, 1, 1, 1], size=16)
Out[303]: array([1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0])

This is faster than using np.random.binomial:

In [304]: %timeit np.random.choice([0, 1, 1, 1], size=10000)
71.8 µs ± 368 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [305]: %timeit np.random.binomial(1, 0.75, 10000)
174 µs ± 348 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

To handle an arbitrary value for p, you can use the p option of np.random.choice, but then the code is slower than np.random.binomial:

In [308]: np.random.choice([0, 1], p=[0.25, 0.75], size=16)
Out[308]: array([1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0])

In [309]: %timeit np.random.choice([0, 1], p=[0.25, 0.75], size=10000)
227 µs ± 781 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
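
For values of p other than 0.75, the pool of 0s and 1s passed to np.random.choice can be built from a rational approximation of p. This is only a sketch: make_pool is a hypothetical helper, not part of NumPy, and the max_denominator cap is an arbitrary assumption that trades pool size against how closely the pool matches p.

```python
from fractions import Fraction

import numpy as np


def make_pool(p, max_denominator=100):
    # Approximate p as numerator/denominator, then build a pool holding
    # `numerator` ones and `denominator - numerator` zeros.
    frac = Fraction(p).limit_denominator(max_denominator)
    ones = frac.numerator
    zeros = frac.denominator - frac.numerator
    return np.array([0] * zeros + [1] * ones)


pool = make_pool(0.75)  # array([0, 1, 1, 1])
mask = np.random.choice(pool, size=16)
print(pool, mask)
```

With p = 0.8 the helper returns a pool of one 0 and four 1s. When p is not close to a fraction with a small denominator, the pool only approximates p.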
