Why does NumPy's random function seemingly display a pattern in its generated values?

I was playing around with NumPy and Pillow and came across an interesting result that apparently showcases a pattern in NumPy random.random() results.

Here is the full code for generating and saving 100 of these images (with seed 0); the above are the first four images generated by this code.

import numpy as np
from PIL import Image

np.random.seed(0)
img_arrays = np.random.random((100, 256, 256, 3)) * 255
for i, img_array in enumerate(img_arrays):
    img = Image.fromarray(img_array, "RGB")
    img.save("{}.png".format(i))

The above are four different images created using PIL.Image.fromarray() on four different NumPy arrays created using numpy.random.random((256, 256, 3)) * 255 to generate a 256 by 256 grid of RGB values in four different Python instances (the same thing also happens in the same instance).

I noticed that this only happens (in my limited testing) when the width and height of the image are powers of two; I am not sure how to interpret that.

Although it may be hard to see due to browser anti-aliasing (you can download the images and view them in image viewers with no anti-aliasing), there are clear purple-brown columns of pixels every 8th column starting from the 3rd column of every image. To make sure, I tested this on 100 different images and they all followed this pattern.

What is going on here? I am guessing that patterns like this are the reason that people always say to use cryptographically secure random number generators when true randomness is required, but is there a concrete explanation behind why this is happening in particular?

Don't blame NumPy, blame PIL / Pillow. ;) You're generating floats, but PIL expects integers, and its float-to-int conversion is not doing what we want. Further research is required to determine exactly what PIL is doing...

In the meantime, you can get rid of those lines by explicitly converting your values to unsigned 8-bit integers:

img_arrays = (np.random.random((100, 256, 256, 3)) * 255).astype(np.uint8)

As FHTMitchell notes in the comments, a more efficient form is

img_arrays = np.random.randint(0, 256, (100, 256, 256, 3), dtype=np.uint8) 

Here's typical output from that modified code:


The PIL Image.fromarray function has a known bug, as described here. The behaviour you're seeing is probably related to that bug, but I guess it could be an independent one. ;)

FWIW, here are some tests and workarounds I did on the bug mentioned on the linked question.
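
As a quick sanity check on the two workarounds above, here is a minimal sketch that compares the dtype and the number of bytes per pixel for the original float array and for both uint8 versions. The small (2, 4, 4, 3) shape and the variable names are only for illustration; they are not from the original code.

```python
import numpy as np

np.random.seed(0)

# Original approach: float64 values, 8 bytes each.
floats = np.random.random((2, 4, 4, 3)) * 255

# Workaround 1: truncate the floats to unsigned 8-bit integers.
as_bytes = floats.astype(np.uint8)

# Workaround 2: draw unsigned 8-bit integers directly (upper bound is exclusive).
direct = np.random.randint(0, 256, (2, 4, 4, 3), dtype=np.uint8)

for name, arr in [("floats", floats), ("astype", as_bytes), ("randint", direct)]:
    height, width = arr.shape[1], arr.shape[2]
    # An RGB image with one byte per channel needs exactly 3 bytes per pixel.
    bytes_per_pixel = arr[0].nbytes // (height * width)
    print(name, arr.dtype, bytes_per_pixel)
```

One small difference between the two workarounds: random() returns values in [0, 1), so truncating random() * 255 can never produce 255, whereas randint(0, 256, ...) covers the full 0 to 255 range.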

I'm pretty sure the problem is to do with the dtype, but not for the reasons you think. Here is one with np.random.randint(0, 256, (1, 256, 256, 3), dtype=np.uint32) (note that the dtype is not np.uint8):

Can you see the pattern ;)? PIL interprets 32 bit (4 byte) values (probably as 4 pixels RGBK) differently from 8 bit values (RGB for one pixel). (See PM 2Ring's answer).

Originally you were passing 64-bit float values; these are also going to be interpreted differently (and probably not in the way you intended).
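
To see why raw float bytes would not look uniformly random, here is a rough sketch that reinterprets the float64 array from the question as plain bytes and counts how many distinct values each of the 8 byte positions takes. It assumes (as this answer guesses, but does not prove) that PIL ends up reading the array's memory byte by byte; the little-endian cast is there only to make the byte order explicit.

```python
import numpy as np

np.random.seed(0)

# Same kind of data as in the question, forced to little-endian float64.
arr = (np.random.random((256, 256, 3)) * 255).astype("<f8")

# Reinterpret the same values as raw bytes: one row of 8 bytes per float.
raw = arr.reshape(-1).view(np.uint8).reshape(-1, 8)

# Byte 7 holds the sign bit and the top exponent bits. Since every value
# lies in [0, 255), that byte can only take a handful of values.
for position in range(8):
    distinct = len(np.unique(raw[:, position]))
    print("byte", position, "distinct values:", distinct)

# Share of floats whose most significant byte is 0x40 (values >= 2).
print((raw[:, 7] == 0x40).mean())
```

The low-order bytes come from the mantissa and look random, while the high-order byte is almost constant. If those bytes are read as colour channels, one channel in every 8 bytes is stuck at nearly the same value.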

The Python Docs for random() say this:

Python uses the Mersenne Twister as the core generator. It produces 53-bit precision floats and has a period of 2**19937-1. The underlying implementation in C is both fast and threadsafe. The Mersenne Twister is one of the most extensively tested random number generators in existence. However, being completely deterministic, it is not suitable for all purposes, and is completely unsuitable for cryptographic purposes.

The best random number generators pass randomness tests; lower-quality random number generators are often used because they are quick and deemed 'good enough'.

In "Some Difficult-to-Pass Tests of Randomness" Jan 2002, by Marsaglia and Tsang, they determined that a subset of the "Diehard Battery of Tests" could be used to assess the randomness of a series of numbers, specifically the gcd, gorilla and birthday spacings tests. See "Dieharder test descriptions" for a discussion of entropy and comments on those tests.

Over at our Programming Puzzles & Code Golf site, some people took a shot at developing code to pass the Diehard tests in this question: "Build a random number generator that passes the Diehard tests".

You should expect to see patterns in all but the best (and likely slower) RNGs.

The modern standard for statistical testing of RNGs, "NIST SP 800-22 - A Statistical Test Suite for Random and Pseudorandom Number Generators for Cryptographic Applications" (Overview), provides a series of tests which, amongst other things, assess the closeness of the fraction of ones to ½; that is, the number of ones and zeroes in a sequence should be about the same.
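
For illustration, here is a minimal sketch of that frequency (monobit) check, written with the Python standard library only. The function name, the sample size, and the use of random.Random(0) as the generator under test are my own choices for the example, not part of the NIST suite.

```python
import math
import random

def monobit_p_value(bits):
    """P-value of the frequency (monobit) test for a sequence of 0/1 bits."""
    n = len(bits)
    # Map 1 -> +1 and 0 -> -1, then sum; a balanced sequence sums to about 0.
    s = sum(1 if b else -1 for b in bits)
    s_obs = abs(s) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))

# Bits from Python's Mersenne Twister, seeded for repeatability.
rng = random.Random(0)
bits = [rng.getrandbits(1) for _ in range(100_000)]

print("fraction of ones:", sum(bits) / len(bits))
print("p-value:", monobit_p_value(bits))
```

A very small p-value (the suite commonly uses a threshold of 0.01) suggests the ones and zeroes are not balanced. Passing this one test says little on its own; it is only the simplest test in the battery.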

An article published on the ACM website, "Algorithm 970: Optimizing the NIST Statistical Test Suite and the Berlekamp-Massey Algorithm", January 2017, by Sýs, Říha and Matyáš, promises an enormous speedup of the NIST algorithms with their reimplementation.

Comments
  • Can you remake them with a set random seed, otherwise ours will be different.
  • Did you try different aspect ratios, do they also have the pattern? When you compare the data, do the statistics show a pattern?
  • @FHTMitchell This occurs with all seeds, I ran the 100 trial run multiple times and they all generated the same purple "bars".
  • One thing I notice is that you have a float array. What happens if you do np.random.randint(0, 256, (100, 256, 256, 3))?
  • @ZiyadEdher that's because we should set dtype=np.uint8 to make sure they are interpreted as individual bytes. See my answer.
  • or np.random.randint(0, 256, (100, 256, 256, 3), dtype=np.uint8)
  • Wow! That's very interesting, I noticed the pattern is only appearing when the width and height of the image are powers of two, what could be causing that?
  • Also, I would expect the float to int conversion to either truncate or round or something along those lines, why would that be causing a pattern?
  • @ZiyadEdher The power of two thing is because it happens every 8th pixel. Hence the columns show up if multiples of 8 always occur in the same column, so the width must be a multiple of 8 (of which every power of 2 above 2 is).
  • @FHTMitchell That makes a lot of sense! I am really interested in why non-byte data is handled so badly by PIL, I guess I will look into that.
  • Aha, that makes a lot of sense. I find it quite odd the way PIL handles these values, I wonder what exactly is happening in the 8 byte case to produce sensible random images but with a slight pattern.
  • PIL's Image.fromarray seems to be totally reliable when working with simple byte data, but I definitely wouldn't trust it to do the Right Thing in other cases. It even has trouble unpacking bitmap data from bytes correctly.
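
The "every 8th pixel" arithmetic from the comments can be sketched as follows. This assumes the raw float64 bytes are consumed as consecutive 3-byte RGB pixels in little-endian order, which is a guess about PIL's behaviour rather than something confirmed here; the helper name is made up for the example.

```python
from math import gcd

BYTES_PER_FLOAT = 8   # float64
BYTES_PER_PIXEL = 3   # RGB, assuming one byte per channel

# The byte layout repeats every lcm(8, 3) bytes.
period_bytes = BYTES_PER_FLOAT * BYTES_PER_PIXEL // gcd(BYTES_PER_FLOAT, BYTES_PER_PIXEL)
period_pixels = period_bytes // BYTES_PER_PIXEL

def lines_up_in_columns(width):
    """True if the repeating layout falls in the same columns on every row."""
    return width % period_pixels == 0

# For each float in one period, the (pixel, channel) slot that would receive
# its most significant byte (byte 7 in little-endian order).
high_byte_slots = [
    divmod(f * BYTES_PER_FLOAT + 7, BYTES_PER_PIXEL)
    for f in range(period_bytes // BYTES_PER_FLOAT)
]

print(period_bytes, period_pixels, high_byte_slots)
print([w for w in (100, 128, 200, 250, 256) if lines_up_in_columns(w)])
```

Under these assumptions the layout repeats every 8 pixels, and the first nearly constant byte lands in the pixel at index 2, i.e. the 3rd column. Any width that is a multiple of 8 keeps the pattern aligned in columns, not just powers of two.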