2D boolean mask in numpy yields different results (mask ordering vs. original indices)

I am playing around with different indexing methods. I have the following working example:

import numpy as np

x = np.random.rand(321,321)
a = range(0, 300)
b = range(1, 301)
mask = np.zeros(x.shape, dtype=bool)
# a and b are index sequences (ranges here)
mask[a, b] = True
assert x[a, b].shape == x[mask].shape  # passes
assert np.isclose(np.sum(x[mask]), np.sum(x[a, b]))  # passes
assert np.allclose(x[mask], x[a, b])  # fails sometimes

When I try it with a different x for a project, the last assertion fails. Here is a failing case:

import numpy as np

x = np.random.rand(431,431)
a = [0, 1, 1, 1, 2, 2, 2, 3]
b = [1, 0, 2, 4, 3, 1, 11, 2]

mask = np.zeros(x.shape, dtype=bool)
# a and b are lists 
mask[a, b] = True
assert x[a, b].shape == x[mask].shape  # passes
assert np.isclose(np.sum(x[mask]), np.sum(x[a, b]))  # passes
assert np.allclose(x[mask], x[a, b])  # fails

Can anyone explain why this error occurs? I assume it's because mask indexes into x differently from (a, b), but I'm not sure how.

I want to do this because I'd like to easily get x[~mask].

Any insight would be appreciated!

The problem lies in the order of a and b. If you print x[a, b] and x[mask], you will see that the 5th and 6th elements of x[a, b] are swapped relative to x[mask]. When you set mask[a, b] = True, the order of the index pairs doesn't matter, because every listed cell is simply set to True. When you index x with a and b, the order does matter: numpy takes each value from a to select a row, uses the value at the same position in b to select the column within that row, and returns the elements in that order. To illustrate using a 3x8 array:

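import numpy as np

a = [0, 1, 1, 1, 2, 2, 2]
b = [1, 0, 2, 4, 3, 1, 7]

x = np.array([[1, 2, 3, 4, 5, 6, 7, 8],
              [9, 10, 11, 12, 13, 14, 15, 16],
              [17, 18, 19, 20, 21, 22, 23, 24]])

mask = np.zeros(x.shape, dtype=bool)
mask[a, b] = True

print(x[a, b])  # [ 2  9 11 13 20 18 24]  (order given by a and b)
print(x[mask])  # [ 2  9 11 13 18 20 24]  (row-major order)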

A good way to fix this would be to define the (a, b) index pairs as a list of tuples, sort them by their a-value first and then by their b-value, and index with the sorted pairs. That way the order is guaranteed to match the order of x[mask].
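
The sorting idea can be sketched roughly as follows. This is a minimal illustration and not the only way to do it: the names pairs, a_sorted and b_sorted are made up for the example, the array is shrunk to 5x12 to keep it small, and it assumes the (a, b) pairs contain no duplicates.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((5, 12))  # small stand-in for the 431x431 array

a = [0, 1, 1, 1, 2, 2, 2, 3]
b = [1, 0, 2, 4, 3, 1, 11, 2]

# Sort the (row, column) pairs lexicographically: by row, then by column.
pairs = sorted(zip(a, b))
a_sorted, b_sorted = (list(t) for t in zip(*pairs))

mask = np.zeros(x.shape, dtype=bool)
mask[a_sorted, b_sorted] = True

# With sorted, duplicate-free pairs, both indexing styles agree element by element.
same_order = np.array_equal(x[mask], x[a_sorted, b_sorted])
```

If a and b are already numpy arrays, np.lexsort((b, a)) should give the same ordering as an index permutation.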

x[a, b] selects elements from x in the order given by a and b. x[a[i], b[i]] will come before x[a[i+1], b[i+1]] in the result.

x[mask] selects elements in the order given by iterating over mask in row-major order to find True cells. This is only the same order as x[a, b] if zip(a, b) is already lexicographically sorted.

In your failing example, 2, 3 comes before 2, 1 in a and b, but iterating over mask in row-major order will find the True at 2, 1 before 2, 3. Thus, x[mask] has x[2, 1] before x[2, 3], while x[a, b] has those elements the other way around.
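
One way to check this without looking at the values at all is to compare the coordinates directly. The sketch below assumes a small 5x12 mask in place of the original shape, and the variable names are illustrative only.

```python
import numpy as np

a = [0, 1, 1, 1, 2, 2, 2, 3]
b = [1, 0, 2, 4, 3, 1, 11, 2]

mask = np.zeros((5, 12), dtype=bool)
mask[a, b] = True

# np.nonzero returns the coordinates of the True cells in row-major order.
rows, cols = np.nonzero(mask)

# x[mask] visits cells in this order, which need not be the order of (a, b).
mask_order = list(zip(rows.tolist(), cols.tolist()))
index_order = list(zip(a, b))

# The two orders coincide only if the pairs are already lexicographically sorted.
pairs_are_sorted = index_order == sorted(index_order)
```

Here pairs_are_sorted is False, and mask_order equals sorted(index_order), which is why the element-wise comparison in the question fails while the shape and sum checks pass.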

As @hpaulj mentioned, the order of the elements in the two results is different:

import numpy as np
np.random.seed(42)

x = np.random.rand(431,431)
a = [0, 1, 1, 1, 2, 2, 2, 3]
b = [1, 0, 2, 4, 3, 1, 11, 2]

mask = np.zeros(x.shape, dtype=bool)
# a and b are lists
mask[a, b] = True

print(x[mask])
print(x[a, b])

Output

[0.95071431 0.76151063 0.10112268 0.70096913 0.44076275 0.55964033
 0.40873417 0.20015024]
[0.95071431 0.76151063 0.10112268 0.70096913 0.55964033 0.44076275
 0.40873417 0.20015024]

The reason is that indexing with a boolean mask returns the elements in row-major (C-style) order (see docs), whereas for multidimensional indexing with integer arrays:

if the index arrays have a matching shape, and there is an index array for each dimension of the array being indexed, the resultant array has the same shape as the index arrays, and the values correspond to the index set for each position in the index arrays.

In your case the order from the multidimensional indexing is:

[(0, 1), (1, 0), (1, 2), (1, 4), (2, 3), (2, 1), (2, 11), (3, 2)]

and from the mask is:

[(0, 1), (1, 0), (1, 2), (1, 4), (2, 1), (2, 3), (2, 11), (3, 2)]
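
Since the stated goal was to get x[~mask], here is a rough sketch of how the two indexing styles can be combined. It assumes a 5x12 array instead of the original one, and the names selected and rest are hypothetical. The complement does not depend on the order of a and b, so only the selected elements are affected by the ordering difference.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((5, 12))  # small stand-in for the 431x431 array

a = [0, 1, 1, 1, 2, 2, 2, 3]
b = [1, 0, 2, 4, 3, 1, 11, 2]

mask = np.zeros(x.shape, dtype=bool)
mask[a, b] = True

selected = x[a, b]  # order given by a and b
rest = x[~mask]     # every other element, in row-major order

# Same values either way; only the order of the selected elements differs.
same_values = np.array_equal(np.sort(x[mask]), np.sort(selected))
```

If the order of the selected elements matters, use x[a, b] for them and keep the mask only for the complement.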


Comments
  • Could be because x has Fortran ordering?
  • Do you change the mask here? As written, x[mask] will be an empty array.
  • I updated the question to be more precise.
  • I added an explicit failing case
  • The order of elements in x[mask] and x[a,b] are different. With only 8 elements you can easily print and compare them. No need to depend on tests that hide the details. You could also see the difference by comparing np.where(mask) and (a,b).