Counting the number of non-NaN elements in a numpy ndarray in Python


I need to calculate the number of non-NaN elements in a numpy ndarray. How would one do this efficiently in Python? Here is my simple code for achieving this:

import numpy as np

def numberOfNonNans(data):
    count = 0
    for i in data.flat:  # .flat iterates over every element, even for multi-dimensional arrays
        if not np.isnan(i):
            count += 1
    return count

Is there a built-in function for this in numpy? Efficiency is important because I'm doing Big Data analysis.

Thanks for any help!

np.count_nonzero(~np.isnan(data))

~ inverts the boolean matrix returned from np.isnan.

np.count_nonzero counts values that are not 0/False. .sum() gives the same result, but count_nonzero makes the intent clearer.
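To illustrate on a small, made-up array (the values here are arbitrary):

```python
import numpy as np

data = np.array([1.0, np.nan, 2.5, np.nan, 3.0])

# np.isnan gives [False, True, False, True, False]; ~ flips it,
# and count_nonzero counts the True entries.
non_nan = np.count_nonzero(~np.isnan(data))
print(non_nan)  # → 3
```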

Testing speed:

In [23]: data = np.random.random((10000,10000))

In [24]: data[np.ix_(np.random.randint(0, 10000, 100), np.random.randint(0, 10000, 100))] = np.nan

In [25]: %timeit data.size - np.count_nonzero(np.isnan(data))
1 loops, best of 3: 309 ms per loop

In [26]: %timeit np.count_nonzero(~np.isnan(data))
1 loops, best of 3: 345 ms per loop

In [27]: %timeit data.size - np.isnan(data).sum()
1 loops, best of 3: 339 ms per loop

data.size - np.count_nonzero(np.isnan(data)) seems to be marginally the fastest here; other data might give different relative speed results.

np.count_nonzero() can also count per axis for a multi-dimensional array by specifying the axis parameter. For a two-dimensional array, axis=0 gives the count per column and axis=1 gives the count per row, so you can count the number of elements satisfying a condition for each row and column.
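A small sketch of the per-axis counts, using a made-up 2×3 array:

```python
import numpy as np

data = np.array([[1.0,    np.nan, 3.0],
                 [np.nan, np.nan, 6.0]])

mask = ~np.isnan(data)  # True where the value is not NaN

per_column = np.count_nonzero(mask, axis=0)
per_row = np.count_nonzero(mask, axis=1)

print(per_column)  # → [1 0 2]
print(per_row)     # → [2 1]
```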

Quick-to-write alternative

Even though it is not the fastest choice, if performance is not an issue you can use:

sum(~np.isnan(data))

Performance:
In [7]: %timeit data.size - np.count_nonzero(np.isnan(data))
10 loops, best of 3: 67.5 ms per loop

In [8]: %timeit sum(~np.isnan(data))
10 loops, best of 3: 154 ms per loop

In [9]: %timeit np.sum(~np.isnan(data))
10 loops, best of 3: 140 ms per loop
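One caveat worth knowing: on a 2-D array the built-in sum iterates over the first axis, so it returns one count per column rather than a single number; np.sum reduces over all axes by default. A quick illustration with made-up data:

```python
import numpy as np

data = np.array([[1.0,    np.nan],
                 [np.nan, 4.0],
                 [5.0,    6.0]])

# Built-in sum adds the rows of the boolean mask element-wise,
# yielding an array of per-column counts:
per_column = sum(~np.isnan(data))
print(per_column)               # → [2 2]

# np.sum collapses the whole mask to a single total:
total = np.sum(~np.isnan(data))
print(total)                    # → 4
```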


To determine whether the array is sparse, it may help to compute the proportion of NaN values:

np.isnan(ndarr).sum() / ndarr.size

If that proportion exceeds a threshold, then use a sparse array, e.g. https://sparse.pydata.org/en/latest/
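A minimal sketch of that check; the 0.9 threshold and the array contents here are arbitrary example values, not recommendations:

```python
import numpy as np

# Build a mostly-NaN array: only the first 5 of 100 rows hold real data.
ndarr = np.full((100, 100), np.nan)
ndarr[:5, :] = 1.0

nan_fraction = np.isnan(ndarr).sum() / ndarr.size
print(nan_fraction)  # → 0.95

if nan_fraction > 0.9:
    print("mostly NaN; a sparse representation may pay off")
```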


An alternative, though a bit slower, is to do it via indexing:

np.isnan(data)[np.isnan(data) == False].size

In [30]: %timeit np.isnan(data)[np.isnan(data) == False].size
1 loops, best of 3: 498 ms per loop 

The double call to np.isnan(data) and the == comparison are a bit of overkill, so I posted this answer only for completeness.





Comments
  • This question appears to be off-topic because it belongs on codereview.stackexchange.com
  • You mean efficient in terms of memory?
  • +1 I was thinking about CPU time, but yeah why not memory as well. The faster and cheaper the better =)
  • @jjepsuomi A memory-efficient version will be sum(not np.isnan(x) for x in a), but in terms of speed it is slow compared to @M4rtini's numpy version.
  • @AshwiniChaudhary Thank you very much! I need to see which one is more important in my application =)
  • +1 @M4rtini thank you again! You're great! ;D I will accept your answer as soon as I can :)
  • Maybe even numpy.isnan(array).sum()? I'm not very proficient with numpy though.
  • @msvalkon, It will count the number of NaNs, while OP wants the number of non-NaN elements.
  • @goncalopp stackoverflow.com/questions/8305199/… =)
  • An extension of @msvalkon answer: data.size - np.isnan(data).sum() will be slightly more efficient.
  • This answer provides the sum, which is not the same as counting the number of elements... You should use len instead.
  • @BenT the sum of the elements of a bool array that meet a certain condition is the same as the len of a subset array with the elements that meet that condition. Can you please clarify where this is wrong?
  • My mistake, I forgot a Boolean gets returned.