## Python NumPy weighted average with NaNs

First things first: this is not a duplicate of "NumPy: calculate averages with NaNs removed". I'll explain why.

Suppose I have an array

    a = array([1,2,3,4])

and I want to average over it with the weights

    weights = [4,3,2,1]
    output = average(a, weights=weights)
    print(output)  # 2.0

OK, so this is pretty straightforward. But now I have something like this:

    a = array([1,2,nan,4])

Calculating the average with the usual method yields, of course, `nan`. Can I avoid this?
In principle I want to ignore the NaNs, so I'd like to have something like this:

    a = array([1,2,4])
    weights = [4,3,1]
    output = average(a, weights=weights)
    print(output)  # 1.75

First find the indices where the items are not `nan`, and then pass the filtered versions of `a` and `weights` to `numpy.average`:

    >>> import numpy as np
    >>> a = np.array([1,2,np.nan,4])
    >>> weights = np.array([4,3,2,1])
    >>> indices = np.where(np.logical_not(np.isnan(a)))[0]
    >>> np.average(a[indices], weights=weights[indices])
    1.75

As suggested by @mtrw in the comments, it is cleaner to use a boolean mask here instead of an index array:

    >>> indices = ~np.isnan(a)
    >>> np.average(a[indices], weights=weights[indices])
    1.75
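The boolean-mask approach can be wrapped in a small helper. This is only a sketch: the name `nan_weighted_average` is made up for illustration and is not a NumPy function, and returning `nan` when every value is NaN is an assumption about what the caller wants.

```python
import numpy as np

def nan_weighted_average(values, weights):
    """Weighted average of a 1-D array, skipping entries whose value is NaN."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    keep = ~np.isnan(values)      # True where the value is usable
    if not keep.any():            # assumption: all-NaN input yields NaN
        return np.nan
    return np.average(values[keep], weights=weights[keep])

result = nan_weighted_average([1, 2, np.nan, 4], [4, 3, 2, 1])
print(result)  # 1.75
```

The weight belonging to the NaN entry is dropped together with the value, so the remaining weights are renormalised automatically by `np.average`.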


Alternatively, you can use a MaskedArray as such:

    >>> import numpy as np
    >>> a = np.array([1,2,np.nan,4])
    >>> weights = np.array([4,3,2,1])
    >>> ma = np.ma.MaskedArray(a, mask=np.isnan(a))
    >>> np.ma.average(ma, weights=weights)
    1.75
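The masked-array route also works along an axis when the NaNs sit at different positions in each column. A minimal sketch with made-up data, assuming one weight per row:

```python
import numpy as np

# Illustrative data: NaNs fall in different rows of each column.
data = np.array([[1.0, np.nan],
                 [2.0, 5.0],
                 [np.nan, 7.0]])
weights = np.array([3.0, 2.0, 1.0])   # one weight per row

masked = np.ma.masked_invalid(data)   # masks NaN (and inf) entries
col_avg = np.ma.average(masked, axis=0, weights=weights)
print(col_avg)
```

Column 0 uses only rows 0 and 1, giving (1*3 + 2*2) / (3 + 2) = 1.4, while column 1 uses rows 1 and 2, giving (5*2 + 7*1) / (2 + 1).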


I would offer another solution, which scales better to more dimensions (e.g. when averaging over different axes). The code below works with a 2D array that may contain NaNs and takes the average over `axis=0`.

    import numpy as np

    a = np.random.randint(5, size=(3,2)).astype(float)  # let's generate some random 2D array
    a[0, 1] = np.nan  # add a NaN so the example exercises the NaN handling

    # make weights matrix with zero weights at nan's in a
    w_vec = np.arange(1, a.shape[0]+1)
    w_vec = w_vec.reshape(-1, 1)
    w_mtx = np.repeat(w_vec, a.shape[1], axis=1)
    w_mtx *= (~np.isnan(a))

    # take average as (weighted_elements_sum / weights_sum)
    w_a = a * w_mtx
    a_sum_vec = np.nansum(w_a, axis=0)
    w_sum_vec = np.nansum(w_mtx, axis=0)
    mean_vec = a_sum_vec / w_sum_vec
    # mean_vec is vector with weighted nan-averages of array a taken along axis=0
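The same idea (zero the weights at NaN positions, then divide the weighted sum by the weight sum) can be written for an arbitrary axis using broadcasting. This is a sketch under assumptions: `nan_weighted_average_axis` is a hypothetical helper name, the weights are 1-D with one entry per element along `axis`, and slices that are entirely NaN are assumed to produce `nan`.

```python
import numpy as np

def nan_weighted_average_axis(a, weights, axis=0):
    """Weighted average along `axis`, ignoring NaNs in `a`."""
    a = np.asarray(a, dtype=float)
    w = np.asarray(weights, dtype=float)

    # Reshape the 1-D weights so they broadcast along `axis`.
    shape = [1] * a.ndim
    shape[axis] = -1
    w = np.broadcast_to(w.reshape(shape), a.shape)

    valid = ~np.isnan(a)
    w = np.where(valid, w, 0.0)                    # zero weight at NaN positions
    num = np.sum(np.where(valid, a, 0.0) * w, axis=axis)
    den = np.sum(w, axis=axis)

    # Slices with no valid weight keep the NaN fill instead of dividing by zero.
    out = np.full(den.shape, np.nan)
    np.divide(num, den, out=out, where=den != 0)
    return out

data = np.array([[1.0, np.nan, np.nan],
                 [2.0, 5.0, np.nan],
                 [np.nan, 7.0, np.nan]])
col_avg = nan_weighted_average_axis(data, [3, 2, 1], axis=0)
print(col_avg)
```

The first two columns give 1.4 and 17/3, and the all-NaN third column stays `nan` without triggering a division warning.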


Expanding on @Ashwini and @Nicolas' answers, here is a version that can also handle an edge case where all the data values are np.nan, and that is designed to also work with pandas DataFrame without type-related issues:

    from typing import List, Union

    import numpy as np
    import pandas as pd


    def calc_wa_ignore_nan(df: pd.DataFrame, measures: List[str],
                           weights: List[Union[float, int]]) -> np.ndarray:
        """ Calculates the weighted average of `measures`' values, ex-nans.

        When nans are present in `measures`' values,
        the weights are recalculated based only on the weights for non-nan measures.

        Note:
            The calculation used is NOT the same as just ignoring nans.
            For example, if we had data and weights:
                data = [2, 3, np.nan]
                weights = [0.5, 0.2, 0.3]
            calc_wa_ignore_nan approach:
                (2*(0.5/(0.5+0.2))) + (3*(0.2/(0.5+0.2))) == 2.285714285714286
            The ignoring nans approach:
                (2*0.5) + (3*0.2) == 1.6

        Args:
            df: Multiple rows of numeric data values with `measures` as column headers.
            measures: The str names of values to select from `df`.
            weights: The numeric weights associated with `measures`.

        Example:
            >>> df = pd.DataFrame({"meas1": [1, 1],
            ...                    "meas2": [2, 2],
            ...                    "meas3": [3, 3],
            ...                    "meas4": [np.nan, 0],
            ...                    "meas5": [5, 5]})
            >>> measures = ["meas2", "meas3", "meas4"]
            >>> weights = [0.5, 0.2, 0.3]
            >>> calc_wa_ignore_nan(df, measures, weights)
            array([2.28571429, 1.6])
        """
        assert not df.empty, "Nothing to calculate weighted average for: `df` is empty."

        # Need to coerce type to np.float64 instead of python's float
        # to avoid "ufunc 'isnan' not supported for the input types ..." error
        data = np.array(df[measures].values, dtype=np.float64)

        # Make a 2d array with the same weights for each row
        # cast for safety and better errors
        weights = np.array([weights, ] * data.shape[0], dtype=np.float64)

        mask = np.isnan(data)
        masked_data = np.ma.masked_array(data, mask=mask)
        masked_weights = np.ma.masked_array(weights, mask=mask)

        # np.nanmean doesn't support weights
        weighted_avgs = np.average(masked_data, weights=masked_weights, axis=1)

        # Replace masked elements with np.nan
        # otherwise those elements will be interpreted as 0 when read into a pd.DataFrame
        weighted_avgs = weighted_avgs.filled(np.nan)

        return weighted_avgs



##### Comments

- +1, though I think `indices = ~np.isnan(a)` looks nicer (and for huge `a` might be faster).
- @mtrw That certainly looks better, will update my answer. Thanks.
- Another alternative is to use `np.nan_to_num(arr)` before doing average. This will replace any NaN with 0.
- @TirthaR The zeros you'd obtain this way would distort the result.
- A problem here is that the method alters the size of the array, so any operation downstream of this and depending on size would need to be corrected accordingly. In such cases it sounds much better to use the masking method proposed by @Nicolas Barbey below.
- This is the best solution because it can be used with `axis` parameter to evaluate the weighted average over several columns where the NaNs do not have the same indices over each column.
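To make the `np.nan_to_num` objection from the comments concrete, here is a small sketch using the array from the question. Zero-filling alone leaves the NaN's weight in the denominator; zeroing that weight as well (an extra step not proposed in the comments) recovers the expected value while keeping the array shape unchanged.

```python
import numpy as np

a = np.array([1, 2, np.nan, 4])
weights = np.array([4, 3, 2, 1])

# Zero-filling keeps the NaN's weight (2) in the denominator: 14 / 10.
distorted = np.average(np.nan_to_num(a), weights=weights)

# Zeroing the matching weight removes the entry from both sums: 14 / 8.
adjusted_weights = np.where(np.isnan(a), 0, weights)
corrected = np.average(np.nan_to_num(a), weights=adjusted_weights)

print(distorted, corrected)  # 1.4 1.75
```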