How can I reduce a numpy array based on a key rather than an axis?

I have a numpy array with 2 columns. The second column represents the keys that I want to reduce on.

>>> x
array([[0.1 , 1.  ],
       [0.25, 1.  ],
       [0.45, 0.  ],
       [0.55, 0.  ]])

I want to sum up all the values which share a key, like this.

>>> sum_key(x)
array([[0.35, 1.  ],
       [1.  , 0.  ]])

This seems like a relatively universal task, but I can't find a good name for it or see it discussed. Any ideas?

This is a bit roundabout, but it does the job:

import numpy as np
x = np.array([[0.1 , 1.  ],
       [0.25, 1.  ],
       [0.45, 0.  ],
       [0.55, 0.  ]])
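keys = x[:, 1]
values = x[:, 0]
# np.unique returns the distinct keys in sorted order
keys_unique = np.unique(keys)
# For each key, select its values with a boolean mask and sum them
print([[sum(values[keys == k]), k] for k in keys_unique])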

Output:

[[1.0, 0.0], [0.35, 1.0]]
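
The loop above makes one pass over the data per distinct key. For larger arrays, one possible alternative is to sort the rows by key and sum each run of equal keys in a single call. The following is only a sketch: the name sum_key is taken from the question, the function assumes a non-empty 2-column array, and the rows come back sorted by key rather than in order of first appearance.

```python
import numpy as np

def sum_key(x):
    """Sum column 0 of x within each distinct value of column 1.

    Assumes x is a non-empty 2-D array with values in column 0
    and keys in column 1.
    """
    # Stable sort by key so that rows sharing a key become contiguous.
    order = np.argsort(x[:, 1], kind="stable")
    keys = x[order, 1]
    values = x[order, 0]
    # Positions where a new run of equal keys starts.
    starts = np.flatnonzero(np.r_[True, keys[1:] != keys[:-1]])
    # Sum every run in one call.
    sums = np.add.reduceat(values, starts)
    return np.column_stack((sums, keys[starts]))

x = np.array([[0.1, 1.0],
              [0.25, 1.0],
              [0.45, 0.0],
              [0.55, 0.0]])
result = sum_key(x)
print(result)
```

The result has the same layout as in the question, with the sum in the first column and the key in the second.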

import numpy as np
import pandas as pd

data = np.array([[0.1 , 1.  ],
       [0.25, 1.  ],
       [0.45, 0.  ],
       [0.55, 0.  ]])

df = pd.DataFrame(data)

gr = df.groupby([1])[0].agg('sum')

print(gr.keys().values)

data1 = np.array([[gr[k],k] for k in gr.keys().values])
print(data1)

If the keys are non-negative integers (or can easily be cast to integers, as in your case), the most convenient way is to use np.bincount with the values passed as weights.

import numpy as np

x = np.array([[0.1 , 1.  ],
             [0.25, 1.  ],
             [0.45, 0.  ],
             [0.55, 0.  ]])

v = x[:, 0]
i = x[:, 1]

counts = np.bincount(i.astype(int), v)

print(counts)

# returns [1.   0.35]
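
np.bincount returns one entry for every integer from 0 up to the largest key, so position i of the result holds the sum for key i. To get back the two-column layout from the question, the sums can be paired with their keys. This is a sketch under the same assumption as above (keys are non-negative whole numbers); the variable names are only illustrative.

```python
import numpy as np

x = np.array([[0.1, 1.0],
              [0.25, 1.0],
              [0.45, 0.0],
              [0.55, 0.0]])

# Assumes the keys are non-negative whole numbers stored as floats.
keys = x[:, 1].astype(int)
sums = np.bincount(keys, weights=x[:, 0])

# Integers between 0 and keys.max() that never occur as a key get a
# sum of 0.0, so keep only the keys that are actually present.
present = np.bincount(keys) > 0
result = np.column_stack((sums[present], np.flatnonzero(present)))
print(result)
```

Filtering on the unweighted counts rather than on the sums keeps keys whose values happen to add up to zero.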

Comments
  • Great, this seems to work! I'm interested in using this for somewhat large arrays, O(10^6) elements, so hopefully it scales reasonably.
  • This solution is a bit complicated and introduces an unnecessary dependency on pandas. Is there any advantage over ExplodingGayFish's answer?
  • The keys are integers, but they may be negative as well. This solution works, but it will require a little extra gymnastics to keep track of the correspondence with the keys.
  • Yes, but it is probably faster than other approaches. You need to check, maybe it is worth the little extra gymnastics.
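
Regarding the negative keys mentioned in the comments: one way to keep track of the correspondence is to let np.unique map each key to a position 0..n-1 via return_inverse and then run np.bincount on those positions. This is only a sketch, and the data below is made up for illustration.

```python
import numpy as np

# Made-up example with a negative key.
x = np.array([[0.1, -3.0],
              [0.25, -3.0],
              [0.45, 7.0],
              [0.55, 7.0]])

# unique_keys holds the sorted distinct keys; inverse holds, for each
# row, the position of its key within unique_keys.
unique_keys, inverse = np.unique(x[:, 1], return_inverse=True)

# The positions are non-negative integers, so bincount can sum on them.
sums = np.bincount(inverse, weights=x[:, 0])

result = np.column_stack((sums, unique_keys))
print(result)
```

Because the keys are never used as array indices directly, this should also work for non-integer keys, at the cost of the sort that np.unique performs.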