I couldn't summarize my question well in the title. In one part of my code I need to compute the following:

Let's say we have a vector (e.g. a numpy array):

a = [3.2, 4, 7, 2, 8, 9, 7, 1.7, 2, 8, 9, 1, 3]

We want to turn any number greater than 5 to 5:

a = [3.2, 4, 5, 2, 5, 5, 5, 1.7, 2, 5, 5, 1, 3]

Then we compute the sum of consecutive 5s and the number that follows them and replace all these elements with the resulting sum:

a = [3.2, 4, 5+ 2, 5+ 5+ 5+ 1.7, 2, 5+ 5+ 1, 3]

so the resulting array would be:

a = [3.2, 4, 7, 16.7, 2, 11, 3]

I can do this using a for loop like this:

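    indx = np.where(a > 5)[0]
    a[indx] = 5
    counter = 0
    c = []  # collects the merged sums
    while counter < len(a):
        elem = a[counter]
        temp = 0
        # absorb a run of consecutive 5s, stopping at the last element
        while elem == 5 and counter < len(a) - 1:
            temp += elem
            counter += 1
            elem = a[counter]
        temp += elem  # add the number that follows the run
        c.append(temp)
        counter += 1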

Is there a way to avoid the explicit loop? Perhaps by using the indx variable?

I have a vague idea: turn the array into a string, a = '[3.2, 4, 5, 2, 5, 5, 5, 1.7, 2, 5, 5, 1, 3]', replace every ' 5,' with ' 5+', and then call eval(a). However, is there an efficient way to find all indices containing a substring? And what about the fact that strings are immutable?

You can use pandas: build a group key with cumsum and shift so that each run of 5s lands in the same group as the number that follows it, then aggregate each group with sum.

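import pandas as pd

df = pd.DataFrame(a, columns=['col1'])
df.loc[df.col1 > 5, 'col1'] = 5  # cap values at 5
# group key: increases by one after every value that is not 5
s = df.col1.groupby((df.col1 != 5).cumsum().shift().fillna(0)).sum()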

0.0     3.2
1.0     4.0
2.0     7.0
3.0    16.7
4.0     2.0
5.0    11.0
6.0     3.0

To get a NumPy array back, use .values:

>>> s.values
array([  3.2,   4. ,   7. ,  16.7,   2. ,  11. ,   3. ])
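To see why the cumsum/shift key works, it may help to look at the intermediate values. The following is only a sketch of the same idea on the question's data; the names col, ends, and key are mine and not part of the answer above, and clip(upper=5) is used in place of the .loc assignment.

```python
import pandas as pd

a = [3.2, 4, 7, 2, 8, 9, 7, 1.7, 2, 8, 9, 1, 3]

col = pd.Series(a).clip(upper=5)   # cap values at 5
ends = col != 5                    # True where a group ends (any value that is not 5)

# cumsum() counts the group ends seen so far; shift() moves that count one
# position down so a group end keeps the same key as the 5s before it.
key = ends.cumsum().shift().fillna(0).astype(int)
print(key.tolist())

result = col.groupby(key).sum()
print(result.values)
```

Positions that share a key are summed together, so each run of 5s and the value that closes it collapse into one number.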

This is what you want (all in vectorized numpy):

import numpy as np

a = np.array([0, 3.2, 4, 7, 2, 8, 9, 7, 1.7, 2, 8, 9, 1, 3, 0]) # pad with a 0 at the beginning and the end
aa = np.where(a>5, 5, a) # cap values at 5, can use np.clip(a, None, 5) too...
c = np.cumsum(aa) # get cumulative sum
np.diff(c[aa < 5]) # keep the cumulative sum where the capped value is below 5, then take differences

array([ 3.2,  4. ,  7. , 16.7,  2. , 11. ,  3. ,  0. ])
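If the padding zeros are a nuisance, one possible variant is to compute where each group starts and let np.add.reduceat do the summing. This is a sketch rather than a drop-in replacement: the function name sum_runs and the cap parameter are hypothetical, and it assumes a 1D input in which a value equal to the cap counts as part of a run.

```python
import numpy as np

def sum_runs(a, cap=5):
    """Cap values at `cap`, then merge each run of capped values with the value after it."""
    aa = np.minimum(np.asarray(a, dtype=float), cap)
    if aa.size == 0:
        return aa
    # every value below the cap closes a group
    ends = np.flatnonzero(aa < cap)
    # a group starts at index 0 and right after each group end
    starts = np.concatenate(([0], ends + 1))
    # drop a start that would fall past the end of the array
    starts = starts[starts < aa.size]
    # sum aa[starts[i]:starts[i+1]]; the last group runs to the end of the array
    return np.add.reduceat(aa, starts)

a = [3.2, 4, 7, 2, 8, 9, 7, 1.7, 2, 8, 9, 1, 3]
print(sum_runs(a))  # approximately [3.2, 4, 7, 16.7, 2, 11, 3]
```

Because the last group simply runs to the end of the array, an input that ends in a run of 5s yields the sum of that run, and no trailing zero appears when the last value is below 5.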

  • in numpy you can use np.where(a>=5) to get the indices
  • Thank you! Can we also use this method if "a" is a 2D array and we want to compute this for individual rows and fillna with np.inf?
  • Your answer fails for, e.g. arr = [3.2, 4, 7, 2, 8, 9, 7, 1.7, 2, 8, 9, 1, 3,5,2,5,5,5,5,5,5,5,5,5,5,10] or if arr finishes with 5
  • Is this the only case that it fails? What if we also add a "0" at the end of the array?
  • I would use numpy.clip to clip.
  • Note that the final 0 will show if your last value was <5, so this might need an extra check if you need it gone... Or maybe there is a cleaner less hacky way I didn't think of...
  • Thanks but I'm not sure this is much different. I'm looking for a loop-free solution if it exists.
  • I think any solution will require at least one pass over the array. Even if you were only looking to accomplish the first task, converting any number greater than 5 to 5, you would still need to evaluate each item once.