Speed Up Nested For Loops with NumPy

I'm trying to solve a dynamic programming problem, and I came up with a simple loop-based algorithm which fills in a 2D array based on a series of if statements like this:

import numpy

s = "abbca"  # example value; any string of size n
n = len(s)
opt = numpy.zeros(shape=(n, n))

for j in range(0, n):
    for i in range(j, -1, -1):
        if j - i == 0:
            opt[i, j] = 1
        elif j - i == 1:
            opt[i, j] = 2 if s[i] == s[j] else 1
        elif s[i] == s[j] and opt[i + 1, j - 1] == (j - 1) - (i + 1) + 1:
            opt[i, j] = 2 + opt[i + 1, j - 1]
        else:
            opt[i, j] = max(opt[i + 1, j], opt[i, j - 1], opt[i + 1, j - 1])

Unfortunately, this code is extremely slow for large values of n. I found that it is much better to use built-in functions such as numpy.where and numpy.ndarray.fill to fill in the values of the array instead of for loops, but I'm struggling to find any examples that explain how these functions (or other optimized numpy methods) can be made to work with a series of if statements, as my algorithm does. What would be an appropriate way to rewrite the above code with built-in numpy functions so that it runs faster?

I don't think that np.where and ndarray.fill can solve your problem. np.where selects elements (or their indices) according to a condition that is evaluated over a whole array at once. In your case, the conditions depend on the loop variables i and j, and on entries of opt that were filled in by earlier iterations, so they cannot all be evaluated up front.
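For reference, here is a minimal sketch of the two forms of np.where on a fixed array. The array a and the names idx and clipped are made up for the illustration; the point is only that the condition is computed over the whole array in one step, which is why it does not fit a loop whose conditions read values written in earlier iterations.

```python
import numpy as np

a = np.array([3, -1, 4, -1, 5])

# One-argument form: returns the indices where the condition holds.
idx = np.where(a < 0)[0]

# Three-argument form: element-wise choice between two alternatives.
clipped = np.where(a < 0, 0, a)

print(idx)
print(clipped)
```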

For your particular question, I would recommend using Cython to optimize your code, especially for larger values of n. Cython is basically an interface between Python and C. The beauty of Cython is that it lets you keep your Python syntax while optimizing it with C constructs: you can declare the types of variables in a C-like manner to speed up your computations. For example, declaring i and j as C integers will speed things up considerably, because in plain Python the types of i and j are checked at every loop iteration.

Also, Cython will allow you to define classic, fast, 2D arrays using C. You can then use pointers for fast element access to this 2D array instead of using numpy arrays. In your case, opt will be that 2D array.

Your if statements and the right-hand sides of your assignment statements contain references to the array that you're modifying in the loop, so each cell can depend on cells computed in earlier iterations. This means that there will be no general way to translate your loop into array operations. So you're stuck with some kind of for loop.
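That said, the dependencies in the question's loop have a regular shape: a cell on diagonal d = j - i only reads cells on diagonals d - 1 and d - 2. One possible middle ground, sketched here, keeps a single Python loop over d and updates each whole diagonal with array operations. This is an illustration and not something proposed in the answers; the function names fill_loop and fill_by_diagonals are made up for the example, and fill_loop is just the question's code wrapped in a function (with an integer dtype) so the two can be compared.

```python
import numpy as np

def fill_loop(s):
    """The question's nested loop, wrapped in a function for comparison."""
    n = len(s)
    opt = np.zeros((n, n), dtype=np.int64)
    for j in range(n):
        for i in range(j, -1, -1):
            if j - i == 0:
                opt[i, j] = 1
            elif j - i == 1:
                opt[i, j] = 2 if s[i] == s[j] else 1
            elif s[i] == s[j] and opt[i + 1, j - 1] == (j - 1) - (i + 1) + 1:
                opt[i, j] = 2 + opt[i + 1, j - 1]
            else:
                opt[i, j] = max(opt[i + 1, j], opt[i, j - 1], opt[i + 1, j - 1])
    return opt

def fill_by_diagonals(s):
    """Hypothetical variant: one Python loop over diagonals d = j - i."""
    n = len(s)
    opt = np.zeros((n, n), dtype=np.int64)
    if n == 0:
        return opt
    chars = np.array(list(s))
    idx = np.arange(n)
    opt[idx, idx] = 1                      # d == 0
    if n > 1:                              # d == 1
        i = np.arange(n - 1)
        opt[i, i + 1] = np.where(chars[i] == chars[i + 1], 2, 1)
    for d in range(2, n):                  # d >= 2 reads diagonals d-1 and d-2
        i = np.arange(n - d)
        j = i + d
        inner = opt[i + 1, j - 1]
        extends = (chars[i] == chars[j]) & (inner == d - 1)
        best = np.maximum(np.maximum(opt[i + 1, j], opt[i, j - 1]), inner)
        opt[i, j] = np.where(extends, inner + 2, best)
    return opt

print(np.array_equal(fill_loop("abbca"), fill_by_diagonals("abbca")))
```

This replaces roughly n*n/2 Python-level iterations with n of them, each doing array work of decreasing size. Whether that is a net win for a given n is something to measure rather than assume.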

If you instead had the simpler loop:

for j in range(0, n):
    for i in range(j, -1, -1):
        if j - i == 0:
            opt[i, j] = 1
        elif j - i == 1:
            opt[i, j] = 2
        elif s[i] == s[j]:
            opt[i, j] = 3
        else:
            opt[i, j] = 4

you could construct boolean arrays (using some broadcasting) that represent your three conditions:

import numpy as np

# get arrays i and j that represent the row and column indices
i,j = np.ogrid[:n, :n]
# construct an array with the characters from s
sarr = np.fromiter(s, dtype='U1').reshape(1, -1)

cond1 = i==j             # result will be a bool arr with True wherever row index equals column index
cond2 = j==i+1           # result will be a bool arr with True wherever col index equals (row index + 1)
cond3 = sarr==sarr.T     # result will be a bool arr with True wherever s[i]==s[j]

You could then use numpy.select to construct your desired opt:

opt = np.select([cond1, cond2, cond3], [1, 2, 3], default=4)

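For n=5 and s='abbca', this would yield the array below. Note that, unlike the loop, which only visits cells with i <= j, this fills in the whole array, including the lower triangle: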

array([[1, 2, 4, 4, 3],
       [4, 1, 2, 4, 4],
       [4, 3, 1, 2, 4],
       [4, 4, 4, 1, 2],
       [3, 4, 4, 4, 1]])
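Putting the pieces together, the simplified example can be run end to end roughly as follows. This is a sketch under two assumptions of mine: the character array is built with np.array(list(s)) rather than np.fromiter, and np.triu is used at the end to zero out the lower triangle so the result matches what the loop would leave in an array initialised with zeros.

```python
import numpy as np

s = "abbca"
n = len(s)

# Open grids of row indices (n x 1) and column indices (1 x n).
i, j = np.ogrid[:n, :n]
# Characters of s as a 1 x n array, so comparing with its transpose broadcasts to n x n.
sarr = np.array(list(s)).reshape(1, -1)

cond1 = i == j            # main diagonal
cond2 = j == i + 1        # first superdiagonal
cond3 = sarr == sarr.T    # s[i] == s[j]

# The first matching condition wins, mirroring the if/elif chain.
opt = np.select([cond1, cond2, cond3], [1, 2, 3], default=4)

# The loop only assigns cells with i <= j; mask the rest to match it.
upper = np.triu(opt)
print(upper)
```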

Another option is numba. It is quite easy to apply if your code uses numpy, and it is particularly effective when the code has a lot of loops. If the function is set up correctly, i.e. it uses only loops and basic numpy functions, simply adding the @njit decorator flags it for compilation by numba, which is typically rewarded with an increase in speed.
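As a rough sketch of what that could look like for the question's loop: the function name fill_opt is made up, the string is converted to an integer array of character codes because that is a type numba handles without fuss, and the try/except fallback (my addition) turns njit into a no-op so the snippet still runs, uncompiled, when numba is not installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Fallback so the example runs without numba (no speed-up in that case).
    def njit(func):
        return func

@njit
def fill_opt(codes):
    """The question's loop over an int array of character codes."""
    n = codes.shape[0]
    opt = np.zeros((n, n), dtype=np.int64)
    for j in range(n):
        for i in range(j, -1, -1):
            if j - i == 0:
                opt[i, j] = 1
            elif j - i == 1:
                opt[i, j] = 2 if codes[i] == codes[j] else 1
            elif codes[i] == codes[j] and opt[i + 1, j - 1] == j - i - 1:
                opt[i, j] = 2 + opt[i + 1, j - 1]
            else:
                opt[i, j] = max(opt[i + 1, j], opt[i, j - 1], opt[i + 1, j - 1])
    return opt

s = "abbca"
codes = np.array([ord(c) for c in s], dtype=np.int64)
opt = fill_opt(codes)
print(opt)
```

The first call includes compilation time, so any timing comparison should use a second call on a realistically large input.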
