How to make this for loop faster?

I know that Python loops themselves are relatively slow compared to other languages, but when the right functions are used they become much faster. I have a pandas DataFrame called "acoustics" which contains over 10 million rows:

print(acoustics)
                        timestamp            c0  rowIndex
0        2016-01-01T00:00:12.000Z  13931.500000   8158791
1        2016-01-01T00:00:30.000Z  14084.099609   8158792
2        2016-01-01T00:00:48.000Z  13603.400391   8158793
3        2016-01-01T00:01:06.000Z  13977.299805   8158794
4        2016-01-01T00:01:24.000Z  13611.000000   8158795
5        2016-01-01T00:02:18.000Z  13695.000000   8158796
6        2016-01-01T00:02:36.000Z  13809.400391   8158797
7        2016-01-01T00:02:54.000Z  13756.000000   8158798

And here is the code I wrote:

import numpy as np
import pandas as pd

acoustics = pd.read_csv("AccousticSandDetector.csv", skiprows=[1])
weights = [1/9, 1/18, 1/27, 1/36, 1/54]
sumWeights = np.sum(weights)
deltaAc = []
for i in range(5, len(acoustics)):
    time = acoustics.iloc[i]['timestamp']
    sum = 0
    for c in range(5):
        sum += (weights[c]/sumWeights)*(acoustics.iloc[i]['c0']-acoustics.iloc[i-c]['c0'])
    print("Row " + str(i) + " of " + str(len(acoustics)) + " is iterated")
    deltaAc.append([time, sum])

deltaAc = pd.DataFrame(deltaAc)

It takes a huge amount of time. How can I make it faster?
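Written as a formula, with x_i the c0 value in row i and w_0, ..., w_4 the five weights in list order, the loop computes the following for every row i >= 5:

```latex
\Delta_i = \sum_{c=0}^{4} \frac{w_c}{W}\,\bigl(x_i - x_{i-c}\bigr),
\qquad
W = \sum_{c=0}^{4} w_c = \tfrac{1}{9} + \tfrac{1}{18} + \tfrac{1}{27} + \tfrac{1}{36} + \tfrac{1}{54} = \tfrac{1}{4}
```

Each output row depends only on c0 at five fixed offsets, which is why the whole column can be computed from shifted (or diffed) copies of c0 instead of row by row. Note also that the c = 0 term, x_i - x_i, is always zero.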

You can use diff from pandas to compute all the differences for each row at once in an array, then multiply by your weights and finally sum along axis 1:

deltaAc = pd.DataFrame({'timestamp': acoustics.loc[5:, 'timestamp'], 
                       'summation': (np.array([acoustics.c0.diff(i) for i in range(5) ]).T[5:]
                                               *np.array(weights)).sum(1)/sumWeights})

and you get the same values as with your code:

print (deltaAc)
                  timestamp  summation
5  2016-01-01T00:02:18.000Z -41.799986
6  2016-01-01T00:02:36.000Z  51.418728
7  2016-01-01T00:02:54.000Z  -3.111184
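A self-contained check of that claim on the eight sample rows from the question: it runs the original loop (without the print) and the diff-based expression side by side and compares them. The variable names loop_values and vectorized_values are only for illustration:

```python
import numpy as np
import pandas as pd

# The eight sample rows printed in the question
acoustics = pd.DataFrame({
    "timestamp": ["2016-01-01T00:00:12.000Z", "2016-01-01T00:00:30.000Z",
                  "2016-01-01T00:00:48.000Z", "2016-01-01T00:01:06.000Z",
                  "2016-01-01T00:01:24.000Z", "2016-01-01T00:02:18.000Z",
                  "2016-01-01T00:02:36.000Z", "2016-01-01T00:02:54.000Z"],
    "c0": [13931.5, 14084.099609, 13603.400391, 13977.299805,
           13611.0, 13695.0, 13809.400391, 13756.0],
})
weights = [1/9, 1/18, 1/27, 1/36, 1/54]
sumWeights = np.sum(weights)

# Reference: the question's loop, minus the per-row print
loop_values = []
for i in range(5, len(acoustics)):
    total = 0.0
    for c in range(5):
        total += (weights[c] / sumWeights) * (
            acoustics.iloc[i]["c0"] - acoustics.iloc[i - c]["c0"])
    loop_values.append(total)

# Vectorized: after .T, column c holds c0[i] - c0[i - c] for every row i
diffs = np.array([acoustics.c0.diff(c) for c in range(5)]).T[5:]
vectorized_values = (diffs * np.array(weights)).sum(axis=1) / sumWeights

print(np.allclose(loop_values, vectorized_values))
```

The diff version touches each column once per lag (five passes over 10 million values), instead of building several row objects with iloc for every single row.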

It is important to realize that everything you put in a loop gets executed on every iteration, so the key to optimizing loops is to minimize what they do. Even operations that appear to be very fast will take a long time if they are repeated many times: an operation that takes 1 microsecond, executed a million times, takes 1 second to complete.
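In the question's loop, the per-row print call is exactly such a repeated operation. A minimal sketch of the suggestion from the comments (print only every 100k rows or so); the function name and the report_every parameter are illustrative, and the loop body is a placeholder for the real per-row work:

```python
def loop_with_sparse_progress(n_rows, report_every=100_000):
    """Placeholder loop over rows 5..n_rows-1 that reports progress only occasionally.

    loop_with_sparse_progress and report_every are hypothetical names, not from the question.
    """
    printed = 0
    for i in range(5, n_rows):
        # ... the real per-row work would go here ...
        # Print roughly once per report_every rows instead of once per row
        if i % report_every == 0:
            print(f"Row {i} of {n_rows} is iterated")
            printed += 1
    return printed


# With 1,000,000 rows this prints 9 progress lines (rows 100000, ..., 900000)
# instead of 999,995.
print(loop_with_sparse_progress(1_000_000))
```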

First optimization: weights[c]/sumWeights does not depend on i, so it can be computed once, outside the loop.

import numpy as np

weights_array = np.array([1/9, 1/18, 1/27, 1/36, 1/54])
sumWeights = np.sum(weights_array)
tmp = weights_array / sumWeights  # normalized weights, computed only once
...
        sum += tmp[c]*...
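A minimal runnable sketch of that first optimization, assuming the readings are already in a plain list or 1D array (the helper name weighted_deltas is hypothetical). The normalized weights are computed once, and rows are read by position instead of through acoustics.iloc[i][...], which would build a new row Series on every access:

```python
import numpy as np


def weighted_deltas(c0, weights):
    """The question's weighted-difference loop with the normalization hoisted out.

    weighted_deltas is a hypothetical helper name; c0 is a 1D sequence of readings.
    """
    weights_array = np.asarray(weights, dtype=float)
    tmp = weights_array / weights_array.sum()  # computed once, not on every iteration
    result = []
    for i in range(5, len(c0)):
        total = 0.0
        for c in range(5):
            total += tmp[c] * (c0[i] - c0[i - c])
        result.append(total)
    return result


# The eight c0 readings printed in the question
c0 = [13931.5, 14084.099609, 13603.400391, 13977.299805,
      13611.0, 13695.0, 13809.400391, 13756.0]
deltas = weighted_deltas(c0, [1/9, 1/18, 1/27, 1/36, 1/54])
print(deltas)
```

This is still a Python-level loop, so it only helps by a constant factor; the array-based answers remove the row loop altogether.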

I'm not familiar with pandas, but if you can extract your columns as 1D NumPy arrays, you can replace the loop over rows with whole-array operations. It might look something like:

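import numpy as np

# Extract each column as a 1D NumPy array: acoustics['c0'], not acoustics[['c0']],
# which would give a 2D array of shape (n, 1)
c0_column = acoustics['c0'].to_numpy()
time_column = acoustics['timestamp'].to_numpy()
...
n = len(acoustics)
total = np.zeros(n - 5)  # one entry per output row i = 5 .. n-1
for c in range(5):
    # c0_column[5:] is row i and c0_column[5-c:n-c] is row i-c, for all i at once
    total += tmp[c]*(c0_column[5:] - c0_column[5-c:n-c])

deltaAc = []
for i in range(n - 5):
    deltaAc.append([time_column[5+i], total[i]])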
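The final append loop can also go, since the timestamps and sums are already aligned arrays. A minimal self-contained sketch of the slicing idea, assuming the 'timestamp' and 'c0' columns from the question; weighted_deltas_sliced is a hypothetical helper name, and taking the window length from len(weights) is a generalization of the fixed 5:

```python
import numpy as np
import pandas as pd


def weighted_deltas_sliced(acoustics, weights):
    """Vectorized weighted differences using shifted slices of the c0 column.

    Hypothetical helper; assumes 'timestamp' and 'c0' columns as in the question.
    """
    window = len(weights)  # 5 in the question
    c0 = acoustics["c0"].to_numpy()  # 1D array, shape (n,)
    times = acoustics["timestamp"].to_numpy()
    n = len(c0)
    tmp = np.asarray(weights, dtype=float)
    tmp = tmp / tmp.sum()
    total = np.zeros(n - window)
    for c in range(window):
        # c0[window:] holds row i, c0[window - c:n - c] holds row i - c, for every i >= window
        total += tmp[c] * (c0[window:] - c0[window - c:n - c])
    # Build the result in one step instead of appending row by row
    return pd.DataFrame({"timestamp": times[window:], "deltaAc": total})


# The eight sample rows printed in the question
acoustics = pd.DataFrame({
    "timestamp": ["2016-01-01T00:00:12.000Z", "2016-01-01T00:00:30.000Z",
                  "2016-01-01T00:00:48.000Z", "2016-01-01T00:01:06.000Z",
                  "2016-01-01T00:01:24.000Z", "2016-01-01T00:02:18.000Z",
                  "2016-01-01T00:02:36.000Z", "2016-01-01T00:02:54.000Z"],
    "c0": [13931.5, 14084.099609, 13603.400391, 13977.299805,
           13611.0, 13695.0, 13809.400391, 13756.0],
})
result = weighted_deltas_sliced(acoustics, [1/9, 1/18, 1/27, 1/36, 1/54])
print(result)
```

The remaining loop runs only five times (once per lag), each iteration doing one whole-array subtraction and multiply-add.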

DataFrames have a rolling method for constructing and applying window transformations, so you don't need loops at all:

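import numpy as np

# df is your data frame
window_size = 5
weights = np.array([1/9, 1/18, 1/27, 1/36, 1/54])
weights /= weights.sum()
# With raw=True each window x is a plain ndarray [c0[i-4], ..., c0[i]], so x[-1] is the
# current row; the weights are reversed so that weights[c] multiplies c0[i] - c0[i-c]
df.loc[:, 'deltaAc'] = df.loc[:, 'c0'].rolling(window_size).apply(
    lambda x: ((x[-1] - x)*weights[::-1]).sum(), raw=True)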
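A self-contained check of the rolling approach on the eight sample rows from the question. It passes raw=True so each window arrives as a NumPy array (with the default raw=False the window is a Series, and x[-1] is treated as a label lookup, which typically fails), and it reverses the weights because the window is ordered oldest to newest. These two adjustments are a likely explanation for the rolling results not matching the loop, but treat them as assumptions to verify on your full data:

```python
import numpy as np
import pandas as pd

# The eight c0 readings printed in the question
df = pd.DataFrame({"c0": [13931.5, 14084.099609, 13603.400391, 13977.299805,
                          13611.0, 13695.0, 13809.400391, 13756.0]})

weights = np.array([1/9, 1/18, 1/27, 1/36, 1/54])
weights /= weights.sum()


def weighted_delta(x):
    """x is the window [c0[i-4], ..., c0[i]] as a plain ndarray (raw=True)."""
    # Reversed weights pair weights[c] with c0[i] - c0[i-c]
    return ((x[-1] - x) * weights[::-1]).sum()


df["deltaAc"] = df["c0"].rolling(5).apply(weighted_delta, raw=True)

# Rows 0-3 are NaN (incomplete window). Row 4 gets a value that the question's
# loop never computes, so start at row 5 to compare like with like.
print(df["deltaAc"].iloc[5:].tolist())
```

rolling(...).apply still calls the Python function once per window, so on 10 million rows it is likely to be slower than the diff-based answer, which works on whole columns at once.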

Comments
  • Well, removing the print would be a good first step.
  • @SuperStew I want to track the progress
  • What version of python are you running? And how long is acoustics?
  • @RehimAlizadeh Then I would only print every 100k rows or so.
  • Normally when using pandas you don't want to use for loops. It looks like you are making a new series based on some criteria from the rest of the DataFrame; is that right? Here is an example of using pandas without having to use for loops.
  • Thanks for your solution, but the results are not the same as with the code I wrote.
  • @RehimAlizadeh Indeed, I had a typo (df instead of acoustics) in my code, but otherwise it gives the same values in the summation column as your code.
  • @RehimAlizadeh So the diff method with a value i calculates, in one go, the difference between each row and the row i positions before it. The list comprehension inside np.array creates an array in which each row holds the lag-i differences for the whole column. The .T transposes rows and columns, and [5:] skips the first 5 rows, which is equivalent to for i in range(5, len(acoustics)) in your code. Then *np.array(weights) multiplies each column (the i-th difference) by the matching weight in a vectorized way.
  • @RehimAlizadeh Finally, sum adds up all the values in each row, and dividing by the total of the weights normalizes the result. All this code creates the values of the summation column and builds the result DataFrame with the matching timestamps. I hope it helps; let me know if you need more.
  • Thank you very much for the explanation, it was very helpful!
  • This code doesn't do exactly what I wanted: I want to find the sum, for k from 1 to 5, of the weight multiplied by (given point minus the k-th previous point).
  • I'm unable to test the code at the moment, so if anyone wants to build upon my answer to fix what is wrong and get it to produce exactly the same result as the original loop, I wouldn't mind ;) Juggling with indices is always tricky. But the general idea is that for loops can generally be replaced by array operations (with adequate slicing when computing a difference).
  • Thanks for your solution; it's super fast, but the results are not the same.
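The question's loop uses for c in range(5), so its lags are 0 to 4: the first weight (1/9) always multiplies c0[i] - c0[i], which is zero, and the fifth previous row is never used. If the intent is lags 1 to 5, as described in the comments, a minimal sketch with diff only needs to shift the lag range by one. The helper name is hypothetical, and pairing weights[k-1] with lag k is an assumption:

```python
import numpy as np
import pandas as pd


def weighted_deltas_lags_1_to_5(acoustics, weights):
    """Weighted differences where weights[k - 1] multiplies c0[i] - c0[i - k], k = 1..len(weights).

    Hypothetical helper; assumes 'timestamp' and 'c0' columns as in the question.
    """
    n_lags = len(weights)
    # After .T, column j holds c0[i] - c0[i - (j + 1)] for every row i
    diffs = np.array([acoustics["c0"].diff(k) for k in range(1, n_lags + 1)]).T
    w = np.asarray(weights, dtype=float)
    # Rows before n_lags have NaN in at least one lag, so start at row n_lags
    values = (diffs[n_lags:] * w).sum(axis=1) / w.sum()
    return pd.DataFrame({"timestamp": acoustics["timestamp"].iloc[n_lags:].to_numpy(),
                         "deltaAc": values})


# The eight sample rows printed in the question
acoustics = pd.DataFrame({
    "timestamp": ["2016-01-01T00:00:12.000Z", "2016-01-01T00:00:30.000Z",
                  "2016-01-01T00:00:48.000Z", "2016-01-01T00:01:06.000Z",
                  "2016-01-01T00:01:24.000Z", "2016-01-01T00:02:18.000Z",
                  "2016-01-01T00:02:36.000Z", "2016-01-01T00:02:54.000Z"],
    "c0": [13931.5, 14084.099609, 13603.400391, 13977.299805,
           13611.0, 13695.0, 13809.400391, 13756.0],
})
weights = [1/9, 1/18, 1/27, 1/36, 1/54]
lagged = weighted_deltas_lags_1_to_5(acoustics, weights)
print(lagged)
```

These values will differ from the range(5) loop by design, so comparing against the original loop is not a useful correctness check for this variant; compare against a small hand-written loop over lags 1 to 5 instead.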