Update a dataframe in pandas while iterating row by row


I have a pandas data frame that looks like this (it's a pretty big one):

           date      exer exp     ifor         mat  
1092  2014-03-17  American   M  528.205  2014-04-19 
1093  2014-03-17  American   M  528.205  2014-04-19 
1094  2014-03-17  American   M  528.205  2014-04-19 
1095  2014-03-17  American   M  528.205  2014-04-19    
1096  2014-03-17  American   M  528.205  2014-05-17 

Now I would like to iterate row by row, and as I go through each row, the value of ifor may change depending on some conditions, for which I need to look up another dataframe.

How do I update the dataframe as I iterate? I tried a few things, but none of them worked.

for i, row in df.iterrows():
    if <something>:
        row['ifor'] = x
    else:
        row['ifor'] = y

    df.ix[i]['ifor'] = x

Neither of these approaches seems to work. I don't see the values updated in the dataframe.

You can assign values in the loop using df.set_value:

for i, row in df.iterrows():
    ifor_val = something
    if <condition>:
        ifor_val = something_else
    df.set_value(i,'ifor',ifor_val)

If you don't need the row values you could simply iterate over the indices of df, but I kept the original for-loop in case you need the row value for something not shown here.

update

df.set_value() has been deprecated since version 0.21.0 (and was removed in pandas 1.0); use the df.at[] accessor instead:

for i, row in df.iterrows():
    ifor_val = something
    if <condition>:
        ifor_val = something_else
    df.at[i,'ifor'] = ifor_val
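
To make the pattern above concrete, here is a minimal runnable sketch. The data, the `rates` lookup frame, and the adjustment rule are all made up for illustration, since the question does not show the real condition or the second dataframe.

```python
import pandas as pd

# Hypothetical data modelled on the question's frame.
df = pd.DataFrame(
    {"exer": ["American", "American", "European"], "ifor": [528.5, 528.5, 530.0]},
    index=[1092, 1093, 1094],
)
# Hypothetical stand-in for the "other dataframe" that is looked up.
rates = pd.DataFrame({"adj": [1.0, 2.0]}, index=["American", "European"])

for i, row in df.iterrows():
    # Read from the row copy, but write through the original frame with .at.
    adjustment = rates.at[row["exer"], "adj"]
    df.at[i, "ifor"] = row["ifor"] + adjustment

print(df)
```

The key point is that `row` is only read from; every write goes to `df` itself via the index label `i`.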


A pandas DataFrame object should be thought of as a Series of Series. In other words, you should think of it in terms of columns. This matters because when you use pd.DataFrame.iterrows you are iterating through rows as Series. But these are not the Series that the data frame is storing; they are new Series that are created for you while you iterate. That means that when you assign to them, those edits won't be reflected in the original data frame.
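
A small sketch of that behaviour, using a made-up two-row frame with mixed column types (so that each row Series has to be built fresh):

```python
import pandas as pd

# Hypothetical frame; the mixed dtypes force iterrows to build new Series.
df = pd.DataFrame({"exp": ["M", "M"], "ifor": [528.5, 528.5]}, index=[1092, 1093])

for i, row in df.iterrows():
    row["ifor"] = 0.0  # changes only the temporary row Series

after_row_assignment = df["ifor"].tolist()

for i in df.index:
    df.at[i, "ifor"] = 0.0  # writes into the frame itself

after_at_assignment = df["ifor"].tolist()

print(after_row_assignment, after_at_assignment)
```

After the first loop the frame still holds the original values; only the second loop changes it.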

Ok, now that that is out of the way: What do we do?

Suggestions prior to this post include:

  1. pd.DataFrame.set_value is deprecated as of Pandas version 0.21
  2. pd.DataFrame.ix is deprecated
  3. pd.DataFrame.loc is fine but can work on array indexers and you can do better

My recommendation: use pd.DataFrame.at

for i in df.index:
    if <something>:
        df.at[i, 'ifor'] = x
    else:
        df.at[i, 'ifor'] = y

You can even change this to:

for i in df.index:
    df.at[i, 'ifor'] = x if <something> else y

Response to comment

and what if I need to use the value of the previous row for the if condition?

j = df.columns.get_loc('ifor')  # column position, looked up once
for i in range(1, len(df) + 1):
    if <something>:
        df.iat[i - 1, j] = x
    else:
        df.iat[i - 1, j] = y
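
As a rough illustration of a loop that really does depend on the previous row, here is a sketch with an invented rule (a row may exceed the already-updated previous row by at most `max_step`). The rule, the data, and the name `max_step` are assumptions, not something from the question.

```python
import pandas as pd

df = pd.DataFrame({"ifor": [10.0, 50.0, 12.0, 40.0]})  # hypothetical values
j = df.columns.get_loc("ifor")
max_step = 5.0  # hypothetical cap on the increase over the previous row

# Start at 1 because the first row has no previous row.
for i in range(1, len(df)):
    prev = df.iat[i - 1, j]  # previous row, including any update made to it
    if df.iat[i, j] > prev + max_step:
        df.iat[i, j] = prev + max_step

print(df["ifor"].tolist())
```

This kind of loop is only needed when each row depends on the updated previous row. If the condition only needs the original previous value, the lagged-column approach mentioned in the comments (for example with `Series.shift`) avoids the loop.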


One method you can use is itertuples(). It iterates over DataFrame rows as namedtuples, with the index value as the first element of the tuple, and it is much faster than iterrows(). Each row tuple carries its index label as row.Index, which you can pass to at (or loc) to set the value.

for row in df.itertuples():
    if <something>:
        df.at[row.Index, 'ifor'] = x
    else:
        df.at[row.Index, 'ifor'] = y

    # Equivalent but slower: df.loc[row.Index, 'ifor'] = x

In most cases, iterating with itertuples() is faster than looping over the index and reading values with iat or at.

Thanks @SantiStSupery, using .at is much faster than loc.
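
For reference, a runnable version of the itertuples() loop might look like the following. The frame and the condition on `exp` are invented for the example.

```python
import pandas as pd

# Hypothetical data; column names follow the question.
df = pd.DataFrame(
    {"exp": ["M", "W", "M"], "ifor": [528.5, 528.5, 530.0]},
    index=[1092, 1093, 1094],
)

for row in df.itertuples():
    # row.Index is the index label; columns are attributes (row.exp, row.ifor).
    if row.exp == "M":
        df.at[row.Index, "ifor"] = row.ifor * 2
    else:
        df.at[row.Index, "ifor"] = 0.0

print(df)
```

Attribute access such as `row.exp` works here because the column names are valid Python identifiers.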

The rows you get back from iterrows are copies that are no longer connected to the original data frame, so edits don't change your dataframe. However, each item you get back from iterrows contains the current index, so you can use that index to access and edit the relevant row of the dataframe. (Vis Team, February 15, 2019)

You should assign the value with df.loc[i, 'ifor'] = x (or df.ix[i, 'ifor'] = x in old pandas versions that still have ix) instead of df.ix[i]['ifor'] = x.

Otherwise you are using chained indexing, which may operate on a copy, and you should get a warning:

-c:1: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_index,col_indexer] = value instead

That said, the loop should probably be replaced by a vectorized operation to make full use of the DataFrame, as @Phillip Cloud suggested.
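
A vectorized version could look roughly like this. The condition, the data, and the `rates` lookup Series are assumptions made for the sketch, since the question does not say what the real rule is.

```python
import numpy as np
import pandas as pd

# Hypothetical data and lookup table.
df = pd.DataFrame(
    {"exer": ["American", "European", "American"], "ifor": [528.5, 528.5, 530.0]}
)
rates = pd.Series({"American": 1.0, "European": 2.0})

# Look up one value per row from the other table, without a Python loop.
looked_up = df["exer"].map(rates)

# Boolean mask for the condition, then choose per row between two results.
mask = df["ifor"] > 529
df["ifor"] = np.where(mask, df["ifor"] + looked_up, df["ifor"])

print(df)
```

Whether this is possible depends on the actual condition: it works when each row's new value can be computed from existing columns, not from rows updated earlier in the same pass.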

From the pandas.DataFrame.iterrows documentation (pandas 1.0.5): to preserve dtypes while iterating over the rows, it is better to use itertuples(), which returns namedtuples of the values. You should never modify something you are iterating over. For single cells, iat accesses a single value for a row/column pair by integer position; use iat if you only need to get or set a single value in a DataFrame or Series.

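Well, if you are going to iterate anyhow, why not use the simplest method of all, df['Column'].values[i]? Note that this relies on .values returning a writable view of the column, which pandas does not guarantee (with Copy-on-Write enabled the array is read-only).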

df['Column'] = ''

for i in range(len(df)):
    df['Column'].values[i] = something/update/new_value

Or, if you want to compare the new values with the old ones or anything like that, why not store them in a list and assign the list to the column at the end?

mylist, df['Column'] = [], ''

for <condition>:
    mylist.append(something/update/new_value)

df['Column'] = mylist
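
A runnable sketch of the list-based variant, with a made-up update rule and a hypothetical new column name `ifor_new`:

```python
import pandas as pd

df = pd.DataFrame({"ifor": [528.5, 529.5, 530.0]})  # hypothetical values

new_values = []
for old in df["ifor"]:
    # Hypothetical rule: bump values below 530 by one, keep the rest.
    new_values.append(old + 1 if old < 530 else old)

# Assign once at the end; the list must have one entry per row.
df["ifor_new"] = new_values

print(df)
```

Keeping the old column next to the new one makes it easy to compare values before overwriting anything.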



Comments
  • I think you want df.ix[i,'ifor']. df.ix[i]['ifor'] is problematic because it is chained indexing (which isn't reliable in pandas).
  • Can you provide the other frame as well as the <something>. Whether your code can be vectorized will depend on those things. In general, avoid iterrows. In your case, you should definitely avoid it since each row will be an object dtype Series.
  • You would be better off creating a boolean mask for your condition, update all those rows and then set the rest to the other value
  • Please do not use iterrows(). It is a blatant enabler of the worst anti-pattern in the history of pandas.
  • See pandas.pydata.org/pandas-docs/stable/generated/…, second bullet: "2.You should never modify something you are iterating over"
  • I'm not sure if we read it exactly the same. If you look in my pseudo code I do the modification on the dataframe, not on the value from the iterator. The iterator value is only used for the index of the value/object. What will fail is row['ifor']=some_thing, for the reasons mentioned in the documentation.
  • Thank you for the clarification.
  • Now set_value is also deprecated, and one should use .at (or .iat), so my loop looks like this: for i, row in df.iterrows(): ifor_val = something; if <condition>: ifor_val = something_else; df.at[i,'ifor'] = ifor_val
  • set_value is deprecated and will be removed in a future release. Please use .at[] or .iat[] accessors instead
  • and what if I need to use the value of the previous row for the if condition? add a lagged column to the OG df?
  • efficiency wise, is your approach better vs adding a lagged column or is the effect negligible for small datasets? (< 10k rows)
  • That depends. I'd go for using a lagged column. This answer is showing what to do if you must loop. But if you don't have to loop, then don't.
  • Got it, also if it's possible to have your feedback for stackoverflow.com/q/51753001/9754169 then it would be awesome :D
  • Nice for contrasting .at[] with the older alternatives