Remove duplicate column indexes from pandas dataframe

I am looking for a way to remove duplicate column labels from my DataFrame. What I need to do is add the values in the duplicate columns row by row, and then keep only one of these columns, holding the summed value.

import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[0,0,0,1,0,0,0], [0,1,0,0,0,0,0],
                        [0,0,0,0,0,0,1]]), columns=[1,1,2,2,2,3,3], index=[1,2,3])

   1  1  2  2  2  3  3
1  0  0  0  1  0  0  0
2  0  1  0  0  0  0  0
3  0  0  0  0  0  0  1

should become

   1  2  3
1  0  1  0
2  1  0  0
3  0  0  1

Since the original data was not provided, here is a rough attempt at your problem:

import numpy as np
import pandas as pd
df = pd.DataFrame(np.array([[0,0,0,1,0,0,0], [0,1,0,0,0,0,0], 
                            [0,0,0,0,0,0,1]]))
df.columns = [1,1,2,2,2,3,3]
df1 = df.groupby(lambda x:x, axis=1).sum()
df1.index = range(1,4)
df1

This outputs the desired DataFrame you posted. The line df1.index = range(1,4) just relabels the rows, because they start at 1 in your example.
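As far as I know, recent pandas releases warn about (or reject) groupby with axis=1, so here is a minimal sketch of the same idea that avoids it by transposing first. It reuses the question's data; the name `summed` is only illustrative.

```python
import numpy as np
import pandas as pd

# Same data as in the question: duplicate column labels 1, 2 and 3.
df = pd.DataFrame(
    np.array([[0, 0, 0, 1, 0, 0, 0],
              [0, 1, 0, 0, 0, 0, 0],
              [0, 0, 0, 0, 0, 0, 1]]),
    columns=[1, 1, 2, 2, 2, 3, 3],
    index=[1, 2, 3],
)

# Transpose so the duplicate column labels become row labels,
# sum the rows that share a label, then transpose back.
summed = df.T.groupby(level=0).sum().T
print(summed)
```

The double transpose copies the data, which is fine for a small frame like this one but may matter for very wide frames.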


Simply group by columns:

df.groupby(df.columns, axis=1).sum()

   1  2  3
1  0  1  0
2  1  0  0
3  0  0  1

Or, as pointed out by @user2285236:

df.groupby(axis=1, level=0).sum()


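You do not need groupby here. In older pandas versions, sum accepts a level argument directly (it was deprecated in pandas 1.3 and removed in 2.0):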

df.sum(level=0,axis=1)
Out[358]: 
   1  2  3
1  0  1  0
2  1  0  0
3  0  0  1


Have you tried this? It keeps only the last column of each duplicate group rather than summing them:

df = df.loc[:,~df.columns.duplicated(keep='last')]
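To see what that mask does on the question's data, here is a small sketch. The names `mask` and `deduped` are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.array([[0, 0, 0, 1, 0, 0, 0],
              [0, 1, 0, 0, 0, 0, 0],
              [0, 0, 0, 0, 0, 0, 1]]),
    columns=[1, 1, 2, 2, 2, 3, 3],
    index=[1, 2, 3],
)

# True for every column label that occurs again further to the right,
# so only the last occurrence of each label is left unmarked.
mask = df.columns.duplicated(keep='last')

# Select the unmarked columns; values in the dropped columns are discarded.
deduped = df.loc[:, ~mask]
print(deduped)
```

On this data the 1 in row 1 sits in the middle column labelled 2, which is dropped, so the result differs from the summed output the question asks for.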
