Automatically rename columns to ensure they are unique

I fetch a spreadsheet into a Python DataFrame named df.

Let's give a sample:

import numpy as np
import pandas as pd
df = pd.DataFrame({'a': np.random.rand(10), 'b': np.random.rand(10)})
df.columns = ['a', 'a']

          a         a
0  0.973858  0.036459
1  0.835112  0.947461
2  0.520322  0.593110
3  0.480624  0.047711
4  0.643448  0.104433
5  0.961639  0.840359
6  0.848124  0.437380
7  0.579651  0.257770
8  0.919173  0.785614
9  0.505613  0.362737

When I run df.columns.is_unique I get False.

I would like to automatically rename the second column 'a' to 'a_2' (or something similar).

I don't want a solution like df.columns=['a','a_2'].

I'm looking for a solution that works for any number of columns!

You can uniquify the columns manually:

df_columns = ['a', 'b', 'a', 'a_2', 'a_2', 'a', 'a_2', 'a_2_2']

def uniquify(df_columns):
    seen = set()

    for item in df_columns:
        fudge = 1
        newitem = item

        while newitem in seen:
            fudge += 1
            newitem = "{}_{}".format(item, fudge)

        yield newitem
        seen.add(newitem)

list(uniquify(df_columns))
#>>> ['a', 'b', 'a_2', 'a_2_2', 'a_2_3', 'a_3', 'a_2_4', 'a_2_2_2']
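
To apply this to a DataFrame like the one in the question, the generator's output can be assigned straight back to df.columns. A minimal sketch, with uniquify repeated so the snippet runs on its own; the three-column frame is made up for illustration:

```python
import numpy as np
import pandas as pd


def uniquify(df_columns):
    """Yield each name once, appending _2, _3, ... to repeated names."""
    seen = set()
    for item in df_columns:
        fudge = 1
        newitem = item
        # Bump the suffix until the candidate name has not been used yet.
        while newitem in seen:
            fudge += 1
            newitem = "{}_{}".format(item, fudge)
        yield newitem
        seen.add(newitem)


# Illustrative frame with a duplicated column name.
df = pd.DataFrame(np.random.rand(3, 3), columns=['a', 'a', 'b'])
df.columns = list(uniquify(df.columns))

print(list(df.columns))      # ['a', 'a_2', 'b']
print(df.columns.is_unique)  # True
```

Only the labels change; the column order and the data stay as they were.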

I fetch a spreadsheet into a Python DataFrame named df... I would like to automatically rename [duplicate] column [names].

Pandas does that automatically for you without you having to do anything...

test.xls:

import pandas as pd
import numpy as np

df = pd.read_excel(
    "./test.xls",
    sheet_name="Sheet1",
    header=0,
    index_col=0,
)
print(df)

--output:--
        a    b   c  b.1  a.1  a.2
index                            
0      10  100 -10 -100   10   21
1      20  200 -20 -200   11   22
2      30  300 -30 -300   12   23
3      40  400 -40 -400   13   24
4      50  500 -50 -500   14   25
5      60  600 -60 -600   15   26


print(df.columns.is_unique)

--output:--
True

If for some reason you are being given a DataFrame with duplicate columns, you can do this:

import pandas as pd
import numpy as np
from collections import defaultdict 

df = pd.DataFrame(
    {
        'k': np.random.rand(10),
        'l': np.random.rand(10), 
        'm': np.random.rand(10),
        'n': np.random.rand(10),
        'o': np.random.rand(10),
        'p': np.random.rand(10),
    }
)

print(df)

--output:--
         k         l         m         n         o         p
0  0.566150  0.025225  0.744377  0.222350  0.800402  0.449897
1  0.701286  0.182459  0.661226  0.991143  0.793382  0.980042
2  0.383213  0.977222  0.404271  0.050061  0.839817  0.779233
3  0.428601  0.303425  0.144961  0.313716  0.244979  0.487191
4  0.187289  0.537962  0.669240  0.096126  0.242258  0.645199
5  0.508956  0.904390  0.838986  0.315681  0.359415  0.830092
6  0.007256  0.136114  0.775670  0.665000  0.840027  0.991058
7  0.719344  0.072410  0.378754  0.527760  0.205777  0.870234
8  0.255007  0.098893  0.079230  0.225225  0.490689  0.554835
9  0.481340  0.300319  0.649762  0.460897  0.488406  0.166047


df.columns = ['a', 'b', 'c', 'b', 'a', 'a']
print(df)

--output:--
          a         b         c         b         a         a
0  0.566150  0.025225  0.744377  0.222350  0.800402  0.449897
1  0.701286  0.182459  0.661226  0.991143  0.793382  0.980042
2  0.383213  0.977222  0.404271  0.050061  0.839817  0.779233
3  0.428601  0.303425  0.144961  0.313716  0.244979  0.487191
4  0.187289  0.537962  0.669240  0.096126  0.242258  0.645199
5  0.508956  0.904390  0.838986  0.315681  0.359415  0.830092
6  0.007256  0.136114  0.775670  0.665000  0.840027  0.991058
7  0.719344  0.072410  0.378754  0.527760  0.205777  0.870234
8  0.255007  0.098893  0.079230  0.225225  0.490689  0.554835
9  0.481340  0.300319  0.649762  0.460897  0.488406  0.166047


print(df.columns.is_unique)

--output:--
False  


name_counts = defaultdict(int)
new_col_names = []

for name in df.columns:
    new_count = name_counts[name] + 1
    new_col_names.append("{}{}".format(name, new_count))
    name_counts[name] = new_count 

print(new_col_names)


--output:--
['a1', 'b1', 'c1', 'b2', 'a2', 'a3']


df.columns = new_col_names
print(df)

--output:--
         a1        b1        c1        b2        a2        a3
0  0.264598  0.321378  0.466370  0.986725  0.580326  0.671168
1  0.938810  0.179999  0.403530  0.675112  0.279931  0.011046
2  0.935888  0.167405  0.733762  0.806580  0.392198  0.180401
3  0.218825  0.295763  0.174213  0.457533  0.234081  0.555525
4  0.891890  0.196245  0.425918  0.786676  0.791679  0.119826
5  0.721305  0.496182  0.236912  0.562977  0.249758  0.352434
6  0.433437  0.501975  0.088516  0.303067  0.916619  0.717283
7  0.026491  0.412164  0.787552  0.142190  0.665488  0.488059
8  0.729960  0.037055  0.546328  0.683137  0.134247  0.444709
9  0.391209  0.765251  0.507668  0.299963  0.348190  0.731980

print(df.columns.is_unique)

--output:--
True
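
The loop above numbers every column, including the first occurrence ('a1', 'b1', 'c1'). If the first occurrence should keep its original name, as the question asks ('a', then 'a_2'), a small variation of the same counting idea may do. This is only a sketch: the function name number_repeats and the sep parameter are made up for illustration, and it does not check whether a generated name such as 'a_2' collides with a column that already has that name.

```python
from collections import defaultdict


def number_repeats(names, sep="_"):
    """Keep the first occurrence of each name; suffix later ones with a count.

    Hypothetical helper: it does not guard against a generated name
    clashing with a name that already exists in the input.
    """
    counts = defaultdict(int)
    result = []
    for name in names:
        counts[name] += 1
        if counts[name] == 1:
            result.append(name)
        else:
            result.append("{}{}{}".format(name, sep, counts[name]))
    return result


new_col_names = number_repeats(['a', 'b', 'c', 'b', 'a', 'a'])
print(new_col_names)  # ['a', 'b', 'c', 'b_2', 'a_2', 'a_3']
```

The result can be assigned with df.columns = new_col_names exactly as before.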

In case anyone needs this in Scala. It takes a comma-separated header string and returns one:

def renameDup(header: String): String = {
  val names: List[String] = header.split(",").toList
  var seen = List[String]()

  for (item <- names) {
    var fudge = 1
    var newitem = item
    // Keep bumping the suffix until the candidate name is unused.
    while (seen.contains(newitem)) {
      fudge += 1
      newitem = item + "_" + fudge
    }
    seen = seen :+ newitem
  }
  seen.mkString(",")
}

renameDup("a,b,a,a_2,a_2,a,a_2,a_2_2")
// "a,b,a_2,a_2_2,a_2_3,a_3,a_2_4,a_2_2_2"

I ran into this problem when loading DataFrames from Oracle tables. 7stud is right that pd.read_excel() automatically renames duplicated columns with a .1, .2, ... suffix, but not all of the read functions do this. One workaround is to save the DataFrame to a CSV (or Excel) file and then reload it so that the duplicated columns are renamed on the way back in.

data = pd.read_sql(SQL, connection)
data.to_csv(r'C:\temp\temp.csv', index=False)  # index=False avoids an extra "Unnamed: 0" column on reload
data = pd.read_csv(r'C:\temp\temp.csv')
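
The same round trip can probably be done in memory, without a temporary file, by writing the CSV text into an io.StringIO buffer. A minimal sketch, assuming a small made-up frame in place of the SQL result:

```python
import io

import pandas as pd

# Stand-in for the frame returned by the database query (illustrative data).
data = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=['a', 'b', 'a'])

# Serialize to CSV text in memory, then let read_csv rename the duplicates.
buffer = io.StringIO(data.to_csv(index=False))
data = pd.read_csv(buffer)

print(list(data.columns))      # ['a', 'b', 'a.1']
print(data.columns.is_unique)  # True
```

Keep in mind that a CSV round trip re-infers dtypes from text, so columns such as datetimes may come back as plain strings.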

Comments
  • see solution provided in Panda's DataFrame - renaming multiple identically named columns
  • It's a Google Spreadsheet... neither a CSV nor an Excel file. So in this case pandas doesn't behave the way you describe. Moreover, I explicitly said I don't want a solution like df.columns=['a','a_2']