Generating a custom ID based on other columns in python
I have a pandas DataFrame that looks like this:
    UID  DOB         BEDNUM
    0    1900-01-01  CICU1
    1    1927-05-21  CICU1
    2    1929-10-03  CICU1
    3    1933-06-29  CICU1
    4    1936-01-09  CICU1
    5    1947-11-14  CICU1
    6    1900-01-01  CICU1
    7    1927-05-21  CICU1
    8    1929-10-03  CICU1
    9    1933-06-29  CICU1
    10   1936-01-09  CICU1
    11   1947-11-14  CICU1
Now I would like to add a new column TID to that DataFrame in the format 'YYYY-0000000-P', where YYYY is the year from DOB and 0000000 is the UID zero-padded to seven digits:
    UID  DOB         BEDNUM  TID
    0    1900-01-01  CICU1   1900-0000000-P
    1    1927-05-21  CICU1   1927-0000001-P
    2    1929-10-03  CICU1   1929-0000002-P
    3    1933-06-29  CICU1   1933-0000003-P
    4    1936-01-09  CICU1   1936-0000004-P
    5    1947-11-14  CICU1   1947-0000005-P
    6    1900-01-01  CICU1   1900-0000006-P
    7    1927-05-21  CICU1   1927-0000007-P
    8    1929-10-03  CICU1   1929-0000008-P
    9    1933-06-29  CICU1   1933-0000009-P
    10   1936-01-09  CICU1   1936-0000010-P
    11   1947-11-14  CICU1   1947-0000011-P
I have 24,000 records in the table, and the TID of the last record should look like 'YYYY-0024000-P'.
I would really appreciate it if anyone could help me with this. Thanks in advance!
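The target format can be written as a tiny helper for a single row. This is a minimal sketch: the name make_tid is hypothetical (it does not appear in the question), and it assumes YYYY is the birth year taken from DOB and the seven-digit block is the UID, zero-padded.

```python
def make_tid(year: int, uid: int) -> str:
    """Build a TID of the form 'YYYY-0000000-P' (hypothetical helper)."""
    # {uid:07d} zero-pads the integer to a minimum width of 7 digits
    return f"{year}-{uid:07d}-P"


print(make_tid(1900, 0))      # 1900-0000000-P
print(make_tid(1947, 11))     # 1947-0000011-P
print(make_tid(1947, 24000))  # 1947-0024000-P
```

Seven digits cover UIDs up to 9,999,999. Larger values are not truncated; they are simply written with more digits, so the IDs would no longer have a fixed width.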
Here's one way, using pandas str methods:
    import pandas as pd

    df['DOB'] = pd.to_datetime(df['DOB'])  # convert DOB to datetime if necessary
    df['TID'] = df['DOB'].dt.year.astype(str) + '-' + df['UID'].astype(str).str.zfill(7) + '-P'
    print(df)

        UID         DOB  BEDNUM             TID
    0     0  1900-01-01   CICU1  1900-0000000-P
    1     1  1927-05-21   CICU1  1927-0000001-P
    2     2  1929-10-03   CICU1  1929-0000002-P
    3     3  1933-06-29   CICU1  1933-0000003-P
    4     4  1936-01-09   CICU1  1936-0000004-P
    5     5  1947-11-14   CICU1  1947-0000005-P
    6     6  1900-01-01   CICU1  1900-0000006-P
    7     7  1927-05-21   CICU1  1927-0000007-P
    8     8  1929-10-03   CICU1  1929-0000008-P
    9     9  1933-06-29   CICU1  1933-0000009-P
    10   10  1936-01-09   CICU1  1936-0000010-P
    11   11  1947-11-14   CICU1  1947-0000011-P
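For anyone who wants to try the approach above end to end, here is a self-contained sketch. It rebuilds the sample data from the question (assuming UID runs 0 to 11 and DOB starts out as 'YYYY-MM-DD' strings) and then applies the same three steps: parse DOB, take the year as a string, and zero-pad UID with str.zfill.

```python
import pandas as pd

# Sample data from the question: six birth dates, repeated twice
dob = ["1900-01-01", "1927-05-21", "1929-10-03",
       "1933-06-29", "1936-01-09", "1947-11-14"] * 2
df = pd.DataFrame({"UID": range(len(dob)), "DOB": dob, "BEDNUM": "CICU1"})

# Parse DOB so that the .dt accessor is available
df["DOB"] = pd.to_datetime(df["DOB"])

# Year as text + '-' + UID zero-padded to 7 characters + '-P'
df["TID"] = (
    df["DOB"].dt.year.astype(str)
    + "-"
    + df["UID"].astype(str).str.zfill(7)
    + "-P"
)
print(df)
```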
This answer assumes that DOB is already a datetime column:
    year = df.DOB.dt.year
    nums = df.UID.astype(str).str.zfill(7)
    # assign() returns a new DataFrame, so bind the result back to df
    df = df.assign(TID=[f'{y}-{num}-P' for y, num in zip(year, nums)])
    print(df)
        UID         DOB  BEDNUM             TID
    0     0  1900-01-01   CICU1  1900-0000000-P
    1     1  1927-05-21   CICU1  1927-0000001-P
    2     2  1929-10-03   CICU1  1929-0000002-P
    3     3  1933-06-29   CICU1  1933-0000003-P
    4     4  1936-01-09   CICU1  1936-0000004-P
    5     5  1947-11-14   CICU1  1947-0000005-P
    6     6  1900-01-01   CICU1  1900-0000006-P
    7     7  1927-05-21   CICU1  1927-0000007-P
    8     8  1929-10-03   CICU1  1929-0000008-P
    9     9  1933-06-29   CICU1  1933-0000009-P
    10   10  1936-01-09   CICU1  1936-0000010-P
    11   11  1947-11-14   CICU1  1947-0000011-P
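One detail worth making explicit: DataFrame.assign does not modify the frame it is called on; it returns a new one. The sketch below (sample data assumed to match the question, with DOB already parsed) keeps the result under a separate name, out, so you can see that the original df is left without a TID column.

```python
import pandas as pd

dob = ["1900-01-01", "1927-05-21", "1929-10-03",
       "1933-06-29", "1936-01-09", "1947-11-14"] * 2
df = pd.DataFrame({"UID": range(len(dob)),
                   "DOB": pd.to_datetime(dob),  # DOB already datetime
                   "BEDNUM": "CICU1"})

year = df.DOB.dt.year
nums = df.UID.astype(str).str.zfill(7)

# assign() builds and returns a new DataFrame; df itself is unchanged
out = df.assign(TID=[f"{y}-{num}-P" for y, num in zip(year, nums)])
print(out)
print("TID" in df.columns)  # False: the new column exists only in out
```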
Another way, using the .str accessor:
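    # assumes DOB is stored as 'YYYY-MM-DD' strings, not as datetime
    year = df.DOB.str.split('-').str[0]
    padded_uid = df.UID.astype(str).str.pad(7, 'left', '0')  # left-fill with '0' to 7 characters
    df['TID'] = year + '-' + padded_uid + '-P'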
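This variant only works while DOB holds strings; calling .str on a datetime column raises an AttributeError. A self-contained sketch, assuming the question's sample data with DOB left unparsed. It also checks that str.pad(7, 'left', '0') produces the same strings as str.zfill(7) for these non-negative UIDs.

```python
import pandas as pd

# DOB kept as plain 'YYYY-MM-DD' strings (not parsed to datetime)
dob = ["1900-01-01", "1927-05-21", "1929-10-03",
       "1933-06-29", "1936-01-09", "1947-11-14"] * 2
df = pd.DataFrame({"UID": range(len(dob)), "DOB": dob, "BEDNUM": "CICU1"})

# The first '-'-separated field is the year
year = df.DOB.str.split("-").str[0]

# Left-pad the UID text with zeros to 7 characters
padded_uid = df.UID.astype(str).str.pad(7, "left", "0")

# For non-negative integers this matches zfill(7)
assert (padded_uid == df.UID.astype(str).str.zfill(7)).all()

df["TID"] = year + "-" + padded_uid + "-P"
print(df)
```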
Comments
- Should be faster, because it is vectorized?
- @MisterMonk, no, pandas str methods are not vectorised. A list comprehension using only built-in str methods plus f-strings is probably faster.
- Ah, nice to know.
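To check the speed claim on your own setup, the sketch below builds 24,000 synthetic rows (the question's size, reusing its sample dates as an assumption) and times both styles. The helper names with_str_methods and with_builtins are illustrative and do not come from the thread. The built-in version lets the format spec :07d do the padding, so it needs no .str calls. Timings depend on the machine and pandas version, so no result is claimed here.

```python
import timeit

import pandas as pd

dob = ["1900-01-01", "1927-05-21", "1929-10-03",
       "1933-06-29", "1936-01-09", "1947-11-14"] * 2
n = 24_000
big = pd.DataFrame({"UID": range(n),
                    "DOB": pd.to_datetime(dob * (n // len(dob)))})


def with_str_methods(frame):
    # pandas .str approach from the first answer
    return (frame["DOB"].dt.year.astype(str) + "-"
            + frame["UID"].astype(str).str.zfill(7) + "-P")


def with_builtins(frame):
    # plain Python: list comprehension + f-string zero padding
    return [f"{y}-{u:07d}-P"
            for y, u in zip(frame["DOB"].dt.year.tolist(), frame["UID"].tolist())]


# Both approaches must produce identical IDs before timing them
assert with_builtins(big) == with_str_methods(big).tolist()

for fn in (with_str_methods, with_builtins):
    secs = timeit.timeit(lambda: fn(big), number=5)
    print(f"{fn.__name__}: {secs:.3f} s for 5 runs")
```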