Python, Remove duplicates and get the max of it based on condition

Imagine the following as the input:

anna-1
anna
anna-0
michael
anna-2

And my expected output:

michael
anna-2

Condition: print a string as-is if it has no duplicates. Duplicates always end with a hyphen followed by a number (the running count of duplicates); for those, print only the entry with the highest number.

  1. I suggest going with regular expressions to split each input into a name sub-string and a number sub-string, based on the following pattern that we assume each input follows:

<name>-<number> or just <name>.

Have a look at the re package for details and exact syntax, but this is what my

pattern = r"(?P<name>\w+)(?P<number>-\d*)?"

line does.

  2. Going with a dict was indeed a good idea: I use a dictionary to store the numbers encountered, keeping only the maximum value seen so far for each name.

  3. Once every input has been analyzed, I iterate over the dictionary with its .items() method to print the data I want.
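To make the first step concrete, here is a small sketch of what the pattern captures for a few of the sample inputs. The helper name split_item is only illustrative and not part of the answer's code. Note that the number group still includes the leading hyphen, which is why it has to be sliced off before converting to int.

```python
import re

pattern = r"(?P<name>\w+)(?P<number>-\d*)?"

def split_item(item):
    """Return (name, number); number is None when there is no '-<digits>' suffix."""
    result = re.match(pattern, item)
    return result.group("name"), result.group("number")

# Show how each input is split by the pattern
for item in ["anna-1", "anna", "michael"]:
    print(item, "->", split_item(item))
```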

Here is the sample code I've come up with to sum up:

import re

inputs = ["anna-1", "anna", "anna-0", "michael", "anna-2"]

pattern = r"(?P<name>\w+)(?P<number>-\d*)?"

maxNumbers = {} # Remembers the maximum number for each name

# Parse all inputs and split them into name and number
for item in inputs:
    result = re.match(pattern, item)
    # Extract the name
    name = result.group("name")
    # Extract the number (set to zero if there is no number)
    number = result.group("number")
    if number is None:
        number = 0
    else:
        number = int(number[1:])
    # Store the number in the dictionary
    if name not in maxNumbers:
        maxNumbers[name] = number
    else:
        maxNumbers[name] = max(maxNumbers[name], number)

# Parse all names and print their maximum number
for name, maxNumber in maxNumbers.items():
    if maxNumber == 0:
        print(name)
    else:
        print(name + "-" + str(maxNumber))

Note that you didn't specify how the program should react if the input is

anna
anna-0

Should it print anna-0 or just anna? As written, the code prints anna, but you'll be able to change this yourself.
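If you would rather keep anna-0 in that case, one possible approach is to store None for a bare name instead of 0, so that "no number" and "number 0" stay distinct. This is only a sketch under my own assumptions: the names max_suffixes and format_entry are made up for illustration, the pattern is slightly stricter than the one above (it requires at least one digit after the hyphen), and inputs that don't fit the pattern are silently skipped.

```python
import re

# Assumption: a suffix, when present, is a hyphen followed by at least one digit
PATTERN = re.compile(r"(?P<name>\w+)(?:-(?P<number>\d+))?")

def max_suffixes(inputs):
    """Map each name to its highest numeric suffix, or None if it never had one."""
    best = {}
    for item in inputs:
        match = PATTERN.fullmatch(item)
        if match is None:
            continue  # assumption: ignore inputs that do not fit the pattern
        name = match.group("name")
        number = match.group("number")
        best.setdefault(name, None)
        if number is not None:
            value = int(number)
            if best[name] is None or value > best[name]:
                best[name] = value
    return best

def format_entry(name, number):
    """Rebuild the string: bare name when no suffix was ever seen."""
    return name if number is None else f"{name}-{number}"

for name, number in max_suffixes(["anna", "anna-0"]).items():
    print(format_entry(name, number))
```

With the input anna, anna-0 this keeps the numbered entry, while a name that only ever appears without a suffix is printed unchanged.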

  1. Loop through the list and check whether each name appears more than once. If it doesn't, print the string.
  2. For a name that appears more than once, read the number at the end of each string and find the index of the entry with the maximum number. (You can build another list for this if you want.)
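The two steps above could be sketched roughly as follows, using plain lists and no dictionary. The function names split_suffix and dedupe are my own, and I'm assuming a bare name should rank below any numbered duplicate (it is given -1 here).

```python
def split_suffix(item):
    """Split 'name-3' into ('name', 3); a bare 'name' becomes ('name', -1)."""
    head, sep, tail = item.rpartition("-")
    if sep and tail.isdigit():
        return head, int(tail)
    return item, -1  # assumption: no suffix ranks below every numbered entry

def dedupe(items):
    """Keep unique strings as-is and, for duplicates, the highest-numbered entry."""
    parsed = [split_suffix(item) for item in items]
    names = [name for name, _ in parsed]
    output = []
    handled = set()
    for name in names:
        if name in handled:
            continue
        handled.add(name)
        # Indices of every entry that shares this name
        indices = [i for i, other in enumerate(names) if other == name]
        if len(indices) == 1:
            output.append(items[indices[0]])
        else:
            # Index of the entry with the maximum number
            best = max(indices, key=lambda i: parsed[i][1])
            output.append(items[best])
    return output

print(dedupe(["anna-1", "anna", "anna-0", "michael", "anna-2"]))
```

Each name is emitted at the position where it first appears, so for the sample input anna-2 comes before michael.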

Here's a method using a dictionary, as you attempted:

from collections import defaultdict

# Assuming you input names into a list
l = ['anna-1', 'anna', 'anna-0', 'michael', 'anna-2']

# Place list into dictionary, with key as names
# and count as value
d = defaultdict(list)
for i in l:
  name_cnt = i.split('-')
  if len(name_cnt) > 1:
    name, cnt = name_cnt
    d[name].append(int(cnt))
  else:
    k = name_cnt[0]  # no count
    d[k].append(-1)  # use default -1

# Show dictionary d
print(d)

# Show Desired Output
for k, cnts in d.items():
  cnt = max(cnts)
  if cnt == -1: # no versions of name
    print(k)
  else:
    print(f'{k}-{cnt}')

Outputs

Dictionary d

defaultdict(<class 'list'>, 
    {'anna': [1, -1, 0, 2], 
     'michael': [-1]})

Final Result

anna-2
michael

You could leverage a defaultdict in combination with a dict comprehension:

from collections import defaultdict

# data in question
data = """
anna-1
anna
anna-0
michael
anna-2"""

# defaultdict: group the numbers seen for each name (a bare name counts as 0)
groups = defaultdict(list)
for line in data.split("\n"):
    if line:
        name, _, duplicate = line.partition("-")
        groups[name].append(int(duplicate or 0))

# dict comprehension: keep only the maximum number for each name
dict_ = {name: max(numbers) for name, numbers in groups.items()}

print(dict_)

Comments
  • Please show code to prove you've tried solving it yourself.
  • Welcome Anna, usually it is advised to provide a minimal reproducible example that shows what you have tried so far. A question like yours will probably get down voted quickly.
  • @CorentinPane I've tried many different solutions, such as splitting and then switching to a dict to get rid of duplicates, but failed. That's why I'm waiting for someone to do it from scratch, so I can understand the thinking behind solving the problem.
  • @TheFool thanks for your advice. I will work on that indeed.
  • It will never be anna-0, because it's usually anna and anna-1.