Python: remove duplicates and keep the max of each, based on a condition

Imagine the following as the input:


And my expected output:


Condition: print the string as-is if it has no duplicates. Duplicates always end with a - followed by a number; for those, keep only the entry with the largest number.

  1. I suggest going with regular expressions to split each input into a name sub-string and a number sub-string, based on the following pattern that we assume each input follows:

<name>-<number> or just <name>.

Have a look at the re package for details and exact syntax, but this is what my

pattern = r"(?P<name>\w+)(?P<number>-\d*)?"

line does.

  2. Going with a dict was indeed a good idea: I use a dictionary to store the numbers encountered and gradually keep only the maximum value seen for each name.

  3. Once every input has been analyzed, I go through them all again using the .items() method of dictionaries to print out the data I want.
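To make the first step concrete, here is a minimal sketch of what the pattern captures for a few inputs. The helper name split_item is hypothetical, and the sketch assumes every input matches <name>-<number> or just <name>, with no hyphen inside the name itself.

```python
import re

# Same pattern as described above, written as a raw string.
pattern = r"(?P<name>\w+)(?P<number>-\d*)?"

def split_item(item):
    """Hypothetical helper: return (name, number); number is None without a suffix."""
    result = re.match(pattern, item)
    name = result.group("name")
    suffix = result.group("number")  # e.g. "-2", or None when the suffix is absent
    # A bare "-" with no digits is treated like a missing number (an assumption).
    number = int(suffix[1:]) if suffix and suffix[1:] else None
    return name, number

for item in ["anna", "anna-0", "anna-2", "michael"]:
    print(item, "->", split_item(item))
```

The named groups are what make the later dictionary step simple: the name becomes the key and the number is the value to compare.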

Here is the sample code I've come up with, to sum up:

import re

inputs = ["anna-1", "anna", "anna-0", "michael", "anna-2"]

pattern = r"(?P<name>\w+)(?P<number>-\d*)?"

maxNumbers = {} # Remembers the maximum number for each name

# Parse all inputs and split them into name and number
for item in inputs:
    result = re.match(pattern, item)
    # Extract the name
    name = result.group("name")
    # Extract the number (set to zero if there is no number)
    number = result.group("number")
    if number is None:
        number = 0
    else:
        number = int(number[1:])  # drop the leading "-"
    # Store the number in the dictionary
    if name not in maxNumbers:
        maxNumbers[name] = number
    else:
        maxNumbers[name] = max(maxNumbers[name], number)

# Parse all names and print their maximum number
for name, maxNumber in maxNumbers.items():
    if maxNumber == 0:
        print(name)
    else:
        print(name + "-" + str(maxNumber))

Note that you didn't specify how the program should react if the input is


Should it print anna-0 or just anna? But that is something you'll be able to fix yourself.
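If the two cases need to be told apart, one possible approach is to store None for a bare name instead of 0. This is only a sketch under assumptions: the function names max_versions and format_entry are hypothetical, it uses str.partition rather than the regular expression above, and it assumes names contain no hyphen and suffixes are always digits.

```python
def max_versions(inputs):
    """Map each name to its largest numeric suffix, or None if only the bare name was seen."""
    best = {}
    for item in inputs:
        name, sep, digits = item.partition("-")
        number = int(digits) if sep else None  # None means "no suffix at all"
        current = best.get(name)
        if number is not None and (current is None or number > current):
            best[name] = number
        else:
            best.setdefault(name, None)  # remember the name without overwriting a number
    return best

def format_entry(name, number):
    """Print form: bare name when no numbered duplicate exists, else name-number."""
    return name if number is None else f"{name}-{number}"

for name, number in max_versions(["anna", "anna-0", "michael"]).items():
    print(format_entry(name, number))
```

With this input the sketch prints anna-0 and michael, because 0 is now a real number rather than the "no suffix" marker.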

  1. Loop through the list & check if the string has appeared more than once. If it hasn't, print the string.
  2. For more than one occurrence, start checking from the end of the string for the maximum number. Find the index having the maximum number. (You can make another list if you want.)
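The two steps above could be sketched roughly as follows. The function name latest_entries is hypothetical, collections.Counter is one way to do the "appeared more than once" check, and the sketch assumes names contain no hyphen and every suffix is numeric.

```python
from collections import Counter

def latest_entries(items):
    """Keep unique strings as-is; for duplicated names keep only the largest number."""
    # The base name is everything before the first "-".
    bases = [item.split("-")[0] for item in items]
    counts = Counter(bases)
    output = []
    for base in dict.fromkeys(bases):  # unique names, in first-seen order
        if counts[base] == 1:
            # Step 1: no duplicates, so keep the original string untouched.
            output.append(items[bases.index(base)])
        else:
            # Step 2: gather the numeric suffixes and keep the maximum.
            numbers = [int(item.split("-")[1])
                       for item, b in zip(items, bases)
                       if b == base and "-" in item]
            output.append(f"{base}-{max(numbers)}" if numbers else base)
    return output

print(latest_entries(["anna-1", "anna", "anna-0", "michael", "anna-2"]))
```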

Here's a method using a dictionary, as you attempted:

from collections import defaultdict

# Assuming you input names into a list
l = ['anna-1', 'anna', 'anna-0', 'michael', 'anna-2']

# Place list into dictionary, with key as names
# and count as value
d = defaultdict(list)
for i in l:
  name_cnt = i.split('-')
  if len(name_cnt) > 1:
    name, cnt = name_cnt
    d[name].append(int(cnt))  # store the count as an integer
  else:
    k = name_cnt[0]  # no count
    d[k].append(-1)  # use default -1

# Show dictionary d
print(d)

# Show Desired Output
for k, cnts in d.items():
  cnt = max(cnts)
  if cnt == -1: # no versions of name
    print(k)
  else:
    print(f'{k}-{cnt}')


Dictionary d

defaultdict(<class 'list'>, 
    {'anna': [1, -1, 0, 2], 
     'michael': [-1]})

Final Result



You could leverage a defaultdict in combination with a dict comprehension:

from collections import defaultdict

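# data in question (assumed here to match the sample list used in the other answers)
data = """
anna-1
anna
anna-0
michael
anna-2
"""

# defaultdict collecting every number seen for each name (0 when there is no suffix)
groups = defaultdict(list)
for line in data.split("\n"):
    if line:
        name, _, duplicate = line.partition("-")
        groups[name].append(int(duplicate or 0))

# dict comprehension keeping only the maximum per name
dict_ = {name: max(numbers) for name, numbers in groups.items()}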


  • Please show code to prove you've tried solving it yourself.
  • Welcome, Anna. It is usually advised to provide a minimal reproducible example that shows what you have tried so far. A question like yours will probably get downvoted quickly.
  • @CorentinPane I've tried many different solutions, such as splitting and then switching to a dict to get rid of duplicates, but failed. That's why I'm waiting for someone to do it from scratch, so I can understand the reasoning behind solving the problem.
  • @TheFool thanks for your advice. I will work on that indeed.
  • It will never be anna-0, because it will usually be anna and anna-1.