Only drop duplicates if number of duplicates is less than X

I need to drop duplicate rows in my DataFrame only if the number of duplicates is less than X (e.g. 3); if a value has 3 or more duplicates, keep them all!

Sample:

where count is the number of duplicates and the duplicated values are in the data column

data | count
-------------
a    | 1
b    | 2
b    | 2
c    | 1
d    | 3
d    | 3
d    | 3

Desired result:

data | count
-------------
a    | 1
b    | 1
c    | 1
d    | 3
d    | 3
d    | 3

How can I achieve this? Thanks in advance.
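
For reference, here is a minimal sketch of the sample DataFrame (df) that the answers below work on, built from the table above:

import pandas as pd

# sample from the question: count holds the size of each group of values in data
df = pd.DataFrame({
    'data':  ['a', 'b', 'b', 'c', 'd', 'd', 'd'],
    'count': [1, 2, 2, 1, 3, 3, 3],
})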

IIUC, you could do the following:

# mask: keep all rows of groups with 3 or more members, plus the first occurrence of smaller groups
mask = (df.groupby('data')['count'].transform('count') >= 3) | ~df.duplicated('data')

# filter
filtered = df.loc[mask].drop('count', axis=1)

# reset count column
filtered['count'] = filtered.groupby('data')['data'].transform('count')

print(filtered)

Output

  data  count
0    a      1
1    b      1
3    c      1
4    d      3
5    d      3
6    d      3
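
If the threshold needs to vary, the same idea can be wrapped in a small helper. This is my own sketch built on the answer above, not part of it; the function name drop_small_dupes is made up:

def drop_small_dupes(df, col='data', n=3):
    # keep every row of groups with at least n members,
    # plus the first occurrence of each smaller group
    mask = (df.groupby(col)[col].transform('count') >= n) | ~df.duplicated(col)
    out = df.loc[mask, [col]].copy()
    # recompute the count column from the rows that are left
    out['count'] = out.groupby(col)[col].transform('count')
    return out

print(drop_small_dupes(df, n=3))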

I believe you need to chain two conditions in boolean indexing, Series.duplicated plus a greater-or-equal comparison against N, and then set the count column to 1 for the rows of the small groups:

N = 3
# keep first occurrences plus every row whose count is at least N
df1 = df[~df.duplicated('data') | df['count'].ge(N)].copy()
# the surviving rows of the small groups are now unique, so reset their count to 1
df1.loc[df['count'] < N, 'count'] = 1
print(df1)
  data  count
0    a      1
1    b      1
3    c      1
4    d      3
5    d      3
6    d      3
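
One detail worth noting about the .loc line: the mask df['count'] < N is built from df but applied to df1; pandas aligns it on the row index, so only the rows that survived the filter are updated. If that feels too implicit, building the mask from df1 itself gives the same result (my rewording, not part of the answer):

df1.loc[df1['count'] < N, 'count'] = 1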

N = 3
# note: this overwrites the count column of df in place
df['count'] = df['count'].apply(lambda x: 1 if x < N else x)
# deduplicate the small groups (their count is now 1) and append the large groups untouched
result = pd.concat([df[df['count'].eq(1)].drop_duplicates(), df[df['count'].ge(N)]])

result

  data  count
0    a      1
1    b      1
3    c      1
4    d      3
5    d      3
6    d      3

Comments
  • Why is b=1 in output?
  • Because b appears fewer than 3 times, its duplicates are dropped and only one row is left, so its count is recomputed as 1 (see the quick check below)
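
A quick way to check this, using the df built at the top (just a sketch):

sizes = df['data'].value_counts()
print(sizes)           # d appears 3 times, b twice, a and c once each
print(sizes['b'] < 3)  # True, so b's duplicates are dropped and one row remains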