Pandas, copy rows whose names are repeated N times

Example data:

df1 = pd.DataFrame({
    'file': ['file1','file1','file1','file2','file2','file2','file3','file3','file4'],
    'prop1': [True,False,True,False,False,False,True,False,False],
    'prop2': [False,False,False,False,True,False,False,True,False],
    'prop3': [False,True,False,True,False,True,False,False,True]
})

I need to copy the rows whose 'file' value appears exactly 3 times, to get something like this:

    file  prop1  prop2  prop3
0  file1   True  False  False
1  file1  False  False   True
2  file1   True  False  False
3  file2  False  False   True
4  file2  False   True  False
5  file2  False  False   True

Use GroupBy.transform to get a Series of group sizes with the same length as the original column, which makes it possible to filter by boolean indexing:

df = df1[df1.groupby('file')['file'].transform('size') == 3]

Detail:

print (df1.groupby('file')['file'].transform('size'))
0    3
1    3
2    3
3    3
4    3
5    3
6    2
7    2
8    1
Name: file, dtype: int64

Or use GroupBy.filter:

df = df1.groupby('file').filter(lambda x: len(x) == 3)

Or use Series.map with Series.value_counts:

df = df1[df1['file'].map(df1['file'].value_counts()) == 3]

print (df)
    file  prop1  prop2  prop3
0  file1   True  False  False
1  file1  False  False   True
2  file1   True  False  False
3  file2  False  False   True
4  file2  False   True  False
5  file2  False  False   True
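
As a quick sanity check, the three approaches above can be run side by side on the example data. This is only a sketch: the helper name rows_with_n_repeats and its parameters are made up for illustration and are not part of pandas.

```python
import pandas as pd

df1 = pd.DataFrame({
    'file': ['file1', 'file1', 'file1', 'file2', 'file2', 'file2', 'file3', 'file3', 'file4'],
    'prop1': [True, False, True, False, False, False, True, False, False],
    'prop2': [False, False, False, False, True, False, False, True, False],
    'prop3': [False, True, False, True, False, True, False, False, True],
})

def rows_with_n_repeats(frame, column, n):
    """Hypothetical helper: keep rows whose value in `column` occurs exactly n times."""
    # transform('size') broadcasts each group's size back to every row of that group
    sizes = frame.groupby(column)[column].transform('size')
    return frame[sizes == n]

by_transform = rows_with_n_repeats(df1, 'file', 3)
by_filter = df1.groupby('file').filter(lambda g: len(g) == 3)
by_map = df1[df1['file'].map(df1['file'].value_counts()) == 3]

# All three selections should contain the same rows with the same index
print(by_transform.equals(by_filter) and by_transform.equals(by_map))
```

Changing the last argument of the helper selects groups of a different size, for example 2 to keep only the file3 rows.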

You could group by the file name, transform any column with 'size', and use the resulting boolean mask to index the DataFrame:

df1[df1.groupby('file').prop1.transform('size').eq(3)]

    file  prop1  prop2  prop3
0  file1   True  False  False
1  file1  False  False   True
2  file1   True  False  False
3  file2  False  False   True
4  file2  False   True  False
5  file2  False  False   True

If I understand correctly, use transform:

df = df1[df1.groupby('file')['file'].transform('count').eq(3)].copy()  # .copy() avoids a SettingWithCopyWarning if df is modified later
print(df)
    file  prop1  prop2  prop3
0  file1   True  False  False
1  file1  False  False   True
2  file1   True  False  False
3  file2  False  False   True
4  file2  False   True  False
5  file2  False  False   True

You can also use DataFrame.set_index with DataFrame.loc; note that 'file' becomes the index of the result:

new_df=df1.set_index('file').loc[df1.groupby('file').size().eq(3)]
print(new_df)

     prop1  prop2  prop3
file                      
file1   True  False  False
file1  False  False   True
file1   True  False  False
file2  False  False   True
file2  False   True  False
file2  False  False   True
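
If 'file' should stay a regular column and the original integer index should be kept, one possible variation (a sketch, not taken from the answers here; the names counts, keep and result are illustrative) is to collect the file names that occur 3 times and select with Series.isin:

```python
import pandas as pd

df1 = pd.DataFrame({
    'file': ['file1', 'file1', 'file1', 'file2', 'file2', 'file2', 'file3', 'file3', 'file4'],
    'prop1': [True, False, True, False, False, False, True, False, False],
    'prop2': [False, False, False, False, True, False, False, True, False],
    'prop3': [False, True, False, True, False, True, False, False, True],
})

# Number of rows per file name
counts = df1['file'].value_counts()

# File names that occur exactly 3 times
keep = counts.index[counts == 3]

# Select matching rows; 'file' stays a column and the index is unchanged
result = df1[df1['file'].isin(keep)]
print(result)
```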
