Pandas, copy rows whose names are repeated N times
Example data:
    import pandas as pd

    df1 = pd.DataFrame({
        'file': ['file1', 'file1', 'file1', 'file2', 'file2', 'file2', 'file3', 'file3', 'file4'],
        'prop1': [True, False, True, False, False, False, True, False, False],
        'prop2': [False, False, False, False, True, False, False, True, False],
        'prop3': [False, True, False, True, False, True, False, False, True],
    })
I need to copy the rows whose 'file' value occurs exactly 3 times, to get something like this:
        file  prop1  prop2  prop3
    0  file1   True  False  False
    1  file1  False  False   True
    2  file1   True  False  False
    3  file2  False  False   True
    4  file2  False   True  False
    5  file2  False  False   True
Use GroupBy.transform, which returns a Series with the same size as the column, so it can be used to filter by boolean indexing:
    df = df1[df1.groupby('file')['file'].transform('size') == 3]
Detail:

    print(df1.groupby('file')['file'].transform('size'))
    0    3
    1    3
    2    3
    3    3
    4    3
    5    3
    6    2
    7    2
    8    1
    Name: file, dtype: int64
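The same mask generalizes to any N from the title. A minimal sketch, assuming the question's `df1`; `rows_repeated_n_times` is a hypothetical helper name, not a pandas API:

```python
import pandas as pd

# Example data from the question
df1 = pd.DataFrame({
    'file': ['file1', 'file1', 'file1', 'file2', 'file2', 'file2', 'file3', 'file3', 'file4'],
    'prop1': [True, False, True, False, False, False, True, False, False],
    'prop2': [False, False, False, False, True, False, False, True, False],
    'prop3': [False, True, False, True, False, True, False, False, True],
})


def rows_repeated_n_times(df, col, n):
    """Hypothetical helper: keep the rows whose value in `col` occurs exactly n times."""
    # transform('size') broadcasts each group's row count back onto every row,
    # so the result is aligned with df and can be used as a boolean mask.
    sizes = df.groupby(col)[col].transform('size')
    return df[sizes == n]


print(rows_repeated_n_times(df1, 'file', 3))  # file1 and file2 rows
print(rows_repeated_n_times(df1, 'file', 2))  # file3 rows
```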
Or use filtration:

    df = df1.groupby('file').filter(lambda x: len(x) == 3)
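A quick check, assuming the question's `df1` and a hypothetical variable `n`, that `filter` and the `transform` mask keep exactly the same rows; both preserve the original index. `filter` calls the Python function once per group, so with many groups the vectorized `transform` version is usually faster:

```python
import pandas as pd

df1 = pd.DataFrame({
    'file': ['file1', 'file1', 'file1', 'file2', 'file2', 'file2', 'file3', 'file3', 'file4'],
    'prop1': [True, False, True, False, False, False, True, False, False],
    'prop2': [False, False, False, False, True, False, False, True, False],
    'prop3': [False, True, False, True, False, True, False, False, True],
})

n = 3  # hypothetical target group size

# filter() keeps every row of each group for which the function returns True
via_filter = df1.groupby('file').filter(lambda g: len(g) == n)

# boolean mask built from the per-row group size
via_transform = df1[df1.groupby('file')['file'].transform('size') == n]

print(via_filter.equals(via_transform))
```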
Or use Series.map with Series.value_counts:

    df = df1[df1['file'].map(df1['file'].value_counts()) == 3]
    print(df)
        file  prop1  prop2  prop3
    0  file1   True  False  False
    1  file1  False  False   True
    2  file1   True  False  False
    3  file2  False  False   True
    4  file2  False   True  False
    5  file2  False  False   True
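To see what the mask compares against, the intermediate steps can be inspected separately. A sketch assuming the question's `df1`; `counts` and `per_row` are illustrative names:

```python
import pandas as pd

df1 = pd.DataFrame({
    'file': ['file1', 'file1', 'file1', 'file2', 'file2', 'file2', 'file3', 'file3', 'file4'],
    'prop1': [True, False, True, False, False, False, True, False, False],
    'prop2': [False, False, False, False, True, False, False, True, False],
    'prop3': [False, True, False, True, False, True, False, False, True],
})

counts = df1['file'].value_counts()  # Series: file label -> number of occurrences
per_row = df1['file'].map(counts)    # look up each row's count by its file label
print(per_row.tolist())

df = df1[per_row == 3]               # keep rows whose label occurs exactly 3 times
print(df)
```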
You could groupby the file name, transform with the size, and use the result to index the dataframe:
    df1[df1.groupby('file').prop1.transform('size').eq(3)]

        file  prop1  prop2  prop3
    0  file1   True  False  False
    1  file1  False  False   True
    2  file1   True  False  False
    3  file2  False  False   True
    4  file2  False   True  False
    5  file2  False  False   True
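With `transform('size')` the selected column (`prop1` here) does not matter, because `size` counts rows including missing values. `transform('count')` counts only non-missing values, so it can give different results if the chosen column contains NaN. A small sketch with hypothetical data (not from the question) showing the difference:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: group 'a' has one missing value in prop1
demo = pd.DataFrame({
    'file': ['a', 'a', 'a', 'b', 'b', 'b'],
    'prop1': [True, np.nan, True, False, False, False],
})

size = demo.groupby('file')['prop1'].transform('size')    # rows per group, NaN included
count = demo.groupby('file')['prop1'].transform('count')  # non-missing values per group

print(size.tolist())
print(count.tolist())
print(demo[count.eq(3)])  # group 'a' is dropped because its count is only 2
```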
If I understand correctly (IIUC), use transform:
    # .copy() ensures you do not get a SettingWithCopyWarning if you modify df later
    df = df1[df1.groupby('file')['file'].transform('count').eq(3)].copy()

        file  prop1  prop2  prop3
    0  file1   True  False  False
    1  file1  False  False   True
    2  file1   True  False  False
    3  file2  False  False   True
    4  file2  False   True  False
    5  file2  False  False   True
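A sketch of why the trailing `.copy()` helps, assuming the question's `df1`: the filtered frame becomes an independent object, so assigning to it neither warns nor touches `df1`. The assignment `df['prop1'] = False` is only an illustrative modification:

```python
import pandas as pd

df1 = pd.DataFrame({
    'file': ['file1', 'file1', 'file1', 'file2', 'file2', 'file2', 'file3', 'file3', 'file4'],
    'prop1': [True, False, True, False, False, False, True, False, False],
    'prop2': [False, False, False, False, True, False, False, True, False],
    'prop3': [False, True, False, True, False, True, False, False, True],
})

# Explicit copy of the filtered rows
df = df1[df1.groupby('file')['file'].transform('count').eq(3)].copy()

# Modifying the copy...
df['prop1'] = False

# ...leaves the original frame unchanged
print(df1['prop1'].tolist())
```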
You can also use DataFrame.set_index + DataFrame.loc:

    new_df = df1.set_index('file').loc[df1.groupby('file').size().eq(3)]
    print(new_df)

           prop1  prop2  prop3
    file
    file1   True  False  False
    file1  False  False   True
    file1   True  False  False
    file2  False  False   True
    file2  False   True  False
    file2  False  False   True
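`set_index` moves `file` into the index. If you would rather keep it as a column and keep the original row labels, `Series.isin` on the qualifying labels is one alternative (a swapped-in technique, not from the answer above). A sketch assuming the question's `df1`; `sizes`, `keep`, and `flat` are illustrative names:

```python
import pandas as pd

df1 = pd.DataFrame({
    'file': ['file1', 'file1', 'file1', 'file2', 'file2', 'file2', 'file3', 'file3', 'file4'],
    'prop1': [True, False, True, False, False, False, True, False, False],
    'prop2': [False, False, False, False, True, False, False, True, False],
    'prop3': [False, True, False, True, False, True, False, False, True],
})

sizes = df1.groupby('file').size()  # one row count per file label
keep = sizes.index[sizes.eq(3)]     # labels that occur exactly 3 times

# As in the answer: 'file' becomes the (non-unique) index
new_df = df1.set_index('file').loc[sizes.eq(3)]

# Alternative: keep 'file' as a column and the original integer index
flat = df1[df1['file'].isin(keep)]

print(flat)
# Same rows either way once 'file' is moved back into a column
print(new_df.reset_index().equals(flat.reset_index(drop=True)))
```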