create pandas dataframe with repeating values

I am trying to create a pandas df that looks like:

   AAA  BBB  CCC
0    4   10  100
1    4   20   50
2    5   30  -30
3    5   40  -50

For now, I am building it from two dataframes:

df1 = pd.DataFrame({'AAA' : [4] * 2 , 'BBB' : [10,20], 'CCC' : [100,50]})
df2 = pd.DataFrame({'AAA': [5]*2, 'BBB' : [30,40],'CCC' : [-30,-50]})

and then appending the rows of df2 to df1 to create the desired df.
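
For reference, the append step described above could be sketched as follows. This is a minimal sketch that assumes pd.concat is used for the stacking (the original post does not say which call was used); ignore_index=True is passed so the rows are renumbered 0 to 3.

```python
import pandas as pd

df1 = pd.DataFrame({'AAA': [4] * 2, 'BBB': [10, 20], 'CCC': [100, 50]})
df2 = pd.DataFrame({'AAA': [5] * 2, 'BBB': [30, 40], 'CCC': [-30, -50]})

# Stack df2 below df1; ignore_index=True discards the old 0,1 / 0,1 labels
# and builds a fresh 0..3 index.
df = pd.concat([df1, df2], ignore_index=True)
print(df)
```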

I tried:

df = pd.DataFrame({'AAA': [4] * 2, 'AAA': [5] * 2,
                   'BBB': [10, 20, 30, 40], 'CCC': [100, 50, -30, -50]})
df

But I get an error with the key message:

ValueError: arrays must all be same length

I can of course do:

df = pd.DataFrame({'AAA': [4, 4, 5, 5], 'BBB': [10, 20, 30, 40],
                   'CCC': [100, 50, -30, -50]})
df

But is there a more elegant way to do this? This small example is easy to write out by hand, but if I want to scale up to many rows, the input becomes very long.


I believe you need to join the lists with +:

df = pd.DataFrame({'AAA' : [4]*2 + [5]*2, 'BBB' : [10,20,30,40],'CCC' : [100,50,-30,-50]})
print(df)
   AAA  BBB  CCC
0    4   10  100
1    4   20   50
2    5   30  -30
3    5   40  -50

Or use np.repeat with np.concatenate:

import numpy as np

# Build each run of repeated values, then join the runs into one array.
df = pd.DataFrame({'AAA' :  np.concatenate([np.repeat(4, 2), np.repeat(5, 2)]),
                   'BBB' : [10,20,30,40],
                   'CCC' : [100,50,-30,-50]})

Alternatively, pass the values and their repeat counts to a single np.repeat call:

df = pd.DataFrame({'AAA' :  np.repeat((4,5), (2, 2)),
                   'BBB' : [10,20,30,40],
                   'CCC' : [100,50,-30,-50]})

print(df)
   AAA  BBB  CCC
0    4   10  100
1    4   20   50
2    5   30  -30
3    5   40  -50
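
To scale this up, the values and counts passed to np.repeat can come from two lists of equal length, and the counts do not have to be equal. The sketch below is only an illustration: the values, counts, and the BBB column are made up and are not part of the original question.

```python
import numpy as np
import pandas as pd

values = [4, 5, 6]   # hypothetical values for column AAA
counts = [2, 3, 1]   # how many times each value should appear

# np.repeat pairs values[i] with counts[i] and repeats each value in place.
aaa = np.repeat(values, counts)

df_scaled = pd.DataFrame({'AAA': aaa,
                          'BBB': np.arange(10, 70, 10)})  # 10, 20, ..., 60
print(df_scaled)
```

The other columns must still have sum(counts) entries, otherwise the same length error is raised.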


For a general solution you could do:

import pandas as pd

data = [(4, 2), (5, 2)]
df = pd.DataFrame({
    'AAA': [value for value, reps in data for _ in range(reps)],
    'BBB': [10, 20, 30, 40],
    'CCC': [100, 50, -30, -50],
})
print(df)

Here data is a list of (value, repetitions) tuples. In your example, 4 is repeated twice and 5 is repeated twice, hence [(4, 2), (5, 2)].
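
If the same expansion is needed for several columns, it could be wrapped in a small function. The sketch below uses itertools instead of the nested comprehension; the name expand is hypothetical and chosen only for this illustration.

```python
from itertools import chain, repeat

import pandas as pd


def expand(pairs):
    """Hypothetical helper: flatten (value, reps) pairs into one list."""
    # repeat(value, reps) yields value exactly reps times;
    # chain.from_iterable joins those runs end to end.
    return list(chain.from_iterable(repeat(value, reps) for value, reps in pairs))


data = [(4, 2), (5, 2)]
df_general = pd.DataFrame({'AAA': expand(data),
                           'BBB': [10, 20, 30, 40],
                           'CCC': [100, 50, -30, -50]})
print(df_general)
```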


The error you get is quite clear. When you create a dataframe from a dictionary, all of the arrays must be the same length. When you create a dictionary and give the same key multiple times, only the last value is kept. So

{'AAA' : [4] * 2, 'AAA': [5]*2, 'BBB' : [10,20,30,40],'CCC' : [100,50,-30,-50]}

is the same as

{'AAA': [5]*2, 'BBB' : [10,20,30,40],'CCC' : [100,50,-30,-50]}

When you try to create a dataframe from that dictionary, you are asking for one column with 2 rows and two columns with 4 rows, hence the error. As @jezrael pointed out, you can build the desired 'AAA' column by joining the lists and then creating the dataframe from the joined list.
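
The dictionary behaviour described above can be checked directly. This is a small sketch; the exact wording of the ValueError depends on the pandas version, so it only checks that the error is raised.

```python
import pandas as pd

# The second 'AAA' entry silently replaces the first one.
d = {'AAA': [4] * 2, 'AAA': [5] * 2,
     'BBB': [10, 20, 30, 40], 'CCC': [100, 50, -30, -50]}

print(d['AAA'])   # [5, 5]

# Column lengths as pandas sees them: one column is shorter than the others.
lengths = {key: len(value) for key, value in d.items()}
print(lengths)    # {'AAA': 2, 'BBB': 4, 'CCC': 4}

try:
    pd.DataFrame(d)
    raised = False
except ValueError:
    raised = True   # mismatched column lengths are rejected
print(raised)
```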
