Adding column headers to a pandas DataFrame turns all the data into NaN, even though the headers have the same dimension

I am trying to add column headers to a CSV file that I have parsed into a DataFrame within pandas.

dfTrades = pd.read_csv('pnl1.txt',delim_whitespace=True,header=None,);
dfTrades = dfTrades.drop(dfTrades.columns[[3,4,6,8,10,11,13,15,17,18,25,27,29,32]], axis=1)     # Note: zero indexed
dfTrades = dfTrades.set_index([dfTrades.index]);
df = pd.DataFrame(dfTrades,columns=['TradeDate',
                                      'TradeTime',
                                      'CumPnL',
                                      'DailyCumPnL',
                                      'RealisedPnL',
                                      'UnRealisedPnL',
                                      'CCYCCY',
                                      'CCYCCYPnLDaily',
                                      'Position',
                                      'CandleOpen',
                                      'CandleHigh',
                                      'CandleLow',
                                      'CandleClose',
                                      'CandleDir',
                                      'CandleDirSwings',
                                      'TradeAmount',
                                      'Rate',
                                      'PnL/Trade',
                                      'Venue',
                                      'OrderType',
                                      'OrderID'
                                      'Code']);


print df

The structure of the data is:

01/10/2015 05:47.3  190 190 -648 838 EURNOK -648 0  0 611   -1137   -648 H 2     -1000000   9.465   -648    INTERNAL    IOC 287 AS

What Pandas returns is:

  TradeDate  TradeTime  CumPnL  DailyCumPnL  RealisedPnL  UnRealisedPnL  \
0            NaN        NaN     NaN          NaN          NaN            NaN   ...

I would appreciate any advice on the issue.

Thanks

P.S. Thanks to Ed for his answer. I have tried his suggestion with:

df = dfTrades.columns=['TradeDate',
                   'TradeTime',
                   'CumPnL',
                   'DailyCumPnL',
                   'RealisedPnL',
                   'UnRealisedPnL',
                   'CCYCCY',
                   'CCYCCYPnLDaily',
                   'Position',
                   'CandleOpen',
                   'CandleHigh',
                   'CandleLow',
                   'CandleClose',
                   'CandleDir',
                   'CandleDirSwings',
                   'TradeAmount',
                   'Rate',
                   'PnL/Trade',
                   'Venue',
                   'OrderType',
                   'OrderID'
                   'Code'];

But now the problem has morphed to:

 ValueError: Length mismatch: Expected axis has 22 elements, new values have     21 elements

I checked the shape of the frame with dfTrades.shape and got:

(12056, 22)

So sadly I still need some help :(

Assign directly to the columns:

df.columns = ['TradeDate',
                                      'TradeTime',
                                      'CumPnL',
                                      'DailyCumPnL',
                                      'RealisedPnL',
                                      'UnRealisedPnL',
                                      'CCYCCY',
                                      'CCYCCYPnLDaily',
                                      'Position',
                                      'CandleOpen',
                                      'CandleHigh',
                                      'CandleLow',
                                      'CandleClose',
                                      'CandleDir',
                                      'CandleDirSwings',
                                      'TradeAmount',
                                      'Rate',
                                      'PnL/Trade',
                                      'Venue',
                                      'OrderType',
                                      'OrderID',
                                      'Code']

What you're doing is reindexing. Because you pass the df as the data, pandas aligns on the existing column names and index values, and since none of the new column names match the existing ones, you get all NaNs.

You can see the same semantic behaviour here:

In [240]:
df = pd.DataFrame(data= np.random.randn(5,3), columns = np.arange(3))
df

Out[240]:
          0         1         2
0  1.037216  0.761995  0.153047
1 -0.602141 -0.114032 -0.323872
2 -1.188986  0.594895 -0.733236
3  0.556196  0.363965 -0.893846
4  0.547791 -0.378287 -1.171706

In [242]:
df1 = pd.DataFrame(df, columns = list('abc'))
df1

Out[242]:
    a   b   c
0 NaN NaN NaN
1 NaN NaN NaN
2 NaN NaN NaN
3 NaN NaN NaN
4 NaN NaN NaN

Alternatively you can pass the np array as the data:

df = pd.DataFrame(dfTrades.values,columns=['TradeDate',

In [244]:
df1 = pd.DataFrame(df.values, columns = list('abc'))
df1

Out[244]:
          a         b         c
0  1.037216  0.761995  0.153047
1 -0.602141 -0.114032 -0.323872
2 -1.188986  0.594895 -0.733236
3  0.556196  0.363965 -0.893846
4  0.547791 -0.378287 -1.171706
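
A minimal sketch of how the two routes differ on a mixed-type frame like the one in the question. The frame, its data, and the names qty and ccy are made up for illustration. Assigning to .columns only relabels and keeps each column's dtype, whereas going through .values first collapses a mixed frame into a single object array:

```python
import pandas as pd

# Hypothetical mixed-type frame with default integer column labels
raw = pd.DataFrame({0: [1, 2, 3], 1: ["EURNOK", "EURSEK", "EURUSD"]})

# Route 1: relabel a copy; the data and dtypes are untouched
relabelled = raw.copy()
relabelled.columns = ["qty", "ccy"]

# Route 2: rebuild from the underlying NumPy array; a mixed frame
# becomes one object array, so the numeric column loses its integer dtype
rebuilt = pd.DataFrame(raw.values, columns=["qty", "ccy"])

print(relabelled.dtypes)
print(rebuilt.dtypes)
```

Both frames hold the same values and neither contains NaN. If the dtypes matter after the .values route, calling infer_objects() on the result is one way to try to recover them.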


You can also try it this way: pass the column names directly to read_csv through the names parameter.

names : array-like, default None List of column names to use. If the file contains no header row, then you should explicitly pass header=None

# names supplies the column labels while the file is being read
Cov = pd.read_csv("path/to/file.txt", sep='\t',
                  names=["Sequence", "Start", "End", "Coverage"])
# Cov is already a DataFrame with these names, so no second pd.DataFrame call is needed
Frame = Cov
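
As a self-contained sketch of the same idea, the snippet below reads whitespace-separated text from an in-memory buffer instead of a file. The sample rows are invented for illustration, and sep=r"\s+" is an assumption about the file layout (the question's file is whitespace-delimited):

```python
import io

import pandas as pd

# Made-up whitespace-separated rows with no header line
sample = io.StringIO(
    "chr1 100 200 35\n"
    "chr1 200 300 41\n"
)

cov = pd.read_csv(
    sample,
    sep=r"\s+",    # any run of whitespace separates fields
    header=None,   # the data has no header row of its own
    names=["Sequence", "Start", "End", "Coverage"],
)

print(cov)
```

Because the names are attached at read time, no reindexing happens and no values are replaced by NaN.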



You need to pass dfTrades.values instead of dfTrades to pd.DataFrame.

column_names= ['TradeDate',
               'TradeTime',
               'CumPnL',
               'DailyCumPnL',
               'RealisedPnL',
               'UnRealisedPnL',
               'CCYCCY',
               'CCYCCYPnLDaily',
               'Position',
               'CandleOpen',
               'CandleHigh',
               'CandleLow',
               'CandleClose',
               'CandleDir',
               'CandleDirSwings',
               'TradeAmount',
               'Rate',
               'PnL/Trade',
               'Venue',
               'OrderType',
               'OrderID',
               'Code']


df1 = pd.DataFrame(dfTrades.values, columns = column_names )

df1.head()
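
The "Length mismatch" error reported in the question (22 columns, 21 names) is consistent with a missing comma between 'OrderID' and 'Code' in the question's list: Python joins adjacent string literals, so the two names become one. A small sketch of that behaviour, with a hypothetical helper check_names that is not part of pandas:

```python
# Adjacent string literals are joined by Python, so a missing comma
# silently merges two names into one
names_missing_comma = ["OrderType", "OrderID" "Code"]
names_fixed = ["OrderType", "OrderID", "Code"]

print(names_missing_comma)
print(len(names_missing_comma), len(names_fixed))


def check_names(n_columns, names):
    """Hypothetical helper: fail early with a readable message."""
    if len(names) != n_columns:
        raise ValueError(
            f"expected {n_columns} names, got {len(names)}: {names}"
        )
    return names


# Passes for the corrected list; the merged list would raise ValueError
check_names(3, names_fixed)
```

Comparing len(column_names) with dfTrades.shape[1] before assigning or constructing the frame would surface this kind of slip immediately.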


Comments
  • Your last error is clear: you have 22 columns but you're trying to pass a list of 21 column names. It's also unclear what you expect this to do: dfTrades.set_index([dfTrades.index]);
  • So what fixed your problem?
  • Hi EdChum - Thank you for your help. Problem fixed, sorry for coming back late.