How to take items in an index as columns in pandas

pandas get row by index
pandas select columns by condition
pandas iloc
pandas index to column
pandas get column name by index
pandas get row number
pandas filter by index
pandas dataframe

I have a pandas data-frame where the information in the index are headers for each column. This one is tricky because each row has data that the other rows may not have. I am trying to replace those values as None. I use the loc method and separated them by index, but I am concern of mismatch data since certain rows do not have all the data like the others.

1. **INDEX**       1.  **INFO**
2. address         2. 123 APPLE STREET
3. phone           555-5555
4. name            APPLE STORE
5. website         APPLE.COM
6. type            BUSINESS
7. address         456 peach ave
8. phone           777-7777
9. name            PEACH STORE
10. website         PEACH.COM
11. type            BUSINESS
12. address         789 banana rd
13. phone           999-9999
14. name            banana store
15. type            BUSINESS

I WANT TO CONVERT TO THIS

20.  **ADDRESS**      **PHONE**      **NAME**    **WEBSITE**        **TYPE**
21.  123 APPLE ST   555-5555        APPLE STORE     APPLE.COM        BUSINESS
22.  345 PEACH AVE  777-7777        PEACH STORE     PEACH.COM        BUSINESS
23.  789 banana rd  999-9999        banana store    None             BUSINESS

This is what I have tried. Import Pandas as pd

address = data.loc['formatted_address']
name = data.loc['name']
phone_number = data.loc['formatted_phone_number']
website = data.loc['website']
field = data.loc['types']
newdf = pd.DataFrame(address['info'])
newdf['Name'] = name['info'].to_list()
newdf['website'] = website['info'].to_list()  # 
print(newdf)

When I print the newdf I get this error. ValueError: Length of values does not match length of index

It is clearly telling me that one column is not same length as the other and that is because of the website. The length for Name is 20 and website is 19. I am trying to find a way to get that make that missing data as None, so the length will be in even again.

Update: I think it would help to show some of the work I put in to make the Data Frame work. This is a google api that I am trying to adjust for a pandas data frame. The raw data is json and when extracting it, it was easy because I use some if statements and I am good to go. However, I am using a for loop which gives me a list of dictionaries and I couldn't use the dictionary .get method to extract columns. So, I append the key and values of the dictionary and put it to separate list.

df1=[]
df2=[]
address= "19.43217, -100.542581"
geocode_url = "https://maps.googleapis.com/maps/api/place/nearbysearch/json?location={}&radius=5000&types=bakery".format(address)
if api_key is not None:
    geocode_url = geocode_url + "&key={}".format(api_key)

    # Ping google for the reuslts:
    results = requests.get(geocode_url)
    # Results will be in JSON format - convert to dict using requests functionality
    places_result = results.json()
#     Ping google for the results:
    for place in places_result['results']:
        my_place_id= place['place_id']
        my_fields= ['name', 'formatted_phone_number','website', 'type','formatted_address']
        place_details= gmaps.place(place_id= my_place_id, fields= my_fields)

#         df1.append(place_details['result'])
#         print(place_details['result'].values())
        for key, value in place_details['result'].items():

            df1.append(key)
            df2.append(value)
data=pd.DataFrame(df2,df1)
data.columns=['info']
data

You could create a grouping variable, then reshape using pivot

df.assign(grp=df.iloc[:,0].str.contains('address').cumsum()).pivot('grp','INDEX','INFO')
Out: 
INDEX              address          name     phone      type    website
grp                                                                    
1      2. 123 APPLE STREET   APPLE STORE  555-5555  BUSINESS  APPLE.COM
2            456 peach ave   PEACH STORE  777-7777  BUSINESS  PEACH.COM
3            789 banana rd  banana store  999-9999  BUSINESS        NaN

how your df looks like:

     INDEX                 INFO
0   address  2. 123 APPLE STREET
1     phone             555-5555
2      name          APPLE STORE
3   website            APPLE.COM
4      type             BUSINESS
5   address        456 peach ave
6     phone             777-7777
7      name          PEACH STORE
8   website            PEACH.COM
9      type             BUSINESS
10  address        789 banana rd
11    phone             999-9999
12     name         banana store
13     type             BUSINESS

pandas.DataFrame.items — pandas 1.1.0 documentation, Iterates over the DataFrame columns, returning a tuple with the column name and the content as a Series Iterate over DataFrame rows as (index, Series) pairs. The syntax of set_index() to setup a column as index is. myDataFrame.set_index('column_name') where myDataFrame is the DataFrame for which you would like to set column_name column as index. To setup MultiIndex, use the following syntax. myDataFrame.set_index(['column_name_1', column_name_2]) You can pass as many column names as required.

It is pivoting table. I use unstack. As in your comment, I guess your INDEX isn't a column. It is the index of the dataframe, so I change the code accordingly.

s = df.groupby('INDEX').cumcount()
df_out = df.set_index(s, append=True).INFO.unstack(0, fill_value='None')

Out[111]:
INDEX           address          name     phone      type    website
0      123 APPLE STREET   APPLE STORE  555-5555  BUSINESS  APPLE.COM
1         456 peach ave   PEACH STORE  777-7777  BUSINESS  PEACH.COM
2         789 banana rd  banana store  999-9999  BUSINESS       None

Note: since you want None, I fill NaN with string None. If you want python None, you should just leave it as NaN because they work same way. If you want NaN, take out the option fill_value='None'

Indexing and selecting data — pandas 1.1.0 documentation, The axis labeling information in pandas objects serves many purposes: Identifies .loc , .iloc , and also [] indexing can accept a callable as indexer. You may access an index on a Series or column on a DataFrame directly as an attribute:. In the rare event that you want to preserve the index and turn the index into a column, you can do the following: # for a single level df.assign(State=df.index.get_level_values('State')) # for all levels df.assign(**df.index.to_frame())

This should do the trick,

    import pandas as pd

    INDEX = ['address', 'phone', 'name', 'website', 'type', 'address', 'phone', 'name', 'website', 'type', 'address', 'phone', 'name', 'type']
    INFO = ['123 APPLE STREET', '555-5555', 'APPLE STORE', 'APPLE.COM', 'BUSINESS', '456 peach ave', '777-7777', 'PEACH STORE', 'PEACH.COM', 'BUSINESS', '789 banana rd', '999-9999', 'banana store', 'BUSINESS']
    df = pd.DataFrame(index=INDEX, data=INFO, columns=['INFO'])
    df.index.name = 'INDEX'
    df2 = df.groupby('INDEX').agg(INFO=('INFO', list))
    pd.DataFrame(df2['INFO'].to_list(), index=df2.index).transpose()

Here's the output you get,

    Out[132]: 
    INDEX           address          name     phone      type    website
    0      123 APPLE STREET   APPLE STORE  555-5555  BUSINESS  APPLE.COM
    1         456 peach ave   PEACH STORE  777-7777  BUSINESS  PEACH.COM
    2         789 banana rd  banana store  999-9999  BUSINESS       None

Indexing and Selecting Data — pandas 0.13.1 documentation, Oftentimes you'll want to match certain values with certain columns. Just make values a dict where the key is the column, and the value is a list of items you want � An array of ints indicating which positions to take. axis {0 or ‘index’, 1 or ‘columns’, None}, default 0. The axis on which to select elements. 0 means that we are selecting rows, 1 means that we are selecting columns. is_copy bool. Before pandas 1.0, is_copy=False can be specified to

I figured out the issue. The majority of the answers can accomplish this task. however there was a bug in the dataframe. It was still giving me an error of a list no matter what I did, so I did something that was unorthodox in Python. I saved the PDF as an excel sheet and bring it back to a pandas data frame. Once I did that, the traceback disappear. Weird huh? The bigger question is to how to prevent it from happening. But thank you for all your responses.

Indexing and selecting data — pandas 0.8.1 documentation, This allows you to select rows where one or more columns have values you want: indexing we are referring to a special .ix attribute on pandas objects which� Select multiple columns by Index range. Select columns in column index range [0 to 2), dfObj.iloc[: , [0, 2]] It will return a DataFrame object i.e, Name Age a jack 34 b Riti 30 c Aadi 16. Select multiple columns by Indexes in a list. Select columns at column index 0 and 2, dfObj.iloc[: , [0, 2]] It will return a DataFrame object i.e,

pandas.DataFrame.set_index — pandas 1.1.0 documentation, Set the DataFrame index (row labels) using one or more existing columns or arrays (of the correct length). The index can replace the existing index or expand on� When slicing, both the start bound AND the stop bound are included, if present in the index. Integers are valid labels, but they refer to the label and not the position. The .loc attribute is the primary access method. The following are valid inputs: A single label, e.g. 5 or 'a' (Note that 5 is interpreted as a label of the index.

pandas.DataFrame.take — pandas 1.1.0 documentation, An array of ints indicating which positions to take. axis{0 or 'index', 1 or 'columns', None}, default 0. The axis on which to select elements. 0 means that we are� Assigning an index column to pandas dataframe ¶ df2 = df1.set_index("State", drop = False) Note: As you see you needed to store the result in a new dataframe because this is not an in-place operation. Also note that you should set the

Indexing and Selecting Data — pandas 0.15.0 documentation, For production code, we recommended that you take advantage of the You may access an index on a Series, column on a DataFrame, and a item on a Panel� In this chapter, we will discuss how to slice and dice the date and generally get the subset of pandas object. The Python and NumPy indexing operators "[ ]" and attribute operator "." provide quick and easy access to Pandas data structures across a wide range of use cases. However, since the type of

Comments
  • Edit your post, don't add info in comment
  • Seems like a simple pivot or crosstab
  • Hey Datanovice. that was my first go was to use a pivot and call it a day. but I got an error saying it was an unhashable list. Which was weird. So I converted it to tuples and I still got the same issue.
  • Maybe newdf.T and go from there?
  • newdf.T is a nice thought, but remember it only transpose the rows to columns. So I get one big row. BTW, that was second thing I did besides the pivot. :)
  • Hi Onyambu, thank you for your response. Unfortunately, I am still getting that typeerror. TypeError: unhashable type: 'list'
  • @SolomonRichberg what do you mean by that? where is the error?
  • When I use the pivot. For some reason it is treating the dataframe as a list, which pivot tables deem unhashable in pandas. So, it gives that whenever I use a pivot
  • Hi Andy, I am working on this script and so far, I feel I am getting close. The cumcount will give me a count for each item in the index. Which makes sense. However, I am getting a KeyError: 'INDEX'. Not sure why?
  • you mean df.groupby('INDEX').cumcount() works, but df.set_index([s, 'INDEX']) gave KeyError: 'INDEX' ?
  • Correct, that is what I meant Andy
  • @SolomonRichberg: INDEX is the index or the column of the dataframe? Could you run df.columns to see whether the string INDEX is in the output?
  • This is what it returned. Index(['info'], dtype='object')