Parsing a JSON string which was loaded from a CSV using Pandas

I am working with CSV files where several of the columns have a simple json object (several key value pairs) while other columns are normal. Here is an example:

name,dob,stats
john smith,1/1/1980,"{""eye_color"": ""brown"", ""height"": 160, ""weight"": 76}"
dave jones,2/2/1981,"{""eye_color"": ""blue"", ""height"": 170, ""weight"": 85}"
bob roberts,3/3/1982,"{""eye_color"": ""green"", ""height"": 180, ""weight"": 94}"

After using df = pandas.read_csv('file.csv'), what's the most efficient way to parse and split the stats column into additional columns?

After about an hour, the only thing I could come up with was:

import json
stdf = df['stats'].apply(json.loads)
stlst = list(stdf)
stjson = json.dumps(stlst)
df.join(pandas.read_json(stjson))

This seems like I'm doing it wrong, and it's quite a bit of work considering I'll need to do this on three columns regularly.

The desired output is the dataframe object below. I added the following lines of code to get there in my (crappy) way:

df = df.join(pandas.read_json(stjson))
del(df['stats'])
In [14]: df

Out[14]:
          name       dob eye_color  height  weight
0   john smith  1/1/1980     brown     160      76
1   dave jones  2/2/1981      blue     170      85
2  bob roberts  3/3/1982     green     180      94

There is a slightly easier way, but ultimately you'll have to call json.loads. There is a notion of a converter in pandas.read_csv:

converters : dict, optional

Dict of functions for converting values in certain columns. Keys can either be integers or column labels.

So first define your custom parser. In this case the below should work:

def CustomParser(data):
    import json
    j1 = json.loads(data)
    return j1

In your case you'll have something like:

df = pandas.read_csv(f1, converters={'stats': CustomParser}, header=0)

We are telling read_csv to read the data in the standard way, but to use our custom parser for the stats column. This makes the stats column a column of dicts.

From here, we can use a little hack to directly append these columns in one step with the appropriate column names. This will only work for regular data (every JSON object needs to have the same three keys, or at least missing values need to be handled in our CustomParser):

df[sorted(df['stats'][0].keys())] = df['stats'].apply(pandas.Series)

On the left-hand side, we get the new column names from the keys of the first element of the stats column; each element in the stats column is a dictionary, so we are doing a bulk assign. On the right-hand side, we break up the 'stats' column with apply(pandas.Series), which turns each dictionary into a row of a data frame whose columns are the dictionary keys.

I think applying json.loads is a good idea, but from there you can simply convert it directly to dataframe columns instead of writing/loading it again:

stdf = df['stats'].apply(json.loads)
pd.DataFrame(stdf.tolist()) # or stdf.apply(pd.Series)

or alternatively in one step:

df.join(df['stats'].apply(json.loads).apply(pd.Series))
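
For completeness, here is a minimal end-to-end sketch of this one-step approach, assuming the example file.csv from the question; dropping the original stats column at the end gives exactly the desired output:

import json
import pandas as pd

df = pd.read_csv('file.csv')

# Parse each JSON string into a dict, expand the dicts into columns,
# join the result back, and drop the original 'stats' column.
df = df.join(df['stats'].apply(json.loads).apply(pd.Series)).drop(columns=['stats'])

print(df)
#           name       dob eye_color  height  weight
# 0   john smith  1/1/1980     brown     160      76
# 1   dave jones  2/2/1981      blue     170      85
# 2  bob roberts  3/3/1982     green     180      94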

Paul's original answer was very nice but not correct in general, because there is no assurance that the ordering of columns is the same on the left-hand side and the right-hand side of the last line. (In fact, it does not seem to work on the test data in the question, instead erroneously switching the height and weight columns.)

We can fix this by ensuring that the list of dict keys on the LHS is sorted. This works because the apply on the RHS automatically sorts by the index, which in this case is the list of column names.

def CustomParser(data):
    import json
    j1 = json.loads(data)
    return j1

df = pandas.read_csv(f1, converters={'stats': CustomParser}, header=0)
df[sorted(df['stats'][0].keys())] = df['stats'].apply(pandas.Series)
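
If you also want to drop the raw stats column afterwards (to match the desired output in the question), something like this should do it:

# Remove the now-redundant raw JSON column.
df = df.drop(columns=['stats'])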

The json_normalize function in the pandas.io.json package helps to do this without writing a custom function.

(assuming you are loading the data from a file)

import ujson
import pandas as pd
from pandas.io.json import json_normalize

df = pd.read_csv(file_path)
stats_df = json_normalize(df['stats'].apply(ujson.loads).tolist())
stats_df.set_index(df.index, inplace=True)
df = df.join(stats_df)
df.drop(columns=['stats'], inplace=True)
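
As a side note, in pandas 1.0 and later json_normalize is also available at the top level as pd.json_normalize (the pandas.io.json import is deprecated there), so a roughly equivalent sketch on the example file.csv from the question would be:

import json
import pandas as pd

df = pd.read_csv('file.csv')
# Both frames share the default RangeIndex here, so the join lines up row by row.
stats_df = pd.json_normalize(df['stats'].apply(json.loads).tolist())
df = df.drop(columns=['stats']).join(stats_df)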

Option 1

If you dumped the column with json.dumps before you wrote it to csv, you can read it back in with:

import json
import pandas as pd

df = pd.read_csv('data/file.csv', converters={'json_column_name': json.loads})

Option 2

If you didn't, then you might need to use this:

import json
import pandas as pd

df = pd.read_csv('data/file.csv', converters={'json_column_name': eval})
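
If the strings in the column are Python-literal style, a somewhat safer variant (my suggestion, not part of the original answer) is ast.literal_eval, which only evaluates literals and will not execute arbitrary code; note that it does not understand JSON's true/false/null, so for strict JSON stick with json.loads:

import ast
import pandas as pd

# ast.literal_eval only parses Python literals (dicts, lists, strings, numbers),
# which is safer than bare eval on untrusted CSV content.
df = pd.read_csv('data/file.csv',
                 converters={'json_column_name': ast.literal_eval})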

Option 3

For more complicated situations you can write a custom converter like this:

import json
import pandas as pd

def parse_column(data):
    try:
        return json.loads(data)
    except Exception as e:
        print(e)
        return None


df = pd.read_csv('data/file.csv', converters={'json_column_name': parse_column})
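
With a converter like this, rows that fail to parse come back as None instead of raising, so you can inspect them before expanding the column. A small usage sketch (column and file names as in the options above):

# Rows where parsing failed are None, which isna() reports as missing.
bad_rows = df[df['json_column_name'].isna()]

# Expand only the rows that parsed cleanly into separate columns.
good = df.dropna(subset=['json_column_name'])
expanded = good.join(good['json_column_name'].apply(pd.Series))
expanded = expanded.drop(columns=['json_column_name'])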

Comments
  • thanks, this is great, i expect i'll need to deal with more mutant data in the future and this will help.
  • The last line in this answer does not guarantee that the dict elements get matched to the correct column names. .apply(pandas.Series) converts each row into a Series and automatically sorts the index, which in this case is the list of dictionary keys. So for consistency, you have to ensure that the list of keys on the LHS is sorted.
  • I would import json and then use: pandas.read_csv(f1, converters={'stats': json.loads}). You don't need to define a new function, and you definitely don't need to import inside it.
  • Hello. I tried this in Python 3 and got the error: ValueError: Columns must be same length as key. My requirement and expected output is exactly the same except that I have nested values in my JSON.
  • The only issue is when the JSON keys are inconsistent; then the "Columns must be same length as key" error pops up.
  • ty, this was perfectly sufficient for my current task but i marked the other one as the answer since it's more broadly applicable
  • I was wondering how to parallelise this statement df.join(df['stats'].apply(json.loads).apply(pd.Series)). Any help please?
  • Thx for spotting that. I have updated my answer with your additional sorted for completeness
  • Thanks for your answer. Shouldn't the ujson.loads actually be json.loads?
  • Hello, I have got a nan value in my JSON string 'sv': [nan, nan, nan, nan, nan, 1.0] and I got the error "name 'nan' is not defined". Do you know how to handle that case?
  • Hmm, you could try Option 3, the custom parser, and do something like data = data.replace('nan,', 'None,') and then return eval(data). Be careful with the replacement, though, so that other values you don't want to replace don't get replaced; I'm not sure what your data looks like. You could maybe get a bit smarter and use a regex, something like (?<=[\[,\s\]])(nan)(?=[\,\s\]]), which should match all the nan tokens but not stuff like bnan or *nan. regexr.com is a good tool to play around with this. A rough sketch of this idea follows below.
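
A rough sketch of that last suggestion, using the regex from the comment; the file and column names are placeholders, and eval is still doing the parsing, so treat this as a starting point rather than a hardened parser:

import re
import pandas as pd

# Rewrite bare `nan` tokens to `None` so that eval can handle them.
NAN_RE = re.compile(r'(?<=[\[,\s\]])(nan)(?=[\,\s\]])')

def parse_with_nan(data):
    return eval(NAN_RE.sub('None', data))

df = pd.read_csv('data/file.csv',
                 converters={'json_column_name': parse_with_nan})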