Convert a standard Python key-value dictionary list to a PySpark DataFrame


Suppose I have a list of Python dictionaries, where each key corresponds to a column name of a table. For the list below, how can I convert it into a PySpark DataFrame with the two columns arg1 and arg2?

 [{"arg1": "", "arg2": ""},{"arg1": "", "arg2": ""},{"arg1": "", "arg2": ""}]

How can I use the following construct to do it?

df = sc.parallelize([

Where should arg1 and arg2 be placed in the above code (...)?

Old way:

sc.parallelize([{"arg1": "", "arg2": ""},{"arg1": "", "arg2": ""},{"arg1": "", "arg2": ""}]).toDF()

New way:

from pyspark.sql import Row
from collections import OrderedDict

def convert_to_row(d: dict) -> Row:
    return Row(**OrderedDict(sorted(d.items())))

sc.parallelize([{"arg1": "", "arg2": ""},{"arg1": "", "arg2": ""},{"arg1": "", "arg2": ""}]) \
    .map(convert_to_row) \
    .toDF()
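A comment below asks why `sorted` is needed. The point is that the key order of plain dicts was not guaranteed across older Python versions, so two dicts with the same keys could yield `Row` objects with differently ordered fields; sorting the items normalizes the column order. A minimal Spark-free sketch (the example dicts here are made up):

```python
from collections import OrderedDict

# Two dicts whose keys arrive in different orders.
d1 = {"arg2": "b", "arg1": "a"}
d2 = {"arg1": "x", "arg2": "y"}

# Sorting the items gives both the same, deterministic key order,
# so the resulting Row fields line up column by column.
norm1 = OrderedDict(sorted(d1.items()))
norm2 = OrderedDict(sorted(d2.items()))

print(list(norm1.keys()))  # ['arg1', 'arg2']
print(list(norm2.keys()))  # ['arg1', 'arg2']
```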


I had to modify the accepted answer in order for it to work for me in Python 2.7 running Spark 2.0.

from collections import OrderedDict
from pyspark.sql import SparkSession, Row
from pyspark.sql.types import StructType, StructField, StringType

spark = (SparkSession
    .builder
    .getOrCreate())

schema = StructType([
    StructField('arg1', StringType(), True),
    StructField('arg2', StringType(), True)
])

dta = [{"arg1": "", "arg2": ""}, {"arg1": "", "arg2": ""}]

dtaRDD = spark.sparkContext.parallelize(dta) \
    .map(lambda x: Row(**OrderedDict(sorted(x.items()))))

dtaDF = spark.createDataFrame(dtaRDD, schema)
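On more recent Spark versions, the RDD-plus-Row detour is often unnecessary: `spark.createDataFrame` also accepts a list of tuples ordered to match the schema's fields. A Spark-free sketch of building those tuples (the field names and sample values are assumptions matching the schema above):

```python
# Assumed to match the StructType's field order above.
field_names = ["arg1", "arg2"]
dta = [{"arg1": "a", "arg2": "b"}, {"arg1": "x", "arg2": "y"}]

# One tuple per dict, values pulled out in schema order; a list of such
# tuples can then be passed as spark.createDataFrame(rows, schema).
rows = [tuple(d[name] for name in field_names) for d in dta]
print(rows)  # [('a', 'b'), ('x', 'y')]
```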


For anyone looking for the solution to something slightly different: I have a single dictionary of key-value pairs, and I wanted to convert it into two PySpark DataFrame columns:


{k1:v1, k2:v2 ...}


| col1   |  col2 |
| k1     |  v1   |
| k2     |  v2   |

lol = list(map(list, mydict.items()))
df = spark.createDataFrame(lol, ["col1", "col2"])


  • You should edit your question, instead of "..." please show us where the "arg1" and "arg2" should go.
  • @betterworld OK, done. Can you show how to do it?
  • thanks, can you please answer the related question…
  • Isn't this scala? def convert_to_row(d: dict) -> Row:
  • Great. I just have a question, why 'sorted'?
  • @rado That is a Python 3 function annotation.
  • @Andre85 I think because the order of keys in each dictionary may differ, which is why they need to be sorted.
  • This does not answer the question asked on this page