How to create a new DataFrame with a dict

I have a dict like:

cMap = {"k1" : "v1", "k2" : "v1", "k3" : "v2", "k4" : "v2"}

and one DataFrame A, like:

+---+
|key|
+---+
| k1|
| k2|
| k3|
| k4|
+---+

To create the DataFrame above, I use this code:

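from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Each row must be a tuple: ('k1',) rather than ('k1'), which is just a string
data = [('k1',), ('k2',), ('k3',), ('k4',)]
A = spark.createDataFrame(data, ['key'])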

I want to get a new DataFrame like this:

+---+----------+----------+
|key|   v1     |    v2    |
+---+----------+----------+
| k1|true      |false     |
| k2|true      |false     |
| k3|false     |true      |
| k4|false     |true      |
+---+----------+----------+

I wish to get some suggestions, thanks!

I just wanted to contribute a different and possibly easier way to solve this.

In my code I convert a dict to a pandas DataFrame, which I find much easier, and then convert the pandas DataFrame directly to a Spark DataFrame.

import pandas as pd

data = {'visitor': ['foo', 'bar', 'jelmer'],
        'A': [0, 1, 0],
        'B': [1, 0, 1],
        'C': [1, 0, 0]}

df = pd.DataFrame(data)          # dict of lists -> pandas DataFrame
ddf = spark.createDataFrame(df)  # pandas DataFrame -> Spark DataFrame

Output:
+---+---+---+-------+
|  A|  B|  C|visitor|
+---+---+---+-------+
|  0|  1|  1|    foo|
|  1|  0|  0|    bar|
|  0|  1|  0| jelmer|
+---+---+---+-------+
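
Applied to the question's cMap, the same pandas-first idea might look like the sketch below. It assumes pandas is installed and uses pd.get_dummies to turn each distinct value into its own boolean indicator column; the helper name cmap_to_indicator_frame is hypothetical. The resulting pandas frame could then be passed to spark.createDataFrame as above.

```python
import pandas as pd


def cmap_to_indicator_frame(cmap):
    """Hypothetical helper: one row per key, one boolean column per distinct value.

    Column v is True where cmap[key] == v.
    """
    # One row per (key, value) pair from the dict
    pdf = pd.DataFrame(list(cmap.items()), columns=["key", "val"])
    # get_dummies creates one indicator column per distinct value of "val";
    # astype(bool) normalizes older pandas versions that return 0/1 integers
    dummies = pd.get_dummies(pdf["val"]).astype(bool)
    return pd.concat([pdf[["key"]], dummies], axis=1)


cMap = {"k1": "v1", "k2": "v1", "k3": "v2", "k4": "v2"}
pdf = cmap_to_indicator_frame(cMap)
print(pdf)
# With a SparkSession available: ddf = spark.createDataFrame(pdf)
```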

The dictionary can be converted to a DataFrame and joined with the other one. My code:

from pyspark.sql import functions as F
data = sc.parallelize(list(cMap.items())).toDF(['key', 'val'])  # sc = spark.sparkContext
keys = sc.parallelize([('k1',), ('k2',), ('k3',), ('k4',)]).toDF(["key"])
newDF = data.join(keys, 'key').select("key",
    F.when(F.col("val") == "v1", "True").otherwise("False").alias("v1"),
    F.when(F.col("val") == "v2", "True").otherwise("False").alias("v2"))

 >>> newDF.show()
 +---+-----+-----+
 |key|   v1|   v2|
 +---+-----+-----+
 | k1| True|False|
 | k2| True|False|
 | k3|False| True|
 | k4|False| True|
 +---+-----+-----+

If there are more values, you can move that when clause into a UDF and apply it once per value.

I parallelize cMap.items() and check whether each value equals v1 or v2, then join the result back to DataFrame A on the key column:

from pyspark.sql import Row

# example dataframe A
df_A = spark.sparkContext.parallelize(['k1', 'k2', 'k3', 'k4']).map(lambda x: Row(key=x)).toDF()

cmap_rdd = spark.sparkContext.parallelize(list(cMap.items()))
# one Row per key, with a boolean flag for each value
cmap_df = cmap_rdd.map(lambda x: Row(key=x[0], v1=x[1] == 'v1', v2=x[1] == 'v2')).toDF()

df_A.join(cmap_df, on='key').orderBy('key').show()

Result:

+---+-----+-----+
|key|   v1|   v2|
+---+-----+-----+
| k1| true|false|
| k2| true|false|
| k3|false| true|
| k4|false| true|
+---+-----+-----+

Thanks, everyone, for the suggestions. I found another way to solve my problem using pivot:

from pyspark.sql.functions import count

cMap = {"k1" : "v1", "k2" : "v1", "k3" : "v2", "k4" : "v2"}
a_cMap = [(k, v) for k, v in cMap.items()]
data = spark.createDataFrame(a_cMap, ['key', 'val'])

# one column per distinct value; cells hold the count (null where absent)
data = data.groupBy('key').pivot('val').agg(count('val'))
data.show()

+---+----+----+
|key|  v1|  v2|
+---+----+----+
| k2|   1|null|
| k4|null|   1|
| k1|   1|null|
| k3|null|   1|
+---+----+----+

data = data.na.fill(0)
data.show()

+---+---+---+
|key| v1| v2|
+---+---+---+
| k2|  1|  0|
| k4|  0|  1|
| k1|  1|  0|
| k3|  0|  1|
+---+---+---+

keys = spark.createDataFrame([('k1','2'),('k2','3'),('k3','4'),('k4','5'),('k5','6')], ["key",'temp'])

newDF = keys.join(data,'key')
newDF.show()
+---+----+---+---+
|key|temp| v1| v2|
+---+----+---+---+
| k2|   3|  1|  0|
| k4|   5|  0|  1|
| k1|   2|  1|  0|
| k3|   4|  0|  1|
+---+----+---+---+

But I can't figure out how to convert 1 to true and 0 to false.

I just wanted to add an easy way to create a DataFrame directly in PySpark:

# Python True/False give boolean columns; "true"/"false" would give string columns
values = [("K1", True, False), ("K2", True, False)]
columns = ['Key', 'V1', 'V2']
df = spark.createDataFrame(values, columns)
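
The same idea can build the question's table straight from cMap, using Python True/False so that Spark infers boolean columns rather than string ones. A sketch: the helper name rows_from_cmap is hypothetical, and the list of value columns is derived from the dict itself rather than hard-coded.

```python
def rows_from_cmap(cmap):
    """Hypothetical helper: turn {key: value} into (key, is_v1, is_v2, ...) tuples."""
    # One output column per distinct value, in sorted order
    values = sorted(set(cmap.values()))
    rows = [(k,) + tuple(v == target for target in values) for k, v in cmap.items()]
    columns = ["key"] + values
    return rows, columns


cMap = {"k1": "v1", "k2": "v1", "k3": "v2", "k4": "v2"}
rows, columns = rows_from_cmap(cMap)
# rows    -> [('k1', True, False), ('k2', True, False), ('k3', False, True), ('k4', False, True)]
# columns -> ['key', 'v1', 'v2']
```

With a SparkSession, spark.createDataFrame(rows, columns) should then give key, v1 and v2, with v1 and v2 as boolean columns.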

Comments
  • Actually, there are more values, could you tell me how to construct the UDF?