How to generate a pandas DataFrame based on a list with a condition

I have the following list in Python:

 movie_list = [11, 21, 31, 41, 51, 62, 55]

and the following movie DataFrame:

 userId      movieId
 1           11
 1           21
 1           31
 2           62
 2           55

Now I want to generate a similar DataFrame that lists, for each userId, the movieIds that are in movie_list but not already in the DataFrame for that user.

My desired DataFrame would be:

 userId      movieId
 1           41
 1           51
 1           62
 1           55
 2           11
 2           21
 2           31
 2           41
 2           51 

How can I do it in pandas?
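For anyone who wants to reproduce the problem, the sample data above can be built roughly as follows. This is a minimal sketch that assumes both columns are plain integers; the name df is an assumption chosen to match what the answers below use.

```python
import pandas as pd

# Full catalogue of movie ids from the question
movie_list = [11, 21, 31, 41, 51, 62, 55]

# Movies each user already has (assumed integer columns)
df = pd.DataFrame({
    "userId": [1, 1, 1, 2, 2],
    "movieId": [11, 21, 31, 62, 55],
})
print(df)
```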

IIUC, we can aggregate each user's movieId values into a list with agg, then take the set difference between movie_list and that list. Because sets are unordered, the order of movieId within each user may vary:

s = df.groupby('userId').movieId.agg(list).\
    map(lambda x: list(set(movie_list) - set(x))).explode().reset_index()
print(s)

   userId movieId
0       1      41
1       1      51
2       1      62
3       1      55
4       2      41
5       2      11
6       2      51
7       2      21
8       2      31
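If the order of movie_list should be preserved within each user, one possible variation (a sketch, not the answer's original code) is to replace the set difference with a list comprehension that filters movie_list. The sample df is rebuilt here so the snippet runs on its own.

```python
import pandas as pd

movie_list = [11, 21, 31, 41, 51, 62, 55]
df = pd.DataFrame({"userId": [1, 1, 1, 2, 2],
                   "movieId": [11, 21, 31, 62, 55]})

s = (
    df.groupby("userId")["movieId"]
      .agg(list)
      # keep movie_list order; drop ids the user already has
      .map(lambda seen: [m for m in movie_list if m not in set(seen)])
      .explode()
      .reset_index()
)
print(s)
```

Note that a user who already has every movie ends up with an empty list, which explode turns into a single NaN row; a dropna() call afterwards would remove it if that is not wanted.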

One approach would be to use itertools.product to create all combinations of userId and movieId, then concat with the original and call drop_duplicates(keep=False), which removes every pair that appears in both frames:

import pandas as pd
from itertools import product

movie_list = [11, 21, 31, 41, 51, 62, 55]
df_all = pd.DataFrame(product(df['userId'].unique(), movie_list), columns=df.columns)

df2 = pd.concat([df, df_all]).drop_duplicates(keep=False)

print(df2)

[out]

    userId  movieId
3        1       41
4        1       51
5        1       62
6        1       55
7        2       11
8        2       21
9        2       31
10       2       41
11       2       51
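The concat/drop_duplicates trick relies on df itself containing no repeated rows and no pairs outside the product. A somewhat more explicit alternative is an anti-join using merge with indicator=True. The following is a sketch under the same sample data; the names merged and missing are only illustrative.

```python
import pandas as pd
from itertools import product

movie_list = [11, 21, 31, 41, 51, 62, 55]
df = pd.DataFrame({"userId": [1, 1, 1, 2, 2],
                   "movieId": [11, 21, 31, 62, 55]})

# Every (userId, movieId) combination
df_all = pd.DataFrame(list(product(df["userId"].unique(), movie_list)),
                      columns=["userId", "movieId"])

# Left merge with an indicator column; "left_only" marks pairs absent from df
merged = df_all.merge(df.drop_duplicates(), on=["userId", "movieId"],
                      how="left", indicator=True)
missing = (merged.loc[merged["_merge"] == "left_only", ["userId", "movieId"]]
                 .reset_index(drop=True))
print(missing)
```

Because a left merge keeps the row order of df_all, the result follows the order of movie_list within each user.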

prod = pd.MultiIndex.from_product([df.userId.unique().tolist(), movie_list]).tolist()
(
    pd.DataFrame(set(prod).difference([tuple(e) for e in df.values]), 
                 columns=['userId', 'movieId'])
    .sort_values(by=['userId', 'movieId'])
)


   userId  movieId
7       1       41
6       1       51
2       1       55
8       1       62
5       2       11
4       2       21
3       2       31
1       2       41
0       2       51
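The same idea can probably be expressed without converting to Python tuples, by staying in MultiIndex land and using MultiIndex.difference. This is a sketch that assumes the sample data from the question; the name result is illustrative.

```python
import pandas as pd

movie_list = [11, 21, 31, 41, 51, 62, 55]
df = pd.DataFrame({"userId": [1, 1, 1, 2, 2],
                   "movieId": [11, 21, 31, 62, 55]})

# All (userId, movieId) pairs as a MultiIndex
all_pairs = pd.MultiIndex.from_product(
    [df["userId"].unique(), movie_list], names=["userId", "movieId"]
)
# Pairs already present in df
existing = pd.MultiIndex.from_frame(df[["userId", "movieId"]])

# difference() returns the pairs not in df, sorted by default
result = all_pairs.difference(existing).to_frame(index=False)
print(result)
```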

I think you need:

 df = df.groupby("userId")["movieId"].apply(list).reset_index()
 df["movieId"] = df["movieId"].apply(lambda x: list(set(movie_list)-set(x)))

 df = df.explode("movieId")
 print(df)

Output:

    userId  movieId
0   1       41
0   1       51
0   1       62
0   1       55
1   2       41
1   2       11
1   2       51
1   2       21
1   2       31

Comments
  • How exactly did you get from your list and first dataframe to your second dataframe?
  • You need to explain this properly to get an answer: how are you assigning the userId from the first DataFrame to the elements of movie_list?