How to subset a dataframe with a conditional statement based on multiple column values

I'm trying to subset a dataframe on the basis of conditions from multiple columns. Here is my dataframe.

var1 <- c("x","x","x","y","y","z","z","z","z")
var2 <- c("a","b","c","a","b","a","b","c","d")
var3 <- c(2,4,1,4,1,6,2,5,8)
data1 <- data.frame(var1,var2,var3)
# -------------------------------------------------------------------------
#     var1 var2 var3
# 1    x    a    2
# 2    x    b    4
# 3    x    c    1
# 4    y    a    4
# 5    y    b    1
# 6    z    a    6
# 7    z    b    2
# 8    z    c    5
# 9    z    d    8

Output

The output I expect is:

#     var1
# 1    y
# 2    z

Condition

The following are the conditions leading to the output:

  1. The output is a dataframe where only values of var1 are selected.
  2. Within each value of var1, the value of var3 where var2 equals "a" is greater than the value of var3 where var2 equals "b".

I'm unable to write code for this condition, since it compares values across multiple columns and rows.

Thank you.

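This gives you the matching var1 values as a vector (a factor when var1 is stored as a factor, which was the data.frame() default before R 4.0.0):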

subset(data1, (var2=="a"))[subset(data1, (var2=="a"))$var3 > subset(data1, (var2=="b"))$var3, "var1"]

# [1] y z
# Levels: x y z

You can use data.frame to get what you want as follows:

data.frame(var1 = subset(data1, (var2=="a"))[subset(data1, (var2=="a"))$var3 > subset(data1, (var2=="b"))$var3, "var1"])
#   var1
# 1    y
# 2    z
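
Note that the comparison above is positional: it assumes every var1 group has exactly one "a" row and one "b" row, in the same order. A minimal sketch that pairs the rows explicitly with merge() instead, assuming data1 as defined in the question (the names a_rows, b_rows, paired, and result are illustrative, not from the original answer):

```r
# Rebuild the example data from the question
data1 <- data.frame(
  var1 = c("x","x","x","y","y","z","z","z","z"),
  var2 = c("a","b","c","a","b","a","b","c","d"),
  var3 = c(2,4,1,4,1,6,2,5,8)
)

# Split out the "a" rows and the "b" rows, keeping only the key and the value
a_rows <- data1[data1$var2 == "a", c("var1", "var3")]
b_rows <- data1[data1$var2 == "b", c("var1", "var3")]

# Inner join on var1; the two var3 columns become var3_a and var3_b
paired <- merge(a_rows, b_rows, by = "var1", suffixes = c("_a", "_b"))

# Keep the var1 values where the "a" value exceeds the "b" value
result <- data.frame(var1 = paired[paired$var3_a > paired$var3_b, "var1"])
result
#   var1
# 1    y
# 2    z
```

Because merge() performs an inner join by default, a var1 group that lacks either an "a" or a "b" row simply drops out instead of misaligning the comparison.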

The most intuitive solution might be to use a for-loop. Probably, there are shorter and more elegant ways to solve this problem, but this should work:

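library(dplyr)

selection <- c()

for (i in unique(data1$var1)) {
  # Keep only the "a" and "b" rows of the current var1 group
  var_store <- data1 %>%
    filter(var1 == i, var2 == "a" | var2 == "b")

  # Pull out var3 for the "a" row and for the "b" row
  a_val <- var_store %>% filter(var2 == "a") %>% pull(var3)
  b_val <- var_store %>% filter(var2 == "b") %>% pull(var3)

  if (a_val > b_val) {
    selection <- c(selection, i)
  }
}

# Keep every row whose var1 group satisfied the condition
data1 %>%
  filter(var1 %in% selection)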


# # A tibble: 6 x 3
#   var1  var2   var3
#   <chr> <chr> <dbl>
# 1 y     a         4
# 2 y     b         1
# 3 z     a         6
# 4 z     b         2
# 5 z     c         5
# 6 z     d         8
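
One possible shorter variant, sketched here with grouped dplyr verbs rather than a loop. It assumes data1 as defined in the question and at most one "a" row and one "b" row per var1 group; the column names a and b and the name result are illustrative:

```r
library(dplyr)

# Rebuild the example data from the question
data1 <- data.frame(
  var1 = c("x","x","x","y","y","z","z","z","z"),
  var2 = c("a","b","c","a","b","a","b","c","d"),
  var3 = c(2,4,1,4,1,6,2,5,8)
)

result <- data1 %>%
  group_by(var1) %>%
  # One row per group: the var3 value for "a" and for "b" (NA if absent)
  summarise(a = var3[var2 == "a"][1],
            b = var3[var2 == "b"][1]) %>%
  # filter() drops rows where the comparison is FALSE or NA
  filter(a > b) %>%
  select(var1)

result
```

Unlike the loop, this returns only the var1 column, which matches the output requested in the question.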

I found that reshaping the dataframe solves my problem. I transposed var2 into columns using dcast() to get the desired result.
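
The post does not include the code, so the following is only a sketch of how the reshaping approach might look. It assumes dcast() from the reshape2 package (data.table provides a dcast() as well) and data1 as defined in the question; the names wide and result are illustrative:

```r
library(reshape2)

# Rebuild the example data from the question
data1 <- data.frame(
  var1 = c("x","x","x","y","y","z","z","z","z"),
  var2 = c("a","b","c","a","b","a","b","c","d"),
  var3 = c(2,4,1,4,1,6,2,5,8)
)

# One row per var1, one column per var2 value, cells filled with var3
wide <- dcast(data1, var1 ~ var2, value.var = "var3")

# Combinations that do not occur are NA; which() skips NA comparisons
result <- wide[which(wide$a > wide$b), "var1", drop = FALSE]
result
```

Once var2 is spread into columns, the condition becomes an ordinary row-wise comparison between the a and b columns.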

Comments
  • Please add the expected output, and your attempts to solve this problem.
  • stackoverflow.com/questions/5963269/…
  • I have been able to get the desired answer by transposing the dataframe using dcast()
  • @sayandesarkar, in that case, you can answer your own question and accept it as an answer.