Drop rows in a subset of columns in mixed data type data frame

r subset dataframe by column value
r subset dataframe by list of values
r subset dataframe by multiple column value
r subset matrix by column names
r subset dataframe by column name
subset in r
r extract rows with certain value
r subset dataframe based on vector

Hello I am trying to drop values that are not equal to 1 or 0 across several columns but not including some columns

this is what I started with

     Age  Prod1  Prod2  Day 4  Day 5 ...  Region
0    18     1      0      1.0    5.0  0     1
1    89     3      1      1.0    1.0  1     1
2    100    4      7      0.0    1.0  1     0
3    200    0      1      0.0    0.0  1     0
4    300    1      1      0.0    1.0  1     1
5    19     1      1      1.0    1.0  6     1

there are a total of 10,000 rows and 34 columns

the first two columns I have cleaned successfully because they have numeric values that are different from the rest.

Here is what I did

ageindex = df[ (df['Age'] < 18) & (dfl['Age'] > 150) ].index

I want to drop the rows from columns Prod1 through the end Region. It is only 34 columns but I cannot seem to figure out how to do this.

I have found a way to drop NaN values here but not how to drop using conditions based on values.

Here is what I have tried

prodindex1 = df[ (df.loc['Prod1':'Region'] > 1) ].index

but that just returns the same dataframe. I also tried

prodindex = df[ (df.loc['Prod1':'Region'] > 1) & (df.loc['Prod1':'Region'] < 0) ].index

The Expected output should be

     Age  Prod1  Prod2  Day 4  Day 5 ...  Region
3    200    0      1      0.0    0.0  1     0
4    300    1      1      0.0    1.0  1     1

I think I have some problems because some of them are whole numbers and some are floats. Any guidance is appreciated.

EDIT: i want to drop where values are not equal to or not equal to 0

import pandas as pd 
import numpy as np 

# Sample data
d = np.array([[18, 1, 0, 1.0, 5.0, 0, 1],
                [89, 3, 1, 1.0, 1.0, 1, 1],
                [100, 4, 7, 0.0, 1.0, 1, 0],
                [200, 0, 1, 0.0, 0.0, 1, 0],
                [300, 1, 1, 0.0, 1.0, 1, 1],
                [19, 1, 1, 1.0, 1.0, 6, 1]])

df = pd.DataFrame(data=d, columns = ['Age','Prod1','Prod2', 'Day 4', 'Day 5', 'Day 6', 'Region'])
df = df.drop(df[~df.loc[:, 'Prod1':'Region'].isin([0, 1]).all(axis=1)].index)    

should give the expected output:

     Age  Prod1  Prod2  Day 4  Day 5  Day 6  Region
3  200.0    0.0    1.0    0.0    0.0    1.0     0.0
4  300.0    1.0    1.0    0.0    1.0    1.0     1.0

Comment on your code: Your conditions are wrong but this is not the reason why you are getting the same dataframe. This happens because you are not passing df.drop(prodindex) to a variable, i.e:

# Your code
prodindex = df[ (df.loc['Prod1':'Region'] > 1) & (df.loc['Prod1':'Region'] < 0) ].index
df = df.drop(prodindex)

Empty DataFrame
Columns: [Age, Prod1, Prod2, Day 4, Day 5, Day 6, Region]
Index: []

4 Subsetting, Subsetting operators interact differently with different vector types (e.g., atomic vectors, x[c(-1, 2)] #> Error in x[c(-1, 2)]: only 0's may be mixed with negative subscripts Each row in the matrix specifies the location of one value, and each column corresponds Data frames with a single column will return just that column:. Drop specified labels from rows or columns. Remove rows or columns by specifying label names and corresponding axis, or by specifying directly index or column names. When using a multi-index, labels on different levels can be removed by specifying the level. Parameters labels single label or list-like. Index or column labels to drop.

This should work:

df[df.loc[:, 'Prod1':'Region'].isin([0, 1]).all(axis=1)]

pandas.DataFrame.select_dtypes — pandas 1.1.0 documentation, Return a subset of the DataFrame's columns based on the column dtypes. Parameters Return Series with the data type of each column. Notes. To select all� The most easiest way to drop columns is by using subset () function. In the code below, we are telling R to drop variables x and z. The '-' sign indicates dropping variables. Make sure the variable names would NOT be specified in quotes when using subset () function.

If you wish to remove rows containing values 1 or 0, following works:

df.loc[~df.loc[:, 'Prod1':'Region'].isin([0, 1]).any(axis=1), :]

Subsetting � Advanced R., You'll start by learning the six types of data that you can use to subset atomic vectors. x[c(-1, 2)] #> Error in x[c(-1, 2)]: only 0's may be mixed with negative Blank subsetting is now useful because it lets you keep all rows or all columns. Omitting drop = FALSE when subsetting matrices and data frames is one of the most� When using the column names, row labels or a condition expression, use the loc operator in front of the selection brackets []. For both the part before and after the comma, you can use a single label, a list of labels, a slice of labels, a conditional expression or a colon. Using a colon specificies you want to select all rows or columns.

15 Easy Solutions To Your Data Frame Problems In R, Discover how to create a data frame in R, change column and row Each column needs to consist of values of the same type, since they Note that you can also define this subset with the variable names. If you want to remove values or entire columns, you can assign a NULL value to the desired unit:. Another way to subset the data frame with brackets is by omitting row and column references. Take a look at this code: ed_exp2 - education[-c(1:9,22:50),-c(1,3:5)] Here, instead of subsetting the rows and columns we wanted returned, we subsetted the rows and columns we did not want returned and then omitted them with the “-” sign.

[PDF] Subsetting Data in R, Subset rows of a data.frame. 4. Subset Add/remove new columns to a data. frame. 6. What about selecting rows based on the values of two variables? namespace:dplyr, which means when you type filter, it will use. This version of the subset command narrows your data frame down to only the elements you want to look at. Other Ways to Subset A Data Frame in R. There are actually many ways to subset a data frame using R. While the subset command is the simplest and most intuitive way to handle this, you can manipulate data directly from the data frame syntax.

Subsetting data – Environmental Computing, We often want to subset our data, whether it's to examine particular rows or columns of our dataset, or to pull out observations with particular� subset: It’s an array which limits the dropping process to passed rows/columns through list. inplace: It is a boolean which makes the changes in data frame itself if True. Code #1: Dropping rows with at least 1 null value.

  • Do you delete rows whose columns (between Prod1..Region) are not equal to 1 and 0, or those which are between 0 and 1?
  • Not equal to 1 or 0 ill edit to make more clear
  • If i use this I get "'DataFrame' object is not callable" error
  • Works fine for me. I updated my answer in order to include the full script.
  • Ok that dropped rows but it dropped all rows. I started with a csv file that has 34 columns. Do I need to specify a dataframe and type out all the column names for all 34 columns?
  • If this hasn’t worked as expected then you have desribed your problem or dataset incorrectly.
  • I added more details I am not sure what I am missing. I started with a csv file if that makes a difference. Do i need to explicitly name all 34 column names like in your example? I started with just df=pd.read_csv('df.csv')
  • There is at least one row that does not drop when I use this. The row has a value equal to 1 0 and it is still present.
  • @aardvark I cannot reproduce that. When run on your provided data, only rows 3 and 4 are left, as stated in the question.
  • Weird. I am missing something in this so I'm sorry. I have to assume your solution works and it is on my end that I am messing up. thanks
  • @aardvark do you mean that it doesn't work on your full dataset? Could you provide more details please?
  • Yes on the full data set this is not working. It does not drop one row where Day6=10 I added details to the original post. I started with a csv file. I know how to go through each column individually and drop the values just not how to do it quicker. I have removed NaN values and dropped values using the Age column before this step.