Delete rows from a data frame based on values in multiple variables

I have the following data.frame:

dage ded dht dwt marital inc smoke time number
31   5  65 110       1   1     0    0      0
38   5  70 148       1   4     0    0      0
32   1  99 999       1   2     1    1      1
28   4  99 999       1  98     3    4      2
35   4  99 999       1   7     0    0      0
33   4  98 998       1  99     0    0      0

I want to remove any row that has the number 99 or 999 (or both).

data.frame structure:

df <- structure(list(dage = c(31L, 38L, 32L, 28L, 35L, 33L), ded = c(5L, 
5L, 1L, 4L, 4L, 4L), dht = c(65L, 70L, 99L, 99L, 99L, 98L), dwt = c(110L, 
148L, 999L, 999L, 999L, 998L), marital = c(1L, 1L, 1L, 1L, 1L, 
1L), inc = c(1L, 4L, 2L, 98L, 7L, 99L), smoke = c(0L, 0L, 1L, 
3L, 0L, 0L), time = c(0L, 0L, 1L, 4L, 0L, 0L), number = c(0L, 
0L, 1L, 2L, 0L, 0L)), row.names = c(NA, -6L), class = "data.frame")

Using rowSums

df[rowSums(df[,c('dht','dwt')]==99|df[,c('dht','dwt')]==999)==0,]
  dage ded dht dwt marital inc smoke time number
1   31   5  65 110       1   1     0    0      0
2   38   5  70 148       1   4     0    0      0
6   33   4  98 998       1  99     0    0      0
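
The call above inspects only dht and dwt, so row 6 (inc = 99) is kept. If the intent is to drop a row when 99 or 999 appears in any column, the same rowSums idea can be applied to the whole data frame. A minimal sketch, assuming every column is numeric; the names codes, is_code and cleaned are introduced here for illustration only:

```r
# Same data as in the question
df <- data.frame(
  dage    = c(31L, 38L, 32L, 28L, 35L, 33L),
  ded     = c(5L, 5L, 1L, 4L, 4L, 4L),
  dht     = c(65L, 70L, 99L, 99L, 99L, 98L),
  dwt     = c(110L, 148L, 999L, 999L, 999L, 998L),
  marital = c(1L, 1L, 1L, 1L, 1L, 1L),
  inc     = c(1L, 4L, 2L, 98L, 7L, 99L),
  smoke   = c(0L, 0L, 1L, 3L, 0L, 0L),
  time    = c(0L, 0L, 1L, 4L, 0L, 0L),
  number  = c(0L, 0L, 1L, 2L, 0L, 0L)
)

codes <- c(99, 999)  # illustrative name for the values to screen out

# Logical matrix: TRUE where a cell equals one of the codes
is_code <- df == codes[1] | df == codes[2]

# Keep only rows with no flagged cell in any column
cleaned <- df[rowSums(is_code) == 0, ]
cleaned
```

With the question's data this should leave rows 1 and 2 only, since row 6 is now caught through inc.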

You can replace 99 and 999 with NA first.

dat[dat == 99 | dat == 999] <- NA

And then use na.omit or complete.cases.

na.omit(dat)
#   dage ded dht dwt marital inc smoke time number
# 1   31   5  65 110       1   1     0    0      0
# 2   38   5  70 148       1   4     0    0      0

dat[complete.cases(dat), ]
#   dage ded dht dwt marital inc smoke time number
# 1   31   5  65 110       1   1     0    0      0
# 2   38   5  70 148       1   4     0    0      0
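
Note that the assignment above overwrites dat itself. If the original values should survive, the two steps can be wrapped in a function so that only a local copy is recoded. This is a sketch rather than a definitive implementation: drop_coded_rows and its codes argument are hypothetical names, and any row that already contains NA is dropped as well, just as with na.omit.

```r
dat <- read.table(text = "dage ded dht dwt marital inc smoke time number
31   5  65 110       1   1     0    0      0
38   5  70 148       1   4     0    0      0
32   1  99 999       1   2     1    1      1
28   4  99 999       1  98     3    4      2
35   4  99 999       1   7     0    0      0
33   4  98 998       1  99     0    0      0",
                  header = TRUE)

# Hypothetical helper: recode `codes` to NA on a local copy, then drop
# every row that is not complete. The caller's data frame is not modified.
drop_coded_rows <- function(data, codes = c(99, 999)) {
  # One logical matrix per code, combined cell by cell with OR
  is_code <- Reduce(`|`, lapply(codes, function(code) data == code))
  data[is_code] <- NA  # changes the function's copy only
  data[complete.cases(data), , drop = FALSE]
}

cleaned <- drop_coded_rows(dat)
cleaned
```

Passing the codes as a vector also avoids writing one == comparison per value when more than two sentinel codes are in play.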

DATA

dat <- read.table(text = "dage ded dht dwt marital inc smoke time number
31   5  65 110       1   1     0    0      0
38   5  70 148       1   4     0    0      0
32   1  99 999       1   2     1    1      1
28   4  99 999       1  98     3    4      2
35   4  99 999       1   7     0    0      0
33   4  98 998       1  99     0    0      0",
                  header = TRUE)

If your dataframe is called df1:

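library(dplyr)
# filter_all()/all_vars() still work but are superseded; with dplyr >= 1.0.4 the
# equivalent is filter(df1, if_all(everything(), ~ !.x %in% c(99, 999)))
filter_all(df1, all_vars(. != 99 & . != 999))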

Result:

  dage ded dht dwt marital inc smoke time number
1   31   5  65 110       1   1     0    0      0
2   38   5  70 148       1   4     0    0      0

Here's a solution using any() and apply() that doesn't require any supplemental packages:

#fake data
d <- data.frame(a = c(1,2,3,4,99), b = c(99, 1,2,999,4))
#subset rows that don't contain a 99 or 999
d[!apply(d, 1, function(x) any(x %in% c(99,999))),]

Yields:

  a b
2 2 1
3 3 2
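
Because apply() converts the data frame to a matrix before walking over the rows, a data frame with mixed column types ends up being compared as strings. A column-wise variant sidesteps that conversion. The following is a sketch under the same fake data; codes, hits, has_code and res are illustrative names:

```r
# Same fake data as above
d <- data.frame(a = c(1, 2, 3, 4, 99), b = c(99, 1, 2, 999, 4))
codes <- c(99, 999)

# One logical vector per column: TRUE where the value is one of the codes
hits <- lapply(d, function(col) col %in% codes)

# Combine the columns element-wise with OR: TRUE for rows with at least one hit
has_code <- Reduce(`|`, hits)

# Keep the rows without any hit
res <- d[!has_code, ]
res
```

Each column is tested in its own type here, so a character or factor column elsewhere in the data frame does not change how the numeric columns are matched.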

Create data.frame as shown in original question:

df <- structure(list(dage = c(31L, 38L, 32L, 28L, 35L, 33L), ded = c(5L, 
5L, 1L, 4L, 4L, 4L), dht = c(65L, 70L, 99L, 99L, 99L, 98L), dwt = c(110L, 
148L, 999L, 999L, 999L, 998L), marital = c(1L, 1L, 1L, 1L, 1L, 
1L), inc = c(1L, 4L, 2L, 98L, 7L, 99L), smoke = c(0L, 0L, 1L, 
3L, 0L, 0L), time = c(0L, 0L, 1L, 4L, 0L, 0L), number = c(0L, 
0L, 1L, 2L, 0L, 0L)), row.names = c(NA, -6L), class = "data.frame")

data.table solution:

library(data.table)
dt <- as.data.table(df)
dt[rowSums(df == 99)==0 & rowSums(df == 999)==0]

base R solution:

df[!apply(df, 1, function(x) any(x %in% c(99, 999))), ]

dplyr solution:

require(dplyr)
filter_all(df, all_vars(.!=99 & .!=999))

Benchmarks (run on the 6-row data frame above):

microbenchmark::microbenchmark(dt = dt[rowSums(df == 99)==0 & rowSums(df == 999)==0], 
base = df[!apply(df, 1, function(x) any(x %in% c(99,999))),], 
dplyr = filter_all(df, all_vars(.!=99 & .!=999)), times = 10000)
# Unit: microseconds
#   expr      min       lq      mean    median        uq        max neval
#     dt  588.000  645.801  701.4309  675.6005  723.2515   5203.801 10000
#   base  264.601  296.901  324.2588  314.4005  335.7020   3435.600 10000
#  dplyr 3671.400 3854.301 4036.3976 3915.3010 3983.0010 139226.802 10000

Comments
  • It was better with the textual data. Having an image means people can't just copy-paste your data to try it out on their own system.
  • All three solutions are already included. Am I missing something?
  • @RonakShah You are not ... except that my data.table solution is a tad different to the base solution mentioned above. I was interested myself how the three approaches benchmarked, hence why I provided my answer. Not sure if speed is of concern or a certain method is preferred (e.g. tidyverse over data.table).