Subsetting a data frame based on multiple criteria for deletion of rows
Consider the following data frame with columns "id" and "x", where each id is repeated four times. The data are as follows:

       id x
    1   1 2
    2   1 2
    3   1 1
    4   1 1
    5   2 2
    6   2 3
    7   2 3
    8   2 3
    9   3 1
    10  3 2
    11  3 2
    12  3 3
    13  4 2
    14  4 2
    15  4 3
    16  4 3
The question is how to subset the data frame by the following criteria:
(1) Keep all entries of an id if its values in column x do not contain a 3, or if its only 3 is the last value.
(2) For an id with multiple 3s in column x, keep all values up to and including the first 3 and delete the remaining 3s. The expected output would look like:
       id x
    1   1 2
    2   1 2
    3   1 1
    4   1 1
    5   2 2
    6   2 3
    7   3 1
    8   3 2
    9   3 2
    10  3 3
    11  4 2
    12  4 2
    13  4 3
I am familiar with the 'filter' function in the dplyr package for subsetting data, but this particular situation confuses me because of the complexity of the criteria above. Any help would be greatly appreciated.
Here's one solution that creates some helper columns to filter on:
    library(dplyr)

    df <- data.frame("id" = c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4),
                     "x"  = c(2,2,1,1,2,3,3,3,1,2,2,3,2,2,3,3))

    df %>%
      group_by(id) %>%                                     # for each id
      mutate(num_threes = sum(x == 3),                     # count number of 3s
             flag = ifelse(unique(num_threes) > 0,         # if there is a 3
                           min(row_number()[x == 3]),      # keep the row of the first 3
                           0)) %>%                         # otherwise put a 0
      filter(num_threes == 0 | row_number() <= flag) %>%   # keep ids with no 3s or up to first 3
      ungroup() %>%
      select(-num_threes, -flag)                           # remove helper columns

    # # A tibble: 13 x 2
    #       id     x
    #    <dbl> <dbl>
    #  1     1     2
    #  2     1     2
    #  3     1     1
    #  4     1     1
    #  5     2     2
    #  6     2     3
    #  7     3     1
    #  8     3     2
    #  9     3     2
    # 10     3     3
    # 11     4     2
    # 12     4     2
    # 13     4     3
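The same filtering idea can be sketched more compactly. This is an illustrative variant, not the answer's own method: within each id, count how many 3s appear strictly before the current row and keep only the rows where that count is zero. It assumes the rows of each id are already in the intended order, and the name result is made up for the example.

```r
library(dplyr)

# Example data, same values as in the answer above
df <- data.frame(id = rep(1:4, each = 4),
                 x  = c(2, 2, 1, 1, 2, 3, 3, 3, 1, 2, 2, 3, 2, 2, 3, 3))

result <- df %>%
  group_by(id) %>%
  # lag() shifts the "is a 3" indicator down one row, so cumsum() counts
  # the 3s that occur strictly before the current row within the id
  filter(cumsum(lag(x == 3, default = FALSE)) == 0) %>%
  ungroup()
```

Note that this drops every row after the first 3 of an id, including any non-3 values, which is one reading of criterion (2). For the data shown here the two readings give the same rows.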
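This works for me:

    library(dplyr)

    # previous value of x within each id (NA for the first row of an id),
    # grouped so that a 3 ending one id cannot affect the next id
    df <- df %>% group_by(id) %>% mutate(before = lag(x)) %>% ungroup()
    df$condition1 <- 1
    # flag every 3 that directly follows another 3
    df$condition1[df$x == 3 & df$before == 3] <- 0
    final_df <- df[df$condition1 == 1, 1:2]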
    id x
     1 2
     1 2
     1 1
     1 1
     2 2
     2 3
     3 1
     3 2
     3 2
     3 3
     4 2
     4 2
     4 3
One idea is to pick out the rows with x == 3 and apply unique() to them. Then append the deduplicated rows, which hold a single 3 per id, to the rest of the data frame, and finally restore the original row order. Here is a base R solution implementing this idea:
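    # combine the non-3 rows with the deduplicated 3 rows
    r <- with(df, rbind(df[x != 3, ], unique(df[x == 3, ])))
    # restore the original row order, then renumber the rows
    res <- r[order(as.numeric(rownames(r))), ]
    rownames(res) <- seq(nrow(res))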
    > res
       id x
    1   1 2
    2   1 2
    3   1 1
    4   1 1
    5   2 2
    6   2 3
    7   3 1
    8   3 2
    9   3 2
    10  3 3
    11  4 2
    12  4 2
    13  4 3
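One caveat with deduplicating only the 3 rows: if a non-3 value appeared after the first 3 of an id, it would be kept. If criterion (2) is read as "drop everything after the first 3", a base R sketch along the following lines might be closer. The names threes_before and res2 are illustrative, and the rows of each id are assumed to be in their intended order.

```r
# Example data, same values as used in the answers above
df <- data.frame(id = rep(1:4, each = 4),
                 x  = c(2, 2, 1, 1, 2, 3, 3, 3, 1, 2, 2, 3, 2, 2, 3, 3))

# Number of 3s seen strictly before each row, computed separately per id:
# the indicator is shifted down by one position and then accumulated
threes_before <- ave(as.integer(df$x == 3), df$id,
                     FUN = function(is3) cumsum(c(0L, head(is3, -1))))

# Keep rows with no earlier 3 in the same id, then renumber the rows
res2 <- df[threes_before == 0, ]
rownames(res2) <- NULL
```

On this data set res2 contains the same 13 rows as the expected output, since no id has a non-3 value after its first 3.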