Remove df rows using information about unrepeated levels between two vectors

df <- data.frame(X = c("a", "b", "c", "a", "b", "c", "a", "b", "c", "d" , "a", "b", "c", "d", "e"), 
                  Y = c("w", "w", "w", "K", "K", "K", "L", "L", "L", "L", "Z", "Z", "Z", "Z", "Z"))

Note that the first vector (X) has 5 levels and the second (Y) has 4 levels. My goal is to select the rows of df whose level of vector 1 occurs with every level of vector 2. That is, I want to select the rows with levels "a", "b" and "c", since "d" appears only twice and "e" appears only once in vector 1.

I tried to build a vector of the common levels and keep only the rows with those levels by subsetting. However, it doesn't work, because this vector of levels doesn't give the positions of the rows I want to remove. Ex:

common <- c ("a", "b", "c")
df2 <- df [c(common),]

In my real df there are 64 levels in common, so doing this by hand is not feasible. Can someone help me?

I think this is what you want. Essentially, split X by Y, then find the values that are common to every resulting set.

df[df$X %in% Reduce(intersect, split(df$X, df$Y)),]

#   X Y
#1  a w
#2  b w
#3  c w
#4  a K
#5  b K
#6  c K
#7  a L
#8  b L
#9  c L
#11 a Z
#12 b Z
#13 c Z
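
To see why this works, and why the attempt in the question does not, the one-liner can be unpacked step by step. This is only a sketch, assuming X and Y are plain character columns (hence the explicit stringsAsFactors = FALSE); the names sets, common, failed and result are made up for illustration.

```r
df <- data.frame(
  X = c("a", "b", "c", "a", "b", "c", "a", "b", "c", "d", "a", "b", "c", "d", "e"),
  Y = c("w", "w", "w", "K", "K", "K", "L", "L", "L", "L", "Z", "Z", "Z", "Z", "Z"),
  stringsAsFactors = FALSE  # assumption: character columns, not factors
)

# One character vector of X values per level of Y
sets <- split(df$X, df$Y)

# Fold intersect() over the list: keeps only the values present in every set
common <- Reduce(intersect, sets)

# Indexing rows with a character vector matches row names ("1", "2", ...),
# not the values in X, so the attempt from the question returns only NA rows
failed <- df[common, ]

# Build a logical mask over the rows instead
result <- df[df$X %in% common, ]
```

Because common is computed from the data, the same code should carry over to the 64-level case without typing any level by hand.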

Another way could be to group_by X and keep the groups that have all distinct values of Y.

library(dplyr)

df %>%
  group_by(X) %>%
  filter(n_distinct(Y) == n_distinct(.$Y))

#   X     Y    
# <fct> <fct>
# 1 a     w    
# 2 b     w    
# 3 c     w    
# 4 a     K    
# 5 b     K    
# 6 c     K    
# 7 a     L    
# 8 b     L    
# 9 c     L    
#10 a     Z    
#11 b     Z    
#12 c     Z    

In base R, the same idea can be written with ave:

subset(df, as.logical(ave(as.character(Y), X, 
          FUN = function(x) length(unique(x)) == length(unique(Y)))))
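
The call above is dense, so here is a rough unpacked version of the same idea. It is a sketch rather than the answerer's code: it passes row indices to ave instead of the character column so that the result stays numeric, and n_total, n_per_row and kept are hypothetical names.

```r
df <- data.frame(
  X = c("a", "b", "c", "a", "b", "c", "a", "b", "c", "d", "a", "b", "c", "d", "e"),
  Y = c("w", "w", "w", "K", "K", "K", "L", "L", "L", "L", "Z", "Z", "Z", "Z", "Z"),
  stringsAsFactors = FALSE  # assumption: character columns, not factors
)

# Number of distinct Y values in the whole data frame
n_total <- length(unique(df$Y))

# For every row, the number of distinct Y values seen within that row's X group;
# ave() applies FUN per group and returns a vector as long as its first argument
n_per_row <- ave(seq_along(df$Y), df$X,
                 FUN = function(i) length(unique(df$Y[i])))

# Keep the rows whose X group covers every Y value
kept <- df[n_per_row == n_total, ]
```

Keeping the per-row counts in their own variable also makes it easy to inspect which levels of X fall short and by how much.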

Using data.table

library(data.table)
setDT(df)[, .SD[uniqueN(Y) == uniqueN(df$Y)], by = X]

Comments
  • Solved! Thanks!