## R: Filter vectors by 'two-way' partial match

With two vectors

    x <- c("abc", "12")
    y <- c("bc", "123", "nomatch")

is there a way to filter both by 'two-way' partial matching (removing elements from one vector if they contain, or are contained in, any element of the other vector) so that the result is these two vectors:

    x1 <- c()
    y1 <- c("nomatch")

To explain - every element of x is either a substring or a superstring of one of the elements of y, hence x1 is empty. Update - it is not sufficient for a substring to match the initial chars - a substring might be found anywhere in the string it matches. Example above has been updated to reflect this.

I originally thought `?pmatch` might be handy, but your edit clarifies you don't just want to match the start of items. Here's a function that should work:

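    remover <- function(x, y) {
      pmx <- lapply(x, grep, x = y)  # for each element of x, the indices of y containing it
      pmy <- lapply(y, grep, x = x)  # for each element of y, the indices of x containing it
      # an element is a hit if it is contained in, or contains, an element of the other vector;
      # hits are tracked per vector so that indices into y are never applied to x
      hitx <- lengths(pmx) > 0 | seq_along(x) %in% unlist(pmy)
      hity <- lengths(pmy) > 0 | seq_along(y) %in% unlist(pmx)
      list(x[!hitx], y[!hity])
    }

    remover(x, y)
    #[[1]]
    #character(0)
    #
    #[[2]]
    #[1] "nomatch"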

It correctly does nothing when no match is found (thanks @Frank for picking up the earlier error):

    remover("yo", "nomatch")
    #[[1]]
    #[1] "yo"
    #
    #[[2]]
    #[1] "nomatch"
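One caveat, offered as a sketch rather than as part of the answer above: `grep` and `grepl` treat the pattern as a regular expression by default, so elements containing characters such as `.` or `(` may match more than a literal substring would. Assuming literal substring matching is what is wanted, `fixed = TRUE` avoids that. The names `remover_fixed` and `hits_any` are hypothetical, introduced only for this illustration.

```r
# Hypothetical variant: two-way filter using literal (non-regex) substring matching.
remover_fixed <- function(x, y) {
  # TRUE if string s is contained in, or contains, any element of `other`
  hits_any <- function(s, other) {
    any(grepl(s, other, fixed = TRUE)) ||
      any(vapply(other, grepl, logical(1), x = s, fixed = TRUE))
  }
  keep_x <- !vapply(x, hits_any, logical(1), other = y)
  keep_y <- !vapply(y, hits_any, logical(1), other = x)
  list(x[keep_x], y[keep_y])
}

res_example <- remover_fixed(c("abc", "12"), c("bc", "123", "nomatch"))
# As a regex, "a.c" would match "abc"; as a literal string it does not.
res_literal <- remover_fixed("a.c", "abc")
```

Here `res_example` should hold an empty character vector and `"nomatch"`, while `res_literal` keeps both inputs because neither is a literal substring of the other.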

We can do the following:

    # Return a data.frame of matches of a in b (one column per element of a)
    m <- function(a, b) {
      data.frame(sapply(a, function(w) grepl(w, b), simplify = FALSE))
    }

    # Match x in y and remove
    x0 <- x[!apply(m(x, y), 2, any)]
    y0 <- y[!apply(m(x, y), 1, any)]

    # Match y in x and remove
    x1 <- x0[!apply(m(y0, x0), 1, any)]
    y1 <- y0[!apply(m(y0, x0), 2, any)]

    x1
    #character(0)
    y1
    #[1] "nomatch"

I build a matrix of all possible matches in both directions, combine the two with `|` (a match in either direction counts equally), and then use the result to subset `x` and `y`:

    x <- c("abc", "12")
    y <- c("bc", "123", "nomatch")

    bool_mat <- sapply(x, function(z) grepl(z, y)) | t(sapply(y, function(z) grepl(z, x)))

    x1 <- x[!apply(bool_mat, 2, any)] # character(0)
    y1 <- y[!apply(bool_mat, 1, any)] # [1] "nomatch"
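A possible refinement of the matrix idea, as a sketch only: `sapply` can come back as a plain vector rather than a matrix when one of the inputs has a single element, so the two halves of the `|` may not line up. The version below builds the matrix with `vapply` and fixes its shape explicitly. The names `match_matrix` and `two_way_filter` are hypothetical, not part of the answer above.

```r
# Hypothetical helper: rows index y, columns index x.
# Entry [i, j] is TRUE when y[i] contains x[j] or x[j] contains y[i].
match_matrix <- function(x, y) {
  m <- vapply(x, function(xj) {
    grepl(xj, y) | vapply(y, grepl, logical(1), x = xj)
  }, logical(length(y)))
  # Force a length(y) x length(x) matrix even when either input has one element
  matrix(m, nrow = length(y), ncol = length(x))
}

two_way_filter <- function(x, y) {
  m <- match_matrix(x, y)
  list(x = x[!(colSums(m) > 0)], y = y[!(rowSums(m) > 0)])
}

res_matrix <- two_way_filter(c("abc", "12"), c("bc", "123", "nomatch"))
res_single <- two_way_filter("yo", c("nomatch", "other", "third"))
```

With the example data, `res_matrix$x` is empty and `res_matrix$y` is `"nomatch"`; `res_single` returns both inputs unchanged.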

##### Comments

- How much partial matching do you want to do? For `abc`, does `a` match? Does `abcd` match? Does `cab` match?
- If "a" is an element, then it matches with any string containing "a" as it is a substring of them. However, if "ab" is an element, then it doesn't match "ac" because neither is a substring or superstring of the other.
- This seems to only work if the substring matches the initial chars. For example, it should also match "bc" with "abc". I'll clarify in the question.
- @Vlad "Partial matching" means matching the initial chars, fyi. Btw, generally bad form to edit the question in a way that invalidates an answer...
- @Frank - I think I've sorted it now, thanks for the comment.
- @Frank - please could you provide a reference to 'partial matching' matching initial characters.
- @Vlad Eg `?charmatch` says "partial matches (those where the value to be matched has an exact match to the initial part of the target, but the target is longer)." This terminology might be unique to R... my knowledge of regex/string parsing is not very broad.
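
The distinction raised in the comments can be checked directly. This is a small illustration, not something posted in the thread: base R's `charmatch` and `pmatch` only match the initial part of the target, whereas `grepl` finds a substring anywhere. The variable names are made up for the example.

```r
# "ab" is an initial part of "abc", so charmatch reports position 1
start_hit <- charmatch("ab", "abc")

# "bc" occurs inside "abc" but not at the start, so both return NA
middle_charmatch <- charmatch("bc", "abc")
middle_pmatch <- pmatch("bc", "abc")

# grepl looks anywhere in the string (fixed = TRUE for a literal match)
middle_grepl <- grepl("bc", "abc", fixed = TRUE)

# "ab" and "ac": neither is a substring of the other, so no two-way match
ab_ac <- grepl("ab", "ac", fixed = TRUE) || grepl("ac", "ab", fixed = TRUE)
```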