r dplyr ends_with multiple string matches
Can I use dplyr::select(ends_with()) to select column names that fit any of multiple conditions? Given my column names, I want to use ends_with() rather than contains() or matches(), because the strings I want to match are relevant only at the end of a column name but may also appear in the middle of other names. For instance:
df <- data.frame(a10 = 1:4, a11 = 5:8, a20 = 1:4, a12 = 5:8)
I want to select columns that end with 1 or 2, to have only columns a11 and a12. Is select(ends_with) the best way to do this?
You can also do this using regular expressions. I know you did not want to use matches() initially, but it works quite well if you use the end-of-string anchor $. Separate your various endings with |.
library(dplyr)

df <- data.frame(a10 = 1:4, a11 = 5:8, a20 = 1:4, a12 = 5:8)

df %>%
  select(matches('1$|2$'))
#>   a11 a12
#> 1   5   5
#> 2   6   6
#> 3   7   7
#> 4   8   8
If you have a more complex example with a long list of endings, build the pattern with paste0() and collapse = '|'.
dff <- data.frame(a11 = 1:3, a12 = 2:4, a13 = 3:5, a16 = 5:7,
                  my_cat = LETTERS[1:3], my_dog = LETTERS[5:7],
                  my_snake = LETTERS[9:11])

# Append '$' to every ending, then join the alternatives with '|'
my_cols <- paste0(c(1, 2, 6, 'dog', 'cat'), '$', collapse = '|')

dff %>%
  select(matches(my_cols))
#>   a11 a12 a16 my_cat my_dog
#> 1   1   2   5      A      E
#> 2   2   3   6      B      F
#> 3   3   4   7      C      G
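The pattern-building step above can be wrapped in a small helper so the anchor is written only once. This is a minimal sketch: the function name suffix_pattern() is hypothetical, and it assumes the endings are plain text with no regex metacharacters (such as . or *), which would otherwise need escaping.

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical helper: join literal suffixes into one regex whose
# alternatives are all anchored to the end of the name by a single '$'.
# Assumption: the suffixes contain no regex metacharacters.
suffix_pattern <- function(suffixes) {
  paste0("(", paste(suffixes, collapse = "|"), ")$")
}

dff <- data.frame(a11 = 1:3, a12 = 2:4, a13 = 3:5, a16 = 5:7,
                  my_cat = LETTERS[1:3], my_dog = LETTERS[5:7],
                  my_snake = LETTERS[9:11])

pattern <- suffix_pattern(c(1, 2, 6, "dog", "cat"))
pattern
#> [1] "(1|2|6|dog|cat)$"

result <- dff %>% select(matches(pattern))
names(result)
#> [1] "a11"    "a12"    "a16"    "my_cat" "my_dog"
```

Grouping the alternatives in parentheses gives the same matches as repeating '$' after each ending; it only keeps the pattern shorter.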
I don't know if ends_with() is the best way to do this, but you could also do this in base R with a logical index.
# Extract the last character of the column names, and test if it is "1" or "2"
lgl_index <- substr(x = names(df),
                    start = nchar(names(df)),
                    stop = nchar(names(df))) %in% c("1", "2")
With this index, you can subset the data frame as follows, keeping only the columns that end with either 1 or 2:
df[, lgl_index]
#>   a11 a12
#> 1   5   5
#> 2   6   6
#> 3   7   7
#> 4   8   8

# or, equivalently, with dplyr
select(df, which(lgl_index))
#>   a11 a12
#> 1   5   5
#> 2   6   6
#> 3   7   7
#> 4   8   8
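The same logical index can probably be built more compactly. The sketch below is not part of the original answer; it shows two base R alternatives, grepl() with an end-anchored pattern and endsWith() (available in base R from version 3.3.0), and checks that they agree. The variable names are illustrative only.

```r
df <- data.frame(a10 = 1:4, a11 = 5:8, a20 = 1:4, a12 = 5:8)

# grepl() returns one TRUE/FALSE per column name; "[12]$" means
# "a 1 or a 2 immediately before the end of the string".
lgl_regex <- grepl("[12]$", names(df))

# endsWith() tests literal suffixes, so no regex is involved.
lgl_suffix <- endsWith(names(df), "1") | endsWith(names(df), "2")

# Both approaches flag the same columns.
identical(lgl_regex, lgl_suffix)
#> [1] TRUE

# drop = FALSE keeps the result a data frame even if only one column matches.
res_base <- df[, lgl_regex, drop = FALSE]
names(res_base)
#> [1] "a11" "a12"
```

Without drop = FALSE, a single matching column would be returned as a plain vector rather than a data frame.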
From dplyr version 1.0.0, you can combine multiple selections using Boolean logic such as & (and) and | (or):
### Install development version on GitHub first until CRAN version is available
# install.packages("devtools")
# devtools::install_github("tidyverse/dplyr")
library(dplyr, warn.conflicts = FALSE)

df <- data.frame(a10 = 1:4, a11 = 5:8, a20 = 1:4, a12 = 5:8)

df %>%
  select(ends_with("1") | ends_with("2"))
#>   a11 a12
#> 1   5   5
#> 2   6   6
#> 3   7   7
#> 4   8   8
You can also use num_range() to select the desired columns:
df %>%
  select(num_range(prefix = "a", range = 11:12))
#>   a11 a12
#> 1   5   5
#> 2   6   6
#> 3   7   7
#> 4   8   8
Created on 2020-02-17 by the reprex package (v0.3.0)
- If you don't have many conditions, it may be viable to simply use ends_with() multiple times, i.e. df %>% select(ends_with("1"), ends_with("2")).
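As a related sketch, recent versions of tidyselect (1.0.0 and later, which dplyr 1.0.0 depends on) document that ends_with() accepts a character vector and selects the union of the matches, so repeating the call may not be needed at all. This assumes such a version is installed; with older versions the call may fail or behave differently.

```r
library(dplyr, warn.conflicts = FALSE)

df <- data.frame(a10 = 1:4, a11 = 5:8, a20 = 1:4, a12 = 5:8)

# Assumes tidyselect >= 1.0.0: a vector of suffixes selects every
# column whose name ends with any one of them.
res_vec <- df %>% select(ends_with(c("1", "2")))
names(res_vec)
#> [1] "a11" "a12"
```

Because ends_with() matches literal suffixes, this form avoids regular expressions entirely, which is what the question asked for.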