R dplyr: drop a column that may or may not exist with select(-name)

library(ggplot2)
library(dplyr)

# the first drop works fine
diamonds <- diamonds %>% select(-clarity)

# but doing it again gives me an error
diamonds %>% select(-clarity)

Error in is_character(x) : object 'clarity' not found

How do I do a safe drop/deselect?

You can do:

diamonds %>% 
 select(-one_of("clarity"))

If one of the listed columns does not exist:

diamonds %>% 
 select(-one_of("clarity", "clearness"))

the existing columns are still dropped, but it returns a warning:

Warning message:
Unknown columns: `clearness` 

The dplyr documentation for select() ("Subset columns using their names and types") also describes any_of(): it is the same as all_of(), except that no error is thrown for names that don't exist.
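In dplyr 1.0.0 and later, one_of() is superseded by any_of(), so the same safe drop can be written without the warning. This is a minimal sketch that assumes a recent dplyr; `trimmed` is just an illustrative name, and "clearness" is the non-existent column from the example above.

```r
library(dplyr)
library(ggplot2)  # only needed for the diamonds data set

# Assumes dplyr >= 1.0.0, where any_of() is available.
# "clearness" is not a column of diamonds; any_of() skips it silently.
trimmed <- diamonds %>%
  select(-any_of(c("clarity", "clearness")))

names(trimmed)
```

Running the same line a second time on `trimmed` should also succeed, because any_of() ignores names that are no longer present.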

Here's a simple modification of the one_of() method shown by tmfmnk so that it accepts bare column names, as select() does. The inputs are captured as quosures and then converted to character strings.

library(tidyverse) # or just dplyr and purrr

drop_cols <- function(df, ...){
  df %>% 
    select(-one_of(map_chr(enquos(...), quo_name)))
}

diamonds %>% 
  drop_cols(clarity, color, zebra)

# # A tibble: 53,940 x 8
#    carat cut       depth table price     x     y     z
#    <dbl> <ord>     <dbl> <dbl> <int> <dbl> <dbl> <dbl>
#  1 0.23  Ideal      61.5    55   326  3.95  3.98  2.43
#  2 0.21  Premium    59.8    61   326  3.89  3.84  2.31
#  3 0.23  Good       56.9    65   327  4.05  4.07  2.31
#  4 0.290 Premium    62.4    58   334  4.2   4.23  2.63
#  5 0.31  Good       63.3    58   335  4.34  4.35  2.75
#  6 0.24  Very Good  62.8    57   336  3.94  3.96  2.48
#  7 0.24  Very Good  62.3    57   336  3.95  3.98  2.47
#  8 0.26  Very Good  61.9    55   337  4.07  4.11  2.53
#  9 0.22  Fair       65.1    61   337  3.87  3.78  2.49
# 10 0.23  Very Good  59.4    61   338  4     4.05  2.39
# # ... with 53,930 more rows
# Warning message:
# Unknown columns: `zebra`
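If the warning is unwanted, a variant of the helper could use any_of() instead of one_of(). This is only a sketch: `drop_cols_quiet` is a hypothetical name, and it assumes dplyr >= 1.0.0 plus rlang (which dplyr already depends on).

```r
library(dplyr)
library(ggplot2)  # only needed for the diamonds data set

# Hypothetical helper: drop the given bare column names,
# silently ignoring any that are not present in df.
drop_cols_quiet <- function(df, ...) {
  # Capture the bare names as symbols, then convert each to a string
  cols <- unname(vapply(rlang::ensyms(...), rlang::as_string, character(1)))
  select(df, -any_of(cols))
}

result <- diamonds %>%
  drop_cols_quiet(clarity, color, zebra)

names(result)
```

Unlike the quosure version, ensyms() only accepts plain names or strings, which keeps the helper limited to simple column names rather than arbitrary expressions.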


Here's a slight twist using dplyr::select_if() that will not throw an Unknown columns: warning if the column name does not exist, in this case 'bad_column':

diamonds %>% 
  select_if(!names(.) %in% c('carat', 'cut', 'bad_column'))
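The same name-based idea can be sketched in base R with setdiff(), which also ignores names that are absent. The variable names `to_drop` and `kept` are illustrative, not part of the answer above.

```r
library(ggplot2)  # only needed for the diamonds data set

# Names to remove; "bad_column" does not exist in diamonds
to_drop <- c("carat", "cut", "bad_column")

# Keep every column whose name is not in to_drop
kept <- diamonds[setdiff(names(diamonds), to_drop)]

names(kept)
```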
