Simplify selection of similarly named columns by condition

The example code below works, but how can I write it with less code and more elegantly?

I have several columns with similar names. In this example I want to select all columns whose names begin with B, keeping only the rows where every one of those columns is FALSE.

set.seed(0)

df <- data.frame(A = sample(c(T, F), 100, replace=T),
                 B1 = sample(c(T, F), 100, replace=T),
                 B2 = sample(c(T, F), 100, replace=T),
                 B3 = sample(c(T, F), 100, replace=T))

n <- names(df)[startsWith(names(df), 'B')]

result <- df[df$B1 == FALSE & df$B2 == FALSE & df$B3 == FALSE, n]

print(result)

The result is

      B1    B2    B3
1  FALSE FALSE FALSE
26 FALSE FALSE FALSE
31 FALSE FALSE FALSE
35 FALSE FALSE FALSE
51 FALSE FALSE FALSE
66 FALSE FALSE FALSE
70 FALSE FALSE FALSE
84 FALSE FALSE FALSE

This is what I tried, but it gave unexpected results:

df[df[,n] == FALSE, n]
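
A likely reason for the surprise, sketched here under the question's own setup: df[, n] == FALSE is a 100 x 3 logical matrix (one cell per row and column) rather than one logical value per row, so it needs to be collapsed row-wise before it can act as a row index. The names cond and keep below are illustrative and do not appear in the original post.

```r
set.seed(0)
df <- data.frame(A  = sample(c(TRUE, FALSE), 100, replace = TRUE),
                 B1 = sample(c(TRUE, FALSE), 100, replace = TRUE),
                 B2 = sample(c(TRUE, FALSE), 100, replace = TRUE),
                 B3 = sample(c(TRUE, FALSE), 100, replace = TRUE))
n <- names(df)[startsWith(names(df), "B")]

# Comparing a data frame with a scalar gives a logical matrix,
# with one cell per (row, column) pair
cond <- df[, n] == FALSE
dim(cond)

# Collapse to one logical per row: TRUE only if every B column is FALSE
keep <- rowSums(cond) == length(n)
result <- df[keep, n]
```

Here keep has exactly one entry per row of df, which is what row subsetting expects.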

With tidyverse:

df %>% select(matches("^B")) %>% filter_all(all_vars(.==FALSE))

Or, if you want to check the row numbers:

df %>% mutate(id=row_number()) %>%                  # copy row number to new variable 
  select(id,matches("^B")) %>%                      # keeps id and variables beginning with B
  filter_at(vars(matches("^B")),                    # for variables beginning with B
            all_vars(.==FALSE))                     # keep rows where all are FALSE
#  id    B1    B2    B3
#1  1 FALSE FALSE FALSE
#2 26 FALSE FALSE FALSE
#3 31 FALSE FALSE FALSE
#4 35 FALSE FALSE FALSE
#5 51 FALSE FALSE FALSE
#6 66 FALSE FALSE FALSE
#7 70 FALSE FALSE FALSE
#8 84 FALSE FALSE FALSE
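
In recent dplyr releases the scoped verbs filter_all() and filter_at() are superseded. A minimal sketch of the same idea with if_all(), assuming dplyr 1.0.4 or later is installed (the name res is illustrative):

```r
library(dplyr)

set.seed(0)
df <- data.frame(A  = sample(c(TRUE, FALSE), 100, replace = TRUE),
                 B1 = sample(c(TRUE, FALSE), 100, replace = TRUE),
                 B2 = sample(c(TRUE, FALSE), 100, replace = TRUE),
                 B3 = sample(c(TRUE, FALSE), 100, replace = TRUE))

res <- df %>%
  select(starts_with("B")) %>%          # keep only columns whose names begin with B
  filter(if_all(everything(), ~ !.x))   # keep rows where every remaining column is FALSE
```

Note that filter() does not preserve the original row numbers, so add an id column first (as in the second pipeline above) if they are needed.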

In base R, we can select the columns which start with "B" and then select rows whose sum is equal to 0 using rowSums.

inds <- grepl("^B", names(df))
df[rowSums(df[inds]) == 0, inds]

#      B1    B2    B3
#1  FALSE FALSE FALSE
#26 FALSE FALSE FALSE
#31 FALSE FALSE FALSE
#35 FALSE FALSE FALSE
#51 FALSE FALSE FALSE
#66 FALSE FALSE FALSE
#70 FALSE FALSE FALSE
#84 FALSE FALSE FALSE

Or, as @snoram mentions, to make it more concise we can do:

df[!rowSums(df[inds]), inds]
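
To see why both forms work, here is a small sketch on a made-up three-row data frame (toy, counts and kept are illustrative names, not from the answer). TRUE counts as 1 and FALSE as 0, so a row sum of 0 means every B column is FALSE, and !0 is TRUE.

```r
toy <- data.frame(A  = c(TRUE, FALSE, TRUE),
                  B1 = c(FALSE, TRUE, FALSE),
                  B2 = c(FALSE, FALSE, FALSE))

inds <- grepl("^B", names(toy))   # FALSE TRUE TRUE: marks the B columns

counts <- rowSums(toy[inds])      # number of TRUE values per row: 0, 1, 0
kept <- toy[counts == 0, inds]    # rows 1 and 3, columns B1 and B2

# Negating the counts selects the same rows, because !0 is TRUE
# and any non-zero count negates to FALSE
kept_short <- toy[!counts, inds]
```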

A fast base-R alternative:

df[!do.call(pmax, df[n]), n]

       B1    B2    B3
1  FALSE FALSE FALSE
26 FALSE FALSE FALSE
31 FALSE FALSE FALSE
35 FALSE FALSE FALSE
51 FALSE FALSE FALSE
66 FALSE FALSE FALSE
70 FALSE FALSE FALSE
84 FALSE FALSE FALSE

EDIT

Staying closer to the original attempt you could do:

df[apply(df[n] == FALSE, 1, all), n] 
# or
df[apply(!df[n], 1, all), n]

I would do it like this:

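Your data:

set.seed(0)
df <- data.frame(A = sample(c(T, F), 100, replace=T),
                 B1 = sample(c(T, F), 100, replace=T),
                 B2 = sample(c(T, F), 100, replace=T),
                 B3 = sample(c(T, F), 100, replace=T))

Code:

# keep only the B columns and negate them, so TRUE now marks an original FALSE
# (note that this overwrites df)
df <- as.data.frame(!df[, grepl("^B", names(df))])

# keep rows where every negated value is TRUE, then negate back to the original values
!df[apply(df, 1, all), ]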

Result:

#      B1    B2    B3
#1  FALSE FALSE FALSE
#26 FALSE FALSE FALSE
#31 FALSE FALSE FALSE
#35 FALSE FALSE FALSE
#51 FALSE FALSE FALSE
#66 FALSE FALSE FALSE
#70 FALSE FALSE FALSE
#84 FALSE FALSE FALSE

In base R, we can do

df[!Reduce(`|`, df[grep("^B", names(df))]),]
#       A    B1    B2    B3
#1  FALSE FALSE FALSE FALSE
#26  TRUE FALSE FALSE FALSE
#31  TRUE FALSE FALSE FALSE
#35  TRUE FALSE FALSE FALSE
#51 FALSE FALSE FALSE FALSE
#66 FALSE FALSE FALSE FALSE
#70  TRUE FALSE FALSE FALSE
#84  TRUE FALSE FALSE FALSE
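
The call above keeps every column, including A. If only the B columns are wanted, as in the question, one possible variant is sketched below (cols and any_true are illustrative names):

```r
set.seed(0)
df <- data.frame(A  = sample(c(TRUE, FALSE), 100, replace = TRUE),
                 B1 = sample(c(TRUE, FALSE), 100, replace = TRUE),
                 B2 = sample(c(TRUE, FALSE), 100, replace = TRUE),
                 B3 = sample(c(TRUE, FALSE), 100, replace = TRUE))

cols <- grep("^B", names(df))       # positions of the B columns

# Fold the B columns together with element-wise OR:
# TRUE wherever at least one B column is TRUE
any_true <- Reduce(`|`, df[cols])

# Keep rows with no TRUE in any B column, and only the B columns
result <- df[!any_true, cols]
```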

Comments
  • You can use starts_with('B') instead of matches("^B"), as it's designed for exactly this use case.
  • If concision matters: df[!rowSums(df[inds]), inds]
  • @Nicolas2 didn't run the set.seed function. Thanks, updated the answer :)
  • Produces Error in [.data.frame(df, !do.call(pmax, df[n]), n) : object 'n' not found
  • @Nicolas2 you also have to define n as in the OP: n <- names(df)[startsWith(names(df), 'B')]