Rearrange columns based on coverage of previous columns

I'm working on a test coverage analysis and I would like to rearrange a matrix so that the columns are ordered by number of "additional" test failures.

As an example I have a matrix with TRUE and FALSE where TRUE indicates a failure.

df <- structure(c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE), .Dim = c(10L, 3L), .Dimnames = list(NULL, c("t1", "t2", "t3")))

t2 has the highest number of failures and should be the first column. t1 has the next highest, but all of its failures (per row) are already covered by t2. t3 has fewer failures, but its last two are not covered by t2, so it should be the second column.

Desired column order based on fail coverage:

df <- structure(c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE), .Dim = c(10L, 3L), .Dimnames = list(NULL, c("t2", "t3", "t1")))
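
To make the notion of "additional" failures concrete, here is a minimal sketch that counts, for the example matrix, the failures each remaining test adds on rows that t2 does not already cover. The names `m`, `covered` and `additional` are introduced only for this illustration.

```r
# Example matrix from above, rebuilt column by column (TRUE = failure)
m <- cbind(t1 = rep(c(TRUE, FALSE), c(6, 4)),
           t2 = rep(c(TRUE, FALSE), c(8, 2)),
           t3 = rep(c(FALSE, TRUE), c(7, 3)))

colSums(m)                     # total failures per test: t1 = 6, t2 = 8, t3 = 3

covered <- m[, "t2"]           # rows already failing in the first column
additional <- colSums(m[!covered, c("t1", "t3"), drop = FALSE])
additional                     # t1 adds 0 new failures, t3 adds 2
```

This is why t3 should come before t1 even though t1 has more failures in total.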

I was able to get a count of "additional" fails per test using a for loop in conjunction with the apply function, but performance is really bad when there are a lot of columns and rows in the data set. I would, however, prefer to rearrange the columns themselves for further processing.

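out <- df                     # working copy; covered rows are blanked below
col.list <- character(0)      # selected column names, in order
val.list <- integer(0)        # "additional" fails contributed by each column
for (n in 2:ncol(df)) {
  idx <- which.max(apply(out, 2, sum, na.rm = TRUE))
  col.list <- c(col.list, names(idx))
  val.list <- c(val.list, sum(out[, idx], na.rm = TRUE))
  out[which(out[, idx]), ] <- FALSE   # rows already covered no longer count
  out <- out[, -idx, drop = FALSE]
}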

Can anyone suggest a better approach to do this? Maybe not using a for loop?

Thanks.

Here's a somewhat similar approach to OP's but I hope it will perform slightly better (not tested though):

select_cols <- names(tail(sort(colSums(df)), 1)) # first col
for(i in seq_len(ncol(df)-1)) {
  remaining_cols <- setdiff(colnames(df), select_cols)
  idx <- rowSums(df[, select_cols, drop=FALSE]) > 0
  select_cols <- c(select_cols, 
                   names(tail(sort(colSums(df[!idx, remaining_cols, drop=FALSE])), 1)))
}
df <- df[, select_cols]
df

#        t2    t3    t1
# [1,]  TRUE FALSE  TRUE
# [2,]  TRUE FALSE  TRUE
# [3,]  TRUE FALSE  TRUE
# [4,]  TRUE FALSE  TRUE
# [5,]  TRUE FALSE  TRUE
# [6,]  TRUE FALSE  TRUE
# [7,]  TRUE FALSE FALSE
# [8,]  TRUE  TRUE FALSE
# [9,] FALSE  TRUE FALSE
# [10,] FALSE  TRUE FALSE

Update: try this slightly modified version, where m is the input matrix (df above). It is a lot faster and I think it will produce correct results:

  select_cols <- names(tail(sort(colSums(m)), 1)) # first col
  idx <- rowSums(m[, select_cols, drop = FALSE]) > 0
  for(i in seq_len(ncol(m)-1)) {
    remaining_cols <- setdiff(colnames(m), select_cols)
    idx[!idx] <- rowSums(m[!idx, select_cols, drop=FALSE]) > 0
    select_cols <- c(select_cols, 
                     names(tail(sort(colSums(m[!idx, remaining_cols, drop=FALSE])), 1)))
  }
  m <- m[, select_cols]
  m

The main difference between the two is this line:

idx[!idx] <- rowSums(m[!idx, select_cols, drop=FALSE]) > 0

which means we don't need to compute rowSums for rows where any previously selected column is already true.
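
For reuse, the updated loop can be wrapped in a function. The sketch below is one possible packaging, not the answerer's code: the name `order_by_new_fails` is hypothetical, it assumes a logical matrix with unique column names and no NA values, and it uses `which.max` (first maximum on ties) where the loop above uses `tail(sort(...), 1)`, so tie-breaking may differ.

```r
# Hypothetical helper: greedy column order by newly covered failures.
# Assumes `m` is a logical matrix with unique column names and no NAs.
order_by_new_fails <- function(m) {
  uncovered <- rep(TRUE, nrow(m))      # rows with no failure in any chosen column
  remaining <- colnames(m)
  chosen <- character(0)
  while (length(remaining) > 0) {
    # failures each remaining column adds on still-uncovered rows
    gain <- colSums(m[uncovered, remaining, drop = FALSE])
    best <- remaining[which.max(gain)]
    chosen <- c(chosen, best)
    remaining <- setdiff(remaining, best)
    # only rows that were still uncovered can change state
    uncovered[uncovered] <- !m[uncovered, best]
  }
  m[, chosen, drop = FALSE]
}

# Example matrix from the question
m <- cbind(t1 = rep(c(TRUE, FALSE), c(6, 4)),
           t2 = rep(c(TRUE, FALSE), c(8, 2)),
           t3 = rep(c(FALSE, TRUE), c(7, 3)))
colnames(order_by_new_fails(m))        # "t2" "t3" "t1"
```

Tracking the uncovered rows in a single logical vector keeps each pass restricted to rows that can still contribute a new failure.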

Here is my solution, which is based on a shortcut.

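df <- as.data.frame(df)
df_new <- df   # untouched copy, reordered at the end
index <- NULL
for (i in seq_len(ncol(df))) {
  # column with the most failures among the rows not yet covered
  var <- names(sort(apply(X = df, MARGIN = 2, sum), decreasing = TRUE))[1]
  index <- c(index, var)
  # keep only uncovered rows; drop the chosen column so it cannot be picked twice
  df <- df[!df[, var], setdiff(names(df), var), drop = FALSE]
}
df_new[, index]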

Since only new failures count, we can loop as follows:

  1. take the variable with the most failures
  2. remove the rows where that variable had failures
  3. take the next variable with the most failures among the remaining rows.

Step 2 makes the loop faster, because the data shrinks on every pass; steps 1 and 3 are based on apply.

Hope it helps!

Here's an alternative working with data in long format instead. I use data.table functions, but it could be adapted to base if desired. I hope I understood your logic correctly ;) At least I try to explain my understanding in the commented code.

library(data.table)

# convert matrix to data.table
dt <- as.data.table(df)

# add row index, 'ri'
dt[ , ri := 1:.N]

# melt to long format
d <- melt(dt, id.vars = "ri", variable.factor = FALSE, variable.name = "ci")

# determine first column
# for each 'ci' (columns in 'df'), count number of TRUE
# select 'ci' with max count
first_col <- d[ , sum(value), by = ci][which.max(V1), ci]

# for each 'ri' (rows in 'df'),
# check if number of unique 'ci' is one (i.e. "additional" test failures)    
d[(value), new := uniqueN(ci) == 1, by = ri]

# select rows where 'new' is TRUE
# for each 'ci', count the number of rows, i.e the number of 'new'
# -> number of rows in 'df' where this column is the only TRUE
d_new <- d[(new), .(n_new = .N), ci]

# set order to descending 'n_new'
setorder(d_new, -n_new)

# combine first column and columns which contribute with additional TRUE
cols <- c(first_col, setdiff(d_new[ , ci], first_col)) 

# set column order. 
# First 'cols', then any columns which haven't contributed with new values
# (none in the test data, but needed for more general cases)  
setcolorder(dt, c(cols, setdiff(names(dt), cols)))

dt
#        t2    t3    t1 ri
#  1:  TRUE FALSE  TRUE  1
#  2:  TRUE FALSE  TRUE  2
#  3:  TRUE FALSE  TRUE  3
#  4:  TRUE FALSE  TRUE  4
#  5:  TRUE FALSE  TRUE  5
#  6:  TRUE FALSE  TRUE  6
#  7:  TRUE FALSE FALSE  7
#  8:  TRUE  TRUE FALSE  8
#  9: FALSE  TRUE FALSE  9
# 10: FALSE  TRUE FALSE 10
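
Since the answer notes that the logic could be adapted to base R, here is a rough base sketch of the same idea, offered as an assumption about how that adaptation might look rather than as the answerer's code. It counts, per column, the rows where that column is the only TRUE. The names `m`, `only_one`, `n_new` and `cols` are illustrative, and the ordering of ties may differ from `setorder`.

```r
# Example matrix from the question (TRUE = failure)
m <- cbind(t1 = rep(c(TRUE, FALSE), c(6, 4)),
           t2 = rep(c(TRUE, FALSE), c(8, 2)),
           t3 = rep(c(FALSE, TRUE), c(7, 3)))

first_col <- names(which.max(colSums(m)))        # column with most failures overall
only_one <- rowSums(m) == 1                      # rows where exactly one column fails
n_new <- colSums(m[only_one, , drop = FALSE])    # number of such rows per column
n_new <- sort(n_new[n_new > 0], decreasing = TRUE)

# first column, then contributing columns, then everything else
cols <- c(first_col, setdiff(names(n_new), first_col))
cols <- c(cols, setdiff(colnames(m), cols))
m[, cols]
```

On the example data this gives the order t2, t3, t1, matching the data.table output above.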

Tried it on a matrix of the size mentioned in comment:

set.seed(1)
nr <- 14000
nc <- 1400
df <- matrix(sample(c(TRUE, FALSE), nr*nc, replace = TRUE), nr, nc,
             dimnames = list(NULL, paste0("t", 1:nc)))

Finished in < 5 seconds.

Comments
  • It seems rowSums is very slow. I have a matrix with 12k observations and 1400 variables. It takes ages to process. Do you know of an alternative? Thanks.
  • @docendo_discimus Thanks for your comments. I've also updated my version; I am really curious whether it can be of help!
  • I don't think that ranking sums can lead to random results.
  • You were right, sort instead of rank solved the issue. I settled on this solution because it is based on apply, as requested by the question owner.
  • I understand from your code that you are sorting columns by the number of fails per column. However, what I really need is to sort the columns by the number of additional fails, meaning that if two columns have matching fails per row, then one of the two will probably rank last because it's not adding new fails to the total.