How to remove rows that contain identical pairs in opposite order in 2 columns

In a correlation matrix, I would like to get rid of the rows that contain the same information as another row, just with the names swapped: where one row has "A" in var1 and "B" in var2, the other has "B" in var1 and "A" in var2.

   var1 var2      value
1   cyl  mpg -0.8521620
2  disp  mpg -0.8475514
3    wt  mpg -0.8676594
4   mpg  cyl -0.8521620
5  disp  cyl  0.9020329
6    hp  cyl  0.8324475
7    vs  cyl -0.8108118
8   mpg disp -0.8475514
9   cyl disp  0.9020329
10   wt disp  0.8879799
11  cyl   hp  0.8324475
12  mpg   wt -0.8676594
13 disp   wt  0.8879799
14  cyl   vs -0.8108118

Here we could drop, for instance, row 4 (mpg vs cyl), since row 1 already has cyl vs mpg.

I know I could filter for unique values in the value column, BUT I don't want to do this: with my enormous data set there is a real chance that several different pairs of columns have an identical correlation score. So the matching has to be done by name, on columns var1 and var2.

This is the code I have so far. It keeps only the rows whose correlation is above 0.8 or below -0.8, but not 1 (a variable against itself):

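library(dplyr)   # %>%, filter()
library(tibble)  # rownames_to_column()
library(tidyr)   # gather()

mtcars %>% 
  as.matrix %>%
  cor %>%
  as.data.frame %>%                  # rownames_to_column() needs a data frame
  rownames_to_column(var = 'var1') %>%
  gather(var2, value, -var1) %>%
  filter(value > 0.8 | value < -0.8) %>%
  filter(value != 1)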
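A generic way to match on the names only, whatever the values, is to build an order-independent key for each row from the two name columns and keep the first row for each key. The sketch below uses base R's pmin()/pmax() on character vectors together with duplicated(); the helper name drop_reversed_pairs and the toy data are hypothetical, and this is not necessarily how any of the answers below do it.

```r
# Keep the first row of each unordered (var1, var2) pair, matching by name only.
# drop_reversed_pairs is a hypothetical helper, not part of any package.
drop_reversed_pairs <- function(df, a = "var1", b = "var2") {
  lo <- pmin(df[[a]], df[[b]])   # alphabetically smaller name of each pair
  hi <- pmax(df[[a]], df[[b]])   # alphabetically larger name of each pair
  # A row is dropped if its (lo, hi) key has already been seen above it.
  df[!duplicated(data.frame(lo, hi)), , drop = FALSE]
}

# Toy data: row 3 is row 1 with the names swapped (and the same value).
pairs_df <- data.frame(
  var1  = c("cyl", "disp", "mpg", "cyl"),
  var2  = c("mpg", "mpg", "cyl", "disp"),
  value = c(-0.852, -0.848, -0.852, 0.902),
  stringsAsFactors = FALSE        # pmin()/pmax() need character, not factor
)

res <- drop_reversed_pairs(pairs_df)
res
```

Because the key is built from the names alone, two different pairs that happen to share a correlation value are both kept, and the helper works on any two-column pairing, not only on correlation output.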


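Andre's answer, which blanks the upper triangle of the correlation matrix and is used in place of the cor %>% step of the pipeline above,

cor %>% {(function(x){x[upper.tri(x)]<-NA; x})(.)} %>%

is faster, but Rui's answer is more generic and can be applied to situations other than correlation-matrix calculations.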

Unit: milliseconds
   expr      min       lq     mean   median       uq      max neval cld
   Andre 4.818793 5.113676 5.630160 5.408955 5.704825 22.33730   100  a 
   Rui   5.413692 5.761669 7.531146 6.003656 6.583750 78.02836   100   b
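The upper.tri() trick works because a correlation matrix is symmetric: the entry for (B, A) below the diagonal equals the entry for (A, B) above it, so blanking one triangle (and, with diag = TRUE, the diagonal as well) leaves each pair exactly once. A small base-R illustration on a hypothetical 3-by-3 symmetric matrix:

```r
# A toy symmetric "correlation" matrix with named rows and columns.
m <- matrix(c(1.0, 0.9, 0.2,
              0.9, 1.0, 0.5,
              0.2, 0.5, 1.0),
            nrow = 3,
            dimnames = list(c("a", "b", "c"), c("a", "b", "c")))

# upper.tri() is TRUE above the diagonal, and on it too because diag = TRUE.
mask <- upper.tri(m, diag = TRUE)
m[mask] <- NA

# Only the below-diagonal cells (b, a), (c, a) and (c, b) are left.
which(!is.na(m), arr.ind = TRUE)
```

Note that Andre's version calls upper.tri(x) without diag = TRUE, so the diagonal is kept there and is removed later by the threshold filter (a correlation of 1 passes value > 0.8 unless filtered out separately).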

Another way is simply to filter by var1 < var2, which keeps only the alphabetically ordered version of each pair:

# assumes dplyr, tibble and tidyr are loaded
mtcars %>% 
  as.matrix %>%
  cor %>%
  as.data.frame %>%
  rownames_to_column(var = 'var1') %>%
  gather(var2, value, -var1) %>%
  filter(value > 0.8 | value < -0.8) %>%
  filter(value != 1) %>%
  filter(var1 < var2)
#  var1 var2      value
#1  cyl  mpg -0.8521620
#2 disp  mpg -0.8475514
#3  cyl disp  0.9020329
#4  cyl   hp  0.8324475
#5  mpg   wt -0.8676594
#6 disp   wt  0.8879799
#7  cyl   vs -0.8108118
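Filtering on var1 < var2 relies on two things: R compares character strings lexicographically (using the current locale's collation), and every pair is present in both orders, which holds for the long form of a full symmetric correlation matrix. If a pair only appears in the "wrong" order, this filter drops it entirely. A small base-R sketch with hypothetical data showing both effects:

```r
pairs_df <- data.frame(
  var1 = c("cyl", "mpg", "wt"),
  var2 = c("mpg", "cyl", "hp"),
  stringsAsFactors = FALSE
)

# Keeps ("cyl", "mpg") and drops its mirror ("mpg", "cyl") ...
# ... but also drops ("wt", "hp"), whose mirror ("hp", "wt") is not present.
kept <- pairs_df[pairs_df$var1 < pairs_df$var2, ]
kept
```

So var1 < var2 is safe for the output of a full correlation matrix, while a name-key approach such as pmin()/pmax() plus duplicated() also covers data where some pairs occur in only one order.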

Using base R:

x <- cor(mtcars)
x[ upper.tri(x, diag = TRUE) | abs(x) < 0.8  ] <- NA
na.omit(as.data.frame(as.table(x)))   # long format, blanked cells dropped
#    Var1 Var2       Freq
# 2   cyl  mpg -0.8521620
# 3  disp  mpg -0.8475514
# 6    wt  mpg -0.8676594
# 14 disp  cyl  0.9020329
# 15   hp  cyl  0.8324475
# 19   vs  cyl -0.8108118
# 28   wt disp  0.8879799
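The as.table() step is what turns the masked matrix into long format: as.data.frame() on a table gives one row per cell, with columns Var1 (row name), Var2 (column name) and Freq (the cell value), and na.omit() then drops the blanked cells. Wrapped as a reusable base-R function (the name cor_pairs and its threshold argument are hypothetical, chosen only for this sketch):

```r
# Return each pair of variables whose absolute correlation is >= threshold,
# listing every unordered pair once (lower triangle only, diagonal excluded).
# cor_pairs is a hypothetical helper, not from any package.
cor_pairs <- function(data, threshold = 0.8) {
  x <- cor(data)
  x[upper.tri(x, diag = TRUE) | abs(x) < threshold] <- NA
  out <- na.omit(as.data.frame(as.table(x)))   # columns Var1, Var2, Freq
  names(out) <- c("var1", "var2", "value")     # match the question's names
  out
}

res <- cor_pairs(mtcars, threshold = 0.8)
res
```

Using abs(x) < threshold keeps the positive and negative tails in one comparison, equivalent to the question's value > 0.8 | value < -0.8 apart from values exactly at the threshold.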

Compared with the accepted tidy answer:

library(microbenchmark)

microbenchmark(
  base = {
    x <- cor(mtcars)
    x[ upper.tri(x, diag = TRUE) | abs(x) < 0.8  ] <- NA
    na.omit(as.data.frame(as.table(x)))
  },
  tidy = {
    mtcars %>% 
      as.matrix %>%
      cor %>% {(function(x){x[upper.tri(x, diag = TRUE)]<-NA; x})(.)} %>%
      as.data.frame %>%
      rownames_to_column(var = 'var1') %>%
      gather(var2, value, -var1) %>%
      filter(value > 0.8 | value < -0.8)
  }
)
# Unit: microseconds
# expr      min        lq      mean   median        uq      max neval
# base  683.994  718.1025  790.9333  750.099  796.2825  2288.63   100
# tidy 3278.397 3405.3260 3660.0932 3488.334 3676.3870 10212.20   100
