Count all possible combinations no matter the column order

I have a dataset that looks like this:

df <- data.frame(rbind(c("A","B","D",NA,NA,NA,3),
c("B","A","D","C",NA,NA,1),
c("B","C","E","A","D",NA,5),
c("A","B",NA,NA,NA,NA,4),
c("A","B","C","D","E","F",2)))


    X1      X2       X3        X4       X5        X6    N
    A        B        D        NA       NA        NA    3 
    B        A        D        C        NA        NA    1 
    B        C        E        A         D        NA    5 
    A        B        NA       NA       NA        NA    4 
    A        B        C        D         E         F    2 

where the column N is the number of times that combination occurs.

I would like a cumulative sum over unordered combinations (regardless of which column each letter is in), such that:

     X1      X2       X3        X4       X5        X6     N
    A       NA        NA       NA        NA        NA     15
    B       NA        NA       NA        NA        NA     15
    C       NA        NA       NA        NA        NA     8
    D       NA        NA       NA        NA        NA     11
    E       NA        NA       NA        NA        NA     7
    F       NA        NA       NA        NA        NA     2
    A        B        NA       NA        NA        NA     15 
    A        B        C        NA        NA        NA     8 
    A        B        C        D         NA        NA     8 
    A        B        C        D         E         NA     7 
    A        B        C        D         E         F      2 
    B        C        NA       NA        NA        NA     8
    ....

So the idea is to list every possible combination together with its frequency, treating the column in which a letter appears as irrelevant.

Here's one approach: generate the list of combinations of the values in each row, add it to the original data frame as a list column, unnest it, and sum N by group.

library(dplyr)
library(tidyr)

df %>%
  mutate(comblist = apply(.[1:6], 1, function(x) {
    # drop NAs and sort so that column order does not matter
    x <- sort(na.omit(x))
    # every combination of size 1..length(x), collapsed to one string each
    unlist(sapply(seq_along(x), function(y)
      list(combn(x, y,
        FUN = function(l)
          list(toString(l))
      ))))
  })) %>%
  select(comblist, N) %>%
  unnest(comblist) %>%
  group_by(comblist) %>%
  summarise(N = sum(N))

# A tibble: 63 x 2
   comblist             N
   <chr>            <dbl>
 1 A                   15
 2 A, B                15
 3 A, B, C              8
 4 A, B, C, D           8
 5 A, B, C, D, E        7
 6 A, B, C, D, E, F     2
 7 A, B, C, D, F        2
 8 A, B, C, E           7
 9 A, B, C, E, F        2
10 A, B, C, F           2
# ... with 53 more rows
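
To see what the per-row step produces, here is a minimal sketch for a single row. The vector `row_values` is a made-up stand-in for the first row of the example, and the sketch calls `combn()` directly with `FUN = toString` instead of the nested `list()` calls used in the pipeline above.

```r
# One row of the example, in arbitrary column order, padded with NAs
row_values <- c("B", "D", "A", NA, NA, NA)

# sort() drops NA by default, so this both cleans and orders the letters
x <- sort(row_values)

# All combinations of size 1, 2, ..., length(x), each collapsed to a string
subsets <- unlist(lapply(seq_along(x), function(k) combn(x, k, FUN = toString)))
subsets
# "A" "B" "D" "A, B" "A, D" "B, D" "A, B, D"
```

Each of these seven strings then receives the row's N when the list column is unnested and summed.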

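Here is a base R solution that computes N for the combinations listed in dfout (both df and dfout are given under DATA below):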

l <- Map(function(x) c(na.omit(x)),data.frame(t(df[1:6]),stringsAsFactors = FALSE))
lout <- Map(function(x) c(na.omit(x)),data.frame(t(dfout),stringsAsFactors = FALSE))

dfout$N <- sapply(lout, function(x) sum(as.numeric(df$X7)[sapply(l, function(v) all(x %in% v))]))

such that

> dfout
   X1   X2   X3   X4   X5   X6  N
1   A <NA> <NA> <NA> <NA> <NA> 15
2   B <NA> <NA> <NA> <NA> <NA> 15
3   C <NA> <NA> <NA> <NA> <NA>  8
4   D <NA> <NA> <NA> <NA> <NA> 11
5   E <NA> <NA> <NA> <NA> <NA>  7
6   F <NA> <NA> <NA> <NA> <NA>  2
7   A    B <NA> <NA> <NA> <NA> 15
8   A    B    C <NA> <NA> <NA>  8
9   A    B    C    D <NA> <NA>  8
10  A    B    C    D    E <NA>  7
11  A    B    C    D    E    F  2
12  B    C <NA> <NA> <NA> <NA>  8

DATA

df <- structure(list(X1 = structure(c(1L, 2L, 2L, 1L, 1L), .Label = c("A", 
"B"), class = "factor"), X2 = structure(c(2L, 1L, 3L, 2L, 2L), .Label = c("A", 
"B", "C"), class = "factor"), X3 = structure(c(2L, 2L, 3L, NA, 
1L), .Label = c("C", "D", "E"), class = "factor"), X4 = structure(c(NA, 
2L, 1L, NA, 3L), .Label = c("A", "C", "D"), class = "factor"), 
    X5 = structure(c(NA, NA, 1L, NA, 2L), .Label = c("D", "E"
    ), class = "factor"), X6 = structure(c(NA, NA, NA, NA, 1L
    ), .Label = "F", class = "factor"), X7 = structure(c(3L, 
    1L, 5L, 4L, 2L), .Label = c("1", "2", "3", "4", "5"), class = "factor")), class = "data.frame", row.names = c(NA, 
-5L))

dfout <- structure(list(X1 = c("A", "B", "C", "D", "E", "F", "A", "A", 
"A", "A", "A", "B"), X2 = c(NA, NA, NA, NA, NA, NA, "B", "B", 
"B", "B", "B", "C"), X3 = c(NA, NA, NA, NA, NA, NA, NA, "C", 
"C", "C", "C", NA), X4 = c(NA, NA, NA, NA, NA, NA, NA, NA, "D", 
"D", "D", NA), X5 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, "E", 
"E", NA), X6 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "F", 
NA)), row.names = c(NA, -12L), class = "data.frame")
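
If the combinations themselves also need to be generated rather than taken from a given `dfout`, the same containment rule (a row contributes its count when it holds every letter of the combination) can be applied to all subsets of the observed letters. The following is only a sketch under that assumption; `rows`, `n`, `combos` and `totals` are illustrative names that do not appear in the answer above.

```r
# Letters per row (NAs already dropped) and the matching counts
rows <- list(c("A", "B", "D"),
             c("B", "A", "D", "C"),
             c("B", "C", "E", "A", "D"),
             c("A", "B"),
             c("A", "B", "C", "D", "E", "F"))
n <- c(3, 1, 5, 4, 2)

# Every non-empty subset of the distinct letters, as a flat list of vectors
letters_seen <- sort(unique(unlist(rows)))
combos <- unlist(
  lapply(seq_along(letters_seen),
         function(k) combn(letters_seen, k, simplify = FALSE)),
  recursive = FALSE
)

# Sum the counts of the rows that contain all letters of each combination
totals <- vapply(combos, function(combo) {
  sum(n[vapply(rows, function(r) all(combo %in% r), logical(1))])
}, numeric(1))
names(totals) <- vapply(combos, toString, character(1))

totals[c("A", "D", "A, B, C")]
#       A       D A, B, C
#      15      11       8
```

With six distinct letters this yields 63 combinations, which matches the row count of the other answers' output.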

An option using RcppAlgos::comboGeneral to generate the combinations and data.table::cube to aggregate them (quoting from ?cube: "Calculate aggregates at various levels of groupings producing multiple (sub-)totals"):

library(data.table)
library(RcppAlgos)

v <- unique(unlist(df[-ncol(df)]))
v <- sort(v[!is.na(v)])
nc <- length(v)

DT <- melt(setDT(df)[, rn:=.I], id.vars=c("rn", "X7"), na.rm=TRUE, variable.factor=FALSE)
combi <- DT[, as.data.table(do.call(rbind, lapply(1L:.N, function(m) {
    rcom <- comboGeneral(value, m)
    M <- matrix("", nrow=nrow(rcom), ncol=nc)

    M[cbind(rep(1L:nrow(rcom), ncol(rcom)), match(rcom, v))] <- rcom
    M
}))), .(rn, COUNT=as.integer(X7))]
ans <- cube(combi, .(COUNT=sum(COUNT)), by=paste0("V", 1:6))
setorderv(ans[complete.cases(ans)], paste0("V", 6:1))[]

output:

    V1 V2 V3 V4 V5 V6 COUNT
 1:  A                   15
 2:     B                15
 3:  A  B                15
 4:        C              8
 5:  A     C              8
 6:     B  C              8
 7:  A  B  C              8
 8:           D          11
 9:  A        D          11
10:     B     D          11
11:  A  B     D          11
12:        C  D           8
13:  A     C  D           8
14:     B  C  D           8
15:  A  B  C  D           8
16:              E        7
17:  A           E        7
18:     B        E        7
19:  A  B        E        7
20:        C     E        7
21:  A     C     E        7
22:     B  C     E        7
23:  A  B  C     E        7
24:           D  E        7
25:  A        D  E        7
26:     B     D  E        7
27:  A  B     D  E        7
28:        C  D  E        7
29:  A     C  D  E        7
30:     B  C  D  E        7
31:  A  B  C  D  E        7
32:                 F     2
33:  A              F     2
34:     B           F     2
35:  A  B           F     2
36:        C        F     2
37:  A     C        F     2
38:     B  C        F     2
39:  A  B  C        F     2
40:           D     F     2
41:  A        D     F     2
42:     B     D     F     2
43:  A  B     D     F     2
44:        C  D     F     2
45:  A     C  D     F     2
46:     B  C  D     F     2
47:  A  B  C  D     F     2
48:              E  F     2
49:  A           E  F     2
50:     B        E  F     2
51:  A  B        E  F     2
52:        C     E  F     2
53:  A     C     E  F     2
54:     B  C     E  F     2
55:  A  B  C     E  F     2
56:           D  E  F     2
57:  A        D  E  F     2
58:     B     D  E  F     2
59:  A  B     D  E  F     2
60:        C  D  E  F     2
61:  A     C  D  E  F     2
62:     B  C  D  E  F     2
63:  A  B  C  D  E  F     2
    V1 V2 V3 V4 V5 V6 COUNT

I'm not sure what you want to do; it would help to add a minimal reproducible example as well as the expected output. That way you can help others to help you! But if you want the counts for the letters in any and all columns:

library(dplyr)
library(tidyr)
df %>%
  pivot_longer(cols = everything()) %>%
  group_by(value) %>%
  summarise(N = n())

Comments
  • Why is column N 3, 1, 5, 4, 2? Could you explain a bit?
  • Those numbers are the frequency of each specific combination. For instance, in the desired data frame the combination (A,B,C) has N = 8 because that is the sum of N over the rows that contain these three letters: (B,A,D,C) + (B,C,E,A,D) + (A,B,C,D,E,F) = 1 + 5 + 2 = 8.
  • Not directly about the question: by calling rbind, you've made a matrix rather than a data frame, so your column N is coerced to character, and then data.frame by default turns character columns (in this case, all your columns) into factors. So now N is a factor trying to represent a number. No need for rbind; just put the vectors all in a data frame.
  • Sorry, I misread the purpose of rbind. It's actually by creating vectors of mixed types (string and numeric) that you've turned your counts into strings, which are later turned into factors. I don't want to derail too much, but I do want to point out data type issues that may come up.
  • The dfout you posted is just a subset of the total rows, right?
  • @camille I just copied it from the OP's post; it was not generated by my code. My code just computes the N column.
  • Oh, okay. But isn't getting the combinations of A...F part of the task as well, not just adding up their occurrences?
  • @camille I am not sure whether the OP needs to generate all those combinations or just wants the occurrences for the given combinations... but I guess yours is the former.
  • Thanks for your response. I added a small example of what the data looks like in R. The whole idea is to count occurrences of each element (A, B, C...). So for instance, the individuals in the first row will contribute to the groups (A), (B), (D), (A,B), (A,D), (B,D), (A,B,D), no matter what the order of the columns is.
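
The data type issue raised in the comments can be avoided by building the columns directly instead of binding mixed-type vectors. A minimal sketch, assuming the count column is named N as in the printed table of the question:

```r
# Letters stay character and the counts stay numeric
df <- data.frame(
  X1 = c("A", "B", "B", "A", "A"),
  X2 = c("B", "A", "C", "B", "B"),
  X3 = c("D", "D", "E", NA, "C"),
  X4 = c(NA, "C", "A", NA, "D"),
  X5 = c(NA, NA, "D", NA, "E"),
  X6 = c(NA, NA, NA, NA, "F"),
  N  = c(3, 1, 5, 4, 2),
  stringsAsFactors = FALSE
)

str(df$N)  # num, not a factor
```

With this version `sum(df$N)` works without any `as.numeric()` conversion.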