How to sample from Cartesian product without repetition

I have a list of sets, and I wish to draw n different samples, each containing one item from each set. I do not want the samples to come out in order, because then, for example, all of them would necessarily contain the same item from the first set. I also don't want to generate the whole Cartesian product, as that might not be feasible in terms of efficiency. Any idea how to do this? Or even something that approximates this behaviour?

Example that does not work:

(prod for i, prod in zip(range(n), itertools.product(*list_of_sets)))
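
To illustrate why this does not work: itertools.product yields tuples in lexicographic order, so the first n results differ only in their last positions. A small sketch, where the input values and n are picked purely for illustration and lists stand in for the sets so the order is deterministic:

```python
import itertools

# Example values chosen for illustration; lists keep the iteration order fixed.
list_of_lists = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
n = 3

# Take the first n tuples of the product, as the snippet in the question does.
first_n = list(itertools.islice(itertools.product(*list_of_lists), n))
print(first_n)  # [(1, 4, 7), (1, 4, 8), (1, 4, 9)]

# Every sample shares the same item from the first list.
first_items = {t[0] for t in first_n}
print(first_items)  # {1}
```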

The following generator function generates non-repetitive samples. It only performs well if the number of samples generated is much smaller than the number of possible samples, because the more samples have already been seen, the more random draws are rejected as duplicates. It also requires the elements of the sets to be hashable:

import random

def samples(list_of_sets):
    list_of_lists = list(map(list, list_of_sets))  # choice only works on sequences
    seen = set()  # keep track of seen samples
    while True:
        x = tuple(map(random.choice, list_of_lists))  # tuple is hashable
        if x not in seen:
            seen.add(x)
            yield x

>>> lst = [{'b', 'a'}, {'c', 'd'}, {'f', 'e'}, {'g', 'h'}]
>>> gen = samples(lst)
>>> next(gen)
('b', 'c', 'f', 'g')
>>> next(gen)
('a', 'c', 'e', 'g')
>>> next(gen)
('b', 'd', 'f', 'h')
>>> next(gen)
('a', 'c', 'f', 'g')
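
Note that the generator above loops forever once every combination has been yielded, since no unseen tuple is left to find. A possible variant, sketched here under the assumption that the total number of combinations can be computed up front (the name bounded_samples is made up for this example), stops cleanly instead:

```python
import itertools
import math
import random

def bounded_samples(list_of_sets):
    """Yield distinct random tuples and stop once all combinations were yielded."""
    list_of_lists = [list(s) for s in list_of_sets]  # choice only works on sequences
    total = math.prod(len(lst) for lst in list_of_lists)  # size of the Cartesian product
    seen = set()
    while len(seen) < total:
        x = tuple(map(random.choice, list_of_lists))
        if x not in seen:
            seen.add(x)
            yield x

lst = [{'a', 'b'}, {'c', 'd'}, {'e', 'f'}]
five = list(itertools.islice(bounded_samples(lst), 5))  # take only 5 samples
everything = list(bounded_samples(lst))  # terminates after all 8 combinations
```

The rejection loop still gets slow when most combinations have already been seen, so this only addresses termination, not efficiency.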


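You can use sample from the random module:

import random
# Note: since Python 3.11, random.sample requires a sequence; for sets use random.sample(list(x), 1)
[[random.sample(x,1)[0] for x in list_of_sets] for _ in range(n)]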

for example:

list_of_sets = [{1,2,3}, {4,5,6}, {1,4,7}]
n = 3

A possible output will be:

[[2, 4, 7], [1, 4, 7], [1, 6, 1]]

EDIT:

If we want to avoid repetitions, we can use a while loop and collect the results in a set. In addition, you can check that n is valid and return the full Cartesian product for invalid values of n:

import itertools
from functools import reduce  # reduce is not a built-in in Python 3

chosen = set()
if 0 < n < reduce(lambda a,b: a*b,[len(x) for x in list_of_sets]):
    while len(chosen) < n:
        chosen.add(tuple([random.sample(x,1)[0] for x in list_of_sets]))
else:
    chosen = itertools.product(*list_of_sets)
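
One caveat for recent Python versions: random.sample stopped accepting sets in Python 3.11 (this was deprecated in 3.9), so calling it directly on a set raises a TypeError there. A minimal sketch of the same idea that converts each set to a list first; the function name sample_products is made up for this illustration:

```python
import itertools
import math
import random

def sample_products(list_of_sets, n):
    """Return n distinct random tuples, or the full product if n is out of range."""
    pools = [list(s) for s in list_of_sets]  # random.choice needs a sequence
    total = math.prod(len(p) for p in pools)  # size of the Cartesian product
    if not 0 < n < total:
        return set(itertools.product(*pools))
    chosen = set()
    while len(chosen) < n:
        chosen.add(tuple(random.choice(p) for p in pools))
    return chosen

result = sample_products([{1, 2, 3}, {4, 5, 6}, {1, 4, 7}], 3)
full = sample_products([{1, 2, 3}, {4, 5, 6}, {1, 4, 7}], 100)
```

Returning a set in both branches keeps the return type consistent, at the cost of materialising the full product when n is out of range.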


Matmarbon's answer is valid. This is a complete version with an example and some modifications to make it easier to understand and use:

import functools
import random

def random_order_cartesian_product(factors):
    # Total number of tuples in the Cartesian product
    amount = functools.reduce(lambda prod, factor: prod * len(factor), factors, 1)
    index_linked_list = [None, None]
    for max_index in reversed(range(amount)):
        index = random.randint(0, max_index)
        index_link = index_linked_list
        while index_link[1] is not None and index_link[1][0] <= index:
            index += 1
            index_link = index_link[1]
        index_link[1] = [index, index_link[1]]
        items = []
        for factor in factors:
            items.append(factor[index % len(factor)])
            index //= len(factor)
        yield items


factors=[
    [1,2,3],
    [4,5,6],
    [7,8,9]
]

n = 5

product_gen = random_order_cartesian_product(factors)  # avoid shadowing the built-in all()

count = 0

for comb in product_gen:
    print(comb)
    count += 1
    if count == n:
        break
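
The least obvious step in the generator above is how an integer index is turned into one item per factor: the index is read as a number with mixed-radix digits, one digit per factor. A sketch of just that step in isolation (decode_index is a hypothetical helper name, not part of the answer's code):

```python
import itertools

def decode_index(index, factors):
    """Map an integer below the product of the lengths to one item per factor."""
    items = []
    for factor in factors:
        items.append(factor[index % len(factor)])  # current mixed-radix digit
        index //= len(factor)  # shift to the next digit
    return items

factors = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
decoded = [tuple(decode_index(i, factors)) for i in range(27)]
print(decode_index(5, factors))  # [3, 5, 7]
```

Because every index in range(27) decodes to a different tuple, drawing distinct indices is enough to get distinct samples.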


As I want no repetition, and sometimes that is not possible, the code is not that short. But as @AndreyF said, random.sample does the work. Perhaps there is a better way that avoids resampling with repetition until enough non-repeating samples exist, but this is the best I have so far.

import itertools
import operator
import random
from functools import reduce

def get_cart_product(list_of_sets, n=None):
    max_products_num = reduce(operator.mul, [len(cluster) for cluster in list_of_sets], 1)
    if n is not None and n < max_products_num:
        # rejection sampling: draw until n distinct tuples have been collected
        refs = set()
        while len(refs) < n:
            refs.add(tuple(random.sample(cluster, 1)[0] for cluster in list_of_sets))
        return refs
    # n is None or at least as large as the product: return everything
    return itertools.product(*list_of_sets)

Note that the code assumes a list of frozensets; for other input types, the random.sample(cluster, 1)[0] expression needs to be adapted accordingly.
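
One possible way to avoid the resampling loop altogether, offered as a sketch and not as part of the answer above, is to draw n distinct integer indices with random.sample(range(total), n) and decode each index into a tuple. The function name sample_without_rejection is made up here, and the sketch assumes the product size stays below sys.maxsize, since len() of a larger range raises OverflowError in CPython:

```python
import math
import random

def sample_without_rejection(list_of_sets, n):
    """Return min(n, total) distinct random tuples without retrying on duplicates."""
    pools = [list(s) for s in list_of_sets]  # indexable copies of the sets
    total = math.prod(len(p) for p in pools)  # size of the Cartesian product
    n = min(n, total)  # cannot return more tuples than exist
    result = []
    # range objects are sequences, so no list of all indices is built
    for index in random.sample(range(total), n):
        item = []
        for pool in pools:
            index, pos = divmod(index, len(pool))
            item.append(pool[pos])
        result.append(tuple(item))
    return result

some = sample_without_rejection([{1, 2, 3}, {4, 5, 6}, {1, 4, 7}], 5)
everything = sample_without_rejection([{1, 2, 3}, {4, 5, 6}, {1, 4, 7}], 100)
```

Distinct indices always decode to distinct tuples, so the running time does not degrade as n approaches the size of the product.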


Comments
  • Say your first two sets look like {1, 2, 3} and {2}. You randomly pick 2 from {1, 2, 3}. What do you randomly pick from {2}?
  • you "randomly" pick 2. of course :-)
  • list(map(random.choice, map(list, list_of_sets))) will generate such a sample, doing it repeatedly will not avoid repetitions, though.
  • @schwobaseggl It is clean and readable, but probably the inner map should be taken out and saved if done multiple times, for better efficiency.
  • @borgr Yup, you'd absolutely store the list of lists if done repeatedly. Didn't want to clutter the comments :)
  • Valid! Remove '*' in "def random_order_cartesian_product(*factors):" if your "factors" is a list etc
  • Nice edit, but note that currently the code might never stop if n is too large.
  • @borgr I assumed n is smaller than the size of the Cartesian product
  • @AndreyF sample(x, 1) is the same as choice(x)
  • @schwobaseggl yes. However, choice does not work with sets and I wanted to avoid mapping the sets into other collection types
  • @borgr - I edited my answer to avoid endless while loops