Algorithm to partition graph into complete subgraphs

graph partitioning example
partition graph into subgraphs
graph partitioning python
graph partitioning algorithms ppt
weighted graph partitioning
balanced graph partitioning
split graph into subgraphs
weighted graph partitioning in data mining

I need an algorithm to partition the vertices of an undirected graph into one or more subgraphs, such that each subgraph is a complete graph (every vertex adjacent to every other vertex). Each vertex needs to be in exactly one of the subgraphs.

Here's an example:

input = [
    (A, B),
    (B, C),
    (A, C),
    (B, D),
    (D, E),
]
output = myalgo(input)  # [(A, B, C), (D, E)]

Here's an image that better describes the problem:

The input list is sorted in decreasing order by distance, that's why I connect A-B-C instead of B-D.

I thought this might be called "strongly connected components", and have already tried the following solutions:

  • Finding a Strongly Connected Components in unDirected Graphs: it's looking for something different

  • Finding all cycles in undirected graphs: it gives me many cycles and not the best, it doesn't care about the input order.

  • An algorithm to create clusters from data pairs in python: it connects all the components, just because there is a path between them (A-B-C-D-E).

  • Kosaraju's algorithm: it works only with a directed graph.

Here's a class that implements the segmentation into complete subgraphs. It's by no means optimized and can likely be improved significantly, but it's a starting point

class SCCManager:
    def __init__(self, edges):
        self.clusters = []
        self.edges = edges

    def clusters_in(self, conn):
        first, second = conn
        f_clust = None
        s_clust = None
        for i, clust in enumerate(self.clusters):
            if first in clust:
                f_clust = i
            if second in clust:
                s_clust = i
            # break early if both already found
            if f_clust and s_clust:
                break
        return (f_clust, s_clust)

    def all_connected(self, cluster, vertex):
        for v in cluster:
            connected = (v, vertex) in self.edges or (vertex, v) in self.edges
            # break early if any element is not connected to the candidate
            if not connected:
                return False
        return True

    def get_scc(self):
        for edge in self.edges:
            c_first, c_second = self.clusters_in(edge)

            # case 1: none of the vertices are in an existing cluster
            # -> create new cluster containing the vertices
            if c_first == c_second == None:
                self.clusters.append([edge[0], edge[1]])
                continue

            # case 2: first is in a cluster, second isn't
            # -> add to cluster if eligible
            if c_first != None and c_second == None:
                # check if the second is connected to all cluster components
                okay = self.all_connected(self.clusters[c_first], edge[1])
                # add to cluster if eligible
                if okay:
                    self.clusters[c_first].append(edge[1])
                continue

            # case 3: other way round
            if c_first == None and c_second != None:
                okay = self.all_connected(self.clusters[c_second], edge[0])
                if okay:
                    self.clusters[c_second].append(edge[0])
                continue

            # case 4: both are in different clusters
            # -> merge clusters if allowed
            if c_first != c_second:
                # check if clusters can be merged
                for v in self.clusters[c_first]:
                    merge = self.all_connected(self.clusters[c_second], v)
                    # break if any elements are not connected
                    if not merge:
                        break
                # merge if allowed
                if merge:
                    self.clusters[c_first].extend(self.clusters[c_second])
                    self.clusters.remove(self.clusters[c_second])

            # case 5: both are in the same cluster
            # won't happen if input is sane, but doesn't require an action either way


        return self.clusters

... and here's a working example:

inp = [
    ('A', 'B'),
    ('B', 'C'),
    ('A', 'C'),
    ('B', 'D'),
    ('D', 'E'),
    ('C', 'E')
]

test = SCCManager(inp)
print(test.get_scc())

[['A', 'B', 'C'], ['D', 'E']]

[PDF] On the complexity of partitioning graphs into connected subgraphs, not the vertices of a graph can be partitioned into equal sized subsets so that each subset induces a particular Each of the following problems is NP- complete for any fixed k 2 3. We will sketch the method in each case for k = 3 and k = m/2. Phrased differently, Proposition 4.2 states informally that if one can find a partition into two subgraphs with prescribed degeneracy, then one can also find a partition of the same graph into more than two subgraphs with prescribed degeneracy, provided some condition on the sum of the prescribed degeneracies is met.

from collections import defaultdict


def create_adjacency_matrix(connections):
    matrix = defaultdict(dict)
    for a, b in connections:
        matrix[a][b] = 1
        matrix[b][a] = 1
    return matrix


def is_connected_to_all(vertex, group, matrix):
    for v in group:
        if vertex != v and vertex not in matrix[v]:
            return False
    return True


def group_vertexes(input):
    matrix = create_adjacency_matrix(input)
    groups = []
    current_group = set()
    for vertex in matrix.keys():
        if is_connected_to_all(vertex, current_group, matrix):
            current_group.add(vertex)
        else:
            groups.append(current_group)
            current_group = {vertex}
    groups.append(current_group)
    return groups
input = [("A", "B"), ("B", "C"), ("A", "C"), ("B", "E"), ("E", "D")]
print(group_vertexes(input))
# [{'C', 'B', 'A'}, {'D', 'E'}]

WARNING: This relies on the fact that dict keeps insertion order in python 3.7+. In older versions you have to use matrix = DefaultOrderedDict(dict) https://stackoverflow.com/a/35968897/9392216:

How can I partition a graph into connected subgraphs?, The goal is to build the disjoint subgrahs to maximize the fulllment of demands. A mixed integer programming (MIP) model is developed and various algorithms are � How can I partition a graph into connected subgraphs? But this is not a problem, since you can generate the constraints as needed, in a cutting-plane or branch-and-cut algorithm.

Another attempt:

lst = [
    ('A', 'B'),
    ('B', 'C'),
    ('A', 'C'),
    ('B', 'D'),
    ('D', 'E'),
]

d = {}
for i, j in lst:
    d.setdefault(i, []).append(j)
    d.setdefault(j, []).append(i)

from itertools import combinations

rv, seen_segments, seen_vertices = [], set(), set()
for k, v in d.items():
    if len(v) == 1:
        segment = set((k, v[0])).difference(seen_vertices)
        seen_vertices.update(segment)
        rv.append([tuple(segment), ])
    else:
        graph = []
        for i, j in combinations([k] + v, 2):
            if not j in d[i]:
                break
            else:
                graph.append(tuple(sorted((i, j))))
        else:
            if graph:
                graph = [segment for segment in graph if segment not in seen_segments]
                seen_segments.update(graph)
                if graph:
                    rv.append(graph)

from pprint import pprint
pprint(rv)

Prints:

[[('A', 'B'), ('A', 'C'), ('B', 'C')], [('D', 'E')]]

For input

lst = [
    ('A', 'B'),
    ('B', 'C'),
]

Prints:

[[('A', 'B')], [('C',)]]

For input:

lst = [
    ('A', 'B'),
    ('B', 'C'),
    ('C', 'D'),
]

Prints:

[[('B', 'A')], [('D', 'C')]]

Graph partition, In mathematics, a graph partition is the reduction of a graph to a smaller graph by partitioning We already know that (2,1) cut is the minimum bisection problem and it is NP-complete. Next, we assess a 3-partition A multi-level graph partitioning algorithm works by applying one or more stages. Each stage reduces the size� useful to partition the vertices of the graph so that every part induces a highly connected subgraph. For example, Hartuv and Shamir [11] designed a clustering algorithm where the vertices of a graph G are partitioned into highly connected induced subgraphs. It is

You can find all paths, and then group by connectivity:

from itertools import groupby as gb
d = [('A', 'B'), ('B', 'C'), ('A', 'C'), ('B', 'D'), ('D', 'E')]
def connect_num(node):
    return [sum(a == node for a, _ in d), sum(b == node for _, b in d)]

def group_paths(data):
   new_d = sorted([[i, connect_num(i)] for i in data], key=lambda x:max(x[1]))
   return [[k for k, _ in b] for _, b in gb(new_d, key=lambda x:max(x[1]))]

def get_paths(start, c = [], seen = []):
   new_vals = [a for a, _ in d if a not in seen+c]
   if (vals:=[b for a, b in d if a == start]):
      for i in vals:
         yield from get_paths(i, c=c+vals, seen=seen)
   else:
      yield c
      for i in new_vals:
         yield from get_paths(i, c = [i], seen=c+seen)

result = sorted(map(set, get_paths(d[0][0])), key=len, reverse=True)
new_result = [a for i, a in enumerate(result) if not any(all(k in c for k in a) for c in result[:i])]
final_result = [group_paths(i) for i in new_result]

Output:

#final_result[0]
[['E', 'D'], ['A', 'C', 'B']]

[PDF] On partitioning a graph into two connected subgraphs, nected Subgraphs problem is NP-complete and the authors of [12] present an algorithm that solves it for n-vertex graphs in the class Gk,r in O∗((f(r))n). As for the uncoarsening system, the KL(1) Re nement algorithm is used. The projected partition of the coarsened graph is assumed as a good initial partition for the upper level graph and vertices are swapped during the rst uncoarsening phase, to improve the global equilibrium condition. 3.2 Bisectioning graphs into subgraphs with di erent weights

Partition graph into complete disjoint subgraphs while maximising , In addition to Correlation Clustering, this is known as Clique Partitioning or Weighted Cluster Editing, see e.g.. M. Gr�tschel and Y. For unweighted case, any 2-connected graph can be partitioned into two connected subgraphs whose numbers of vertices differ by at most one. A simple algorithm uses st-numbering. For any 2-connected graph, we can label the vertices by $[1n]$ such that any vertex has simultaneously a neighbor with smaller label and a neighbor with larger label.

On Partitioning a Graph into Two Connected Subgraphs, International Symposium on Algorithms and Computation On Partitioning a Graph into Two Connected Subgraphs It is already NP-complete for the class of n-vertex graphs G = (V,E) in which Z 1 and Z 2 each contain a connected set that� Given a graph and a positive integer k, the biclique vertex-partition problem asks whether the vertex set of the graph can be partitioned into at most k bicliques (connected complete bipartite subgraphs). It is known that this problem is NP-complete for bipartite graphs. In this paper we investigate the computational complexity of this problem in

Partitioning a Weighted Graph to Connected Subgraphs of Almost , One wish to partition G into connected components by deleting edges from Gso. All these problems are NP-complete or NP-hard even for series-parallel graphs . The algorithms can be easily extended for partial k-trees, that is, graphs with� Problem: Partition the vertices into \(m\) subsets such that each subset has size at most \(j\), while the cost of the edges spanning subsets is bounded by \(k\). Excerpt from The Algorithm Design Manual : Graph partitioning arises as a preprocessing step to divide-and-conquer algorithms, where it is often a good idea to break things into

Comments
  • by "strong connection" you mean "all elements within the component must be directly linked to each other"?
  • @LukasThaler yes
  • Sounds like a fun exercise. Let me see what I can come up with :D
  • Say the input is a large clique with one edge removed. Correct answers are all vertex partitions that separate the vertices of the missing edge. Which would you prefer?
  • That is dictated by user input. The OP mentions in their question that the order the edges are given in matters. If the (B,D) edge in the above graph was the first to be input, we would end up with the components (A,C) and (B,D), (and (E), if you want to consider that a component, which my implementation as of now doesn't)
  • The matrix is a sweet idea, but the algorithm breaks if I add another ('C', 'E') to the input. I think the update logic needs to be a little more robust to deal with more complex inputs
  • Could you provide the whole wrong input, because I tested it and it works for ('C','E') at the end of current input.
  • inp = [('A', 'B'),('B', 'C'),('A', 'C'),('B', 'D'),('D', 'E'),('C', 'E')] results in [{'B', 'C', 'A'}, {'E', 'D'}, {'E', 'C'}] for me
  • Running on 3.7, actually. What output do you get for this input?
  • Yeah, you're right :P. I was running modified version with for vertex in matrix.keys(): which was an alternative option in my answer. I modified the first one to use correct logic.