Find Connected Components in an Undirected Graph

From NovaOrdis Knowledge Base
Jump to navigation Jump to search

External

Internal

Overview

Finding connected components in an undirected graph is a form of clustering heuristics: connected vertices represent clusters where the objects represented by the vertices are related in some way. Connected components in an undirected graph are computed with a variation of the breadth-first search (BFS) algorithm. This BFS-based algorithm computes undirected connectivity in a graph in O(n + m) time. For a more formal definition of a connected component of a graph see:

Undirected Graph Connectivity

Determining connectivity is useful in a variety of cases:

  • Check if a physical network is broken into pieces.
  • If you want to visualize network data, you may want to know if there are multiple pieces and display the pieces differently.
  • Clustering and similarity data. Determine "clusters" of highly related objects.

⚠️ Directed connectivity is given by a different, DSF-based algorithm named Kosaraju's two-pass algorithm.

Algorithm

The algorithm, which is linear in n + m iterates over all nodes in the graph and for unexplored nodes only, invokes BFS. The BFS invocation will explore the entire connected component the source node is part of. The essential part of the algorithm is that when we mark a node as explored, that is also visible to the outer loop.

Compute_Connected_Components_of_Undirected_Graph(graph G) 
  mark all nodes as unexplored
  # Assume nodes are labeled from 1 to n
  for i = 1 to n
    if i not yet explored
       BFS(G, i)

Each BFS invocation we identify a connected component and mark all its nodes as "explored". The above algorithm is underspecified, how do we actually get the list of components, and how do we do the bookkeeping?