Kruskal's Algorithm: Difference between revisions
Line 32: | Line 32: | ||
The non-optimized solution checks whether there is no path from u to v using either [[Graph_Search#Breadth-First_Search_.28BFS.29|BFS]] or [[Graph_Search#Depth-First_Search_.28DFS.29|DFS]] starting from u. In their general form, both methods have a running time O(n + m) linear in the number of n and m. However, in this case we only need to search through a limited number of edges, those that are already in T. T has at most n vertices, to it has at most n-1 = O(n) edges, so the O(m + n) running time from the general case becomes O(n). The search is repeated for m times, so the "growing" phase of the spanning tree is O(m n). | The non-optimized solution checks whether there is no path from u to v using either [[Graph_Search#Breadth-First_Search_.28BFS.29|BFS]] or [[Graph_Search#Depth-First_Search_.28DFS.29|DFS]] starting from u. In their general form, both methods have a running time O(n + m) linear in the number of n and m. However, in this case we only need to search through a limited number of edges, those that are already in T. T has at most n vertices, to it has at most n-1 = O(n) edges, so the O(m + n) running time from the general case becomes O(n). The search is repeated for m times, so the "growing" phase of the spanning tree is O(m n). | ||
The total running time of the non-optimized algorithm is O(m log n) + O(m n) = O(m n). | The total running time of the non-optimized algorithm is O(m log n) + O(m n) = O(m n). This is the same running time as for the [[Prim%27s_Algorithm#Non-Optimized_Implementation|non-optimized implementation of the Prim's algorithm]]. | ||
=Correctness Proof= | =Correctness Proof= |
Revision as of 19:32, 22 October 2021
External
Internal
Overview
Kruskal's algorithms is a greedy algorithm that computes the minimum cost spanning tree for an undirected connected graph. It has a different approach than Prim's algorithm, in that it does not grow the connected explored territory one node at a time, but it grows the trees in parallel with a lot of simultaneous little pieces, unconnected at first, which eventually merge into the minimum cost spanning tree. It does that by sorting the edges in the ascending order of their cost, and picking edges in that order, one by one, careful not to create cycles. The greediness of the algorithm comes from the fact that at each step, it chooses an edge by attempting to minimize a greedy score, which is the cost of the edge.
Non-Optimized Implementation
sort edges in the order of increasing cost initialize T = ∅ # the tree we are growing for edge e = (u,v) in the order of increasing cost: # at most m iteration if T ⋃ e has no cycles # there is no path in T from u to v add e to T
We could count the edges added to the spanning tree, and once we have n-1 edges, which means the spanning tree is complete, we can abort the loop.
One critical step of the algorithm is checking whether a new edge e=(u,v) introduces no cycles in T, and this can be checked in several ways. The non-optimized way is to run a graph search, BFS or DFS, starting from u, attempting to find a path from u to v. Note that we don't need to run a full BFS or DFS on all edges, we only need to test connectivity between the nodes included in the partial spanning trees discovered so far, so that will reduce the graph search running time.
A better way would be to use a union-find.
Non-Optimized Implementation Running Time
Sorting the edges will take O(m log m) time. However, since we assume that the number of edges is at most quadratic of the number of vertices, m = O(n2), so O(log m) = O(log n2)=2O(log n)=O(log n). In consequence, the running time of the edge sorting step is O(m log n).
The main loop has at most m iterations. In each iteration, we need to ensure that the current edge e=(u,v) does not form a cycle in T, which means we need to ensure that there is no path from u to v.
The non-optimized solution checks whether there is no path from u to v using either BFS or DFS starting from u. In their general form, both methods have a running time O(n + m) linear in the number of n and m. However, in this case we only need to search through a limited number of edges, those that are already in T. T has at most n vertices, to it has at most n-1 = O(n) edges, so the O(m + n) running time from the general case becomes O(n). The search is repeated for m times, so the "growing" phase of the spanning tree is O(m n).
The total running time of the non-optimized algorithm is O(m log n) + O(m n) = O(m n). This is the same running time as for the non-optimized implementation of the Prim's algorithm.