Hi, in the sixth part of my graph series, I want to talk about the Kruskal algorithm. We have programmed Prim in Python in the fifth part. Now it's time for Kruskal.
What is Kruskal?
Kruskal works in contrast to Prim, not over nodes but over edges. Kruskal marks all edges with little effort over the whole graph. Here Kruskal also takes care not to build cycles. Kruskal also builds a minimal spanning tree (MST) like Prim. Before we continue, let's take a look at the graph again.
Kruskal works, as already mentioned, not via nodes but via edges. First, all nodes are marked as not visited. We need this to detect cycles. All edges from the whole graph are determined and collected. These are then sorted in ascending order. These are stored as a PriorityQueue. Next, the smallest edge selected is added to the MST. Then, the next edge is taken from the queue and checked to see if a cycle occurs. If this is not the case, this edge is also added to the MST. Otherwise this edge is ignored. Each edge weight from the MST, is summed up and forms the total weight of the MST. The whole process is repeated until the queue has been processed. The final result is a minimal spanning tree (MST).
A small animation shows how Kruskal works.
Blue are the edges and nodes that will be added to the MST. The red crosses, on the other hand, represent the edges that would create a cycle.
Extend our graph implementation
Now we extend our class Graph with Kruskal. I have implemented this iteratively here. Additionally I wrote a helper function, which sorts all edges in ascending order. Let's deal with the auxiliary function first.
def __sort(self, edges):
temp = []
while len(edges) != 0:
e = self.__min(edges)
temp.append(e)
edges.remove(e)
return temp
The method takes the edge list and sorts the edge list using the min function from Prim. The list is traversed, and each time the smallest edge is determined, which is added to the new sorted list. We get an ascending edge list.
Let's move on to Kruskal.
def kruskal(self):
cost = 0.0
edges = []
mst = []
self.unvisitAllNodes()
for node in self.__nodes.values():
for edge in node.getEdges():
edges.append((edge, edge.getWeight()))
edges = self.__sort(edges)
while len(edges) != 0:
e = edges.pop(0)
A = e[0].getA()
B = e[0].getB()
if not e in mst:
loop = False
for edge in B.getEdges():
if edge.getB().isVisited():
loop = True
if not loop:
mst.append([A.getKey(), e[0].getWeight(), B.getKey()])
print(e[0].getA().getKey(), "--->", e[0].getB().getKey(), "weight:", e[0].getWeight())
cost += e[1]
B.visit()
print("MST:", mst)
print("Total cost", cost)
If we look at the implementation of Kruskal, we see that we first mark all nodes as not visited again to be able to make a cycle detection later. Next, all edges are collected and sorted in ascending order with the help function. Now the queue is processed again until it is empty. We determine the two nodes which are connected with this edge. We save these as and . If the removed edge is not contained in the MST mst, we perform a cycle check. If no cycle is detected, we can add this edge to the MST. The costs or weights of the edges are also added up here to determine the total weight.
If we just run that, we should get an output like this:
KRUSKAL MST
E ---> F weight: 0.5
A ---> B weight: 1.0
D ---> F weight: 2.6
D ---> B weight: 3.0
B ---> C weight: 4.8
MST: [['E', 0.5, 'F'], ['A', 1.0, 'B'], ['D', 2.6, 'F'], ['D', 3.0, 'B'], ['B', 4.8, 'C']]
Total cost 11.899999999999999
Small hint, since I have not yet checked this code for correctness, it is to be used with skepticism.
I hope I could explain Kruskal quite well 😄