Graph Search

From NovaOrdis Knowledge Base
Jump to navigation Jump to search

Internal

Overview

Searching a graph is a fundamental problem in computer science. It means systematically following the edges of a graph so we visit all the vertices of that graph. Techniques for searching a graph lie at the heart of the field of graph algorithms. There are many methods to search graphs, and the most known and used are breadth-first search and depth-first search.

Both breadth-first search and depth-first search algorithms are very efficient, in that they walk the entire graph in linear time of the number of vertices and edges O(n + m).

Generic Graph Search Algorithm

All graph search methods share something in common, described in the following generic graph search algorithm. Graph search subroutines are passed as input a start vertex, called the source node or start node, from which the search originates. The algorithm has two goals:

  • Goal 1. Find everything that is "findable" starting from the source node - all findable vertices. By findable we mean that there's a path from the source node to the findable node. Directed and undirected graphs are behave slightly differently, in that for a directed graph, the edge can only be followed in the "forward" direction.
  • Goal 2. Do it efficiently, by don't exploring anything twice. If the graph has n vertices and m arcs, we look at a running time O(m + n).

A generic algorithm would would be to maintain a boolean per each node that says whether we have "seen" it or not. Everything is initially set to "unexplored", except the search node. It is useful to think at the nodes seen so far as "territory" conquered by the algorithm. There is going to be a frontier between the conquered and unconquered territory. The goal of the generic algorithm is to supplement the conquered territory at each step by one node, assuming there's a node adjacent to the territory already conquered. The algorithm consists in a while loop where we iterate as many times as we can, we look for an edge with one endpoint that we already explored and one endpoint we did not explore yet, until we don't find such an edge anymore. That exits the while loop and stops the algorithm. If we can find such an edge, we "suck" v in the conquered territory and repeat.

The claim is that this generic algorithm does what we want - finds all nodes and does not look at anything twice. This can be proven through contradiction and induction, respectively.

There are different strategies in picking the next nodes to look at, while iterating, and these strategies lead to different algorithms with different properties.

Breadth-First Search (BFS)

External: CLRS page 594.

The algorithm systematically explores the nodes in "layers", using a cautious and tentative approach. The starting node will be in its own layer. The neighbors of s will constitute layer 1, and so on. This is a cautions and tentative exploration of the graph. There is a close correspondence between these layers and shortest path distances. The algorithm is implemented using a queue (first in first out).

The algorithm is linear time O(n + m) with small constants.

BFS can be used to:

BFS Algorithm Pseudocode

https://www.coursera.org/learn/algorithms-graphs-data-structures/lecture/JZRXz/breadth-first-search-bfs-the-basics

The general idea is to start from a node, mark it as explored and put it in a queue. Then, in a while loop as long as the queue has elements, remove the head of the queue, which is always explored, determine its adjacent unexplored nodes - in the same layer, mark them as explored and put them at the back of the queue. Ignore the adjacent explored node, to avoid doing unnecessary work The queue always contains explored nodes at the frontier between the explored area and the unexplored area. Iterate until the queue is empty and the whole connected component is explored.

BFS(graph G, start vertex s)
  # All nodes are assumed unexplored
  initialize a Queue Q (FIFO)
  mark s as explored
  place s in Q
  while Q has elements
    remove the head of the queue v
    for each edge (v, w):
      if w unexplored:
        mark w as explored
        add w to Q

When we "mark as explored" we can actually annotate each node marked as explored with a value of a distance from s that indicates the layers the nodes are in. This is how to count shortest path in graphs.

BFS Applications

Depth-First Search (DFS)