14.2: Matchings in Bipartite Graphs
Recall that a bipartite graph \(\textbf{G}=(V,E)\) is one in which the vertices can be properly colored using only two colors. It is clear that such a coloring then partitions \(V\) into two independent sets \(V_1\) and \(V_2\), and so all the edges are between \(V_1\) and \(V_2\). Bipartite graphs have many useful applications, particularly when we have two distinct types of objects and a relationship that makes sense only between objects of distinct types. For example, suppose that you have a set of workers and a set of jobs for the workers to do. We can consider the workers as the set \(V_1\) and the jobs as \(V_2\) and add an edge from worker \(w \in V_1\) to job \(j \in V_2\) if and only if \(w\) is qualified to do \(j\).
For example, the graph in Figure 14.2 is a bipartite graph in which we've drawn \(V_1\) on the bottom and \(V_2\) on the top.
If \(\textbf{G}=(V,E)\) is a graph, a set \(M \subseteq E\) is a matching in \(\textbf{G}\) if no two edges of \(M\) share an endpoint. If \(v\) is a vertex that is the endpoint of an edge in \(M\), we say that \(M\) saturates \(v\) or \(v\) is saturated by \(M\). When \(\textbf{G}\) is bipartite with \(V=V_1 \cup V_2\), a matching is then a way to pair vertices in \(V_1\) with vertices in \(V_2\) so that no vertex is paired with more than one other vertex. We're usually interested in finding a maximum matching, which is a matching that contains the largest number of edges possible, and in bipartite graphs we usually fix the sets \(V_1\) and \(V_2\) and seek a maximum matching from \(V_1\) to \(V_2\). In our workers and jobs example, the matching problem thus becomes trying to find an assignment of workers to jobs such that
i. each worker is assigned to a job for which he is qualified (meaning there's an edge),
ii. each worker is assigned to at most one job, and
iii. each job is assigned at most one worker.
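The three conditions can be checked mechanically. The following is a minimal sketch, assuming the graph is given as a list of (worker, job) pairs; the function name `is_matching` and the worker and job names are hypothetical and chosen only for illustration.

```python
def is_matching(edges, candidate):
    """Check conditions i-iii for a proposed assignment of workers to jobs.

    edges     -- iterable of (worker, job) pairs: the edges of the bipartite graph
    candidate -- iterable of (worker, job) pairs: the proposed assignment
    Assumes worker names and job names are distinct from one another.
    """
    edge_set = set(edges)
    used_workers = set()
    used_jobs = set()
    for worker, job in candidate:
        if (worker, job) not in edge_set:
            return False  # condition i fails: the worker is not qualified
        if worker in used_workers:
            return False  # condition ii fails: the worker has two jobs
        if job in used_jobs:
            return False  # condition iii fails: the job has two workers
        used_workers.add(worker)
        used_jobs.add(job)
    return True


# A small hypothetical graph (not the one in the figures).
edges = [("w1", "j1"), ("w1", "j2"), ("w2", "j1"), ("w3", "j2")]
print(is_matching(edges, [("w1", "j2"), ("w2", "j1")]))  # a valid matching
print(is_matching(edges, [("w1", "j1"), ("w2", "j1")]))  # job j1 used twice
```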
As an example, in Figure 14.3, the thick edges form a matching from \(V_1\) to \(V_2\). Suppose that you're the manager of these workers (on the bottom) and must assign them to the jobs (on the top). Are you really making the best use of your resources by only putting four of six workers to work? There are no trivial ways to improve the number of busy workers, as the two without responsibilities right now cannot do any of the jobs that are unassigned. Perhaps there's a more efficient assignment that can be made by redoing some of the assignments, however. If there is, how should you go about finding it? If there is not, how would you justify to your boss that there's no better assignment of workers to jobs?
At the end of the section, we'll briefly look at a theorem on matchings in bipartite graphs that tells us precisely when an assignment of workers to jobs exists that ensures each worker has a job. First, however, we want to see how network flows can be used to find maximum matchings in bipartite graphs. The algorithm we give, while decent, is not the most efficient algorithm known for this problem. Therefore, it is not likely to be the one used in practice. However, it is a nice example of how network flows can be used to solve a combinatorial problem. The network that we use is formed from a bipartite graph \(\textbf{G}\) by placing an edge from the source \(S\) to each vertex of \(V_1\) and an edge from each vertex of \(V_2\) to the sink \(T\). The edges between \(V_1\) and \(V_2\) are oriented from \(V_1\) to \(V_2\), and every edge is given capacity 1. Figure 14.4 contains the network corresponding to our graph from Figure 14.2. Edges in this network are all oriented from bottom to top and all edges have capacity 1. The vertices in \(V_1\) are \(x_1,\dots,x_6\) in order from left to right, while the vertices in \(V_2\) are \(y_1,\dots,y_7\) from left to right.
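The construction just described is short enough to write down directly. Here is a minimal sketch, assuming the bipartite graph is given as two vertex lists and a list of \((x,y)\) pairs with \(x \in V_1\) and \(y \in V_2\); the function name `build_network` and the small example graph are hypothetical, and the vertex names `"S"` and `"T"` are assumed not to occur in the graph.

```python
def build_network(v1, v2, edges, source="S", sink="T"):
    """Return the capacities of the flow network built from a bipartite graph.

    The result maps each directed edge (u, v) to its capacity, which is
    always 1 in this construction.
    """
    capacity = {}
    for x in v1:
        capacity[(source, x)] = 1   # edge from the source to each vertex of V1
    for x, y in edges:
        capacity[(x, y)] = 1        # original edges, oriented from V1 to V2
    for y in v2:
        capacity[(y, sink)] = 1     # edge from each vertex of V2 to the sink
    return capacity


# A small hypothetical graph (not the one in Figure 14.2).
network = build_network(
    ["x1", "x2"],
    ["y1", "y2", "y3"],
    [("x1", "y1"), ("x2", "y1"), ("x2", "y3")],
)
for edge in network:
    print(edge, network[edge])
```

The network has \(|V_1| + |E| + |V_2|\) edges, so it is only slightly larger than the graph itself.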
Now that we have translated a bipartite graph into a network, we need to address the correspondence between matchings and network flows. To turn a matching \(M\) into a network flow, we start by placing one unit of flow on the edges of the matching. To have a valid flow, we must also place one unit of flow on the edges from \(S\) to the vertices of \(V_1\) saturated by \(M\). Since each of these vertices is incident with a single edge of \(M\), the flow out of each of them is 1, matching the flow in. Similarly, routing one unit of flow to \(T\) from each of the vertices of \(V_2\) saturated by \(M\) takes care of the conservation laws for the remaining vertices. To go the other direction, simply note that the full edges from \(V_1\) to \(V_2\) in an integer-valued flow form a matching: since every edge out of \(S\) and every edge into \(T\) has capacity 1, no vertex can be an endpoint of two full edges. Thus, we can find a maximum matching from \(V_1\) to \(V_2\) by simply running the labeling algorithm on the associated network in order to find a maximum flow.
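To make the correspondence concrete, here is a minimal sketch that finds a maximum flow in the unit-capacity network and reads the matching off the full edges. It is not the labeling algorithm with the priority order used in this chapter: augmenting paths are found by an ordinary breadth-first search, a standard variant. The function name `max_matching` and the example graph are hypothetical, and the sketch assumes the edge list has no duplicates and that `"S"` and `"T"` are not vertex names.

```python
from collections import deque


def max_matching(v1, v2, edges, source="S", sink="T"):
    """Return a maximum matching as a list of (x, y) pairs."""
    residual = {}    # residual capacity of each directed edge
    neighbors = {}   # adjacency lists, including reverse edges

    def add_edge(u, v):
        residual[(u, v)] = 1   # forward edge with capacity 1
        residual[(v, u)] = 0   # reverse edge, initially unusable
        neighbors.setdefault(u, []).append(v)
        neighbors.setdefault(v, []).append(u)

    for x in v1:
        add_edge(source, x)
    for x, y in edges:
        add_edge(x, y)
    for y in v2:
        add_edge(y, sink)

    while True:
        # Breadth-first search for an augmenting path in the residual network.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v in neighbors.get(u, []):
                if v not in parent and residual[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            break  # no augmenting path: the flow is maximum
        # Push one unit of flow back along the path from the sink.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[(u, v)] -= 1
            residual[(v, u)] += 1
            v = u

    # The full edges from V1 to V2 are exactly the matching edges.
    return [(x, y) for x, y in edges if residual[(x, y)] == 0]


# A small hypothetical graph (not the one in Figure 14.2).
workers = ["w1", "w2", "w3"]
jobs = ["j1", "j2", "j3"]
qualified = [("w1", "j1"), ("w1", "j2"), ("w2", "j1"), ("w3", "j2"), ("w3", "j3")]
print(max_matching(workers, jobs, qualified))
```

In this example the first path found assigns `w1` to `j1`, and the second path must undo that assignment so that `w2` can take `j1`, which is the kind of reassignment the backward labels perform in the text.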
In Figure 14.5, the thick edges are those carrying flow 1 in the flow corresponding to our guess at a matching from Figure 14.3.
With priority sequence \(S,T,x_1,x_2,…,x_6,y_1,y_2,…,y_7\) replacing our usual pseudo-alphabetic order, the labeling algorithm produces the labels shown below.
\(S:(∗,+,∞)\) \(y_6:(x_6,+,1)\)
\(x_3:(S,+,1)\) \(x_1:(y_6,−,1)\)
\(x_5:(S,+,1)\) \(y_1:(x_1,+,1)\)
\(y_4:(x_3,+,1)\) \(y_2:(x_1,+,1)\)
\(y_5:(x_3,+,1)\) \(y_3:(x_1,+,1)\)
\(x_6:(y_4,−,1)\) \(x_2:(y_1,−,1)\)
\(x_4:(y_5,−,1)\) \(T:(y_2,+,1)\)
This leads us to the augmenting path \(S,x_3,y_4,x_6,y_6,x_1,y_2,T\), which gives us the flow shown in Figure 14.6.
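The effect of this augmenting path on the matching can be traced by hand or with a few lines of code. The sketch below assumes the starting matching is the one read off the four backward labels above (\(x_1y_6\), \(x_2y_1\), \(x_4y_5\), \(x_6y_4\)), since a backward label marks an edge carrying flow 1; the function name `augment` is hypothetical.

```python
def augment(matching, path):
    """Apply an augmenting path S, x, y, x, y, ..., y, T to a matching.

    Forward steps (V1 to V2) add an edge to the matching; backward steps
    (V2 to V1) remove the matching edge that was traversed in reverse.
    """
    inner = path[1:-1]          # drop the source and the sink
    new_matching = set(matching)
    for i in range(len(inner) - 1):
        u, v = inner[i], inner[i + 1]
        if i % 2 == 0:
            new_matching.add((u, v))      # forward edge x -> y gains flow
        else:
            new_matching.remove((v, u))   # backward edge: x -> y loses flow
    return new_matching


# Matching inferred from the backward labels of the first run.
before = {("x1", "y6"), ("x2", "y1"), ("x4", "y5"), ("x6", "y4")}
path = ["S", "x3", "y4", "x6", "y6", "x1", "y2", "T"]
after = augment(before, path)
print(sorted(after))
```

The path adds three edges and removes two, so the matching grows from four edges to five, and only \(x_5\) is left unsaturated.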
Is this a maximum flow? Another run of the labeling algorithm produces
\(S:(∗,+,∞)\) \(x_4:(y_5,−,1)\)
\(x_5:(S,+,1)\) \(y_4:(x_4,+,1)\)
\(y_5:(x_5,+,1)\) \(x_3:(y_4,−,1)\)
and then halts. Thus, the flow in Figure 14.6 is a maximum flow.
Now that we know we have a maximum flow, we'd like to be able to argue that the matching we've found is also maximum. After all, the boss isn't going to be happy if he later finds out that this fancy algorithm you claimed gave an optimal assignment of jobs to workers left the fifth worker (\(x_5\)) without a job when all six of them could have been put to work. Let's take a look at which vertices were labeled by the Ford-Fulkerson labeling algorithm on the last run. There were three vertices (\(x_3\), \(x_4\), and \(x_5\)) from \(V_1\) labeled, while there were only two vertices (\(y_4\) and \(y_5\)) from \(V_2\) labeled. Notice that \(y_4\) and \(y_5\) are the only vertices that are neighbors of \(x_3\), \(x_4\), or \(x_5\) in \(\textbf{G}\). Thus, no matter how we choose the matching edges from \(\{x_3,x_4,x_5\}\), at least one of these three vertices will be left unsaturated, since three workers cannot be assigned to two jobs. Therefore, one of the workers must go without a job assignment. (In our example, it's the fifth, but it's possible to choose different edges for the matching so another one of them is left without a task.)
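The labeled vertices of the final run can be recovered without the network: start from the unsaturated vertices of \(V_1\), follow any graph edge to \(V_2\), and follow matching edges back to \(V_1\). The sketch below does this for the fragment of the graph that the labels reveal (the edges \(x_3y_4\), \(x_3y_5\), \(x_4y_4\), \(x_4y_5\), \(x_5y_5\)); it is not the full graph of Figure 14.2, and the function name `labeled_sets` is hypothetical. It assumes the matching passed in is maximum.

```python
def labeled_sets(v1, edges, matching):
    """Return (A, N): the vertices of V1 and V2 reached by alternating paths
    that start at V1 vertices left unsaturated by a maximum matching."""
    nbrs = {x: [] for x in v1}
    for x, y in edges:
        nbrs[x].append(y)
    partner = {y: x for x, y in matching}   # V2 vertex -> its matched V1 vertex
    saturated = {x for x, _ in matching}

    labeled_v1 = {x for x in v1 if x not in saturated}
    labeled_v2 = set()
    stack = list(labeled_v1)
    while stack:
        x = stack.pop()
        for y in nbrs[x]:
            if y in labeled_v2:
                continue
            labeled_v2.add(y)                # reached along an edge x -> y
            back = partner.get(y)            # follow the matching edge back
            if back is not None and back not in labeled_v1:
                labeled_v1.add(back)
                stack.append(back)
    return labeled_v1, labeled_v2


# Fragment of the graph recoverable from the labels in the text.
v1 = ["x3", "x4", "x5"]
edges = [("x3", "y4"), ("x3", "y5"), ("x4", "y4"), ("x4", "y5"), ("x5", "y5")]
matching = [("x3", "y4"), ("x4", "y5")]
A, N = labeled_sets(v1, edges, matching)
print(sorted(A), sorted(N))
```

Here `A` has three elements and `N` has two, so `A` is a set of workers with too few jobs among them, which is the justification the boss was asking for.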
The phenomenon we've just observed is not unique to our example. In fact, in every bipartite graph \(\textbf{G}=(V,E)\) with \(V=V_1 \cup V_2\) in which we cannot find a matching that saturates all the vertices of \(V_1\), we will find a similar configuration. This is a famous theorem of Hall, which we state below.
Theorem (Hall). Let \(\textbf{G}=(V,E)\) be a bipartite graph with \(V=V_1 \cup V_2\). There is a matching which saturates all vertices of \(V_1\) if and only if for every subset \(A \subseteq V_1\), the set \(N \subseteq V\) of neighbors of the vertices in \(A\) satisfies \(|N| \geq |A|\).
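For small graphs, the condition in Hall's theorem can be tested directly by examining every subset of \(V_1\). The following is a minimal brute-force sketch; it takes time exponential in \(|V_1|\) and is meant only to illustrate the statement, not as a practical algorithm. The function name `hall_condition_holds` and the example graphs are hypothetical.

```python
from itertools import combinations


def hall_condition_holds(v1, edges):
    """Return True if |N(A)| >= |A| for every nonempty subset A of V1."""
    nbrs = {x: set() for x in v1}
    for x, y in edges:
        nbrs[x].add(y)
    for size in range(1, len(v1) + 1):
        for subset in combinations(v1, size):
            neighborhood = set()
            for x in subset:
                neighborhood |= nbrs[x]   # collect the neighbors of A
            if len(neighborhood) < len(subset):
                return False              # A has too few neighbors
    return True


# Three workers who can only do two jobs between them: the condition fails.
v1 = ["x3", "x4", "x5"]
edges = [("x3", "y4"), ("x3", "y5"), ("x4", "y4"), ("x4", "y5"), ("x5", "y5")]
print(hall_condition_holds(v1, edges))

# Giving x5 one more job it can do repairs the condition.
print(hall_condition_holds(v1, edges + [("x5", "y6")]))
```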