17.7: Community Structure and Modularity

Last updated
Save as PDF

Page ID: 10000

$ \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } $ $ \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} $$\newcommand{\id}{\mathrm{id}}$ $ \newcommand{\Span}{\mathrm{span}}$ $ \newcommand{\kernel}{\mathrm{null}\,}$ $ \newcommand{\range}{\mathrm{range}\,}$ $ \newcommand{\RealPart}{\mathrm{Re}}$ $ \newcommand{\ImaginaryPart}{\mathrm{Im}}$ $ \newcommand{\Argument}{\mathrm{Arg}}$ $ \newcommand{\norm}[1]{\| #1 \|}$ $ \newcommand{\inner}[2]{\langle #1, #2 \rangle}$ $ \newcommand{\Span}{\mathrm{span}}$ $\newcommand{\id}{\mathrm{id}}$ $ \newcommand{\Span}{\mathrm{span}}$ $ \newcommand{\kernel}{\mathrm{null}\,}$ $ \newcommand{\range}{\mathrm{range}\,}$ $ \newcommand{\RealPart}{\mathrm{Re}}$ $ \newcommand{\ImaginaryPart}{\mathrm{Im}}$ $ \newcommand{\Argument}{\mathrm{Arg}}$ $ \newcommand{\norm}[1]{\| #1 \|}$ $ \newcommand{\inner}[2]{\langle #1, #2 \rangle}$ $ \newcommand{\Span}{\mathrm{span}}$$\newcommand{\AA}{\unicode[.8,0]{x212B}}$

The final topics of this chapter are the community structure and modularity of a network. These topics have been studied very actively in network science for the last several years. These are typical mesoscopic properties of a network; neither microscopic (e.g., degrees or clustering coefficients) nor macroscopic (e.g., density, characteristic path length) properties can tell us how a network is organized at spatial scales intermediate between those two extremes, and therefore, these concepts are highly relevant to the modeling and understanding of complex systems too.

Community A set of nodes that are connected more densely to each other than to the rest of the network. Communities may or may not overlap with each other, depending on their definitions.

Modularity The extent to which a network is organized into multiple communities.

Figure 17.7.1 shows an example of communities in a network.

There are literally dozens of different ways to define and detect communities in a network. But here, we will discuss just one method that is now widely used by network science researchers: the Louvain method, proposed by Vincent Blondel et al. in 2008 [77]. It is a very fast, efficient heuristic algorithm that maximizes the modularity of nonoverlapping community structure through an iterative, hierarchical optimization process.

$Fig 17.13.PNG$

The modularity of a given set of communities in a network is defined as follows [78]:

\[Q= \frac{|E_{in}|- \langle{E_{in} \rangle}}{|E|} \label{(17.35)} \]

Here, $|E|$ is the number of edges, $|E_{in}|$ is the number of within-community edges (i.e., those that don’t cross boundaries between communities), and $\langle{|E_{in}|} \rangle$ is the expected number of within-community edges if the topology were purely random. The subtraction of $\langle{|E_{in}|} \rangle$ on the numerator penalizes trivial community structure, such as considering the entire network a single community that would trivially maximize ${|E_{in}|}$.

The Louvain method finds the optimal community structure that maximizes the modularity in the following steps:

1. Initially, each node is assigned to its own community where the node itself is the only community member. Therefore the number of initial communities equals the number of nodes.
2. Each node considers each of its neighbors and evaluates whether joining to the neighbor’s community would increase the modularity of the community structure. After evaluating all the neighbors, it will join the community of the neighbor that achieves the maximal modularity increase (only if the change is positive; otherwise the node will remain in its own community). This will be repeatedly applied for all nodes until no more positive gain is achievable.
3. The result of Step 2 is converted to a new meta-network at a higher level, by aggregating nodes that belonged to a single community into a meta-node, representing edges that existed within each community as the weight of a self-loop attached to the meta-node, and representing edges that existed between communities as the weights of meta-edges that connect meta-nodes.
4. The above two steps are repeated until no more modularity improvement is possible.

One nice thing about this method is that it is parameter-free; you don’t have to specify the number of communities or the criteria to stop the algorithm. All you need is to provide network topology data, and the algorithm heuristically finds the community structure that is close to optimal in achieving the highest modularity.

Unfortunately, NetworkX doesn’t have this Louvain method as a built-in function, but its Python implementation has been developed and released freely by Thomas Aynaud, which is available from perso.crans.org/aynaud/communities/. Once you install it, a new community module becomes available in Python. Here is an example:

$Code 17.22.PNG$

Here, the two important functions in the community module are tested. The first one is best_partition, which generates community structure using the Louvain method. The result is given as a dictionary where keys and values are node IDs and community IDs, respectively. The second function shown above is modularity, which receives community structure and a network and returns the modularity value achieved by the given communities.

Exercise $\PageIndex{1}$

Visualize the community structure in the Karate Club graph using the community IDs as the colors of the nodes.

Exercise $\PageIndex{2}$

Import a large network data set of your choice from Mark Newman’s Network Data website: http://www-personal.umich.edu/~mejn/netdata/ Detect its community structure using the Louvain method and visualize it if possible.

Exercise $\PageIndex{3}$

Do a quick online literature search for other community detection algorithms (e.g., Girvan-Newman method, $k$-clique percolation method, random walk method, etc.). Choose one of them and read the literature to learn how it works. If the software is available, try it yourself on the Karate Club graph or any other network and see how the result differs from that of the Louvain method.

Exercise \(\PageIndex{1}\)

Exercise \(\PageIndex{2}\)

Exercise \(\PageIndex{3}\)