
# 17.5: Degree Distribution

$$\newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} }$$

$$\newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}}$$

Another local topological property that can be measured locally is, as we discussed already, the degree of a node. But if we collect them all for the whole network and represent them as a distribution, it will give us another important piece of information about how the network is structured:

A degree distribution of a network is a probability distribution

$P(k) =\frac{| \begin{Bmatrix} i | deg(i) =k \end{Bmatrix}| }{ n}. \label{(17.28)}$

i.e., the probability for a node to have degree $$k$$.

The degree distribution of a network can be obtained and visualized as follows:

The result is shown in Fig. 17.5.1.

Figure $$\PageIndex{1}$$: Visual output of Code 17.13.

You can also obtain the actual degree distribution P(k) as follows:

This list contains the value of (unnormalized) $$P(k) for k = 0,1,...,k_{max}$$, in this order. For larger networks, it is often more useful to plot a normalized degree histogram list in a log-log scale:

The result is shown in Fig. 17.5.2, which clearly illustrates differences between the three network models used in this example. The Erd˝os-R´enyi random network model has a bell-curved degree distribution, which appears as a skewed mountain in the log-log scale (blue solid line). The Watts-Strogatz model is nearly regular, and thus it has a very sharp peak at the average degree (green dashed line; $$k = 10\0 in this case). The Barab´asi-Albert model has a power-law degree distribution, which looks like a straight line with a negative slope in the log-log scale (red dotted line). Moreover, it is often visually more meaningful to plot not the degree distribution itself but its complementary cumulative distribution function (CCDF), deﬁned as follows: $F(k) =\sum_{k'=k} ^{\infty} P(k') \label{(17.29)}$ This is a probability for a node to have a degree \(k$$ or higher. By deﬁnition, $$F(0) = 1$$ and $$F(k_{max} + 1) = 0$$, and the function decreases monotonically along $$k$$. We can revise Code 17.15 to draw CCDFs:

Figure $$\PageIndex{2}$$: Visual output of Code 17.15.

In this code, we generate ccdf’s from Pk by calculating the sum of Pk after dropping its ﬁrst k entries. The result is shown in Fig. 17.5.2.
As you can see in the ﬁgure, the power law degree distribution remains as a straight line in the CCDF plot too, because $$F(k)$$ will still be a power function of$$k$$, as shown below:

Figure $$\PageIndex{3}$$: Visual output of Code 17.16.

F(k ) \begin{align} \sum_{k'=k} ^{\infty} {P(k')} = \sum_{k'=k} ^{infty}{ak' ^{-\gamma}} \label{(17.30)} \\ \approx \int_{k}^{infty} ak^{-\gamma}dk' = {\begin{bmatrix} \frac{ak^{J-\gamma+1}}{-\gamma + 1}\end{bmatrix}}^{\infty}_{k} = \frac{0-ak^{-\gamma+1}}{-\gamma +1} \label{(17.31)} \\ = \frac{a}{-\gamma -1}k ^{-(\gamma -1)} \label{(17.32)} \end{align}
This result shows that the scaling exponent of $$F(k)$$ for a power law degree distribution is less than that of the original distribution by 1, which can be visually seen by comparing their slopes between Figs. 17.5.2 and 17.5.3.

Exercise $$\PageIndex{1}$$

Import a large network data set of your choice from Mark Newman’s Network Data website: http://www-personal.umich.edu/~mejn/netdata/

Plot the degree distribution of the network, as well as its CCDF. Determine whether the network is more similar to a random, a small-world, or a scale-free network model.

If the network’s degree distribution shows a power law behavior, you can estimate its scaling exponent from the distribution by simple linear regression. You should use a CCDF of the degree distribution for this purpose, because CCDFs are less noisy than the original degree distributions. Here is an example of scaling exponent estimation applied to a Barab´asi-Albert network, where the linregress function in SciPy’s stats module is used for linear regression:

In the second code block, the domain and ccdf were converted to log scales for linear ﬁtting. Also, note that the original ccdf contained values for all k’s, even for those for which $$P(k) = 0$$. This would cause unnecessary biases in the linear regression toward the higher k end where actual samples were very sparse. To avoid this, only the data points where the value of $$F$$ changed (i.e., where there were actual nodes with degree k) are collected in the logkdata and logFdata lists.

The result is shown in Fig. 17.5.4, and also the following output comes out to the terminal, which indicates that this was a pretty good ﬁt:

Figure $$\PageIndex{4}$$: Visual output of Code 17.17.
According to this result, the CCDF had a negative exponent of about -1.97. Since this value corresponds to$$−(γ−$$1$$)$$, the actual scaling exponent $$γ$$ is about 2.97, which is pretty close to its theoretical value, 3.

Exercise $$\PageIndex{2}$$

Obtain a large network data set whose degree distribution appears to follow a power law, from any source (there are tons available online, including Mark Newman’s that was introduced before). Then estimate its scaling exponent using linear regression.