# 17.5: Degree Distribution

\( \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \)

\( \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} \)

Another local topological property that can be measured locally is, as we discussed already, the degree of a node. But if we collect them all for the whole network and represent them as a distribution, it will give us another important piece of information about how the network is structured:

A *degree distribution* of a network is a probability distribution

\[P(k) =\frac{| \begin{Bmatrix} i | deg(i) =k \end{Bmatrix}| }{ n}. \label{(17.28)}\]

i.e., the probability for a node to have degree \(k\).

The degree distribution of a network can be obtained and visualized as follows:

The result is shown in Fig. 17.5.1.

* Figure \(\PageIndex{1}\)*: Visual output of Code 17.13.

You can also obtain the actual degree distribution P(k) as follows:

This list contains the value of (unnormalized) \(P(k) for k = 0,1,...,k_{max}\), in this order. For larger networks, it is often more useful to plot a normalized degree histogram list in a log-log scale:

The result is shown in Fig. 17.5.2, which clearly illustrates differences between the three network models used in this example. The Erd˝os-R´enyi random network model has a bell-curved degree distribution, which appears as a skewed mountain in the log-log scale (blue solid line). The Watts-Strogatz model is nearly regular, and thus it has a very sharp peak at the average degree (green dashed line; \(k = 10\0 in this case). The Barab´asi-Albert model has a power-law degree distribution, which looks like a straight line with a negative slope in the log-log scale (red dotted line).

Moreover, it is often visually more meaningful to plot not the degree distribution itself but its *complementary cumulative distribution function (CCDF)*, deﬁned as follows:

\[F(k) =\sum_{k'=k} ^{\infty} P(k') \label{(17.29)}\]

This is a probability for a node to have a degree \(k\) or higher. By deﬁnition, \(F(0) = 1\) and \(F(k_{max} + 1) = 0\), and the function decreases monotonically along \(k\). We can revise Code 17.15 to draw CCDFs:

* Figure \(\PageIndex{2}\)*: Visual output of Code 17.15.

In this code, we generate `ccdf`

’s from` Pk`

by calculating the sum of` Pk`

after dropping its ﬁrst `k `

entries. The result is shown in Fig. 17.5.2.

As you can see in the ﬁgure, the power law degree distribution remains as a straight line in the CCDF plot too, because \(F(k)\) will still be a power function of\( k\), as shown below:

** Figure \(\PageIndex{3}\)**: Visual output of Code 17.16.

\[F(k ) \begin{align} \sum_{k'=k} ^{\infty} {P(k')} = \sum_{k'=k} ^{infty}{ak' ^{-\gamma}} \label{(17.30)} \\ \approx \int_{k}^{infty} ak^{-\gamma}dk' = {\begin{bmatrix} \frac{ak^{J-\gamma+1}}{-\gamma + 1}\end{bmatrix}}^{\infty}_{k} = \frac{0-ak^{-\gamma+1}}{-\gamma +1} \label{(17.31)} \\ = \frac{a}{-\gamma -1}k ^{-(\gamma -1)} \label{(17.32)} \end{align}\]

This result shows that the scaling exponent of \(F(k)\) for a power law degree distribution is less than that of the original distribution by 1, which can be visually seen by comparing their slopes between Figs. 17.5.2 and 17.5.3.

Exercise \(\PageIndex{1}\)

Import a large network data set of your choice from Mark Newman’s Network Data website: http://www-personal.umich.edu/~mejn/netdata/

Plot the degree distribution of the network, as well as its CCDF. Determine whether the network is more similar to a random, a small-world, or a scale-free network model.

If the network’s degree distribution shows a power law behavior, you can estimate its scaling exponent from the distribution by simple linear regression. You should use a CCDF of the degree distribution for this purpose, because CCDFs are less noisy than the original degree distributions. Here is an example of scaling exponent estimation applied to a Barab´asi-Albert network, where the `linregress`

function in SciPy’s` stats `

module is used for linear regression:

In the second code block, the `domain `

and `ccdf `

were converted to log scales for linear ﬁtting. Also, note that the original `ccdf`

contained values for all `k’`

s, even for those for which \(P(k) = 0\). This would cause unnecessary biases in the linear regression toward the higher `k`

end where actual samples were very sparse. To avoid this, only the data points where the value of \(F\) changed (i.e., where there were actual nodes with degree `k`

) are collected in the` logkdata`

and` logFdata`

lists.

The result is shown in Fig. 17.5.4, and also the following output comes out to the terminal, which indicates that this was a pretty good ﬁt:

* Figure \(\PageIndex{4}\)*: Visual output of Code 17.17.

According to this result, the CCDF had a negative exponent of about -1.97. Since this value corresponds to\(−(γ−\)1\()\), the actual scaling exponent \(γ\) is about 2.97, which is pretty close to its theoretical value, 3.

Exercise \(\PageIndex{2}\)

Obtain a large network data set whose degree distribution appears to follow a power law, from any source (there are tons available online, including Mark Newman’s that was introduced before). Then estimate its scaling exponent using linear regression.