
Detecting different topologies immanent in scale-free networks with the same degree distribution

Dimitrios Tsiotas (Δημήτριoς Τσιώτας) a,1

a Department of Planning and Regional Development, University of Thessaly, 38334 Volos, Greece

Edited by Terrence J. Sejnowski, Salk Institute for Biological Studies, La Jolla, CA, and approved February 22, 2019 (received for review September 29, 2018)

The scale-free (SF) property is a major concept in complex networks, and it is based on the definition that an SF network has a degree distribution that follows a power-law (PL) pattern. This paper highlights that not all networks with a PL degree distribution arise through a Barabási-Albert (BA) preferential attachment growth process, a fact that, although evident from the literature, is often overlooked by many researchers. For this purpose, it is demonstrated, with simulations, that established measures of network topology do not suffice to distinguish between BA networks and other (random-like and lattice-like) SF networks with the same degree distribution. Additionally, it is examined whether an existing self-similarity metric proposed for the definition of the SF property is also capable of distinguishing different SF topologies with the same degree distribution. To contribute to this discrimination, this paper introduces a spectral metric, which is shown to be more capable of distinguishing between different SF topologies with the same degree distribution, in comparison with the existing metrics.

network science | Barabási-Albert networks | preferential attachment | pattern recognition | power-law degree distribution

Significance

This paper highlights that not all scale-free (SF) networks arise through a Barabási-Albert (BA) preferential attachment process. Although evident from the literature, this fact is often overlooked by many researchers. For this purpose, it is demonstrated, with simulations, that established network measures cannot distinguish between BA networks and other SF networks (random-like and lattice-like) with the same degree distribution. Additionally, it is examined whether an existing self-similarity metric is also capable of distinguishing different SF topologies with the same degree distribution. To contribute to this discrimination, this paper introduces a spectral metric, which is shown to be more capable of distinguishing between different SF topologies with the same degree distribution, in comparison with the existing metrics.

Author contributions: D.T. designed research, performed research, contributed new reagents/analytic tools, analyzed data, and wrote the paper.

The author declares no conflict of interest.

This article is a PNAS Direct Submission.

Published under the PNAS license.

1 Email: [email protected].

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1816842116/-/DCSupplemental.

Published online March 15, 2019.

www.pnas.org/cgi/doi/10.1073/pnas.1816842116 PNAS | April 2, 2019 | vol. 116 | no. 14 | 6701–6706

The scale-free (SF) property is a fundamental concept in the study of complex networks (1, 2). It describes networks whose degree distribution p(k) asymptotically follows a power-law (PL) pattern, according to

p(k) \sim k^{-\gamma}, [1]

where k is the node degree and γ is the PL exponent, which should satisfy γ > 1 so that the Riemann zeta function is finite (3). Many real-world networks have been found to have the SF property, such as biological, citation, spatial, economic, and social networks (4–6), along with the Internet and the World Wide Web (2). For such networks, the PL exponent usually lies within the interval 2 < γ < 3 (1), although it may sometimes exceed these bounds (7).

The SF networks are so named because PLs keep their functional form f(x) = α·x^(−β) unaltered at all scales. This implies that a rescaling (x := cx) of the independent variable x changes f(x) only through a multiplicative factor that depends on the PL exponent, according to the equation f(cx) = c^(−β)·f(x). However, this rescaling property is, by definition, satisfied only for the degree, and therefore it cannot be considered a universal property that describes all of the measurable structural attributes of SF networks (4). For instance, it is not certain that, in an SF network, the distribution of the local clustering coefficient or of the betweenness centrality also follows a PL pattern as the degree does (4, 8), and thus the SF property does not, by default, describe these measures too. Moreover, the definition of the SF networks is based on a statistical property (i.e., on fitting PLs to the degree distribution of real-world networks) (1–4, 6), which gives this definition an empirical (or approximate) rather than a structural nature. Therefore, the definition of the SF property is very broad, and it is not linked directly to a characteristic type of network topology.

A step toward linking the SF property with a characteristic network topology was made when the authors of ref. 2 proposed a procedure generating SF networks, which is commonly known as the Barabási-Albert (BA) model. This procedure is based on growth and on the so-called mechanism of preferential attachment (1, 9), according to which an SF network is produced over time when the probability for a node to gain a new connection is proportional to the node’s degree. This implies that new nodes entering the network “prefer” to connect with the already highly connected ones and thus that an SF network forms hierarchies. In this hierarchical structure, a few nodes (the hubs) undertake the major load of connectivity, a fact that is reflected in the PL shape of the degree distribution (1). The BA networks abound in the scientific literature, and thus they have become the standard SF reference model (1, 2, 4–6). This is because the BA was the first model that successfully described a procedure generating SF networks, but also because the hub-and-spoke hierarchical structure of BA networks is quite important in many disciplines, such as in applied, biological, and socioeconomic research (4–6, 10), as well as in other real-world applications (1, 4, 11).

Although the BA model is capable of producing SF networks, it is not the only model with this capability (4). Indicatively, an alternative to the BA model is the so-called fitting model (or DMS model, named after the initials of its authors) (12), which is based on a linear preferential attachment procedure, where an additional parameter of the nodes’ initial attractiveness is considered in the model’s algorithm (4). Recently, the authors of ref. 13 proposed a model generating star-like SF networks (called “superstar networks”). This model has a stronger bias toward high-degree nodes than exhibited by standard preferential attachment.

The use of different null models (i.e., reference models, which are generated by a random process and describe a set of features of certain network topology) to generate SF networks does not by default result in the same network topology (13, 14). This happens even in cases where the degree distribution of these models is the same. For instance, the authors of ref. 14 studied, with simulations, some structural properties of SF networks with the same degree distribution (the BA model, the Molloy-Reed model, the Kalisky model, and two SF models proposed by the authors, named “MA” and “MB”), and they observed that these networks have different structural properties in terms of their number of components, their components’ size, and global efficiency (i.e., the harmonic mean of the geodesic edge lengths; see ref. 4). According to this approach, the structures of the examined models ranged between a decentralized pattern with a larger number of components and a centralized BA pattern with all vertices included in a single component and with a medium to high global efficiency.

Within this context, the definition of the SF property was criticized for being abstract. Indicatively, the authors of ref. 15 noted that SF networks inherit from the literature “an imprecision as to what exactly SF means.” In particular, based on the relevant literature, the authors summarized that the SF networks have a scaling (PL) degree distribution, they are generated by certain random processes (one of which is preferential attachment), they have highly connected hubs, they preserve their property under random degree-preserving rewiring, they are self-similar (i.e., their total is similar to one or more of its parts; see ref. 8), and they are independent of specific domain attributes. In an attempt to specialize this broad definition, for the detection of the SF property, the authors of ref. 15 proposed a self-similarity metric defined by the formula

S(G) = s(G) / s_{\max}, [2]

where s(G) = \sum_{(u,v) \in E} \deg(u) \cdot \deg(v), s_max = max{s(H)}, and H is the set of graphs with degree distribution identical to that of G.

The S(G) metric indicates the existence of the SF property when it is (maximum and) close to 1 [S(G) ≈ 1], which denotes that the hubs in the network are connected to each other. However, with the introduction of the superstar SF networks, which are dominated by a single hub, the authors of ref. 13 have shown that connectivity between hubs is not a defining condition for the SF networks. Thereupon, they proposed an approach for distinguishing between the BA and the superstar SF topology. This approach was based on counting the number (and size) of hubs and on defining the minimum degree of hubs in terms of the theoretic exponent of the degree distribution.

As is evident from the previous short review, the SF property is too broad to describe a single topology in networks. Substantially, this property defines a family of networks (the SF networks), where the BA model is only one member of this family. Since the BA networks prevail in the scientific literature (1, 2, 4–6), many researchers seem to overlook their difference from the SF networks. One reason for this is probably that the detection of the SF property is an easy task, based on the PL definition (1, 2, 4), and thus no special tools have been developed for discrimination among the SF networks. A notable exception may be the work of ref. 13, which proposed a discrimination method between BA and superstar SF networks (although it suggested as optimal the structure of the superstar SF networks against the BA’s). However, this approach still depends on a degree distribution consideration, and it is based on a (superstar) SF null model whose programming code is not broadly available; thus it is not easy to implement this approach in order to evaluate its ability to discriminate among other members of the SF networks’ family. Further, when taking into account the recent work of ref. 16, the authors of which claim that “SF networks are rare” in nature, it can be argued that the BA model is the most important of the SF networks because it is the common choice for empirical studies in network science. Therefore, the development of tools discriminating between the BA model and other SF networks is an up-to-date and important issue for network science.

Within this context, this paper highlights that not all networks with a PL degree distribution arise through a BA preferential attachment growth process, a fact which, although evident from the literature, is often overlooked by many researchers. To this end, it is demonstrated, with simulations, that many established measures of network topology do not suffice to distinguish between BA networks and other SF networks (i.e., belonging to the family of SF networks) with the same degree distribution. Additionally, it is examined whether the S(G) structural metric proposed by the authors of ref. 15, which provides a self-similar definition of the SF property, is also capable of distinguishing between different SF topologies with the same degree distribution. Finally, to contribute to this discrimination, this paper introduces a spectral metric, which is defined with reference to the main diagonal of the adjacency matrix. The proposed measure is shown to be more capable of distinguishing between different SF topologies with the same degree distribution, in comparison with the existing metrics.

The remainder of this paper is organized as follows: Section 1 describes the SF null models used in the analysis, and section 2 shows the simulation results, performs a statistical inference analysis on some fundamental network measures, and examines the ability of the S(G) metric to distinguish among some members of the SF networks family with the same degree distribution. Section 3 proposes a spectral metric and examines the ability of this metric to distinguish among the available SF networks, and, finally, in section 4, conclusions are given.
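As a concrete illustration of the preferential attachment mechanism recalled in the introduction, the following minimal Python sketch grows a network in which each new node attaches to existing nodes with probability proportional to their degree. It is an illustration under stated assumptions only: the function name `grow_preferential`, the complete seed graph, and the parameter `m` (edges added per new node) are choices of this sketch, and it is not the Generalized BA generator of ref. 17 used for the null models.

```python
import random
from collections import Counter


def grow_preferential(n, m=2, seed=0):
    """Grow an undirected simple graph on n nodes by linear preferential attachment.

    Hypothetical helper for illustration. Nodes are labeled by age (0 is the
    oldest). Each new node adds m edges to distinct existing nodes, chosen with
    probability proportional to their current degree.
    """
    rng = random.Random(seed)
    # Seed graph (an assumption of this sketch): complete graph on m + 1 nodes.
    edges = [(u, v) for u in range(m + 1) for v in range(u + 1, m + 1)]
    # 'stubs' holds each node once per incident edge, so a uniform draw from
    # it is a degree-proportional draw over nodes.
    stubs = [x for e in edges for x in e]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        for t in sorted(targets):
            edges.append((t, new))
            stubs.extend((t, new))
    return edges


edges = grow_preferential(500, m=2, seed=1)
degree = Counter(x for e in edges for x in e)
# The five highest-degree nodes; older (lower-labeled) nodes tend to dominate.
print(degree.most_common(5))
```

Because node labels coincide with node age in this sketch, the resulting edge list can be read directly as an age-ordered adjacency structure.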

1. Null Models

The undirected (source) null models with the SF property were generated using the algorithm of the Generalized BA model (2), which is available in the open-source software of ref. 17 (version 0.8.2). The initial parameters of the generator were set each time to the default values (SI Appendix); modifying them may provide avenues for further research. The number of the algorithm’s steps ranged between 10 and 15,000 (SI Appendix, Table S1) and was chosen by a customized systematic sampling with a gradually increasing lag, aiming to produce networks with an increasing number of nodes. Null models for more than 15,000 steps were not generated, due to computational constraints. Due to the probabilistic architecture of the generator’s algorithm, the produced SF null models include isolated nodes. However, these isolated nodes were ignored when fitting PLs to the degree distributions of the null models.

Further, associated random-like (RL) and lattice-like (LL) null models, with the same degree distribution as the source BA network (see SI Appendix, Fig. S1), were generated using the “randomization” (18) and “latticization” (18, 19) iterative algorithms, which are available in m-file format from ref. 20. According to the randomization algorithm, network nodes are randomly chosen in quadruplets (u, v, w, and z), so that the edges e_uv and e_wz belong to the network [e_uv, e_wz ∈ E(G), where E(G) ≡ E is the edge set of the network G], whereas the edges e_uz and e_vw do not (e_uz, e_vw ∉ E). These edges are then rewired so that e_uv, e_wz ∉ E and e_uz, e_vw ∈ E, provided that none of the new edges already exist in the network; if new edges already exist, the rewiring step is aborted and a new quadruplet is selected. This procedure preserves the degree distribution even in cases of directed networks. The latticization algorithm follows the same procedure as the randomization algorithm, under the constraint that “swaps are only carried out if the resulting matrix has nonzero entries that are located closer to the main diagonal (thus approximating a lattice or ring topology). This algorithm is implemented as a probabilistic optimization using a weighted cost function” (19).
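The randomization step described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the m-file implementation of ref. 20: the function name `randomize`, the attempt cap, and the edge-list representation are choices made here for illustration. Two edges (u, v) and (w, z) are drawn, and they are replaced by (u, z) and (v, w) only if the four nodes are distinct and neither new edge already exists, so every node keeps its degree.

```python
import random
from collections import Counter


def randomize(edges, n_swaps, seed=0):
    """Degree-preserving rewiring of an undirected simple graph.

    Hypothetical helper for illustration. `edges` is a list of (u, v) pairs
    with u != v. Returns a new edge list with the same degree of every node.
    """
    rng = random.Random(seed)
    edge_list = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edge_list}
    done, attempts = 0, 0
    # The attempt cap (an assumption) avoids looping forever when no valid
    # swap exists, e.g., on a complete graph.
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        u, v = edge_list[i]
        w, z = edge_list[j]
        if rng.random() < 0.5:  # pick the orientation of the first edge at random
            u, v = v, u
        if len({u, v, w, z}) < 4:
            continue  # the quadruplet must consist of four distinct nodes
        new_uz, new_vw = frozenset((u, z)), frozenset((v, w))
        if new_uz in present or new_vw in present:
            continue  # a new edge already exists: abort and draw again
        present -= {frozenset((u, v)), frozenset((w, z))}
        present |= {new_uz, new_vw}
        edge_list[i], edge_list[j] = (u, z), (v, w)
        done += 1
    return edge_list


ring = [(i, (i + 1) % 12) for i in range(12)]
rewired = randomize(ring, n_swaps=20, seed=1)
# Every node keeps its degree, so the degree distribution is unchanged.
print(Counter(x for e in rewired for x in e) == Counter(x for e in ring for x in e))
```

A latticization variant would add an acceptance test to the same loop, keeping a swap only when it moves the nonzero entries closer to the main diagonal; that test is not implemented in this sketch.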

2. Simulations and Analysis

Simulations were conducted on 44 undirected BA (SF) networks (see SI Appendix, Table S1), and on 44 RL (G_RL) and 44 LL (G_LL) associated null models, all of which were produced from the BA models and have the same number of nodes (n) and edges (m) and the same degree distribution p(k) as the source (BA) networks. The topologies of the BA, RL, and LL null models were embedded (visualized) in the 2D Euclidean space using the “Force-Atlas” layout, which is available in the open-source software of ref. 17. This layout is produced by a force-directed algorithm (see ref. 21), which was developed by the software’s developers and is based on applying repulsion strengths between the network hubs while arranging the hubs’ connections into surrounding clusters. The Force-Atlas algorithm (17) is used with the software’s default parameters (see SI Appendix).

An indicative picture of the topologies produced for the available network types is given in Fig. 1, which displays the topological layouts and the sparsity (spy) plots (i.e., matrix plots displaying nonzero elements with dots; see ref. 10) of the adjacency matrices of the G^(i)(4,981, 7,469) null models, where G(n, m) is a network with n nodes and m edges and i = BA, RL, LL indicates the network type (where appropriate, the index i referring to the model type is omitted for simplicity). Since all graphs are subjected to the same embedding (and thus to the same transformation rules) (6, 22), comparisons among the layouts shown in Fig. 1 are possible. As can be observed from Fig. 1, the topological layouts and spy plots of the BA, RL, and LL null models appear considerably different. In particular, the topological layout of the LL model is obviously different from the others because it configures a dense torus of nodes with a small central core, whereas the BA and the RL models configure greater cores surrounded by node rings (expressing isolated nodes) of negligible thickness (number of nodes). The topological layouts of the BA and RL models appear similar, with a slight difference in the density of their cores, which seems greater for the BA model. A similar picture emerges from the examination of the spy plots of these three null models, where the LL pattern configures a dense band along the main diagonal with small concentrations in the top right and bottom left corners of the matrix. Conversely, the BA and RL models configure a scattered pattern throughout the area of the adjacency matrix with a dense concentration in the top left corner. In general, in their vast majority, except the case of G(n = 4, m = 3), the null models illustrate a similar topological picture to that shown in Fig. 1.

Fig. 1. (A) Topological layouts of (A1) a BA, (A2) an RL, and (A3) an LL undirected graph, using the Force-Atlas embedding, which is available in the open-source software of the authors of ref. 17. (B) Sparsity (spy) plots of the adjacency matrices for (B1) a BA, (B2) an RL, and (B3) an LL undirected graph, with n = 4,981 and m = 7,469. All graphs have the same degree distribution p(k), which follows a PL pattern (see SI Appendix, Fig. S1). Axes in the adjacency matrices count the number of nodes (ticks from 0 to 4k), where the symbol “k” refers to thousands.

In an attempt to quantify the previous observations based on the topological maps, a statistical inference analysis is applied to a set of measures describing fundamental topological aspects of networks. In particular, the measures participating in the analysis are the network diameter (d_G), which is the length of the longest shortest path and describes the scale of the network (23); the modularity (Q), which is an objective function expressing the network’s ability to be divided into communities (24); the number of connected components (N_C), which expresses the level of network connectivity (6, 10); the average (⟨C⟩) and local (C) clustering coefficient, which express (at the global and local level, respectively) the degree to which nodes in a graph tend to cluster together (25); the average path length (⟨l⟩), which measures the information efficiency in the network (1); and the network assortativity (r), which measures the preference of network nodes to attach to other similar nodes (13). These measures were selected from a broader set of network measures available in the literature (1, 6, 13), because they were the only ones having distinct statistical properties for any pair of the three network types BA, RL, and LL, and thus their overall view may give insights about what network topology is, in total (see ref. 10).

Within this context, Fig. 2 shows 95% confidence intervals (CIs) (26) for these topological measures, which are computed for each network type (BA, RL, and LL) of the available null models. As can be observed, in all cases, the CIs of the BA and RL models are different but overlapping (i.e., their mean values can be considered statistically equal), whereas the CIs of the LL null models are distinct and do not overlap those of the other network types (i.e., their mean values can be considered statistically different), except for the measures d_G and N_C. This analysis shows that classic network topological measures do not succeed in discriminating among these three types of null models, where the BA and RL cases have quite similar topologies and are therefore difficult to discriminate.

Next, to evaluate the capability of the SF metric proposed by the authors of ref. 15 to discriminate among these three types of null models, 95% and 99% CIs are computed for the s(G) and S(G) metrics, as is shown in Fig. 3. Computations are conducted on the set H = {BA, RL, LL}, according to Eq. 2. As can be observed, the s(G) metric does not succeed in discriminating among the available types (BA, RL, and LL), whereas the S(G) shows quite distinct results. In particular, S(G) scores within the interval [0.988, 0.998] correspond (with 95% certainty) to the BA topology, and, within the interval [0.941, 0.964], they correspond to the RL topology, whereas, within the interval [0.871, 0.905], they correspond to the LL topology.

As a further analysis, 99% CIs, based on the Student’s distribution, are computed on a “dynamic” sample size consisting of samples with a sequentially decreasing number of cases (null models). In particular, let X_1 = {G_1, G_2, G_3, …, G_44} be the total set of the available null models (it can be either G := BA, or G := RL, or G := LL; see SI Appendix, Table S1). Then, we define the set X_i = X_{i−1} − {G_{i−1}} = {G_i, G_{i+1}, …, G_44}, where i = 2, 3, …, 41 (samples with fewer than four null models were not considered). The number of the null models in the set X_i is 45 − i. In this analysis, the lengths of the CIs differ due to the different number of cases included in each set X_i. Additionally, the Student’s distribution, which produces broader intervals, is chosen for the computation of the 99% CIs, to counterbalance the uncertainty due to the probabilistic nature of the simulation and due to sampling constraints. Within this context, the calculation of the CIs on the dynamic sample size shows that, for almost 24% of the available samples (10 out of 41) for which the S(G) metric was computed, it is not possible to discriminate between the null models RL and LL. This is especially observed for samples including bigger networks (X_32, X_33, …, X_41), where the number of nodes is n ≥ 1,384, although this threshold can be considered as an approximate rather than an exact cutoff. Increasing the precision of this threshold may suggest an avenue for further research.
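The s(G) and S(G) quantities of Eq. 2 can be sketched as follows. This is a minimal sketch under stated assumptions: the function names `s_metric` and `S_metric` and the two six-node toy graphs are introduced here for illustration, and s_max is taken over a finite candidate set H, in the spirit of the computation on H = {BA, RL, LL} described above, rather than over all graphs with the given degree distribution.

```python
from collections import Counter


def s_metric(edges):
    """s(G): sum over edges (u, v) of deg(u) * deg(v)."""
    deg = Counter(x for e in edges for x in e)
    return sum(deg[u] * deg[v] for u, v in edges)


def S_metric(edges, candidates):
    """S(G) = s(G) / s_max, with s_max taken over a finite set of candidate
    graphs that share the degree distribution of G (an assumption here)."""
    s_max = max(s_metric(h) for h in candidates)
    return s_metric(edges) / s_max


# Two toy graphs with the same degree sequence (3, 2, 2, 2, 2, 1).
g_a = [(0, 1), (0, 2), (0, 3), (1, 2), (3, 4), (4, 5)]  # leaf attached to a degree-2 node
g_b = [(0, 5), (0, 1), (0, 2), (1, 3), (2, 4), (3, 4)]  # leaf attached to the hub
H = [g_a, g_b]

print(s_metric(g_a), s_metric(g_b))  # 28 27
print(S_metric(g_a, H), S_metric(g_b, H))
```

The two toy graphs differ only in where the single leaf is attached, which is enough to change s(G) while leaving the degree distribution untouched.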

3. Proposing an SF Detection Spectral MetricThe previous analysis has shown, first, that it is not possible todiscriminate the BA topology among the null models BA, RL,and LL (which have the same degree distribution) by usingclassic measures of network topology and, second, that the S(G)metric proposed by the authors of ref. 15 is a good measure tosucceed this discrimination, but it is difficult to discriminate be-tween the BA and RL topologies when the network size getsbigger. Within this context, this paper proposes a spectral metricfor discrimination of the BA, RL, and LL topologies, which wasinspired by the spatial constraint used in the latticization algorithm(18, 19) and by the spy plots’ layouts of the adjacency matrices thatare shown in Fig. 1. In particular, the proposed metric measuresthe average distance from the main diagonal of the nonzero ele-ments in the adjacency matrix of a graph (see SI Appendix, Fig. S2),and it is defined by the following math formula:

DD= ddðAÞ= hddi= 1n2

Xði, jÞ∈E

ddij =1n2

Xði, jÞ∈E

xij · yijzij

=1ffiffiffi2

p· n2

Xði, jÞ∈E

ji− jj,[3]

where $dd_{ij}$ is the distance of the element (i,j) from the main diagonal of an adjacency matrix A, $x_{ij} = |(i,i) - (i,j)|$, $y_{ij} = |(j,j) - (i,j)|$, $z_{ij} = \sqrt{x_{ij}^2 + y_{ij}^2}$, n is the number of network nodes, and $\langle \cdot \rangle$ is the average operator.

The proposed metric is given the name “diagonal distance” (DD) of the adjacency matrix A, and it expresses the average of the distances (heights) $dd_{ij}$ drawn from the right angle of the triangles ($\alpha_{ii}\alpha_{ij}\alpha_{jj}$) shown in SI Appendix, Fig. S2. The DD may also be a useful measure for recurrence quantification analysis (RQA), especially for the family of the diagonal-referenced RQA measures and metrics (see ref. 27), but its evaluation in this field is a topic for further research. However, the DD is sensitive to node reordering (or relabeling). For instance, let us consider a network G(10,1), with n = 10 nodes and a single edge (m = 1) connecting the first ($n_1$) and the second ($n_2$) nodes ($e_{1,2} \in E$). For this network, the edge’s distance to the main diagonal equals $dd(e_{1,2}) = 1/\sqrt{2}$. However, by swapping (relabeling) nodes 2 and 10 ($n_2 \leftrightarrow n_{10}$), the distance to the main diagonal of the

Fig. 2. CIs of 95% confidence level for the topological measures of (A) network diameter (dG), (B) modularity (Q), (C) number of components (NC), (D) average (〈C〉) and (E) local (C) clustering coefficient, (F) average path length (〈l〉), and (G) assortativity (r), computed for each network type (BA, RL, and LL) of the available null models.

Fig. 3. (A) The 95% CIs that are computed on normal distribution and on the total sample size, for the self-similarity metrics s(G) and S(G) proposed by the authors of ref. 15 (computations were conducted on the set H = {BA, RL, LL}, according to Eq. 2). (B) The 99% CIs for S(G), which were computed on Student’s distribution and on dynamic sample size (lighter-colored lines represent the lower and the upper bounds of the CIs). Values at the horizontal axis express the sample sets on which the CIs are each time calculated, where X1 = {G1, G2, G3, . . ., G44} and Xi = Xi–1 – {Gi}, with i = 2, 3, . . ., 41 (details about null models Gi can be found in SI Appendix, Table S1).

6704 | www.pnas.org/cgi/doi/10.1073/pnas.1816842116 Tsiotas


new edge will become $dd(e_{1,10}) = 9/\sqrt{2}$. This observation raises the question of whether the DD metric is suitable for pattern recognition.

A first answer can be given from the study of the programming codes available from ref. 20, where it can be observed that the node labeling in null models expresses the node age, namely, the step at which each node was created by the generator algorithm during the stepwise procedure of the model’s construction (obviously, in this procedure, all nodes have different ages). Therefore, pattern recognition using the DD metric is possible when the node ages in the network are known and thus when the nodes are labeled according to their age, provided that all ages are different (if not, their ordering is a topic for further research). Further, in Fig. 1, we can observe that node labeling according to node age can produce configurations in the adjacency matrices that are representative of network topologies. This is evident from the correspondences that can be made between the topological layouts (Fig. 1A) and the spy plots (Fig. 1B), where we can observe that cores in the layouts correspond to dot concentrations in the matrix corners. Especially for the LL topology, we can observe that the node torus in the layout corresponds to the diagonal band in the adjacency matrix. It should be noted that applying a layout in the open-source software used in the analysis does not affect the node placement in the adjacency matrix.

However, in most cases, the node age is not known for real-world networks. Therefore, to remove the sensitivity of DD to node reordering, this measure has to be calculated after defining a standard relabeling of the network nodes. Such a relabeling can be achieved under the control of a node attribute (degree, local clustering, betweenness centrality, closeness centrality, etc.), namely, by using as new labels the rank (either ascending or descending) of the nodes according to a standard attribute. Through a trial-and-error approach, a successful relabeling was achieved under the control (a descending order was chosen) of the eigenvector centrality (CE) (see ref. 11), which is a spectral measure computed on the eigenvectors of the adjacency matrix. The results of the relabeling under the control of CE are shown in Fig. 4, where distinct topologies emerge among the three network types and, also, nonoverlapping CIs are produced for their respective DDs. Some avenues for further research on this topic are the examination of other ordering choices (i.e., computing the DD under the control of other node attributes) and a systematic search for the minimum or maximum possible values of the DD.

The DD is subjected to the same testing as the metric of ref. 15, to comparatively evaluate its capability to discriminate among the three network types BA, RL, and LL. The results of the analysis are shown in Fig. 5, where scores of the DD within the interval [0.484, 0.510] correspond (with 95% confidence) to the BA topology, scores within the interval [0.437, 0.476] correspond to the RL topology, and scores within the interval [0.291, 0.352] correspond to the LL topology. The analysis of DD on the dynamic sample size shows that, for almost 15% of the available samples (6 out of 41), it is not possible to discriminate between the null models BA and RL. This is especially observed for samples including the smallest networks (X1, X2, and X3) and for the smallest samples (X39, X40, and X41), although the latter case is subject to uncertainty due to small sample sizes.

According to the previous analysis, the proposed metric can discriminate between the available SF network types for more samples than the metric of ref. 15. Within this framework, the second answer that can be given about the ability of the metric DD to be used for pattern recognition is also positive, and it is based on the node reordering under the control of the CE. Overall, the proposed metric DD is shown to be useful for the detection of the SF topology produced by the BA preferential attachment growth model. The DD provides performance comparable to that of the S(G) metric, along with advantages in its spectral definition (it measures the concentration of nonzero elements around the main diagonal of the adjacency matrix), in its comparison-free

Fig. 4. Spy plots of the adjacency matrices for (A1) a BA, (A2) an RL, and (A3) an LL undirected graph, with n = 4,981 and m = 7,469, which were relabeled under the control of the CE (i.e., by using as new labels the descending rank of the nodes according to CE). All graphs have the same degree distribution p(k), which follows a PL pattern (see SI Appendix, Fig. S1). Panel annotation: DD[BA] = 0.4809.

Fig. 5. (A) The 95% CIs for the proposed measure of DD, which are computed on normal distribution and for the total sample size. (B) The 99% CIs for DD, which were computed on Student’s distribution and on dynamic sample size (lighter-colored lines represent the lower and the upper bounds of the CIs). Values at the horizontal axis express the sample sets on which the CIs are each time calculated, where X1 = {G1, G2, G3, . . ., G44} and Xi = Xi–1 – {Gi}, with i = 2, 3, . . ., 41 (details about null models Gi can be found in SI Appendix, Table S1).

Tsiotas PNAS | April 2, 2019 | vol. 116 | no. 14 | 6705


definition [since the S(G) metric is defined by the maximum value extracted from a set of SF topologies], in its degree-free configuration (it is not subjected to the constraint of being defined by one network measure), and in the ease of implementing it in code.
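As a rough illustration of this last point, the Python sketch below computes the DD of Eq. 3 from an edge list and relabels the nodes by descending eigenvector centrality before measuring. It is a sketch under stated assumptions and not the code used in the analysis: the function names (`edge_distance`, `diagonal_distance`, `eigenvector_centrality`, `relabel_by_centrality`) are hypothetical, each edge is counted once as listed, node labels in the star example are 0-based, the centrality is approximated by power iteration on A + I (which has the same eigenvectors as A), and ties in centrality are broken by the original label.

```python
import math


def edge_distance(i, j):
    """Distance of the matrix element (i, j) from the main diagonal: |i - j| / sqrt(2)."""
    return abs(i - j) / math.sqrt(2)


def diagonal_distance(edges, n):
    """DD of Eq. 3: sum of edge distances over the listed edges, divided by n^2."""
    return sum(edge_distance(i, j) for i, j in edges) / n ** 2


def eigenvector_centrality(edges, n, max_iter=1000, tol=1e-12):
    """Approximate eigenvector centrality of an undirected graph by power iteration."""
    neighbors = [[] for _ in range(n)]
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    x = [1.0] * n
    for _ in range(max_iter):
        # Multiply by (A + I): same eigenvectors as A, no oscillation on bipartite graphs.
        y = [x[v] + sum(x[u] for u in neighbors[v]) for v in range(n)]
        norm = math.sqrt(sum(val * val for val in y))
        y = [val / norm for val in y]
        converged = max(abs(a - b) for a, b in zip(x, y)) < tol
        x = y
        if converged:
            break
    return x


def relabel_by_centrality(edges, n):
    """Relabel nodes 0..n-1 by descending centrality (assumed tie-break: old label)."""
    ce = eigenvector_centrality(edges, n)
    order = sorted(range(n), key=lambda v: (-ce[v], v))
    new_label = {old: rank for rank, old in enumerate(order)}
    return [(new_label[i], new_label[j]) for i, j in edges]


# The G(10,1) example from the text: one edge, before and after swapping labels 2 and 10.
d_before = edge_distance(1, 2)   # 1 / sqrt(2)
d_after = edge_distance(1, 10)   # 9 / sqrt(2)

# Two labelings of the same 5-node star graph (hub = node 2 versus hub = node 4).
star_a = [(2, 0), (2, 1), (2, 3), (2, 4)]
star_b = [(4, 0), (4, 1), (4, 2), (4, 3)]
dd_raw_a = diagonal_distance(star_a, 5)
dd_raw_b = diagonal_distance(star_b, 5)
dd_ce_a = diagonal_distance(relabel_by_centrality(star_a, 5), 5)
dd_ce_b = diagonal_distance(relabel_by_centrality(star_b, 5), 5)
print(dd_raw_a, dd_raw_b)  # raw DD differs between the two labelings
print(dd_ce_a, dd_ce_b)    # DD after centrality-based relabeling agrees
```

In the star example the hub always receives the first label after relabeling, so both labelings yield the same DD. Nodes with equal centrality (the four leaves here) make the ranking non-unique, so on larger graphs the chosen tie-breaking rule is an additional assumption that may affect the result.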

4. Conclusions

This paper highlighted that not all networks with a PL degree distribution arise through a BA preferential attachment growth process, a fact that, although evident from the literature, is often overlooked by many researchers. The analysis showed, with simulations, that classic measures of network topology do not succeed in discriminating between BA networks and other (RL and LL) SF networks with the same degree distribution.

Toward this direction, an existing self-similarity metric S(G), which was proposed for the detection of the SF property, was examined to evaluate its capability to discriminate among the three available topologies (BA, RL, and LL) with the same degree distribution. The analysis showed that S(G) is capable of producing distinct results at the 95% confidence level, but, when submitted to a dynamic 99% CI (based on the Student’s distribution) analysis, an inconsistency was observed for samples including bigger networks.

Within this context, this paper proposed a spectral metric (diagonal distance, DD) defined as the average distance of the nonzero elements from the main diagonal of a network’s adjacency matrix. The proposed measure was submitted to the same evaluation as the existing SF metric, and it was found to be also capable of discriminating among the available SF topologies but was more consistent than S(G) in terms of the network size. Also, the analysis pointed to some avenues for further research. Some indicative directions are to examine the effects on diagonal distance of changing the algorithms’ defaults, to increase precision by including bigger sample sizes, to compute DD under other ordering choices or by considering its minimum or maximum values, to examine changes in DD on LL models generated by reordered arrangements of the same adjacency matrix, and to detect differences in DD between the BA and the SF superstar topologies.

Overall, this paper highlighted the difference between BA and SF networks, provided insights about differences in the topology of the SF networks, examined the potential of established measures to distinguish among different topologies of SF networks with the same degree distribution, and introduced a metric for pattern recognition among the members of the SF network family.
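The 95% CI ranges reported for the DD in Section 3 can be read as a simple lookup from a DD score to a candidate topology. The sketch below is purely illustrative: the name `classify_dd` and the choice to return `None` for scores outside every range are assumptions, and the ranges were estimated on the paper's null models after CE relabeling, so they may not transfer to networks of other sizes or edge densities.

```python
# 95% CI ranges for DD as reported in Section 3 (Fig. 5A).
DD_INTERVALS = {
    "BA": (0.484, 0.510),
    "RL": (0.437, 0.476),
    "LL": (0.291, 0.352),
}


def classify_dd(score):
    """Return the topology whose reported CI contains the score, or None if no range matches."""
    for topology, (low, high) in DD_INTERVALS.items():
        if low <= score <= high:
            return topology
    return None


print(classify_dd(0.50))  # falls in the BA range
print(classify_dd(0.40))  # falls between the LL and RL ranges
```

Because the three ranges do not overlap, at most one topology can match, and a score in a gap between ranges is left unclassified rather than forced into the nearest class.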

ACKNOWLEDGMENTS. I thank the two anonymous reviewers for their valuable comments that significantly improved the quality of the paper.

1. Albert R, Barabasi A-L (2002) Statistical mechanics of complex networks. Rev Mod Phys 74:1–47.
2. Barabasi A-L, Albert R (1999) Emergence of scaling in random networks. Science 286:509–512.
3. Goldstein ML, Morris SA, Yen GG (2004) Problems with fitting to the power-law distribution. Eur Phys J B 41:255–258.
4. Boccaletti S, Latora V, Moreno Y, Chavez M, Hwang D-U (2006) Complex networks: Structure and dynamics. Phys Rep 424:175–308.
5. Easley D, Kleinberg J (2010) Networks, Crowds, and Markets: Reasoning About a Highly Connected World (Cambridge Univ Press, Oxford).
6. Barthelemy M (2011) Spatial networks. Phys Rep 499:1–101.
7. Choromanski K, Matuszak M, Miekisz J (2013) Scale-free graph with preferential attachment and evolving internal vertex structure. J Stat Phys 151:1175–1183.
8. Song C, Havlin S, Makse HA (2005) Self-similarity of complex networks. Nature 433:392–395.
9. Bianconi G, Barabasi A-L (2001) Competition and multiscaling in evolving networks. Europhys Lett 54:436–442.
10. Tsiotas D, Polyzos S (2017) The complexity in the study of spatial networks: An epistemological approach. Netw Spat Econ 18:1–32.
11. Newman MEJ (2010) Networks: An Introduction (Oxford Univ Press, Oxford).
12. Dorogovtsev SN, Mendes JFF, Samukhin AN (2000) Structure of growing networks with preferential linking. Phys Rev Lett 85:4633–4636.
13. Small M, Li Y, Stemler T, Judd K (2015) Growing optimal scale-free networks via likelihood. Phys Rev E Stat Nonlin Soft Matter Phys 91:042801.
14. Grisi-Filho JHH, Ossada R, Ferreira F, Amaku M (2013) Scale-free networks with the same degree distribution: Different structural properties. Phys Res Int 2013:1–9.
15. Li L, Alderson D, Doyle JC, Willinger W (2005) Towards a theory of scale-free graphs: Definition, properties, and implications. Internet Math 2:431–523.
16. Broido AD, Clauset A (2018) Scale-free networks are rare. arXiv:1801.03400v1.
17. Bastian M, Heymann S, Jacomy M (2009) Gephi: An open source software for exploring and manipulating networks. Proceedings of the Third International ICWSM Conference (AAAI Press, Menlo Park, CA), pp 361–362.
18. Maslov S, Sneppen K (2002) Specificity and stability in topology of protein networks. Science 296:910–913.
19. Sporns O, Kötter R (2004) Motifs in brain networks. PLoS Biol 2:e369.
20. Brain Connectivity Toolbox (2018) Network models. Available at https://sites.google.com/site/bctnet/null. Accessed September 28, 2018.
21. Fruchterman TM, Reingold EM (1991) Graph drawing by force-directed placement. Software Pract Exper 21:1129–1164.
22. Yan S, et al. (2007) Graph embedding and extensions: A general framework for dimensionality reduction. IEEE Trans Pattern Anal Mach Intell 29:40–51.
23. Bollobas B, Riordan O (2004) The diameter of a scale-free random graph. Combinatorica 24:5–34.
24. Newman MEJ (2006) Modularity and community structure in networks. Proc Natl Acad Sci USA 103:8577–8582.
25. Watts DJ, Strogatz SH (1998) Collective dynamics of ‘small-world’ networks. Nature 393:440–442.
26. Walpole RE, Myers RH, Myers SL, Ye K (2012) Probability & Statistics for Engineers & Scientists (Prentice Hall, New York), 9th Ed.
27. Marwan N, Romano MC, Thiel M, Kurths J (2007) Recurrence plots for the analysis of complex systems. Phys Rep 438:237–329.
