arXiv:1803.04225v1 [cs.SI] 12 Mar 2018


Analyzing the network structure and gender differences among the members of the Networked Knowledge Organization Systems (NKOS) community

Fariba Karimi, Philipp Mayr and Fakhri Momeni

GESIS – Leibniz Institute for the Social Sciences, Unter Sachsenhausen 6-8, 50667 Cologne, Germany

[email protected]

Abstract. In this paper, we analyze a major part of the research output of the Networked Knowledge Organization Systems (NKOS) community in the period 2000 to 2016 from a network analytical perspective. We focus on the papers presented at the European and U.S. NKOS workshops and, in addition, on four special issues on NKOS published in the last 16 years. For this purpose, we have generated an open dataset, the “NKOS bibliography”, which covers the bibliographic information of this research output. We analyze the co-authorship network of this community, which comprises 123 papers by 256 distinct authors. We use standard network analytic measures such as degree, betweenness and closeness centrality to describe the co-authorship network of the NKOS dataset. First, we investigate global properties of the network over time. Second, we analyze the centrality of the authors in the NKOS network. Lastly, we investigate gender differences in collaboration behavior in this community. Our results show that, apart from differences in the scholars' centrality measures, scholars have a higher tendency to collaborate with those in the same institution or in geographic proximity. We also find that homophily is higher among women in this community. Apart from small differences in closeness and clustering between men and women, we do not find any significant dissimilarities with respect to other centralities.

Keywords: NKOS workshops, Network analysis, Co-authorship networks, gender, homophily

1 Introduction

The Networked Knowledge Organization Systems (NKOS)¹ community in Europe and in the United States of America has held a long-running series of annual workshops at the European Conference on Digital Libraries (ECDL), later renamed the International Conference on Theory and Practice of Digital Libraries (TPDL), at the Joint Conference on Digital Libraries (JCDL) and at some other scattered events. The U.S. NKOS workshops started in 1997/1998, organized by Linda Hill, Gail Hodge, Ron Davies and others. Slightly later, the first European NKOS workshop was organized at ECDL 2000 in Lisbon (Portugal) by Martin Doerr, Traugott Koch, Douglas Tudhope and Repke de Vries.

¹ For an introduction to KOS and NKOS and recent applications, see [8,14].

Recent advances in Knowledge Organization Systems (KOS) have typically been reported at the annual NKOS workshops, including the Simple Knowledge Organization System (SKOS) W3C standard, the ISO 25964 thesauri standard, the CIDOC Conceptual Reference Model (CRM), Linked Data applications, KOS-based recommender systems, KOS mapping techniques, KOS registries and metadata, social tagging, user-centered issues, and many other topics². Special issues on Networked Knowledge Organization Systems have been published in the Journal of Digital Information in 2001 [8] and 2004 [24], in the New Review of Hypermedia and Multimedia in 2006 [25] and, most recently, in the International Journal on Digital Libraries in 2016 [14]. NKOS workshop activity has recently accelerated again, e.g. with two European NKOS workshops in 2016 at TPDL and the Dublin Core conference, and a revival of the U.S. NKOS activities in 2017. In addition, the last two NKOS workshops at TPDL have resulted in formal conference proceedings published as CEUR Workshop Proceedings [15,16].

The motivation of this paper is to analyze and visualize the collaboration network of the NKOS community. We focus here on the informal part of this output: the paper presentations given at past NKOS workshops. What distinguishes this output is that these research papers are typically not published in journals or conference proceedings; they appear only as oral presentations at the workshop and are documented on the corresponding websites. To cover this informal research output, we have collected presentation information from the workshop agendas. To analyze the co-authorship network of this community, we restrict our analysis to papers with at least two authors. This results in 123 papers by 256 distinct authors. It is important to note that practices at the NKOS workshops in the United States and Europe differ. In the United States, NKOS workshops were previously organized by inviting speakers rather than through an open call for papers. This practice explains the relatively low share of co-authored papers in the U.S. workshop series. In Europe, by contrast, the NKOS workshops have from the beginning been based on an open call for papers and subsequent peer review of the submitted paper abstracts.

In the following, we report on the network structure and gender differences among the members of the NKOS community as reconstructed from the past European and U.S. workshop agendas and the published special issues.

This paper is largely an extended version of the paper “Analyzing the research output presented at European Networked Knowledge Organization Systems workshops (2000-2015)” [18], presented at the 15th NKOS workshop at TPDL 2016. In [18], we focused on the European workshops and special issues. Since then, we have extended the dataset to include the U.S. NKOS workshops and some other scattered NKOS events, so this paper gives a more comprehensive overview of the international NKOS research community. To the best of our knowledge, this paper is the first attempt to analyze the co-authorship network of NKOS in detail.

² Comprehensive review articles on KOS and NKOS topics have been published in [26,9].

In the following sections, we describe the underlying dataset (Section 2), perform the network analysis (Section 3), highlight some results of our analysis (Section 4) and conclude the paper (Section 5).

2 NKOS workshop bibliography dataset

For our analysis, we have compiled an open dataset derived from the “NKOS bibliography”³. The NKOS bibliography was started in 2016 [18] and covers bibliographic information on all research papers presented at past NKOS workshops. Editing and organizing activities at the workshops (including the introductions) are not covered in our dataset. We have added the journal papers published in four special issues on NKOS that were edited by members of the NKOS community in the same period; these journal papers are the only formal publications in our analysis. Finally, we manually disambiguated the author names of all papers. The bibliography is stored in separate BibTeX files (one BibTeX file per venue).

To date, the NKOS bibliography covers:

– sixteen European NKOS workshops from 2000 to 2016, i.e. 16 workshop agendas: ECDL 2000, 2003-2010, TPDL 2011-2016, Dublin Core 2016,

– eight U.S. NKOS workshop agendas: JCDL 2000-2003, 2005 and NKOS-CENDI 2008-2009, 2012,

– four special issues on NKOS [8,24,25,14] and
– two scattered NKOS workshops at ISKO-UK 2011 and ICADL 2015.

For the analysis in this paper, we have compiled all research presentations at NKOS workshops and all papers published in the special issues. We restrict our analysis to papers with at least two authors. This restriction reduces the dataset; e.g. the ECDL NKOS workshop of 2000 is missing from Table 1 because all of its papers were single-author papers. In total, this results in a dataset of 123 papers by 256 distinct authors (see Table 1)⁴.
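The selection rule (keep only papers with at least two authors, then count the distinct authors) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: each BibTeX entry is assumed to have already been parsed into a list of disambiguated author names (parsing is not shown), the helper name `multi_author_subset` is hypothetical, and the toy records are invented.

```python
def multi_author_subset(papers: list[list[str]]) -> tuple[list[list[str]], set[str]]:
    """Keep papers with at least two distinct authors; collect those authors.

    Hypothetical helper: `papers` holds one list of (already disambiguated)
    author names per bibliography entry.
    """
    kept: list[list[str]] = []
    authors: set[str] = set()
    for author_list in papers:
        unique = list(dict.fromkeys(author_list))  # de-duplicate, keep order
        if len(unique) >= 2:
            kept.append(unique)
            authors.update(unique)
    return kept, authors


# Invented toy records: one author list per paper.
papers = [
    ["Author A"],                          # single-author paper: excluded
    ["Author A", "Author B"],
    ["Author B", "Author C", "Author D"],
]
kept, authors = multi_author_subset(papers)
print(len(kept), len(authors))  # 2 4
```

Note that an author who only ever wrote single-author papers drops out of the author count entirely, which is why the restriction shrinks both columns of Table 1.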

3 Network analysis of the NKOS community

In order to analyze the collaboration of the NKOS community, we build a network of all authors at the workshops and special issues and compute various centrality measures for each author.

Year  Papers  Authors  Links  Avg. clustering
2001  4       9        6      0.37
2002  3       10       13     0.8
2003  5       12       9      0.4
2004  13      39       47     0.65
2005  7       22       26     0.81
2006  11      33       39     0.73
2007  4       15       24     1.0
2008  7       15       9      0.2
2009  10      34       60     0.68
2010  8       21       19     0.61
2011  8       32       59     0.80
2012  6       26       56     0.92
2013  5       18       31     0.86
2014  6       16       13     0.85
2015  9       24       23     0.58
2016  17      60       114    0.75

Table 1: Overview of all NKOS papers by year. In many years, the community shows a high average clustering coefficient, indicating that there are many triangles in the network.

³ The NKOS workshop bibliography is maintained in the following GitHub repository: https://github.com/PhilippMayr/NKOS-bibliography.
⁴ The data for this subset is available at https://github.com/PhilippMayr/NKOS-bibliography/tree/master/publications/ijdl17

A link in this network connects two authors who wrote a paper together. Therefore, if we have $n_p$ papers and paper $i$ has $m_i$ authors, the total number of author pairs (links) $E$, counting a pair once per joint paper, is

$$E = \sum_{i=1}^{n_p} \frac{m_i (m_i - 1)}{2} \quad \text{if } m_i \geq 1 \tag{1}$$

If two authors have published more than one paper together, we weight their link by the number of papers they co-authored. Thus, the resulting network is a weighted undirected graph.
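Equation (1) and the weighting rule can be illustrated with a short Python sketch. Assumptions: papers are given as lists of author names, `coauthorship_edges` and `total_pairs` are hypothetical helper names, and the three toy papers are invented. Each paper contributes one link per unordered author pair; a pair that co-authored k papers ends up with weight k, so the total weight equals E.

```python
from collections import Counter
from itertools import combinations


def coauthorship_edges(papers: list[list[str]]) -> Counter:
    """Weighted undirected co-authorship links: (u, v) -> number of joint papers."""
    weights: Counter = Counter()
    for authors in papers:
        # sorted() gives each unordered pair one canonical key (u, v) with u < v
        for u, v in combinations(sorted(set(authors)), 2):
            weights[(u, v)] += 1
    return weights


def total_pairs(papers: list[list[str]]) -> int:
    """Equation (1): E = sum over papers of m_i (m_i - 1) / 2."""
    return sum(len(set(a)) * (len(set(a)) - 1) // 2 for a in papers)


papers = [["A", "B"], ["A", "B", "C"], ["C", "D"]]
w = coauthorship_edges(papers)
print(total_pairs(papers))  # 5  (1 + 3 + 1 pairs, counted with multiplicity)
print(len(w))               # 4  distinct links
print(w[("A", "B")])        # 2  A and B co-authored two papers
print(sum(w.values()))      # 5  total weight equals E
```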

In this paper, we first investigate global properties of the network over time. Second, we analyze the centrality of the authors in this network. Lastly, we investigate gender differences in collaboration behavior in this community.

4 Results

Figure 1 shows the overall NKOS co-authorship network. In this view, each author has at least one co-author. Node color represents gender: purple for men and orange for women. This network contains 44 components. From this network, we selected the largest component, which is shown in Figure 3. It connects 107 authors (41% of all authors). The NKOS co-authorship network in the “NKOS bibliography” is a typical co-authorship network, with one relatively large component, some smaller components and many isolated pairs or triples of co-authors.
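Splitting the network into components and picking the largest one can be done with a breadth-first search. A minimal sketch, assuming the network is stored as a symmetric adjacency map from each author to the set of their co-authors (a hypothetical representation; the paper does not describe its tooling), with an invented toy graph:

```python
from collections import deque


def connected_components(adj: dict[str, set[str]]) -> list[set[str]]:
    """Breadth-first search over an undirected adjacency map; largest component first."""
    seen: set[str] = set()
    components: list[set[str]] = []
    for start in adj:
        if start in seen:
            continue
        comp = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    comp.add(nb)
                    queue.append(nb)
        components.append(comp)
    return sorted(components, key=len, reverse=True)


# Invented toy graph: a triangle plus a separate pair.
adj = {
    "A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B"},
    "D": {"E"}, "E": {"D"},
}
comps = connected_components(adj)
lcc = comps[0]
print(len(comps), len(lcc), round(len(lcc) / len(adj), 2))  # 2 3 0.6
```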


Figure 2 shows the degree distribution of this network. Although the network is rather small, its degree distribution follows a trend similar to the power-law degree distributions observed in other co-authorship networks [1,11].
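The paper does not state how the exponents in Figure 2 were estimated. One common choice, sketched below as an assumption rather than the authors' method, is the approximate discrete maximum-likelihood estimator of Clauset, Shalizi and Newman (2009), alpha ≈ 1 + n / Σ ln(k_i / (k_min − 1/2)), computed over all degrees k_i ≥ k_min. The function names and the toy degree list are hypothetical; on a network this small any such fit is rough.

```python
import math
from collections import Counter


def degree_distribution(degrees: list[int]) -> dict[int, float]:
    """Empirical p(k): share of nodes with degree k."""
    n = len(degrees)
    return {k: c / n for k, c in sorted(Counter(degrees).items())}


def powerlaw_alpha(degrees: list[int], k_min: int = 1) -> float:
    """Approximate discrete MLE of the power-law exponent (Clauset et al. 2009).

    k_min must be >= 1 so that k_min - 0.5 stays positive.
    """
    tail = [k for k in degrees if k >= k_min]
    return 1 + len(tail) / sum(math.log(k / (k_min - 0.5)) for k in tail)


degrees = [1, 1, 1, 2]  # invented toy data
print(degree_distribution(degrees))      # {1: 0.75, 2: 0.25}
print(round(powerlaw_alpha(degrees), 3))  # 2.154
```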

Fig. 1: Co-authorship network of the NKOS community. In general, the network is sparse and contains 44 disconnected components. The largest connected component (the cluster in the middle) contains 107 nodes. Nodes are colored by gender: purple nodes are men and orange nodes are women.

In Figure 3, which shows the largest connected component, we see that scientists tend to forge intra-institutional collaborations [6]. Good examples are the clusters around Johannes Keizer (FAO), Antoine Isaac (Vrije Universiteit Amsterdam/Europeana) and Philipp Mayr (GESIS): a large fraction of their co-authors are affiliated with the same institution. A tendency to select co-authors in geographic proximity is also visible in Figure 3; e.g. Douglas Tudhope (University of South Wales, UK) has a larger fraction of UK-affiliated co-authors.
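The institutional tendency is read off the drawing in Figure 3; it could also be quantified as the share of an author's co-authors who share that author's affiliation. A hypothetical sketch, assuming an affiliation lookup for every author (which the dataset does not necessarily contain) and using invented toy data:

```python
def same_attribute_share(adj: dict[str, set[str]], attr: dict[str, str], node: str) -> float:
    """Fraction of `node`'s co-authors whose attribute (e.g. affiliation) equals its own."""
    neighbors = adj[node]
    if not neighbors:
        return 0.0  # no co-authors: share undefined, report 0
    same = sum(1 for nb in neighbors if attr[nb] == attr[node])
    return same / len(neighbors)


# Invented toy data: author "A" with three co-authors, two at the same institution.
adj = {"A": {"B", "C", "D"}, "B": {"A"}, "C": {"A"}, "D": {"A"}}
affiliation = {"A": "Inst1", "B": "Inst1", "C": "Inst1", "D": "Inst2"}
print(round(same_attribute_share(adj, affiliation, "A"), 3))  # 0.667
```

The same function works for a coarser attribute such as country, which would correspond to the geographic-proximity observation.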

4.1 Node centralities

To assess the influence of authors on information exchange, we calculate three centrality measures for each author: degree centrality, betweenness centrality and closeness centrality. Here we focus only on the largest connected component (LCC) in order to obtain a robust comparison.


[Figure 2: p(k) versus degree (k) on logarithmic axes; fitted exponents: women 2.80, men 3.39]

Fig. 2: Degree distribution of the NKOS network. Blue and orange indicate the distributions for men and women, respectively. Although the network is small, it exhibits a power-law degree distribution.

Degree centrality is the most straightforward centrality measure: it captures the importance of a node by its total number of unique links. Authors with high degree centrality have collaborated with many different scholars.
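Because links are weighted by the number of joint papers, it is worth being explicit that degree counts unique co-authors and ignores the weights. A minimal sketch with a hypothetical function name and an invented weighted toy graph; the optional division by N − 1 is the usual normalization, while the raw count is what the degree axis in Figure 4(a) appears to show.

```python
def degree_centrality(adj: dict[str, dict[str, int]], normalized: bool = False) -> dict[str, float]:
    """Number of unique co-authors per author (link weights are ignored).

    With normalized=True, the count is divided by N - 1, the maximum possible degree.
    """
    n = len(adj)
    scale = 1 / (n - 1) if normalized and n > 1 else 1.0
    return {u: len(nbrs) * scale for u, nbrs in adj.items()}


# Invented weighted toy graph: A wrote two papers with B and one each with C and D.
adj = {
    "A": {"B": 2, "C": 1, "D": 1},
    "B": {"A": 2},
    "C": {"A": 1},
    "D": {"A": 1},
}
print(degree_centrality(adj)["A"])                   # 3.0
print(degree_centrality(adj, normalized=True)["B"])  # 0.3333333333333333
```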

Betweenness centrality measures the fraction of shortest paths between all pairs of nodes that pass through a given node. The betweenness of a node indicates its ability to funnel the flow in the network [20]. In this network, an author with high betweenness has a large influence on transferring information from one part of the network to another.
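Betweenness for all nodes is usually computed with Brandes' algorithm (2001), which runs one breadth-first search per source node and then accumulates path dependencies backwards. The sketch below treats the network as unweighted, matching the hop-count definition above; whether the authors used link weights is not stated, so that is an assumption, as are the function name and the toy graph. Normalizing by the (N − 1)(N − 2)/2 pairs that exclude the node yields fractions between 0 and 1.

```python
from collections import deque


def betweenness(adj: dict[str, set[str]], normalized: bool = True) -> dict[str, float]:
    """Brandes' algorithm for an unweighted, undirected graph."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Phase 1: BFS from s, counting shortest paths (sigma) and predecessors.
        order: list[str] = []
        preds: dict[str, list[str]] = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Phase 2: walk back from the farthest nodes, accumulating dependencies.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    n = len(adj)
    # Every undirected pair was counted once from each endpoint, hence the 1/2;
    # normalizing additionally divides by the (n - 1)(n - 2) / 2 pairs excluding v.
    if normalized and n > 2:
        scale = 1 / ((n - 1) * (n - 2))
    else:
        scale = 0.5
    return {v: c * scale for v, c in bc.items()}


# Invented toy graph: a path A - B - C; only B lies between two other nodes.
path = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
print(betweenness(path))  # {'A': 0.0, 'B': 1.0, 'C': 0.0}
```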

Closeness centrality indicates how close a scholar is to all others. Mathematically, it is the inverse of the average shortest-path distance from a node to all other nodes [7]. If $d(u, v)$ denotes the length of a shortest path between nodes $u$ and $v$, and $N$ denotes the total number of nodes in the graph, the closeness centrality of node $u$ is defined as follows:

$$c(u) = \frac{N - 1}{\sum_{v \neq u} d(u, v)} \tag{2}$$

where $N - 1$ in the numerator normalizes the measure so that it becomes independent of network size. Scholars with high closeness centrality are, on average, closer to other nodes in the network.
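Equation (2) translates directly into code: one breadth-first search gives the hop distances d(u, v) from u to every other node. A minimal sketch, assuming unweighted distances and a connected graph such as the LCC used in this section (on a disconnected graph the sum would silently skip unreachable nodes); the function name and toy graph are hypothetical.

```python
from collections import deque


def closeness(adj: dict[str, set[str]], u: str) -> float:
    """Equation (2): c(u) = (N - 1) / sum of shortest-path lengths from u.

    Assumes `adj` is connected (e.g. the largest connected component).
    """
    dist = {u: 0}
    queue = deque([u])
    while queue:  # BFS yields hop-count shortest paths in an unweighted graph
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    total = sum(dist.values())  # d(u, u) = 0 contributes nothing
    return (len(adj) - 1) / total if total > 0 else 0.0


path = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
print(closeness(path, "B"), round(closeness(path, "A"), 3))  # 1.0 0.667
```

On a connected graph, NetworkX's `closeness_centrality` returns the same values, which makes it a convenient cross-check.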

[Figure 3: network drawing of the largest connected component, with nodes labeled by author name]

Fig. 3: Largest component in the NKOS co-authorship network. The network is partitioned into 9 clusters using the Louvain clustering method [2]. Nodes are colored by cluster, and node size represents the node's degree. Clusters are arranged according to the location of the groups and the collaboration among their members. The majority of the scholars in the largest component are based in Europe.

[Figure 4: top 15 authors per measure; panels: (a) Degree, (b) Betweenness centrality, (c) Closeness centrality]

Fig. 4: Top 15 authors with the highest (a) degree centrality, (b) betweenness centrality and (c) closeness centrality.

Figure 4 compares centrality measures for the top 15 authors in the largest connected component. Interestingly, an author's centrality rank may vary with the type of centrality measure. For example, even though H. Manguinhas has a relatively high degree centrality, this author does not appear among the top authors by closeness or betweenness. A closer look at the author's position in Figure 3 shows that this author is embedded in the light green cluster, which has high clustering and few connections to other clusters.
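The Louvain method used for Figure 3 greedily moves nodes between communities, and then merges communities, to maximize modularity Q. The full algorithm is too long for a short example, but the objective it optimizes can be sketched: Q = Σ_c [L_c / m − (d_c / 2m)²], where m is the number of links, L_c the number of links inside community c and d_c the total degree of its nodes. Assumptions: unweighted links, a hypothetical function name, and an invented graph of two triangles joined by one link.

```python
def modularity(adj: dict[str, set[str]], community: dict[str, int]) -> float:
    """Newman-Girvan modularity of a node -> community assignment (unweighted)."""
    m = sum(len(nbrs) for nbrs in adj.values()) / 2  # number of links
    inside: dict[int, float] = {}      # L_c: links with both ends in community c
    degree_sum: dict[int, float] = {}  # d_c: total degree of community c
    for u, nbrs in adj.items():
        c = community[u]
        degree_sum[c] = degree_sum.get(c, 0) + len(nbrs)
        for v in nbrs:
            if community[v] == c:
                inside[c] = inside.get(c, 0) + 0.5  # each internal link is seen twice
    return sum(inside.get(c, 0) / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Invented toy graph: triangles A-B-C and D-E-F joined by the link C-D.
adj = {
    "A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"},
    "D": {"C", "E", "F"}, "E": {"D", "F"}, "F": {"D", "E"},
}
split = {"A": 0, "B": 0, "C": 0, "D": 1, "E": 1, "F": 1}
print(round(modularity(adj, split), 4))  # 0.3571
```

Putting every node into one community gives Q = 0, so the split into the two triangles is clearly preferred. Recent NetworkX versions provide `community.louvain_communities` and `community.modularity` for the full pipeline.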

Comparing closeness centrality and betweenness centrality also yields interesting results. Although some authors are close to many other scholars, they may not have a high betweenness centrality. For example, K. Golub has a relatively high closeness centrality because she is connected to many authors from different clusters. However, she does not have a relatively high betweenness centrality, because her network position does not connect other, more distant clusters. In contrast, A. Slavic has neither a high degree nor a high closeness centrality, but this author has a high betweenness centrality because of connecting an almost isolated red cluster to the rest of the network. The same holds for T. Koch. It is important to note that while scholars with higher closeness centrality are on average closer to other scholars and can thus access novel ideas more frequently, authors with high betweenness centrality play a crucial role in transferring knowledge within the community [10].

4.2 Structural holes and bridges

Weak ties play a crucial role in networks by connecting otherwise disconnected clusters and acting as bridges. The idea of structural holes, introduced by the sociologist Ronald Burt, suggests that nodes can act as mediators between two or more closely connected clusters. This is particularly important because novel ideas or information must pass through these gatekeepers to reach other parts of the network. Here, we measure the effective size of a node, which is based on the concept of redundancy: a person's ego network is redundant to the extent that her neighbors are also connected to each other. In a simple graph, the effective size $e(u)$ of a node $u$ can be expressed as:

$$e(u) = n - \frac{2t}{n} \tag{3}$$

where $t$ is the total number of ties in the egocentric network (excluding the ties to the ego) and $n$ is the total number of nodes in the egocentric network (excluding the ego). The effective size ranges from 1 to the total number of links of the ego [3]. The higher the effective size, the more effective a node is as a bridge.
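Equation (3) can be computed from the adjacency structure alone. A minimal sketch for an unweighted simple graph, with a hypothetical function name and invented toy graphs; the two examples show the extremes of the range stated above: the centre of a star, whose alters are unconnected, reaches n, while a node whose alters are all connected to each other drops to 1.

```python
def effective_size(adj: dict[str, set[str]], u: str) -> float:
    """Burt's effective size in a simple graph, Equation (3): e(u) = n - 2t / n.

    n: number of alters (u's neighbours); t: number of ties among the alters.
    """
    alters = adj[u]
    n = len(alters)
    if n == 0:
        return 0.0  # isolated node: no ego network to measure
    # Each alter-alter tie is counted from both of its ends, hence // 2.
    t = sum(len(adj[a] & alters) for a in alters) // 2
    return n - 2 * t / n


star = {"A": {"B", "C", "D"}, "B": {"A"}, "C": {"A"}, "D": {"A"}}
clique = {"A": {"B", "C", "D"}, "B": {"A", "C", "D"}, "C": {"A", "B", "D"}, "D": {"A", "B", "C"}}
print(effective_size(star, "A"), effective_size(clique, "A"))  # 3.0 1.0
```

NetworkX's `effective_size` uses this simplified formula (attributed to Borgatti) for unweighted, undirected graphs, so it can serve as a cross-check.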

Figure 5 displays the top 15 authors ranked by effective size. The ranking suggests that in this community, nodes with high degree (hubs) also act as bridges between clusters and can thus transfer novel ideas among their peers.

4.3 Gender differences in the co-authorship network

To infer the gender of the scholars, we use a state-of-the-art approach that combines first-name-based inference with Google Images results for the scholars' full names [13]. For the remaining unidentified names, or names given only as initials, we manually checked the authors' online profiles, found via the titles of their papers. Our complete network consists of 97 (38%) women, 157 (62%) men and 2 unidentified names. Compared to other scientific communities, in particular in science and engineering fields, this community shows a higher percentage of active women [11]. The share of women and men in the largest connected component is also notable: we find 46 women and 59 men in the LCC, which means that women make up 43% of the nodes in this component.

[Figure 5: top 15 scholars ranked by effective size; x-axis: Effective size]

Fig. 5: Top 15 scholars with the highest effective size. The effective size indicates the ability of a node to connect otherwise disconnected nodes, and hence its ability to act as a weak tie or bridge.

Homophily. In the first step, we measure homophily in this network. There are various ways to define homophily; here we use two well-defined measures. The first, proposed by Newman, computes the Pearson correlation between the attributes of connected nodes, corrected for what we would expect from the nodes' degrees [19]. It ranges from -1 (disassortativity) to +1 (complete assortativity). We find that the gender assortativity in this community is 0.1. This means that scholars in this community have a positive tendency to collaborate with others of the same gender, as can also be seen in Figure 1.
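For a categorical attribute such as gender, Newman's coefficient can be computed from the mixing matrix e, where e_ij is the fraction of link ends that join a node of type i to a node of type j: r = (Σ_i e_ii − Σ_i a_i²) / (1 − Σ_i a_i²), with a_i the row sums of e (in an undirected network the row and column sums coincide). A minimal sketch with a hypothetical function name and invented toy graphs; the coefficient is undefined when every node has the same label.

```python
def attribute_assortativity(adj: dict[str, set[str]], attr: dict[str, str]) -> float:
    """Newman's assortativity coefficient for a categorical node attribute."""
    counts: dict[tuple[str, str], int] = {}
    total = 0
    for u, nbrs in adj.items():
        for v in nbrs:  # each undirected link is visited from both ends
            key = (attr[u], attr[v])
            counts[key] = counts.get(key, 0) + 1
            total += 1
    labels = {label for pair in counts for label in pair}
    e = {key: c / total for key, c in counts.items()}
    a = {i: sum(e.get((i, j), 0.0) for j in labels) for i in labels}  # row sums
    trace = sum(e.get((i, i), 0.0) for i in labels)
    expected = sum(a[i] ** 2 for i in labels)  # same-type share under random mixing
    return (trace - expected) / (1 - expected)


gender = {"a": "F", "b": "F", "c": "M", "d": "M"}
same = {"a": {"b"}, "b": {"a"}, "c": {"d"}, "d": {"c"}}   # only F-F and M-M links
mixed = {"a": {"c"}, "c": {"a"}, "b": {"d"}, "d": {"b"}}  # only F-M links
print(attribute_assortativity(same, gender), attribute_assortativity(mixed, gender))  # 1.0 -1.0
```

NetworkX's `attribute_assortativity_coefficient(G, attribute)` computes the same quantity on a graph whose nodes carry the attribute.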

[Figure 6: box plots for women (F) and men (M) in the largest connected component; panels: degree, clustering, betweenness, closeness, effective size, strength]

Fig. 6: Box plots indicating the median and quartiles of network properties for male and female scholars in the largest connected component. The median is similar for most node characteristics, except for closeness centrality, which is higher for men. With regard to degree centrality, there are more high-degree outliers among men. For clustering, women have higher clustering on average than men. Men also show outliers with higher effective size and strength than women.

Although the assortativity measure captures the overall homophily in the network, it does not reveal whether homophily is symmetric or asymmetric between the groups. Indeed, we have shown previously that asymmetric homophily can affect the degree centrality of nodes, in particular of the minority group in a network [12]. To capture the asymmetric nature of homophily, we take a simple approach first proposed by Coleman (1958), in which we measure the probability of links between two scholars of the same gender. Let us denote the probability of links among women as $p_{ww}$ and among men as $p_{mm}$. To compare groups of different sizes, these probabilities are compared with the group sizes and normalized by their maximum values. If the fraction of women is denoted by $f_w$ and that of men by $f_m$, the Coleman index for women is:

$$C_w = \frac{p_{ww} - f_w}{1 - f_w} \tag{4}$$

A similar definition applies to men. The maximum value of the Coleman homophily index is 1. Applying this index to our network, we obtain $C_w = -0.12$ for women and $C_m = -0.42$ for men. These results suggest that homophily among women is higher than homophily among men in this network. Similar findings have been reported for other co-authorship networks [11].
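The text describes $p_{ww}$ only briefly, so the sketch below makes one reading explicit as an assumption: p_gg is the share of link ends from group-g members that land on group-g members, and f_g is the share of group-g nodes. It implements the one-sided form of Equation (4) exactly as written; the function name and toy graph are hypothetical.

```python
def coleman_index(adj: dict[str, set[str]], attr: dict[str, str], group: str) -> float:
    """Coleman-style homophily index, Equation (4): C_g = (p_gg - f_g) / (1 - f_g).

    Assumed reading: p_gg = share of group members' links that go to the same
    group; f_g = share of all nodes that belong to the group.
    """
    members = [u for u in adj if attr[u] == group]
    f_g = len(members) / len(adj)
    link_ends = sum(len(adj[u]) for u in members)
    same = sum(1 for u in members for v in adj[u] if attr[v] == group)
    p_gg = same / link_ends
    return (p_gg - f_g) / (1 - f_g)


# Invented toy graph: women a, b; men c, d, e; links a-b, a-c, c-d, d-e.
gender = {"a": "F", "b": "F", "c": "M", "d": "M", "e": "M"}
adj = {"a": {"b", "c"}, "b": {"a"}, "c": {"a", "d"}, "d": {"c", "e"}, "e": {"d"}}
print(round(coleman_index(adj, gender, "F"), 3), round(coleman_index(adj, gender, "M"), 3))  # 0.444 0.5
```

Because each group is compared with its own baseline share f_g, the two indices can differ even though both groups see the same cross-gender link, which is the asymmetry the index is meant to expose.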

Network characteristics and gender differences. Next, we compare the network characteristics of men and women in the largest connected component. We use six network measures: degree, clustering, betweenness, closeness and effective size, as in the previous sections, plus node strength, defined as the sum of the weights of all of a node's links.

Figure 6 shows box plots comparing network measures for men and women. Overall, the median and quartiles for degree and betweenness are the same for men and women. Women tend to have higher clustering than men, whereas men show a higher median closeness centrality than women. In addition, there are more outliers among men than among women in terms of degree, effective size and strength.
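The per-gender comparison in Figure 6 boils down to computing a node measure and summarizing it by group. A minimal sketch for two of the measures, assuming a weighted adjacency map (author to co-author to number of joint papers): unweighted local clustering (whether the authors used a weighted variant is not stated) and node strength, followed by per-group medians. Function names and toy data are hypothetical.

```python
from statistics import median


def local_clustering(adj: dict[str, dict[str, int]], u: str) -> float:
    """Share of pairs of u's co-authors who are themselves linked (weights ignored)."""
    nbrs = set(adj[u])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a in nbrs for b in adj[a] if b in nbrs) / 2  # each link seen twice
    return 2 * links / (k * (k - 1))


def strength(adj: dict[str, dict[str, int]], u: str) -> int:
    """Node strength: sum of the weights of u's links."""
    return sum(adj[u].values())


def median_by_group(values: dict[str, float], attr: dict[str, str]) -> dict[str, float]:
    """Median of a node measure within each attribute group."""
    groups: dict[str, list[float]] = {}
    for node, value in values.items():
        groups.setdefault(attr[node], []).append(value)
    return {g: median(vals) for g, vals in groups.items()}


# Invented toy data: triangle a-b-c (a and b share two papers) plus a pendant d.
adj = {
    "a": {"b": 2, "c": 1},
    "b": {"a": 2, "c": 1},
    "c": {"a": 1, "b": 1, "d": 1},
    "d": {"c": 1},
}
gender = {"a": "F", "b": "M", "c": "F", "d": "M"}
clustering = {u: local_clustering(adj, u) for u in adj}
print(median_by_group(clustering, gender))
print({u: strength(adj, u) for u in adj})  # {'a': 3, 'b': 3, 'c': 3, 'd': 1}
```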

5 Conclusion

In this paper, we have analyzed the collaborative research of authors and their connectivity for the special case of NKOS workshop activities, including four special issues on NKOS. The results highlight the most active and central scholars in this community. We found differences among the scholars' centrality measures, which indicate that scholars play different roles in their collaboration network. We also identified the most influential scholars who act as bridges between the clusters. The 9 clusters we found in the largest component show that scholars have a higher tendency to collaborate with those in the same institution or in geographic proximity [6]. Our analyses show that the NKOS community has been rather successful in recent years in bringing together researchers from different domains.

Women make up 38% of the NKOS co-authorship network as a whole and 43% of its largest connected component. The network shows positive gender homophily, and homophily among women is higher than among men. We found that, on average, men have higher closeness centrality than women, while women have slightly higher clustering than men. Apart from these differences, we do not find any significant dissimilarities between men and women with respect to their centralities.

This study has some limitations. First, we have included only research paper presentations. Editing and organizing activities at the workshops, which have a large impact on the visibility and connectivity of researchers, are not covered in our dataset. This leads to artifacts; e.g. Traugott Koch⁵, a long-term organizer of the NKOS workshops and editor of the early JoDI special issues on NKOS, is underrepresented in our dataset and the network.

Second, many influential papers (e.g. [9,26]) and standardization activities (e.g. the W3C Recommendation for SKOS [17]) that were presented and discussed at NKOS events and published after the NKOS workshops are missing. This reduces the representativeness and completeness of the network.

Third, we have not included bibliometric data in our analysis, because most NKOS workshop activities (presentations) are not formally cited or even mentioned in scientific papers. Unlike the workshop output, the few journal papers in the special issues on NKOS are cited, and some of them (e.g. [4,5,23,21,22]) are well cited in the literature. Adding citation data would therefore be a reasonable next step to complete the dataset.

⁵ Traugott Koch was a central protagonist and networker of the U.S. and European NKOS community. He retired and left the NKOS community in 2012.


6 Future work

We are planning to extend the analysis of the NKOS network. First, we plan to complement the dataset with other NKOS research output. We also plan to analyze the development of topics in the titles and abstracts of the presentations and papers. Combining network analytic measures with bibliometric analysis (e.g. co-citation, bibliographic coupling) would complement our preliminary observations and advance our understanding of the role of gender and other attributes in scientific collaboration. We invite others to contribute to our open dataset.
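The two bibliometric relations named above are straightforward to compute once citation data are available: bibliographic coupling links two citing papers by the number of references they share, and co-citation links two cited works by the number of papers that cite both. A minimal sketch, assuming a hypothetical mapping from each paper to the set of works it cites (no such citation data are part of the current dataset, and all identifiers below are invented); the resulting weighted pairs could then be analyzed with the same network measures as the co-authorship graph.

```python
from collections import Counter
from itertools import combinations

# Hypothetical citation data: citing paper -> set of works it cites.
REFERENCES = {
    "p1": {"r1", "r2", "r3"},
    "p2": {"r2", "r3", "r4"},
    "p3": {"r3", "r5"},
}


def bibliographic_coupling(refs):
    """Coupling strength of two citing papers = number of shared references."""
    return {
        (a, b): len(refs[a] & refs[b])
        for a, b in combinations(sorted(refs), 2)
        if refs[a] & refs[b]
    }


def co_citation(refs):
    """Co-citation count of two cited works = number of papers citing both."""
    counts = Counter()
    for cited in refs.values():
        for pair in combinations(sorted(cited), 2):
            counts[pair] += 1
    return dict(counts)


if __name__ == "__main__":
    print(bibliographic_coupling(REFERENCES))
    print(co_citation(REFERENCES))
```

Here p1 and p2 are coupled with strength 2 (they share r2 and r3), and r2 and r3 are co-cited twice; all other pairs have weight 1 or are absent.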

7 Acknowledgment

We thank our colleague Marcia Lei Zeng (Kent State University), who provided us with internal information about the U.S. NKOS workshops. This work was partly funded by DFG, grant no. SU 647/19-1; the "Opening Scholarly Communication in the Social Sciences" (OSCOSS) project at GESIS.

References

1. Barabási, A.L.: Scale-free networks: a decade and beyond. Science 325(5939), 412–413 (2009)

2. Blondel, V.D., Guillaume, J.L., Lambiotte, R., Lefebvre, E.: Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment 2008(10), P10008 (2008)

3. Burt, R.S.: Structural holes and good ideas. American Journal of Sociology 110(2), 349–399 (2004)

4. Cranefield, S.: Networked knowledge representation and exchange using UML and RDF. Journal of Digital Information (2001), https://journals.tdl.org/jodi/index.php/jodi/article/view/30

5. Doerr, M.: Semantic problems of thesaurus mapping. Journal of Digital Information (2001), https://journals.tdl.org/jodi/index.php/jodi/article/view/31

6. Evans, T.S., Lambiotte, R., Panzarasa, P.: Community structure and patterns of scientific collaboration in Business and Management. Scientometrics 89(1), 381–396 (Oct 2011), http://link.springer.com/10.1007/s11192-011-0439-1

7. Freeman, L.C.: Centrality in social networks: conceptual clarification. Social Networks 1(3), 215–239 (1978)

8. Hill, L., Koch, T.: Networked Knowledge Organization Systems: introduction to a special issue. Journal of Digital Information 1(8) (2001), https://journals.tdl.org/jodi/index.php/jodi/article/view/32/33

9. Hodge, G.: Systems of Knowledge Organization for Digital Libraries: Beyond Traditional Authority Files (2000), https://www.clir.org/pubs/reports/pub91/pub91.pdf

10. Iyer, S., Killingback, T., Sundaram, B., Wang, Z.: Attack robustness and centrality of complex networks. PLoS ONE 8(4), e59613 (2013)

11. Jadidi, M., Karimi, F., Wagner, C.: Gender disparities in science? Dropout, productivity, collaborations and success of male and female computer scientists. arXiv preprint arXiv:1704.05801 (2017)

12. Karimi, F., Genois, M., Wagner, C., Singer, P., Strohmaier, M.: Visibility of minorities in social networks. arXiv preprint arXiv:1702.00150 (2017)

13. Karimi, F., Wagner, C., Lemmerich, F., Jadidi, M., Strohmaier, M.: Inferring gender from names on the web: A comparative evaluation of gender detection methods. In: Proceedings of the 25th International Conference Companion on World Wide Web. pp. 53–54. International World Wide Web Conferences Steering Committee (2016)

14. Mayr, P., Tudhope, D., Clarke, S.D., Zeng, M.L., Lin, X.: Recent applications of Knowledge Organization Systems: introduction to a special issue. International Journal on Digital Libraries 17(1), 1–4 (2016), http://link.springer.com/10.1007/s00799-015-0167-x

15. Mayr, P., Tudhope, D., Golub, K., Wartena, C., De Luca, E.W.: Proceedings of the 15th European Networked Knowledge Organization Systems (NKOS) Workshop. CEUR-WS.org (2016), http://ceur-ws.org/Vol-1676/

16. Mayr, P., Tudhope, D., Golub, K., Wartena, C., De Luca, E.W.: Proceedings of the 17th European Networked Knowledge Organization Systems (NKOS) Workshop. CEUR-WS.org (2017), http://ceur-ws.org/Vol-1937/

17. Miles, A., Bechhofer, S.: SKOS Simple Knowledge Organization System Reference (2009), https://www.w3.org/TR/skos-reference/

18. Momeni, F., Mayr, P.: Analyzing the research output presented at European Networked Knowledge Organization Systems workshops (2000-2015). In: Proc. of the 15th European Networked Knowledge Organization Systems Workshop (NKOS 2016). pp. 7–14. CEUR-WS.org, Hannover, Germany (2016), http://ceur-ws.org/Vol-1676/paper1.pdf

19. Newman, M.E.: Assortative mixing in networks. Physical Review Letters 89(20), 208701 (2002)

20. Opsahl, T., Agneessens, F., Skvoretz, J.: Node centrality in weighted networks: Generalizing degree and shortest paths. Social Networks 32(3), 245–251 (2010), http://dx.doi.org/10.1016/j.socnet.2010.03.006

21. Soergel, D., Lauser, B., Liang, A., Fisseha, F., Keizer, J., Katz, S.: Reengineering thesauri for new applications: the AGROVOC example. Journal of Digital Information (2004), https://journals.tdl.org/jodi/index.php/jodi/article/view/112

22. Trant, J., with the participants in the steve.museum project: Exploring the potential for social tagging and folksonomy in art museums: Proof of concept. New Review of Hypermedia and Multimedia (2006), http://www.tandfonline.com/doi/abs/10.1080/13614560600802940

23. Tudhope, D., Alani, H., Jones, C.: Augmenting thesaurus relationships: possibilities for retrieval. Journal of Digital Information (2001), https://journals.tdl.org/jodi/index.php/jodi/article/view/181/160

24. Tudhope, D., Koch, T.: New Applications of Knowledge Organization Systems: introduction to a special issue. Journal of Digital Information 4(4) (2004), https://journals.tdl.org/jodi/index.php/jodi/article/view/109/108

25. Tudhope, D., Lykke Nielsen, M.: Introduction to Knowledge Organization Systems and Services. New Review of Hypermedia and Multimedia 12(1), 3–9 (2006)

26. Zeng, M.L., Chan, L.M.: Trends and Issues in Establishing Interoperability Among Knowledge Organization Systems. Journal of the American Society for Information Science and Technology 55(3), 377–395 (2004)