Research article
Measuring interdisciplinary research: analysis of co-authorship for research staff at the University of York
Leana Bellanca*
Department of Biology, University of York, Heslington, York, North Yorkshire YO10 5YW, UK.
* Corresponding author: Email: [email protected]
Supervisors: Dr Daniel Franks and Dr Leo Caves, York Centre for Complex Systems Analysis, Department of Biology, PO Box 373, University of York, Heslington, York, North Yorkshire YO10 5YW, UK.
Collaboration allows researchers to combine the strength of different disciplines to undertake research that neither could do individually.
Scientific collaboration can be examined by analysing patterns of co-authorship of papers in publication databases (e.g. Web of Science)
using methods from Social Network Analysis. In this project, I describe three networks consisting of researchers in the Biology and
Chemistry Departments at the University of York to investigate degree, degree distribution, key brokers and preference of researchers
for collaborating within or outside their own research field. Clustering (or transitivity) was used to describe whether collaboration is more
likely if two researchers have a collaborator in common. To provide a control and assess the significance of the results, a
network consisting of 98 researchers from the Chemistry and Biology departments was produced and compared with a distribution
of 1000 Erdős–Rényi (ER) random graphs for degree, transitivity and betweenness. We find that researchers in the Department of Biology (50 researchers)
have fewer collaborations with their departmental colleagues than those in the Department of Chemistry (45 researchers): the
average number of links each researcher had with others in the Biology collaboration network was 2.6, whereas the corresponding value for
Chemistry was 4.8 links per researcher. We also find that researchers within the Chemistry department were more likely than their colleagues
in Biology to collaborate with another researcher if they had a collaborator in common. One aim of the study was to characterize
the extent of interdisciplinary research within the Department of Biology. Staff in the Biology department were categorized into distinct
research foci, indicating the discipline of the researcher. There were many links from the Bioinformatics and Mathematics, and Biophysics
and Biochemistry foci, to other foci, implying that staff within these foci were interdisciplinary in their research, which is indicative of their role in
providing techniques or tools that are applicable across discipline boundaries. This sort of analysis provides quantitative evidence to
understand the social patterns of scientific collaboration and may be a useful tool in the development of strategies to promote
interdisciplinary research within research institutions.
Key words: collaboration, social network analysis, degree, random graphs, scientific research.
Introduction
Scientific researchers aim to investigate, interpret and revise
knowledge of the world. A group of scientists who work
together may form a social network.1 Collaboration may
be utilized to produce a funding proposal, co-author a scientific
paper or share ideas through informal discussion.
Collaboration is often used to undertake interdisciplinary
research. Interdisciplinary research aims to combine the
strengths of several disciplines to create a new discipline,
allowing researchers to undertake research that no single
discipline could do independently. Funding bodies sometimes consider
grant proposals from groups of researchers whose research
incorporates an interdisciplinary approach to solve large
complex problems. Interdisciplinary research bridges gaps
in terminology, approach and methodology, generating
new approaches to thinking.2
Bioscience Horizons, Volume 2, Number 2, June 2009. doi: 10.1093/biohorizons/hzp012
© 2009 The Author(s). This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License
(http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any
medium, provided the original work is properly cited.
Interdisciplinary Research
Collaboration allows scientists to pool resources to share
the cost of expensive scientific equipment and trained
specialists. The knowledge-based3 view suggests that financial
resourcing affects how scientific collaborations form.
An organization is likely to develop resources such as exper-
tise and purchase of expensive equipment if the expertise is
used regularly. In comparison, if expertise is expensive to
develop internally, collaboration with another organization
may be beneficial. However, co-ordination costs involved
in multi-centre collaborations between scientists, such as
travel costs to attend meetings with overseas collaborators,
may present a barrier to interdisciplinary research.
Frequent communication between collaborators is often
associated with greater trust, increased output (i.e. scientific
publications) and greater value for money.3 Sharing
resources such as a common website or database between
collaborators may spread the cost of data handling and communication
for each investigator and potentially result in
improved, systematic methods and standardized measurements.
Collaboration must occur within a work and
reward structure largely focused on individual achievement
and reputation. The number of publications may be a
factor when considering a researcher for promotion.4
Network Theory
A pair of researchers has a relationship if they have collabo-
rated to publish a scientific paper together.5 A graph can be
produced to describe collaborations between many pairs of
researchers using network theory and bibliometrics.
Bibliometrics is the analysis of scientific publication
records. Each researcher in a graph is a node connected by
an edge or link. Each edge can be weighted to indicate the
number of times a researcher pair has collaborated together.
A node can be characterized by its degree (k) and attribute6
with hubs being nodes of high degree. The degree is
the number of links the node has to other nodes, whereas
the attribute is an intrinsic characteristic of the node, such
as a researcher’s research interest. The degree distribution
is the number of nodes with each particular degree. A node
can also be characterized by its betweenness.
Betweenness measures the number of times a researcher is
an intermediary in the path between two researchers.7 A
node with high betweenness may identify a researcher as a
broker, able to initiate or hinder communication flow
through a network.8 Path length is the number of edges
between two researchers.9 Links can be directed or undirected.
For example, in a directed network, node X is
parent to node Y. In an undirected graph, each
node/collaborator is perceived as an
equal contributor to the relationship.10
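The definitions above can be illustrated with a minimal Python sketch. The edge list, the researcher labels and the function names are hypothetical and are not taken from the study; the sketch only shows how degree, degree distribution and path length follow from an undirected, weighted edge list.

```python
from collections import Counter, deque

# Hypothetical weighted co-authorship edges: (researcher, researcher, joint papers).
edges = [("A", "B", 3), ("A", "C", 1), ("B", "C", 2), ("C", "D", 1)]

def build_graph(edge_list):
    """Return an undirected adjacency map {node: {neighbour: weight}}."""
    graph = {}
    for u, v, w in edge_list:
        graph.setdefault(u, {})[v] = w
        graph.setdefault(v, {})[u] = w
    return graph

def degree(graph):
    """Degree k of each node: the number of distinct collaborators."""
    return {node: len(nbrs) for node, nbrs in graph.items()}

def degree_distribution(graph):
    """Number of nodes having each degree value."""
    return dict(Counter(degree(graph).values()))

def path_length(graph, source, target):
    """Number of edges on a shortest path, or None if unreachable."""
    seen = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return seen[node]
        for nbr in graph[node]:
            if nbr not in seen:
                seen[nbr] = seen[node] + 1
                queue.append(nbr)
    return None

graph = build_graph(edges)
print(degree(graph))
print(degree_distribution(graph))
print(path_length(graph, "A", "D"))
```

In this toy graph, node C has the highest degree and would be the hub; the edge weights are stored but do not affect degree or path length, which depend only on whether a tie exists.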
Network Models
The Erdős–Rényi (ER) graph describes a random network
whose degree distribution follows a Poisson distribution.3, 10 That is, most nodes
have a degree close to the network’s
average degree. Random graphs have a small mean shortest
path length and a small clustering coefficient. The clustering
coefficient (or transitivity) measures the probability of a
researcher pair collaborating if they have a collaborator in
common. A small mean shortest path length means each researcher
can reach every other by a small number of links.11 An
advantage of having a small mean shortest path length is
that communication can flow through a network quickly.
A small world network more accurately reflects a scientific
collaboration network than a random graph.1 Small world
networks have a small mean shortest path length and a clustering
coefficient significantly higher than an ER random
graph.9 Thus, in a small world network, nodes tend to group
together.
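As a rough illustration of transitivity, the following Python sketch counts the fraction of connected triples (two researchers sharing a collaborator) that are closed by a direct tie. The toy graph and the function name are hypothetical, and this is one common definition of transitivity rather than necessarily the exact routine used by any particular software package.

```python
from itertools import combinations

def transitivity(graph):
    """Fraction of connected triples (paths of length two) that are closed."""
    closed = 0
    triples = 0
    for node, nbrs in graph.items():
        for u, v in combinations(sorted(nbrs), 2):
            triples += 1          # u - node - v is a connected triple
            if v in graph[u]:
                closed += 1       # u and v also collaborate directly
    return closed / triples if triples else 0.0

# Hypothetical undirected binary network: one triangle (A, B, C) plus a pendant node D.
graph = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
print(transitivity(graph))
```

Here three of the five connected triples are closed, so the transitivity is 0.6; removing the A-B tie would drop it to zero.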
Interdisciplinary Research at the University of York
A recent report by the University of York, Information
Needs for a World Class University identifies information
needs for promotion of interdisciplinary research.12
This report maintains that research at the University of
York is moving towards interdisciplinary collaboration.
Collaboration becomes necessary as the university competes
in the global higher education market. The University of
York identified their current information services as a
barrier to internal and external collaboration. Simpler
access to data and communication between researchers are
needed to facilitate interdisciplinary research. An academic
social networking site, currently being developed by the
TRANSIT initiative13 at the University of York, will allow
a researcher to undertake projects and input data into a database
without multiple points of entry.
Research Aims
In this project, I describe the scientific collaboration net-
works in the Biology and Chemistry departments at the
University of York. I define scientific collaboration as two
researchers having co-authored a scientific publication
together from 2001 to 2007.8 Three networks were con-
sidered to investigate degree, degree distribution, key
brokers and preference of researchers for collaborating
within or outside their own research field. Clustering (or
transitivity) was used to investigate whether collaboration
is more likely if two researchers have a collaborator in
common and if clustering had any effect on the structure
of the social network of scientists. A network consisting of
98 researchers from the Chemistry and Biology departments
was produced to test the hypothesis that collaboration does
not differ from a random distribution of 1000 ER random
graphs for measures of degree, transitivity and betweenness.
Materials and Methods
Publication data for researchers employed at the University
of York were extracted from records contained in Web of
Science,14 from the Biology Research Support Office and
the Chemistry department. UCINET,15 a social network analysis
tool, was used to statistically analyse relational data.
Bibexcel16 was used to filter publication data to produce a
.coc file that detailed the number of times a pair of researchers
had collaborated together. The .coc file was modified
slightly in Microsoft Excel17 to produce a UCINET input
file (.dl). Network analyses considered three networks
separately:
(1) Biology collaboration network;
(2) Chemistry collaboration network;
(3) Biology and Chemistry collaboration network.
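The pair-counting step described above (the content of a co-occurrence file) can be sketched in Python. This is not the Bibexcel procedure itself; the author names, the record layout and the function name are hypothetical and serve only to show how the number of joint publications per researcher pair can be derived from publication records.

```python
from collections import Counter
from itertools import combinations

# Hypothetical publication records: one list of staff authors per publication.
publications = [
    ["Smith J", "Jones A", "Brown K"],
    ["Smith J", "Jones A"],
    ["Brown K"],
    ["Jones A", "Brown K"],
]

def count_coauthor_pairs(records):
    """Count how many publications each unordered author pair shares."""
    pair_counts = Counter()
    for authors in records:
        unique = sorted(set(authors))   # guard against duplicated names
        for pair in combinations(unique, 2):
            pair_counts[pair] += 1
    return pair_counts

pairs = count_coauthor_pairs(publications)
for (a, b), n in sorted(pairs.items()):
    print(a, b, n)
```

Sorting the names within each record means every pair is stored once, which matches the undirected treatment of collaboration used in this study. Single-author records contribute no pairs.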
Compiling Researcher Lists
Lists of 85 Biology department researchers and 62 Chemistry department researchers,
and a combined list of 147 Biology and Chemistry researchers
at the University of York, were compiled for the period 2001 to
2007.18, 19 Researchers who did not collaborate with their
departmental colleagues were not included in the network
graphs for the three networks subsequently produced. Fifty
out of 85 researchers collaborated with others in the
Biology department, 45 out of 62 collaborated with others
in the Chemistry department and 98 out of the 147 researchers
collaborated with others in the Chemistry and Biology
departments. The period 2001–2007 was used because it
corresponds to the last Research Assessment Exercise (RAE) conducted at the University of York.
The three researcher lists for the three networks
included surname, initial of first and sometimes second
name. Researcher lists included independent research
fellows, lecturers and professors and excluded PhD students,
visiting fellows and teaching fellows. For this article,
researchers were designated a number from 1 to 147 for
anonymity.
Biology and Chemistry Publication Data
The ISI Web of Science scientific publication database was used14
to mine journal publication records for researchers in the
Biology department. The following parameters exemplify those
used in the search page:
Location: Univ* York
Years: 2001–2007.
Publication data mined from Web of Science for Biology
researchers were found to be inaccurate. It was difficult
to identify researchers from the Biology
department at the University of York because, on many occasions,
several researchers employed at the University of York shared the same initials
and surname.
Therefore, Biology researcher publication data from the
Biology Research Support Office were used because these publication
records had been verified by individual authors,
whereas data from Web of Science had not. Duplicate publication
records were removed from the Biology Research
Support Office data using Endnote20 and the data were exported to
Bibexcel. Data from the Biology Research Support Office
included journal publications, books, conference
proceedings, newspapers, patents, personal communications,
reports and generic items (all types of publication). It was
thought that including all types of publication may have
affected the number of collaborations between researchers
in the three networks and the frequency at which they
occur (shown by the weight of an edge in a network graph).
Web of Science was not used to analyse the Chemistry collaboration
network because of inaccuracies found in the data
for the Biology collaboration network. Chemistry publication
data were received from the Chemistry department
unformatted, with each publication containing researchers’
names, title, abstract, journal reference, etc. in a Microsoft
Word document. Unformatted data were parsed in Awk21
by Caves22 to produce a Bibexcel output file (.out) containing
researcher name and publication number. All types of publication
were included when the Chemistry collaboration
network was produced.
UCINET: Social Network Analysis Tool
UCINET produced a square adjacency matrix that detailed
the number of times a pair of researchers had collaborated
together.23 A .dl file was used as the input format for UCINET
and is similar to a Bibexcel .coc file.
A .dl file consisted of a data set preceded by the header:
dl n = 50 format = edgelist1 alpha = no
labels embedded data:
Here, n was the number of nodes in the network and edgelist1
specified that the data were textual. Labels were embedded
because UCINET drew the labels for the rows and columns
of the adjacency matrix from the first two fields in each
data record. The .dl data set was imported into UCINET
and saved as a REAL data set. REAL data sets are flexible
because they can contain values that range from −1E36 to
+1E36. A .dl file can be visualized using the network
graph visualization package Netdraw, distributed with
UCINET.
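A minimal Python sketch of how such an input file might be assembled from pair counts is given below. The header lines are copied from the example above; the record layout (two labels followed by a weight, separated by spaces), the researcher labels and the function name are assumptions made for illustration, and the output has not been checked against UCINET itself.

```python
def to_dl_edgelist(pair_counts, n_nodes):
    """Format weighted pairs as text under the header quoted in this section."""
    lines = [
        f"dl n = {n_nodes} format = edgelist1 alpha = no",
        "labels embedded data:",
    ]
    # Assumed record layout: label, label, number of joint publications.
    for (a, b), weight in sorted(pair_counts.items()):
        lines.append(f"{a} {b} {weight}")
    return "\n".join(lines) + "\n"

# Hypothetical anonymized labels and joint publication counts.
pair_counts = {("r1", "r2"): 2, ("r2", "r3"): 1}
text = to_dl_edgelist(pair_counts, n_nodes=3)
print(text)
```

Writing the records in sorted order keeps the file reproducible between runs, which makes manual checking in a spreadsheet easier.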
Binary, Weighted and Symmetric Data
Symmetrical data represented each researcher as an equal
contributor to the collaboration. Data were made symmetric
in UCINET using the symmetrize function from the
Transform menu. In binary data, a relationship
between a researcher pair is coded 1 and no relationship is
coded 0. In weighted data, a number is assigned to
each edge to indicate the number of publications for a
researcher pair. Weighted data were used in some measures
of collaboration when the Chemistry and Biology networks
were analysed separately.
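The two transformations can be sketched in Python on a small adjacency matrix. The matrix values and function names are hypothetical. The rule used here to symmetrize (taking the larger of the two entries for each pair) is an assumption, since the text does not state which symmetrizing rule was selected in UCINET.

```python
def symmetrize(matrix):
    """Make a square matrix symmetric using the larger of entries (i, j) and (j, i)."""
    n = len(matrix)
    return [[max(matrix[i][j], matrix[j][i]) for j in range(n)] for i in range(n)]

def binarize(matrix):
    """Code any positive number of joint publications as 1, otherwise 0."""
    return [[1 if value > 0 else 0 for value in row] for row in matrix]

# Hypothetical weighted matrix in which each pair was recorded in one direction only.
weighted = [
    [0, 3, 0],
    [0, 0, 2],
    [1, 0, 0],
]
sym = symmetrize(weighted)
binary = binarize(sym)
print(sym)
print(binary)
```

The symmetric weighted matrix keeps the number of joint publications per pair, whereas the binary matrix records only whether a pair has collaborated at all.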
Describing the Characteristics of Each Node: Assigning a Research Specialism to Each Node
Research foci classification data for each researcher in the
Biology collaboration network were provided by the
Biology Research Support Office. Analysis of research foci
for the Chemistry collaboration network was not conducted
because chemistry research foci data were difficult to obtain.
Each researcher in the Biology collaboration network was characterized
by his or her research field using Netdraw.24 Transform >
node attribute editor allowed the user to insert a column to
describe each node.
Generating a Random Graph
A graph of 98 researchers from a potential 147 was produced
using steps detailed in the Biology and Chemistry Publication
Data, UCINET: Social Network Analysis Tool, and Binary,
Weighted and Symmetric Data sections. Researchers who
did not collaborate with their Biology and Chemistry colleagues
were not included in the graph. These observed
data were compared with 1000 ER random graphs generated
using the random function from the ‘data’ menu in UCINET.
The following parameters were used to generate the ER
random graphs; they matched those of the observed Chemistry and
Biology network data.
Density: 0.0406 (binary and undirected)
Number of nodes: 98
Number of graphs: 1000
Type of graph: Undirected
Data: Binary.
Binary data were used to produce the Chemistry and Biology
collaboration network to allow direct comparison with a frequency
distribution of 1000 ER graphs. The frequency distribution
was generated in Excel.
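The random graph comparison can be sketched in Python using the parameters listed above (98 nodes, density 0.0406, 1000 undirected binary graphs). This is not the UCINET routine: it assumes each possible tie is drawn independently with probability equal to the density, and the function names and the seed are chosen only for illustration.

```python
import random
from itertools import combinations

def er_random_graph(n, density, rng):
    """Undirected binary random graph: each pair is linked with probability `density`."""
    graph = {node: set() for node in range(n)}
    for u, v in combinations(range(n), 2):
        if rng.random() < density:
            graph[u].add(v)
            graph[v].add(u)
    return graph

def mean_degree(graph):
    """Average number of ties per node."""
    return sum(len(nbrs) for nbrs in graph.values()) / len(graph)

rng = random.Random(2009)   # fixed seed so the sketch is repeatable
samples = [mean_degree(er_random_graph(98, 0.0406, rng)) for _ in range(1000)]
expected = 0.0406 * (98 - 1)   # expected mean degree under this model, about 3.94
print(round(expected, 2), round(sum(samples) / len(samples), 2))
```

An observed statistic can then be placed against the list of simulated values, for example by counting the proportion of random graphs whose value is at least as large as the observed one. The same loop could collect transitivity or betweenness instead of mean degree.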
Results
Accuracy of Data
To illustrate inaccuracies found in the Biology collaboration
network, weighted network data mined from Web of Science
(journal publications only) are shown in Fig. 1, and
weighted data from the Biology Research Support Office
(journal publications only) are shown in Fig. 2. Figure 1 has 45 researchers
compared with 50 in Fig. 3, indicative of discrepancies
between the two data sets.
The Biology collaboration network consisting of 50
researchers is shown in Fig. 3 where edges are weighted
and all types of publication are used from data from the
Biology Research Support Office. The weight of the edges
is affected by the type of publication used. For example,
researchers 97 and 110 collaborated 20 times in Fig. 3 compared
with 15 times in Fig. 2, where only journal publications
were used.
Figure 1. The Biology collaboration network using journal publication data from Web of Science, 2001–2007. Data are weighted and undirected. There are 45 nodes. The number of publications between a pair of researchers ranges from 0 to 26. A thick line indicates the researcher pair has collaborated together many times, and a thin line means collaboration has occurred infrequently.
A random sample of six Biology researchers was taken
from the Web of Science data set and compared with the
data set from the Biology Research Support Office (Table 1).
Table 1 implies a tendency for the Web of Science data set to
overestimate the number of publications, that is, to include
false positives.
Chemistry Collaboration Network
Publication data were used to produce a weighted Chemistry
collaboration network (Fig. 4). Forty-five out of 62 researchers
(73%) collaborated with other researchers in the
Chemistry department compared with 59% in the Biology
department when both data sets used all types of publication
and weighted data. Figure 4 has one giant and one
small component. Within the giant component, two sub-components
are evident, connected by researchers 6, 26 and
43. The strongest links are between researchers 37 and 10
and researchers 33 and 61, with 88 and 101 publications,
respectively. This suggests that these researchers had either an intense
relationship for a short period of time that produced many
papers or a consistent relationship over a long period of time.
Researcher Degree
The degree is the number of ties a node has to other nodes.10
Researchers are likely to influence those they are directly
adjacent to.25 The pattern of ties in a network defines a
researcher’s social role and position.6, 7 Nieminen’s
measure (equation 1)26 was used to express the number of
nodes adjacent to a point, Pk, where Pi and Pk represent
two nodes:
$$C_D(P_k) = \sum_{i=1}^{n} a(P_i, P_k) \qquad (1)$$
where a(Pi, Pk) = 1 if and only if Pi and Pk are connected by
a line, and 0 otherwise.
The CD(Pk) is confounded by network size because it is an
absolute count of degree. To compare the relative centrality
of points for Chemistry and Biology network graphs, CD(Pk)
was normalized. Tables 2 and 3 show the degree and normalized
degree for the Chemistry and Biology collaboration
networks, respectively, using weighted undirected data.
Table 4 shows that the average number of links each
researcher has with others in the Biology collaboration
network was 2.6; the corresponding value for Chemistry
was 4.8 links per researcher.
Figure 2. The Biology collaboration network using journal publication data from Biology Research Support Office, 2001–2007. Data are weighted and undirected. There are 50 nodes. The number of publications between a pair of researchers ranges from 0 to 15. A thick line indicates the researcher pair has collaborated together many times, and a thin line means collaboration has occurred infrequently.
Table 2 shows that researchers 6 and 16 had a normalized
degree of 27.273 and 25.000, respectively, occupying central
positions in the Chemistry collaboration network. Seven
researchers had a normalized degree of 2.273, occupying a
peripheral position in the network.
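The normalization formula is not stated explicitly in the text. The tabulated values are, however, consistent with expressing the degree as a percentage of the maximum possible degree, n − 1, where n is the number of researchers in the network. The Python sketch below makes that assumption explicit; the function name is illustrative only.

```python
def normalized_degree(degree, n_nodes):
    """Degree as a percentage of the maximum possible degree, n_nodes - 1."""
    return 100.0 * degree / (n_nodes - 1)

# Researcher 6 in the Chemistry network (45 nodes, degree 12).
print(round(normalized_degree(12, 45), 3))   # 27.273
# Researcher 88 in the Biology network (50 nodes, degree 8).
print(round(normalized_degree(8, 50), 3))    # 16.327
```

Because the denominator depends on network size, a researcher with a single tie scores 2.273 in the 45-node Chemistry network but 2.041 in the 50-node Biology network, which is why the minimum values in Table 4 differ.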
Table 3 shows researchers 88 and 64 have the highest normalized
degree of 16.327 and 14.286, respectively,
suggesting that they occupy a central position in the
Biology collaboration network and are able to influence
whether collaboration takes place. Conversely, 18 researchers
have a normalized degree of 2.041. This suggests that
these researchers occupy a peripheral position in the
network and are unlikely to influence potential for
collaboration.
Figure 3. The Biology collaboration network using publication data from Biology Research Support Office, 2001–2007. All types of publication used. Data are weighted and undirected. There are 50 nodes with five components; one large and four small. A thick line indicates the researcher pair has collaborated together many times, and a thin line means collaboration has occurred infrequently.
Table 1. Researcher degree, number of publication records and percentage difference between a random sample of six Biology researchers
Researcher  Degree (WoS)  Degree (RSO)  Publications (WoS)  Publications (RSO)  Difference  Percentage difference (%)
110         3             2             68                  58                  10          10
88          7             8             47                  31                  16          41
67          6             5             21                  15                  6           33
64          7             6             38                  11                  27          110
82          7             7             57                  40                  17          35
94          5             5             34                  26                  8           26
WoS, Web of Science data set; RSO, Biology Research Support Office data set. Difference is the difference in number of publications between the Research Support Office and Web of Science data sets. Data mined from Web of Science are compared with those from the Biology Research Support Office. Weighted data and journal publications only were used for both data sets.
Normalized degree distribution measures the number of
researchers with a particular degree and describes structural
properties of the network. Figure 5 shows the degree distri-
bution of the Biology and Chemistry collaboration networks
using binary data.
Betweenness
A researcher is also central to a social network if they are an
intermediary node between the paths of others.26 A
researcher will pursue all pathways that are independent of
each other (maximum flow) to collaborate with another pro-
portionally to the length of the pathways.27 Flow between-
ness measures the number of times a researcher lies on all
paths between another researcher pair. Flow betweenness
can be expressed by equation 2:7
$$C_F(x_i) = \sum_{j}^{n} \sum_{k}^{n} m_{jk}(x_i), \quad j < k \qquad (2)$$
where mjk is the maximum flow from node xj to node xk and
mjk(xi) is the maximum flow from xj to xk that passes
through node xi.
Figure 4. The Chemistry collaboration network using publication data from the Chemistry department, 2001–2007. All types of publication used. Data used are weighted and undirected. Forty-five researchers from a possible 62 collaborated with their chemistry colleagues. The weight of the edge is indicated by the thickness of the line, with a thick line indicating many collaborations and a thin line a few collaborations together.
Table 2. Degree and normalized degree for 45 authors in the Chemistry collaboration network
Researcher Degree Normalized degree Researcher Degree Normalized degree
6 12.0 27.273 37 4.0 9.091
16 11.0 25.000 51 4.0 9.091
62 10.0 22.727 13 4.0 9.091
40 9.0 20.455 9 3.0 6.818
1 9.0 20.455 10 3.0 6.818
59 8.0 18.182 30 3.0 6.818
43 8.0 18.182 53 3.0 6.818
61 8.0 18.182 42 3.0 6.818
4 8.0 18.182 26 3.0 6.818
54 8.0 18.182 36 3.0 6.818
20 7.0 15.909 7 2.0 4.545
22 7.0 15.909 24 2.0 4.545
17 7.0 15.909 8 2.0 4.545
21 7.0 15.909 41 2.0 4.545
23 6.0 13.636 27 2.0 4.545
19 6.0 13.636 48 1.0 2.273
35 6.0 13.636 29 1.0 2.273
14 6.0 13.636 34 1.0 2.273
39 5.0 11.364 45 1.0 2.273
32 5.0 11.364 12 1.0 2.273
55 5.0 11.364 5 1.0 2.273
33 5.0 11.364 60 1.0 2.273
58 5.0 11.364
Flow betweenness increases with network size and
density because it is an absolute count, preventing comparison
between two network data sets. Instead, normalized
flow betweenness (equation 3)7 can be used to compare
the betweenness of the Biology and Chemistry collaboration networks.
It is obtained by dividing the flow that
passes through xi by the total flow between all pairs of nodes
that do not include xi.
$$C'_F(x_i) = \frac{\sum_{j}^{n} \sum_{k}^{n} m_{jk}(x_i)}{\sum_{j}^{n} \sum_{k}^{n} m_{jk}}, \quad j < k \qquad (3)$$
Researchers in the Biology and Chemistry Collaboration
networks with high normalized flow betweenness can be
considered brokers. That is, researchers who have power
to initiate or prevent collaboration between another pair
of researchers.11 Tables 5 and 6 describe normalized
betweenness using UCINET’s Flow Betweenness algorithm
and weighted data for Chemistry and Biology collaboration
networks, respectively.
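A rough Python sketch of flow betweenness is given below. It is not UCINET's algorithm. It assumes that the flow between two nodes passing through xi can be obtained as the maximum flow between them minus the maximum flow when xi is removed from the network, that pairs including xi are excluded, and that edge weights act as capacities. Maximum flow is found with a standard breadth-first augmenting-path method; all names and the toy networks are hypothetical.

```python
from collections import deque
from itertools import combinations

def make_capacity(edges):
    """Undirected capacities {u: {v: weight}} from (u, v, weight) triples."""
    capacity = {}
    for u, v, w in edges:
        capacity.setdefault(u, {})[v] = w
        capacity.setdefault(v, {})[u] = w
    return capacity

def max_flow(capacity, source, sink, removed=None):
    """Maximum flow by shortest augmenting paths, optionally ignoring one node."""
    residual = {u: {v: c for v, c in nbrs.items() if v != removed}
                for u, nbrs in capacity.items() if u != removed}
    total = 0
    while True:
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total              # no augmenting path remains
        path = []
        v = sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
        total += bottleneck

def flow_betweenness(capacity, node):
    """Return (raw, normalized) flow betweenness of `node` under the stated assumptions."""
    others = [v for v in capacity if v != node]
    through = total = 0
    for j, k in combinations(others, 2):
        m_jk = max_flow(capacity, j, k)
        through += m_jk - max_flow(capacity, j, k, removed=node)
        total += m_jk
    return through, (through / total if total else 0.0)

triangle = make_capacity([("A", "B", 1), ("B", "C", 1), ("A", "C", 1)])
chain = make_capacity([("A", "B", 1), ("B", "C", 1)])
print(flow_betweenness(triangle, "A"))
print(flow_betweenness(chain, "B"))
```

In the chain, all flow between A and C must pass through B, so B has a normalized value of 1; in the triangle, half of the flow between B and C can bypass A. Multiplying the normalized value by 100 would express it as a percentage.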
Researchers 88, 67 and 113 were the top three brokers
with normalized flow betweenness of 19.969, 12.822 and
11.476, respectively, in the Biology collaboration network
shown in Table 6. Researcher 67 influences communication
flow in the network but is not the most connected author.
Researcher 67 has a normalized degree of 10.204 compared
with a minimum of 2.041 and maximum 16.327. Eighteen of
........................................................................................................................................................................................................................................
Table 3. Degree and normalized degree for 50 authors in the Biology collaboration network
Researcher Degree Normalized degree Researcher Degree Normalized degree
88 8.0 16.327 117 2.0 4.082
64 7.0 14.286 137 2.0 4.082
82 7.0 14.286 124 2.0 4.082
113 6.0 12.245 133 2.0 4.082
94 5.0 10.204 95 2.0 4.082
67 5.0 10.204 110 2.0 4.082
65 5.0 10.204 135 2.0 4.082
132 5.0 10.204 69 1.0 2.041
71 4.0 8.163 115 1.0 2.041
146 4.0 8.163 118 1.0 2.041
99 3.0 6.122 119 1.0 2.041
81 3.0 6.122 147 1.0 2.041
130 3.0 6.122 108 1.0 2.041
74 3.0 6.122 109 1.0 2.041
143 3.0 6.122 126 1.0 2.041
78 3.0 6.122 111 1.0 2.041
98 3.0 6.122 112 1.0 2.041
128 3.0 6.122 131 1.0 2.041
122 3.0 6.122 73 1.0 2.041
100 3.0 6.122 97 1.0 2.041
85 3.0 6.122 77 1.0 2.041
120 3.0 6.122 136 1.0 2.041
66 2.0 4.082 102 1.0 2.041
91 2.0 4.082 103 1.0 2.041
93 2.0 4.082 106 1.0 2.041
Table 4. Degree and normalized degree summary statistics for Biology and Chemistry authors
Degree Normalized degree
Summary statistics for Chemistry collaboration network
Average 4.844 11.010
Minimum 1.00 2.273
Maximum 12.00 27.273
Summary statistics for Biology collaboration network
Average 2.6 5.306
Minimum 1.0 2.041
Maximum 8.0 16.327
the 50 researchers have a normalized flow betweenness of
zero, indicating that they are not intermediaries in the flow
of communication between any two researchers.
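The normalized degrees quoted above and in Tables 3 and 4 are consistent with dividing each degree by the largest possible degree, n − 1, and expressing the result as a percentage. The following is a minimal Python sketch under that assumption (the definition is inferred from the tabulated values, and the function name is illustrative), using the degree counts read off Table 3:

```python
# Degree counts read off Table 3 (Biology network):
# {degree: number of researchers with that degree}
biology_degree_counts = {8: 1, 7: 2, 6: 1, 5: 4, 4: 2, 3: 12, 2: 10, 1: 18}

def normalized_degree(degree, n_nodes):
    """Degree as a percentage of the maximum possible degree (n - 1).

    Assumed definition; it reproduces the values in Tables 3 and 4.
    """
    return 100.0 * degree / (n_nodes - 1)

n = sum(biology_degree_counts.values())
mean_degree = sum(d * c for d, c in biology_degree_counts.items()) / n

print(n)                                            # 50
print(round(normalized_degree(8, n), 3))            # 16.327
print(round(mean_degree, 1))                        # 2.6
print(round(normalized_degree(mean_degree, n), 3))  # 5.306
```

The same counts also give the degree distribution summarized in Fig. 5: 18 researchers with a degree of 1 and eight with a degree of 5–8.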
Researchers 21, 6 and 26 are the top three brokers in the
Chemistry collaboration network, with normalized flow
betweenness values of 19.095, 16.254 and 14.095,
respectively. Researcher 26 is one of the nodes that connect
the two sub-compartments of the giant component and has a
modest normalized degree of 6.818, compared with a
minimum of 2.273 and a maximum of 27.273. This implies
that researcher 26 can influence communication flow
without being a highly connected researcher. Researcher 26
is connected to two highly connected researchers, 16 and
6, who have normalized degrees of 25.000 and 27.273,
respectively. Only seven of the 45 researchers in the Chemistry
collaboration network have a normalized flow betweenness of
zero, compared with 18 of the 50 researchers in Biology. This
implies that intermediaries in the Chemistry collaboration
network have greater potential to hinder or promote
communication than those in the Biology collaboration network.
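The observation that a researcher can broker communication without being highly connected can be illustrated on a toy network. The sketch below is only an illustration under stated assumptions, not the analysis performed in this study: it uses shortest-path betweenness (Brandes' algorithm) as a simpler stand-in for UCINET's flow betweenness, and the graph, node labels and function names are hypothetical.

```python
from collections import deque
from itertools import combinations

def shortest_path_betweenness(adj):
    """Shortest-path betweenness (Brandes' algorithm) for an undirected,
    unweighted graph given as {node: set_of_neighbours}."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths.
        order = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies, starting from the farthest nodes.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Every unordered pair was counted once from each endpoint.
    return {v: b / 2.0 for v, b in score.items()}

def build_graph(edges):
    """Build an undirected adjacency dict from a list of (u, v) ties."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

# Hypothetical network: two tightly knit groups joined only through "B".
left = ["H1", "a", "b", "c"]
right = ["H2", "d", "e", "f"]
edges = list(combinations(left, 2)) + list(combinations(right, 2))
edges += [("H1", "B"), ("B", "H2")]
graph = build_graph(edges)
scores = shortest_path_betweenness(graph)

print(len(graph["B"]), scores["B"])    # 2 16.0
print(len(graph["H1"]), scores["H1"])  # 4 15.0
```

In this toy example node B has the lowest degree in the network but the highest betweenness, because every path between the two groups passes through it.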
Research Foci
Is research focus a meaningful way to classify authors?
Comparing a researcher’s normalized degree to researchers
within the same research focus with that to researchers in
different foci can address this. Classifying each node into
one of the nine research foci may reveal whether a researcher
tends to collaborate with people within their own research
focus or in a different one. This classification is shown in
Fig. 6 using the Biology collaboration network data. Figures 7
and 8 show normalized degree within and between research
foci, respectively. Figure 8 shows that Bioinformatics and
Mathematics has a normalized between-foci degree of 0.17,
compared with a normalized within-focus degree of 0.06
(Fig. 7). This implies that Bioinformatics and Mathematics
authors collaborate more with authors in other foci than with
each other. Figure 9 summarizes the number of links between
research foci on a network graph.
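Counting links within and between foci, as summarized in Figs 7–9, can be sketched as follows. The researcher labels, focus assignments and function name are hypothetical, and the counts are left unnormalized because the normalization used for Figs 7 and 8 is not spelled out here.

```python
from collections import Counter

def focus_link_counts(edges, focus_of):
    """Count ties within each focus and between each pair of foci."""
    within = Counter()
    between = Counter()
    for u, v in edges:
        fu, fv = focus_of[u], focus_of[v]
        if fu == fv:
            within[fu] += 1
        else:
            # Sort the pair so (X, Y) and (Y, X) share one key.
            between[tuple(sorted((fu, fv)))] += 1
    return within, between

# Hypothetical toy data: four researchers in three foci.
focus_of = {
    "r1": "Ecology",
    "r2": "Ecology",
    "r3": "Bioinformatics",
    "r4": "Biochemistry",
}
edges = [("r1", "r2"), ("r1", "r3"), ("r2", "r3"), ("r3", "r4")]

within, between = focus_link_counts(edges, focus_of)
print(within["Ecology"])                               # 1
print(between[("Bioinformatics", "Ecology")])          # 2
print(between[("Biochemistry", "Bioinformatics")])     # 1
```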
Normalized flow betweenness can be used to measure
whether a focus can promote or hinder communication
flow. If normalized flow betweenness using weighted data
is summed for all authors in a focus and divided by the
number of authors, the average normalized flow betweenness
can be calculated. Figure 10 shows the average normalized
flow betweenness between foci. The data suggest that the
two foci, Ecology and Evolution and Bioinformatics and
Mathematics, with normalized flow betweenness values of
5.48 and 5.40, respectively, are influential in promoting or
inhibiting potential for collaboration, indicating that they
may act as brokers in the network. Molecular
Microbiology and Molecular and Cellular Medicine have
Figure 5. Degree distribution for (A) the Biology collaboration network and (B) the Chemistry collaboration network. Most researchers in the Biology collaboration network (18 researchers) are connected to few individuals and have a degree of 1, whereas eight researchers have a large degree (5–8) and are highly connected to others. The Chemistry collaboration network has six researchers with a degree of 1 and one researcher with a degree of 12. Twenty-six researchers have a degree of 5–12. To summarize, the Chemistry collaboration network has more researchers with higher degree who occupy a central position in the network than the Biology collaboration network.
Table 5. Normalized weighted flow betweenness for 45 researchers in the Chemistry collaboration network
Researcher Normalized weighted flow betweenness Researcher Normalized weighted flow betweenness
1 2.326 32 8.693
4 0.415 33 1.792
5 0.00 34 0.000
6 16.254 35 2.794
7 4.228 36 0.299
8 0.344 37 0.520
9 1.207 39 4.573
10 0.117 40 0.788
12 0.000 41 0.037
13 4.264 42 0.057
14 0.846 43 9.391
16 1.235 48 0.000
17 0.335 51 10.258
19 2.992 53 0.207
20 1.973 54 4.612
21 19.095 55 5.025
22 5.618 58 1.347
23 8.051 59 4.003
24 0.106 60 0.000
26 14.095 61 2.813
27 0.306 62 4.607
29 0.000 121 0.000
30 1.618
All types of publications were used.
little influence in promoting collaboration with normalized
flow betweenness values of 0 and 0.02, respectively. All
five authors belonging to the Molecular and Cellular
Medicine group are disconnected from the main component
of the graph, indicating that they either do not collaborate
with others or collaborate with researchers outside of the
Biology department.
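The averaging step described above (sum the normalized flow betweenness of all authors in a focus, then divide by the number of authors in that focus) can be sketched in a few lines. The author labels, focus names and values below are made up for illustration and are not the study's data.

```python
from collections import defaultdict

def mean_by_focus(values, focus_of):
    """Average a per-author measure over the authors in each focus."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for author, value in values.items():
        focus = focus_of[author]
        totals[focus] += value
        counts[focus] += 1
    return {focus: totals[focus] / counts[focus] for focus in totals}

# Hypothetical normalized flow betweenness values and focus membership.
flow_betweenness = {"a1": 6.0, "a2": 4.0, "b1": 0.0, "b2": 0.0, "b3": 3.0}
focus_of = {"a1": "Focus A", "a2": "Focus A",
            "b1": "Focus B", "b2": "Focus B", "b3": "Focus B"}

averages = mean_by_focus(flow_betweenness, focus_of)
print(averages["Focus A"])  # 5.0
print(averages["Focus B"])  # 1.0
```

Because the average is taken per focus, a focus whose members are all disconnected from the main component (as with Molecular and Cellular Medicine) will score at or near zero.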
Clustering
Is There Evidence of Clustering in the Biology and
Chemistry Collaboration Networks?
High transitivity or clustering means that there is a heigh-
tened probability of two people having collaborated if they
have one or more collaborators in common.11 Transitivity
can be measured by the clustering coefficient described by
equation (4).9
\[ C = \frac{6 \times (\text{number of transitive triples on a graph})}{\text{number of paths of length 2}} \qquad (4) \]
The Biology collaboration network shows little evidence of
clustering, since only 0.09% of all triples are transitive
(Table 7): it contains 102 transitive triples out of a
possible 117 600 triples of all kinds. The Biology
collaboration network has a density of 0.05, making
collaboration between two researchers who have a
collaborator in common unlikely.
In comparison, the Chemistry collaboration network
contains 624 transitive triples out of a possible 85 140
triples of all kinds, a higher percentage of transitive
triples (0.72%) than Biology (0.09%). The Chemistry
collaboration network has a density of 0.11. Two authors in
the Chemistry collaboration network are therefore more
likely to publish together if they have a collaborator in
common than two authors in the Biology collaboration network.
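The counts behind these percentages can be reproduced on a small example. The sketch below assumes an undirected binary network, reads "transitive triples" in equation 4 as closed triangles, and assumes that each triangle contributes six transitive ordered triples to the counts in Table 7; the function name and toy graph are illustrative only.

```python
from itertools import combinations

def transitivity_summary(adj):
    """Triple counts for an undirected graph {node: set_of_neighbours}."""
    nodes = sorted(adj)
    n = len(nodes)
    triangles = sum(
        1
        for a, b, c in combinations(nodes, 3)
        if b in adj[a] and c in adj[b] and c in adj[a]
    )
    # Each triangle can be ordered in 3! = 6 ways.
    transitive_ordered = 6 * triangles
    # Ordered paths i -> j -> k (i != k): deg(j) * (deg(j) - 1) per centre j.
    two_paths = sum(len(adj[v]) * (len(adj[v]) - 1) for v in nodes)
    return {
        "transitive_ordered_triples": transitive_ordered,
        "ordered_two_paths": two_paths,
        "all_ordered_triples": n * (n - 1) * (n - 2),
        "clustering": transitive_ordered / two_paths if two_paths else 0.0,
    }

# Toy graph: a triangle A-B-C with one extra researcher D tied to C.
toy = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
summary = transitivity_summary(toy)
print(summary["clustering"])  # 0.6

# Cross-check against the Biology column of Table 7.
print(50 * 49 * 48)                  # 117600 triples of all kinds
print(round(100 * 102 / 366, 2))     # 27.87
print(round(100 * 102 / 117600, 2))  # 0.09
```

The 366 ordered paths of length 2 reported for Biology also equal the sum of degree × (degree − 1) over the degrees listed in Table 3, which supports this reading of the table.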
Chemistry and Biology Collaboration Network ER Random Graphs
Erdős and Rényi9 proposed the Gn,p random graph to
describe the random occurrence of a collection of nodes
connected by edges.11, 28 Each pair of nodes in a Gn,p graph is
connected with independent probability p or not
connected with probability 1 − p.
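A Gn,p graph can be generated directly from this definition. The following is a minimal sketch using only the Python standard library; the function name and seed are arbitrary, and the values n = 98 and p = 0.0406 are taken from the Fig. 11 caption on the assumption that p was set to the observed density.

```python
import random
from itertools import combinations

def gnp_random_graph(n, p, rng):
    """Return a G(n, p) graph as {node: set_of_neighbours}.

    Each of the n * (n - 1) / 2 possible ties is included
    independently with probability p.
    """
    adj = {v: set() for v in range(n)}
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj

rng = random.Random(2009)  # fixed seed so the run is repeatable
graph = gnp_random_graph(98, 0.0406, rng)
n_edges = sum(len(neighbours) for neighbours in graph.values()) // 2
density = n_edges / (98 * 97 / 2)
print(len(graph))  # 98
print(n_edges, round(density, 4))
```

Note that the density of any single Gn,p replicate only matches p on average; a generator that fixes the number of ties exactly would be needed if every replicate had to share the observed density.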
The observed network is a binary network consisting of
researchers from the Biology and Chemistry departments.29, 30
It was compared with 1000 Gn,p graphs that have the same
number of nodes and the same density as the observed data.
If the observed value of a measure such as transitivity,
flow betweenness or degree does not deviate from the
frequency distribution of that measure across the 1000
random graphs, this implies that collaboration is a random
process.29, 31, 32 Figure 11 displays a network graph of the
observed data, and Fig. 12 is a frequency distribution of
the average flow betweenness over all nodes in each of the
1000 ER random graphs.
The average flow betweenness for the 1000 ER random
graphs was 114.363, compared with 130.966 in the observed
network data. The observed average flow betweenness was
above the 95th percentile (>127) of the frequency
distribution of the 1000 random graphs; therefore, the
observed average flow betweenness is higher than would be
expected at random. Flow betweenness in the Biology and
Chemistry scientific collaboration network does not arise
from a random process.
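The comparison of an observed value with the distribution from the random graphs amounts to locating the observed value within the sorted null values. The sketch below shows one way to do this; the function names are illustrative, the nearest-rank percentile rule is an assumption, and the null values are a made-up stand-in for the 1000 simulated averages.

```python
import math

def percentile_rank(observed, null_values):
    """Percentage of null values that fall strictly below the observed value."""
    below = sum(1 for x in null_values if x < observed)
    return 100.0 * below / len(null_values)

def exceeds_percentile(observed, null_values, q=95.0):
    """True if the observed value lies above the q-th percentile of the null.

    Uses the nearest-rank rule: the smallest null value with at least
    q% of the null values at or below it.
    """
    ranked = sorted(null_values)
    k = max(1, math.ceil(q * len(ranked) / 100.0))
    return observed > ranked[k - 1]

# Hypothetical null distribution: 1000 values, here simply 1..1000.
null_values = list(range(1, 1001))

print(percentile_rank(960, null_values))     # 95.9
print(exceeds_percentile(960, null_values))  # True
print(exceeds_percentile(900, null_values))  # False
```

With the study's figures, the observed value of 130.966 would be passed as `observed` and the 1000 random-graph averages as `null_values`.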
If our observed data for node degree approximate a
random graph, each node in the observed data will have a
degree similar to the network graph’s average, and few nodes
will have a degree much higher or lower than that
average. The average degree (3.939) for the 1000 ER graphs
and the observed data did not differ. This is because the
number of edges each node is connected to may vary within one
Gn,p replication, but the average number of edges per
Table 6. Normalized weighted flow betweenness for 50 researchers in the Biology collaboration network
Researcher Normalized weighted flow betweenness Researcher Normalized weighted flow betweenness
64 11.252 108 0.00
65 10.619 109 0.00
66 0.879 110 0.085
67 12.822 111 0.00
69 0.00 112 0.00
71 5.659 147 0.00
73 0.00 113 11.476
74 3.210 115 0.00
77 0.00 117 0.006
78 1.546 118 0.00
81 1.542 119 0.00
82 8.505 120 0.961
85 2.062 122 0.903
88 19.229 124 3.231
91 0.085 126 0.00
93 0.009 128 4.535
94 8.829 130 6.378
95 3.231 131 0.00
98 3.242 132 6.576
97 0.00 133 0.021
99 3.274 135 1.190
100 0.903 136 0.00
102 0.00 137 0.006
103 0.00 143 1.373
106 0.00 146 4.755
All types of publications were used
node in each graph does not differ from the other 999 Gn,p
replications.32
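As a check on the reported value, the expected mean degree of a Gn,p graph follows directly from n and p. Taking n = 98 and p = 0.0406 from the Fig. 11 caption, and assuming the random graphs were generated with p equal to the observed density:

```latex
\langle k \rangle = p\,(n - 1) = 0.0406 \times 97 \approx 3.94
```

This agrees with the average degree of 3.939 reported for both the observed data and the random graphs.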
Transitivity shows that two authors are likely to collaborate
if they have an acquaintance in common.9 The percentage
of transitive triples is <0.01% for the random networks and
0.09% for the observed network. Thus, transitivity occurs
more often in the observed data than at random. It is
likely that the process of introducing colleagues to one
another is important in the community structure of the
Biology and Chemistry collaboration network.
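A rough expectation for the random networks can be worked out by hand. In a Gn,p graph an ordered triple (i, j, k) is transitive only if all three ties i–j, j–k and i–k are present, which happens with probability p^3 because ties are independent. Assuming p = 0.0406, the density given in the Fig. 11 caption:

```latex
P(\text{triple is transitive}) = p^{3} = 0.0406^{3} \approx 6.7 \times 10^{-5} \approx 0.0067\%
```

This is consistent with the value of <0.01% reported for the random networks.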
Discussion
With knowledge of the structure of the Biology and
Chemistry collaboration network, through co-publication
of scientific papers, what have we learnt about interdisciplin-
ary research and scientific collaboration? The results give
Figure 6. Classification of the Biology collaboration network by research foci. Relationships are undirected and binary. All types of publication were used. The key details each author by research focus.
Figure 7. Normalized number of links within each research focus in the Biology department at the University of York. Data are binary and undirected. The normalized degree within a focus ranges from 0 to 0.29 with a mean of 0.14. Ecology and Evolution has the highest normalized within-focus degree of 0.29. Bioinformatics has the lowest normalized within-focus degree of 0.06. This implies that Ecology and Evolution is well connected within its own focus and Bioinformatics is poorly connected. However, Ecology and Evolution (population 9) has more opportunity to collaborate within its own focus than Bioinformatics (population 2).
Figure 8. Normalized number of links between research foci in the Biology department at the University of York. Data are binary and undirected. The normalized between-foci degree ranges from 0.13 to 0.25. Biochemistry and Biophysics has the highest normalized between-foci degree of 0.25. Conversely, Molecular and Cellular Medicine has the lowest normalized between-foci degree of 0.04. Molecular and Cellular Medicine authors collaborate to a lesser extent with other foci.
support to the idea that authors collaborate in a non-random
way, where measures of transitivity and betweenness in the
observed co-authorship network deviate from a frequency
distribution of 1000 ER random graphs. Authors make
informed decisions about who to collaborate with. Critique
of the assumptions underlying the measures used to analyse
the scientific collaboration network is needed to realize the
significance of the results. Is the structure of the Biology
and Chemistry networks, reported by this work, replicable
in other networks? The data mining methods used to analyse
publication data from the Biology and Chemistry departments,
together with the problem of missing data, may have skewed
the results and given an inaccurate representation of the
Biology and Chemistry collaboration networks.
Data Mining and Sampling Bias
A tie between two authors occurs when they have published
a scientific paper together. A tie is a crude measure of
Figure 9. Number of times collaboration has occurred between authors of different foci. Data are undirected.
Figure 10. Average normalized flow betweenness values between research foci. Weighted undirected data are used. Ecology and Evolution and Bioinformatics and Mathematics are brokers in the Biology collaboration network. The normalized flow betweenness values for Bioinformatics and Mathematics and for Ecology and Evolution are 5.397833 and 5.479818, respectively. The normalized flow betweenness values for Molecular Microbiology and for Molecular and Cellular Medicine are 0.0 and 0.014, respectively.
Table 7. Transitivity of the Biology and Chemistry collaboration networks
Transitivity measure  Chemistry collaboration network  Biology collaboration network
Number of non-vacuous transitive ordered triples  624  102
Number of triples of all kinds  85 140  117 600
Number of triples in which i → j and j → k  1232  366
Number of triangles with at least two legs  2448  894
Number of triangles with at least three legs  624  102
Percentage of all ordered triples  0.72%  0.09%
Transitivity: % of ordered triples in which i → j and j → k that are transitive  50.65%  27.87%
Transitivity: % of triangles with at least two legs that have three legs  25.49%  11.41%
Data are undirected and weighted.
Figure 11. Network graph using data from Biology and Chemistry publication records. Data are binary and undirected and all types of publication data were used. There are three components disconnected from the giant component. The combined Chemistry and Biology collaboration network has 98 nodes and a density (matrix average) of 0.0406.
collaboration since not all collaborators may be included in
the author list.33 Gift authorship is when an author is
included in a publication and has made no contribution to
research but has influenced the career of a main author.34
Conversely, an individual may not be included in the
author list despite having made significant contributions
to the research effort.
The use of co-publication to represent a tie has
repercussions when sampling ties in the Biology and Chemistry
collaboration networks. Bias is created when author lists are
incomplete. Authors employed for only part of the 2001–2007
period are less likely to have collaborated with many of their
colleagues than an author who worked in the department
for the full 2001–2007 term. The effects of missing data
are unknown, since we are unable to compare the observed
network data with a data set that includes all research
fellows, professors and lecturers. The Biology and
Chemistry networks have different numbers of links,
despite having similar population sizes: the Chemistry
collaboration network had a density of 0.11 with 45
nodes, compared with a density of 0.05 in the Biology
collaboration network with 50 nodes. Therefore, caution
should be exercised when making comparisons
between the two networks. Constructing a scientific
collaboration network is a time-consuming process,
making it difficult to produce a number of replicates for
each network.31
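Density and mean degree are linked by a simple identity, which can be used to cross-check the densities quoted here against the average degrees in Table 4. For an undirected binary network with n nodes and m ties, and assuming density is defined as the proportion of possible ties that are present:

```latex
D = \frac{2m}{n(n-1)} = \frac{\langle k \rangle}{n-1}, \qquad
D_{\text{Chemistry}} = \frac{4.844}{44} \approx 0.110, \qquad
D_{\text{Biology}} = \frac{2.6}{49} \approx 0.053
```

Expressed as proportions, roughly 11% of the possible ties are present in the Chemistry network and roughly 5% in the Biology network.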
The boundary specification problem35 refers to the inclusion
criteria for authors in social networks. Collaborations with
authors outside the Biology and Chemistry departments at
the University of York were not recorded in the data sets.
Thus, we cannot comment on the total degree of an author,
only on the degree in relation to other authors in the
Biology and Chemistry departments.
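The way a tie is defined and the effect of the network boundary can both be seen in a small sketch. It is a hypothetical illustration rather than the pipeline used in this study: the function name, author labels and paper lists are invented. A tie is counted once for every paper a pair of roster members co-authored, and any co-author who is not on the departmental roster is dropped.

```python
from collections import Counter
from itertools import combinations

def coauthorship_ties(papers, roster):
    """Weighted ties between roster members.

    papers: iterable of author lists, one list per publication.
    roster: set of authors inside the network boundary.
    Returns a Counter mapping (author_a, author_b) to the number of
    papers the pair co-authored.
    """
    ties = Counter()
    for authors in papers:
        # Keep only authors inside the boundary; external co-authors vanish.
        included = sorted(set(authors) & roster)
        for pair in combinations(included, 2):
            ties[pair] += 1
    return ties

# Hypothetical data: X and Y are collaborators outside the departments.
roster = {"A", "B", "C"}
papers = [["A", "B", "X"], ["A", "B"], ["B", "C", "Y"], ["C"]]

ties = coauthorship_ties(papers, roster)
print(ties[("A", "B")])  # 2
print(ties[("B", "C")])  # 1
print(len(ties))         # 2
```

In this example A's collaboration with X never appears in the network, which is the sense in which only the within-boundary degree of an author can be reported.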
Random Graphs
Comparison of measures for observed network data with a
random distribution of the measure allows us to introduce a
control. If we did not compare our data with a random
graph, we would not know whether observed data were a
random occurrence or not. One thousand replications of
the ER graph are adequate to investigate whether the
Biology and Chemistry collaboration network differs from
a random distribution (Fig. 12). However, we cannot
statistically infer whether the Biology and Chemistry
collaboration network deviates significantly from the
random model because of the small sample size.36 Thus, we
have only carried out a descriptive analysis of the
networks. Although the data are rich in information,
generalizations about social relationships cannot be made.
A problem of comparing social network data with a
random graph is that random graphs poorly reflect social net-
works. The Biology and Chemistry collaboration networks
violate the independence assumption of Gn,p graphs
because collaboration networks demonstrate transitivity
where nodes are dependent on others: a property that ER
random graphs lack.28 However, given the tools available
to compare observed network data with a null model, the
ER random model was adequate. Similarly, because of the
small sample size, care should be taken when attempting to
fit the network data to a network model such as the
small-world model.
Conclusions
We conclude that scientific collaborators, quantified using
the Biology and Chemistry collaboration networks, seek
expertise from their colleagues in a non-random way to
fulfil a research goal. Highly connected authors have poten-
tial to influence whether collaboration occurs or not. Some
authors may not be well connected but influence collabor-
ation by acting as a communication point between two
highly connected authors. Publication records have provided
a documented source of information about professional
relationships among researchers. Future directions could
include investigating how scientific collaboration changes
over time in the Chemistry and Biology departments. Will
the same authors be identified as brokers or as highly
connected in 4 years’ time? Similarly, any change in foci
membership over time could be investigated, along with
whether this affects the practice of interdisciplinary research.
Acknowledgements
The author is grateful to Dr Daniel Franks and Dr Leo Caves
for providing feedback.
Figure 12. Frequency distribution of 1000 ER random graphs measuring flow betweenness. The graph peaks at the 112–114 interval with 241/1000 having an average flow betweenness in this range. The 130–132 and 97–99 intervals had the lowest frequency of 2 and 3, respectively. The shape of the line deviates slightly from the Gn,p frequency distribution with a sharp peak at 112–114 opposed to a rounded peak. This may be because too few data points were used.
Funding
Funding to undertake this research was provided by the
Department of Biology, University of York.
References
1. Watts DJ, Strogatz SH (1998) Collective dynamics of ‘small-world’
networks. Nature 393: 440–442.
2. NIH. http://nihroadmap.nih.gov/interdisciplinary (26 February 2008).
3. Birnholtz JP (2006) What does it mean to be an author? The intersection of
credit, contribution, and collaboration in science. J Am Soc Inf Sci Technol 57:
1758–70.
4. Cummings J, Kiesler S (2007) Coordination costs and project outcomes in
multi-university collaborations. Res Policy 36: 1620.
5. Newman MEJ (2001) Scientific collaboration network 1. Network construc-
tion and fundamental results. Phys Rev E 64: 016131.
6. Knoke D, Kuklinski JH (1983) Network Analysis, 2nd ed. Thousand Oaks: Sage
University Paper.
7. Freeman LC, Borgatti SP, White DR (1991) Centrality in valued graphs:
a measure of betweenness based on network flow. Soc Networks 13:
141–154.
8. Newman MEJ (2004) Coauthorship networks and patterns of scientific
collaboration. Proc Natl Acad Sci USA 101: 5200–5205.
9. Newman MEJ, Barabasi AL, Watts DJ (2006) The Structure and Dynamics of
Networks. Princeton NJ: Princeton University Press.
10. Barabasi AL, Oltvai ZN (2004) Network biology: understanding the cell’s
functional organization. Nat Rev Genet 5: 101–113.
11. Newman MEJ (2001) The structure of scientific collaboration networks. Proc
Natl Acad Sci USA 98: 404–409.
12. Sheldon T, Canning AM, Demaine R et al. (2008) Information Needs for a
World Class University. York: University of York, Information strategy group.
13. www.transit.york.ac.uk (15 November 2007).
14. http://isiknowledge.com/wos (10 November 2007).
15. Borgatti SP, Everett MG, Freeman LC (2002) Ucinet 6.0 Version 6.175. Harvard,
MA: Analytic Technologies.
16. Persson O (2007) Bibexcel, Bibliometric Tool. Version 2007-10-30. Inforsk.
http://www.umu.se/inforsk/Bibexcel/.
17. Microsoft Excel (2007).
18. http://bioltfws1.york.ac.uk/biostaff/staff.php (29 October 2007).
19. http://www.york.ac.uk/depts/chem/staff/staff.html (3 January 2008).
20. Endnote Reference Manager http://www.endnote.com/ (5 January 2008).
21. Aho AV, Kernighan BW, Weinberger PJ (1988) The Awk Programming
Language. Reading, MA: Addison-Wesley Publication Company.
22. Caves L. Personal Communication, March 3, 2008.
23. Borgatti SP, Freeman LC, Everett MG (2005) UCINET 5 for Windows, Software
for Social Network Analysis, User Guide. Harvard, MA: Analytic Technologies.
24. Borgatti SP (2008) A Brief Guide to Netdraw. Harvard, MA: Analytic
Technologies.
25. Wellman B, Berkowitz SD (1988) Social Structures: A Network Approach, 1st
ed. New York: Cambridge University Press.
26. Freeman LC (1979) Centrality in social networks: conceptual clarification. Soc
Networks 1: 215–239.
27. Hanneman R, Riddle M (2005) Introduction to Social Network Methods.
Riverside, CA: University of California.
28. Newman MEJ, Strogatz SH, Watts DJ (2001) Random graphs with arbitrary
degree distributions. Phys Rev E 64: 026118.
29. Manly BFJ (1997) Randomization, Bootstrap and Monte Carlo Methods in
Biology, 2nd ed. London: Chapman & Hall.
30. Robins G, Pattison P, Kalish Y et al. (2007) An introduction to exponential
random graph (p*) models for social networks. Soc Networks 29: 173–191.
31. James R, Croft D, Krause J (2009) Potential banana skins in animal social
network analysis. Behav Ecol Sociobiol in press.
32. Newman MEJ (2003) Random graphs as models of networks. In Bornholdt S,
Schuster HG eds, Handbook of Graphs and Networks. Berlin: Wiley VCH.
33. Cluxton LD (2004) Scientific authorship, Part 2. History, recurring issues, prac-
tices and guidelines. Mutat Res 589: 31–45.
34. Fuchs S (1992) The Professional Quest for Truth: A Social Theory of Science and
Knowledge. Albany, NY: State University of New York Press.
35. Kossinets G (2006) Effects of missing data in social networks. Soc Networks
28: 247–268.
36. Coolican H (2004) Research Methods and Statistics in Psychology, 4th ed.
London: Hodder Arnold.
Author Biography
Leana Bellanca studied for a BSc (Hons) degree in Molecular Cell Biology at the University of York. She developed interests in
Systems Biology, including network theory and relational data analysis. Leana is also interested in Immunology and
Epidemiology. She is employed as a Medical Information Officer at Professional Information, Richmond, North
Yorkshire. This involves drug safety reporting and providing information about medicinal products to healthcare
professionals and patients. Leana intends to remain in this field of work.
Submitted on 30 September 2008; accepted on 18 December 2008; advance access publication 8 April 2009