Package ‘Diderot’
April 19, 2020
Type Package
Title Bibliographic Network Analysis
Version 0.13
Date 2020-04-16
Author Christian Vincenot
Maintainer Christian Vincenot <[email protected]>
Depends R (>= 3.0.1), foreach, graphics, igraph
Imports stringi, RCurl, splitstackshape, data.table, utils, stats, parallel, doParallel
Description Enables the user to build a citation network/graph from bibliographic data and, based on modularity and heterocitation metrics, assess the degree of awareness/cross-fertilization between two corpora/communities. This toolset is optimized for Scopus data.
License GPL (>= 2)
NeedsCompilation no
Repository CRAN
Date/Publication 2020-04-19 11:20:02 UTC
R topics documented:
Diderot-package
build_graph
compute_BC_ranking
compute_citation_ranking
compute_custom_modularity
compute_Ji
compute_Ji_ranking
compute_modularity
create_bibliography
get_date_from_doi
get_reference_title
heterocitation
heterocitation_authors
load_graph
MC_baseline_distribution
nb_refs
plot_authors_count
plot_heterocitation_timeseries
plot_modularity_timeseries
plot_publication_curve
precompute_heterocitation
save_graph
significance_Dx
Index
Diderot-package Bibliographic Network Analysis Package
Description
Denis Diderot (1713-1784), French philosopher and co-founder of the modern encyclopedia.
This package makes it possible to detect and quantify the unification or separation of two bibliographic corpora through the creation of citation networks. This tool can be used to study the spread of concepts across scientific disciplines, or the fusion/fission of scientific communities.
Details
Package: Diderot
Type: Package
Version: 0.13
Date: 2020-04-17
License: GPL (>=2)
A typical workflow with the package comprises the following steps.
First, literature metadata, including references, from the two fields of study to analyze are downloaded from Scopus (or built manually). This data is imported to create a bibliographic dataset using create_bibliography.
Second, a graph is created with a call to build_graph to reproduce the citation network in the bibliographic dataset.
Finally, statistical analysis can be performed on the graph to assess the fusion/fission state of the two corpora/communities. Heterocitation indices (i.e. share and balance) show how much publications or authors cite papers from the other corpus (see heterocitation and heterocitation_authors respectively). Such analysis must always be preceded by a call to precompute_heterocitation to perform initial calculations. These metrics are complemented by traditional as well as custom modularity metrics (see compute_modularity and compute_custom_modularity respectively) that express how strongly the communities are separated. Publications that foster mutual awareness and cross-fertilization between the corpora/communities can be identified using the usual betweenness centrality metric (see compute_BC_ranking) and the Ji index (see compute_Ji_ranking).
Author(s)
Christian Vincenot
Maintainer: Christian Vincenot ([email protected])
See Also
igraph
Examples
## Not run:
# Two corpora on individual-based modelling (IBM) and agent-based modelling (ABM)
# were downloaded from Scopus. The structure of each corpus is as follows:
tt<-read.csv("IBMmerged.csv", stringsAsFactors=FALSE)
str(tt,strict.width="cut")
### 'data.frame': 3184 obs. of 9 variables:
### $ Authors : chr "Chen J., Marathe A., Marathe M." "Van Dijk D., Sl"..
### $ Title : chr "Coevolution of epidemics, social networks, and in"..
### $ Year : int 2010 2010 2010 2010 2010 2010 2010 2010 2010 2010 ...
### $ DOI : chr "10.1007/978-3-642-12079-4_28" "10.1016/j.procs.20"..
### $ Link : chr "http://www.scopus.com/inward/record.url?eid=2-s2."..
### $ Abstract : chr "This research shows how a limited supply of antiv"..
### $ Author.Keywords: chr "Antiviral; Behavioral economics; Epidemic; Microe"..
### $ Index.Keywords : chr "Antiviral; Behavioral economics; Epidemic; Microe"..
### $ References : chr "(2009) Centre Approves Restricted Retail Sale of "..

# Define the name of corpora (labels) and specific keywords to identify relevant
# publications (keys).
labels<-c("IBM","ABM")
keys<-c("individual-based model|individual based model",
        "agent-based model|agent based model")

# Build the IBM-ABM bibliographical dataset from Scopus exports
db<-create_bibliography(corpora_files=c("IBMmerged.csv","ABMmerged.csv"),
                        labels=labels, keywords=keys)
### [1] "File IBMmerged.csv contains 3184 records"
### [1] "File ABMmerged.csv contains 9641 records"

# Build and save citation graph
gr<-build_graph(db=db,small.year.mismatch=TRUE,fine.check.nb.authors=2,
                attrs=c("Corpus","Year","Authors", "DOI"))
### [1] "Graph built! Execution time: 1200.22 seconds."
save_graph(gr, "graph.graphml")

# Precompute heterocitation (this defines gr_sx, which is used in all calls below),
# then compute and plot publication heterocitation
gr_sx<-precompute_heterocitation(gr,labels=labels,infLimitYear=1987, supLimitYear=2018)
###[1] "Summary of the nodes considered for computation (1987-2017)"
###[1] "-----------------------------------------------------------"
###[1] "IBM ABM IBM|ABM"
###[1] "1928 5378 153"
###[1]
###[1] "Edges summary"
###[1] "-------------"
###[1] "IBM->IBM/IBM->Other 5583/1086 => Prop 0.163"
###[1] "ABM->ABM/ABM->Other 16946/2665 => Prop 0.136"
###[1] "General Same/Diff 22529/3751 => Prop 0.143"
###[1]
###[1] "Heterocitation metrics"
###[1] "----------------------"
###[1] "Sx ALL / IBM / ABM"
###[1] "0.127 / 0.137 / 0.124"
###[1] "Dx ALL / IBM / ABM"
###[1] "-0.652 / -0.803 / -0.598"
heterocitation(gr_sx, labels=labels, 1987, 2005)
###[1] "Sx ALL / ABM / IBM"
###[1] "0.047 / 0.214 / 0.007"
###[1] "Dx ALL / ABM / IBM"
###[1] "-0.927 / -0.690 / -0.982"
plot_heterocitation_timeseries(gr_sx, labels=labels, mini=-1, maxi=-1, cesure=2005)

# Compute and plot modularity
compute_modularity(gr_sx, 1987, 2018)
###[1] 0.3164805
plot_modularity_timeseries(gr_sx, 1987, 2018, window=1000)

# Compute author heterocitation
hetA<-heterocitation_authors(gr_sx, 1987, 2018, pub_threshold=4)
head(hetA[order(hetA$avgDx,decreasing=TRUE),c(1)], n=10)
### [1] "Ashlock D." "Evora J." "Hernandez J.J." "Hernandez M." "Gooch K.J."
### [6] "Reinhardt J.W." "Ng K." "Kazanci C." "Senior A.M." "Ariel G."

# Identify which publications are most impactful in terms of cross-fertilization
jir<-compute_Ji_ranking(gr_sx, labels=labels, 1987, 2018)
head(jir[,c(2,7)],n=3)
### Title Ji
### 758 A standard protocol for describing individual-based and agent-based models 200
### 4437 Pattern-oriented modeling of agent-based complex systems: Lessons from ecology 134
### 33 The ODD protocol: A review and first update 120
## End(Not run)
build_graph Builds a citation graph.
Description
Builds a citation graph based on a database of bibliographic records generated with create_bibliography. This process is automatically parallelized on multicore hardware. By default, matching between title and references is done based on the full title, publication year, and first three authors. Publication attributes present in the data frame can be copied to graph nodes using the attrs argument.
Usage
build_graph(db, title = "Cite Me As", year = "Year", authors = "Authors",
            ref = "Cited References", set.title.as.name = F, attrs = NULL,
            verbose = F, makeCluster.type = "PSOCK", nb.cores=NA,
            fine.check.threshold = 1000, fine.check.nb.authors = 3,
            small.year.mismatch = T, debug = F)
Arguments
db Bibliographic database created with create_bibliography.
title Name of the data frame column in which publication titles are listed.
year Name of the data frame column in which publication years are listed.
authors Name of the data frame column in which publication authors are listed.
ref Name of the data frame column in which publication references are listed.
set.title.as.name Set graph vertex ID to publication title.
attrs Attributes of the bibliographic database (i.e. data frame column names, such as "Authors", "Year") to be set as vertex attributes.
verbose Verbosity flag triggering a more detailed output during graph building.
makeCluster.type Type of cluster to be used to parallelize the graph building process. For more options, see makeCluster in the parallel package.
nb.cores Number of cores to be used for parallel computation.
fine.check.threshold Title length under which citation matching is further confirmed based on publication year. This value can be reduced to increase performance on large bibliographic databases. With the default value (1000), the publication year check is in practice always performed.
fine.check.nb.authors Maximum number of authors to check against for citation matching. This value can be reduced to increase performance on large bibliographic databases. Default value is three authors.
small.year.mismatch Flag indicating whether small year mismatches (+/- 1 year) should be tolerated. It is recommended to keep this flag set to TRUE to accommodate common inconsistencies in bibliographic databases.
debug Debug flag allowing the user to browse function calls upon execution error. For more details, see recover in the utils library.
Value
Returns a graph object.
Author(s)
Christian Vincenot ([email protected])
See Also
create_bibliography
Examples
labels<-c("Corpus1","Corpus2")

# Build a bibliographical dataset from Scopus exports
db<-create_bibliography(corpora_files=c(tempfi1,tempfi2),
                        labels=labels, keywords=NA)

# Build graph
gr<-build_graph(db=db,small.year.mismatch=TRUE, attrs=c("Corpus","Year","Authors"), nb.cores=1)
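The year tolerance controlled by small.year.mismatch can be pictured with a small base R sketch. The helper years_match below is hypothetical and is not part of Diderot; it only illustrates the +/- 1 year rule described under Arguments, not the package's actual matching code.

```r
# Hypothetical helper (not a Diderot function): TRUE when the year found in a
# reference is compatible with a record's year, optionally tolerating a gap
# of one year in either direction.
years_match <- function(record_year, cited_year, small_mismatch = TRUE) {
  tolerance <- if (small_mismatch) 1 else 0
  abs(record_year - cited_year) <= tolerance
}

years_match(2010, 2011)                          # tolerated when the flag is TRUE
years_match(2010, 2011, small_mismatch = FALSE)  # rejected under strict matching
years_match(2010, 2012)                          # two years apart: rejected either way
```

A strict comparison would drop citations whose year differs only because of online-first versus print dates, which is presumably why the manual recommends keeping the flag set to TRUE.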
compute_BC_ranking Function to compute the Betweenness Centrality ranking.
Description
This function computes the Betweenness Centrality (BC) for each graph node (i.e. publication).
Usage
compute_BC_ranking(gr, labels, write_to_graph = F)
Arguments
gr Citation graph
labels Labels (i.e. names) of the two corpora featured in the graph.
write_to_graph Flag to indicate whether to write results to the graph (i.e. save BC values as node attributes).
Value
If write_to_graph is FALSE, returns a list of entries (authors, title, year, corpus, BC) sorted by decreasing BC. Else, returns the graph given as input to which BC values are added as node attributes.
Author(s)
Christian Vincenot ([email protected])
See Also
compute_citation_ranking, compute_Ji_ranking
Examples
labels<-c("Corpus1","Corpus2")

# Build a bibliographical dataset from Scopus exports
db<-create_bibliography(corpora_files=c(tempfi1,tempfi2),
                        labels=labels, keywords=NA)

# Build graph
gr<-build_graph(db=db,small.year.mismatch=TRUE, attrs=c("Corpus","Year","Authors"), nb.cores=1)

# Compute BC
compute_BC_ranking(gr, labels)
compute_citation_ranking
Function to compute the citation ranking.
Description
This function computes the citation count for each graph node (i.e. publication).
Usage
compute_citation_ranking(gr, labels, write_to_graph = F)
Arguments
gr Citation graph
labels Labels (i.e. names) of the two corpora featured in the graph.
write_to_graph Flag to indicate whether to write results to the graph (i.e. save citation count values as node attributes).
Value
If write_to_graph is FALSE, returns a list of entries (authors, title, year, corpus, citations) sorted by decreasing citation count. Else, returns the graph given as input to which citation count values are added as node attributes.
Author(s)
Christian Vincenot ([email protected])
See Also
compute_BC_ranking, compute_Ji_ranking
Examples
labels<-c("Corpus1","Corpus2")

# Build a bibliographical dataset from Scopus exports
db<-create_bibliography(corpora_files=c(tempfi1,tempfi2),
                        labels=labels, keywords=NA)

# Build graph
gr<-build_graph(db=db,small.year.mismatch=TRUE, attrs=c("Corpus","Year","Authors"), nb.cores=1)

# Compute Citation Ranking
compute_citation_ranking(gr, labels)
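As a rough illustration of what a citation ranking amounts to, the following base R sketch counts incoming citations on a toy edge list. The data and column names are invented for the example, and the sketch does not reproduce the output format of compute_citation_ranking.

```r
# Toy edge list (hypothetical data): each row means "from cites to".
edges <- data.frame(
  from = c("P1", "P2", "P3", "P3", "P4"),
  to   = c("P0", "P0", "P0", "P1", "P1"),
  stringsAsFactors = FALSE
)

# Citation count of a publication = number of incoming edges.
citations <- sort(table(edges$to), decreasing = TRUE)

# Present the result as a ranking, most cited first.
ranking <- data.frame(
  Title = names(citations),
  Citations = as.integer(citations),
  stringsAsFactors = FALSE
)
print(ranking)
```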
compute_custom_modularity
Function to compute the custom modularity of a citation graph over a time window.
Description
This function computes the custom modularity of the citation graph. Custom modularity here stands for Newman’s modularity computed over the subgraph comprising nodes that belong to a single corpus only and are within the time window, as well as direct outgoing neighbors of the former nodes (whatever their year tag). Basically, the citation graph considered thus includes publications within the time window as well as older papers that they cite.
Usage
compute_custom_modularity(gr, infLimitYear, supLimitYear)
Arguments
gr Citation graph
infLimitYear Start year of the time window considered (included)
supLimitYear End year of the time window considered (*excluded*)
Value
Returns the custom modularity value of the subgraph restricted to the interval [infLimitYear;supLimitYear[.
Author(s)
Christian Vincenot ([email protected])
See Also
compute_modularity, plot_modularity_timeseries
Examples
labels<-c("Corpus1","Corpus2")

# Build a bibliographical dataset from Scopus exports
db<-create_bibliography(corpora_files=c(tempfi1,tempfi2),
                        labels=labels, keywords=NA)

# Build graph
gr<-build_graph(db=db,small.year.mismatch=TRUE, attrs=c("Corpus","Year","Authors"), nb.cores=1)

# Compute Custom Modularity
compute_custom_modularity(gr, 1990, 2018)
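The node selection described above (single-corpus nodes inside the half-open window, plus the papers they cite) can be sketched in base R. The node table, edge list and the "A | B" joint label are toy assumptions modelled on the "IBM | ABM" tag shown elsewhere in this manual; the sketch does not call Diderot and stops before the modularity computation itself.

```r
# Toy node table and edge list (hypothetical data).
nodes <- data.frame(
  id     = c("A1", "A2", "B1", "J1", "OLD"),
  Corpus = c("A", "A", "B", "A | B", "B"),
  Year   = c(2001, 2005, 2003, 2002, 1985),
  stringsAsFactors = FALSE
)
edges <- data.frame(
  from = c("A1", "B1", "A2"),
  to   = c("OLD", "A1", "B1"),
  stringsAsFactors = FALSE
)

inf <- 2000
sup <- 2005  # window [2000; 2005[ : the year 2005 itself is excluded

# Nodes tagged with a single corpus and published inside the window.
single <- !grepl("|", nodes$Corpus, fixed = TRUE)
core <- nodes$id[single & nodes$Year >= inf & nodes$Year < sup]

# Direct outgoing neighbors of those nodes, whatever their year.
cited <- edges$to[edges$from %in% core]

# Node set of the subgraph on which modularity would then be computed.
kept <- union(core, cited)
print(kept)
```

Here A2 is dropped because its year equals supLimitYear, J1 because it belongs to both corpora, and OLD is kept only because an in-window paper cites it.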
compute_Ji Function to compute the evolution of the Ji metric for a term (e.g. publication title) present in the reference list of a bibliographic dataset.
Description
This function computes the Ji metric for a given term from a bibliographic dataset and returns its annual evolution within the timeframe specified. This metric indicates how much the term (e.g. publication title, software name) is cited simultaneously in the references of both corpora and is thus important for cross-fertilization between the two communities. This function is run on the bibliographic dataset (created with create_bibliography) and is thus useful before graph creation or when the term to be searched is not the title of a node in the resulting graph. For instance, a publication (or, e.g., software or a scientific database referenced only through a URL or grey literature) may be cited and have an impact on cross-fertilization between the two communities (the literature of which is represented by the two corpora) without having its own entry in the bibliographic database. It would therefore not be featured as a node in the graph created by build_graph, and the compute_Ji function can be used to assess its importance.
Usage
compute_Ji(db, pubtitle, labels, from = -1, to = -1)
Arguments
db Bibliographic database created with create_bibliography.
pubtitle Publication title, or more generally term to be searched (e.g. software name).
labels Labels (i.e. names) of the two corpora featured in the graph.
from Start year of the time window considered (included)
to End year of the time window considered (*excluded*)
Value
Dataframe containing year and Ji metric value.
Author(s)
Christian Vincenot ([email protected])
See Also
create_bibliography
Examples
labels<-c("Corpus1","Corpus2")

# Build a bibliographical dataset from Scopus exports
db<-create_bibliography(corpora_files=c(tempfi1,tempfi2),
                        labels=labels, keywords=NA)

# Compute Ji
compute_Ji(db, "Title1", labels, from=1990, to=2018)
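This manual does not state the formula of the Ji metric, so the sketch below only illustrates the raw ingredient such a metric needs: how often a term appears in the reference lists of each corpus, per year. The data frame, its column names (Refs in particular) and the search term are assumptions made up for the example.

```r
# Toy bibliographic records (hypothetical data and column names).
db_toy <- data.frame(
  Corpus = c("A", "A", "B", "B"),
  Year   = c(2000, 2001, 2001, 2001),
  Refs   = c("Smith J., NetLogo (1999); Doe A., Other work",
             "NetLogo user manual",
             "Roe B., Unrelated study",
             "Poe C., NetLogo (1999)"),
  stringsAsFactors = FALSE
)

term <- "NetLogo"

# Records whose reference list mentions the term (literal match, no regex).
cites_term <- grepl(term, db_toy$Refs, fixed = TRUE)

# Number of citing records per year and per corpus.
counts <- table(Year = db_toy$Year[cites_term],
                Corpus = db_toy$Corpus[cites_term])
print(counts)
```

A term cited by both corpora in the same year (2001 in this toy data) is the kind of shared reference the description associates with cross-fertilization.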
compute_Ji_ranking Function to compute the Ji metric ranking for publications in a citation graph.
Description
This function computes the Ji metric for each graph node (i.e. publication). This metric indicates how much a publication is cited simultaneously by both corpora and is thus important for cross-fertilization between the two communities.
Usage
compute_Ji_ranking(gr, labels, infLimitYear, supLimitYear, write_to_graph=F)
Arguments
gr Citation graph
labels Labels (i.e. names) of the two corpora featured in the graph.
infLimitYear Start year of the time window considered (included)
supLimitYear End year of the time window considered (*excluded*)
write_to_graph Flag to indicate whether to write results to the graph (i.e. save Ji values as node attributes).
Value
If write_to_graph is FALSE, returns a list of entries (authors, title, year, corpus, citations from corpus 1, citations from corpus 2, Ji) sorted by decreasing Ji. Else, returns the graph given as input to which Ji values are added as node attributes.
Author(s)
Christian Vincenot ([email protected])
See Also
build_graph, precompute_heterocitation, compute_Ji
Examples
labels<-c("Corpus1","Corpus2")

# Build a bibliographical dataset from Scopus exports
db<-create_bibliography(corpora_files=c(tempfi1,tempfi2),
                        labels=labels, keywords=NA)

# Build graph
gr<-build_graph(db=db,small.year.mismatch=TRUE, attrs=c("Corpus","Year","Authors"), nb.cores=1)

# Compute Ji ranking
compute_Ji_ranking(gr, labels, 1990, 2018)
compute_modularity Function to compute citation graph modularity over a time window.
Description
This function computes Newman’s modularity of the citation graph restricted to a given time window and ignoring nodes belonging to both corpora simultaneously.
Usage
compute_modularity(gr, infLimitYear, supLimitYear)
Arguments
gr Citation graph
infLimitYear Start year of the time window considered (included)
supLimitYear End year of the time window considered (*excluded*)
Value
Returns Newman’s modularity value of the subgraph restricted to the interval [infLimitYear;supLimitYear[.
Author(s)
Christian Vincenot ([email protected])
See Also
compute_custom_modularity, plot_modularity_timeseries
Examples
labels<-c("Corpus1","Corpus2")

# Build a bibliographical dataset from Scopus exports
db<-create_bibliography(corpora_files=c(tempfi1,tempfi2),
                        labels=labels, keywords=NA)

# Build graph
gr<-build_graph(db=db,small.year.mismatch=TRUE, attrs=c("Corpus","Year","Authors"), nb.cores=1)

# Compute Modularity
compute_modularity(gr, 1990, 2018)
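For readers unfamiliar with Newman's modularity, the base R sketch below evaluates it by hand for a two-group partition, using Q = sum over groups of (e_in - a^2), where e_in is the fraction of edges inside the group and a is the fraction of edge ends attached to it. Treating citation edges as undirected is an assumption of this sketch; the function modularity_by_corpus and the toy data are hypothetical, and the package may compute the value differently (for instance through igraph).

```r
# Toy citation edges and corpus membership (hypothetical data).
edges <- data.frame(
  from = c("a1", "a2", "a3", "b1", "b2", "a3"),
  to   = c("a2", "a3", "a1", "b2", "b3", "b1"),
  stringsAsFactors = FALSE
)
corpus <- c(a1 = "A", a2 = "A", a3 = "A", b1 = "B", b2 = "B", b3 = "B")

# Hypothetical helper: undirected Newman modularity for a given partition.
modularity_by_corpus <- function(edges, membership) {
  m <- nrow(edges)
  g_from <- membership[edges$from]
  g_to <- membership[edges$to]
  q <- 0
  for (g in unique(membership)) {
    e_in <- sum(g_from == g & g_to == g) / m                 # edges inside group g
    a <- (sum(g_from == g) + sum(g_to == g)) / (2 * m)       # edge ends in group g
    q <- q + e_in - a^2
  }
  q
}

q <- modularity_by_corpus(edges, corpus)
print(q)  # 46/144, roughly 0.319, for this toy graph
```

Values close to 0 mean the two corpora cite each other about as much as a random wiring would, while higher values indicate more separated communities.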
create_bibliography Function to create a bibliographic dataset
Description
This function creates a bibliographic dataset based on two external corpus files, each representing the bibliography of a given domain.
Usage
create_bibliography(corpora_files, labels, keywords, retrieve_pubdates = F,
                    clean_refs = F, encoding = NULL)
Arguments
corpora_files Vector containing the paths to two corpus files (e.g. Scopus exports). The CSV files should contain for each record at least Authors (comma separated), Publication Title, Publication Year, and References (semicolon separated). The inclusion of DOI (for date checking; see the retrieve_pubdates option) as well as Abstract, Author.Keywords, and Index.Keywords (for the in-depth identification of publications belonging to both corpora) is strongly recommended.
labels Labels (i.e. names) given to the two corpora to be analyzed.
keywords Keywords identifying the two corpora.
retrieve_pubdates Flag indicating whether to confirm publication dates by retrieving them (see get_date_from_doi).
clean_refs Attempt to clean references and keep titles only. NOT RECOMMENDED, especially if build_graph should be used subsequently.
encoding Character encoding used in the input files.
Value
Returns a data frame containing a bibliographic dataset usable by Diderot and including all references from both corpora.
Author(s)
Christian Vincenot ([email protected])
See Also
build_graph
Examples
## Not run:
# Two corpora on individual-based modelling (IBM) and agent-based modelling (ABM)
# were downloaded from Scopus. The structure of each corpus is as follows:
tt<-read.csv("IBMmerged.csv", stringsAsFactors=FALSE)
str(tt,strict.width="cut")
### 'data.frame': 3184 obs. of 9 variables:
### $ Authors : chr "Chen J., Marathe A., Marathe M." "Van Dijk D., Sl"..
### $ Title : chr "Coevolution of epidemics, social networks, and in"..
### $ Year : int 2010 2010 2010 2010 2010 2010 2010 2010 2010 2010 ...
### $ DOI : chr "10.1007/978-3-642-12079-4_28" "10.1016/j.procs.20"..
### $ Link : chr "http://www.scopus.com/inward/record.url?eid=2-s2."..
### $ Abstract : chr "This research shows how a limited supply of antiv"..
### $ Author.Keywords: chr "Antiviral; Behavioral economics; Epidemic; Microe"..
### $ Index.Keywords : chr "Antiviral; Behavioral economics; Epidemic; Microe"..
### $ References : chr "(2009) Centre Approves Restricted Retail Sale of "..

# Define the name of corpora (labels) and specific keywords to identify relevant
# publications (keys).
labels<-c("IBM","ABM")
keys<-c("individual-based model|individual based model",
        "agent-based model|agent based model")

# Build the IBM-ABM bibliographical dataset from Scopus exports
db<-create_bibliography(corpora_files=c("IBMmerged.csv","ABMmerged.csv"),
                        labels=labels, keywords=keys)
### [1] "File IBMmerged.csv contains 3184 records"
### [1] "File ABMmerged.csv contains 9641 records"

# Processed output. Note the field name changes (for standardization with ISI Web
# of Knowledge format) and addition of the "Corpus" field (with identification of
# joint "IBM | ABM" publications based on keywords).
str(db, strict.width="cut")
### 'data.frame': 12504 obs. of 10 variables:
### $ Authors : chr "Chen J., Marathe A., Marathe M." "Van Dijk D., Sloot "..
### $ Cite Me As : chr "Coevolution of epidemics, social networks, and indivi"..
### $ Year : int 2010 2010 2010 2010 2010 2010 2010 2010 2010 2010 ...
### $ DOI : chr "10.1007/978-3-642-12079-4_28" "10.1016/j.procs.2010.0"..
### $ Link : chr "http://www.scopus.com/inward/record.url?eid=2-s2.0-78"..
### $ Abstract : chr "This research shows how a limited supply of antiviral"..
### $ Author.Keywords : chr "Antiviral; Behavioral economics; Epidemic; Microecono"..
### $ Index.Keywords : chr "Antiviral; Behavioral economics; Epidemic; Microecono"..
### $ Cited References: chr "(2009) Centre Approves Restricted Retail Sale of Tami"..
### $ Corpus : chr "IBM" "IBM | ABM" "IBM | ABM" "IBM" ...
## End(Not run)
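The keyword-based "Corpus" tagging shown in the processed output can be approximated as follows. This is a sketch under assumptions: the manual does not say which fields are searched or whether matching is case-sensitive, so the example searches a single invented text vector and ignores case.

```r
labels <- c("IBM", "ABM")
keys <- c("individual-based model|individual based model",
          "agent-based model|agent based model")

# Hypothetical titles standing in for the text fields of three records.
text <- c("An individual-based model of fish schooling",
          "Agent-based model and individual based model compared",
          "An agent-based model of traffic")

# Each key is a regular expression with alternatives separated by "|".
in_first <- grepl(keys[1], text, ignore.case = TRUE)
in_second <- grepl(keys[2], text, ignore.case = TRUE)

# Records matching both keys receive the joint label, e.g. "IBM | ABM".
corpus <- ifelse(in_first & in_second, paste(labels, collapse = " | "),
          ifelse(in_first, labels[1],
          ifelse(in_second, labels[2], NA_character_)))
print(corpus)
```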
get_date_from_doi Function to retrieve publication date based on Digital Object Identifier (DOI)
Description
This function retrieves the precise publication date by querying the Digital Object Identifier (DOI) web server. Alternatively, if extract_date_from_doi is set to TRUE, the function will first try to extract a publication year from the publication DOI string. If create_bibliography is called with retrieve_pubdates = TRUE, it calls get_date_from_doi for each record to confirm publication dates.
Usage
get_date_from_doi(doi, extract_date_from_doi)
Arguments
doi Character string representing the Digital Object Identifier (DOI) of the publication.
extract_date_from_doi Flag indicating whether to try to simply extract the publication year from the DOI string before resorting to online queries to the DOI server.
Value
Returns a date in YYYY-MM-DD format or YYYY-MM format if extract_date_from_doi is set to TRUE.
Note
Scopus records already contain the year of publication of scientific papers indexed. However, in some cases these are inaccurate and can be verified by comparing them with the date retrieved by this function. Note that
Author(s)
Christian Vincenot ([email protected])
See Also
create_bibliography
Examples
## Not run:
# Query publication date from DOI server
get_date_from_doi(doi="10.1016/j.procs.2010.04.250",extract_date_from_doi=FALSE)
## End(Not run)

# Extract date from DOI string
get_date_from_doi(doi="10.1016/j.procs.2010.04.250",extract_date_from_doi=TRUE)
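How a year might be pulled out of a DOI string can be sketched with a regular expression. The rule used here (a dot-delimited four-digit group starting with 19 or 20) is an assumption chosen for illustration; the manual does not document the pattern that get_date_from_doi actually applies, and extract_year_from_doi is a hypothetical helper.

```r
# Hypothetical helper (not a Diderot function): return the first plausible
# year embedded between dots in a DOI string, or NA if none is found.
extract_year_from_doi <- function(doi) {
  pattern <- "(?<=\\.)(19|20)[0-9]{2}(?=\\.)"
  hit <- regmatches(doi, regexpr(pattern, doi, perl = TRUE))
  if (length(hit) == 0) NA_integer_ else as.integer(hit)
}

extract_year_from_doi("10.1016/j.procs.2010.04.250")   # contains ".2010."
extract_year_from_doi("10.1007/978-3-642-12079-4_28")  # no embedded year
```

Many DOIs carry no year at all, which is consistent with the documented behaviour of falling back to an online query when extraction fails.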
get_reference_title Function to extract the publication title from a reference using Freecite
Description
Function to extract the publication title from a reference using an online query to Freecite
Usage
get_reference_title(str)
Arguments
str Character string representing a reference
Value
Returns a character string of the publication title
Author(s)
Christian Vincenot ([email protected])
heterocitation Function to calculate the heterocitation between two corpora
Description
This function calculates the heterocitation share and heterocitation balance between two corpora A and B in the time window specified. The heterocitation share (Sx) of a publication belonging to corpus A is defined as the percentage of citations to publications belonging to corpus B (or A|B) in its reference list. The global heterocitation share for corpus A is calculated as the average heterocitation share of the publications that corpus A contains (e.g. a value of 0.2 for corpus A indicates that, on average, only 20% of the references in corpus A publications point to papers from corpus B). The heterocitation balance metric (Dx), on the other hand, takes into consideration the respective sizes of corpus A and B to discern how much the heterocitation share deviates from the values expected under well-mixedness (i.e. if A and B originated from a single community; e.g. a value of -50% for corpus A indicates that, on average, publications in corpus A cite papers from corpus B half as frequently as expected, which suggests a lack of mutual awareness between the corpora and related communities).
Usage
heterocitation(gr, labels, infLimitYear, supLimitYear)
Arguments
gr Citation graph previously processed with precompute_heterocitation
labels Labels (i.e. names) of the two corpora featured in the graph.
infLimitYear Start year of the time window considered (included)
supLimitYear End year of the time window considered (*excluded*)
Value
Returns a numerical vector containing, in this order, the heterocitation share (Sx) for corpus A, B and global, and the heterocitation balance (Dx) for A, B and global.
Note
precompute_heterocitation should be called before running this function.
Author(s)
Christian Vincenot ([email protected])
See Also
precompute_heterocitation, plot_heterocitation_timeseries, heterocitation_authors, MC_baseline_distribution, significance_Dx
Examples
labels<-c("Corpus1","Corpus2")

# Build a bibliographical dataset from Scopus exports
db<-create_bibliography(corpora_files=c(tempfi1,tempfi2),
                        labels=labels, keywords=NA)

# Build graph
gr<-build_graph(db=db,small.year.mismatch=TRUE, attrs=c("Corpus","Year","Authors"), nb.cores=1)

# Heterocitation
gr<-precompute_heterocitation(gr,labels, 1990, 2018)
heterocitation(gr,labels, 1990, 2018)
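The definition of the heterocitation share given in the Description can be worked through on toy data. The sketch below covers Sx only, since the formula of the balance Dx is not given in this manual; the reference lists are invented and joint (A|B) papers are left out for simplicity.

```r
# Corpus of each citing paper (hypothetical data).
citing_corpus <- c(p1 = "A", p2 = "A", p3 = "B")

# Corpus of every paper cited in each reference list.
cited_corpora <- list(
  p1 = c("A", "A", "A", "B"),
  p2 = c("A", "B"),
  p3 = c("B", "B", "B", "B", "A")
)

# Sx of a paper: share of its citations that point outside its own corpus.
sx <- mapply(function(own, cited) mean(cited != own),
             citing_corpus, cited_corpora)

# Corpus-level Sx: average of the per-paper shares.
sx_by_corpus <- tapply(sx, citing_corpus, mean)

print(sx)
print(sx_by_corpus)
```

In this toy case corpus A has an Sx of 0.375 (the mean of 0.25 and 0.5), meaning that on average 37.5% of the references of its papers go to corpus B.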
heterocitation_authors
This function computes heterocitation metrics for authors
Description
This function computes heterocitation metrics for authors. The heterocitation share (Sx) and heterocitation balance (Dx) of an author are calculated as the average of these metrics for papers published by this author within the given time window. See the man page of heterocitation for definitions of heterocitation metrics.
Usage
heterocitation_authors(gr, infLimitYear, supLimitYear, pub_threshold = 0,
                       remove_orphans = F, remove_citations_to_joint_papers = F)
Arguments
gr Citation graph previously processed with precompute_heterocitation
infLimitYear Start year of the time window considered (included)
supLimitYear End year of the time window considered (*excluded*)
pub_threshold Minimum number of publications for authors to be considered.
remove_orphans Do not consider publications that do not cite any other paper in the dataset (i.e. orphan nodes in the citation network).
remove_citations_to_joint_papers Do not consider publications belonging to both corpora in the authors’ average corpus calculation.
Value
Returns a data frame containing author name ("Authors"), number of publications ("NbPubs"), list of publication years ("Years"), list of publication corpora ("Corpus"), list of publication heterocitation shares ("Sx"), list of publication heterocitation balances ("Dx"), average heterocitation share ("avgSx"), average heterocitation balance ("avgDx"), average corpus value of publications ("avgCorpus"), regression coefficient of the heterocitation share evolution ("coeffSx"), regression coefficient of the heterocitation balance evolution ("coeffDx"), regression coefficient of the evolution of the corpus value of publications ("coeffCorpus").
Note
precompute_heterocitation should be called before running this function.
Author(s)
Christian Vincenot ([email protected])
See Also
precompute_heterocitation, heterocitation
Examples
labels<-c("Corpus1","Corpus2")

# Build a bibliographical dataset from Scopus exports
db<-create_bibliography(corpora_files=c(tempfi1,tempfi2),
                        labels=labels, keywords=NA)

# Build graph
gr<-build_graph(db=db,small.year.mismatch=TRUE, attrs=c("Corpus","Year","Authors"), nb.cores=1)

# Heterocitation
gr<-precompute_heterocitation(gr,labels, 1990, 2018)

# Author heterocitation
heterocitation_authors(gr, 1990, 2018)
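The per-author averaging and the pub_threshold filter can be illustrated in base R on a toy table. The publication data are invented, the split on ", " assumes the Scopus-style author format shown elsewhere in this manual, and reading "minimum number of publications" as "greater than or equal to" is an assumption of the sketch.

```r
# Toy publications with a precomputed heterocitation share (hypothetical data).
pubs <- data.frame(
  Authors = c("Chen J., Marathe A.", "Chen J.", "Marathe A., Sloot P.", "Chen J."),
  Sx = c(0.2, 0.4, 0.1, 0.6),
  stringsAsFactors = FALSE
)

# One row per (author, publication) pair.
author_list <- strsplit(pubs$Authors, ", ", fixed = TRUE)
long <- data.frame(
  Author = unlist(author_list),
  Sx = rep(pubs$Sx, lengths(author_list)),
  stringsAsFactors = FALSE
)

# Publication count and average Sx per author.
nb_pubs <- tapply(long$Sx, long$Author, length)
avg_sx <- tapply(long$Sx, long$Author, mean)

# Keep only authors with at least pub_threshold publications.
pub_threshold <- 2
keep <- names(nb_pubs)[nb_pubs >= pub_threshold]
result <- data.frame(
  Authors = keep,
  NbPubs = as.integer(nb_pubs[keep]),
  avgSx = as.numeric(avg_sx[keep]),
  stringsAsFactors = FALSE
)
print(result)
```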
load_graph Function to load a citation graph
Description
This function loads a citation graph saved on the filesystem.
Usage
load_graph(filename)
Arguments
filename File to load
Value
Returns a graph object.
Note
This function basically supports only graphs previously saved with Diderot’s save_graph. However, as the file is actually a graphml file handled by igraph, advanced users may use this function on appropriate graphs created elsewhere, as long as they respect Diderot’s structure (presence of a "Corpus" field, etc).
Author(s)
Christian Vincenot ([email protected])
See Also
save_graph
Examples
labels<-c("Corpus1","Corpus2")

# Build a bibliographical dataset from Scopus exports
db<-create_bibliography(corpora_files=c(tempfi1,tempfi2),
                        labels=labels, keywords=NA)

# Build graph
gr<-build_graph(db=db,small.year.mismatch=TRUE, attrs=c("Corpus","Year","Authors"), nb.cores=1)

## Not run:
save_graph(gr, "Saved.graphml")

# Load saved graph
gr<-load_graph("Saved.graphml")
## End(Not run)
MC_baseline_distribution
Function to compute baseline heterocitation values for the graph under study with random permutation of corpus attributions
Description
This function performs Monte Carlo runs with random permutations of corpus tags in the graph provided and computes the heterocitation balance on the new graphs. Permutation is repeated over several iterations (set through the "rep" argument) and provides baseline Dx values for the graph topology considered. These can then be compared with the Dx value obtained for the original graph to evaluate whether it could merely be the result of chance (see significance_Dx).
Usage
MC_baseline_distribution(gr, labels, infYearLimit, supYearLimit, rep = 20)
Arguments
gr Graph file (created with build_graph)
labels List of the names of the two corpora studied (e.g. c("Computer Science", "Mathematics")), present in the "Corpus" attribute
infYearLimit Minimum year considered in this study
supYearLimit Maximum year considered in this study
rep Number of Monte Carlo iterations
Value
This function currently plots histograms of the distribution of Dx values generated through random permutations of corpus tags among the records. Returns a list containing:
Dx1 Dx value for corpus 1 per iteration
Dx2 Dx value for corpus 2 per iteration
DxALL Global Dx value per iteration
Author(s)
Christian Vincenot ([email protected])
See Also
significance_Dx, heterocitation
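The permutation idea described above can be sketched in base R without Diderot. This is a minimal illustration under stated assumptions, not the package's implementation: the toy network, the helper global_dx and the balance formula (relative deviation of the observed cross-corpus share from the share expected from corpus sizes) are all hypothetical.

```r
set.seed(1)

# Hypothetical toy network: 40 papers, two corpora of 20 papers each.
n <- 40
corpus <- rep(c("A", "B"), each = n / 2)
idx <- seq_len(n)
block <- (idx - 1) %/% 20  # 0 for corpus A, 1 for corpus B

# Target of the k-th next paper inside the same corpus (cyclic).
within <- function(k) block * 20 + ((idx - 1 + k) %% 20) + 1

# Each paper cites 3 papers of its own corpus; only 2 citations cross corpora.
from <- c(rep(idx, 3), 1, 21)
to <- c(within(1), within(2), within(3), 21, 1)

# ASSUMPTION: simplified global balance, not necessarily Diderot's formula.
global_dx <- function(from, to, corpus) {
  hetero <- corpus[from] != corpus[to]
  # Expected cross-corpus share for each citing paper, from corpus sizes.
  expected <- 1 - as.numeric(table(corpus)[corpus[from]]) / length(corpus)
  (mean(hetero) - mean(expected)) / mean(expected)
}

observed <- global_dx(from, to, corpus)

# Monte Carlo baseline: shuffle corpus tags, keep the topology unchanged.
baseline <- replicate(200, global_dx(from, to, sample(corpus)))

# Share of permutations at least as extreme (as low) as the observed value.
mean(baseline <= observed)
```

Because only the corpus tags are shuffled, the baseline reflects what the same citation topology would yield if corpus membership were unrelated to citation behaviour.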
nb_refs Function to calculate the number of citations in a bibliographic database
Description
This function outputs the number of records and citations in a bibliographic database, and returns the latter.
Usage
nb_refs(db)
Arguments
db Bibliographic database
Value
Returns the number of citations in the bibliographic database
Author(s)
Christian Vincenot ([email protected])
Examples
labels<-c("Corpus1","Corpus2")
# Build a bibliographical dataset from Scopus exports
db <- create_bibliography(corpora_files=c(tempfi1,tempfi2),
                          labels=labels, keywords=NA)
# NB refs
nb_refs(db)
plot_authors_count Function to plot authors count timeseries
Description
This function plots and returns the annual number of new authors and cumulative number of authors in the bibliographic database.
Usage
plot_authors_count(db)
Arguments
db Bibliographic database created with create_bibliography.
Value
Returns a dataframe containing year, annual number of new authors (i.e. not seen before), and cumulative number of authors.
Author(s)
Christian Vincenot ([email protected])
See Also
compute_citation_ranking
Examples
labels<-c("Corpus1","Corpus2")
# Build a bibliographical dataset from Scopus exports
db <- create_bibliography(corpora_files=c(tempfi1,tempfi2),
                          labels=labels, keywords=NA)
# Plot authors count
## Not run: plot_authors_count(db)
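The counts returned by this function (new authors per year and their running total) can be illustrated with a small base R sketch. The toy records table and the column names Year, New and Cumulative are hypothetical; the actual column names of the returned dataframe are not specified here.

```r
# Hypothetical toy data: one row per (publication year, author) pair.
records <- data.frame(
  Year   = c(2000, 2000, 2001, 2001, 2002, 2002),
  Author = c("A", "B", "A", "C", "D", "B"),
  stringsAsFactors = FALSE
)

# Year in which each author appears for the first time.
first_year <- tapply(records$Year, records$Author, min)

# Count first appearances per year, keeping years without any new author.
years <- seq(min(records$Year), max(records$Year))
new_authors <- as.integer(table(factor(first_year, levels = years)))

authors_count <- data.frame(
  Year       = years,
  New        = new_authors,
  Cumulative = cumsum(new_authors)
)
print(authors_count)
```

An author is counted once, in the year of their first publication, so author B's 2002 paper does not add to the 2002 count.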
plot_heterocitation_timeseries
Function to plot the heterocitation share and heterocitation balance timeseries
Description
This function plots and returns annual heterocitation share and heterocitation balance values.
Usage
plot_heterocitation_timeseries(gr_arg, labels, mini = -1, maxi = -1, cesure = -1)
Arguments
gr_arg Citation graph
labels Labels (i.e. names) of the two corpora featured in the graph.
mini Start year of the time window
maxi End year of the time window
cesure Year before which values should be cumulated. Default value is -1, which indicates that each year in the time window should be plotted.
Value
Returns a dataframe with year and annual values for heterocitation share (sx1, sx2 and sxall for corpus A and B and global resp.) and heterocitation balance (dx1, dx2 and dxall for corpus A and B and global resp.).
Author(s)
Christian Vincenot ([email protected])
See Also
precompute_heterocitation, heterocitation
Examples
labels<-c("Corpus1","Corpus2")
# Build a bibliographical dataset from Scopus exports
db <- create_bibliography(corpora_files=c(tempfi1,tempfi2),
                          labels=labels, keywords=NA)
# Build graph
gr <- build_graph(db=db, small.year.mismatch=TRUE, attrs=c("Corpus","Year","Authors"), nb.cores=1)
# Heterocitation timeseries
gr <- precompute_heterocitation(gr, labels, 1990, 2018)
plot_heterocitation_timeseries(gr, labels, 1990, 2018)
plot_modularity_timeseries
Function to plot graph modularity timeseries
Description
This function plots and returns annual graph modularity values for predefined corpora (representing communities). See compute_modularity for details on modularity calculation.
Usage
plot_modularity_timeseries(gr_arg, mini = -1, maxi = -1, cesure = -1, window = 1,
                           modularity_function = "normal")
Arguments
gr_arg Citation graph
mini Start year of the time window
maxi End year of the time window
cesure Year before which values should be cumulated. Default value is -1, which indicates that each year in the time window should be plotted.
window The temporal sliding window size over which modularity should be computed.
modularity_function Modularity function to be used for the calculation: "custom" indicates that compute_custom_modularity will be used, whereas "normal" indicates that compute_modularity will be used.
Value
Returns a dataframe containing year and annual modularity value.
Author(s)
Christian Vincenot ([email protected])
See Also
compute_modularity, compute_custom_modularity
Examples
labels<-c("Corpus1","Corpus2")
# Build a bibliographical dataset from Scopus exports
db <- create_bibliography(corpora_files=c(tempfi1,tempfi2),
                          labels=labels, keywords=NA)
# Build graph
gr <- build_graph(db=db, small.year.mismatch=TRUE, attrs=c("Corpus","Year","Authors"), nb.cores=1)
# Compute Modularity timeseries
plot_modularity_timeseries(gr, 1990, 2018)
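For readers unfamiliar with modularity for predefined communities, the quantity can be sketched in base R. The sketch assumes the standard Newman modularity on an undirected network; whether compute_modularity treats citations as undirected, and how the "custom" variant differs, is not specified here. The toy adjacency matrix and the helper newman_modularity are hypothetical.

```r
# Hypothetical toy undirected network as a symmetric adjacency matrix:
# two triangles (nodes 1-3 and 4-6) joined by the single edge 3-4.
adj <- matrix(0, 6, 6)
links <- rbind(c(1, 2), c(1, 3), c(2, 3), c(4, 5), c(4, 6), c(5, 6), c(3, 4))
adj[links] <- 1
adj[links[, 2:1]] <- 1

# Predefined communities (here standing in for the two corpora).
membership <- c("A", "A", "A", "B", "B", "B")

# ASSUMPTION: standard Newman modularity,
# Q = 1/(2m) * sum_ij (A_ij - k_i * k_j / (2m)) * [c_i == c_j]
newman_modularity <- function(adj, membership) {
  two_m <- sum(adj)                        # twice the number of edges
  k <- rowSums(adj)                        # node degrees
  same <- outer(membership, membership, "==")
  sum((adj - outer(k, k) / two_m) * same) / two_m
}

q <- newman_modularity(adj, membership)
print(q)
```

Values near 0 indicate that the predefined communities cite each other about as often as expected from node degrees alone, while higher values indicate more segregated communities.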
plot_publication_curve
Function to plot publication count timeseries
Description
This function plots and returns the annual number of publications.
Usage
plot_publication_curve(gr, labels, k = 1)
Arguments
gr Citation graph
labels Labels (i.e. names) of the two corpora featured in the graph.
k Text font size (multiplier of cex values)
Value
Returns a dataframe containing year and annual publication count for each corpus and both together.
Author(s)
Christian Vincenot ([email protected])
See Also
compute_citation_ranking
Examples
labels<-c("Corpus1","Corpus2")
# Build a bibliographical dataset from Scopus exports
db <- create_bibliography(corpora_files=c(tempfi1,tempfi2),
                          labels=labels, keywords=NA)
# Build graph
gr <- build_graph(db=db, small.year.mismatch=TRUE, attrs=c("Corpus","Year","Authors"), nb.cores=1)
# Publication curve
plot_publication_curve(gr, labels)
precompute_heterocitation
Function to precompute heterocitation values for each node/publication of a graph
Description
This function computes heterocitation values for each publication and stores them as node attributes in the graph. The heterocitation share of a publication belonging to corpus A is defined as the proportion of citations in its reference list that point to publications belonging to corpus B (or A|B); e.g. a value of 0.2 for a publication in corpus A indicates that only 20% of the papers it cites belong to corpus B. The heterocitation balance metric, on the other hand, takes into consideration the respective sizes of corpus A and B to discern how much the heterocitation share deviates from the value expected in the case of well-mixedness (i.e. if A and B originated from a single community); e.g. a value of -30% for a publication in corpus A indicates that it cites papers from corpus B 30% less frequently than expected.
Usage
precompute_heterocitation(gr, labels, infLimitYear, supLimitYear)
Arguments
gr Citation graph
labels Labels (i.e. names) of the two corpora featured in the graph.
infLimitYear Start year of the time window considered (included)
supLimitYear End year of the time window considered (*excluded*)
Value
Returns the graph gr with added node attributes Sx and Dx representing the heterocitation share and heterocitation balance respectively.
Note
Corpus-wide heterocitation values can be computed using heterocitation.
Author(s)
Christian Vincenot ([email protected])
See Also
heterocitation, plot_heterocitation_timeseries, compute_Ji_ranking
Examples
labels<-c("Corpus1","Corpus2")
# Build a bibliographical dataset from Scopus exports
db <- create_bibliography(corpora_files=c(tempfi1,tempfi2),
                          labels=labels, keywords=NA)
# Build graph
gr <- build_graph(db=db, small.year.mismatch=TRUE, attrs=c("Corpus","Year","Authors"), nb.cores=1)
gr <- precompute_heterocitation(gr, labels, 1990, 2018)
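The per-publication share and balance described for precompute_heterocitation can be worked through on a toy edge list in base R. This is a sketch under assumptions: the data are hypothetical, publications tagged with both corpora (A|B) are ignored, and the balance is taken as the relative deviation of the share from the proportion of papers in the other corpus, which may differ from the exact formula used by Diderot.

```r
# Hypothetical toy citation network: each row means "from cites to".
edges <- data.frame(
  from = c("p1", "p1", "p1", "p1", "p1", "p2", "p7", "p7"),
  to   = c("p2", "p3", "p4", "p5", "p7", "p3", "p8", "p1"),
  stringsAsFactors = FALSE
)

# Corpus tag of each publication: six papers in A, two in B.
corpus <- c(p1 = "A", p2 = "A", p3 = "A", p4 = "A", p5 = "A", p6 = "A",
            p7 = "B", p8 = "B")

# Heterocitation share: fraction of a paper's references in the other corpus.
is_hetero <- corpus[edges$from] != corpus[edges$to]
sx <- sapply(split(is_hetero, edges$from), mean)

# ASSUMPTION: under well-mixedness, the expected share equals the
# proportion of papers that belong to the other corpus.
own_size <- as.numeric(table(corpus)[corpus[names(sx)]])
expected <- 1 - own_size / length(corpus)

# ASSUMPTION: balance as relative deviation from the expected share.
dx <- (sx - expected) / expected

print(round(cbind(Sx = sx, Expected = expected, Dx = dx), 3))
```

Here p1 cites five papers, one of which belongs to corpus B, so its share is 0.2; with an expected share of 0.25 its balance is negative, meaning it cites corpus B less often than corpus sizes alone would suggest.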
save_graph Function to save a graph
Description
This function saves a graph produced with Diderot. The resulting structure is actually a graphml file and can thus be exported to third party software.
Usage
save_graph(gr, filename)
Arguments
gr Graph object to save.
filename File to save the graph to.
Author(s)
Christian Vincenot ([email protected])
See Also
load_graph
Examples
labels<-c("Corpus1","Corpus2")
# Build a bibliographical dataset from Scopus exports
db <- create_bibliography(corpora_files=c(tempfi1,tempfi2),
                          labels=labels, keywords=NA)
# Build graph
gr <- build_graph(db=db, small.year.mismatch=TRUE, attrs=c("Corpus","Year","Authors"), nb.cores=1)
## Not run: save_graph(gr, "Saved.graphml")
significance_Dx Function to evaluate the significance of the heterocitation balance value
Description
This function assesses to what extent the heterocitation balance (Dx value) calculated for a graph departs from a baseline situation. The latter typically represents Dx values to be expected by chance, i.e. through random permutation of corpus assignment at the node/vertex level (see MC_baseline_distribution). A Shapiro-Wilk test is first executed on the control distribution (using shapiro.test) and, if the normality hypothesis is not rejected, a one-sample t test (see t.test) is used to test whether value is significantly different from the control distribution. The strength of this difference is additionally assessed through Glass' delta, an estimator of effect size (Glass, McGraw, and Smith, 1981).
Usage
significance_Dx(value, control, normality_threshold=0.05)
Arguments
value Heterocitation balance (Dx) calculated for the citation network studied
control Baseline distribution of Dx values in control experiments
normality_threshold P value threshold under which the hypothesis of normality is rejected in the preliminary Shapiro-Wilk test
Value
Returns a list containing the p-value obtained in a one-sample t test comparing value and the control distribution (with null hypothesis being that value could come from the control distribution) or NA if the control distribution is not normal based on a Shapiro-Wilk normality test, and Glass' estimator of effect size.
Author(s)
Christian Vincenot ([email protected])
References
Glass, G. V., McGraw, B., & Smith, M. L. (1981). Meta-analysis in social research. Beverly Hills: Sage Publications.
See Also
MC_baseline_distribution, heterocitation
Examples
## Not run:
# Heterocitation in our graph
heterocitation(gr_sx, labels=labels, 1987, 2005)
### [1] "Sx ALL / ABM / IBM"
### [1] "0.047 / 0.214 / 0.007"
### [1] "Dx ALL / ABM / IBM"
### [1] "-0.927 / -0.690 / -0.982"

# Generate a baseline distribution for Dx values obtained through chance
# Here, we run 200 iterations of node corpus permutations
baseline <- MC_baseline_distribution(gr_sx, labels, 1987, 2018, 200)

# Assess whether our observed Dx is possibly due to chance
significance_Dx(-0.927, baseline[["Dx ALL"]])
### [1] "Distribution is normal. Performing t-test."
###
### One Sample t-test
###
### data: value - control
### t = -323.0017, df = 319, p-value < 2.2e-16
### alternative hypothesis: true mean is not equal to 0
### 95 percent confidence interval:
### -0.9159834 -0.9048923
### sample estimates:
### mean of x
### -0.9104379
###
### [1] "Glass' effect size: -18.0563442219448"
## End(Not run)
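The three steps described for significance_Dx (normality check, one-sample t test, effect size) can be sketched with base R's stats functions. This is an illustrative sketch, not the package's code: the helper sketch_significance, its return names and the synthetic control values are hypothetical, and Glass' delta is assumed to be the difference between value and the control mean divided by the control standard deviation.

```r
# Hypothetical control distribution: evenly spaced normal quantiles,
# standing in for Dx values obtained from permuted graphs.
control <- qnorm(ppoints(200), mean = 0, sd = 0.05)
value <- -0.9  # hypothetical observed Dx

sketch_significance <- function(value, control, normality_threshold = 0.05) {
  # ASSUMPTION: Glass' delta scaled by the control standard deviation.
  delta <- (value - mean(control)) / sd(control)

  # Step 1: Shapiro-Wilk test on the control distribution.
  sw <- shapiro.test(control)
  if (sw$p.value < normality_threshold) {
    # Normality rejected: no t test is performed.
    return(list(p.value = NA, glass_delta = delta))
  }

  # Step 2: one-sample t test on the differences (true mean equal to 0).
  tt <- t.test(value - control)
  list(p.value = tt$p.value, glass_delta = delta)
}

res <- sketch_significance(value, control)
print(res)
```

Testing the differences value - control against a mean of 0 mirrors the "data: value - control" line in the output shown above; a very small p-value together with a large absolute delta suggests that the observed balance is unlikely to arise from random corpus assignment.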
Index
∗Topic package
Diderot-package

Author Hetero-citation (heterocitation_authors)
Author Heterocitation (heterocitation_authors)
Betweenness Centrality (compute_BC_ranking)
Bibliographic Network Creation (create_bibliography)
build_graph
Citation Graph Building (build_graph)
Citation Graph Creation (create_bibliography)
Citation Graph Loading (load_graph)
Citation Ranking (compute_citation_ranking)
compute_BC_ranking
compute_citation_ranking
compute_custom_modularity
compute_Ji
compute_Ji_ranking
compute_modularity
create_bibliography
Custom Modularity (compute_custom_modularity)
Diderot (Diderot-package)
Diderot-package
get_date_from_doi
get_reference_title
Graph Saving (save_graph)
Hetero-Citation (heterocitation)
Hetero-citation Timeseries (plot_heterocitation_timeseries)
heterocitation
Heterocitation Significance (significance_Dx)
Heterocitation Timeseries (plot_heterocitation_timeseries)
heterocitation_authors
igraph
Ji Metric (compute_Ji)
Ji Ranking (compute_Ji_ranking)
load_graph
makeCluster
MC_baseline_distribution
modularity (compute_modularity)
Modularity Timeseries (plot_modularity_timeseries)
nb_refs
plot_authors_count
plot_heterocitation_timeseries
plot_modularity_timeseries
plot_publication_curve
precompute_heterocitation
Publication Timeseries (plot_publication_curve)
recover
save_graph
shapiro.test
significance_Dx
t.test