
Package ‘Diderot’

April 19, 2020

Type Package

Title Bibliographic Network Analysis

Version 0.13

Date 2020-04-16

Author Christian Vincenot

Maintainer Christian Vincenot <[email protected]>

Depends R (>= 3.0.1), foreach, graphics, igraph

Imports stringi, RCurl, splitstackshape, data.table, utils, stats, parallel, doParallel

Description Enables the user to build a citation network/graph from bibliographic data and, based on modularity and heterocitation metrics, assess the degree of awareness/cross-fertilization between two corpora/communities. This toolset is optimized for Scopus data.

License GPL (>= 2)

NeedsCompilation no

Repository CRAN

Date/Publication 2020-04-19 11:20:02 UTC

R topics documented:

Diderot-package
build_graph
compute_BC_ranking
compute_citation_ranking
compute_custom_modularity
compute_Ji
compute_Ji_ranking
compute_modularity
create_bibliography
get_date_from_doi
get_reference_title
heterocitation
heterocitation_authors
load_graph
MC_baseline_distribution
nb_refs
plot_authors_count
plot_heterocitation_timeseries
plot_modularity_timeseries
plot_publication_curve
precompute_heterocitation
save_graph
significance_Dx

Diderot-package Bibliographic Network Analysis Package

Description

Denis Diderot (1713-1784), French philosopher and co-founder of the modern encyclopedia.

This package detects and quantifies the unification or separation of two bibliographic corpora through the creation of citation networks. It can be used to study the spread of concepts across scientific disciplines, or the fusion/fission of scientific communities.

Details

Package: Diderot
Type: Package
Version: 0.13
Date: 2020-04-17
License: GPL (>=2)

A typical workflow comprises the following steps.

First, literature metadata (including references) from the two fields of study to be analyzed are downloaded from Scopus (or built manually). This data is imported to create a bibliographic dataset using create_bibliography.

Second, a graph is created with a call to build_graph to reproduce the citation network in the bibliographic dataset.

Finally, statistical analysis can be performed on the graph to assess the fusion/fission state of the two corpora/communities. Heterocitation indices (i.e. share and balance) show how much publications or authors cite papers from the other corpus (see heterocitation and heterocitation_authors respectively). Such analysis must always be preceded by a call to precompute_heterocitation to perform initial calculations. These metrics are complemented by traditional as well as custom modularity metrics (see compute_modularity and compute_custom_modularity respectively), which express how strongly the communities are separated. Publications that foster mutual awareness and cross-fertilization between the corpora/communities can be identified using the usual betweenness centrality metric (see compute_BC_ranking) and the Ji index (see compute_Ji_ranking).

Author(s)

Christian Vincenot

Maintainer: Christian Vincenot ([email protected])

See Also

igraph

Examples

## Not run:
# Two corpora on individual-based modelling (IBM) and agent-based modelling (ABM)
# were downloaded from Scopus. The structure of each corpus is as follows:
tt<-read.csv("IBMmerged.csv", stringsAsFactors=FALSE)
str(tt,strict.width="cut")
### 'data.frame': 3184 obs. of 9 variables:
### $ Authors : chr "Chen J., Marathe A., Marathe M." "Van Dijk D., Sl"..
### $ Title : chr "Coevolution of epidemics, social networks, and in"..
### $ Year : int 2010 2010 2010 2010 2010 2010 2010 2010 2010 2010 ...
### $ DOI : chr "10.1007/978-3-642-12079-4_28" "10.1016/j.procs.20"..
### $ Link : chr "http://www.scopus.com/inward/record.url?eid=2-s2."..
### $ Abstract : chr "This research shows how a limited supply of antiv"..
### $ Author.Keywords: chr "Antiviral; Behavioral economics; Epidemic; Microe"..
### $ Index.Keywords : chr "Antiviral; Behavioral economics; Epidemic; Microe"..
### $ References : chr "(2009) Centre Approves Restricted Retail Sale of "..

# Define the name of corpora (labels) and specific keywords to identify relevant
# publications (keys).
labels<-c("IBM","ABM")
keys<-c("individual-based model|individual based model",
        "agent-based model|agent based model")

# Build the IBM-ABM bibliographical dataset from Scopus exports
db<-create_bibliography(corpora_files=c("IBMmerged.csv","ABMmerged.csv"),
                        labels=labels, keywords=keys)
### [1] "File IBMmerged.csv contains 3184 records"
### [1] "File ABMmerged.csv contains 9641 records"

# Build and save citation graph
gr<-build_graph(db=db,small.year.mismatch=TRUE,fine.check.nb.authors=2,
                attrs=c("Corpus","Year","Authors", "DOI"))
### [1] "Graph built! Execution time: 1200.22 seconds."
save_graph(gr, "graph.graphml")

# Compute and plot publication heterocitation
gr_sx<-precompute_heterocitation(gr,labels=labels,infLimitYear=1987, supLimitYear=2018)
###[1] "Summary of the nodes considered for computation (1987-2017)"
###[1] "-----------------------------------------------------------"
###[1] "IBM ABM IBM|ABM"
###[1] "1928 5378 153"
###[1]
###[1] "Edges summary"
###[1] "-------------"
###[1] "IBM->IBM/IBM->Other 5583/1086 => Prop 0.163"
###[1] "ABM->ABM/ABM->Other 16946/2665 => Prop 0.136"
###[1] "General Same/Diff 22529/3751 => Prop 0.143"
###[1]
###[1] "Heterocitation metrics"
###[1] "----------------------"
###[1] "Sx ALL / IBM / ABM"
###[1] "0.127 / 0.137 / 0.124"
###[1] "Dx ALL / IBM / ABM"
###[1] "-0.652 / -0.803 / -0.598"
heterocitation(gr_sx, labels=labels, 1987, 2005)
###[1] "Sx ALL / ABM / IBM"
###[1] "0.047 / 0.214 / 0.007"
###[1] "Dx ALL / ABM / IBM"
###[1] "-0.927 / -0.690 / -0.982"
plot_heterocitation_timeseries(gr_sx, labels=labels, mini=-1, maxi=-1, cesure=2005)

# Compute and plot modularity (uses gr_sx, created above)
compute_modularity(gr_sx, 1987, 2018)
###[1] 0.3164805
plot_modularity_timeseries(gr_sx, 1987, 2018, window=1000)

# Compute author heterocitation
hetA<-heterocitation_authors(gr_sx, 1987, 2018, pub_threshold=4)
head(hetA[order(hetA$avgDx,decreasing=TRUE),c(1)], n=10)
### [1] "Ashlock D." "Evora J." "Hernandez J.J." "Hernandez M." "Gooch K.J."
### [6] "Reinhardt J.W." "Ng K." "Kazanci C." "Senior A.M." "Ariel G."

# Identify which publications are most impactful in terms of cross-fertilization
jir<-compute_Ji_ranking(gr_sx, labels=labels, 1987, 2018)
head(jir[,c(2,7)],n=3)
### Title Ji
### 758 A standard protocol for describing individual-based and agent-based models 200
### 4437 Pattern-oriented modeling of agent-based complex systems: Lessons from ecology 134
### 33 The ODD protocol: A review and first update 120

## End(Not run)

build_graph Builds a citation graph.

Description

Builds a citation graph based on a database of bibliographic records generated with create_bibliography. This process is automatically parallelized on multicore hardware. By default, matching between title and references is done based on the full title, publication year, and first three authors. Publication attributes present in the data frame can be copied to graph nodes using the attrs argument.
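The matching rule described above can be illustrated with a minimal base-R sketch. The helper matches_reference and its arguments are hypothetical and are not part of Diderot; the package's actual matching logic may differ (for instance in how author names are parsed). The sketch only shows the idea of checking title, year (with an optional tolerance of one year) and leading authors against a reference string.

```r
# Hypothetical helper (not a Diderot function): does a reference string
# cite a given publication?
matches_reference <- function(ref, title, year, authors,
                              nb_authors = 3, small_year_mismatch = TRUE) {
  ref_lc <- tolower(ref)
  found <- function(x) grepl(tolower(x), ref_lc, fixed = TRUE)

  # 1. The full title must appear in the reference.
  if (!found(title)) return(FALSE)

  # 2. The publication year (or a neighbouring year) must appear.
  years <- if (small_year_mismatch) (year - 1):(year + 1) else year
  if (!any(vapply(as.character(years), found, logical(1)))) return(FALSE)

  # 3. The first word of each leading author name must appear
  #    (assumption: this approximates the surname in "Chen J." style names).
  surnames <- sub(" .*$", "", trimws(strsplit(authors, ",")[[1]]))
  all(vapply(head(surnames, nb_authors), found, logical(1)))
}

ref <- "Chen J., Marathe A., Coevolution of epidemics (2011) Lecture Notes"
tolerant <- matches_reference(ref, "Coevolution of epidemics", 2010,
                              "Chen J., Marathe A.")
strict   <- matches_reference(ref, "Coevolution of epidemics", 2010,
                              "Chen J., Marathe A.",
                              small_year_mismatch = FALSE)
print(c(tolerant = tolerant, strict = strict))
```

In this toy case the reference carries the year 2011 while the record says 2010, so only the tolerant check accepts it.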

Usage

build_graph(db, title = "Cite Me As", year = "Year", authors = "Authors",
    ref = "Cited References", set.title.as.name = F, attrs = NULL,
    verbose = F, makeCluster.type = "PSOCK", nb.cores=NA,
    fine.check.threshold = 1000, fine.check.nb.authors = 3,
    small.year.mismatch = T, debug = F)

Arguments

db Bibliographic database created with create_bibliography.

title Name of the data frame column in which publication titles are listed.

year Name of the data frame column in which publication years are listed.

authors Name of the data frame column in which publication authors are listed.

ref Name of the data frame column in which publication references are listed.

set.title.as.name

Set graph vertex ID to publication title.

attrs Attributes of the bibliographic database (i.e. data frame column names, such as "Authors", "Year") to be set as vertex attributes.

verbose Verbosity flag triggering a more detailed output during graph building.

makeCluster.type

Type of cluster to be used to parallelize the graph building process. For more options, see makeCluster in the parallel package.

nb.cores Number of cores to be used for parallel computation.


fine.check.threshold

Title length under which citation matching is further confirmed based on publication year. This value can be reduced to increase performance on large bibliographic databases. By default, publication year check is always performed.

fine.check.nb.authors

Maximum number of authors to check against for citation matching. This value can be reduced to increase performance on large bibliographic databases. Default value is three authors.

small.year.mismatch

Flag indicating whether small year mismatches (+/- 1 year) should be tolerated. It is recommended to keep this flag set to TRUE to accommodate usual inconsistencies in bibliographic databases.

debug Debug flag allowing the user to browse function calls upon execution error. For more details, see recover in the utils library.

Value

Returns a graph object.

Author(s)

Christian Vincenot ([email protected])

See Also

create_bibliography

Examples

labels<-c("Corpus1","Corpus2")

# Build a bibliographical dataset from Scopus exports
db<-create_bibliography(corpora_files=c(tempfi1,tempfi2),
                        labels=labels, keywords=NA)

# Build graph
gr<-build_graph(db=db,small.year.mismatch=TRUE, attrs=c("Corpus","Year","Authors"), nb.cores=1)

compute_BC_ranking Function to compute the Betweenness Centrality ranking.

Description

This function computes the Betweenness Centrality (BC) for each graph node (i.e. publication).
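As an illustration of the metric itself, the following sketch computes betweenness centrality on a toy citation network with igraph (a dependency of this package). Whether compute_BC_ranking calls igraph's betweenness with exactly these settings is an assumption; the node names are made up.

```r
library(igraph)

# Toy citation network: each row is one citation (citing -> cited).
edges <- data.frame(from = c("p3", "p4", "p2"),
                    to   = c("p2", "p2", "p1"))
g <- graph_from_data_frame(edges, directed = TRUE)

# Betweenness centrality: number of shortest paths passing through a node.
bc <- betweenness(g, directed = TRUE)

# Rank publications by decreasing BC.
ranking <- sort(bc, decreasing = TRUE)
print(ranking)
```

Here p2 sits on the only paths from p3 and p4 to p1, so it is the single node with a non-zero score.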


Usage

compute_BC_ranking(gr, labels, write_to_graph = F)

Arguments

gr Citation graph

labels Labels (i.e. names) of the two corpora featured in the graph.

write_to_graph Flag to indicate whether to write results to the graph (i.e. save BC values as node attributes).

Value

If write_to_graph is FALSE, returns a list of entries (authors, title, year, corpus, BC) sorted by decreasing BC. Else, returns the graph given as input to which BC values are added as node attributes.

Author(s)

Christian Vincenot

See Also

compute_citation_ranking, compute_Ji_ranking

Examples





# Compute BC
compute_BC_ranking(gr, labels)


compute_citation_ranking

Function to compute the citation ranking.

Description

This function computes the citation count for each graph node (i.e. publication).
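The citation count of a publication is simply its number of incoming citations. A minimal base-R sketch on a made-up edge list (the column names citing and cited are assumptions, not Diderot's internal representation):

```r
# Toy edge list: each row is one citation (citing -> cited).
edges <- data.frame(citing = c("p2", "p3", "p4", "p4"),
                    cited  = c("p1", "p1", "p1", "p2"),
                    stringsAsFactors = FALSE)

# Citation count = number of incoming edges per cited publication.
counts <- vapply(split(edges$citing, edges$cited), length, integer(1))

# Rank publications by decreasing citation count.
ranking <- sort(counts, decreasing = TRUE)
print(ranking)
```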

Usage

compute_citation_ranking(gr, labels, write_to_graph = F)

Arguments

gr Citation graph

labels Labels (i.e. names) of the two corpora featured in the graph.

write_to_graph Flag to indicate whether to write results to the graph (i.e. save citation count values as node attributes).

Value

If write_to_graph is FALSE, returns a list of entries (authors, title, year, corpus, citations) sorted by decreasing citation count. Else, returns the graph given as input to which citation count values are added as node attributes.

Author(s)

Christian Vincenot

See Also

compute_BC_ranking, compute_Ji_ranking

Examples





# Compute Citation Ranking
compute_citation_ranking(gr, labels)


compute_custom_modularity

Function to compute the custom modularity of a citation graph over a time window.

Description

This function computes custom modularity of the citation graph. Custom modularity here stands for Newman’s modularity computed over the subgraph comprising nodes that belong to a single corpus only and are within the time window, as well as direct outgoing neighbors of the former nodes (whatever their year tag). Basically, the citation graph considered thus includes publications within the time window as well as older papers that they cite.
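The node selection described above can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: the toy tables, their column names, and the use of "IBM | ABM" to mark joint publications are chosen for the example, and whether cited neighbours are filtered by corpus is not specified here.

```r
# Toy node and edge tables (made-up data).
nodes <- data.frame(id     = c("a", "b", "c", "d"),
                    year   = c(1985, 1995, 2000, 2020),
                    corpus = c("IBM", "IBM", "ABM", "IBM | ABM"),
                    stringsAsFactors = FALSE)
edges <- data.frame(citing = c("b", "c", "d"),
                    cited  = c("a", "b", "c"),
                    stringsAsFactors = FALSE)

inf <- 1990  # start of the window (included)
sup <- 2010  # end of the window (excluded)

# Core nodes: single-corpus publications within [inf; sup[.
single <- !grepl("|", nodes$corpus, fixed = TRUE)
core   <- nodes$id[single & nodes$year >= inf & nodes$year < sup]

# Add the publications cited by core nodes, whatever their year.
cited    <- unique(edges$cited[edges$citing %in% core])
selected <- union(core, cited)
print(selected)
```

Node a (1985) is outside the window but is kept because b cites it; node d is dropped because it lies outside the window and belongs to both corpora.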

Usage

compute_custom_modularity(gr, infLimitYear, supLimitYear)

Arguments

gr Citation graph

infLimitYear Start year of the time window considered (included)

supLimitYear End year of the time window considered (*excluded*)

Value

Returns the custom modularity value of the subgraph restricted to the interval [infLimitYear;supLimitYear[.

Author(s)

Christian Vincenot

See Also

compute_modularity, plot_modularity_timeseries

Examples





# Compute Custom Modularity
compute_custom_modularity(gr, 1990, 2018)

compute_Ji Function to compute the evolution of the Ji metric for a term (e.g. publication title) present in the reference list of a bibliographic dataset.

Description

This function computes the Ji metric for a given term from a bibliographic dataset and returns its annual evolution within the timeframe specified. This metric indicates how much the term (e.g. publication title, software name) is cited simultaneously in the references of both corpora and is thus important for cross-fertilization between the two communities. This function is run on the bibliographic dataset (created with create_bibliography) and is thus useful before graph creation or when the term to be searched is not the title of a node in the resulting graph. For instance, suppose the user knows that a publication (or, e.g., software or a scientific database referenced only through a URL or grey literature) is cited and may have an impact on cross-fertilization between the two communities (the literature of which is represented by the two corpora), but it does not have its own entry in the bibliographic database and would therefore not be featured as a node in the graph created by build_graph. In that case, the compute_Ji function can be used to assess its importance.
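The formula of the Ji metric is not given in this manual, so the sketch below does not compute Ji. It only illustrates the preliminary step that the description implies: counting, per year and per corpus, the records whose reference list mentions a term. The data frame, its column names (Year, Corpus, Refs) and its contents are made up for the example.

```r
# Toy bibliographic dataset (made-up records and column names).
db_toy <- data.frame(
  Year   = c(2000, 2000, 2001, 2001),
  Corpus = c("IBM", "ABM", "IBM", "ABM"),
  Refs   = c("Grimm V. (1999) Ten years of IBM; NetLogo",
             "Wilensky U. (1999) NetLogo",
             "DeAngelis D. (1992) Individual-based models",
             "Epstein J. (1996) Growing artificial societies; NetLogo"),
  stringsAsFactors = FALSE)

term <- "NetLogo"

# Records whose references mention the term (literal match).
hit <- grepl(term, db_toy$Refs, fixed = TRUE)

# Number of citing records per year and corpus.
mentions <- table(Year = db_toy$Year[hit], Corpus = db_toy$Corpus[hit])
print(mentions)
```

A term cited by both corpora in the same year (2000 in this toy table) is the kind of situation the Ji metric is meant to highlight.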

Usage

compute_Ji(db, pubtitle, labels, from = -1, to = -1)

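Arguments

db Bibliographic database created with create_bibliography.

pubtitle Publication title, or more generally term to be searched (e.g. software name).

labels Labels (i.e. names) of the two corpora.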

from Start year of the time window considered (included)

to End year of the time window considered (*excluded*)

Value

Dataframe containing year and Ji metric value.

Author(s)

Christian Vincenot

See Also

create_bibliography


Examples




# Compute Ji
compute_Ji(db, "Title1", labels, from=1990, to=2018)

compute_Ji_ranking Function to compute the Ji metric ranking for publications in a citation graph.

Description

This function computes the Ji metric for each graph node (i.e. publication). This metric indicates how much a publication is cited simultaneously by both corpora and is thus important for cross-fertilization between the two communities.

Usage

compute_Ji_ranking(gr, labels, infLimitYear, supLimitYear, write_to_graph=F)

Arguments

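gr Citation graph

labels Labels (i.e. names) of the two corpora featured in the graph.

infLimitYear Start year of the time window considered

supLimitYear End year of the time window considered

write_to_graph Flag to indicate whether to write results to the graph (i.e. save Ji values as node attributes).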

Value

If write_to_graph is FALSE, returns a list of entries (authors, title, year, corpus, citations from corpus 1, citations from corpus 2, Ji) sorted by decreasing Ji. Else, returns the graph given as input to which Ji values are added as node attributes.

Author(s)

Christian Vincenot


See Also

build_graph, precompute_heterocitation, compute_Ji

Examples





# Compute Ji ranking
compute_Ji_ranking(gr, labels, 1990, 2018)

compute_modularity Function to compute citation graph modularity over a time window.

Description

This function computes Newman’s modularity of the citation graph restricted to a given time window and ignoring nodes belonging to both corpora simultaneously.
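For a fixed partition into communities, Newman's modularity can be written as Q = sum over communities c of [ L_c / m - (d_c / (2m))^2 ], where m is the number of edges, L_c the number of edges inside c, and d_c the summed degree of the nodes in c. The base-R sketch below applies this formula to a toy graph. Treating citations as undirected edges is an assumption made for the example; the function name and data are hypothetical and this is not the package's implementation.

```r
# Newman's modularity of a given two-corpus partition (edges treated as
# undirected; hypothetical helper, not a Diderot function).
newman_modularity <- function(edges, membership) {
  m      <- nrow(edges)
  g_from <- membership[edges$from]
  g_to   <- membership[edges$to]
  q <- 0
  for (grp in unique(membership)) {
    inside <- sum(g_from == grp & g_to == grp)      # edges within the group
    degree <- sum(g_from == grp) + sum(g_to == grp) # summed degree of the group
    q <- q + inside / m - (degree / (2 * m))^2
  }
  q
}

# Two triangles (corpus A and corpus B) joined by a single citation.
membership <- c(a1 = "A", a2 = "A", a3 = "A", b1 = "B", b2 = "B", b3 = "B")
edges <- data.frame(from = c("a1", "a2", "a1", "b1", "b2", "b1", "a3"),
                    to   = c("a2", "a3", "a3", "b2", "b3", "b3", "b1"),
                    stringsAsFactors = FALSE)

q <- newman_modularity(edges, membership)
print(q)  # 5/14, about 0.357
```

A value well above zero, as here, indicates that citations stay mostly within each corpus; adding more cross-corpus citations would lower it.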

Usage

compute_modularity(gr, infLimitYear, supLimitYear)

Arguments

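gr Citation graph

infLimitYear Start year of the time window considered (included)

supLimitYear End year of the time window considered (*excluded*)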

Value

Returns Newman’s modularity value of the subgraph restricted to the interval [infLimitYear;supLimitYear[.

Author(s)

Christian Vincenot

See Also

compute_custom_modularity, plot_modularity_timeseries


Examples





# Compute Modularity
compute_modularity(gr, 1990, 2018)

create_bibliography Function to create a bibliographic dataset

Description

This function creates a bibliographic dataset based on two external corpus files, each representing the bibliography of a given domain.

Usage

create_bibliography(corpora_files, labels, keywords, retrieve_pubdates = F,
    clean_refs = F, encoding = NULL)

Arguments

corpora_files Vector containing the paths to two corpus files (e.g. Scopus exports). The CSV files should contain for each record at least Authors (comma separated), Publication Title, Publication Year, and References (semicolon separated). The inclusion of DOI (for date checking; see the retrieve_pubdates option) as well as Abstract, Author.Keywords, and Index.Keywords (for the in-depth identification of publications belonging to both corpora) is strongly recommended.

labels Labels (i.e. names) given to the two corpora to be analyzed.

keywords Keywords identifying the two corpora

retrieve_pubdates

Flag indicating whether to confirm publication dates by retrieving them (see get_date_from_doi)

clean_refs Attempt to clean references and keep titles only. NOT RECOMMENDED, especially if build_graph should be used subsequently.

encoding Character encoding used in the input files.
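For readers building a corpus file manually, the sketch below writes a minimal CSV with the four required fields and reads it back. The column names follow the Scopus export shown in the Examples (Authors, Title, Year, References); the records are made up, and whether create_bibliography accepts such a reduced file without the optional columns is an assumption.

```r
# Minimal corpus with the four required columns (made-up records).
corpus <- data.frame(
  Authors    = c("Chen J., Marathe A.", "Sloot P."),
  Title      = c("First toy paper", "Second toy paper"),
  Year       = c(2010, 2009),
  References = c("Sloot P., Second toy paper (2009); Grimm V., Some book (2005)",
                 "Grimm V., Some book (2005)"),
  stringsAsFactors = FALSE)

# Write the corpus to a temporary CSV file.
path <- tempfile(fileext = ".csv")
write.csv(corpus, path, row.names = FALSE)

# Read it back and split the first record's references on semicolons.
back <- read.csv(path, stringsAsFactors = FALSE)
refs <- strsplit(back$References[1], ";\\s*")[[1]]
print(refs)
```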


Value

Returns a dataframe containing a bibliographic dataset usable by Diderot and including all references from both corpora.

Author(s)

Christian Vincenot

See Also

build_graph

Examples

## Not run:
# Two corpora on individual-based modelling (IBM) and agent-based modelling (ABM)
# were downloaded from Scopus. The structure of each corpus is as follows:
tt<-read.csv("IBMmerged.csv", stringsAsFactors=FALSE)
str(tt,strict.width="cut")
### 'data.frame': 3184 obs. of 9 variables:
### $ Authors : chr "Chen J., Marathe A., Marathe M." "Van Dijk D., Sl"..
### $ Title : chr "Coevolution of epidemics, social networks, and in"..
### $ Year : int 2010 2010 2010 2010 2010 2010 2010 2010 2010 2010 ...
### $ DOI : chr "10.1007/978-3-642-12079-4_28" "10.1016/j.procs.20"..
### $ Link : chr "http://www.scopus.com/inward/record.url?eid=2-s2."..
### $ Abstract : chr "This research shows how a limited supply of antiv"..
### $ Author.Keywords: chr "Antiviral; Behavioral economics; Epidemic; Microe"..
### $ Index.Keywords : chr "Antiviral; Behavioral economics; Epidemic; Microe"..
### $ References : chr "(2009) Centre Approves Restricted Retail Sale of "..

# Define the name of corpora (labels) and specific keywords to identify relevant
# publications (keys).
labels<-c("IBM","ABM")
keys<-c("individual-based model|individual based model",
        "agent-based model|agent based model")

# Build the IBM-ABM bibliographical dataset from Scopus exports
db<-create_bibliography(corpora_files=c("IBMmerged.csv","ABMmerged.csv"),
                        labels=labels, keywords=keys)
### [1] "File IBMmerged.csv contains 3184 records"
### [1] "File ABMmerged.csv contains 9641 records"

# Processed output. Note the field name changes (for standardization with ISI Web
# of Knowledge format) and addition of the "Corpus" field (with identification of
# joint "IBM | ABM" publications based on keywords).
str(db, strict.width="cut")
### 'data.frame': 12504 obs. of 10 variables:
### $ Authors : chr "Chen J., Marathe A., Marathe M." "Van Dijk D., Sloot "..
### $ Cite Me As : chr "Coevolution of epidemics, social networks, and indivi"..
### $ Year : int 2010 2010 2010 2010 2010 2010 2010 2010 2010 2010 ...
### $ DOI : chr "10.1007/978-3-642-12079-4_28" "10.1016/j.procs.2010.0"..
### $ Link : chr "http://www.scopus.com/inward/record.url?eid=2-s2.0-78"..
### $ Abstract : chr "This research shows how a limited supply of antiviral"..
### $ Author.Keywords : chr "Antiviral; Behavioral economics; Epidemic; Microecono"..
### $ Index.Keywords : chr "Antiviral; Behavioral economics; Epidemic; Microecono"..
### $ Cited References: chr "(2009) Centre Approves Restricted Retail Sale of Tami"..
### $ Corpus : chr "IBM" "IBM | ABM" "IBM | ABM" "IBM" ...

## End(Not run)

get_date_from_doi Function to retrieve publication date based on Digital Object Identifier (DOI)

Description

This function retrieves precise publication date by querying the Digital Object Identifier (DOI) web server. Alternatively, if extract_date_from_doi is set to TRUE, the function will first try to extract a publication year from the publication DOI string. If create_bibliography is called with retrieve_pubdates = TRUE, it calls get_date_from_doi for each record to confirm publication dates.
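The offline step (extracting a year from the DOI string) can be sketched with a regular expression. The helper extract_year_from_doi below is hypothetical, and the pattern is an assumption about what a plausible year looks like; the rule actually used by get_date_from_doi is not documented here and may differ.

```r
# Hypothetical helper (not a Diderot function): find a four-digit year
# (19xx or 20xx) that is not part of a longer digit run in a DOI string.
extract_year_from_doi <- function(doi) {
  pattern <- "(?<![0-9])(19|20)[0-9]{2}(?![0-9])"
  m <- regmatches(doi, regexpr(pattern, doi, perl = TRUE))
  if (length(m) == 0) NA_integer_ else as.integer(m)
}

y1 <- extract_year_from_doi("10.1016/j.procs.2010.04.250")   # contains 2010
y2 <- extract_year_from_doi("10.1007/978-3-642-12079-4_28")  # no year-like token
print(c(y1, y2))
```

Many DOIs carry no year at all, as in the second call, which is presumably why an online query remains the fallback.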

Usage

get_date_from_doi(doi, extract_date_from_doi)

Arguments

doi Character string representing the Digital Object Identifier (DOI) of the publication

extract_date_from_doi

Flag indicating whether to try to simply extract publication year from the DOI string before resorting to online queries to the DOI server

Value

Returns a date in YYYY-MM-DD format, or in YYYY-MM format if extract_date_from_doi is set to TRUE.

Note

Scopus records already contain the year of publication of the scientific papers indexed. However, in some cases these are inaccurate and can be verified by comparing them with the date retrieved by this function.

Author(s)

Christian Vincenot


See Also

create_bibliography

Examples

## Not run:
# Query publication date from DOI server
get_date_from_doi(doi="10.1016/j.procs.2010.04.250",extract_date_from_doi=FALSE)

## End(Not run)
# Extract date from DOI string
get_date_from_doi(doi="10.1016/j.procs.2010.04.250",extract_date_from_doi=TRUE)

get_reference_title Function to extract the publication title from a reference using Freecite

Description

Function to extract the publication title from a reference using an online query to Freecite

Usage

get_reference_title(str)

Arguments

str Character string representing a reference

Value

Returns a character string of the publication title

Author(s)

Christian Vincenot


heterocitation Function to calculate the heterocitation between two corpora

Description

This function calculates the heterocitation share and heterocitation balance between two corpora A and B in the time window specified. The heterocitation share (Sx) of a publication belonging to corpus A is defined as the proportion of citations to publications belonging to corpus B (or A|B) in its reference list. The global heterocitation share for corpus A is calculated as the average heterocitation share of the publications that corpus A contains (e.g. a value of 0.2 for corpus A indicates that, on average, only 20% of the references of publications in corpus A point to papers from corpus B). The heterocitation balance metric (Dx), on the other hand, takes into consideration the respective sizes of corpus A and B to discern how much the heterocitation share deviates from values expected in the case of well-mixedness (i.e. if A and B originated from a unique community; e.g. a value of -50% for corpus A indicates that, on average, publications in corpus A cite papers from corpus B half as frequently as expected, which suggests a lack of mutual awareness between the corpora and related communities).
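The definition of the heterocitation share can be checked on a toy example in base R. The data and variable names are made up, and joint (A|B) publications are left out for simplicity. The balance Dx is not computed because its exact formula is not given in this manual.

```r
# Corpus of each publication (made-up data).
corpus <- c(a1 = "A", a2 = "A", b1 = "B", b2 = "B", b3 = "B")

# Each row is one citation (citing -> cited).
edges <- data.frame(citing = c("a1", "a1", "a2", "a2", "a2", "a2"),
                    cited  = c("a2", "b1", "a1", "b1", "b2", "b3"),
                    stringsAsFactors = FALSE)

# TRUE when the cited paper belongs to the other corpus.
hetero <- corpus[edges$citing] != corpus[edges$cited]

# Sx per publication: share of its references pointing to the other corpus.
sx <- vapply(split(hetero, edges$citing), mean, numeric(1))

# Corpus-wide Sx for A: average over the publications of corpus A.
sx_A <- mean(sx[corpus[names(sx)] == "A"])
print(sx)
print(sx_A)
```

Here a1 cites one B paper out of two references (0.5) and a2 three out of four (0.75), so the corpus-wide share for A is their average.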

Usage

heterocitation(gr, labels, infLimitYear, supLimitYear)

Arguments

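gr Citation graph previously preprocessed with precompute_heterocitation

labels Labels (i.e. names) of the two corpora featured in the graph.

infLimitYear Start year of the time window considered

supLimitYear End year of the time window considered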

Value

Returns a numerical vector containing, in this order, the heterocitation share (Sx) for corpus A, B and global, and the heterocitation balance (Dx) for A, B and global.

Note

precompute_heterocitation should be called before running this function.

Author(s)

Christian Vincenot

See Also

precompute_heterocitation, plot_heterocitation_timeseries, heterocitation_authors, MC_baseline_distribution, significance_Dx


Examples





# Heterocitation
gr<-precompute_heterocitation(gr,labels, 1990, 2018)
heterocitation(gr,labels, 1990, 2018)

heterocitation_authors

This function computes heterocitation metrics for authors

Description

This function computes heterocitation metrics for authors. The heterocitation share (Sx) and heterocitation balance (Dx) of an author are calculated as the average of these metrics for papers published by this author within the given time window. See the man page of heterocitation for definitions of heterocitation metrics.
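The per-author averaging can be sketched in base R as follows, assuming a table with one row per publication, a comma-separated Authors field and a precomputed Sx value. The table, its contents and the threshold variable are made up for the example; the package works on graph node attributes instead.

```r
# One row per publication (made-up data).
pubs <- data.frame(
  Authors = c("Chen J., Marathe A.", "Chen J.", "Marathe A., Sloot P."),
  Sx      = c(0.2, 0.4, 0.6),
  stringsAsFactors = FALSE)

# Expand to one row per (author, publication) pair.
author_list <- strsplit(pubs$Authors, ",\\s*")
long <- data.frame(Author = unlist(author_list),
                   Sx     = rep(pubs$Sx, lengths(author_list)),
                   stringsAsFactors = FALSE)

# Number of publications and average Sx per author.
by_author <- split(long$Sx, long$Author)
nb_pubs   <- vapply(by_author, length, integer(1))
avg_sx    <- vapply(by_author, mean, numeric(1))

# Keep only authors with at least `threshold` publications.
threshold <- 2
avg_sx <- avg_sx[nb_pubs >= threshold]
print(avg_sx)
```

With a threshold of two publications, Sloot P. (one paper) is dropped and the two remaining authors receive the mean Sx of their papers.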

Usage

heterocitation_authors(gr, infLimitYear, supLimitYear, pub_threshold = 0,
    remove_orphans = F, remove_citations_to_joint_papers = F)

Arguments

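gr Citation graph previously preprocessed with precompute_heterocitation

infLimitYear Start year of the time window considered

supLimitYear End year of the time window considered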

pub_threshold Minimum number of publications for authors to be considered.

remove_orphans Do not consider publications that do not cite any other paper in the dataset (i.e. orphan nodes in the citation network)

remove_citations_to_joint_papers

Do not consider publications belonging to both corpora in the authors’ average corpus calculation.

Value

Returns a data frame containing author name ("Authors"), number of publications ("NbPubs"), list of publication years ("Years"), list of publications corpora ("Corpus"), list of publication heterocitation share ("Sx"), list of publication heterocitation balance ("Dx"), average heterocitation share ("avgSx"), average heterocitation balance ("avgDx"), average corpus value of publications ("avgCorpus"), regression coefficient of the heterocitation share evolution ("coeffSx"), regression coefficient of the heterocitation balance evolution ("coeffDx"), regression coefficient of the evolution of the corpus value of publications ("coeffCorpus").

Note

precompute_heterocitation should be called before running this function.

Author(s)

Christian Vincenot

See Also

precompute_heterocitation, heterocitation

Examples





# Heterocitation
gr<-precompute_heterocitation(gr,labels, 1990, 2018)

# Author heterocitation
heterocitation_authors(gr, 1990, 2018)

load_graph Function to load a citation graph

Description

This function loads a citation graph saved on the filesystem.


Usage

load_graph(filename)

Arguments

filename File to load

Value

Returns a graph object.

Note

This function basically supports only graphs previously saved with Diderot’s save_graph. However, as the file is actually a graphml file handled by igraph, advanced users may use this function on appropriate graphs created elsewhere, as long as they respect Diderot’s structure (presence of a "Corpus" field, etc.).

Author(s)

Christian Vincenot

See Also

save_graph

Examples





## Not run:
save_graph(gr, "Saved.graphml")

# Load saved graph
gr<-load_graph("Saved.graphml")

## End(Not run)


MC_baseline_distribution

Function to compute baseline heterocitation values for the graph under study with random permutation of corpus attributions

Description

This function performs Monte Carlo runs with random permutations of corpus tags in the graph provided and computes the heterocitation balance on the new graphs. Permutation is repeated over several iterations (set through the "rep" argument) and provides baseline Dx values for the graph topology considered. These can then be compared with the Dx value obtained for the original graph to evaluate whether it could merely be the result of chance (see significance_Dx).
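The permutation idea can be sketched in base R on a toy graph. For simplicity, and because the Dx formula is not given in this manual, the sketch uses the overall proportion of cross-corpus citations as the statistic instead of Dx; the data, function name and number of repetitions are chosen for the example only.

```r
set.seed(1)  # reproducible shuffles

# Toy data: two corpora of five papers each, citing mostly within their corpus.
ids    <- c(paste0("a", 1:5), paste0("b", 1:5))
corpus <- setNames(rep(c("A", "B"), each = 5), ids)
edges  <- data.frame(
  citing = c("a1", "a2", "a3", "a4", "b1", "b2", "b3", "b4", "a1"),
  cited  = c("a2", "a3", "a4", "a5", "b2", "b3", "b4", "b5", "b1"),
  stringsAsFactors = FALSE)

# Statistic: proportion of citations that cross the corpus boundary.
cross_share <- function(corpus, edges) {
  mean(corpus[edges$citing] != corpus[edges$cited])
}

observed <- cross_share(corpus, edges)

# Monte Carlo baseline: shuffle corpus tags among nodes, keep the topology.
baseline <- replicate(200, {
  shuffled <- setNames(sample(unname(corpus)), names(corpus))
  cross_share(shuffled, edges)
})

# Fraction of random relabellings with as few cross-corpus citations
# as the observed graph.
p_low <- mean(baseline <= observed)
print(observed)
print(summary(baseline))
```

If the observed value lies far in the lower tail of the baseline distribution, the separation between the corpora is unlikely to be a product of chance alone.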

Usage

MC_baseline_distribution(gr, labels, infYearLimit, supYearLimit, rep = 20)

Arguments

gr Graph file (created with build_graph)

labels List of the names of the two corpora studied (e.g. c("Computer Science", "Mathematics")), present in the "Corpus" attribute

infYearLimit Minimum year considered in this study

supYearLimit Maximum year considered in this study

rep Number of Monte Carlo iterations

Value

This function currently plots the histograms of distribution of Dx values generated through random permutations of corpus tags among the records. Returns a list containing:

Dx1 Dx value for corpus 1 per iteration

Dx2 Dx value for corpus 2 per iteration

DxALL Global Dx value per iteration

Author(s)

Christian Vincenot

See Also

significance_Dx, heterocitation


nb_refs Function to calculate the number of citations in a bibliographic database

Description

This function outputs the number of records and citations in a bibliographic database, and returns the latter.

Usage

nb_refs(db)

Arguments

db Bibliographic database

Value

Returns the number of citations in the bibliographic database

Author(s)

Christian Vincenot

Examples




# NB refs
nb_refs(db)


plot_authors_count Function to plot authors count timeseries

Description

This function plots and returns the annual number of new authors and cumulative number of authors in the bibliographic database.
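The two series can be derived from a bibliographic table as in the base-R sketch below. The toy data frame and its column names are assumptions for the example, and the plotting step is omitted.

```r
# Toy bibliographic table (made-up records).
db_toy <- data.frame(
  Authors = c("Chen J., Marathe A.", "Chen J., Sloot P.", "Sloot P."),
  Year    = c(2010, 2011, 2012),
  stringsAsFactors = FALSE)

# One row per (author, publication year) pair.
author_list <- strsplit(db_toy$Authors, ",\\s*")
long <- data.frame(Author = unlist(author_list),
                   Year   = rep(db_toy$Year, lengths(author_list)),
                   stringsAsFactors = FALSE)

# Year in which each author first appears.
first_year <- vapply(split(long$Year, long$Author), min, numeric(1))

# New authors per year and running total.
years <- sort(unique(db_toy$Year))
new_authors <- vapply(years, function(y) sum(first_year == y), integer(1))
res <- data.frame(Year = years,
                  New = new_authors,
                  Cumulative = cumsum(new_authors))
print(res)
```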

Usage

plot_authors_count(db)

Arguments

db Bibliographic database

Value

Returns a dataframe containing year, annual number of new authors (i.e. not seen before), and cumulative number of authors.

Author(s)

Christian Vincenot

See Also


Examples




# Plot authors count
## Not run: plot_authors_count(db)


plot_heterocitation_timeseries

Function to plot the heterocitation share and heterocitation balance timeseries

Description

This function plots and returns annual heterocitation share and heterocitation balance values.

Usage

plot_heterocitation_timeseries(gr_arg, labels, mini = -1, maxi = -1, cesure = -1)

Arguments

gr_arg Citation graph

labels Labels (i.e. names) of the two corpora featured in the graph.

mini Start year of the time window

maxi End year of the time window

cesure Year before which values should be cumulated. Default value is -1, which indicates that each year in the time window should be plotted.

Value

Returns a dataframe with year and annual values for heterocitation share (sx1, sx2 and sxall for corpus A and B and global resp.) and heterocitation balance (dx1, dx2 and dxall for corpus A and B and global resp.).

Author(s)

Christian Vincenot

See Also

precompute_heterocitation, heterocitation

Examples




# Heterocitation timeseries
gr<-precompute_heterocitation(gr,labels, 1990, 2018)
plot_heterocitation_timeseries(gr, labels, 1990, 2018)

plot_modularity_timeseries

Function to plot graph modularity timeseries

Description

This function plots and returns annual graph modularity values for predefined corpora (representing communities). See compute_modularity for details on modularity calculation.

Usage

plot_modularity_timeseries(gr_arg, mini = -1, maxi = -1, cesure = -1, window = 1,
    modularity_function = "normal")

Arguments

gr_arg Citation graph

mini Start year of the time window

maxi End year of the time window

cesure Year before which values should be cumulated. Default value is -1, which indicates that each year in the time window should be plotted.

window The temporal sliding window size over which modularity should be computed.

modularity_function

Modularity function to be used for the calculation: "custom" indicates that compute_custom_modularity will be used, whereas "normal" indicates that compute_modularity will be used.

Value

Returns a dataframe containing year and annual modularity value.

Author(s)


See Also

compute_modularity, compute_custom_modularity


Examples





# Compute Modularity timeseries
plot_modularity_timeseries(gr, 1990, 2018)

plot_publication_curve

Function to plot publication count timeseries

Description

This function plots and returns the annual number of publications.

Usage

plot_publication_curve(gr, labels, k = 1)

Arguments

gr Citation graph


k Text font size (multiplier of cex values)

Value

Returns a dataframe containing year and annual publication count for each corpus and both together.

Author(s)


See Also



Examples





# Publication curve
plot_publication_curve(gr,labels)

precompute_heterocitation

This function precomputes heterocitation values for each node/publication of a graph.

Description

This function computes heterocitation values for each publication and stores them as node attributes in the graph. The heterocitation share of a publication belonging to corpus A is defined as the percentage of citations to publications belonging to corpus B (or A|B) in its reference list (e.g. a value of 0.2 for a publication in corpus A indicates that only 20% of the papers it cites belong to corpus B). The heterocitation balance metric, on the other hand, takes into consideration the respective sizes of corpus A and B to discern how much the heterocitation share deviates from values expected in the case of well-mixedness (i.e. if A and B originated from a unique community; e.g. a value of -30% for a publication in corpus A indicates that it cites papers from corpus B 30% less frequently than expected).
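The two metrics can be illustrated with a small worked example for a single publication. This is only a sketch: the reference list and corpus sizes are made up, the expected share is assumed to be the fraction of citable papers that belong to corpus B, and the balance is computed as a plain relative deviation from that expectation, which may differ from the exact formula used inside Diderot.

```r
# Hypothetical reference list of one corpus-A publication:
# the corpus of each paper it cites.
cited_corpus <- c("A", "A", "B", "A", "A", "A", "B", "A", "A", "A")

# Heterocitation share: fraction of references pointing outside corpus A.
sx <- mean(cited_corpus != "A")

# Assumed expectation under well-mixedness: the fraction of citable papers
# that belong to corpus B (corpus sizes are made up for illustration).
n_a <- 300
n_b <- 200
expected <- n_b / (n_a + n_b)

# Illustrative balance: relative deviation of the share from expectation.
# This is an assumption, not necessarily Diderot's exact formula.
dx <- (sx - expected) / expected

print(c(Sx = sx, expected = expected, Dx = dx))
```

Here 2 of 10 references point to corpus B, so the share is 0.2. With corpus B making up 40% of the citable papers, the publication cites corpus B half as often as expected, giving an illustrative balance of -0.5.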

Usage

precompute_heterocitation(gr, labels, infLimitYear, supLimitYear)

Arguments

gr Citation graph





Value

Returns the graph gr with added node attributes Sx and Dx representing the heterocitation share and heterocitation balance respectively.

Note

Corpus-wide heterocitation values can be computed using heterocitation.

Author(s)


See Also

heterocitation, plot_heterocitation_timeseries, compute_Ji_ranking

Examples





gr<-precompute_heterocitation(gr,labels, 1990, 2018)

save_graph

Function to save a graph

Description

This function saves a graph produced with Diderot. The resulting file is in GraphML format and can thus be imported into third-party software.
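Because the output is GraphML, any tool that supports the format can read it. The sketch below performs a GraphML round trip on a toy graph using igraph's own write_graph and read_graph; the graph and vertex names are hypothetical, and the code illustrates the file format rather than how save_graph is implemented.

```r
library(igraph)

# Small hypothetical directed graph standing in for a citation graph.
g <- make_graph(c(1, 2, 2, 3, 1, 3), directed = TRUE)
V(g)$name <- c("paper_a", "paper_b", "paper_c")

# Write the graph to a temporary GraphML file, then read it back.
path <- tempfile(fileext = ".graphml")
write_graph(g, path, format = "graphml")
g2 <- read_graph(path, format = "graphml")

# The structure survives the round trip.
print(c(vertices = vcount(g2), edges = ecount(g2)))
```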

Usage

save_graph(gr, filename)

Arguments

gr Graph object to save.

filename File to save the graph to.


Author(s)


See Also

load_graph

Examples





## Not run:
save_graph(gr, "Saved.graphml")

significance_Dx

Function to evaluate the significance of the heterocitation balance value

Description

This function assesses to what extent the heterocitation balance (Dx value) calculated for a graph departs from a baseline situation. The latter typically represents Dx values to be expected by chance, i.e. through random permutation of corpus assignment at the node/vertex level (see MC_baseline_distribution). A Shapiro-Wilk test is first executed on the control distribution (using shapiro.test) and, if the normality hypothesis is not rejected, a one-sample t test (see t.test) is used to test whether value is significantly different from the control distribution. The strength of this difference is additionally assessed through Glass’ delta, an estimator of effect size (Glass, McGraw, and Smith, 1981).
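The procedure described above can be sketched with base R functions only. The helper name sketch_significance and the simulated control distribution are hypothetical, and the code illustrates the described steps rather than reproducing the package source. Glass' delta is taken here as the difference between value and the control mean, divided by the control standard deviation.

```r
# Hypothetical helper illustrating the described steps; not part of Diderot.
sketch_significance <- function(value, control, normality_threshold = 0.05) {
  # Step 1: Shapiro-Wilk normality test on the control distribution
  # (shapiro.test requires between 3 and 5000 values).
  normal <- shapiro.test(control)$p.value >= normality_threshold

  # Step 2: one-sample t test of (value - control) against a mean of 0,
  # run only if normality was not rejected; otherwise NA.
  p_value <- if (normal) t.test(value - control)$p.value else NA

  # Step 3: Glass' delta, using the control standard deviation as the scale.
  delta <- (value - mean(control)) / sd(control)

  list(p_value = p_value, glass_delta = delta)
}

# Simulated control distribution (made-up numbers, for illustration only).
set.seed(1)
control <- rnorm(200, mean = 0, sd = 0.05)
res <- sketch_significance(-0.9, control)
print(res)
```

A strongly negative delta together with a small p-value would indicate that the observed Dx lies far below what random corpus assignment produces.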

Usage

significance_Dx(value, control, normality_threshold=0.05)

Arguments

value Heterocitation balance (Dx) calculated for the citation network studied

control Baseline distribution of Dx values in control experiments

normality_threshold P value threshold under which the hypothesis of normality is rejected in the preliminary Shapiro-Wilk test


Value

Returns a list containing two elements: the p-value obtained in a one-sample t test comparing value and the control distribution (the null hypothesis being that value could come from the control distribution), or NA if the control distribution is not normal according to a Shapiro-Wilk normality test; and Glass’ estimator of effect size.

Author(s)


References

Glass, G. V., McGraw, B., & Smith, M. L. (1981). Meta-analysis in social research. Beverly Hills: Sage Publications.

See Also

MC_baseline_distribution, heterocitation

Examples

## Not run:
# Heterocitation in our graph
heterocitation(gr_sx, labels=labels, 1987, 2005)
### [1] "Sx ALL / ABM / IBM"
### [1] "0.047 / 0.214 / 0.007"
### [1] "Dx ALL / ABM / IBM"
### [1] "-0.927 / -0.690 / -0.982"

# Generate a baseline distribution for Dx values obtained through chance
# Here, we run 200 iterations of node corpus permutations
baseline<-MC_baseline_distribution(gr_sx, labels, 1987, 2018, 200)

# Assess whether our observed Dx is possibly due to chance
significance_Dx(-0.927, baseline[["Dx ALL"]])
### [1] "Distribution is normal. Performing t-test."
###
### One Sample t-test
###
### data: value - control
### t = -323.0017, df = 319, p-value < 2.2e-16
### alternative hypothesis: true mean is not equal to 0
### 95 percent confidence interval:
### -0.9159834 -0.9048923
### sample estimates:
### mean of x
### -0.9104379
###
### [1] "Glass' effect size: -18.0563442219448"


## End(Not run)

Index

Topic package
Diderot-package

Author Hetero-citation (heterocitation_authors)
Author Heterocitation (heterocitation_authors)
Betweenness Centrality (compute_BC_ranking)
Bibliographic Network Creation (create_bibliography)
build_graph
Citation Graph Building (build_graph)
Citation Graph Creation (create_bibliography)
Citation Graph Loading (load_graph)
Citation Ranking (compute_citation_ranking)
compute_BC_ranking
compute_citation_ranking
compute_custom_modularity
compute_Ji
compute_Ji_ranking
compute_modularity
create_bibliography
Custom Modularity (compute_custom_modularity)
Diderot (Diderot-package)
Diderot-package
get_date_from_doi
get_reference_title
Graph Saving (save_graph)
Hetero-Citation (heterocitation)
Hetero-citation Timeseries (plot_heterocitation_timeseries)
heterocitation
Heterocitation Significance (significance_Dx)
Heterocitation Timeseries (plot_heterocitation_timeseries)
heterocitation_authors
igraph
Ji Metric (compute_Ji)
Ji Ranking (compute_Ji_ranking)
load_graph
makeCluster
MC_baseline_distribution
modularity (compute_modularity)
Modularity Timeseries (plot_modularity_timeseries)
nb_refs
plot_authors_count
plot_heterocitation_timeseries
plot_modularity_timeseries
plot_publication_curve
precompute_heterocitation
Publication Timeseries (plot_publication_curve)
recover
save_graph
shapiro.test
significance_Dx
t.test
