dans.knaw.nl
DANS is an institute of KNAW and NWO
Comparison of methods – an unloved duty? Examples from an ongoing bibliometric study
Andrea Scharnhorst, Rob Koopman, Shenghui Wang
eHumanities group, Research meeting, Feb 11, 2016
What are patterns?
Related words: [indicative] [occurring] [occurrence] [patterns] [consistent] [distribution] [restricted] [portions] [origin] [distinct]
http://thoth.pica.nl/relate? Pattern
Clusters as specific patterns
Related words: [clusters] [clustering] [clustered] [distances] [molecular] [structure] [arrangement]
http://thoth.pica.nl/relate? Cluster
What are topics?
Related words: [researchers] [topics] [reviewing] [discussion] [interested] [questions] [discussing] [methodological] [suggestions] [great deal]
http://thoth.pica.nl/relate? topic
What is this all about? ‘Same data – different results’
A group of bibliometricians and sociologists of science started a project to delineate scientific topics by looking into scholarly communication, more specifically into journal articles from the field of astrophysics.
They applied different methods of clustering documents into groups that represent scientific topics, based on information from the bibliographic record.
They compare the methods to gain a better understanding of which bibliometric approach produces which kind of representation of a topic.
Why is this important?
Topics, or larger entities such as fields, are used:
- traditionally, to classify and better order knowledge (subject headings) and to let us find things more easily
In bibliometrics:
- to determine the degree of interdisciplinarity
- to understand how innovation emerges at the boundaries of fields
- to determine which emergent fields to invest in
- to evaluate individual researchers in comparison to their ‘reference field’
Can we understand the patterns (clusters) we find?
Hellsten, I., Lambiotte, R., Scharnhorst, A., & Ausloos, M. (2007). Self-citations, co-authorships and keywords: A new approach to scientists’ field mobility? Scientometrics, 72(3), 469–486. doi:10.1007/s11192-007-1680-5
Ambiguity at levels of the research process
Conceptual level
What are the atoms of science? Topics? Fields? Specialties?
How are they defined?
Empirical level
Which traces should be used? Journal articles.
Which part(s) of them?
Methodological level
On the basis of which approach do we group articles? Because they share references, words, authors, journals, ...?
Do the structures/patterns/clusters we see correspond to the topical structure we wanted to explore?
Background: “Same data, different results”
• Evolved from annual meetings of advisory project funded by the German Ministry for Education and Research on ‘Measuring Diversity in Science’
• To measure the epistemic diversity of a field, the field needs to be delineated and its topics identified
• Compare solutions derived from the same data set
• Series of workshops (Berlin 9/2014, Amsterdam 4/2015, Berlin 8/2015)
• Special session at ISSI 2015, July, in Istanbul
The Astro dataset
• Source: Web of Science (Thomson Reuters)
• 8 years: 2003-2010
• 59 astrophysics and astronomy journals
• 111,161 articles, letters & proceedings papers
Six teams
• Humboldt University of Berlin
• University of Michigan
• SciTech Strategies
• University of Leuven
• CWTS
• OCLC & DANS
Eight clustering solutions
Topic Extraction Workflow
T. Velden. Same Data, Different Results-- On a Comparative Topic Extraction Exercise. SIGMET Workshop at ASIST 2015
Overview Approaches
Algorithm | Direct Citation | Bibliogr. Coupling | Hybrid (bc & terms/NLP) | Semantic matrix | Projection onto Global Direct Citation Map
Infomap   | UMSI            | --                 | --                      | --              | --
SLMA      | CWTS            | --                 | --                      | --              | STS
Memetic   | HU              | --                 | --                      | --              | --
Louvain   | --              | ECOOM              | ECOOM                   | OCLC            | --
K-means   | --              | --                 | --                      | OCLC            | --
HU: Humboldt University; CWTS: Centre for Science and Technology Studies, Leiden; ECOOM: Expertisecentrum Onderzoek en Ontwikkelingsmonitoring; UMSI: University of Michigan School of Information; OCLC: Online Computer Library Center, Inc.; STS: SciTech Strategies
T. Velden. Same Data, Different Results-- On a Comparative Topic Extraction Exercise. SIGMET Workshop at ASIST 2015
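Two of the data models in the overview, direct citation and bibliographic coupling, are built from reference lists. As a rough illustration of the second one, the sketch below counts shared references between pairs of papers. The paper and reference identifiers are invented, and the function name `coupling_strengths` is hypothetical; none of the teams' actual pipelines is reproduced here.

```python
from itertools import combinations

def coupling_strengths(references):
    """Return {(paper_a, paper_b): number of shared references}.

    `references` maps a paper id to the list of works it cites.
    Only pairs that share at least one reference are returned.
    """
    strengths = {}
    for a, b in combinations(sorted(references), 2):
        shared = len(set(references[a]) & set(references[b]))
        if shared:
            strengths[(a, b)] = shared
    return strengths

# Toy data (invented for illustration only).
toy_references = {
    "p1": ["r1", "r2", "r3"],
    "p2": ["r2", "r3"],
    "p3": ["r9"],
}
toy_strengths = coupling_strengths(toy_references)
print(toy_strengths)
```

The all-pairs loop is fine for a toy example but grows quadratically; for a data set with more than 100,000 records one would more likely go through an inverted index from references to citing papers.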
Cluster comparison
• Overlap measures
  • Normalised mutual information
  • Overlap index
• Visualisation
  • Thesaurus mapping
  • Semantic similarity
  • Topic affinity network
  • VOSviewer term maps
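To make the first overlap measure concrete, here is a minimal sketch of normalised mutual information (NMI) between two clusterings of the same documents. It assumes each clustering is given as a list of cluster ids in document order, and it normalises by the arithmetic mean of the two entropies; other normalisations exist, and the slides do not say which one was used.

```python
from collections import Counter
from math import log

def entropy(labels):
    """Shannon entropy (in nats) of a list of cluster ids."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())

def mutual_information(a, b):
    """Mutual information (in nats) between two clusterings a and b."""
    n = len(a)
    joint = Counter(zip(a, b))
    count_a, count_b = Counter(a), Counter(b)
    return sum((nij / n) * log(nij * n / (count_a[x] * count_b[y]))
               for (x, y), nij in joint.items())

def nmi(a, b):
    """NMI, normalised by the mean entropy (an assumed convention)."""
    ha, hb = entropy(a), entropy(b)
    if ha == 0 or hb == 0:
        # At least one clustering puts everything into a single cluster.
        return 1.0 if ha == hb else 0.0
    return mutual_information(a, b) / ((ha + hb) / 2)

# Same grouping under different cluster names, and an unrelated grouping.
same = nmi([0, 0, 1, 1], [1, 1, 0, 0])
unrelated = nmi([0, 0, 1, 1], [0, 1, 0, 1])
print(same, unrelated)
```

NMI ignores the cluster names themselves, so two solutions that group the documents identically score 1 even if their cluster ids differ.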
Cluster labelling
• Descriptive, human-readable labels for the clusters produced by automated processes
• Different methods:
  • Internal information
  • Differential labelling
  • External knowledge
  • Experts
Mutual information based labelling
https://en.wikipedia.org/wiki/Mutual_information
Normalised mutual information
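A minimal sketch of mutual information based labelling, under stated assumptions: each document is reduced to a set of terms, and a term is scored for a cluster by the mutual information between two binary variables, "document contains the term" and "document belongs to the cluster". The function names and toy data are hypothetical. The filter that keeps only terms over-represented inside the cluster is an added assumption, not something taken from the slides.

```python
from collections import Counter
from math import log

def binary_mi(x, y):
    """Mutual information (in nats) between two equally long 0/1 lists."""
    n = len(x)
    joint = Counter(zip(x, y))
    count_x, count_y = Counter(x), Counter(y)
    return sum((nxy / n) * log(nxy * n / (count_x[a] * count_y[b]))
               for (a, b), nxy in joint.items())

def rank_labels(doc_terms, doc_cluster, cluster):
    """Rank candidate labels for `cluster`, most informative first."""
    in_cluster = [1 if c == cluster else 0 for c in doc_cluster]
    n_in = sum(in_cluster)
    if n_in == 0:
        return []
    scores = {}
    for term in set().union(*doc_terms):
        has_term = [1 if term in terms else 0 for terms in doc_terms]
        rate_in = sum(h for h, c in zip(has_term, in_cluster) if c) / n_in
        rate_all = sum(has_term) / len(has_term)
        # Assumption: a label should be more frequent inside the cluster
        # than overall; MI alone also rewards terms that are absent from it.
        if rate_in > rate_all:
            scores[term] = binary_mi(has_term, in_cluster)
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

# Toy data (invented for illustration only).
toy_terms = [{"grb", "burst"}, {"grb", "afterglow"},
             {"galaxy", "halo"}, {"galaxy", "burst"}]
toy_clusters = ["A", "A", "B", "B"]
ranked = rank_labels(toy_terms, toy_clusters, "A")
print([term for term, _ in ranked])
```

Without the filter, "galaxy" would score as high as "grb" for cluster A, because never occurring in a cluster is just as informative as always occurring in it.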
Labelling results
Concept map by Marcus John
Visual comparison using labels
• Select 50 most informative labels for each clustering
• Combine into one list of 61 labels
• Re-compute the NMI between each cluster and each label
• Each cluster is represented by a 61 dimensional vector
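The fingerprint construction can be sketched as follows, assuming again that documents are sets of terms and that the NMI between a cluster and a label is computed on binary indicator variables, normalised by the mean entropy. The three-label list stands in for the combined list of 61 labels. Comparing fingerprints by cosine similarity is an illustrative choice here, not a method taken from the slides.

```python
from collections import Counter
from math import log, sqrt

def binary_nmi(x, y):
    """NMI between two 0/1 lists, normalised by the mean entropy."""
    n = len(x)

    def entropy(values):
        return -sum((c / n) * log(c / n) for c in Counter(values).values())

    hx, hy = entropy(x), entropy(y)
    if hx == 0 or hy == 0:
        return 0.0  # a constant variable carries no information
    joint = Counter(zip(x, y))
    count_x, count_y = Counter(x), Counter(y)
    mi = sum((c / n) * log(c * n / (count_x[a] * count_y[b]))
             for (a, b), c in joint.items())
    return mi / ((hx + hy) / 2)

def fingerprint(doc_terms, doc_cluster, cluster, labels):
    """One NMI value per label: the cluster's fingerprint vector."""
    in_cluster = [int(c == cluster) for c in doc_cluster]
    return [binary_nmi([int(label in terms) for terms in doc_terms], in_cluster)
            for label in labels]

def cosine(u, v):
    """Cosine similarity of two fingerprints (assumed comparison measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Toy data (invented): two clustering solutions of the same four documents.
toy_terms = [{"grb", "burst"}, {"grb", "afterglow"},
             {"galaxy", "halo"}, {"galaxy", "burst"}]
solution_1 = ["A", "A", "B", "B"]
solution_2 = ["x", "x", "y", "y"]
labels = ["grb", "galaxy", "burst"]

fp_a = fingerprint(toy_terms, solution_1, "A", labels)
fp_x = fingerprint(toy_terms, solution_2, "x", labels)
print(fp_a, cosine(fp_a, fp_x))
```

Because every cluster from every solution is described over the same label list, clusters from different solutions become directly comparable, even though their cluster ids have nothing in common.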
Fingerprints of clusters
[Figure: fingerprints of clusters; axes: Clusters, Labels; example label shown: grb]
Ambiguity in the topic extraction workflow
Schema courtesy of Theresa Velden
Topics
Interpretation / Evaluation
• Labelling
• Visual representations
• Experts
Comparison
• Set-based
• Ensemble statistics
• Labelling
Thanks for your attention!
[email protected]; xxxx
Twitter: @knowescape