Application of bibliometric analysis
Advantages & pitfalls
Thed van Leeuwen
Workshop on Research Evaluation in Statistical Sciences,
Bologna, 25th March 2010
Introduction of bibliometrics
• Bibliometrics can be defined as the quantitative analysis of science and technology performance and the cognitive and organizational structure of science and technology.
• The basis for these analyses is the scientific communication between scientists through (mainly) journal publications.
• Key concepts in bibliometrics are output and impact, as measured through publications and citations.
• An important starting point in bibliometrics: through the citations in their scientific publications, scientists express a certain degree of influence of others on their own work.
• When quantified on a large scale, citations indicate influence or (inter)national visibility of scientific activity, but they should not be interpreted as a synonym for ‘quality’.
CWTS data system
• CWTS has a full bibliometric license from Thomson Reuters Scientific to conduct evaluation studies using the Web of Science.
• Our database covers the period 1981-2009.
• Some characteristics:
– Over 31.000.000 publications.
– Over 350.000.000 citation relations between source papers.
– 100.000.000 authors (incl. variations), 15.000.000 ‘unique’ names.
– Over 60.000.000 addresses, some 90% cleaned up over the last 10 years.
– Contains reference sets for journal and field citation data.
Bibliometric indicators produced by CWTS
Some basic indicators are …
• P: number of publications in journals processed for the Web of Science.
• C: number of received citations, excl. self-citations.
• CPP: mean number of citations per publication, excl. self-citations.
• Pnc: percentage of the publications not cited (within a certain time-frame!).
• % SC: percentage of self-citations related to an output set.
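A minimal sketch of how these basic indicators could be computed for a small output set. The `Paper` class, the `basic_indicators` function and the example numbers are hypothetical, and two details are assumptions not stated on the slide: a paper counts as "not cited" when it has no citations of any kind, and % SC is taken as the share of self-citations in all citations (including self-citations).

```python
from dataclasses import dataclass


@dataclass
class Paper:
    citations: int       # all citations received within the chosen time-frame
    self_citations: int  # the part of `citations` that are self-citations


def basic_indicators(papers):
    """Return P, C, CPP, Pnc and %SC for a list of Paper objects."""
    p = len(papers)
    if p == 0:
        raise ValueError("empty publication set")
    total = sum(x.citations for x in papers)
    self_c = sum(x.self_citations for x in papers)
    c = total - self_c  # C excludes self-citations
    # Assumption: "not cited" means zero citations, self-citations included.
    uncited = sum(1 for x in papers if x.citations == 0)
    return {
        "P": p,
        "C": c,
        "CPP": c / p,
        "Pnc": 100.0 * uncited / p,
        # Assumption: %SC is relative to all citations incl. self-citations.
        "%SC": 100.0 * self_c / total if total else 0.0,
    }


# Hypothetical output set of four papers.
example = [Paper(10, 2), Paper(0, 0), Paper(5, 1), Paper(5, 1)]
result = basic_indicators(example)
print(result)
```

For this made-up set the sketch gives P = 4, C = 16, CPP = 4.0, Pnc = 25% and % SC = 20%.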
Important indicators are…
• CPP/JCSm: ratio between real, actual impact, and mean journal impact.
• CPP/FCSm: ratio between real, actual impact, and mean field impact.
• JCSm/FCSm: ratio between journal impact and field impact, indicative of the ‘quality’ of the journal package in the field.
Various types of analysis focus on …
• Research profiles: a breakdown of the output over various fields of science.
• Scientific cooperation analysis: a breakdown of the output over various types of scientific collaboration.
• Knowledge user analysis: a breakdown of the ‘responding’ output into citing fields, countries or institutions.
• Highly cited paper analysis: which publications are among the most highly cited output (top 10%, 5%, 1%) of the global literature in the same field(s).
• Social network analysis: how the network of partners is composed, based on scientific cooperation.
Journal & Field Normalization
Calculating the JCSm & FCSm
----------------------------------------------------------------------
      Type      Publ.   Journal       Journal          # citations
                year                  category         until 1999
----------------------------------------------------------------------
I     review    1996    CANCER RES    Oncology         17
II    note      1997    J CLIN END    Endocrinology    4
III   article   1999    J CLIN END    Endocrinology    6
IV    article   1999    J CLIN END    Endocrinology    8
----------------------------------------------------------------------
Calculating the JCSm & FCSm 2
------------------------------------
       CPP      JCS      FCS
------------------------------------
I      17       16.9     23.7
II     4        3.1      3.0
III    6        4.8      4.1
IV     8        4.8      4.1
------------------------------------
Calculating the JCSm & FCSm 3
The mean citation score is determined as:
\[ \mathrm{CPP} = \frac{17 + 4 + 6 + 8}{1 + 1 + 1 + 1} = 8.8 \]
The mean journal citation score as:
\[ \mathrm{JCSm} = \frac{(1 \times 16.9) + (1 \times 3.1) + (2 \times 4.8)}{1 + 1 + 2} = 7.4 \]
The mean field citation score as:
\[ \mathrm{FCSm} = \frac{(1 \times 23.7) + (1 \times 3.0) + (2 \times 4.1)}{1 + 1 + 2} = 8.7 \]
CPP / JCSm
\[ 8.8 / 7.4 = 1.19 \]
CPP / FCSm
\[ 8.8 / 8.7 = 1.01 \]
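The worked example can be reproduced with a short sketch. The variable names are illustrative only; the numbers are the four publications from the slides. Note that the slide divides the rounded values (8.8, 7.4, 8.7), whereas the sketch below divides the unrounded means, so the ratios come out slightly lower (about 1.18 and 1.00).

```python
# Each tuple: (citations, journal citation score JCS, field citation score FCS)
# for publications I to IV of the worked example.
papers = [
    (17, 16.9, 23.7),  # I   review,  CANCER RES, Oncology
    (4, 3.1, 3.0),     # II  note,    J CLIN END, Endocrinology
    (6, 4.8, 4.1),     # III article, J CLIN END, Endocrinology
    (8, 4.8, 4.1),     # IV  article, J CLIN END, Endocrinology
]

n = len(papers)
cpp = sum(c for c, _, _ in papers) / n   # mean citations per publication
jcsm = sum(j for _, j, _ in papers) / n  # mean journal citation score
fcsm = sum(f for _, _, f in papers) / n  # mean field citation score

print(f"CPP  = {cpp:.2f}")
print(f"JCSm = {jcsm:.2f}")
print(f"FCSm = {fcsm:.2f}")
print(f"CPP/JCSm = {cpp / jcsm:.2f}")
print(f"CPP/FCSm = {cpp / fcsm:.2f}")
```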
Citation Windows & Impact Measurement
Citation measurement and ‘windows’
• Publication years, fixed citation ‘window’.
Publications of 2002, with three citation years (namely 2002, 2003, and 2004), followed by 2003, with three years, etc.
• Blocks of publication years with a window decreasing in length.
Publications of 2002-2005, with citation window of 4 years (2002-2005), 3 years (2003-2005), 2 years (2004-2005), and 1 year (2005).
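The two window schemes can be sketched as follows. The function names are hypothetical, and the cut-off year 2009 is an assumption taken from the period covered by the database.

```python
def fixed_window(pub_year, length=3, last_year=2009):
    """Citation years for one publication year under a fixed window.

    The window is truncated when it would run past `last_year`.
    """
    return [y for y in range(pub_year, pub_year + length) if y <= last_year]


def year_block(first_pub_year, block_length=4):
    """Citation years per publication year within one block.

    Every publication year in the block is followed up to the last
    year of the block, so the window shrinks for later years.
    """
    last = first_pub_year + block_length - 1
    return {p: list(range(p, last + 1)) for p in range(first_pub_year, last + 1)}


print(fixed_window(2002))  # [2002, 2003, 2004]
print(fixed_window(2008))  # [2008, 2009]
print(year_block(2002))
```

For the block 2002-2005, `year_block(2002)` gives windows of 4, 3, 2 and 1 years, matching the second bullet.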
Citation measurement with ‘fixed window’
Publication year   Citation years
2002               2002 2003 2004
2003               2003 2004 2005
2004               2004 2005 2006
2005               2005 2006 2007
2006               2006 2007 2008
2007               2007 2008 2009
2008               2008 2009
2009               2009
Citation measurement with ‘year blocks’
Block       Publication year   Citation years
2002-2005   2002               2002 2003 2004 2005
            2003               2003 2004 2005
            2004               2004 2005
            2005               2005
2003-2006   2003               2003 2004 2005 2006
            2004               2004 2005 2006
            2005               2005 2006
            2006               2006
2004-2007   2004               2004 2005 2006 2007
            2005               2005 2006 2007
            2006               2006 2007
            2007               2007
2005-2008   2005               2005 2006 2007 2008
            2006               2006 2007 2008
            2007               2007 2008
            2008               2008
2006-2009   2006               2006 2007 2008 2009
            2007               2007 2008 2009
            2008               2008 2009
            2009               2009
Methodological issues
Adequacy of citation indexes: implications for bibliometric studies
How to tackle this issue?
• We conduct analyses on the adequacy of the citation indexes across disciplines, based on the reference behavior of researchers themselves.
• The degree to which researchers refer to other indexed literature indicates the importance of journal literature in the scientific communication process.
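As a rough illustration of this idea, the sketch below computes the share of references that point to indexed sources. The function name and the reference list are hypothetical, and the matching of references against the index is assumed to have been done already.

```python
def wos_coverage(reference_is_indexed):
    """Percentage of references that point to WoS-indexed sources.

    `reference_is_indexed` holds one boolean per cited reference:
    True if the cited item is in the index, False otherwise
    (non-WoS journals, books, conference proceedings, reports, ...).
    """
    refs = list(reference_is_indexed)
    if not refs:
        raise ValueError("no references to assess")
    return 100.0 * sum(refs) / len(refs)


# Hypothetical reference list: three indexed items, one book.
print(wos_coverage([True, True, True, False]))  # 75.0
```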
[Diagram: references from citing (source) publications to cited (target) items, split into WoS (?%) and non-WoS (?%).]
Assessment of WoS Coverage
Non-WoS sources: non-WoS journals, books, conference proceedings, reports, etc.
[Diagram: references from citing (source) publications to cited (target) items: WoS 75%, non-WoS 25%. Total ISI/WoS database (2002).]
The medical & life sciences
[Figure: share of references (0-100%) to ISI and non-ISI sources in 1991, 1996, 2001 and 2006 for: Agriculture and Food Science; Basic Life Sciences; Basic Medical Sciences; Biological Sciences; Biomedical Sciences; Clinical Medicine; Health Sciences.]
The natural sciences
[Figure: share of references (0-100%) to ISI and non-ISI sources in 1991, 1996, 2001 and 2006 for: Astronomy and Astrophysics; Chemistry and Chemical Engineering; Computer Sciences; Earth Sciences and Technology; Environmental Sciences and Technology; Mathematics; Physics and Materials Science; Statistical Sciences.]
Statistical sciences
[Figure: share of references (0-100%) to ISI and non-ISI sources in 1991, 1996, 2001 and 2006 for the statistical sciences.]
The engineering sciences
[Figure: share of references (0-100%) to ISI and non-ISI sources in 1991, 1996, 2001 and 2006 for: Civil Engineering and Construction; Electrical Engineering and Telecommunication; Energy Science and Technology; General and Industrial Engineering; Instruments and Instrumentation; Mechanical Engineering and Aerospace.]
The social and behavioral sciences
[Figure: share of references (0-100%) to ISI and non-ISI sources in 1991, 1996, 2001 and 2006 for: Economics and Business; Educational Sciences; Management and Planning; Political Science and Public Administration; Psychology; Social and Behavioral Sciences, Interdisciplinary; Sociology and Anthropology.]
The humanities
[Figure: share of references (0-100%) to ISI and non-ISI sources in 1991, 1996, 2001 and 2006 for: Information and Communication Sciences; Language and Linguistics; Creative Arts, Culture and Music; History, Philosophy and Religion; Law and Criminology; Literature.]
Overall WoS coverage by main field
EXCELLENT (> 80%): Biochem & Mol Biol; Biol Sci – Humans; Chemistry; Clin Medicine; Phys & Astron
VERY GOOD (60-80%): Appl Phys & Chem; Biol Sci – Anim & Plants; Psychol & Psychiat; Geosciences; Soc Sci ~ Medicine
GOOD (40-60%): Mathematics & Statistical sciences; Economics; Engineering
MODERATE (< 40%): Other Soc Sci; Humanities & Arts
Conclusions on adequacy issue
• We can clearly conclude that the application of bibliometric techniques, solely based on WoS (but very likely also Scopus) will not be valid for some of the ‘soft’ fields in the social sciences and the humanities.
• That is why the toolbox has to be extended!
The H-Index and its limitations
The H-Index, defined as …
• The H-Index is the rank position h in a set of publications, sorted by descending number of received citations, at which the number of citations is still at least equal to the rank position; in other words, the largest h such that h publications have each received at least h citations.
• The idea comes from the American physicist J. Hirsch, who published on this index in the Proc. NAS USA.
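The definition translates directly into a few lines of code. This is a minimal sketch; the function name and the example citation counts are illustrative only.

```python
def h_index(citations):
    """Largest h such that h publications have at least h citations each."""
    ranked = sorted(citations, reverse=True)  # most cited first
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank   # the publication at this rank still qualifies
        else:
            break      # every later publication has even fewer citations
    return h


# Hypothetical oeuvre of five papers.
print(h_index([10, 8, 5, 4, 3]))  # 4
```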
Examples of Hirsch-index values
• Environmental biologist, output of 188 papers, cited 4,788 times in the period 1980-2004.
• Hirsch-index value of 31
• Clinical psychologist, output of 72 papers, cited 760 times in the period 1980-2004.
• Hirsch-index value of 14
[Figure: citations (0-350) per publication (0-200) for the environmental biologist; value of H-Index = 31.]
[Figure: citations (0-80) per publication (0-80) for the clinical psychologist; value of H-Index = 14.]
Problems with the H-Index
• For serious evaluation of scientific performance, the H-Index is not suitable as an indicator, as the index:
– Is insensitive to field-specific characteristics (e.g., differences in citation cultures between medicine and other disciplines).
– Does not take into account the age and career length of scientists; a small oeuvre necessarily leads to a low H-Index value.
– Is inconsistent in its ‘behaviour’.
• Actual versus field normalized impact (CPP/FCSm) displayed against the output.
• Large output can be combined with a relatively low impact
[Figure: scatter plot of CPP/FCSm (0.00-7.00) against total publications (0-250); points labelled by field: Soc, Hum, Mat, Eng, Psy, Che, Med, Phy, Bio, Env.]
• H-Index displayed against the output.
• Larger output is strongly correlated with a high H-Index value.
[Figure: scatter plot of H-index (0-60) against total publications (0-250); points labelled by field: Med, Bio, Phy, Env, Psy, Che, Eng, Soc, Mat, Hum.]
Consistency: Definition
Definition. A scientific performance measure is said to be consistent if and only if for any two actors A and B and for any number n ≥ 0 the ranking of A and B given by the performance measure does not change when A and B both have a new publication with n citations.
Consistency: Motivation
• Consistency ensures that if the publishing behavior of two actors does not change over time, their ranking relative to each other also does not change
• Consistency ensures that if the individual researchers in one research group X outperform the individual researchers in another research group Y, the former research group X as a whole outperforms the latter research group Y.
Inconsistency of the h-index
[Figure: citations (0-9) against publications (0-12) for Actor A and Actor B, in four panels labelled h = 4, h = 6, h = 6 and h = 8.]
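The consistency definition can be checked on a small example. The citation counts below are hypothetical and are not the data behind the figure; they are chosen so that two actors who each gain identical new publications (9 citations each) first draw level and then swap places in the h-index ranking.

```python
def h_index(citations):
    """Largest h such that h publications have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites < rank:
            break
        h = rank
    return h


# Hypothetical actors: B starts ahead of A.
actor_a = [9, 9, 9]      # h = 3
actor_b = [4, 4, 4, 4]   # h = 4
before = (h_index(actor_a), h_index(actor_b))

# Both actors gain the same new publication with n = 9 citations ...
after_one = (h_index(actor_a + [9]), h_index(actor_b + [9]))

# ... and then a second identical one.
after_two = (h_index(actor_a + [9, 9]), h_index(actor_b + [9, 9]))

print(before, after_one, after_two)  # (3, 4) (4, 4) (5, 4)
```

In this made-up case B ranks above A at first, the two tie after one identical addition, and A ranks above B after two, which is the kind of ranking change the definition rules out.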
ISI Impact Factors: calculation and validity
Methodology: ISI’s classical IF
• The ISI Impact Factor (IF) is defined as the number of citations received in year t by a journal's publications from the years t-1 and t-2, divided by the number of citeable documents in that same journal in the years t-1 and t-2.
• Or, as a formula:
\[ \mathrm{IF}_t = \frac{\text{citations in year } t}{\text{number of `citeable documents' in } t-1 \text{ and } t-2} \]
Share ‘citations-for-free’ for The Lancet
             Publications   Citations
             90+91          1992
Article      784            2986
Note         144            593
Review       29             232
Sub-total    957 (a)        7959 (b)
Letter       4181 (d)       4264 (e)
Editorial    1313           905
Other        1421           909
Total        7872           14037 (c)
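• ISI Method:
\[ \mathrm{IF} = \frac{\text{Citations in 2000}}{\text{Citeable documents in '98 and '99}} = \frac{14037\ (c)}{957\ (a)} = 14.7 \]
• CWTS Method:
\[ \mathrm{IF} = \frac{\text{Citations to Art/Not/Rev in 2000}}{\text{Art/Not/Rev in '98 and '99}} = \frac{7959\ (b)}{957\ (a)} = 8.3 \]
\[ \mathrm{IF} = \frac{\text{Citations to Art/Let/Not/Rev in 2000}}{\text{Art/Let/Not/Rev in '98 and '99}} = \frac{7959 + 4264\ (b+e)}{957 + 4181\ (a+d)} = 2.4 \]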
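The three ratios can be recomputed from the labelled table entries (a) to (e). This is only an arithmetic sketch; the variable names are illustrative.

```python
# Labelled entries from the table for The Lancet.
a = 957     # (a) articles + notes + reviews
b = 7959    # (b) citations to articles + notes + reviews
c = 14037   # (c) citations to all document types
d = 4181    # (d) letters
e = 4264    # (e) citations to letters

# ISI method: all citations in the numerator,
# only 'citeable documents' in the denominator.
if_isi = c / a

# CWTS method: numerator and denominator cover the same document types.
if_cwts = b / a
if_cwts_with_letters = (b + e) / (a + d)

print(round(if_isi, 1))                # 14.7
print(round(if_cwts, 1))               # 8.3
print(round(if_cwts_with_letters, 1))  # 2.4
```

The gap between 14.7 and 8.3 comes entirely from citations to document types that are left out of the denominator.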
ISI Impact Factors
• From 1995 onwards CWTS has analyzed the uses and validity of the ISI Journal Impact Factor (IF).
• Most important points of criticism were:
– Calculated erroneously.
– Not sensitive to the composition of the journal in terms of document types.
– Not sensitive to the science fields a journal is attached to.
– Based on too short ‘citation windows’.
Distribution of citations used for the calculation of the IF value of The Lancet
[Figure: distribution (0-100%) of the citations used for the IF calculation, for the years 82-98.]
• The IF-score of The Lancet is seriously ‘overrated’ by the scientific ‘audience’ of the journal.
• The red area indicates citations ‘for free’, while the blue area indicates ‘correct citations’
Impact Factors for Br. J. Clin. Pharm. and Clin. Pharm. & Ther.
• The graph shows the correct and erroneous impact factors of BJCP and CPT
• In the case of CPT, citations to published meeting abstracts are included, while BJCP has stopped publishing meeting abstracts!
[Figure: impact factors (0.00-4.50) over the years 83-98; series: CPT Err IF, CPT IF, BJCP Err IF, BJCP IF.]
Document types and fields
Field                      Journal                         IF      Rank   JFIS   Rank
IMMUNOLOGY                 ANN REV IMMUNOL                 50.49   1      5.18   1
BIOCHEM & MOLECULAR BIOL   ANN REV BIOCHEM                 34.61   1      4.10   3
PHARMACOL & PHARMACY       PHARMACOLOGICAL REV             27.74   1      4.75   1
CELL BIOL                  ANN REV CELL & DEVELOPM BIOL    27.53   1      1.72   13
DEVELOPMENTAL BIOL         ANN REV CELL & DEVELOPM BIOL    27.53   1      1.72   3
PHYSIOLOGY                 PHYSIOLOGICAL REV               24.82   1      3.18   1
CELL BIOLOGY               NATURE REV MOL CELL BIOL        22.21   4      2.76   8
ENDOCRINOL & METABOLISM    ENDOCRINE REV                   21.98   1      2.87   1
NEUROSCIENCES              ANN REV NEUROSCIENCE            21.89   1      3.12   4
PHYSICS                    REV MODERN PHYSICS              20.14   1      5.02   1
CHEMISTRY                  CHEMICAL REV                    19.67   1      2.89   2
The IF is for ‘02, JFIS covers ‘98-‘02
Fields and Citation windows
[Figure: bar chart with a horizontal axis from 0 to 4.5, one bar per field, grouped by discipline; the numbers in parentheses are given as in the figure.]
Chemistry: POLYMER SCIENCE (55); CHEM, APPLIED (25); CHEM, CLIN&MEDIC (8); CHEM, PHYSICAL (78); CRYSTALLOGRAPHY (18); ELECTROCHEMISTRY (10); CHEM, INORG&NUC (37); BIOCH & MOL BIOL (169); CHEM, ORGANIC (42); CHEMISTRY (128); CHEM, MISCELLAN (7); CHEM, ANALYTICAL (54)
Engineering sciences: ENG, INDUSTRIAL (14); ENG, MANUFACT (5); ENGINEERING (84); ENG, BIOMEDICAL (33); ENG, PETROLEUM (8); ENG, MECHANIC (69); ENG, CIVIL (49); ENG, ENVIRONM (6); ENG, CHEMICAL (69); ENG, MARINE (8); ENG, ELECTRICAL (127)
Physics: PHYSICS, MATHEMA (10); ACOUSTICS (20); THERMODYNAMICS (11); PHYSICS, FLUIDS (16); PHYSICS, MISCELL (6); PHYSICS, AT,M,C (22); OPTICS (37); PHYSICS, APPLIED (49); PHYSICS, COND MA (36); PHYSICS (85); PHYSICS, NUCLEAR (16); PHYSICS, PART&FI (11)
Citation measurement of IF
Publication year   Citation years
2002               2002 2003 2004
2003               2003 2004 2005
2004               2004 2005 2006
2005               2005 2006 2007
2006               2006 2007 2008
2007               2007 2008 2009
2008               2008 2009
2009               2009
CWTS answer to the problems of the IF
• The CWTS alternative is the JFIS, the Journal-to-Field Impact Score.
• The JFIS addresses the main objections against the Impact Factor, as
– the calculation of JFIS is based on equally large entities,
– document types are taken into account,
– JFIS is field-normalized, and finally,
– JFIS is based on longer citation windows (1-4 years).
Citation measurement of JFIS
Block       Publication year   Citation years
2002-2005   2002               2002 2003 2004 2005
            2003               2003 2004 2005
            2004               2004 2005
            2005               2005
2003-2006   2003               2003 2004 2005 2006
            2004               2004 2005 2006
            2005               2005 2006
            2006               2006
2004-2007   2004               2004 2005 2006 2007
            2005               2005 2006 2007
            2006               2006 2007
            2007               2007
2005-2008   2005               2005 2006 2007 2008
            2006               2006 2007 2008
            2007               2007 2008
            2008               2008
2006-2009   2006               2006 2007 2008 2009
            2007               2007 2008 2009
            2008               2008 2009
            2009               2009
End of the presentation
For questions regarding the contents of the presentation, mail to: [email protected]