Journal Usage Factor: A usage-based alternative to Impact Factor

Richard Gedye

UKSG Annual Conference, April 2011, Harrogate


Page 1: Journal Usage Factor A usage-based alternative to Impact Factor Richard Gedye


Journal Usage Factor

A usage-based alternative to Impact Factor

Richard Gedye

UKSG Annual Conference April 2011, Harrogate

Page 2: Journal Usage Factor A usage-based alternative to Impact Factor Richard Gedye

UKSG Usage Factor Project

1. Project rationale

2. Issues addressed before data collection and analysis

3. Collecting and analysing the data
   What data we collected
   Methodology
   Issues and challenges

4. Results and Recommendations

5. Next steps


Page 4: Journal Usage Factor A usage-based alternative to Impact Factor Richard Gedye

The challenge…….

ISI's Impact Factor compensates for the fact that larger journals will tend to be cited more than smaller ones

Can we do something similar for usage?

In other words, should we seek to develop a “Usage Factor” as an additional measure of journal quality/value?

Page 5: Journal Usage Factor A usage-based alternative to Impact Factor Richard Gedye

For example…..

$$\text{Usage Factor} = \frac{\text{Total usage over period } x \text{ of articles published during period } y}{\text{Total articles published during period } y}$$

Page 6: Journal Usage Factor A usage-based alternative to Impact Factor Richard Gedye

Usage factor advantages

Especially helpful for journals and fields not covered by ISI

Especially helpful for journals with high undergraduate or practitioner use

Especially helpful for journals publishing relatively few articles

Data available potentially sooner than with Impact Factors

Page 7: Journal Usage Factor A usage-based alternative to Impact Factor Richard Gedye

“Authors select journals that will give their articles prestige and reach. Impact Factor is a widely used surrogate for the former, while perceived circulation and readership reflect the latter. But usage is becoming more important as a measure of reach”

Carol Tenopir


Page 10: Journal Usage Factor A usage-based alternative to Impact Factor Richard Gedye

Key data issues we have addressed

1. Consistency – numerator/denominator

2. Defining article usage year

3. Defining article publication date

4. Different usage patterns by subject

Page 11: Journal Usage Factor A usage-based alternative to Impact Factor Richard Gedye

With key data issues addressed, we developed a specification for a report via which participating publishers would deliver real usage data for analysis

Page 12: Journal Usage Factor A usage-based alternative to Impact Factor Richard Gedye
Page 13: Journal Usage Factor A usage-based alternative to Impact Factor Richard Gedye

Real journal usage data analysed by John Cox Associates and Frontline GMS

Participating publishers:-

American Chemical Society

Emerald

IOP

Nature Publishing

OUP

Sage

Springer


Page 17: Journal Usage Factor A usage-based alternative to Impact Factor Richard Gedye

The data

326 journals
  38 Engineering
  32 Physical Sciences
  119 Social Sciences (including 29 Business and Management)
  35 Humanities
  102 Medicine and Life Sciences (including 57 Clinical Medicine)

c.250,000 articles

3350 spreadsheets

1GB of data

Page 18: Journal Usage Factor A usage-based alternative to Impact Factor Richard Gedye

The data


Page 21: Journal Usage Factor A usage-based alternative to Impact Factor Richard Gedye

The calculation

$$\text{Usage Factor} = \frac{\text{Total usage over period } x \text{ of articles published during period } y}{\text{Total articles published during period } y}$$

where $x$ is the usage period and $y$ is the publication period.
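As a rough illustration of the calculation, the following Python sketch computes the ratio for a small set of made-up items. The `Item` layout, the function name `usage_factor`, and the idea of expressing the usage period as months counted from publication are assumptions made for this example; they are not taken from the project's report specification.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """One published item with its monthly download counts (hypothetical layout)."""
    pub_year: int
    # downloads_by_month[0] is usage in month 1 after publication, and so on.
    downloads_by_month: list


def usage_factor(items, pub_years, first_month, last_month):
    """Total usage in the usage window for items published in pub_years,
    divided by the number of those items."""
    published = [it for it in items if it.pub_year in pub_years]
    if not published:
        raise ValueError("no items published in the requested period")
    total_usage = sum(
        sum(it.downloads_by_month[first_month - 1:last_month]) for it in published
    )
    return total_usage / len(published)


# Made-up data: two items from 2008 and one from 2007.
items = [
    Item(2008, [10, 20, 30]),
    Item(2008, [0, 5, 5]),
    Item(2007, [100, 100, 100]),  # outside the publication period, so ignored
]
juf = usage_factor(items, pub_years={2008}, first_month=1, last_month=2)
print(juf)  # (10 + 20 + 0 + 5) / 2 = 17.5
```

Note that the same set of items defines both numerator and denominator, which is the consistency point raised earlier in the talk.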

Page 22: Journal Usage Factor A usage-based alternative to Impact Factor Richard Gedye

JUF variables to be tested

Page 23: Journal Usage Factor A usage-based alternative to Impact Factor Richard Gedye

JUF variables to be tested

Subject comparisons

Broad subjects:
  Physical Sciences
  Medicine and Life Sciences
  Social Sciences
  Humanities
  Engineering

Narrow subjects:
  Business and Management
  Clinical Medicine


Page 26: Journal Usage Factor A usage-based alternative to Impact Factor Richard Gedye

Issues and challenges

Article metadata resides in multiple databases

Key article metadata needed for JUF not included in usage log records

Need to map and merge different records

Lack of standards for key schemas and practices

Different publisher policies on article version labelling and availability

Multiple schemas for “article type” – typically journal rather than publisher specific

Some difficulties retrieving detailed historical usage records by article, especially when data straddled transfer between systems
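The "map and merge" step can be pictured with a small Python sketch. Everything here is hypothetical: the field names (`doi`, `item_type`, `pub_year`, `downloads`), the use of a DOI as the join key, and the sample values are assumptions for illustration, not any publisher's actual schema.

```python
from collections import defaultdict

# Hypothetical article metadata, keyed by an identifier shared with the usage log.
metadata = {
    "10.1000/a1": {"journal": "J1", "item_type": "article", "pub_year": 2008},
    "10.1000/a2": {"journal": "J1", "item_type": "editorial", "pub_year": 2008},
}

# Hypothetical usage log rows: they carry counts but not the metadata JUF needs.
usage_log = [
    {"doi": "10.1000/a1", "month": "2008-03", "downloads": 40},
    {"doi": "10.1000/a1", "month": "2008-04", "downloads": 25},
    {"doi": "10.1000/a2", "month": "2008-03", "downloads": 5},
    {"doi": "10.1000/zz", "month": "2008-03", "downloads": 9},  # no metadata record
]


def merge_usage_with_metadata(usage_log, metadata):
    """Sum downloads per item and attach its metadata.

    Returns (merged, unmatched): merged rows for items found in the metadata,
    and the identifiers that could not be matched.
    """
    totals = defaultdict(int)
    for row in usage_log:
        totals[row["doi"]] += row["downloads"]

    merged, unmatched = [], []
    for doi, downloads in totals.items():
        if doi in metadata:
            merged.append({"doi": doi, "downloads": downloads, **metadata[doi]})
        else:
            unmatched.append(doi)
    return merged, unmatched


merged, unmatched = merge_usage_with_metadata(usage_log, metadata)
```

Keeping the unmatched identifiers visible, rather than silently dropping them, makes it easier to see how much usage is lost when records straddle different systems.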


Page 29: Journal Usage Factor A usage-based alternative to Impact Factor Richard Gedye

Results: Content Type

In social sciences JUFs were higher for non-article content

In medicine and life sciences JUFs were higher for article content

In humanities, physical sciences, and business & management, JUF differences between article and non-article content were not significant

Page 30: Journal Usage Factor A usage-based alternative to Impact Factor Richard Gedye

Results: Article Version

In physical sciences the JUF was significantly (sometimes dramatically) lower when calculations were confined to the Version of Record

In all other subjects the JUF was significantly higher when calculations were confined to the Version of Record

Page 31: Journal Usage Factor A usage-based alternative to Impact Factor Richard Gedye

Results: JUF and Impact Factor

Little correlation apart from the Nature branded titles

Some titles with no or very low impact factors have very high JUFs

Page 32: Journal Usage Factor A usage-based alternative to Impact Factor Richard Gedye

Usage Factor Project – some initial results

[Chart: Medicine & Life Science, JUF against IF. Horizontal axis: Impact Factor (0 to 35). Vertical axis: Journal Usage Factor (0.00 to 2,500.00). Publication 2008-09, Usage Period 1-24.]

Page 33: Journal Usage Factor A usage-based alternative to Impact Factor Richard Gedye

Usage Factor Project – some initial results

[Chart: Humanities, JUF against IF. Horizontal axis: Impact Factor (0 to 4). Vertical axis: Journal Usage Factor (0.00 to 1,200.00). Publication 2008-09, Usage Period 1-24.]

Page 34: Journal Usage Factor A usage-based alternative to Impact Factor Richard Gedye

How stable are journal rankings based on impact factor over time?

ISI impact factors

Medical and life science journals (n=36)

How stable are journal rankings based on the classic ISI impact factor? The previously mentioned paper by Amin and Mabe shows that impact factors may fluctuate year on year by as much as plus or minus 40 per cent and that this is in part a function of journal size.

As a point of comparison for the properties of the journal usage factor, we start this section by looking at changes in rankings among 36 medical titles over three years.

Key findings

Journal rankings based on ISI impact factor are pretty consistent over the period 2006-2008.

Rank order correlations

2008 vs 2007: Spearman’s rho = 0.973, p < 0.01

2007 vs 2006: Spearman’s rho = 0.971, p < 0.01

2008 vs 2006: Spearman’s rho = 0.959, p < 0.0

How to read this graphic

We start by putting the journals into ranked order by ISI impact factor for each of the three years 2006-2008. These charts show how these ranked orderings compare across different years. For example, the middle chart in the right hand column compares 2007 and 2008. If there were no changes in journal ranking, all the journals would lie on the diagonal. Journals below the diagonal have fallen down the order.
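For readers who want to reproduce this kind of rank order comparison, here is a minimal Python sketch of Spearman's rho, computed as the Pearson correlation of the two rank vectors with tied values sharing an average rank. The impact factor values are made up, the function names are this example's own, and the sketch does not compute the p-values quoted on the slide.

```python
def average_ranks(values):
    """Rank values from 1 (smallest); tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation of the rank vectors of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Made-up impact factors for five journals in two consecutive years.
if_2007 = [1.2, 3.4, 0.8, 5.1, 2.0]
if_2008 = [1.5, 3.0, 0.9, 6.0, 3.2]
rho = spearman_rho(if_2007, if_2008)
```

In this toy data only two journals swap places, so rho stays close to 1; identical orderings give exactly 1.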

Page 35: Journal Usage Factor A usage-based alternative to Impact Factor Richard Gedye

How stable are journal rankings based on usage factor over time?

UKSG usage factors

Medical and life science journals (n=48)

The journal usage factors reported here are based on a single publication year with use being measured in months 1-24.

Key findings

For rankings based on a journal usage factor (one publication year, with use measured in months 1-24), we find reasonable stability, with high and statistically significant correlations.

Rank order correlation

2008 vs 2007: Spearman’s rho = 0.886, p < 0.01

2007 vs 2006: Spearman’s rho = 0.862, p < 0.01

2008 vs 2006: Spearman’s rho = 0.755, p < 0.01

The correlations are smaller than for the impact factors in the previous slide, but they are still high. This analysis shows that usage factors are more volatile than impact factors and any journal rankings based on them will show greater churn year on year. But, broadly speaking, they do a similar job.

Page 36: Journal Usage Factor A usage-based alternative to Impact Factor Richard Gedye

Recommendations – the metric

The most promising JUF metric for further testing will be based on:-

All content types except standing matter
  Non-article matter is published for a purpose and its usage forms part of the usage of the journal as a whole
  Item type control is difficult to manage

All versions published
  For simplicity and completeness

Publication period: 2 years
  For a greater “smoothing” effect on occasional unexplained peaks and troughs in usage
  To reduce the effect of pre-“Version of Record” publication

Usage period: 2 years contemporaneous with publication period
  To capture peak post-publication usage
  To keep the metric as current as possible

Page 37: Journal Usage Factor A usage-based alternative to Impact Factor Richard Gedye

Recommendations - infrastructure

Development of systems to automate the extraction and collation of data needed for JUF calculation is essential if calculation of this metric is to become routine

Development of an agreed standard for content item types, to which journal specific item types would be mapped, is desirable as it would allow for greater sophistication in JUF calculation

Development or adoption of a simple subject taxonomy to which journal titles would be assigned by their publishers

Page 38: Journal Usage Factor A usage-based alternative to Impact Factor Richard Gedye

Recommendations - infrastructure

Publishers should adopt standard “article version” definitions based on NISO recommendations

But no specific recommendations for the labelling or making available of these versions


Page 41: Journal Usage Factor A usage-based alternative to Impact Factor Richard Gedye

Next steps

Progress Report summarising Phase 1 and 2 will be published in Q1 2011

Meanwhile further analysis of the usage data collected in Phase 2 is being undertaken by CIBER at UCL

Page 42: Journal Usage Factor A usage-based alternative to Impact Factor Richard Gedye

Patterns of use across time

Monthly use of all items published in 2006

Engineering journals (n=21)

About this analysis

In order to have an informed discussion about the optimal length of the time window to use to record downloads for the usage factor, we need to understand how items are used over time.

In this and the following analyses, we take all items published in 2006 and look at their monthly pattern of use over the subsequent three years.

Ideally, we need a longer time series, but this is all we have.

The trend line, which admittedly does not give an excellent fit to the data, suggests that aggregate usage of 2006 engineering items will trickle to near zero (i.e. become ‘asymptotic’) at around 45 months after publication.

The life span of original research articles and review papers is likely to be longer, as the ‘all items’ approach used here will contain much relatively ephemeral material such as editorial material and rapid communications.

How to read this graphic

This chart shows the number of downloads each month with a trend line.

Page 43: Journal Usage Factor A usage-based alternative to Impact Factor Richard Gedye

Patterns of use across time

Monthly use of all items published in 2006

Humanities journals (n=24)

About this slide

Humanities items follow a generally similar pattern to engineering but with a shorter and more delayed peak.

The trend line, which offers a reasonable fit to the data, suggests that aggregate usage of 2006 humanities items will trickle to near zero (i.e. become ‘asymptotic’) at around 48 months after publication.

Page 44: Journal Usage Factor A usage-based alternative to Impact Factor Richard Gedye

Patterns of use across time

Monthly use of all items published in 2006

Physical sciences journals (n=3)

About this slide

The monthly pattern of use for physical sciences items is very different from the other broad subjects in this study. There is a very sharp initial peak followed by continuing and steady interest in items in the period months 14-36 [caution: we only have three journals].

There is not enough data to justify calculating an end point for physical sciences articles, but the well-fitting trend line suggests it may have been reached at or just after 36 months.

Page 45: Journal Usage Factor A usage-based alternative to Impact Factor Richard Gedye

Patterns of use across time

Monthly use of all items published in 2006

Social sciences journals (n=115)

About this slide

The pattern in the social sciences is broadly similar to that for humanities items.

The trend line, which offers a reasonable fit to the data, suggests that aggregate usage of 2006 social sciences items will trickle to near zero (i.e. become ‘asymptotic’) at around 47 months after publication.

Page 46: Journal Usage Factor A usage-based alternative to Impact Factor Richard Gedye

Patterns of use across time

Monthly use of all items published in 2006

Medical and life sciences journals (n=47)

About this slide

Monthly usage in the medical and life sciences shows an interesting double peak: a very immediate one in the first few months and another from month 12, which may well be due to a delayed open access and / or citation effect.

The trend line, which offers a good fit to the data, suggests that aggregate usage of 2006 medical and life science items will trickle to near zero (i.e. become ‘asymptotic’) at around 40 months after publication.

Page 47: Journal Usage Factor A usage-based alternative to Impact Factor Richard Gedye

Which time window is ‘best’?

Cumulative use of all items published in 2006 by usage time window

Comparison by broad subject area (% of lifetime use)

Usage months   Engineering   Humanities   Medical and life sciences   Social sciences
1-12           31.6%         32.4%        27.2%                       30.9%
1-18           49.0%         50.0%        43.6%                       47.5%
1-24           63.8%         65.7%        60.8%                       63.0%
13-24          32.2%         33.3%        33.6%                       32.1%
13-36          55.6%         57.7%        61.8%                       56.5%
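The percentages in this comparison are shares of lifetime use falling inside each candidate window. A minimal Python sketch of that calculation follows. The monthly series is made up (a simple linear decline over 48 months), and "lifetime use" is taken here to be the sum of the series supplied, whereas the project estimated lifetime use from trend lines; both are assumptions of this example.

```python
def share_of_lifetime_use(monthly_downloads, first_month, last_month):
    """Fraction of total recorded use that falls in months first_month..last_month
    (1-based, inclusive), counted from publication."""
    total = sum(monthly_downloads)
    if total == 0:
        raise ValueError("no recorded use")
    window = sum(monthly_downloads[first_month - 1:last_month])
    return window / total


# Made-up series, not project data: 48 downloads in month 1, falling to 1 in month 48.
monthly = [48 - m for m in range(48)]

for first, last in [(1, 12), (1, 18), (1, 24), (13, 24), (13, 36)]:
    share = share_of_lifetime_use(monthly, first, last)
    print(f"{first}-{last}: {share:.1%}")
```

Running the same function over real monthly counts for each subject would produce one column of the comparison.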

Page 48: Journal Usage Factor A usage-based alternative to Impact Factor Richard Gedye

Patterns of use across time

Early conclusions

We ideally need a longer time series to be sure, but it appears that a good working estimate for the useful lifetime of all items in all versions is about four years. The longevity of original research articles and review papers is likely to be longer than this, and possibly more highly differentiated between subjects, but if all items are used then this seems a reasonable position to take.

All the subjects show a peak roughly between months 6 and 12 and a broadly similar (and steady) pattern of cumulative item use in years 1-3.

Tentatively, a time window based on months 1-24 would seem to be the most appropriate: capturing information both about the peak and the subsequent steady state growth. If the estimations of lifetime use are accurate, roughly four years for all items, then it would appear that a 1-24 month window will capture a substantial proportion of lifetime use (all items, all versions), probably of the order of 60 per cent as a global figure.

A 1-12 month window will capture around 30 per cent of lifetime use.

Page 49: Journal Usage Factor A usage-based alternative to Impact Factor Richard Gedye

Patterns of use across items

A problem with averages: the Bill Gates problem

Medical and life sciences journals

Mean = 335.5

Many articles used a few times

A few articles used many times

About this slide

The histogram shows the frequency with which individual items are downloaded. The publication year is 2008 and this chart shows use in month 2.

The pattern is lognormal: many items are used rarely, a few items used many times.

An issue with using the arithmetic mean to summarise this data is that the few heavily used items will exert a major effect: the mean will be a lot higher than other averages such as the mode or median. The International Mathematical Union has criticised ISI’s citation impact factor for this reason.
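The effect is easy to see with a handful of made-up numbers. In this Python sketch the download counts are invented for illustration: nine lightly used items and one heavily used one.

```python
import statistics

# Made-up download counts for ten items: most are used rarely, one very heavily.
downloads = [2, 3, 3, 4, 5, 6, 8, 10, 12, 947]

mean = statistics.mean(downloads)      # pulled upwards by the single heavy item
median = statistics.median(downloads)  # closer to the experience of a typical item

print(mean, median)  # mean is 100, median is 5.5
```

Nine of the ten items sit far below the mean, which is the sense in which a single "Bill Gates" item distorts the average.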

Page 50: Journal Usage Factor A usage-based alternative to Impact Factor Richard Gedye

Provisional conclusions

Exploratory data analysis of UKSG usage factors

Medical and life science journals (n=48)

[Two charts of means with 95% confidence intervals]

Publication year 2008, usage in months 1-24. Unadjusted Cox data, mean and 95% confidence intervals

Publication year 2008, usage in months 1-24. Log transformed CIBER data, mean and 95% confidence intervals

Values shown on the charts: 354, 539, 466, 653, 303, 407, 438, 554

Articles, final versions attract significantly higher use than all items, all versions. ANOVA F=6.1, p < 0.01

Articles, final versions attract significantly higher use than all items, all versions. ANOVA F=6.8, p < 0.01

Page 51: Journal Usage Factor A usage-based alternative to Impact Factor Richard Gedye

Patterns of use across items

Exploratory data analysis of UKSG usage factors

Medical and life science journals (n=48), average JUFs CIBER 2008 1-24

How to read this graphic

This heat map shows the 2008 JUF broken down by document type and version, with average downloads per category. The colour coding places the numbers in broad bands.

Numbers in italics show the % of document types in the test collection.

This is indicative only: publishers are not consistent in how they describe document types.

Page 52: Journal Usage Factor A usage-based alternative to Impact Factor Richard Gedye

Patterns of use across items

Early conclusions

To be purist about this, the journal usage factor should not be based on an arithmetic average of the raw data. Several approaches are possible to a ‘better’ estimate of the average. The simplest is just to take the median. The approach taken in this report (and for the rest of the CIBER study) is to use a natural log transform. This may well not be necessary in the ‘real world’ but CIBER is pursuing this course of action because it will allow a range of statistical tests to be applied to the data with much greater confidence.

There is very considerable variation in average use both by document type and version. This does not matter in one sense: if the intention of the exercise is to simply report on usage for the journal as a whole package, then it is completely valid to include all items and all versions. This would certainly have practical advantages for data processing.

However, and the same stricture applies to the classic impact factor*, a journal usage factor could easily be manipulated by simply changing the balance of item types in the final issue. This is a serious issue for usage data.

*See, for example, Mayur Amin and Michael Mabe, Impact factors: Use and abuse, Perspectives in Publishing 1, Elsevier, 2000.
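To make the natural log transform concrete, here is a minimal Python sketch. It averages the natural logs of the download counts and transforms the result back to the original scale, which gives the geometric mean. Whether CIBER back-transformed in this way, and how items with zero downloads were handled, is not stated in the slides; both choices here are assumptions, and the data is made up.

```python
import math


def log_transformed_mean(values):
    """Mean of the natural logs, transformed back to the original scale.

    Assumes every value is strictly positive; zero-download items would need
    separate handling, which this sketch does not attempt.
    """
    if not values:
        raise ValueError("no values supplied")
    if any(v <= 0 for v in values):
        raise ValueError("log transform requires strictly positive values")
    mean_log = sum(math.log(v) for v in values) / len(values)
    return math.exp(mean_log)


# Made-up download counts spanning several orders of magnitude.
downloads = [1, 10, 100, 1000]

arithmetic = sum(downloads) / len(downloads)  # dominated by the largest value
log_mean = log_transformed_mean(downloads)    # much less sensitive to it
```

For this series the arithmetic mean is 277.75, while the log-based average is about 31.6, far closer to the middle of the distribution.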

Page 53: Journal Usage Factor A usage-based alternative to Impact Factor Richard Gedye

Next steps - Phase 3?

Stakeholders are considering how the next Phase of the JUF project might be structured and funded:-

Issues relating to subject taxonomies, metadata and automation of publisher processes to be addressed

In collaboration with data suppliers, develop agreed taxonomical and metadata standards and templates which will streamline the process of data collection and analysis

More detailed practical recommendations for a cost-effective infrastructure to manage the Usage Factor process.

Scaled up testing of candidate metrics recommended in Cox report

Page 54: Journal Usage Factor A usage-based alternative to Impact Factor Richard Gedye

UKSG Usage Factor Project

Many thanks to the sponsors of this latest phase:-

GOLD

SILVER ALPSP American Chemical Society STM Nature Publishing Group Springer

Page 55: Journal Usage Factor A usage-based alternative to Impact Factor Richard Gedye

Thank you for your attention!

http://www.uksg.org/usagefactors/