Welcome to ChALS 2003, 24 Sep 2003
e-Print Repositories for research visibility: a journey from there to here
Pauline Simpson, Southampton Oceanography Centre, University of Southampton, England
Scholarly Communication and OAI
Welcome to ChALS 2003 (Chalmers Annual Library Seminars), 24 Sep 2003
University of Southampton
• Research-led multidisciplinary university:
– 20,000 students
– 5000 staff (1500 researchers)
• Restructured Aug 2003 from 5 faculties and 65 departments into:
• 3 Faculties
– Law, Arts and Social Sciences
– Medicine, Health and Life Sciences
– Engineering, Science and Mathematics
• 20 Schools
– Education
– Humanities
– Law
– Management
– Social Sciences
– Winchester School of Art
– Biological Sciences
– Health Care Innovation
– Health Professions and Rehabilitation Sciences
– Medicine
– Nursing & Midwifery
– Chemistry
– Civil Engineering & Environmental Engineering
– Electronics and Computer Science
– Engineering Sciences
– Geography
– Institute of Sound & Vibration
– Mathematics
– Ocean and Earth Sciences (SOC)
– Physics and Astronomy
Southampton Oceanography Centre
SOC is one of the world’s leading centres for research and education in marine and earth sciences, for the development of marine technology and for the provision of large scale infrastructure and support for the marine research community.
Road map
• Guide us through:
– Scholarly Communication
– Open Archives Initiative
– e-Print Archives
• Subject and institutional
– TARDis: Targeting Academic Research for Deposit and Disclosure
Information space : building a global ‘collaboratory’
• The academic world is increasingly global and collaborative and needs the tools to support this
• "… center without walls, in which researchers can perform their research without regard to geographical location – interacting with colleagues, accessing instrumentation, sharing data and computational resource, and accessing information in digital libraries"
Kouzes et al. (1996) Collaboratories: doing science on the Internet. Computer, 29(8), 40-46
How to get there
• Developing an infrastructure for data: the GRID
– Other people will wish to use the same data, so we need tools to preserve and access it
• Developing an infrastructure for documents through 'hybrid' libraries:
– Traditional and digital holdings
– Commercial and open (free and interoperable) access
– Bibliographic and full text
Primary channel: Scholarly Communication, present model
[Figure: diagram with nodes A, PUB, SUB, LIB and R, annotated with: Bibliometrics (citation analysis, impact factors); Evaluation (RAE, tenure, promotion); Research funding proposals]
[Figure: chart annotated "1774 %"]
'Crisis in Scholarly Communication': alternative models
• Open Access Journals
• Open Archives Initiative
• 'Open' = freely accessible: 'open access journals'
• 'Open' = interoperable: Open Archives Initiative
The Case for Institutional Repositories: a SPARC position paper, prepared by Raym Crow, July 2002
Supplemented by: SPARC Institutional Repository Checklist and Resources Guide, October 2002
Changing Publishing Paradigm
[Figure: Information flow through Open Archives model. Diagram labels: Authors, Readers; PUB, SUB, LIB; OAI data providers, OAI service providers; Publish/Archive/access; Hybrid roles; Citation analysis]
What are Open archives?
• Electronic repositories of e-Prints, usually Internet-based, for free access and dissemination
• Both institutional and discipline-based archives that allow public access to content and employ the Open Archives Initiative Protocol for Metadata Harvesting
• NB: e-Print archives that are not OAI-registered can still be 'open'
e-Prints: variable definitions
• e-Prints are electronic copies of any research output (journal article, book section, conference paper, technical report etc.)
– preprints: unpublished papers before they are refereed
– postprints: papers after they have been refereed
• Also narrower and broader definitions:
– Peer-reviewed articles: the original definition (Stevan Harnad)
– Broad output: research + learning + datasets + multimedia + internal admin documents etc.
Variable definitions: spelling, and what's in a name?
• Eprints; ePrints; eprints; E-Prints; e-prints
• e-Prints (Oxford English Dictionary)
• Archive: wrong connotations?
– Alternatives: repository, depository, service
e-Print origins
– 'Invisible university' culture
» Exchange of draft papers: high-energy physics papers with 50-1000 authors needed electronic transmission
– Evolving digital environment
» ARPA Internet 1970s, Web 1990s
– Culture + technology fix = the first archive
– Electronic preprint archives: author self-archiving systems. arXiv (Los Alamos, now at Cornell) was set up in 1991 by Paul Ginsparg for the high-energy physics community; it now covers physics (incl. Atmospheric and Oceanic Physics), mathematics, computer science and nonlinear science.
Subject-based archives
• Early e-Print services were subject-based and hosted by a single institution. They relied on distributed researchers remotely depositing their papers using the self-archiving protocol
• Despite the success of Los Alamos (now arXiv), uptake by other subject communities was cautious
• Successful examples: CogPrints (1997), Chemistry Preprint Server, RePEc/WoPEc (economics), etc.
• Many of the subject-based archives were started by individual enthusiasts
arXiv recent weekly usage
[Figure: arXiv weekly usage chart. Red: average number of connections; Blue: average number of hosts connecting (divide by 10 for the correct number); Green: average number of new hosts (divide by 10)]
Growing by 30,000 articles per month
Major e-Print Drivers
– Crisis in scholarly publication
– Growing call for Open Access
• Budapest Open Access Initiative http://www.soros.org/openaccess
• Launched 14 Feb 2002 by George Soros's Open Society Institute
• Worldwide coordinated movement dedicated to freeing online access: OAI-based self-archiving and alternative journals
• Open societies need open access
• Scholars should be able to deposit their refereed journal articles in open electronic archives which conform to OAI standards
Support…
• Stevan Harnad, University of Southampton: leading advocate of self-archiving, and now of the institutional model
– CogPrints; September98 email list
• International Scholarly Communications Alliance: worldwide organisations collaborating with scholars and publishers to establish equitable access to scholarly and research publications
• Funding…
– Mellon Foundation: $1.5m for seven USA OAI projects
– Budapest Open Access Initiative: Soros Foundation (Open Society Institute), $1m over 3 years
– NSF funding grants for OAI projects (NSDL): $7M, focused on interoperability infrastructure (OAI)
Origins of the Open Archives Initiative
• Oct 1999: 1st meeting, Santa Fe Convention
• Universal Preprint Service prototype, renamed Open Archives Initiative
• From the Dienst protocol to the Metadata Harvesting Protocol
• Early 2000: the Cambridge meetings
• Aug 2000: support from the Digital Library Federation, Coalition for Networked Information and NSF
• Steering Committee formed
• Late 2000: mission statement
The OAI defines two participants
• Data Providers adopt the OAI technical framework as a means of exposing metadata about their content (held in repositories):
– OAI conformant
– OAI registered
– OAI namespace-registered
• Service Providers harvest metadata from Data Providers using the OAI protocol and use the metadata as the basis for value-added services
• The two roles are conceptually different, but in reality Data Providers can offer both a service directly to users and metadata for automated harvesters: data providers need to offer value-added services as well
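The division of labour between the two participants can be illustrated with a small, self-contained sketch. It assumes a hypothetical in-memory stand-in for a data provider (`ToyDataProvider`) rather than a real HTTP endpoint, and mimics the OAI-PMH convention that a long result list is returned in pages, with a resumption token signalling that more records remain. The service-provider side keeps asking until no token comes back, then builds a trivial value-added service (a title-word index) on the harvested records. All names and records here are illustrative assumptions.

```python
from typing import Dict, List, Optional, Tuple

Record = Dict[str, str]


class ToyDataProvider:
    """Hypothetical in-memory data provider that pages its records like ListRecords."""

    def __init__(self, records: List[Record], page_size: int = 2) -> None:
        self.records = records
        self.page_size = page_size

    def list_records(self, token: Optional[str] = None) -> Tuple[List[Record], Optional[str]]:
        # Here the token is just an offset; in OAI-PMH it is opaque to the harvester.
        start = int(token) if token else 0
        end = start + self.page_size
        page = self.records[start:end]
        next_token = str(end) if end < len(self.records) else None
        return page, next_token


def harvest(provider: ToyDataProvider) -> List[Record]:
    """Service-provider side: keep requesting pages until no token is returned."""
    harvested: List[Record] = []
    token: Optional[str] = None
    while True:
        page, token = provider.list_records(token)
        harvested.extend(page)
        if token is None:
            return harvested


def build_title_index(records: List[Record]) -> Dict[str, List[str]]:
    """A trivial value-added service: map each title word to record identifiers."""
    index: Dict[str, List[str]] = {}
    for rec in records:
        for word in rec["title"].lower().split():
            index.setdefault(word, []).append(rec["identifier"])
    return index


provider = ToyDataProvider([
    {"identifier": "oai:example:1", "title": "Ocean circulation models"},
    {"identifier": "oai:example:2", "title": "Deep ocean sediments"},
    {"identifier": "oai:example:3", "title": "Self-archiving in physics"},
])
records = harvest(provider)
print(len(records))                         # 3
print(build_title_index(records)["ocean"])  # ['oai:example:1', 'oai:example:2']
```

Keeping the harvesting loop separate from the index builder mirrors the slide's point: the same harvested metadata can feed many different value-added services.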
Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)
• Based on HTTP as the carrier protocol
• Responses are encoded in XML
• The Open Archives metadata set is the (unqualified) Dublin Core Metadata Element Set
– Data providers must supply Dublin Core data via OAI, so that all harvesters can use their data
• Open question: does harvesting simple DC mean losing the rich metadata of the original record?
but
• We now have a significant solution for open (interoperable) archives
• The rules laid down make search services across many distributed archives possible
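To make the protocol concrete, here is a minimal sketch that builds a `ListRecords` request URL and extracts the unqualified Dublin Core fields from a response. The namespace URIs are those of OAI-PMH 2.0 and Dublin Core; the repository address `http://repository.example.org/oai` and the sample record are hypothetical, and the response is parsed from a string rather than fetched over HTTP so the example runs offline. A real harvester would also handle `resumptionToken` paging and OAI error responses.

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict, List
from urllib.parse import urlencode

# Namespace URIs from the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc") -> str:
    """An OAI-PMH request is a plain HTTP GET with the verb as a query argument."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def parse_records(xml_text: str) -> List[Dict[str, Any]]:
    """Extract the identifier and unqualified Dublin Core fields from each record."""
    root = ET.fromstring(xml_text)
    results = []
    for rec in root.iterfind("oai:ListRecords/oai:record", NS):
        entry: Dict[str, Any] = {
            "identifier": rec.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        }
        dc = rec.find("oai:metadata/oai_dc:dc", NS)
        if dc is not None:
            for element in dc:
                # Tags look like '{http://purl.org/dc/elements/1.1/}title'.
                name = element.tag.split("}", 1)[1]
                entry.setdefault(name, []).append((element.text or "").strip())
        results.append(entry)
    return results


# Hypothetical response, abbreviated to the parts used above.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2003-09-24T10:00:00Z</responseDate>
  <request verb="ListRecords" metadataPrefix="oai_dc">http://repository.example.org/oai</request>
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repository.example.org:42</identifier>
        <datestamp>2003-09-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A sample e-Print</dc:title>
          <dc:creator>Author, A.</dc:creator>
          <dc:creator>Author, B.</dc:creator>
          <dc:type>Journal article</dc:type>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("http://repository.example.org/oai"))
# http://repository.example.org/oai?verb=ListRecords&metadataPrefix=oai_dc
print(parse_records(SAMPLE))
```

Because every data provider must offer `oai_dc`, a harvester written this way works against any compliant repository; richer formats would need extra parsing code per format.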
OAI Archive Model
[Figure: Authors deposit in Open Repositories (Data Providers): Institutional Servers, Disciplinary Servers, Journals (e.g., PLoS model). Service Providers harvest them via OAI-PMH and offer Value-added Services (Workflow, Applications, Integrated scholarly communities, Search tools) to Readers, all resting on Interoperability Standards.]
Supporting software
• Many enabling technologies, standards and protocols to support institutional repositories already exist, e.g. the OAI-PMH protocol to enable interoperability
• The World Wide Web is taken for granted as part of the infrastructure
• Archiving software
– Initially only one package was freely available to implementers: Eprints.org
eprints.org: GNU EPrints
• Software from the IAM group, University of Southampton, is free
• Pioneered by Prof. Stevan Harnad to further the cause of self-archiving
• EPrints 2 (GNU EPrints) developed by Chris Gutteridge
Other e-Prints software emerging
• DSpace -Joint project of MIT Libraries and Hewlett Packard Company (Nov 2002) http://www.dspace.org
• CDSWare – CERN Document Server software http://cdsware.cern.ch
• ARNO – Academic Research in the Netherlands Online, Tilburg, Amsterdam, Twente http://www.uba.uva.nl/arno
• bPress – Univ California (eScholarship) http://www.cdlib.org
• Other own software (arXiv, Max Planck etc)
[Figure: institutional and subject repositories feeding OAI harvesters.
Data providers: CogPrints (GNU EPrints, 1600 records); www.orgprints.org (GNU EPrints, 264 records); arXiv (custom software, 230,000 records); DSpace @ MIT (DSpace software, 769 records).
Harvester #1 (Psychology Service): 500 CogPrints, 169 DSpace.
Harvester #2 (Physics Aggregator): 150,000 arXiv, 162 DSpace.
Harvester #3 (General Service): 230,000 arXiv, 769 DSpace, 264 OrgPrints, 1600 CogPrints, plus 150,162 "improved" records from the physics aggregator.
Label: Institutional repositories]
Service Providers (some)
• Arc: search engine
• Callima: search engine
• Citebase Search: search engine with citation ranking
• CYCLADES: search engine
• DP9: search engine for the deep web
• iCite: citation indexing system for physics
• My.OAI: search engine
• NCSTRL: unified access to computer science
• OAIster: search engine
• Perseus: search engine in the humanities
• Scirus: search engine (Elsevier)
• TORII: unified access, physics and computer science
(Acknowledgement: David Prosser)
Service provider - find the pearls
Entering another phase: institutional repositories
• In 2000: a complementary model to the subject archives, i.e. e-Print archives based on the research output of one institution
• Reawakening to the value of greater access to an institution's research
• Essential increase in the visibility of our intellectual output
• A preservation role (like our traditional archivists)
Institutional repositories: early adopters
• Australian National University
• Aalborg University
• Humboldt-Universität
• Lund University
• National University of Ireland
• University of Glasgow
• California Digital Library
• MIT
• University of Southampton
• University of Cambridge
• University of Tilburg
• Université de Montréal
• LMU München
• Utrecht University
• CERN
• University of Bath
• University of Nottingham
• Caltech
• Academy of Sciences, Belarus
• Hong Kong University
• Netherlands (DARE)
(Acknowledgement: David Prosser)
Benefits of an Institutional Repository
• Provides institutional information asset management
• Defines institutional sources of research
• Identifies the institution's value to funding sources
• Raises the profile of the institution
• Institutional research becomes more visible, has more impact and is available in electronic form, so it is cited more (Lawrence, Nature)
• Contributes to national and global initiatives which will ensure an international audience for the institution's latest research
• (Other universities are developing their own archives which, together, will be searchable by global search tools)
Information community: taking a lead role (1)
• Professional skills and expertise map onto the e-Print support and maintenance profile:
– Positioned in the scholarly communication process
» Recorders of institutional scientific output
» Publishers on behalf of the institution
– Collection and dissemination of scholarly resources
– Deliverers of seamless systems, e-resources etc.
– Resource discovery mechanisms in the digital environment (e.g. Z39.50)
Information community: taking a lead role (2)
– Database expertise
– Records management
– Work with metadata and preservation
– Apply standards uniformly
– IPR issues
– Central service provider
– Interact at all levels of the institution
– Network culture
– End user of free research corpus
UK Programme
• 2002 UK Higher Education Funding Council: JISC FAIR Programme (Focus on Access to Institutional Resources)
• Inspired by the vision of the Open Archives Initiative (OAI) that digital resources can be shared between organisations based on a simple mechanism allowing metadata about these resources to be harvested into services
• To support the disclosure of institutional assets: to support access to and sharing of institutional content within Higher Education and Further Education, and to allow intelligence to be gathered about the technical, organisational and cultural challenges of these processes…
FAIR Programme
• £3 million on 14 projects starting August 2002
– Museums and Images; e-Prints; e-Theses; IPR; Institutional portals
• TARDis: Targeting Academic Research for Deposit and dISclosure
• SHERPA: broader; a consortium of research libraries filling archives and building joint infrastructure
• HaIRST: a testbed for Scotland
• ePrints-UK: harvesting UK e-Print archives, and also investigating automated subject indexing using Dewey classification (with OCLC software in the USA)
• eFAIR Cluster: exchange of experiences and work; includes the e-Theses projects, which overlap in work areas
Univ of Southampton e-Print Archive
• Project funding for 30 months, Aug 2002-2005: Targeting Academic Research for Deposit and dISclosure (TARDis)
– Project Manager, Research Assistants x 2, Admin Officer
• Implement a university e-Print archive as a sustainable product: e-Prints Soton
• Evaluate self- and mediated archiving, measured against discipline culture
• Document the technical, organisational and cultural issues of archiving
• Feed back into the EPrints software design
TARDis Work Plan
• Early institutional e-Print archives have had problems acquiring content, possibly because of the self-archiving protocol and discipline culture
– Investigate the barriers:
» Technical: hardware and software
» Discipline culture
» Depositors' concerns
– Implementation:
» Policy considerations
» Advocacy
» Sustainability
Barriers: hardware and skill set
Hardware and software requirements for GNU EPrints:
– Apache WWW server
– Unix / Red Hat Linux
• Any computer capable of running GNU/Linux or a similar operating system
– Perl programming language and modules
– MySQL: freely available open-source software
• Different skill sets are needed for other software, e.g. DSpace requires Java skills
Software Configuration (GNU EPrints v2.3)
"Everything should be made as simple as possible, but not simpler." (Albert Einstein)
• GNU EPrints was originally intended for self-archiving; re-engineer it for an institutional repository
• Simplify the deposit process
– Reflect the look and feel of the host web interface
– Additional metadata fields for institutional structure:
» Faculties, Schools, Departments, Research Groups
» Language
» ISBN/ISSN?
» Corporate author
» On-screen help
• Information management standards
– Citation formats
– Metadata fields to describe all document types, presented in logical order
– Global subject classification or thesaurus
– Deposit types & document formats
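One way to picture the extra institutional fields and a house citation format is as a record type plus a formatting function. The sketch below is an assumption for illustration only: the `EPrintRecord` class, its field names and the citation style are hypothetical and are not the GNU EPrints metadata schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class EPrintRecord:
    """Hypothetical deposit record; not the actual GNU EPrints metadata schema."""
    title: str
    creators: List[str]
    year: int
    doc_type: str                          # e.g. "article", "book_section"
    publication: str = ""
    volume: str = ""
    pages: str = ""
    # Extra fields for the institutional structure and other additions listed above.
    faculty: str = ""
    school: str = ""
    department: str = ""
    research_groups: List[str] = field(default_factory=list)
    language: str = "en"
    issn_isbn: Optional[str] = None
    corporate_author: Optional[str] = None

    def browse_path(self) -> Tuple[str, str, str]:
        """Key for browsing by Faculty > School > Department."""
        return (self.faculty, self.school, self.department)


def format_citation(rec: EPrintRecord) -> str:
    """One hypothetical house style: Authors (Year) Title. Publication, Volume, Pages."""
    authors = rec.corporate_author or ", ".join(rec.creators)
    citation = f"{authors} ({rec.year}) {rec.title}."
    if rec.publication:
        details = ", ".join(p for p in (rec.publication, rec.volume, rec.pages) if p)
        citation += f" {details}."
    return citation


rec = EPrintRecord(
    title="A sample e-Print",
    creators=["Author, A.", "Author, B."],
    year=2003,
    doc_type="article",
    publication="Journal of Examples",
    volume="1",
    pages="1-10",
    faculty="Engineering, Science and Mathematics",
    school="Ocean and Earth Sciences",
)
print(format_citation(rec))
# Author, A., Author, B. (2003) A sample e-Print. Journal of Examples, 1, 1-10.
```

Generating citations from structured fields, rather than storing a free-text citation, is what lets one deposit serve listings, web pages and reports in a consistent format.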
GNU EPrints requested software development
• Batch import
• Export to personal bibliographic software (EndNote)
• Authentication
• Non-techie configuration
• Automated subject classification
• Automated metadata quality control
• Automated metadata from full text
• Full-text searching
• OpenURL compliant
• etc.
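The EndNote export request can be sketched by writing records in the RIS tagged format, which EndNote and similar reference managers can import. This is a minimal sketch: the input dictionary keys and the document-type mapping are hypothetical, and the choice of RIS tags (for example `JF` for the journal name) is an assumption worth checking against the target software.

```python
from typing import Dict, List, Union

Value = Union[str, List[str]]

# Hypothetical mapping from repository document types to RIS reference types.
RIS_TYPES = {
    "article": "JOUR",
    "book": "BOOK",
    "book_section": "CHAP",
    "conference_item": "CONF",
    "monograph": "RPRT",
}

# (record key, RIS tag) pairs for single-valued fields; the tag choice is an assumption.
SINGLE_FIELDS = [
    ("title", "TI"), ("year", "PY"), ("journal", "JF"), ("volume", "VL"),
    ("start_page", "SP"), ("end_page", "EP"), ("issn", "SN"), ("url", "UR"),
]


def to_ris(rec: Dict[str, Value]) -> str:
    """Serialise one record as an RIS entry (each line is 'TAG  - value')."""
    lines = [f"TY  - {RIS_TYPES.get(str(rec.get('type', '')), 'GEN')}"]
    for author in rec.get("creators", []):
        lines.append(f"AU  - {author}")      # one AU line per author
    for key, tag in SINGLE_FIELDS:
        if rec.get(key):
            lines.append(f"{tag}  - {rec[key]}")
    lines.append("ER  - ")                   # end-of-record marker
    return "\n".join(lines) + "\n"


record = {
    "type": "article",
    "creators": ["Author, A.", "Author, B."],
    "title": "A sample e-Print",
    "year": "2003",
    "journal": "Journal of Examples",
    "volume": "1",
    "start_page": "1",
    "end_page": "10",
}
print(to_ris(record), end="")
```

Several records can be exported to one file simply by concatenating their RIS entries.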
Document Formats: multidisciplinary needs
• Defaults: HTML, PDF, PostScript, ASCII
• May want to subtract:
– HTML: unless carefully checked, HTML output from Word is unsatisfactory
• Add:
– Special document preparation formats such as LaTeX, or common formats such as RTF
• Accept all formats: all research output, including imagery, PowerPoint, streaming video etc.
• Open-source utility programs are available to convert from non-supported to supported formats
• Must ensure we have the viewers for users to download, e.g. a PostScript viewer
Subject Classification / Thesaurus
• An early survey showed that all archives used either LoC (or a cut-down version), their own categories, or a published thesaurus (JEL)
• GNU EPrints version 2 installed Library of Congress as the default subject classification
– Established global scheme often used in university libraries
– Top-level headings, with subheadings to the third level
» Sufficient granularity?
» Not a deposit-friendly tool
• Possible to load an additional classification or subject-based thesaurus?
• None at all, relying on title, keywords, abstract or faculty structure for retrieval? But how can broad subject areas be harvested from a multidisciplinary archive without classification?
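The closing question can be illustrated with a sketch that derives a broad subject from the main-class letter of a Library of Congress class mark and groups records by it, for example to expose each group as an OAI set for selective harvesting. The table covers only a few LC main classes, and the record structure, identifiers and grouping scheme are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

# A few Library of Congress Classification main classes, keyed by first letter.
LCC_MAIN_CLASSES = {
    "G": "Geography, Anthropology, Recreation",
    "H": "Social Sciences",
    "K": "Law",
    "Q": "Science",
    "R": "Medicine",
    "T": "Technology",
}


def broad_subject(class_mark: str) -> str:
    """Map a class mark such as 'GC' or 'QC801' to its LC main class name."""
    return LCC_MAIN_CLASSES.get(class_mark[:1].upper(), "Unclassified/other")


def group_by_subject(records: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Group record identifiers by broad subject, e.g. to expose each group as an OAI set."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for rec in records:
        groups[broad_subject(rec.get("lcc", ""))].append(rec["id"])
    return dict(groups)


records = [
    {"id": "soton:1", "lcc": "GC"},      # GC: oceanography
    {"id": "soton:2", "lcc": "QC801"},   # QC: physics
    {"id": "soton:3", "lcc": "K"},
    {"id": "soton:4"},                   # deposited without a class mark
]
print(group_by_subject(records))
```

Records deposited without a class mark fall into a catch-all group, which is exactly the retrieval gap the slide raises.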
Barrier: University culture survey
– No central database record of University research output is maintained.
• Retrospective central research publication listings are collated from individual departments and made available on the web (University Research Report)
• In interviews, researchers said they want from an archive:
– To enter a record only once and use it for multiple purposes
– Export from the e-Print repository for multiple purposes: listings, web pages, University Research Report! etc.
– Import of existing School databases and listings
– Definitive bibliographic records, not just full text
– Own branding
E-Publishing on the University Web

Department                                         Total  Full text  % full text
Faculty of Law, Arts and Social Sciences
  Archaeology                                        252          2           1%
  English                                            243          3           1%
  Modern Languages                                   160          0           0%
  Music                                              280          5           2%
  Politics                                           138          6           4%
  Economics                                          357         89          25%
Faculty of Medicine, Health and Life Sciences
  Biology                                            796         24           3%
  Medicine                                          1603        247          15%
  Health Professions and Rehabilitation Sciences     332          0           0%
  Nursing and Midwifery                              439          0           0%
Faculty of Engineering, Science and Mathematics
  Chemistry                                         1128        111          10%
  Electronics and Computer Science                  7008       866*          12%
  Maths Education                                    170         34          20%
  Mathematical Studies                               849        310          37%
  Ocean Circulation and Climate Group, SOES          286          9           3%
  James Rennell Division, SOC                        792         68           9%

* Personal web sites not counted.
• Survey: researchers' attitudes to e-Publishing on the web
– Snapshot: looked at web sites, both personal and School
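The percentage column is full-text items divided by total publications, rounded to a whole number. The sketch below recomputes it from the table's own counts using integer arithmetic with round-half-up (an assumed rounding rule), and lists any department whose printed percentage disagrees; for the figures above the list comes out empty.

```python
# (department, total publications, full-text items, % full text as printed above)
SURVEY = [
    ("Archaeology", 252, 2, 1),
    ("English", 243, 3, 1),
    ("Modern Languages", 160, 0, 0),
    ("Music", 280, 5, 2),
    ("Politics", 138, 6, 4),
    ("Economics", 357, 89, 25),
    ("Biology", 796, 24, 3),
    ("Medicine", 1603, 247, 15),
    ("Health Professions and Rehabilitation Sciences", 332, 0, 0),
    ("Nursing and Midwifery", 439, 0, 0),
    ("Chemistry", 1128, 111, 10),
    ("Electronics and Computer Science", 7008, 866, 12),
    ("Maths Education", 170, 34, 20),
    ("Mathematical Studies", 849, 310, 37),
    ("Ocean Circulation and Climate Group, SOES", 286, 9, 3),
    ("James Rennell Division, SOC", 792, 68, 9),
]


def pct_full_text(total: int, full_text: int) -> int:
    """100 * full_text / total, rounded half up, using only integer arithmetic."""
    return (200 * full_text + total) // (2 * total)


mismatches = [name for name, total, ft, printed in SURVEY
              if pct_full_text(total, ft) != printed]
print(mismatches)  # []
```

Integer arithmetic avoids floating-point surprises and Python's round-half-to-even behaviour in `round()`.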
Addressing authors' concerns
• Workload (central bureaucracy, new systems to learn (change overload), file format conversion)
– Assisted submission: the library will do it! (medium term)
• Quality control: loss of peer review?
– Authors continue to submit articles to high-impact traditional journals and also contribute to e-print archives
• Undermining the status quo
– Some editors paid by publishers
– Reputations made within the present system
– Dislike of anti-publisher stance
– Self-archiving complements the status quo
Addressing authors' concerns
• Visibility compared with web pages
– Standard search engines do pick up metadata from the archive, but the search must be specific: e.g. "Hall agent technology" will be found, but a subject search returns thousands of results (not efficient yet). DP9 is an OAI gateway service that lets web crawlers mine the deep web
• Ingelfinger rule (prior publication)
– Publishers gradually changing
• Authentication: probity (Life Sciences)
– JISC project using TARDis as testbed
• Preservation
– Implicit; secure storage, migration
• Copyright!
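The idea behind a gateway such as DP9 is that search-engine crawlers index ordinary web pages, not OAI-PMH responses. A minimal sketch of that idea (not DP9's actual implementation) renders one harvested Dublin Core record as a static HTML page, repeating the metadata in `DC.*` meta tags following the common Dublin Core-in-HTML convention. The function name, page layout and sample record are hypothetical.

```python
from html import escape
from typing import Dict, List


def record_to_html(identifier: str, dc: Dict[str, List[str]]) -> str:
    """Render one harvested Dublin Core record as a plain, crawlable HTML page."""
    title = escape(dc.get("title", [identifier])[0])
    meta_lines = []
    item_lines = []
    for element, values in dc.items():
        for value in values:
            # escape() also escapes quotes by default, so values are safe in attributes.
            meta_lines.append(f'<meta name="DC.{escape(element)}" content="{escape(value)}">')
            item_lines.append(f"<li>{escape(element)}: {escape(value)}</li>")
    return "\n".join([
        "<!DOCTYPE html>",
        "<html><head>",
        f"<title>{title}</title>",
        *meta_lines,
        "</head><body>",
        f"<h1>{title}</h1>",
        "<ul>",
        *item_lines,
        "</ul>",
        f"<p>OAI identifier: {escape(identifier)}</p>",
        "</body></html>",
    ])


page = record_to_html(
    "oai:repository.example.org:7",
    {"title": ["Agent technology & search"], "creator": ["Author, A."]},
)
print(page)
```

Serving one such page per record, with links between them, gives crawlers a path into repository content that would otherwise sit behind the OAI-PMH interface.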
IPR, particularly Copyright
• Traditionally authors sign over copyright, whether they own it or not!
• Univ Southampton does not claim copyright on authored works other than course material.
• We need to encourage/assist authors to:
– Place articles with open access publishers
– Negotiate agreement with the publisher to retain the e-Print right
– Deposit the postprint (pre-journal version) in the archive (Harnad-Oppenheim strategy)
– FAIR Project ROMEO Copyright Transfer Agreement List http://www.lboro.ac.uk/departments/ls/disresearch/romeo/index.html
Publisher copyright policies & self-archiving
Project ROMEO
Publishers' attitudes changing: Nature Publishing Group, 19 Sep 2003
To ensure the continued success of our titles, and in recognition of the changing priorities of our authors, we have initiated a range of new policies and projects. Since early 2002, NPG no longer requires authors to transfer copyright. Instead, we ask only for an exclusive licence to publish. In return, authors are free to reuse their papers in any of their future printed work and have the right to post a copy of the published paper on their own websites and in course packs. Further, we are introducing Advanced Online Publication (AOP) on all journals hosted by nature.com, allowing authors to distribute their papers more rapidly than ever before.
Publishers making themselves OAI compliant
Institute of Physics: We are pleased to confirm that we have adopted this standard here at Institute of Physics Publishing and metadata records for our article abstracts are now available in Dublin Core. They can be ‘harvested’ from our server on request. August 2002
How many library catalogues are OAI compliant?
Archive Implementation: policy decisions
• Software
• Centralised or distributed databases (by document type, university grouping)
• Collection policy (research output from whom?)
• File formats
• Deposit agreements
• Authentication of depositors
• Metadata quality control: level
• Administrative/operational load
• Sustainability
• Copyright/IPR: institutional policy of non-transfer, or negotiate
– Retain the right to distribute the work for free for scholarly/scientific purposes, in particular the right to self-archive it publicly online on the web
• Long-term archiving/preservation
– A global problem, not just for e-Prints but for all digital assets
– UK: Digital Preservation Centre
– Stanford, USA: LOCKSS, investigating an international federated preservation facility
Implementation: Advocacy
"If you build it they will come." (Costner: Field of Dreams)
• The biggest challenge is encouraging user participation:
– Contribute content
– Search/use the repository
• Leaflets
• e-Print archive demonstrator
• Advocacy web site
• Briefing paper to management for buy-in
• Literature, e.g. SPARC leaflet
• Institutional magazines
• Presenting at departmental meetings and university committees
• Special advocacy events
• Carrots! USB sticks, pens etc.
Where are we now?
• e-Prints Soton: new configuration in pre-trial with 'friends'
• Feedback: font, colour, School names, cut and paste, LoC!, Unix system browsers etc.
• Pilot in two Schools:
– Ocean and Earth Sciences (60 papers already)
– Social Sciences
• Researchers' buy-in is the biggest challenge
• Demonstrate real value (save them time)
• Build a bibliographic database of university research output, not just full text!
• School branding (Lund example, but from a central database)
TARDis e-Prints Soton
Information space, a national vision: e-Prints + data + e-learning
e-Banks UK
End of the journey?
When data and documents will be linked and easily accessible
They will be an integral part of the academic work space just as the World Wide Web is today
But the Web will acquire meaning and become the Semantic Web
Open Archive protocols and metadata standards are a part of this journey
Thank You
Implementing e-Prints is an emerging challenge for the information community
Pauline Simpson, Southampton Oceanography Centre, University of Southampton
To keep up to date
Peter Suber keeps up to date with all these activities of the Free Online Scholarship movement
Read his Open Access News blog (previously FOS Newsletter)
http://www.earlham.edu/~peters/fos/aboutblog.htm#namechange
He has also produced a Timeline recording the real momentum of archiving!