CRIS&OAR for Research Information Management
I. Filozova, JINR LIT,
University “DUBNA”, Dubna, Russia
SCHOOL ON JINR/CERN GRID AND ADVANCED INFORMATION SYSTEMS
Dubna
NOVEMBER 2-6, 2015
Acronyms: CRIS & OAR
CRIS — Current Research Information System
OAR — Open Access Repository
[http://jds-test3.jinr.ru]
Mission of a scientific organization: achieving scientific results that satisfy the needs of the scientific community
Scientific Activity (figure): stages of new knowledge generation
• Search for available information
• Data processing & data generation
• Knowledge generation
Results: publications (printed articles, digital archives, repositories), tables, plots, databases, etc.
Knowledge is recorded in the images and signs of natural and artificial languages.
Journal Crisis
End of the '90s: the cost of subscriptions to scientific journals grew 2-3 times faster than the budgets of academic libraries and than inflation.
Price policy:
• 1-year subscription cost ≥ 500 $;
• average cost of an annual subscription to a chemistry journal ≥ 3 000 $;
• some journals ≥ 10 000 $.
Journal subscription prices (journal; publisher; year; price, $):
• Journal of Comp. and Applied Mathematics; Elsevier; 2008; 4 727
• Applied Mathematics and Mechanics (6 issues); Springer; 2016; 5 606
• Applied Physics A; Springer; 2008; 4 989
• Journal of Fluid Mechanics; Cambridge Univ. Press; 2008; 3 200
• Annals of Physics; Elsevier; 2016; 3 928
• Biochimica & Biophysica Acta; Elsevier; 2012; 20 930
• Materials Science & Engineering A, B, C, & R; 2008: 17 986; 2016: 23 345
For comparison: a new 2015 Volkswagen Golf 1.6 AT costs 20 385 $; Machu Picchu: 3 850 $.
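A quick check of the price growth, assuming (as the slide layout suggests) that the 2008 and 2016 figures belong to the Materials Science & Engineering bundle. The sketch computes the total and the compound annual growth and compares the 2016 price with the car price quoted for comparison.

```python
# Prices in US dollars from the slide; attributing 17 986 / 23 345 to the
# Materials Science & Engineering A, B, C & R bundle is an assumption.
MSE_2008 = 17_986
MSE_2016 = 23_345
GOLF_2015 = 20_385  # new 2015 Volkswagen Golf 1.6 AT


def growth_percent(old: float, new: float) -> float:
    """Total relative growth between two prices, in percent."""
    return (new - old) / old * 100


def annual_growth_percent(old: float, new: float, years: int) -> float:
    """Compound annual growth rate over `years` years, in percent."""
    return ((new / old) ** (1 / years) - 1) * 100


total_growth = growth_percent(MSE_2008, MSE_2016)
yearly_growth = annual_growth_percent(MSE_2008, MSE_2016, 2016 - 2008)
car_gap = MSE_2016 - GOLF_2015

print(f"2008-2016: +{total_growth:.1f}% in total, ~{yearly_growth:.1f}% per year")
print(f"One year of the subscription costs {car_gap} $ more than the new car")
```

With these figures the bundle rose by roughly 30% over eight years, about 3.3% per year compounded.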
Open Access (OA) to Research
What about copyright?
• OA does not cancel copyright and does not contradict it.
How is OA realized?
• Green road: public scientific archives and repositories;
• Gold road: publication in open access journals.
Where does the OA idea come from?
1. Budapest Open Access Initiative declaration (http://www.budapestopenaccessinitiative.org/);
2. Berlin Declaration on Open Access to Knowledge in the Sciences and Humanities (http://openaccess.mpg.de/Berlin-Declaration).
Open Access Benefits
Scientists and researchers:
• wider readership and higher visibility;
• more citations of publications;
• greater scientific impact;
• growing author recognition and establishing scientific priority.
Organization:
• management of its digital resources;
• increased scientific prestige of the organization.
Society:
• return on investment in research;
• removal of barriers to information sharing;
• creation of additional information services for different user categories.
OAI Protocol for Metadata Harvesting (OAI-PMH)
Basis: HTTP; superstructure: OAI-PMH
2 types of requests:
1. SELECT ALL RECORDS;
2. SELECT RECORDS WHERE <criteria>
6 commands (verbs): GetRecord, Identify, ListIdentifiers, ListMetadataFormats, ListRecords, ListSets
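Because OAI-PMH sits on top of HTTP, every command is an ordinary GET request with a `verb` parameter. A minimal sketch, assuming a hypothetical endpoint `https://repository.example.org/oai2d`, that builds request URLs for the verbs and shows both request types: harvesting everything versus selective harvesting with the optional `set` and `from` arguments.

```python
from urllib.parse import urlencode

# Hypothetical base URL; replace with a real repository's OAI-PMH endpoint.
BASE_URL = "https://repository.example.org/oai2d"

# The six OAI-PMH verbs and the arguments each one requires
# (optional arguments such as from, until and set are not listed).
REQUIRED_ARGS = {
    "Identify": set(),
    "ListMetadataFormats": set(),
    "ListSets": set(),
    "ListIdentifiers": {"metadataPrefix"},
    "ListRecords": {"metadataPrefix"},
    "GetRecord": {"identifier", "metadataPrefix"},
}


def oai_request(verb: str, **args: str) -> str:
    """Build an OAI-PMH GET request URL after checking the required arguments."""
    if verb not in REQUIRED_ARGS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    # A resumptionToken continues an earlier list request on its own.
    if "resumptionToken" not in args:
        missing = REQUIRED_ARGS[verb] - set(args)
        if missing:
            raise ValueError(f"{verb} requires {sorted(missing)}")
    return f"{BASE_URL}?{urlencode({'verb': verb, **args})}"


# 1. SELECT ALL RECORDS: every record, in Dublin Core.
all_records = oai_request("ListRecords", metadataPrefix="oai_dc")

# 2. SELECT RECORDS WHERE <criteria>: selective harvesting by set and date.
# "from" is a Python keyword, so the arguments are passed through a dict.
selected = oai_request("ListRecords", metadataPrefix="oai_dc",
                       **{"set": "books", "from": "2015-01-01"})

print(all_records)
print(selected)
```

The set name `books` is invented; a real repository advertises its sets through `ListSets`.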
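Information model of OAI-PMH: RESOURCE ↔ ITEM {IDENTIFIER; RECORDS}
A resource is represented by an item with a unique identifier; the item's records carry its metadata in different formats (Dublin Core, MARC21/MARCXML, user-defined metadata, ...), and items can be grouped into sets.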
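The model is visible directly in a harvesting response: each record has a header (the item's identifier, datestamp and set memberships) and a metadata part in one format. A minimal sketch with Python's standard `xml.etree.ElementTree`, parsing a trimmed, made-up `ListRecords` response; the repository URL, identifier and resumption token are invented.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Trimmed, made-up ListRecords response (URL, identifier, token are invented).
RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2015-11-02T10:00:00Z</responseDate>
  <request verb="ListRecords" metadataPrefix="oai_dc">https://repository.example.org/oai2d</request>
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repository.example.org:1</identifier>
        <datestamp>2015-10-30</datestamp>
        <setSpec>books</setSpec>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Pushkin's Fairy Tales</dc:title>
          <dc:creator>Alexander Pushkin</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>hypothetical-token-1</resumptionToken>
  </ListRecords>
</OAI-PMH>"""


def parse_list_records(xml_text):
    """Return (items, resumption_token) from an OAI-PMH ListRecords response."""
    root = ET.fromstring(xml_text)
    items = []
    for record in root.iterfind("oai:ListRecords/oai:record", NS):
        header = record.find("oai:header", NS)
        items.append({
            # Item level: unique identifier, datestamp, set membership.
            "identifier": header.findtext("oai:identifier", namespaces=NS),
            "datestamp": header.findtext("oai:datestamp", namespaces=NS),
            "sets": [s.text for s in header.findall("oai:setSpec", NS)],
            # Record level: metadata in one format (here simple Dublin Core).
            "titles": [t.text for t in record.iterfind(".//dc:title", NS)],
        })
    token = root.findtext("oai:ListRecords/oai:resumptionToken", namespaces=NS)
    return items, token


items, token = parse_list_records(RESPONSE)
print(items, token)
```

In a harvester, a non-empty token is sent back in a new request (`verb=ListRecords&resumptionToken=...`) until the repository returns an empty one.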
OAI Repositories around the World
Archives by country: USA 693, UK 231, Germany 199, Japan 156, Spain 156, Brazil 136, India 102, China 90, France 87, Canada 81, Italy 77, Australia 75, Ukraine 73, Taiwan 69, Russia 53, Portugal 48, Colombia 47, Sweden 45, South Africa 40, Malaysia 36, Netherlands 35, Belgium 28, Greece 21
Number of repositories: 4053; number of records: ~39,000,000 (according to the Registry of Open Access Repositories, ROAR, http://roar.eprints.org)
Open Access Statistics: repositories by type (figure)
Software to create and manage OARs (number of repositories):
DSpace 1579; EPrints 567; Bepress 366; OPUS 72; Invenio 19; Greenstone 22; Fedora 57
OAR Example 1
OAR Example 2: JINR Document Server, http://jds.jinr.ru/
Research Information
Data/metadata, or information, about:
• Scientists
• Project Managers
• Ongoing and Completed Projects
• Research Departments
• Funding Organisations and Programmes
• Research Results
• Publications
• Equipment
• their time-dependent Relationships (Semantics)
Who needs Research Information?
What is a CRIS?
Current Research Information System = CRIS
… information about
• People
• Organisations
• Projects
• Funding Programmes
• Research Results
• …
… that means
• Timeliness
• Vitality
… driven by
• A Concept
• A Model
… incorporated as an
• Implementation (ICT)
An integrated approach towards managing research information
CERIF Model: Common European Research Information Format
Instance Diagram (figure): Person A, Publication X, Project P and the organisational units O, M and N are connected by role-typed relationships (author, project leader, member, employee, part of, owns IPR). The data come from a repository, an HR system, web pages, and project management and finance systems.
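CERIF expresses such a diagram with link entities (for example `cfPers_ResPubl`) that connect two entities and carry a role classification plus start and end dates, which is what makes the relationships time-dependent. A minimal sketch of that idea; the `Link` class and helper functions are hypothetical, the field names are simplified, and which entity plays which role, as well as all dates, are illustrative assumptions rather than the CERIF schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Link:
    """A role-typed, time-bounded relationship between two entities.

    Loosely modeled on CERIF link entities (such as cfPers_ResPubl), which
    pair two entities with a classification (the role) and start/end dates.
    """
    source: str
    target: str
    role: str
    start: Optional[date] = None
    end: Optional[date] = None

    def valid_on(self, day: date) -> bool:
        """True if the relationship holds on the given day (open ends allowed)."""
        after_start = self.start is None or self.start <= day
        before_end = self.end is None or day <= self.end
        return after_start and before_end


# Hypothetical links echoing the instance diagram; which entities are
# connected by which role, and all dates, are illustrative assumptions.
LINKS = [
    Link("Person A", "Publication X", "author"),
    Link("Person A", "OrgUnit O", "employee", start=date(2010, 1, 1)),
    Link("Person A", "Project P", "project leader",
         start=date(2014, 1, 1), end=date(2016, 12, 31)),
    Link("OrgUnit M", "OrgUnit O", "part of"),
    Link("OrgUnit N", "OrgUnit O", "part of"),
    Link("OrgUnit O", "Publication X", "owns IPR"),
]


def targets(entity: str, role: str, day: Optional[date] = None) -> List[str]:
    """Entities linked from `entity` with `role`, optionally valid on `day`."""
    return [link.target for link in LINKS
            if link.source == entity and link.role == role
            and (day is None or link.valid_on(day))]


def sources(entity: str, role: str) -> List[str]:
    """Entities linked to `entity` with `role` (the inverse direction)."""
    return [link.source for link in LINKS
            if link.target == entity and link.role == role]


print(targets("Person A", "project leader", date(2015, 11, 2)))  # ['Project P']
print(sources("OrgUnit O", "part of"))  # ['OrgUnit M', 'OrgUnit N']
```

Keeping roles and dates on the link rather than on the entities lets one person hold several roles over time without duplicating the person record.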
CERIF Features
(1) a data model (data-centric);
(2) allows for a (metadata) representation of research entities, their activities/interconnections (research) and their output (results);
(3) allows for high flexibility with formal (semantic) relationships;
(4) enables quality maintenance, archiving, access and interchange of research information;
(5) supports knowledge transfer to decision makers for research evaluation, and to research managers, strategists, researchers, editors and the general public.
CRIS Example 1
CRIS Example 2: ИСТИНА (ISTINA), https://istina.msu.ru/
CRIS Example 3: PIN, Personal INformation System of JINR
CRIS&OAR Challenge
Collaboration of researchers, administration and librarians
CRIS and OARs should join forces to deliver the best possible services
Current Research Information Systems (CRIS) & Open Access Repositories (OAR)
(figure layers: Strategic Layer, Operational Layer)
CRIS:
• records the R&D (Research and Development) activity;
• covers projects, people (expertise), organizational structure, R&D outputs, events, facilities and equipment;
• features: administrative, comprehensive, integrative, person-centric, analytics.
OAR:
• collects and preserves the R&D outputs;
• provides a set of services for the collaboration members to manage and distribute digital resources;
• features: public, file-centric, rights, preservation, distributed paradigm.
Commonalities: bibliographic information, affiliation, project information.
CRIS management: financial information, staff information, R&D organisation.
OAR management: bibliographic data, full-text documents, authoritative data.
Resource aggregation approach:
– integrating with institutional HRM, project and other systems;
– sharing and re-using resources.
This requires curation processes and clearly assigned human responsibilities.
Curation View (who curates what):
• Projects: research project manager
• People: staff manager
• Bibliographic information: bibliography specialist, librarian, content manager, identity manager
• Materials & equipment: facility manager
• Finance: financial officer
Normalize as much as possible: Authority Records*
+ higher-quality, consistent data
+ minimal data input by end users
Authority Control: identifying objects and concepts uniquely
Variety of authorities: people, institutes, grants, experiments, projects, journals, …
Variety of identifiers: DOI, ORCID, ...
Variety of linkages: n:m relations, vertical linkages, horizontal linkages
History tracking: predecessors/successors
* Authority records serve as search elements of bibliographic records.
Authority Control: source data, tool and result
Source data: CRIS & OAR systems; bibliographic databases; vocabularies, ontologies, ORCID/AuthorClaim and other author identification systems.
Tool: authority control
1. accounting for all name variants;
2. disambiguation of authoritative data in information search and submission.
Result: relevant information about R&D activity, i.e. lists of publications, scientific reporting, bibliometrics & scientometrics.
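The two authority-control tasks can be illustrated with a toy name index: every known variant of a name points to an authority ID, and when a variant is shared by several people, extra context such as affiliation narrows the candidates. This is a minimal sketch, not how JDS or Invenio implement it; the authority IDs, names and affiliations are invented, and the normalization is deliberately crude.

```python
import unicodedata
from collections import defaultdict
from typing import Dict, Optional, Set


def normalize(name: str) -> str:
    """Crude name key: drop accents, dots and commas, lowercase, sort words."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    words = stripped.replace(".", " ").replace(",", " ").lower().split()
    return " ".join(sorted(words))


# Hypothetical authority records: IDs, names and affiliations are invented.
AUTHORITIES = {
    "AUTH-0001": {"preferred": "Ivanov, Ivan",
                  "variants": ["Ivanov I.", "I. Ivanov", "Иванов, Иван"],
                  "affiliation": "JINR LIT"},
    "AUTH-0002": {"preferred": "Ivanov, Igor",
                  "variants": ["Ivanov I.", "I. Ivanov"],
                  "affiliation": "MSU"},
}

# Task 1, accounting for name variants: index every form -> authority IDs.
INDEX: Dict[str, Set[str]] = defaultdict(set)
for auth_id, rec in AUTHORITIES.items():
    for form in [rec["preferred"], *rec["variants"]]:
        INDEX[normalize(form)].add(auth_id)


def resolve(name: str, affiliation: Optional[str] = None) -> Set[str]:
    """Task 2, disambiguation: candidate IDs, narrowed by affiliation if given."""
    candidates = set(INDEX.get(normalize(name), set()))
    if affiliation is not None and len(candidates) > 1:
        narrowed = {a for a in candidates
                    if AUTHORITIES[a]["affiliation"] == affiliation}
        if narrowed:
            return narrowed
    return candidates


print(resolve("I. Ivanov"))         # two candidates: ambiguous
print(resolve("Ivanov I.", "MSU"))  # {'AUTH-0002'}
```

Sorting the words makes "Ivanov I." and "I. Ivanov" collapse to the same key; real systems rely on curated variant lists and identifiers such as ORCID rather than on string tricks.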
JINR CRIS & OAR Systems (viewpoint)
From person: PIN, Personal Information System (©JINR): staff information, employment profiles, bibliographic archive, projects' information.
From event: IDC, Integrated Digital Conferencing System (Indico, ©CERN): scientific activities management, covering the entire lifecycle of conferences, meetings and lectures.
From file: JDS, JINR Document Server (Invenio, ©CERN): Open Access Repository of materials concerning the R&D activity.
JINR Document Server (JDS): JDS has been created and developed as an institutional repository with the following content:
1. Research and scientific documents:
– publications issued in co-authorship with JINR researchers;
– archive documents that describe all the essential stages of the JINR research activity;
2. Documents providing informational support for the scientific and technological research performed at JINR.
JDS: Information Services
• Search and navigation
• Creation of user groups
• Saving search results
• Individual and group bookshelves
• Manuscript deposition
• Discussions on publications
• Sending out alerts and messages
Invenio Software
• Unix-like OS: GNU/Linux distributions Debian, Gentoo, Scientific Linux (RHEL-based), Ubuntu
• HTML, CSS, JS
• Python 2.7.5+
• MySQL
• Redis
Architecture: a tree of collections and subcollections (http://jds.jinr.ru)
Collection Books
Information Card of Resource
Attachment to Collection
Authority Control Implementation: solved by MARC21 Authorities + Invenio v1.2.1 API
MARC21 authorities:
• repeatable linking fields (fields 4xx, 5xx);
• horizontal linking (subfield $w: $wa - predecessor, $wb - successor);
• vertical linking (subfield $w: $wt - parent);
• repeatable System Control Number (field 035);
• repeatable Standard Technical Report Number (field 027).
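In the MARC 21 authority format, the first character of the control subfield $w in a 5xx see-also field states the relationship: `a` marks an earlier heading, `b` a later heading, and `t` the immediate parent body; $0 holds the control number of the linked record. A minimal sketch that follows these links to recover a name history and an organisational hierarchy. The records are plain Python dicts, not Invenio's storage format, and the institute and laboratory names are invented.

```python
from typing import Dict, List

# Hypothetical corporate-name authority records as plain dicts.
# 110 = heading; each 510 (see-also tracing) has $a (name),
# $w (control subfield; first character = relationship code) and
# $0 (control number of the linked authority record).
RECORDS: Dict[str, dict] = {
    "org-0": {"110": "Institute A", "510": []},
    "org-1": {"110": "Laboratory B", "510": [
        {"a": "Laboratory C", "w": "b", "0": "org-2"},  # later name
        {"a": "Institute A", "w": "t", "0": "org-0"},   # parent body
    ]},
    "org-2": {"110": "Laboratory C", "510": [
        {"a": "Laboratory B", "w": "a", "0": "org-1"},  # earlier name
        {"a": "Institute A", "w": "t", "0": "org-0"},
    ]},
    "org-3": {"110": "Sector D", "510": [
        {"a": "Laboratory C", "w": "t", "0": "org-2"},
    ]},
}


def linked(rec_id: str, code: str) -> List[str]:
    """Control numbers linked from rec_id whose $w starts with `code`."""
    return [f["0"] for f in RECORDS[rec_id]["510"] if f.get("w", "")[:1] == code]


def name_history(rec_id: str) -> List[str]:
    """Follow $wa (earlier heading) links back; return the chain oldest first."""
    chain = [rec_id]
    while True:
        earlier = linked(chain[0], "a")
        if not earlier or earlier[0] in chain:  # stop at the start or on a cycle
            return chain
        chain.insert(0, earlier[0])


def ancestors(rec_id: str) -> List[str]:
    """Follow $wt (immediate parent body) links up the hierarchy."""
    result: List[str] = []
    current = rec_id
    while True:
        parents = linked(current, "t")
        if not parents or parents[0] in result or parents[0] == rec_id:
            return result
        current = parents[0]
        result.append(current)


print([RECORDS[r]["110"] for r in name_history("org-2")])  # ['Laboratory B', 'Laboratory C']
print([RECORDS[r]["110"] for r in ancestors("org-3")])     # ['Laboratory C', 'Institute A']
```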
Module BibAuthority:
• enriching bibliographic data with data from authority records;
• re-indexing bibliographic records containing links to recently updated authority records;
• cross-referencing between MARC records ($0 subfields).
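The enrichment and re-indexing steps both rely on the $0 cross-reference from a bibliographic author field to an authority record. The sketch below imitates the idea with plain dicts and does not use the BibAuthority API. All control numbers, names and titles are hypothetical, and the plain `AUTH-0001` value in $0 is a simplification of whatever format a real installation stores.

```python
from typing import Dict, List, Set

# Hypothetical authority records keyed by control number: preferred
# heading (100) and see-from variants (400).
AUTHORITY: Dict[str, dict] = {
    "AUTH-0001": {"100": "Ivanov, Ivan", "400": ["Ivanov I.", "I. Ivanov"]},
}

# Hypothetical bibliographic records; author fields link to authorities
# through $0. Field 100 is non-repeatable, 700 is repeatable.
BIBLIO: Dict[int, dict] = {
    101: {"245": "Paper one", "100": {"a": "Ivanov I.", "0": "AUTH-0001"}},
    102: {"245": "Paper two", "100": {"a": "Petrov P."},
          "700": [{"a": "I. Ivanov", "0": "AUTH-0001"}]},
    103: {"245": "Paper three", "100": {"a": "Sidorov S."}},
}


def author_fields(rec: dict) -> List[dict]:
    """The 100 field (if present) followed by all 700 fields."""
    main = [rec["100"]] if "100" in rec else []
    return main + rec.get("700", [])


def index_terms(rec: dict) -> Set[str]:
    """Author terms to index: names as printed plus the linked authority forms."""
    terms: Set[str] = set()
    for field in author_fields(rec):
        terms.add(field["a"])
        auth = AUTHORITY.get(field.get("0", ""))
        if auth:
            terms.add(auth["100"])
            terms.update(auth["400"])
    return terms


def records_to_reindex(auth_id: str) -> List[int]:
    """Bibliographic records whose $0 points to an updated authority record."""
    return sorted(rid for rid, rec in BIBLIO.items()
                  if any(f.get("0") == auth_id for f in author_fields(rec)))


print(records_to_reindex("AUTH-0001"))  # [101, 102]
```

Enrichment lets a search for any variant find record 102 even though it prints only "I. Ivanov"; re-indexing keeps that true after the authority record changes.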
Collection Authorities: http://jds-test3.jinr.ru
Collection Institutes. Record JINR
Record LIT. Detailed Information
Institute → Publication
Collection People. Author → Publication
Detailed Information about Author
Collection code: MARC tag 980 defines which documents belong to a given collection
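Collection membership is thus just a value in the record. A simplified imitation (not Invenio's own collection engine) that treats a collection as "all records whose 980 $a contains a code". The records and codes are hypothetical, and 980 is assumed repeatable so that one record can belong to several collections.

```python
from typing import Dict, List

# Hypothetical records with their 980 $a collection codes. The field is
# treated as repeatable here, so one record can sit in several collections.
RECORDS: List[Dict] = [
    {"id": 1, "245": "Pushkin's Fairy Tales", "980": ["BOOK"]},
    {"id": 2, "245": "Example institute", "980": ["AUTHORITY", "INSTITUTE"]},
    {"id": 3, "245": "Example article", "980": ["ARTICLE"]},
]


def collection(code: str, records: List[Dict] = RECORDS) -> List[int]:
    """IDs of records whose 980 $a contains `code` (case-insensitive)."""
    wanted = code.upper()
    return [r["id"] for r in records
            if wanted in (c.upper() for c in r.get("980", []))]


print(collection("book"))       # [1]
print(collection("AUTHORITY"))  # [2]
```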
Experiment → Publication
Grant → Author → Publication
Thesaurus
Repository: a place for storing and supporting any data.
Archive: a collection of information resources plus a classification system (catalog).
Knowledge: a form of existence and systematization of the results of human cognitive activity.
Knowledge (of a subject): a confident understanding of a subject, the ability to deal with it, to understand it and to use it to achieve goals.
Missing knowledge: knowledge known to humanity but unknown to a particular person at the current moment (for example, a student facing a new subject in the curriculum).
Knowledge in the broad sense: a subjective image of reality in the form of concepts and ideas.
Knowledge in the narrow sense: the possession of verified information (answers to questions) that allows a problem to be solved.
Knowledge in the theory of artificial intelligence (AI) and expert systems: information and inference rules about the world, the properties of objects and the patterns of processes and phenomena, together with the rules for using them in decision-making.
New knowledge: information about the existence of objects or their properties, or of real processes and phenomena, that were previously unknown to science and are not included in the existing system of human ideas about the world.
Open Access (OA) to Research: a way of scientific communication in which the author exercises the right to publish a work so that anyone can access it from any place and at any time of their choosing.
Open Archives Initiative (OAI): an organization that develops and applies technical interoperability standards for archives to share catalog information (metadata).
Self-archiving: depositing digital documents (metadata + full text) in an OAI-compliant archive.
“Proxy” self-archiving: depositing on behalf of authors who feel personally unable (too busy or technically incapable) to self-archive.
Harvesting: automatic gathering of metadata between repositories.
OAI-PMH: Open Archives Initiative Protocol for Metadata Harvesting.
Metadata: structured data that describes the characteristics of a resource (“An Introduction to Metadata”, by Chris Taylor, University of Queensland)
Book:
Title: Pushkin's Fairy Tales
Date of Publication: 2012
Author: Alexander Pushkin
Editor: Williams Paul
Translator: Elton Oliver, Krup Jacob
Publisher: Bright City
Structure:
• Type of Resource
• Title
• Description
• Source
• Date
• Author
• Creator
• …
Metadata: data about data
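The same book can be written out as a Dublin Core record, the format every OAI-PMH repository must support. A minimal sketch using Python's standard `xml.etree.ElementTree`; mapping both the editor and the translators to `dc:contributor` is an assumption, since simple Dublin Core has no finer roles, and `Text` is taken from the DCMI Type Vocabulary.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
ET.register_namespace("dc", DC)
ET.register_namespace("oai_dc", OAI_DC)

# The book example as (Dublin Core element, value) pairs. Simple Dublin Core
# has no editor/translator roles, so both are mapped to dc:contributor.
BOOK = [
    ("type", "Text"),
    ("title", "Pushkin's Fairy Tales"),
    ("date", "2012"),
    ("creator", "Alexander Pushkin"),
    ("contributor", "Williams Paul"),
    ("contributor", "Elton Oliver"),
    ("contributor", "Krup Jacob"),
    ("publisher", "Bright City"),
]


def to_oai_dc(elements):
    """Serialize (element, value) pairs as an oai_dc metadata block."""
    root = ET.Element(f"{{{OAI_DC}}}dc")
    for name, value in elements:
        ET.SubElement(root, f"{{{DC}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


xml_text = to_oai_dc(BOOK)
print(xml_text)
```

Dublin Core elements are repeatable, so three contributors simply become three `dc:contributor` elements; the price of this simplicity is that the editor/translator distinction is lost.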
MARC21: international standard for bibliographic data. A MARC bibliographic record consists of three main components: the Leader, the Directory, and the variable fields (http://www.loc.gov/marc/bibliographic/).
00X: Control Fields
01X-09X: Numbers and Code Fields
1XX: Main Entry Fields
20X-24X: Title and Title-Related Fields
25X-28X: Edition, Imprint, Etc. Fields
3XX: Physical Description, Etc. Fields
4XX: Series Statement Fields
5XX: Note Fields
6XX: Subject Access Fields
70X-75X: Added Entry Fields
76X-78X: Linking Entry Fields
80X-83X: Series Added Entry Fields
841-88X: Holdings, Location, Alternate Graphics, Etc. Fields
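The tag ranges listed above translate directly into a lookup table. A small sketch that maps a three-digit numeric tag to its block; tags outside the listed ranges, such as the locally defined 9XX fields (980 in Invenio, for instance), fall into a catch-all. The function name is hypothetical.

```python
from typing import List, Tuple

# Tag ranges of MARC 21 bibliographic variable fields (LoC overview list).
BLOCKS: List[Tuple[str, str, str]] = [
    ("001", "009", "Control Fields"),
    ("010", "099", "Numbers and Code Fields"),
    ("100", "199", "Main Entry Fields"),
    ("200", "249", "Title and Title-Related Fields"),
    ("250", "289", "Edition, Imprint, Etc. Fields"),
    ("300", "399", "Physical Description, Etc. Fields"),
    ("400", "499", "Series Statement Fields"),
    ("500", "599", "Note Fields"),
    ("600", "699", "Subject Access Fields"),
    ("700", "759", "Added Entry Fields"),
    ("760", "789", "Linking Entry Fields"),
    ("800", "839", "Series Added Entry Fields"),
    ("841", "889", "Holdings, Location, Alternate Graphics, Etc. Fields"),
]


def block_of(tag: str) -> str:
    """Name of the block a three-digit tag falls in."""
    if len(tag) != 3 or not tag.isdigit():
        raise ValueError(f"not a numeric MARC tag: {tag!r}")
    for low, high, name in BLOCKS:
        # Equal-length digit strings compare in numeric order.
        if low <= tag <= high:
            return name
    return "Other (e.g. 9XX local fields)"


for tag in ["001", "035", "100", "245", "700", "980"]:
    print(tag, block_of(tag))
```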
Example MARC record
Fields:
035 - System Control Number (Repeatable)
100 - Main Entry - Personal Name (Not Repeatable)
245 - Title Statement (Not Repeatable)
700 - Added Entry - Personal Name (Repeatable)
Each field consists of subfields, and each subfield carries a value.
MARC = MAchine-Readable Cataloguing
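Repeatability is one of the simplest things to validate: in MARC 21, 035 and 700 are repeatable while 100 and 245 are not. A minimal sketch that flags non-repeatable tags occurring more than once; the record contents are hypothetical, and tags missing from the table are assumed repeatable.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Repeatability of the fields in the example record, per MARC 21:
# 035 and 700 are repeatable (R), 100 and 245 are not (NR).
REPEATABLE: Dict[str, bool] = {"035": True, "100": False, "245": False, "700": True}

Field = Tuple[str, Dict[str, str]]  # (tag, {subfield code: value})

# Hypothetical record contents.
record: List[Field] = [
    ("035", {"a": "(JDS)12345"}),
    ("100", {"a": "Pushkin, Alexander"}),
    ("245", {"a": "Pushkin's Fairy Tales"}),
    ("700", {"a": "Williams, Paul"}),
    ("700", {"a": "Elton, Oliver"}),
]


def repeatability_errors(fields: List[Field]) -> List[str]:
    """Tags that occur more than once although they are non-repeatable."""
    counts = Counter(tag for tag, _ in fields)
    return sorted(tag for tag, n in counts.items()
                  if n > 1 and not REPEATABLE.get(tag, True))


print(repeatability_errors(record))                               # []
print(repeatability_errors(record + [("245", {"a": "Another"})]))  # ['245']
```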
XML: EXtensible Markup Language, a metalanguage (a language for describing other languages) and a universal format for structured documents and data (derived from SGML, Standard Generalized Markup Language), http://www.w3.org/XML/. Example:
<?xml version="1.0" encoding="utf-8"?>
<PRODUCTS>
  <PRODUCT>
    <TITLE> Product #1 </TITLE>
    <PRICE> 10.00 </PRICE>
  </PRODUCT>
  <PRODUCT>
    <TITLE> Product #2 </TITLE>
    <PRICE> 20.00 </PRICE>
  </PRODUCT>
</PRODUCTS>
The first line is the prolog; PRODUCTS is the root element; <TITLE> is an opening tag and </TITLE> the matching closing tag; the text between them (" Product #1 ") is the element content.
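The parts named in the example map directly onto a parser's view of the document. A short sketch with Python's standard `xml.etree.ElementTree` that reads the PRODUCTS document, walks from the root element to each PRODUCT, and extracts the element content of TITLE and PRICE.

```python
import xml.etree.ElementTree as ET

# The PRODUCTS example, with the prolog as its first line.
DOCUMENT = """<?xml version="1.0" encoding="utf-8"?>
<PRODUCTS>
  <PRODUCT> <TITLE> Product #1 </TITLE> <PRICE> 10.00 </PRICE> </PRODUCT>
  <PRODUCT> <TITLE> Product #2 </TITLE> <PRICE> 20.00 </PRICE> </PRODUCT>
</PRODUCTS>"""

root = ET.fromstring(DOCUMENT)  # the root element, PRODUCTS

# Each PRODUCT element holds a TITLE and a PRICE; element content keeps its
# surrounding spaces, so it is stripped before use.
products = [
    (product.findtext("TITLE").strip(), float(product.findtext("PRICE")))
    for product in root.findall("PRODUCT")
]
total = sum(price for _, price in products)

print(root.tag, products, total)
```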
MARCXML: a framework for working with MARC data in an XML environment (http://www.loc.gov/standards/marcxml/)
Example MARCXML-record
Tag datafield = MARC field
Tag subfield = MARC subfield
Element content = MARC subfield values
Open Access Idea
Digital Libraries, Tools
Scientific and Educational Activity
Institutional Repositories in the form of Open Access
I. Digital Collection. Collection and preservation of the intellectual output of the organization.
II. Set of services for the collaboration members in order to manage and distribute digital resources.
Institutional Repository
CERIF: Common European Research Information Format
1) CERIF is an EU Recommendation to Member States (http://cordis.europa.eu/cerif/);
2) the European Commission (EC) has authorised euroCRIS to maintain and develop CERIF and its usage (http://www.eurocris.org/cerif/cerif-releases/).