Multilingual Subject Access to Catalogues of National Libraries (MSAC) Czech Republic’s collaboration with Croatia, Latvia, Lithuania, Macedonia, Slovakia, Slovenia Marie Balíková National Library of the Czech Republic [email protected] MSAC


Post on 15-Jan-2016


TRANSCRIPT

Page 1: Multilingual Subject Access to Catalogues of National Libraries (MSAC) Czech Republic’s collaboration with Croatia, Latvia, Lithuania, Macedonia, Slovakia,

Multilingual Subject Access to Catalogues of National Libraries

(MSAC) Czech Republic’s collaboration with

Croatia, Latvia, Lithuania, Macedonia, Slovakia, Slovenia

Marie Balíková

National Library of the Czech Republic

[email protected]

MSAC


introduction

• the aim is to provide users with an authorized indexing and retrieval tool for multilingual subject searching in the online environment

• the initiative complies with the main goals currently defined by IFLA for the activity of the Indexing and Classification Section:

– Changing Roles of Subject Access Tools (Berlin)

– Implementation and Adaptation of Global Tools for Subject Access to Local Needs (Buenos Aires)

– Cataloguing and Subject Tools for Global Access: International Partnerships (Oslo)


CZENAS - MSAC

Czech National Subject Authority File (CZENAS)
• cooperative venture of three large libraries in Czechia:
– National Library of the Czech Republic
– Moravian Library in Brno
– Research Library in Olomouc

Multilingual Subject Access to Catalogues of National Libraries (MSAC)
• joint initiative of seven national libraries:
– National and University Library, Zagreb, Croatia
– National Library of the Czech Republic, Prague
– National Library of Latvia, Riga
– Martynas Mazvydas National Library of Lithuania, Vilnius
– National and University Library St. Kliment Ohridski, Skopje, Macedonia
– Slovak National Library in Martin, Slovakia
– National and University Library, Ljubljana, Slovenia


factors affecting subject indexing

• the standardization of the subject retrieval process and of indexing and classification tools, which
– minimizes duplication of work in sharing information
– supports the shared cataloguing process at national and international level

• the possibility of interoperability among different indexing and classification schemes, which consists in
– intellectual mapping between terms in different controlled vocabularies
– using a switching language as an intermediary for moving among equivalent terms in different vocabularies, above all multilingual ones

• the possibility to increase precision and recall through the Z39.50 protocol and its profiles, and to apply authority control whenever possible in all databases searched, introducing the same subject search criteria in both remote and local databases


multilingualism issue in online environment

• is a complex issue
• users may want
– to search a multilingual collection by using queries in one language or
– to retrieve documents in a number of specific languages
– to use an interface in the language of their choice

• solution: the users are provided with the language support they need

• possible limits:
– technologies
– language skills of the staff
– financial means

Therefore, there have been only a few attempts to create a multilingual subject access tool or to integrate already existing library systems in the area of multilingual subject access


subject analysis process in online environment

• to prefer a post-coordinated indexing system

• to simplify the syntax applied in subject heading strings

• to support conceptual compatibility of indexing formulas/preferred terms used in various indexing languages

• to support harmonisation between various indexing languages

• to support mapping between verbal terms and equivalent notations of classification scheme

• to improve subject access for OPACs and for Web resources


UDC classification system in on-line environment

can enhance subject access, because it
– provides context to search terms
– covers all subjects
– improves subject access to large databases using sophisticated methods
– enables language-independent notations to be linked to search terms of various verbal languages
– enables other languages to be joined later without the need to classify the resources again
– could serve as a switching language, a mapping mediator which ensures convertibility between information languages
– supports very detailed expressions of complex subjects using a variety of common and special auxiliaries, specific symbols and punctuation
– is more flexible than other universal classification schemes
– indicates entities which occur in more than one domain (class)


examples

Heading water

UDC 546.212 (inorganic chemistry)

UDC 556-032.2 (hydrology)

UDC 628.1.03 (water management)

Heading incest

UDC 316.835.2 (sociology)

UDC 343.542.5 (criminal law)

UDC 616.89-008.442.38 (psychiatry)


MSAC and UDC

• the UDC system proved to be the most suitable for the creation of a multilingual common indexing tool
• all the participating libraries used it, even if in different versions
• in MSAC it is applied as an enumerative classification, with functionality very similar to that of DDC
• UDC numbers, single and complex (pre-combined), are treated as single numbers
• present revisions of UDC have a more faceted structure
– frequent need to use combinations of numbers like 821 Literature and 94 History
• number 821 for literature has to be combined with the common auxiliary for language, e.g. 821.162.3 Czech literature
• class number captions (descriptions) added to the retrieval system and available for search in the end-user interface are most effective and user-friendly
• in the MSAC system UDC class numbers are used alongside their descriptions


examples

• 602.44 -- biotransformation / biotransformace
• 602.6 -- gene engineering / genové inženýrství
• 602.6 -- genetic engineering / genetické inženýrství
• 602.6 -- transgenosis / transgenoze
• 602.641 -- viral vectors / virové vektory
• 602.7 -- cloning / klonování
• 604.4 -- secondary metabolites / sekundární metabolity
• 604.6 -- genetically modified organisms / geneticky modifikované organismy
• 608.1 -- bioethics / bioetika
• 608.3 -- biological safety / biologická bezpečnost


citation order / UDC MRF in electronic form

Citation order
• UDC facility to adapt the citation order to fit in with local requirements
• international exchange of information demands consistency in building UDC class numbers
• the same citation order should be adopted

UDC MRF in electronic form
• national language versions of UDC MRF in electronic form have not been prepared yet
• the language equivalents of controlled terms created by participating libraries are being added to the Czech Subject Authority File


Czech National Subject Authority File - CZENAS

integrated indexing and retrieval tool in which verbal controlled terms are being linked to UDC equivalent notations

• respecting IFLA recommendation - to consider possible relationships between subject authority records and classification

• respecting LC practice

topical authority file - thesaurus in which the following kinds of relationships between terms are defined:

• equivalence (expressed: USE)
• hierarchy (expressed: BT - Broader term; NT - Narrower term)
• association (expressed: RT - Related term)

Czech authority file of topical terms - base for multilingual controlled vocabulary
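
The three relationship kinds listed for the topical authority file (USE, BT/NT, RT) can be sketched as a small data structure. This is only an illustration in Python, not the CZENAS data model: the class names, method names and the sample relationships between terms are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Term:
    label: str
    use: Optional[str] = None                        # USE: points to the preferred term
    broader: Set[str] = field(default_factory=set)   # BT
    narrower: Set[str] = field(default_factory=set)  # NT
    related: Set[str] = field(default_factory=set)   # RT


class Thesaurus:
    def __init__(self) -> None:
        self.terms: Dict[str, Term] = {}

    def _get(self, label: str) -> Term:
        return self.terms.setdefault(label, Term(label))

    def add_use(self, non_preferred: str, preferred: str) -> None:
        """Equivalence: the non-preferred term refers to the preferred one."""
        self._get(preferred)
        self._get(non_preferred).use = preferred

    def add_hierarchy(self, broader: str, narrower: str) -> None:
        """Hierarchy: BT and NT are stored as reciprocal links."""
        self._get(broader).narrower.add(narrower)
        self._get(narrower).broader.add(broader)

    def add_related(self, a: str, b: str) -> None:
        """Association: RT is symmetric."""
        self._get(a).related.add(b)
        self._get(b).related.add(a)

    def preferred(self, label: str) -> str:
        term = self.terms[label]
        return term.use if term.use is not None else label

    def expand(self, label: str) -> List[str]:
        """Preferred term followed by its narrower terms (for query expansion)."""
        preferred = self.preferred(label)
        return [preferred] + sorted(self.terms[preferred].narrower)


# Illustrative relationships only; they are not taken from the authority file.
thesaurus = Thesaurus()
thesaurus.add_hierarchy("law", "medical law")
thesaurus.add_hierarchy("law", "history of law")
thesaurus.add_related("medical law", "bioethics")
thesaurus.add_use("legal history", "history of law")

print(thesaurus.preferred("legal history"))
print(thesaurus.expand("law"))
```

Storing BT/NT and RT as reciprocal links keeps the file consistent whichever term a user starts from.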


formats

MSAC supports both UNIMARC and MARC 21
• UNIMARC: Croatia, Lithuania
• Comarc (based on UNIMARC): Slovenia, Macedonia
• MARC 21: Latvia, Slovakia, Czechia

intention - to respect MARC formats as much as possible, but in view of the specific needs identified, some extensions and corrections have to be introduced

• fields for entering combinations of language variants and UDC notations extended by
– subfield “b” (UDC equivalent notation)
– subfield “c” (UDC qualifier)
– UNIMARC - tag 450: subfields a, b, c
– MARC 21 - tag 750: subfields a, b, c

• the MARC 21 Format for Authority had to be extended by a special field 089 for entering the UDC number
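
As a rough illustration of the extended fields, the sketch below builds a language-variant field with subfields a, b and c and prints it in a simple mnemonic form. It uses plain Python structures rather than any MARC library. The helper names, the omission of indicators, the use of tag 150 for the heading and the choice of subfield a inside field 089 are assumptions made for the example; only tags 450/750, field 089 and the meaning of subfields b and c come from the slide.

```python
from typing import Dict, List, Tuple

Subfield = Tuple[str, str]  # (subfield code, value)


def make_field(tag: str, subfields: List[Subfield]) -> Dict:
    """Represent one field as a tag plus an ordered list of subfields."""
    return {"tag": tag, "subfields": list(subfields)}


def language_variant_field(term: str, udc: str, qualifier: str,
                           tag: str = "750") -> Dict:
    """Language variant with its UDC equivalent notation ($b) and qualifier ($c).

    Use tag "750" for MARC 21 or tag "450" for UNIMARC.
    """
    return make_field(tag, [("a", term), ("b", udc), ("c", qualifier)])


def render(marc_field: Dict) -> str:
    """Mnemonic one-line rendering, e.g. '750 $a death $b 128 $c metaphysics'."""
    parts = "".join(f" ${code} {value}" for code, value in marc_field["subfields"])
    return f"{marc_field['tag']}{parts}"


record = [
    make_field("150", [("a", "smrt")]),   # heading tag chosen for illustration
    make_field("089", [("a", "128")]),    # UDC number; subfield code is assumed
    language_variant_field("death", "128", "metaphysics"),
]

for marc_field in record:
    print(render(marc_field))
```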


English equivalents/approval process

English equivalents of preferred terms, mostly LCSH terms, are being chosen.

If LCSH equivalents are not found (LC terms being too broad), reference sources such as the LC titles and subtitles file, encyclopedias, manuals, language vocabularies, web pages and full-text databases are consulted.

Approval process:

The proposals of preferred terms linked to the UDC class numbers and English equivalents are sent to the editorial staff for approval; the approved authority records are then entered into the authority database via a special programme procedure.


mapping process

• is done intellectually

• consists in establishing equivalents between the controlled subject terms used in the indexing systems of participating libraries through a switching language

• switching language: UDC notations based on UDC MRF and English equivalents

• mapping links are defined between preferred terms represented by isolated lexical units only

• subject heading strings as a whole are excluded and are not mapped

• authority records as a whole are excluded and are not mapped

• links are established only between topical main headings (main entries), UDC numbers and language equivalents


combination of verbal expressions – UDC notations

simple combination
• one verbal expression is mapped to one simple UDC notation
– painting / malířství – UDC 75
• one verbal expression is mapped to one compound/complex UDC notation
– medical law / medicínské právo – UDC 34:61
– history of law / právní dějiny – UDC 34(091)
– Anglo-American law / angloamerické právo – UDC 34(410+73)

complex combination
• one verbal expression is mapped to multiple UDC notations
– death / smrt – UDC equivalent 128 (metaphysics)
– death / smrt – UDC equivalent 2-186 (theological anthropology)
– death / smrt – UDC equivalent 233-186 (Hinduism)
– death / smrt – UDC equivalent 393 (ethnography)
– death / smrt – UDC equivalent 616-036.88 (medicine)
• one UDC notation is mapped to multiple verbal expressions
– 34 -- law / právo * laws / zákony * legal aspects / právní aspekty * legal regulations / právní předpisy
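
The simple and complex combinations above amount to a many-to-many relation between verbal expressions and UDC notations. The following Python sketch indexes that relation in both directions, using only the Czech and English examples from this slide. The function names and the record layout are assumptions for illustration and do not describe the MSAC implementation.

```python
from collections import defaultdict

# (English term, Czech term, UDC notation, qualifier), taken from the slide examples
LINKS = [
    ("painting", "malířství", "75", None),
    ("medical law", "medicínské právo", "34:61", None),
    ("death", "smrt", "128", "metaphysics"),
    ("death", "smrt", "2-186", "theological anthropology"),
    ("death", "smrt", "233-186", "Hinduism"),
    ("death", "smrt", "393", "ethnography"),
    ("death", "smrt", "616-036.88", "medicine"),
    ("law", "právo", "34", None),
    ("laws", "zákony", "34", None),
    ("legal aspects", "právní aspekty", "34", None),
    ("legal regulations", "právní předpisy", "34", None),
]


def build_indexes(links):
    """Return (term -> [(udc, qualifier)], udc -> [{'en': ..., 'cs': ...}])."""
    term_to_udc = defaultdict(list)
    udc_to_terms = defaultdict(list)
    for english, czech, udc, qualifier in links:
        for term in (english, czech):
            term_to_udc[term].append((udc, qualifier))
        udc_to_terms[udc].append({"en": english, "cs": czech})
    return term_to_udc, udc_to_terms


def switch_language(term, target_lang, term_to_udc, udc_to_terms):
    """Follow term -> UDC notation -> terms in the target language."""
    results = []
    for udc, _qualifier in term_to_udc.get(term, []):
        for entry in udc_to_terms[udc]:
            if entry[target_lang] not in results:
                results.append(entry[target_lang])
    return results


term_to_udc, udc_to_terms = build_indexes(LINKS)
print([udc for udc, _ in term_to_udc["smrt"]])
print(switch_language("smrt", "en", term_to_udc, udc_to_terms))
print(switch_language("právo", "en", term_to_udc, udc_to_terms))
```

In this sketch, pivoting through the notation alone returns every expression attached to UDC 34 for "právo", which suggests why the switching language pairs UDC notations with English equivalents rather than relying on the notation by itself.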


MSAC indexes

Topical terms – Multilingual
Topical terms – Czech
Topical terms – English
Topical terms – Croatian
Topical terms – Latvian
Topical terms – Lithuanian
Topical terms – Macedonian
Topical terms – Slovak
Topical terms – Slovenian
UDC

Subject fields: Astronomy, Demography, Law, Politics, Sociology, Sport, Theater


MSAC – two phases

phase 1:
• development of the Czech topical authority file
• integration of language variants of participating libraries in the Czech subject authority file

phase 2:
• combinations of UDC notations, natural-language expressions and English expressions to be inserted into the special fields of the respective bibliographic records of cooperating libraries
• process: semiautomatic - intellectual checking of data
• access via Z39.50 protocol or
• small testing database (created at the NL CR)
• after completing the procedure of authorization and authentication, users are offered
– access via one single interface in the UIG
– both Czech and English interfaces and both Czech and English languages for searching


Uniform Information Gateway (UIG)

• allows uniform and easy access to both traditional and electronic resources (local and remote)
• developed by the Czech National Library and Charles University
• offers an extended services feature (SFX) - navigation from the source to other related targets is possible
• basic SW: MetaLib and SFX
• MetaLib - parallel browser
– enables searching of catalogues, full texts, databases and archives
– is not limited to any predefined interfaces
– uses Z39.50 for communication
• the SFX system provides context-sensitive linking between Web resources, providing and coordinating cooperation between resources and targets
– RESOURCE - entity through which we have just made a search
– TARGET - entity where the service is being provided
• OpenURL is a mechanism that makes open linking in the Web-based information environment possible
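
To make the OpenURL idea concrete, the sketch below assembles a link in the older key/value query style (keys such as `sid`, `genre`, `title`, `date`), in which the resource passes citation metadata to a link resolver that then offers targets. The resolver address, the source identifier and the sample metadata are hypothetical placeholders, not the UIG configuration.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_openurl(resolver_base: str, **metadata: str) -> str:
    """Append citation metadata to a resolver base URL as a query string."""
    query = urlencode({key: value for key, value in metadata.items()
                       if value is not None})
    return f"{resolver_base}?{query}"


# Hypothetical resolver address and record metadata, for illustration only.
link = build_openurl(
    "https://resolver.example.org/sfx",
    sid="example:catalogue",
    genre="book",
    title="Water management",
    date="2005",
)
print(link)

# The resolver side can recover the metadata from the query string.
print(parse_qs(urlsplit(link).query))
```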


MSAC and retrieval process in UIG

MetaLib
• enables rephrasing of queries into a format that is appropriate for the resource selected
• sends the queries and receives answers (results)
• transforms them into its own format and outputs them
• offers and performs deduplication of selected documents
• enables personalized elements - My Resource List of selected databases: Czech Authority Database and those of cooperating libraries (since October 2005)
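
The retrieval steps listed above (rephrase the query per resource, collect results, convert them to one format, deduplicate) can be sketched generically as follows. This is not MetaLib's API: the two in-memory "catalogues", their query conventions and the deduplication key (normalised title plus year) are assumptions chosen to keep the example self-contained.

```python
# Two hypothetical sources with different query conventions and record styles.
CATALOGUE_A = [
    {"title": "Hydrology of Central Europe", "year": 2001},
    {"title": "Water law", "year": 1999},
]
CATALOGUE_B = [
    {"title": "hydrology of  central europe ", "year": "2001"},
    {"title": "Groundwater", "year": 2003},
]


def search_a(query):
    """Source A expects a plain lowercase keyword."""
    return [r for r in CATALOGUE_A if query in r["title"].lower()]


def search_b(query):
    """Source B expects the form 'kw=<keyword>'."""
    keyword = query.split("=", 1)[1]
    return [r for r in CATALOGUE_B if keyword in r["title"].lower()]


# Each source is described by (rephrase function, search function).
SOURCES = {
    "A": (str.lower, search_a),
    "B": (lambda q: "kw=" + q.lower(), search_b),
}


def normalise(record):
    """Convert a source record into one common internal shape."""
    return {
        "title": record["title"].strip(),
        "year": str(record.get("year", "")),
        "source": record["source"],
    }


def dedup_key(record):
    """Case- and whitespace-insensitive title plus year."""
    return (" ".join(record["title"].lower().split()), record["year"])


def federated_search(query, sources):
    merged = {}
    for name, (rephrase, search) in sources.items():
        for raw in search(rephrase(query)):
            record = normalise({**raw, "source": name})
            merged.setdefault(dedup_key(record), record)  # keep the first copy
    return list(merged.values())


for hit in federated_search("Hydro", SOURCES):
    print(hit)
```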


MSAC: application in UIG


future development

• the idea of creating a multilingual subject retrieval tool or introducing a mapping scheme in existing systems is considered an essential element of The European Library service

• MSAC project – beginning phase
• problems:
– only voluntary work of teams of participating libraries
– communication almost only via e-mail
– no external financial support

• new perspective:
– joining the TEL-ME-MOR project (The European Library: Modular Extensions for Mediating Online Resources), funded by the European Commission under the Sixth Framework Programme of the Information Society Technologies (IST) Programme, to which the ten new member states of the European Union have been invited
– integration with MACS project ?


Multilingual Access to Subjects (MACS) project

• goal - to integrate the most developed and used subject indexing systems: LCSH, RAMEAU and SWD

• feasibility of linking the mentioned Subject Heading Languages was investigated

• the approach of creating links between LCSH, RAMEAU and the SWD/RSWK was tested in the fields of sport and theatre

• a prototype was created
• ways to extend the use of the MACS project have been discussed
• crossing the language barrier
– adding new subject indexing systems
– or investigating the use of other tools such as classifications if the same are available across several institutions

• demanding significant resources


comparison of MACS and MSAC

• MACS - fully functional prototype :
– MSAC - first stage of a multilingual initiative
• MACS - linking existing verbal Subject Heading Languages :
– MSAC - creating a multilingual retrieval system based on UDC
• MACS - all SHLs of equal status, no pivot language :
– MSAC - switching language UDC and English equivalents

MACS and MSAC
– common nouns (in MSAC, special named entities such as Washington Declaration, 1918, October 18th)
– only headings mapped as equivalent headings judged to be synonymous in meaning
– only preferred forms mapped, hierarchical structures and thesaural relationships not mapped
– syntactical structures of subject heading strings not mapped


Multilingual subject access

A challenge

Thank you for your attention

MSAC: http://sigma.nkp.cz/eng/auv

CZENAS: http://sigma.nkp.cz/eng/aut

JIB: http://www.jib.cz/
