Voyager® with Unicode™: A Catalogers Session

Connie Braun

Training Consultant

Agenda

Introduction

Your Work Environment

Conversion

New Features

Learning More

Q&A

Release Update

General release occurred October 6, 2004!

• 4 production partners
  • 1 Windows Server, 3 Solaris
• 8 test server partners
  • 4 Task Force members (large non-Roman collections)
  • 1 large consortium with Universal Borrowing & Universal Catalog
  • 2 European customers

As of 01/20/05, 71 customers have upgraded and are functioning in a production environment with Voyager with Unicode. Approximately 50 upgrades are scheduled between now and May 2005.

Why Unicode™ in Voyager?

Brings Voyager up to current IT standards

Finds and displays records in the native language

Create and edit any MARC record using UTF-8

Import and export of records with any supported character set

Operators may select a Unicode-compliant font of their choice

Display Unicode characters in OPAC without proprietary software

Implementing Voyager with Unicode

For our customers, it’s business as usual, but with some interesting changes and improvements, especially in Cataloging.

Helping everyone to implement a Unicode-compliant system is Endeavor’s aim.

The Unicode standard is an important step towards realizing that goal.

Implementing the Unicode standard is an extension of Endeavor’s original mission: access to information regardless of location or format.

Following Standards

Follows standards (not proprietary)

See http://www.unicode.org for much more detail on these standards.

See http://lcweb.loc.gov/marc/specifications/speccharucs.html for details on LC’s format of MARC records that use Unicode. Voyager follows this specification.

Specifics on the Code Tables may be viewed at http://www.loc.gov/marc/specifications/specchartables.html

The Voyager implementation of the Unicode standard gives libraries and their users greater flexibility when accessing collection materials that contain both Roman and non-Roman text.

Multilingual Input and Display

With improved multilingual input and display capabilities in Voyager, characters now display correctly according to the Unicode and MARC standards.

Greater script coverage for cataloging items in your collections, published in languages around the world.

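How many? The original UTF-8 design can address 2,147,483,648 (2^31) code positions; the Unicode standard itself limits the range to 1,114,112 code points (U+0000 through U+10FFFF).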

Preview Server

• Anyone interested in trying out Voyager with Unicode before your upgrade? You can!

• http://support.endinfosys.com/cust/voy/upgrade/unicode/testwv_pre.html provides all the details necessary to get you started

• Preview Server uses the Voyager training database that has been augmented with numerous records in both Roman and non-Roman languages

• Try keyword searches:
  • “non roman script japanese”
  • “non roman script arabic”
  • “roman script french”
  • “roman script italian”

Agenda

Introduction

Your Work Environment
• Workstation Requirements
• Setting Up For Languages Other Than English
• Tag Tables
• Session Defaults and Preferences

Conversion

New Features

Q&A

Workstation Requirements

In order to enjoy the full range of benefits, PCs must have up-to-date operating systems and productivity software.

This means that staff PCs will need:
• Windows 2000 or XP operating system
• Unicode standard compliant Internet browser (IE 6+ or Netscape 6+)
• Unicode-compliant font: Lucida Sans Unicode or Arial Unicode MS

MS Windows™

Voyager is more integrated with Windows in terms of:

Standard Windows 2000/XP Unicode support

Standard Unicode fonts

Standard input using Input Method Editors (IMEs)

Standard browser support

Setting Up for Languages Other Than English

• Workstations need to be specifically configured to work with languages other than English

• Likely will require technical IT assistance to install needed languages on staff PCs

• Best to install all languages so that cataloger may easily include new ones as necessary

Adding Languages to PCs

• Regional and language options are specific to each PC

• Among options available via Start – Settings – Control Panel

• Details button on Languages tab lets operator view or change languages and methods to enter text

• Can include supplemental language support, too

Choosing Languages

• Languages added to PCs will match languages for items found in your collections

• Add and remove according to your needs; as few or many as necessary

• May also set preferences for language bar and key settings

Tag Tables

MARC Tag Tables have been completely revised and rewritten for Voyager with Unicode

Tag Tables

• Ability to modify tag table configuration remains the same as in earlier releases

• But, may not specify anything for Leader position 9 since that byte is now hard-coded to identify records that have been converted to UTF-8

• May want to consider whether or not library will need or want to revise Tag Tables for local use

• See Appendix A of Cataloging User’s Guide for full details on revising, maintaining and updating the Tag Tables

Record Validation

MARC validation

MARC21 character set validation

Authority control validation

Decomposition of accented characters for MARC21

Session Defaults and Preferences: Record Validation

Bypass MARC21 Character set validation
• Uses MARC21 Repertoire.cfg to control validation of the MARC21 character set
• Helps to enforce MARC21 standard

Bypass Decomposition of accented characters for MARC21
• Allows records to be saved to the database without decomposing the characters
• IMPORTANT: If you select this option, MARC21 rules are ignored. We strongly recommend that this check box be un-checked, in order to comply with the MARC21 standard.
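What "decomposing" an accented character means can be illustrated with Python's standard `unicodedata` module. This is only a sketch: it assumes Unicode canonical decomposition (NFD) is a fair stand-in for what Cataloging does on save, which the slides do not confirm, and `is_decomposed` is a hypothetical helper name.

```python
import unicodedata

def decompose(text: str) -> str:
    """Return the text in canonical decomposed form (NFD)."""
    return unicodedata.normalize("NFD", text)

def is_decomposed(text: str) -> bool:
    """True when NFD would leave the text unchanged."""
    return text == unicodedata.normalize("NFD", text)

precomposed = "Espa\u00f1a"           # n-with-tilde stored as one code point
decomposed = decompose(precomposed)   # 'n' followed by U+0303 COMBINING TILDE

# The decomposed form is one code point longer than the precomposed form.
print(len(precomposed), len(decomposed))
print(is_decomposed(precomposed), is_decomposed(decomposed))
```

Both strings look identical on screen in a Unicode-compliant font; only the stored code points differ.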

Session Defaults and Preferences: Mapping Tab

Expected Character Set of Imported Records now has six options

Session Defaults and Preferences: Colors/Fonts Tab

Agenda

Introduction


Conversion
• Data Conversion
• Conversion Error Logging
• Conversion Details
• Identifying Non-Unicode Data
• The Rest of Voyager

New Features

Learning More

Q&A

Data Conversion

Conversion process during upgrade treats data differently than when importing records through Cataloging client or via BulkImport

• MARC records are converted from VRLIN (Voyager legacy encoding) to MARC21 compliant UTF-8 encoding
• Leader position 9 becomes an ‘a’
• Conversion log created
• UTF-8 allows for variable length characters. The majority of characters in the database occupy the same amount of space as before conversion.

Note: All indexes and database columns with MARC data are regenerated after conversion.
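The point about variable-length characters can be checked directly. The short sketch below uses only the Python standard library; `utf8_length` is a name chosen for illustration. Plain ASCII letters stay at one byte, which is why most existing data keeps its size, while accented letters, combining marks and CJK characters take more.

```python
def utf8_length(char: str) -> int:
    """Number of bytes the character occupies once encoded as UTF-8."""
    return len(char.encode("utf-8"))

samples = [
    "A",        # basic Latin letter
    "\u00e9",   # e with acute accent, precomposed
    "\u0303",   # combining tilde
    "\u9707",   # a CJK ideograph
]

for ch in samples:
    print(f"U+{ord(ch):04X} -> {utf8_length(ch)} byte(s)")
```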

Conversion Details

IMPORTANT! NO RECORDS ARE LOST

Each field in the record handled individually.

As each field is processed, it may change length, requiring adjustments to the leader and directory of the record.

Records are saved to the database with a leader position 9 = ‘a’.

Both record-level and field-level checking are performed. In rare cases an entire record might fail conversion; it is more likely that an individual field fails to be converted.

Records may not convert if they contain text that cannot be mapped into Unicode according to the standard MARC-8 to Unicode mappings.

Records that do not convert are stored in the database as is, without being converted to Unicode.
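Why a change in field length forces changes to the leader and directory can be seen from the MARC (ISO 2709) record layout: a 24-byte leader, one 12-character directory entry per field (tag, length, starting position), and a terminator after the directory, after each field, and after the record. The sketch below recomputes that layout for a two-field record before and after conversion. It is an illustration, not Voyager's conversion code: Latin-1 stands in for the legacy encoding, NFD stands in for MARC21 decomposition, and `layout` is a hypothetical helper.

```python
import unicodedata

LEADER_LENGTH = 24
DIRECTORY_ENTRY_LENGTH = 12

def layout(fields):
    """fields: list of (tag, data_bytes), data without its field terminator.

    Returns (directory_entries, base_address, record_length).
    """
    entries = []
    start = 0
    for tag, data in fields:
        length = len(data) + 1                      # +1 for the field terminator
        entries.append(f"{tag}{length:04d}{start:05d}")
        start += length
    # +1 for the terminator that closes the directory
    base_address = LEADER_LENGTH + DIRECTORY_ENTRY_LENGTH * len(fields) + 1
    record_length = base_address + start + 1        # +1 for the record terminator
    return entries, base_address, record_length

title = "10\x1faEspa\u00f1a"     # indicators, subfield delimiter, code, data
extent = "  \x1fa291 p."

legacy = [("245", title.encode("latin-1")), ("300", extent.encode("latin-1"))]
converted = [
    ("245", unicodedata.normalize("NFD", title).encode("utf-8")),
    ("300", extent.encode("utf-8")),
]

print(layout(legacy))
print(layout(converted))
```

The 245 field grows by two bytes, so the 300 field's starting position and the total record length in the leader both move, even though the 300 field itself is unchanged.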

Conversion Error Logging

Libraries need to know the details about the results of the conversion process.

Full error checking and logging is included as part of the upgrade

Technical User’s Guide, Chapter 4

Cataloging User’s Guide, Appendix C

Library designates should review this file to plan for correcting any records that have errors

Sample from Conversion Log File

Conversion Log Details 1

# 11 secs read=982 changed=791 880=0 okay=982 errors=0 written=982

Legend (callouts 1 to 7 mark the parts of the line, left to right):
1 number of seconds used by job so far
2 read=number of records processed
3 changed=number of records changed
4 880=how many records contain 880s
5 okay=# records processed successfully
6 errors=# records not processed due to errors
7 written=# records written to the database
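A progress line of this shape is easy to read by script when reviewing a long log. The following is a minimal sketch that assumes every progress line follows the sample shown on the slide; `parse_progress_line` is a hypothetical helper, and the consistency check at the end (okay plus errors equals read) is an inference from the legend, not a documented guarantee.

```python
import re

def parse_progress_line(line: str) -> dict:
    """Turn '# 11 secs read=982 ...' into a dict of integer counters."""
    match = re.match(r"#\s*(\d+)\s+secs\s+(.*)", line)
    if match is None:
        raise ValueError(f"not a progress line: {line!r}")
    stats = {"secs": int(match.group(1))}
    # Each remaining token is name=number; '880' is a valid name here.
    for key, value in re.findall(r"(\w+)=(\d+)", match.group(2)):
        stats[key] = int(value)
    return stats

sample = "# 11 secs read=982 changed=791 880=0 okay=982 errors=0 written=982"
stats = parse_progress_line(sample)
print(stats)
print("consistent:", stats["okay"] + stats["errors"] == stats["read"])
```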

Conversion Log Details 2

=bib 6213: [17](700): c->8 loose char page=0 at 20 '091e ..'

=bib 35322: [14](856): c->8 undefined char page=0 at 61 'fc7220486973746f .r Histo'

=bib 35516: [23](856): c->8 no char to combine to page=0 at 82 '1e .'

=================================================================

Legend (callouts 1 to 8 mark the parts of the first line, left to right; 9 and 10 mark the error descriptions in the second and third lines):
1 record type and id
2 index within record of field that generated error
3 tag that generated error
4 c->8 indicates conversion to UTF-8 encoding
5 description of error
6 page=subset to which source character belongs
7 at # position of source character that caused error
8 hex dump of source character
9 description of error
10 description of error

Conversion Log Details 3

loose char: a warning message indicating that a character not strictly part of Voyager encoding has been converted (e.g. unexpected carriage return)

no char to combine to: a warning message indicating that a combining character appeared but it lacks a base character with which to combine (e.g. umlaut but no a, o, u base letter)

undefined char: an error message indicating that there is a single character that cannot be mapped to UTF-8
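For a large log, the three message types can be sorted into warnings and errors with a short script. This sketch assumes the detail lines always follow the layout of the samples on the previous slide; the regular expression, the `SEVERITY` table and `parse_detail_line` are illustrative names, and the severity of each message is taken from the definitions above.

```python
import re

DETAIL_RE = re.compile(
    r"^=(?P<rectype>\w+) (?P<recid>\d+): "
    r"\[(?P<index>\d+)\]\((?P<tag>\w{3})\): "
    r"c->8 (?P<message>.+?) page=(?P<page>\d+) at (?P<position>\d+)"
)

# Severity as described on the slide: two warnings, one error.
SEVERITY = {
    "loose char": "warning",
    "no char to combine to": "warning",
    "undefined char": "error",
}

def parse_detail_line(line: str):
    """Return a dict for one detail line, or None if the line does not match."""
    match = DETAIL_RE.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    for key in ("recid", "index", "page", "position"):
        entry[key] = int(entry[key])
    entry["severity"] = SEVERITY.get(entry["message"], "unknown")
    return entry

lines = [
    "=bib 6213: [17](700): c->8 loose char page=0 at 20 '091e ..'",
    "=bib 35322: [14](856): c->8 undefined char page=0 at 61 'fc72 .r'",
    "=bib 35516: [23](856): c->8 no char to combine to page=0 at 82 '1e .'",
]
for line in lines:
    entry = parse_detail_line(line)
    print(entry["recid"], entry["tag"], entry["severity"], entry["message"])
```

Filtering on `severity == "error"` gives the list of records most likely to need manual correction.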

Identifying non-Unicode data

• To identify a non-Unicode record in the Cataloging client, select a color for Conversion records in Session Defaults and Preferences > Colors-Fonts tab.

Identifying non-Unicode data

• Any non-converted record displays in the color selected in Options/Preferences.

Identifying non-Unicode data

There are other ways to identify records that have conversion errors.

Records that cannot be converted to Unicode are viewable in the Cataloging module with nc (not converted) displayed in the Title Bar.

Any characters that cannot be matched or recognized are replaced with a Unicode substitution character.

Fonts and Unicode

• A MARC record may contain non-Roman characters even though you cannot see them. Records are sure to display correctly if a Unicode-compliant font has been selected.

• Lucida Sans Unicode
  • Installed by default with Windows

• Arial Unicode MS
  • Good choice for libraries with mixed cataloging
  • Included with Microsoft Office and other Microsoft products

The Rest of Voyager

• Non-MARC data is not converted
  • Acquisitions data
  • Circulation data (patron info, etc.)
  • Item data

• Reporter
  • Not Unicode standard compliant
  • Translates data to LATIN1
  • Dots appear where you used to see squares

Agenda

Introduction


Conversion

New Features
• Cataloging: Diacritics & Special Characters, Importing Records, New Record Views, Search URIs
• WebVoyáge: Browsers, Searching, Displaying
• Interacting with Other Systems

Learning More

Q&A

Diacritic and Special Character Entry

• Cataloging practices: then and now

Pre-Unicode input in Cataloging = accent character (diacritic) precedes the base character. Example: Espa~na

Post-Unicode input in Cataloging = accent character (diacritic) follows the base character. Example: Espan~a

Ability to display combined characters is an improvement over past versions and a way to ensure accurate entry. Example: España
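The change in keying order can be mimicked in a few lines. In the sketch below the `~` is only the slide's stand-in for the tilde diacritic, not a real legacy byte value, and `diacritic_first_to_unicode` is a hypothetical helper: it moves the mark behind its base letter as a Unicode combining character, after which standard NFC composition yields the combined form used for display.

```python
import unicodedata

COMBINING_TILDE = "\u0303"

def diacritic_first_to_unicode(text: str, placeholder: str = "~",
                               combining: str = COMBINING_TILDE) -> str:
    """Rewrite 'mark before base' notation as 'base then combining mark'."""
    out = []
    chars = iter(text)
    for ch in chars:
        if ch == placeholder:
            base = next(chars, "")          # the letter the mark belongs to
            out.append(base + combining if base else "")
        else:
            out.append(ch)
    return "".join(out)

legacy_keying = "Espa~na"                   # diacritic keyed before the base
unicode_order = diacritic_first_to_unicode(legacy_keying)
display_form = unicodedata.normalize("NFC", unicode_order)

print([f"U+{ord(c):04X}" for c in unicode_order])
print(display_form)
```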

SpecialCharacters.cfg

SpecialCharacters.cfg, located in the C:\Voyager\Catalog folder, defines the content of the special character entry dialog box.

Operators may define their most frequently used characters here.

Special Character Entry

This is what the dialog box in Cataloging looks like.

The key press column identifies the keyboard equivalent that may be used instead of turning on Special Character Mode in Cataloging.

Finding Little Used Characters

• For situations where a character not part of the Special Characters list is needed, operator can use Character Map from MS Windows

• Start – Programs – Accessories – System Tools – Character Map

• Locate character or perform search

• Select and Copy character, then paste into position in bib record

Cataloging: Input of Non-Roman Text

Voyager® with Unicode allows Cataloging operators to use all of the standard Microsoft Windows keyboard and input method editors (IMEs).

With this functionality in place, operators may search for, display, and edit the contents of all MARC records using the full range of UTF-8 characters.

Entire JACKPHY group is part of the UTF-8 character set which includes right-to-left input needed for Arabic, Persian, Hebrew and Yiddish.

Reminder: JACKPHY = Japanese, Arabic, Chinese, Korean, Persian, Hebrew, Yiddish

Linking in a MARC21 Record

Tag I1 I2 Subfield Data

100 1 ‡6 880-01 ‡a An, Zhen.

245 1 0 ‡6 880-02 ‡a Ri yue yun yan / ‡c An Zhen zhu.

250 ‡6 880-03 ‡a Di 1 ban.

260 ‡6 880-04 ‡a Changchun Shi : ‡b Changchun chu ban she, ‡c 1997.

300 ‡a 4, 2, 291 p. ; ‡c 21 cm.

440 0 ‡6 880-05 ‡a Zhongguo li dai wang chao xing shuai qu shi lu

500 ‡a Non-Roman script – Chinese

651 0 ‡a China ‡x History ‡y Ming dynasty, 1368-1644.

880 1 ‡6 100-01/$1 ‡a 安　震 .

880 1 0 ‡6 245-02/$1 ‡a 日月　云烟 / ‡c 安　震　著 .

880 ‡6 250-03/$1 ‡a 第 1 版 .

880 ‡6 260-04/$1 ‡a 长春市 : ‡b 长春　出版社 ,‡c 1997.

880 0 ‡6 440-05/$1 ‡a 中国　历代　王朝　兴衰　启示录
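The pairing shown in the table follows the MARC 21 linkage subfield ‡6: a regular field points to tag 880 with an occurrence number (880-01), and the matching 880 field points back to the regular tag with the same occurrence number plus a script code (100-01/$1, where $1 identifies Chinese, Japanese, Korean). The sketch below pairs fields on that basis. The `(tag, subfield 6, text)` tuples and both helper names are simplifications chosen for illustration, not a Voyager data structure.

```python
import re

LINK_RE = re.compile(r"^(?P<tag>\d{3})-(?P<occ>\d{2})(?:/(?P<script>[^/]+))?")

def parse_linkage(value: str):
    """Split a subfield 6 value into (linked tag, occurrence number, script code)."""
    match = LINK_RE.match(value)
    if match is None:
        raise ValueError(f"unrecognized linkage value: {value!r}")
    return match.group("tag"), match.group("occ"), match.group("script")

def pair_linked_fields(fields):
    """fields: iterable of (tag, subfield_6, text). Returns {(tag, occ): {...}}."""
    pairs = {}
    for tag, sub6, text in fields:
        linked_tag, occ, _script = parse_linkage(sub6)
        if tag == "880":
            key, slot = (linked_tag, occ), "original_script"
        else:
            key, slot = (tag, occ), "romanized"
        pairs.setdefault(key, {})[slot] = text
    return pairs

record = [
    ("100", "880-01", "An, Zhen."),
    ("250", "880-03", "Di 1 ban."),
    ("880", "100-01/$1", "\u5b89\u9707."),
    ("880", "250-03/$1", "\u7b2c1\u7248."),
]
for key, value in pair_linked_fields(record).items():
    print(key, value)
```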

Using On-Screen Keyboard

Typically, the path is Start—Programs—Accessories—Accessibility—On-Screen Keyboard

Importing Records

• Conversion process is separate and distinct from the process of importing records

• Important distinction for operators who import records through the Cataloging client or via BulkImport

• Expected character set needs to be accurately identified if records are to be imported correctly

• Some experimentation may be necessary to determine the correct character set

• Let’s look at some details to help everyone understand what is happening

Record Exchange Scenarios

Voyager 2001.2 and earlier

• In Voyager 2001.2 and earlier, there were several options from which to choose regarding the character set:
  • Latin1
  • OCLC
  • RLIN legacy
  • MARC21 MARC8

• Until now it has been quite simple to choose the correct option when importing records through the Cataloging client or processing large numbers of records through BulkImport.

After Upgrade to Voyager 2003.1

• From Voyager 2003.1 forward, there are numerous options from which to choose regarding the character set:
  • Latin1 (non-Unicode)
  • MARC21 MARC8 (non-Unicode)
  • MARC21 UTF8
  • OCLC (non-Unicode)
  • RLIN legacy (non-Unicode)
  • Voyager legacy (non-Unicode)

• With Voyager 2003.1 and beyond, it is very important to determine the character set of records before importing records through the Cataloging client or processing large numbers of records through BulkImport. Some experimentation may be necessary.

• * transition to MARC21 UTF8 occurs as Unicode standard becomes pervasive
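Part of that experimentation can be done outside Voyager. The sketch below is a rough first check, not a character set detector: it only tells whether a sample of bytes is valid UTF-8, using Python's strict decoder. It cannot tell the non-Unicode options apart, and pure ASCII data looks the same in every option listed. The function names are hypothetical.

```python
def looks_like_utf8(data: bytes) -> bool:
    """True when the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

def describe(data: bytes) -> str:
    """Give a cautious one-line verdict for a sample of record data."""
    if all(byte < 0x80 for byte in data):
        return "ASCII only: the character set cannot be determined from this sample"
    if looks_like_utf8(data):
        return "valid UTF-8"
    return "not UTF-8: one of the non-Unicode options is more likely"

print(describe("Espa\u00f1a".encode("utf-8")))
print(describe("Espa\u00f1a".encode("latin-1")))
print(describe(b"An, Zhen."))
```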

One Year From Now

• In Voyager 2003.1 and beyond, numerous options for character sets will continue to be needed:
  • Latin1 (non-Unicode)
  • MARC21 MARC8 (non-Unicode)
  • MARC21 UTF8
  • OCLC (non-Unicode)
  • RLIN legacy (non-Unicode)
  • Voyager legacy (non-Unicode)

• But, the Unicode standard will be much more pervasive, having been adopted and deployed by bibliographic utilities, vendors who massage records, vendors who supply records, and others.

• This means that selecting the correct option will again be simpler, even though knowing the character sets will continue to be very important.

Bulk Import

• Bulk Import of MARC Records
  • Fundamentally the same as before
  • Leader byte 9 is checked against the incoming character set identified in the import rule.
    • Blank = non-Unicode™; converted & imported
    • ‘a’ = Unicode™; imported
    • Neither blank nor ‘a’; errors out – not imported
  • See log.imp.yyyymmdd for details on import success
  • Records that cannot be converted are not imported; found in err.imp.yyyymmdd
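The three-way decision on Leader byte 9 can be written out as follows. This is a sketch of the rule as stated on the slide, not Bulk Import's actual code; `route_by_leader` and the returned labels are invented for illustration, and the comparison against the character set named in the import rule is left out.

```python
def route_by_leader(leader: str) -> str:
    """Decide what happens to a record based on Leader position 9 (zero-based)."""
    if len(leader) != 24:
        raise ValueError("a MARC leader is exactly 24 characters long")
    coding = leader[9]
    if coding == " ":
        return "convert and import"   # non-Unicode record
    if coding == "a":
        return "import"               # already Unicode
    return "reject"                   # neither blank nor 'a'

unicode_leader = "00714cam a2200205 a 4500"
legacy_leader = unicode_leader[:9] + " " + unicode_leader[10:]
bad_leader = unicode_leader[:9] + "x" + unicode_leader[10:]

for leader in (unicode_leader, legacy_leader, bad_leader):
    print(repr(leader[9]), "->", route_by_leader(leader))
```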

Bulk Import and Expected Character Set

Character set mapping for Bulk Import is designated in the Bulk Import rule in SysAdmin > Cataloging > Bulk Import Rules.

MARC Export

Default export character set is MARC21 UTF-8

Use the -a option (on the command line) to choose a different character set. See page 10-8 in the Technical User’s Guide for more detail.

LATIN1 records will get a dot exported for characters outside the LATIN1 character set

If mapping for a composed character is not found, it decomposes and Voyager® attempts to find a match for each part.
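The fallback described in the last two points can be approximated in Python. This is a hedged sketch of the idea only: composing the text first (NFC), using NFD as the decomposition step, and writing a dot for any part that still has no Latin-1 mapping are all assumptions, since the slides do not spell out Voyager's exact rules. `to_latin1_with_dots` is a hypothetical name.

```python
import unicodedata

def _fits_latin1(char: str) -> bool:
    """True when the single character has a Latin-1 mapping."""
    try:
        char.encode("latin-1")
    except UnicodeEncodeError:
        return False
    return True

def to_latin1_with_dots(text: str) -> bytes:
    """Export text as Latin-1, decomposing unmapped characters and dotting the rest."""
    out = []
    # Assumption: compose first so stored 'n + combining tilde' can map to one byte.
    for ch in unicodedata.normalize("NFC", text):
        if _fits_latin1(ch):
            out.append(ch)
            continue
        # No direct mapping: decompose and try each part separately.
        for part in unicodedata.normalize("NFD", ch):
            out.append(part if _fits_latin1(part) else ".")
    return "".join(out).encode("latin-1")

print(to_latin1_with_dots("Espan\u0303a"))      # decomposed input, fully mappable
print(to_latin1_with_dots("Dvo\u0159\u00e1k"))  # r-with-caron has no Latin-1 mapping
print(to_latin1_with_dots("\u5b89"))            # CJK ideograph
```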

New ISBN Indexes

For improved duplicate detection:

New ISBN Index
020N  020a  Number only
020R  020z  Number only

020 |a 1234567890 (Knopf)
020 |a 1234567890

Check Bibliographic and Authority duplicate detection profiles in System Administration!
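The effect of a "number only" index is that the two 020 values above become identical keys for duplicate detection. A minimal sketch of such a normalization is shown below; how Voyager actually builds the 020N and 020R indexes is not described in the slides, so the regular expression and `isbn_number_only` are assumptions.

```python
import re

def isbn_number_only(subfield_value: str) -> str:
    """Keep the leading ISBN, dropping qualifiers such as '(Knopf)' and hyphens."""
    match = re.match(r"\s*([0-9][0-9Xx-]*)", subfield_value)
    if match is None:
        return ""
    return match.group(1).replace("-", "").upper()

with_qualifier = isbn_number_only("1234567890 (Knopf)")
without_qualifier = isbn_number_only("1234567890")
print(with_qualifier, without_qualifier, with_qualifier == without_qualifier)
```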

HTTP Posting

Much easier access to WebVoyáge display from clients

Available in Cataloging, Acquisitions & Circulation

• Toggle record view from staff client to WebVoyáge
  • Record menu in Cataloging contains a Send Record To option
  • Send Record To: WebVoyáge

• LinkFinderPlus available in Cataloging, Acquisitions & Circulation
  • Record menu in Cataloging contains a Send Record To option
  • Send Record To: LinkFinderPlus

• Configured in voyager.ini file [MARC POSTing] stanza

Enabling HTTP Posting

To enable HTTP posting, a stanza is added to the voyager.ini file. An example is shown below.

[MARC POSTing]
WebVoyage="http://train20031-c1db.comet.endinfosys.com/cgi-bin/Pbibredirect.cgi"
LinkfinderPlus="http://207.56.64.116/cgi-bin/Phttplinkresolver.cgi"

Easier Access to OPAC Display

• Send Record To… in Cataloging

• Send Record To… in Acquisitions

Search URI

• Staff Client Search URI in Cataloging, Circulation and Acquisitions
  • Drive searches to resources on the web
  • Add new button to search interface in staff clients
  • Click button…a browser is opened & search is executed
  • This is PC specific (voyager.ini)

• Possible applications
  • Link to another OPAC
  • Link to one of your vendors
  • Link to an online book seller

Presenting Search URI

Staff client search URI

Available in Cataloging, Circulation, and Acquisitions

Adding Search URIs

Clipped from voyager.ini:

[SearchURI]

Name=Google
URI=http://www.google.com
Copy=Y
SearchSyntax=/search?&q=<searchtext>

#Name=Barnes&Noble
#URI=http://search.barnesandnoble.com
#Copy=Y
#SearchSyntax=/booksearch/results.asp?WRD=<searchtext>

#Name=Gale Group
#URI=http://www.galegroup.com
#Copy=Y
#SearchSyntax=/servlet/SearchPageServlet?region=9&imprint=<searchtext>
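How a stanza like this turns into a browser address can be sketched as follows: the URI and SearchSyntax values are joined and the `<searchtext>` placeholder is replaced by the operator's search terms. The percent-encoding step is an assumption (the slides do not say how the staff client encodes the text), and `build_search_url` is a hypothetical helper.

```python
from urllib.parse import quote_plus

def build_search_url(uri: str, search_syntax: str, search_text: str) -> str:
    """Join URI and SearchSyntax, substituting the encoded search text."""
    # quote_plus encodes spaces as '+' and non-ASCII characters as UTF-8 percent escapes.
    return uri + search_syntax.replace("<searchtext>", quote_plus(search_text))

google = build_search_url(
    "http://www.google.com", "/search?&q=<searchtext>", "ri yue yun yan"
)
print(google)
print(build_search_url("http://www.google.com", "/search?&q=<searchtext>", "Espa\u00f1a"))
```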

WebVoyáge and Unicode

• MARC data supplied to the browser in UTF-8

IE 6+ generally displays Unicode characters correctly. Some characters do not display correctly unless a Unicode-compliant font is selected.

Netscape 6+ figures out that it needs to display Unicode characters without any special settings

Consider new help text in your OPAC to help patrons understand about language options, especially if there are records using different languages in your database

• New UTF-8 download/save format

Searching in WebVoyáge

Search and display in native languages for staff and users.

WebVoyáge and Cataloging allow Unicode character input; you can search for and retrieve records in native languages.

Record display includes non-Latin scripts, including right-to-left scripts like Arabic and Hebrew. Voyager takes advantage of the web browser’s native rendering support.

Records with Other Languages in the OPAC

Displaying Records in WebVoyáge

Linking in a MARC21 Record

Tag I1 I2 Subfield Data

100 1 ‡6 880-01 ‡a An, Zhen.

245 1 0 ‡6 880-02 ‡a Ri yue yun yan / ‡c An Zhen zhu.

250 ‡6 880-03 ‡a Di 1 ban.

260 ‡6 880-04 ‡a Changchun Shi : ‡b Changchun chu ban she, ‡c 1997.

300 ‡a 4, 2, 291 p. ; ‡c 21 cm.

440 0 ‡6 880-05 ‡a Zhongguo li dai wang chao xing shuai qu shi lu

500 ‡a Non-Roman script – Chinese

651 0 ‡a China ‡x History ‡y Ming dynasty, 1368-1644.

880 1 ‡6 100-01/$1 ‡a 安　震 .

880 1 0 ‡6 245-02/$1 ‡a 日月　云烟 / ‡c 安　震　著 .

880 ‡6 250-03/$1 ‡a 第 1 版 .

880 ‡6 260-04/$1 ‡a 长春市 : ‡b 长春　出版社 ,‡c 1997.

880 0 ‡6 440-05/$1 ‡a 中国　历代　王朝　兴衰　启示录

Interacting with Other Systems

• Incoming Z39.50 Connections

Records in Unicode databases are UTF8 encoded

z3950svr may send MARC8-encoded records, UTF8-encoded records, or both

Default is set to send MARC8 encoded records

But, two different z3950svr ports can be configured to provide records in both formats, thereby accommodating all sites connecting to database

Interacting with Other Systems

• Outgoing Z39.50 Connections
  • Retrieves and displays records of any type in UTF-8
  • Converts incoming records based on new Database Definitions setting in System Administration called ‘Source Character Set’:
    • Latin1 (non Unicode™)
    • MARC 21 MARC8 (non Unicode™)
    • MARC21 UTF8
    • OCLC (non Unicode™)
    • RLIN legacy (non Unicode™)
    • Voyager® legacy (non Unicode™)

Agenda

Introduction


Conversion

New Features

Learning More

Final Q&A

If you want to know more about…

Coded Character Sets - EndUser 2004: Session 29
Title: Coded Character Sets: A Technical Primer for Librarians
Presenters: Michael Doran, Systems Librarian, University of Texas at Arlington; Dan Sweeney, Business Analyst II, Endeavor Information Systems
Great Website: http://rocky.uta.edu/doran/charsets/

Strategies and Tools for Cleaning Up Your Data - EndUser 2004: Session 45
Title: Transitioning To Unicode: Strategies for Tidying Your Data
Presenters: Fran Budde, Acquisitions & Cataloging Specialist, Pacific Lutheran University; Francesca Lane Rasmus, Director, Technical Services, Pacific Lutheran University; Layne Nordgren, Director of Instructional Technologies/Library Systems, Pacific Lutheran University

If you want to know more about…

Special Character Input/Issues - EndUser 2004: Session 65
Title: Why Unicode?
Presenter: Martin Heijdra, Chinese Bibliographer/Head of Public Services, East Asian Library, Princeton University

Preparing for Unicode Conversion & Cataloging Issues - EndUser 2004: Session 74
Title: Unicode Conversion at the Library of Congress
Presenter: Ann Della Porta, Assistant Coordinator, Integrated Systems Office, Library of Congress

SupportWeb: KnowledgeBase, EndUser archives
http://support.endinfosys.com/cust/index.html

If you want to know more about…

880 – Alternate Graphic Representation (R)
http://www.loc.gov/marc/bibliographic/ecbdhold.html#mrcb880

OCLC Character Sets
http://www.oclc.org/support/documentation/worldcat/records/subscription/5/5.pdf

Original Scripts in RLG Databases
http://www.rlg.org/origscripts.html

MARC 21 Concise Bibliographic: Control Subfields
http://www.loc.gov/marc/bibliographic/ecbdcntf.html

MARC 21 Concise Bibliographic: Multiscript Records
http://www.loc.gov/marc/bibliographic/ecbdmulti.html

Thank you!
