
Filing and Word Breaking Procedures

Upload: alannah-griffith

Post on 01-Jan-2016


Page 1: Filing and Word Breaking Procedures. 2 Session Agenda Pre-14.x tab_word_breaking table Structure Procedures Special remarks tab_filing table Structure

Filing and Word Breaking Procedures


Session Agenda

• Pre-14.x
• tab_word_breaking table
  • Structure
  • Procedures
• Special remarks
• tab_filing table
  • Structure
  • Procedures


Pre-14.x

• Various filing and word breaking procedures existed. Each procedure included many parts but worked as a closed box.

• Each procedure was assigned a code, such as B1, B5, C1, A3 or AM.

• Each procedure was a separate program, so creating a new procedure required new program development. For example, there was no combined A3 + AM filing procedure.


From 14.1 onwards

• ALEPH provides ready-made components (programs) for creating filing and word breaking procedures

• /tab/tab_word_breaking - an ALEPH table which identifies word breaking procedures and defines their component parts

• /tab/tab_filing - an ALEPH table which identifies filing procedures and defines their component parts


tab_word_breaking

• /tab/tab_word_breaking is an ALEPH table which identifies word breaking procedures and defines their component parts.

• Each word breaking procedure is made up of a group of one or more programs.


tab_word_breaking

1 2 3 4
!!-!-!!!!!!!!!!!!!-!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
03 L abbreviation
03 L numbers
03 L compress -
03 L to_blank !@#$%^&*()_+={}[]:";'<>,.?/|\

• col. 1: procedure identifier
• col. 2: alpha of the text
• col. 3: procedure name
• col. 4: procedure parameters
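To make the four-column layout concrete, here is a minimal Python sketch that reads table lines like the ones above and groups the steps by procedure identifier. It is not ALEPH code. It assumes that comment and ruler lines start with "!", that columns 1 to 3 contain no blanks, and that everything after the procedure name is the parameter string, so a to_blank character list is kept whole. The names parse_table_line and load_procedures are hypothetical.

```python
from collections import defaultdict


def parse_table_line(line: str) -> tuple[str, str, str, str]:
    """Split one line into (procedure id, alpha, procedure name, parameters)."""
    parts = line.split(maxsplit=3)  # the 4th part keeps any blanks inside col. 4
    if len(parts) < 3:
        raise ValueError(f"not a procedure line: {line!r}")
    params = parts[3] if len(parts) == 4 else ""
    return parts[0], parts[1], parts[2], params


def load_procedures(lines: list[str]) -> dict[str, list[tuple[str, str]]]:
    """Group (procedure name, parameters) steps by procedure id, in table order."""
    procedures = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and "!" comment/ruler lines
        proc_id, _alpha, name, params = parse_table_line(line)
        procedures[proc_id].append((name, params))
    return dict(procedures)


table = [
    "!!-!-!!!!!!!!!!!!!-!!!!!!!!!!!!!!!!!!!!!!!!!!!!!",
    "03 L abbreviation",
    "03 L numbers",
    "03 L compress -",
    "03 L to_blank !@#$%^&*()_+={}[]:\";'<>,.?/|\\",
]
print(load_procedures(table)["03"])
```

Keeping the steps in a list rather than a set preserves table order, which matters because each step works on the output of the previous one.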


Procedures (1)

• compress: strips the characters listed in col. 4

• delete_subfield: changes the subfield sign (e.g., $$x) to blank

• to_blank: changes the characters listed in col. 4 to blanks
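A minimal Python sketch of these three procedures as plain string transformations. It is an illustration, not ALEPH's implementation; in particular, it assumes that a subfield sign is "$$" followed by a one-character subfield code.

```python
import re

# Assumption: a subfield sign is "$$" plus a one-character subfield code.
SUBFIELD_SIGN = re.compile(r"\$\$.")


def compress(text: str, chars: str) -> str:
    """compress: strip every character listed in col. 4."""
    return text.translate(str.maketrans("", "", chars))


def delete_subfield(text: str) -> str:
    """delete_subfield: change each subfield sign (e.g. $$x) to a blank."""
    return SUBFIELD_SIGN.sub(" ", text)


def to_blank(text: str, chars: str) -> str:
    """to_blank: change every character listed in col. 4 to a blank."""
    return text.translate(str.maketrans(chars, " " * len(chars)))


print(compress("o'hara", "'"))                   # ohara
print(to_blank("o'hara", "'"))                   # o hara
print(repr(delete_subfield("$$aSmith$$bJohn")))  # ' Smith John'
```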


Procedures (2)

• subf_to_sign: changes the second and subsequent subfield signs to the single character listed in col. 4

• blank_to_carat: changes blanks to carets (^)

• marc21_41: separates languages in MARC21 field 041


Procedures (3)

• abbreviation: compresses a dot between single characters (e.g., I. B. M. changes to I B M; I.B.M. changes to IBM)

• numbers: compresses a comma and a dot between numbers (e.g., 2,153 changes to 2153)
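Both procedures can be approximated with regular expressions. This Python sketch is an assumption about their behaviour, built only from the examples given: abbreviation removes a dot that follows a single letter standing on its own, and numbers removes a comma or a dot that has a digit on each side.

```python
import re

# A single letter (word boundary before it) immediately followed by a dot.
SINGLE_LETTER_DOT = re.compile(r"\b([^\W\d_])\.")
# A comma or dot with a digit on both sides.
SEPARATOR_BETWEEN_DIGITS = re.compile(r"(?<=\d)[.,](?=\d)")


def abbreviation(text: str) -> str:
    """Drop the dot after each single letter, keeping the letter."""
    return SINGLE_LETTER_DOT.sub(r"\1", text)


def numbers(text: str) -> str:
    """Drop commas and dots that sit between digits."""
    return SEPARATOR_BETWEEN_DIGITS.sub("", text)


print(abbreviation("I. B. M."))  # I B M
print(abbreviation("I.B.M."))    # IBM
print(numbers("2,153"))          # 2153
```

Because the pattern requires a single letter, a multi-letter abbreviation such as "Dr." is left untouched.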


Procedures (4)

• IMPORTANT NOTE: The procedures must be listed in logical order. For example, numbers must be listed before compress or to_blank if a comma or a dot is among their col. 4 characters. Otherwise, the commas and dots will already have been removed when the numbers procedure runs.
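The effect of order can be seen in a small Python sketch. It assumes, as the note implies, that each listed procedure is applied to the output of the previous one; the run helper and the default punctuation list of to_blank are hypothetical.

```python
import re
from typing import Callable

Step = Callable[[str], str]


def numbers(text: str) -> str:
    """Remove a comma or dot that has a digit on both sides."""
    return re.sub(r"(?<=\d)[.,](?=\d)", "", text)


def to_blank(text: str, chars: str = ",.;:") -> str:
    """Change every character in chars to a blank."""
    return text.translate(str.maketrans(chars, " " * len(chars)))


def run(text: str, steps: list[Step]) -> str:
    """Apply the steps one after another, in the order listed."""
    for step in steps:
        text = step(text)
    return text


print(run("2,153 copies", [numbers, to_blank]))  # 2153 copies
print(run("2,153 copies", [to_blank, numbers]))  # 2 153 copies
```

In the second order, to_blank has already turned the comma into a blank, so numbers has nothing left to compress.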


Procedures (5)

• Reminder: Word breaking procedures are used in tab11, section W. A line can be listed several times in tab11 in order to index the same field multiple times, with different word breaking each time. For example, a name with an apostrophe can be indexed as O’hara, Ohara and O hara:

11 W 100## abcdq 01 B WRD WAU
11 W 100## abcdq 04 B WRD WAU
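A Python sketch of the apostrophe example: the same heading passed through three hypothetical word breaking variants (apostrophe kept, compressed, or changed to blank), the way several tab11 lines would index one field in different ways. The functions index_variants and words are illustrations, not ALEPH procedures.

```python
APOSTROPHES = "'\u2019"  # straight and typographic apostrophe


def compress(text: str, chars: str) -> str:
    return text.translate(str.maketrans("", "", chars))


def to_blank(text: str, chars: str) -> str:
    return text.translate(str.maketrans(chars, " " * len(chars)))


def index_variants(heading: str) -> list[str]:
    """Return one indexed form per word breaking variant."""
    return [
        heading,                         # apostrophe kept as itself
        compress(heading, APOSTROPHES),  # apostrophe compressed
        to_blank(heading, APOSTROPHES),  # apostrophe changed to blank
    ]


def words(variants: list[str]) -> set[str]:
    """Collect the distinct words produced by all variants."""
    return {word for form in variants for word in form.split()}


print(index_variants("O\u2019hara"))
print(sorted(words(index_variants("O'hara"))))
```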


unicode_to_word_gen

Word indexing routines, as well as retrieval routines, use the table defined in the WORD-FIX line of ./alephe/unicode/tab_character_conversion_line. This table is traditionally called unicode_to_word_gen.


unicode_to_word_gen

This table defines character equivalencies for the purpose of creating words in the words file. All characters naturally retain their Unicode value and are stored in the system in UTF encoding. To translate one character into another (e.g., an accented "e" into a plain "e"), you can set an equivalency of up to 5 characters. For example, the following line translates æ (U+00E6) into "ae" (U+0061 U+0065):

00E6 0061 0065 #LATIN SMALL LETTER AE
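A minimal Python sketch of how an equivalency line of this form could be read and applied. It assumes the layout shown above (a hexadecimal source code, then up to 5 hexadecimal replacement codes, then an optional "#" comment); parse_equivalency and build_table are hypothetical names, not ALEPH utilities.

```python
def parse_equivalency(line: str) -> tuple[str, str]:
    """Parse a line such as '00E6 0061 0065 #LATIN SMALL LETTER AE'."""
    codes = line.split("#", 1)[0].split()  # drop the comment, keep the hex codes
    if len(codes) < 2:
        raise ValueError(f"no equivalency on line: {line!r}")
    source, targets = codes[0], codes[1:]
    if len(targets) > 5:
        raise ValueError("an equivalency can be up to 5 characters")
    return chr(int(source, 16)), "".join(chr(int(c, 16)) for c in targets)


def build_table(lines: list[str]) -> dict[int, str]:
    """Build a str.translate() mapping from equivalency lines (blank lines skipped)."""
    table = {}
    for line in lines:
        if line.strip():
            source, replacement = parse_equivalency(line)
            table[ord(source)] = replacement
    return table


word_gen = build_table(["00E6 0061 0065 #LATIN SMALL LETTER AE"])
print("\u00e6sop".translate(word_gen))  # aesop
```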


unicode_to_word_gen

The library's tab_word_breaking table can define different treatments for the same characters: in separate procedures, specific characters can be set to be compressed or to be changed to blank. Characters handled in this manner should keep their natural value and should not be translated in this table.

For example, you might want an apostrophe to be treated as a blank, as itself, and as if it were not there at all (e.g., o hara, o'hara, ohara). To be able to set the apostrophe in tab_word_breaking both as a compressed character and as a character changed to blank, it must retain its natural value and NOT be translated in this table.


Special Remarks

2. When browsing a word index in the OPAC, special characters are always displayed in their converted state. For example, if the unicode_to_word_gen table translates ü (u with umlaut) to "ue", the word is displayed with "ue" and not with the umlaut.


tab_filing - Example

01 L del_subfield
01 L to_lower
01 L abbreviation
01 L suppress
01 L compress '
01 L to_blank !@#$%^&*()_+- ={}[]:";<>?,./~`
01 L mc_to_mac
01 L pack_spaces
01 L char_conv FILING-KEY-01
01 C chi
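A rough Python sketch of filing procedure 01 above, applying its L lines in table order. Several behaviours are assumptions rather than documented facts: del_subfield treats "$$" plus one code character as the subfield sign, mc_to_mac looks only at the start of the heading (ignoring leading blanks), and pack_spaces, which the slides do not describe, is assumed to collapse runs of blanks. The char_conv step and the "01 C chi" line are omitted because they depend on external tables.

```python
import re

TO_BLANK_CHARS = "!@#$%^&*()_+- ={}[]:\";<>?,./~`"  # col. 4 of the to_blank line


def del_subfield(t: str) -> str:
    # Assumption: a subfield sign is "$$" plus a one-character code.
    return re.sub(r"\$\$.", " ", t)


def abbreviation(t: str) -> str:
    return re.sub(r"\b([^\W\d_])\.", r"\1", t)


def suppress(t: str) -> str:
    return re.sub(r"<<.*?>>", "", t)


def compress(t: str, chars: str) -> str:
    return t.translate(str.maketrans("", "", chars))


def to_blank(t: str, chars: str) -> str:
    return t.translate(str.maketrans(chars, " " * len(chars)))


def mc_to_mac(t: str) -> str:
    # Assumption: "initial" means the start of the heading, ignoring leading blanks.
    return re.sub(r"^(\s*)mc", r"\1mac", t)


def pack_spaces(t: str) -> str:
    # Assumption: collapses runs of blanks and trims both ends.
    return " ".join(t.split())


# The "01 L" lines of the example, in table order (char_conv omitted).
PROCEDURE_01 = [
    del_subfield,
    str.lower,  # to_lower
    abbreviation,
    suppress,
    lambda t: compress(t, "'"),
    lambda t: to_blank(t, TO_BLANK_CHARS),
    mc_to_mac,
    pack_spaces,
]


def filing_key(heading: str) -> str:
    for step in PROCEDURE_01:
        heading = step(heading)
    return heading


print(filing_key("$$aMcKay, John"))   # mackay john
print(filing_key("$$aMacKay, John"))  # mackay john
```

Note how the order in the table pays off: suppress runs before to_blank, which would otherwise turn the "<" and ">" signs into blanks and leave the non-filing text in place.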


tab_filing - Structure

1 2 3 4
!!-!-!!!!!!!!!!!!!!!!!!!!-!!!!!!!!!!!!!!>
01 L compress ’
01 L char_conv FILING-KEY-01

• col. 1: procedure identifier
• col. 2: alpha of the text
• col. 3: procedure name
• col. 4: procedure parameters


tab_filing Procedures (1)

• compress: strips the characters listed in col. 4 (e.g., ()[]:,)

• delete_subfield: changes the subfield sign (e.g., $$x) to blank

• to_blank: changes the characters listed in col. 4 to blanks


tab_filing Procedures (2)

• to_lower: changes all characters to lower case

• to_carat: changes the subfield sign to two carets (^^) in order to achieve hierarchical sorting of headings

• suppress: suppresses all text contained within <<…>>, as well as the signs themselves
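A short Python sketch of to_carat and suppress, again assuming that a subfield sign is "$$" plus a one-character code. The non-greedy pattern in suppress removes each <<…>> pair on its own instead of everything between the first "<<" and the last ">>".

```python
import re

# Assumption: a subfield sign is "$$" plus a one-character subfield code.
SUBFIELD_SIGN = re.compile(r"\$\$.")
# Non-greedy, so each <<...>> pair is removed separately.
NON_FILING_TEXT = re.compile(r"<<.*?>>")


def to_carat(text: str) -> str:
    """to_carat: change each subfield sign to two carets (^^)."""
    return SUBFIELD_SIGN.sub("^^", text)


def suppress(text: str) -> str:
    """suppress: drop the text inside <<...>> together with the signs."""
    return NON_FILING_TEXT.sub("", text)


print(to_carat("$$aUnited States$$xHistory"))  # ^^United States^^History
print(repr(suppress("<<The>> Times")))         # ' Times'
```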


tab_filing Procedures (3)

• expand_num: for filing numbers numerically, pads numbers with leading zeroes to a fixed length of 7 (e.g., 17 -> 0000017)

• mc_to_mac: changes an initial “mc” to “mac” (for interfiling McKay and MacKay)

• non_filing: suppresses initial text according to the non-filing indicator defined in tab11
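A Python sketch of expand_num and non_filing. Here non_filing takes the indicator as a plain count of characters to skip, as MARC non-filing indicators do (e.g., 4 for "The "); how ALEPH actually passes the indicator from tab11 is not shown in the slides, so that signature is hypothetical.

```python
import re


def expand_num(text: str, width: int = 7) -> str:
    """expand_num: left-pad every run of digits with zeroes to `width` digits."""
    return re.sub(r"\d+", lambda m: m.group().zfill(width), text)


def non_filing(text: str, indicator: int) -> str:
    """non_filing: skip the number of initial characters given by the indicator."""
    return text[indicator:]


titles = ["vol. 17", "vol. 9"]
print(sorted(titles))                  # ['vol. 17', 'vol. 9']
print(sorted(titles, key=expand_num))  # ['vol. 9', 'vol. 17']
print(non_filing("the times", 4))      # times
```

Plain string comparison puts "17" before "9" because "1" sorts before "9"; padding both to 7 digits makes the string order match the numeric order.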


tab_filing Procedures (4)

• compress_blank: strips blanks (e.g., for ISBN)

• numbers: compresses a comma and a dot between numbers (e.g., 2,153 changes to 2153)

• non_numeric: deletes all non-numeric characters (for ISBN, ISSN)
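A minimal Python sketch of compress_blank and non_numeric for standard numbers. It is an illustration only; note that, as written, non_numeric would also drop an "X" check digit, and whether ALEPH's procedure does the same is not stated in the slides.

```python
ASCII_DIGITS = "0123456789"


def compress_blank(text: str) -> str:
    """compress_blank: strip all blanks."""
    return text.replace(" ", "")


def non_numeric(text: str) -> str:
    """non_numeric: keep only the digits 0-9."""
    return "".join(ch for ch in text if ch in ASCII_DIGITS)


print(compress_blank("0 19 852663 6"))  # 0198526636
print(non_numeric("ISSN 0317-8471"))    # 03178471
```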


tab_filing Procedures (5)

• abbreviation: compresses a dot between single characters (e.g., I. B. M. changes to I B M; I.B.M. changes to IBM)

• build_filing_key_lc_call_no: special procedure for correct sequencing of LC call numbers


tab_filing Procedures (7)

• char_conv: translates one character into another (up to 5 characters), using the conversion defined in the line of alephe/unicode/tab_character_conversion_line whose name is given in col. 4. For example:

01 L char_conv FILING-KEY-01

refers to the line

FILING-KEY-01 ##### # line_utf2line_sb unicode_to_filing_01
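The lookup from a char_conv line to its conversion can be sketched in Python as follows. The sketch assumes whitespace-separated columns laid out as in the FILING-KEY-01 example (line name, two columns not used here, conversion routine, conversion table); parse_conversion_lines and resolve_char_conv are hypothetical helpers.

```python
def parse_conversion_lines(lines: list[str]) -> dict[str, tuple[str, str]]:
    """Map each line name to its (conversion routine, conversion table) pair."""
    entries = {}
    for line in lines:
        cols = line.split()
        if line.startswith("!") or len(cols) < 5:
            continue  # skip comments and lines that do not fit the assumed layout
        name, _, _, routine, table = cols[:5]
        entries[name] = (routine, table)
    return entries


def resolve_char_conv(filing_line: str, entries: dict[str, tuple[str, str]]) -> tuple[str, str]:
    """Follow a tab_filing char_conv line to the conversion named in col. 4."""
    _proc_id, _alpha, name, param = filing_line.split(maxsplit=3)
    if name != "char_conv":
        raise ValueError(f"not a char_conv line: {filing_line!r}")
    return entries[param.strip()]


conversions = parse_conversion_lines(
    ["FILING-KEY-01 ##### # line_utf2line_sb unicode_to_filing_01"]
)
print(resolve_char_conv("01 L char_conv FILING-KEY-01", conversions))
# ('line_utf2line_sb', 'unicode_to_filing_01')
```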


unicode_to_filing_nn_source

This table is used for character conversion for filing. The table must be processed using UTIL P/3 in order to create the unicode_to_filing_nn table. This latter table is the one actually used by the system. It performs an additional translation in order to remove null characters.


unicode_to_filing_01_source

• Examples:
  Latin capital letter AE: 00C6 0041 0045
  Small letter sharp s: 00DF 0053 005A


IMPORTANT NOTE

The procedures must be listed in logical order. For example, numbers must be listed before compress or to_blank if a comma or a dot is among their col. 4 characters. Otherwise, the commas and dots will already have been removed when the numbers procedure runs.


./tab/tab_filing - usage

• Filing procedures are used when building the filing keys for headings (Z01), index entries (Z11) and sort keys (Z101)


./tab/tab_filing - usage

• Note: if no procedure for creating sort keys has been defined in tab01.lng, the system uses the default filing procedure 99.

Filing procedure 99 MUST be defined in tab_filing, since it determines the default sort order.
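The fallback rule can be sketched in a few lines of Python. Here tab_filing is modeled as a plain dict from procedure id to step names, sort_key_procedure is a hypothetical helper, and a missing procedure is simply treated as an error; the slides do not say how ALEPH itself reacts when procedure 99 is absent.

```python
from typing import Optional

DEFAULT_FILING_PROCEDURE = "99"


def sort_key_procedure(tab01_procedure: Optional[str], tab_filing: dict[str, list[str]]) -> str:
    """Pick the filing procedure used to build sort keys.

    Falls back to procedure 99 when tab01 names none, and fails if the chosen
    procedure is not defined in tab_filing.
    """
    proc_id = tab01_procedure or DEFAULT_FILING_PROCEDURE
    if proc_id not in tab_filing:
        raise KeyError(f"filing procedure {proc_id} is not defined in tab_filing")
    return proc_id


tab_filing = {"01": ["to_lower", "pack_spaces"], "99": ["to_lower"]}
print(sort_key_procedure(None, tab_filing))  # 99
print(sort_key_procedure("01", tab_filing))  # 01
```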