CLASSIFICATION ON THE NETWORK: MACHINE READABLE,
SHARED, CLONED AND HIDDEN
Aida Slavic, associate editor of the UDC
Glasgow, 3-5 September, CILIP Cataloguing and Indexing Group Conference
CLASSIFICATION ON THE NETWORK
the use of classification outside the bibliographic domain, brought about by the Internet:
• broad knowledge browsing and presentation (initially)
• automatic classification
moving ‘behind the screen’ with digital repositories and cross-repository resource discovery:
• information integration and searching across distributed collections
• mapping between vocabularies
• supporting cross-language searching
• supplementing simple text retrieval techniques to enable search expansion
• alerting services
• filtering by subject area for various types of reports, auditing, statistics
• source of vocabulary to build new vocabularies
common to all is an interest in readily available, rich classification data on which services and tools can be built at lower cost
VOCABULARY SHARING ON THE NETWORK
Need for generally applicable standards for representing vocabularies in a machine-readable way
Preference for XML and XML/RDF technology, to promote domain, system and platform independence
Publishing, exposing and sharing controlled vocabularies on the network:
• ISO/IEC 13250 Topic Maps
• BS 8723 Structured vocabularies for information retrieval
• Simple Knowledge Organization System (SKOS)
See: Sharing Vocabularies on the Web via SKOS
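As an illustration of what SKOS looks like in practice, the sketch below writes one class (539.125.4 Protons, from the UDC particle physics schedule) as a SKOS concept in Turtle. The SKOS terms (`skos:Concept`, `skos:notation`, `skos:prefLabel`, `skos:broader`) come from the W3C SKOS vocabulary; the base URI `http://example.org/udc/` and the function names are hypothetical, and a real UDC export may be structured quite differently.

```python
from typing import Optional

SKOS_PREFIX = "@prefix skos: <http://www.w3.org/2004/02/skos/core#> ."

# Hypothetical base URI: vocabulary publishers choose their own URI patterns.
# Assumes simple decimal notations; compound notations with quotes or
# parentheses would need percent-encoding before being used in a URI.
BASE = "http://example.org/udc/"


def turtle_literal(text: str, lang: str) -> str:
    """Quote text as a language-tagged Turtle literal, escaping backslashes and double quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def skos_concept(notation: str, label: str, lang: str = "en",
                 broader: Optional[str] = None) -> str:
    """Return a Turtle description of one class as a skos:Concept."""
    lines = [
        f"<{BASE}{notation}> a skos:Concept ;",
        f'    skos:notation "{notation}" ;',
        f"    skos:prefLabel {turtle_literal(label, lang)}",
    ]
    if broader is not None:
        # Link to the hierarchically broader class, if one is given.
        lines[-1] += " ;"
        lines.append(f"    skos:broader <{BASE}{broader}>")
    lines[-1] += " ."  # every Turtle statement ends with a full stop
    return "\n".join(lines)


if __name__ == "__main__":
    print(SKOS_PREFIX)
    print()
    print(skos_concept("539.125.4", "Protons", broader="539.125"))
```

Because the output is plain RDF, any SKOS-aware tool can load it without knowing anything about the classification's own file format.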
VOCABULARY SERVICES & REGISTRIES
Making the content of knowledge organization systems (KOS) available through web services
initiative by NKOS – Network Knowledge Organization Systems and Services (http://nkos.slis.kent.edu/)
For registries we need:
• machine-accessible vocabularies using standard representations and access protocols
• metadata for describing KOS (using a standard for identifying and describing vocabularies)
• a business case / cost effectiveness
• upload of vocabularies into registries by their owners, and regular maintenance and upload of new versions
See: Tudhope, D. Knowledge Organization System Services: brief review of NKOS activities and possibility of KOS registries http://www.iskouk.org/presentations/tudhope_ISKOUKseminar1.pdf
CLASSIFICATION & SEMANTIC WEB
classification’s capacity to represent and control complex semantic relationships across the universe of knowledge is compatible with the goals of the Semantic Web: universal and meaningful linking of concepts
large collections of resources already organized according to classification schemes are a source of concept/subject relationships that can be used to improve automatic information integration
prerequisites: full machine readability of data and open access to classification data on the network
ABOUT CLASSIFICATION: OUTLINE
role of classification in supporting subject access
subject authority control: managing, sharing, re-use of classification
improving classification source data
SEMANTIC RELATIONSHIPS
Antineutrinos, Antineutrons, Antiprotons, Atomic physics, Baryons, Beta-particles, Bosons, Electrons, Hadrons, Hyperons, Leptons, Mesons, Molecular physics, Muons, Neutrinos, Neutrons, Nuclear physics, Nuclei, Nucleons, Positrons, Protons, Resonances
words alone can only be ordered alphabetically
grouping concepts into classes according to similarity:
539.1 Nuclear physics. Atomic physics. Molecular physics
539.12 Elementary and simple particles
539.123/.124 Leptons. Including: Muons
539.123 Neutrinos
539.123.6 Antineutrinos
539.124 Electrons (including beta-particles)
539.124.6 Positrons
539.125/.126 Hadrons. Baryons and mesons
539.125 Nucleons
539.125.4 Protons
539.125.46 Antiprotons
539.125.5 Neutrons
539.125.56 Antineutrons
539.126.3 Mesons
539.126.4 Resonances
539.126.6 Hyperons
alphabetical order: no semantic relationships
systematic order: semantic relationships fixed by notation
NOTATION – enables mechanical ordering of subjects
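A minimal sketch of this point: with words alone, captions can only be sorted alphabetically; with notation, a program can sort classes mechanically into systematic order. It assumes plain decimal UDC numbers, where removing the dots and comparing the digit strings gives decimal-fraction order; ranges, auxiliaries and the full UDC filing rules are not handled. `filing_key`, `alphabetical` and `systematic` are hypothetical names.

```python
# Sample notations and captions from the 539.1 schedule extract.
CLASSES = {
    "539.125": "Nucleons",
    "539.125.4": "Protons",
    "539.125.46": "Antiprotons",
    "539.125.5": "Neutrons",
    "539.125.56": "Antineutrons",
    "539.12": "Elementary and simple particles",
}


def filing_key(notation: str) -> str:
    """Drop the separating dots so the digits compare as a decimal fraction.

    Simplification: plain decimal notations only (no ranges or auxiliaries).
    """
    return notation.replace(".", "")


def alphabetical(classes: dict[str, str]) -> list[str]:
    """Order captions by their words only: no semantic grouping."""
    return sorted(classes.values())


def systematic(classes: dict[str, str]) -> list[str]:
    """Order captions by notation, so broader classes precede narrower ones."""
    return [classes[n] for n in sorted(classes, key=filing_key)]


if __name__ == "__main__":
    print(alphabetical(CLASSES))
    print(systematic(CLASSES))
```

In systematic order Protons and Antiprotons sit together under Nucleons, whereas alphabetical order pulls the antiparticles away from their partners to the top of the list.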
WORDS
classification is ‘language independent’, but... words are an essential part of every classification system
the separation of concepts from words using notation simply means that any number of natural language expressions can be attached to every class notation in order to provide access points
verbal access points are managed separately as:
• captions
• subject-alphabetical index (relative index, chain index)
• alphabetical controlled vocabularies (thesauri, subject headings)
• folksonomy
HIERARCHICAL ORGANIZATION
6 Applied sciences. Medicine. Technology
62 Engineering. Technology in general
621 Mechanical engineering in general. Nuclear technology. Electrical engineering. Machinery
621.8 Machine elements. Motive power engineering. Materials handling. Fixings. Lubrication
621.88 Fastening, fixing devices. Fasteners
621.882 Threaded fasteners. Screws. Nuts and bolts. Washers
621.882.2 Screws, bolts according to head form. Screws and bolts for various materials
621.882.21 Screws and bolts according to head form
621.882.214 Other polygonal-headed screws and bolts
621.882.214.2 Screws and bolts with knurled or milled head. Thumb screws
freedom to choose and change the level of specificity
browsing function
semantic search expansion
UNIVERSAL KNOWLEDGE CLASSIFICATION – ASPECT CLASSIFICATIONS
organizes the universe of knowledge by discipline, based on a scientific and educational consensus (often criticized!)
groups phenomena according to the way they are researched, described and studied in documents
assumption: collocating books by the field in which they are used saves the user’s time. Users looking for books on managing rabbits as pests will not be interested in the fur industry or the physiology of rodents... They will find all books on rodent pest control in close proximity
the same phenomenon finds its place in every discipline in which it may be a subject of study
SUBJECT CONTEXT – ASSOCIATIVE RELATIONSHIPS
Chemical industry > Pest-control chemicals > Chemicals for controlling rodents. Rodenticides > Mouse
Agriculture > Animal husbandry > Rodents kept for fur > Mouse
Zoology > Mammals > Rodentia. Lagomorpha > Myiomorpha > Muridae. Mice and rats > Mouse
Agriculture > Plant protection > Control of plant diseases and pests > Destruction of vertebrate pests > Mouse
the Mouse classes in these contexts are linked to one another by "see also" references
LINEAR PRESENTATION OF KNOWLEDGE
the role of classification is to establish a systematic, linear presentation of knowledge: an order of classes
two types of classification with respect to the flexibility of access points:
• enumerative: a single, pre-established order of simple and complex subjects (e.g. Dewey, LCC)
• faceted and semi-faceted: allow a range of options in class ordering, control over access points to subjects, and unlimited combinations of subjects
SUBJECT ACCESS POINTS
bibliographic classifications are designed to denote the following elements of content:
subject and subject facets: entity (its parts, kinds), processes, materials, agents, operations, instruments, space, time
relationships between subjects treated within the document (influence, bias, application, comparison)
inner form of presentation: theoretical, historical, critical
outer form of presentation such as audience, purpose, form of expression
manifestations: text, image, sound
carriers: paper, magnetic/optical discs, film, analogue recordings
CLASSIFICATION VOCABULARY (e.g. UDC)
MAIN CLASSES (DISCIPLINES)
COMMON AUXILIARY NUMBERS:
TIME “...”
ETHNICS (=...)
PLACE (1/9)
FORM (0...)
PROPERTIES -02
MATERIALS -03
RELATIONS -04
PERSONS -05
LANGUAGE =...
SPECIAL AUXILIARY NUMBERS: -1/-9, .0, '
SYNTHESIS
MAIN TABLES: Discipline 1, Discipline 2, Discipline 3
81 Linguistics and languages
811.12.2 German
811.12.22 Upper German
811.12.24 Middle German
811.12.3/.4 Low German
811.12.3 Plattdeutsch
811.12.4 Frisian
811.12.5 Dutch
811.12.58 Dutch-based pidgin and creole
COMMON AUXILIARIES: Materials, Language, Time, Form, Place
(1/9) Place
(4) Europe
(430) Germany
(436) Austria
(437.3) Czech Republic
(437.5) Slovakia
(438) Poland
SPECIAL AUXILIARY NUMBERS
-1/-9 Schools, trends, methods
-116 Structuralism
-116.2 Geneva school
'0 Origins and periods of language
'1/'9 General theory of linguistics
'1 Metatheory
'2 Subject fields, facets of linguistics
'34 Phonetics. Phonology
'35 Graphemics. Orthography
'36 Grammar
'37 Semantics
RELATING SUBJECTS ACROSS DISCIPLINES = PHASE RELATIONSHIPS
37:004 Education : Computers
338.48:61 Tourism : Medicine
602.72:17 Embryonic cloning : Ethics
-04 Relations, processes and operations
-042 Phase relations
-042.1 Bias phase
-042.2 Comparison phase
-042.3 Influence phase
-042.4 Tool phase. Exposition phase
SUBJECT FACETS AND FLEXIBILITY OF ORDER
94 History
(410.5) Scotland
“18” 19th century
FACETS OF PERSONS
-057 Persons according to occupation, work, livelihood, education
-057.17 Managers in general. The management
-057.177 Higher management. Top management
-057.177.3 Directors. Board members
-057.177.32 Non-executive directors
-057.177.321 Deputy directors. Assistant directors
-056 Persons according to constitution, health, disposition, hereditary or other traits
-056.2 Persons according to physical state and health
-056.25 Persons according to nourishment (nutritional state) or body weight
-056.257 Overweight persons. Overnourished. Fat. Obese. Hypertrophic
-053 Persons according to age or age-groups
-053.8 Adults. Grown-ups
-053.88 Persons in late middle age (troisième âge)
-057.177, -053.88, -056.257: Top management, Persons in late middle age, Overweight
612.12-009.92 Angina pectoris
MANAGING SUBJECT ACCESS
DOCUMENT
author, title, publisher, format ...
IS DESCRIBED BY
METADATA
SUBJECT CLASSMARK: 94(410)"19"
IS DESCRIBED IN
AUTHORITY FILE
UDC CLASS: 94(410)"19"
DESCRIPTION: History of the U.K.
WAS BEFORE: 941.0
BROADER: 94(4)
SEE ALSO: 94(73), 94(54), 94(366)
SEARCH TERMS: History; United Kingdom; Great Britain; 20th century
DISPLAY AS: United Kingdom - History
-----------------------------------------------------------
MAPPING TO:
Dewey: 94
LCSH: History, 20th century United Kingdom
LCC: DA566-592
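The record above separates the document (which only carries a class mark) from the authority data (description, search terms, history, mappings). A minimal in-memory sketch of that separation, with hypothetical field and function names that mirror the labels in the diagram and data copied from the example record. It shows how a document inherits search terms through its class mark, and how a superseded notation (WAS BEFORE) can be resolved to the current one.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AuthorityRecord:
    """Hypothetical model of one classification authority record."""
    udc_class: str
    description: str
    was_before: list[str] = field(default_factory=list)   # superseded notations
    broader: list[str] = field(default_factory=list)
    see_also: list[str] = field(default_factory=list)
    search_terms: list[str] = field(default_factory=list)
    display_as: str = ""
    mappings: dict[str, str] = field(default_factory=dict)  # vocabulary -> heading


# Authority file keyed by class mark; document metadata stores only the class mark.
AUTHORITY = {
    rec.udc_class: rec
    for rec in [
        AuthorityRecord(
            udc_class='94(410)"19"',
            description="History of the U.K.",
            was_before=["941.0"],
            broader=["94(4)"],
            see_also=["94(73)", "94(54)", "94(366)"],
            search_terms=["History", "United Kingdom", "Great Britain", "20th century"],
            display_as="United Kingdom - History",
            mappings={"Dewey": "94", "LCSH": "History, 20th century United Kingdom",
                      "LCC": "DA566-592"},
        )
    ]
}


def access_points(classmark: str) -> list[str]:
    """Return the search terms a document inherits through its class mark."""
    record = AUTHORITY.get(classmark)
    return record.search_terms if record else []


def current_class(notation: str) -> Optional[str]:
    """Resolve a current or superseded notation to the current class mark."""
    for rec in AUTHORITY.values():
        if notation == rec.udc_class or notation in rec.was_before:
            return rec.udc_class
    return None
```

Because search terms and mappings live only in the authority file, correcting or extending them once updates access to every document carrying that class mark.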
SEMANTIC SEARCH EXPANSION
hadrons search
SUBJECT # HITS
539.12 Elementary and simple particles 132
539.125/.126 Hadrons. Baryons and mesons 58
539.125 Nucleons 38
539.125.4 Protons 5
539.125.46 Antiprotons 2
539.125.5 Neutrons 7
539.125.56 Antineutrons 1
539.126.3 Mesons 9
539.126.4 Resonances 11
539.126.6 Hyperons 6
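Semantic search expansion relies on the fact that every class below a given one shares its notation as a prefix. The sketch below, using the classes and hit counts from the hadrons example, expands a query class (including a range such as 539.125/.126) to all narrower indexed classes. It assumes plain decimal notations and simple ranges whose two ends have the same number of digits; `range_prefixes` and `expand` are hypothetical names.

```python
# Classes and hit counts copied from the "hadrons search" example.
HITS = {
    "539.12": 132,
    "539.125": 38,
    "539.125.4": 5,
    "539.125.46": 2,
    "539.125.5": 7,
    "539.125.56": 1,
    "539.126.3": 9,
    "539.126.4": 11,
    "539.126.6": 6,
}


def _digits(notation: str) -> str:
    return notation.replace(".", "")


def range_prefixes(notation: str) -> list[str]:
    """Expand '539.125/.126' into the digit prefixes ['539125', '539126'].

    Assumption: the end of the range replaces the same number of trailing
    characters of the start, and both ends have the same number of digits.
    """
    if "/" not in notation:
        return [_digits(notation)]
    start, end = notation.split("/", 1)
    end_full = start[: len(start) - len(end)] + end
    lo, hi = _digits(start), _digits(end_full)
    if len(lo) != len(hi):
        raise ValueError(f"unsupported range: {notation!r}")
    # zfill keeps leading zeros, e.g. for notations starting with 0
    return [str(n).zfill(len(lo)) for n in range(int(lo), int(hi) + 1)]


def expand(query_class: str, hits: dict[str, int]) -> dict[str, int]:
    """Return every indexed class falling under the query class, with its hits."""
    prefixes = range_prefixes(query_class)
    return {n: h for n, h in hits.items()
            if any(_digits(n).startswith(p) for p in prefixes)}


if __name__ == "__main__":
    for notation, count in expand("539.125/.126", HITS).items():
        print(notation, count)
```

A search for hadrons thus also retrieves nucleons, protons, mesons, hyperons and the other narrower classes, without the user having to name them.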
ADVANTAGES IN RESOURCE DISCOVERY: DISAMBIGUATION
mercury search
results:
SUBJECT # HITS
523.41 ASTRONOMY. Mercury 2
531.787.4 PHYSICS. Mercury barometers 3
543.272.81 ANALYTICAL CHEMISTRY. Mercury 38
546.49 INORGANIC CHEMISTRY. Mercury, Hg 10
621.181.232 ENGINEERING. Mercury vapour generators 9
66.095.712.49 CHEMICAL INDUSTRY. Mercuration 3
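Disambiguation amounts to presenting hits grouped under the caption of their class, so that each group corresponds to one sense of the search word. A sketch using the captions from the mercury example; the document identifiers in the usage are hypothetical and `group_by_class` is an illustrative name.

```python
from collections import defaultdict

# Captions copied from the "mercury search" example.
CAPTIONS = {
    "523.41": "ASTRONOMY. Mercury",
    "531.787.4": "PHYSICS. Mercury barometers",
    "543.272.81": "ANALYTICAL CHEMISTRY. Mercury",
    "546.49": "INORGANIC CHEMISTRY. Mercury, Hg",
    "621.181.232": "ENGINEERING. Mercury vapour generators",
    "66.095.712.49": "CHEMICAL INDUSTRY. Mercuration",
}


def group_by_class(hits: list[tuple[str, str]],
                   captions: dict[str, str]) -> dict[str, list[str]]:
    """Group (document id, class mark) hits under the caption of their class.

    Unknown class marks fall back to the bare notation as the group label.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for doc_id, classmark in hits:
        groups[captions.get(classmark, classmark)].append(doc_id)
    return dict(groups)


if __name__ == "__main__":
    sample = [("doc1", "546.49"), ("doc2", "523.41"), ("doc3", "546.49")]  # hypothetical hits
    for caption, docs in group_by_class(sample, CAPTIONS).items():
        print(caption, len(docs))
```

The user then chooses the sense (astronomy, chemistry, engineering...) rather than wading through one undifferentiated list.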
ADVANTAGES IN RESOURCE DISCOVERY: PRECISION
rabbit search
results:
SUBJECT # HITS
569.32 Zoology: Rodentia and Lagomorpha 7
632.935.7 Protection of crops 3
636.92 Animal husbandry. Domestic rabbits 38
636.92.045 Animal husbandry. Domestic rabbits. Pets 10
636.932 Animal husbandry. Rodents kept for fur 9
639.112 Hunting. Small game generally 22
641.8 Cooking. Main dishes 2
677.354 Textile industry. Hare fur. Rabbit fur 8
SUPPORTING MULTILINGUAL SEARCHING
SEARCHING INTERFACE (search terms)
Lang 1 Lapin
Lang 2 Coniglio
Lang 3 Kaninchen
Lang 4 Rabbit
CLASSIFICATION AUTHORITY FILE (hierarchical organization of concepts)
SUBJECT AREAS
ZOOLOGY 599.325.1 Lapin, Coniglio, Kaninchen, Rabbit...
ANIMAL HUSBANDRY 636.92 Lapin, Coniglio, Kaninchen, Rabbit...
FUR INDUSTRY 677.354 Lapin, Coniglio, Kaninchen, Rabbit...
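The authority file makes multilingual searching possible because each language's term points to the same class marks. A sketch with the data from the rabbit example; the language codes (fr, it, de, en) are added for illustration, and in real data each class would carry its own captions rather than identical terms. `build_index` and `translate_query` are hypothetical names.

```python
from collections import defaultdict

# Terms per class and language, from the rabbit example (language codes added).
TERMS: dict[str, dict[str, str]] = {
    "599.325.1": {"fr": "Lapin", "it": "Coniglio", "de": "Kaninchen", "en": "Rabbit"},
    "636.92": {"fr": "Lapin", "it": "Coniglio", "de": "Kaninchen", "en": "Rabbit"},
    "677.354": {"fr": "Lapin", "it": "Coniglio", "de": "Kaninchen", "en": "Rabbit"},
}


def build_index(terms: dict[str, dict[str, str]]) -> dict[str, set[str]]:
    """Invert the authority data: a term in any language -> its class marks."""
    index: dict[str, set[str]] = defaultdict(set)
    for notation, by_lang in terms.items():
        for term in by_lang.values():
            index[term.lower()].add(notation)
    return dict(index)


def translate_query(term: str, target_lang: str,
                    terms: dict[str, dict[str, str]],
                    index: dict[str, set[str]]) -> dict[str, str]:
    """Map a query term to classes, then give the target-language term for each class."""
    return {n: terms[n][target_lang] for n in sorted(index.get(term.lower(), set()))}


if __name__ == "__main__":
    idx = build_index(TERMS)
    print(translate_query("Kaninchen", "fr", TERMS, idx))
```

The class mark acts as the language-neutral key: a German query reaches French, Italian or English records by way of the notation, not by word-for-word translation.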
INTEGRATION OF INFORMATION
Vocabulary 1 <-> UDC <-> Vocabulary 2
library classification is often used as a pivot, i.e. a central mapping structure for the alignment of different vocabularies
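One practical reason for using a pivot: with n vocabularies, direct pairwise alignment needs n(n-1)/2 mappings, whereas mapping each vocabulary to one central classification needs only n. A sketch of translating a concept through the pivot; all concept identifiers and mapping tables here are invented for illustration, and the code assumes exact one-to-one equivalences, which real mappings often are not.

```python
from typing import Optional

# Hypothetical mapping tables: Vocabulary 1 -> UDC and UDC -> Vocabulary 2.
VOCAB1_TO_UDC = {"v1:uk-history-20c": '94(410)"19"', "v1:rabbit-keeping": "636.92"}
UDC_TO_VOCAB2 = {'94(410)"19"': "v2:great-britain-history", "636.92": "v2:domestic-rabbits"}


def via_pivot(concept: str, source_to_pivot: dict[str, str],
              pivot_to_target: dict[str, str]) -> Optional[str]:
    """Translate a concept between two vocabularies through the pivot classification."""
    pivot = source_to_pivot.get(concept)
    if pivot is None:
        return None  # concept not mapped to the pivot
    return pivot_to_target.get(pivot)


def mappings_needed(n: int) -> tuple[int, int]:
    """(pairwise mappings, mappings to a single pivot) for n vocabularies."""
    return n * (n - 1) // 2, n


if __name__ == "__main__":
    print(via_pivot("v1:rabbit-keeping", VOCAB1_TO_UDC, UDC_TO_VOCAB2))
    print(mappings_needed(10))
```

For ten vocabularies that is 45 pairwise alignments against 10 alignments to the pivot.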
EXAMPLE
Nebis subject authority file record, ETH-Bibliothek (Zürich): http://www.ethbib.ethz.ch/index_e.html
MARC CLASSIFICATION FORMATS
MARC 21 Concise Format for Classification Data http://www.loc.gov/marc/classification/
Concise UNIMARC Classification Format http://www.ifla.org/VI/3/p1996-1/concise.htm
• offer sufficient support for semantic relationships, but no support for managing and exploiting the complexity of classification syntax or for managing global changes, i.e. the heading field is not structured and does not allow multidirectional access to the meaningful elements of a complex notation
REQUIREMENT
machine-readable identification of each structural part of the notation, separating the display of numbers/symbols from their function
data element identifiers
51 (410) (091)
UDC number encoding for database management
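A sketch of what identifying each structural part of the notation can mean for a program: a tokenizer that labels the elements of a UDC number such as 51 (410) (091). It covers only a small, assumed subset of UDC syntax (main numbers, parenthesized auxiliaries of place, form and ethnic grouping, time in quotes, language with =, and the colon relation) and raises an error on anything else; it is not a full UDC parser, and `parse_udc` is a hypothetical name.

```python
import re

# Subset of UDC syntax, for illustration only. Alternatives are tried in order.
TOKEN = re.compile(
    r'(?P<time>["“][^"”]*["”])'    # time: "18", “18”
    r"|(?P<paren>\([^)]*\))"       # (...): place, form or ethnic grouping
    r"|(?P<language>=[\d.]+)"      # language: =111
    r"|(?P<relation>:)"            # colon relation between subjects
    r"|(?P<main>\d[\d.]*)"         # main table number: 94, 539.125.4
    r"|(?P<space>\s+)"
)


def parse_udc(notation: str) -> list[tuple[str, str]]:
    """Split a UDC number into (element type, text) pairs."""
    parts = []
    pos = 0
    while pos < len(notation):
        m = TOKEN.match(notation, pos)
        if m is None:
            raise ValueError(f"unsupported syntax at position {pos}: {notation[pos:]!r}")
        kind, text = m.lastgroup, m.group()
        if kind == "paren":
            inner = text[1:-1]
            # (0...) form, (=...) ethnic grouping, (1/9) place
            if inner.startswith("0"):
                kind = "form"
            elif inner.startswith("="):
                kind = "ethnic"
            else:
                kind = "place"
        if kind != "space":
            parts.append((kind, text))
        pos = m.end()
    return parts


if __name__ == "__main__":
    print(parse_udc("51 (410) (091)"))
    print(parse_udc('94(410.5)"18"'))
```

Once each element is labelled, a database can index, display and search the place, form or time facet of a compound number independently of how the string is printed.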
NETWORK: INITIALLY RESOURCE DISCOVERY
German Harvest Automated Retrieval and Directory (GERHARD)
a subject gateway providing automatic classification of the German web, based on UDC data from the ETH library authority file (the GERHARD website was shut down in 2005).
read more at http://www.bis.uni-oldenburg.de/abt1/waetjen/publ/Article.pdf
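To illustrate the general idea of classifying web documents against classification data, here is a deliberately naive sketch that ranks classes by word overlap between a document and class captions. It is not a reconstruction of how GERHARD worked; the captions are taken from examples in this presentation, and the stopword list and function names are assumptions.

```python
import re

# Captions copied from earlier examples in this presentation.
CAPTIONS = {
    "539.125.4": "Protons",
    "539.125.5": "Neutrons",
    "539.124": "Electrons (including beta-particles)",
    "636.92": "Animal husbandry. Domestic rabbits",
    "677.354": "Textile industry. Hare fur. Rabbit fur",
}

STOPWORDS = {"and", "of", "the", "in", "including"}  # assumed, minimal


def words(text: str) -> set[str]:
    """Lower-cased alphabetic words, minus stopwords (no stemming)."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def classify(text: str, captions: dict[str, str]) -> list[tuple[str, int]]:
    """Rank classes by how many caption words occur in the text."""
    doc = words(text)
    scores = [(n, len(words(c) & doc)) for n, c in captions.items()]
    # Highest overlap first; ties broken by notation; zero scores dropped.
    return sorted((s for s in scores if s[1] > 0), key=lambda s: (-s[1], s[0]))


if __name__ == "__main__":
    print(classify("Rabbit fur in the textile industry", CAPTIONS))
```

Even this toy version shows why rich captions and index terms matter: without stemming, "rabbits" in a document does not match "Rabbit fur", so the quality of the verbal data limits what automatic classification can achieve.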
TASKS FOR CLASSIFICATION DEVELOPERS
Improving classification data at their source:
• provide rich, machine-readable classification data exposing semantic relationships and providing multiple access points to notation and words
• enable sharing by distributing data in different standard formats
• find a way of releasing part of the data into the public domain for testing and training
• make sure that copyright regulations do not impede the use of classification in information integration and exchange
EXAMPLE - UDC
UDC Master Reference File (MRF) data has been distributed to users in a file format since 1993.
data is improved: unique identifier for every class (independent of notation), semantic and syntactic relationships declared, syndetic structure improved
MRF 2008 exports will be available in MARC and SKOS formats, or as on-demand SQL statements, plus various TEXT/XML outputs
pending:
• improvement of verbal access (subject-alphabetical index)
• merging the existing multilingual data into one database
future plans: inclusion of mapping to other vocabularies
looking for projects to test semantic technologies and how part of the UDC data can be used in an open m2m (machine-to-machine) environment
IN SUMMARY
development of new standards opens new possibilities for sharing and use of classification: new services and new solutions
to support new kinds of users, classification has to be exposed in a machine-readable, standardized format and made accessible to programs and services on the network
issues for owners: costs, copyright policy
--- END OF PRESENTATION ---