
EPC Exhibit 136-25

April 22, 2013

THE LIBRARY OF CONGRESS

Dewey Section

To: Caroline Kent, Chair

Decimal Classification Editorial Policy Committee

Cc: Members of the Decimal Classification Editorial Policy Committee

Karl E. Debus-López, Chief, U.S. General Division

From: Rebecca Green, Assistant Editor

Dewey Decimal Classification

OCLC Online Computer Library Center, Inc.

Via: Michael Panzer, Editor in Chief

Dewey Decimal Classification

OCLC Online Computer Library Center, Inc.

Re: Modeling topic relationships

Attached is a preprint of a paper to be presented as part of UDC Seminar 2013, Classification & Visualization: Interfaces to Knowledge, to be held in The Hague, 24-25 October 2013.

Rebecca Green, OCLC, Inc., Dublin, Ohio, USA
Diane Vizine-Goetz, OCLC, Inc., Dublin, Ohio, USA
Marcia Lei Zeng, Kent State University, Kent, Ohio, USA
Maja Žumer, University of Ljubljana, Ljubljana, Slovenia

From Modeling to Visualization of Topic Relationships in Classification Schemes

Abstract: Although developed primarily for controlled vocabularies, the Functional Requirements for Subject Authority Data (FRSAD) conceptual model has been extended to classification schemes, with a class corresponding to FRSAD’s thema, and a class notation and the hierarchically-contextualized caption of a class both corresponding to FRSAD’s nomen. This paper explores extending the FRSAD model to accommodate a topic-centered view of the Dewey Decimal Classification (DDC), in which topics are recognized as themas and Relative Index terms as nomens; a complex series of relationships involving topics and/or RI terms is also recognized. These subject authority data (which require local extensions to MARC classification and authority formats) support different user groups, including, in the DDC context, editors, translators, classifiers, information professional intermediaries, and end users. Use scenarios based on a topic-centered view of the DDC require system assistance for, e.g., an editor’s revision of a topic’s treatment throughout the DDC and an end user’s discovery of resources topically related to a known resource, but not necessarily assigned the same class number. Visualization strategies supporting these use scenarios are proposed.

Keywords: FRSAD, Dewey Decimal Classification, DDC, topic modeling, data formats, classification visualization

1. Introduction

In previous work, the Functional Requirements for Subject Authority Data (FRSAD) conceptual model (2011), although developed primarily for controlled vocabularies, has been extended to classification schemes (Mitchell, Zeng, & Žumer, 2011, 2012). For example, a class in the Dewey Decimal Classification (DDC) system corresponds to FRSAD’s thema (any entity used as a subject of a work), while a class notation and the hierarchically-contextualized caption of a class both correspond to FRSAD’s nomen (the appellation by which a thema is known). Relative Index¹ (RI) terms with a functional-equivalence relationship to the class (RI terms that match either the caption or topics in the class-here note) and RI terms with a near-synonym relationship between such index terms (e.g., lexical variants) are considered as alternative nomens for the class.

This paper explores extending the FRSAD model to accommodate a topic-centered view of the DDC. To help illustrate this work, a partial description of the DDC class with notation 388.42 and with “Social sciences / Commerce, communications & transportation / Communications and transportation / Transportation / Local transportation / Specific kinds of local transportation / Local rail transit systems” as its hierarchically-contextualized caption is given as Figure 1; this class is the source of examples in sections 2 and 3.

¹ The DDC’s Relative Index is an alphabetical index that connects subjects/topics to the system’s classes, which are arranged by discipline.

300 Social sciences
380 Commerce, communications & transportation
383–388 Communications and transportation
388 Transportation
388.4 Local transportation
388.41–388.46 Specific kinds of local transportation
388.42 Local rail transit systems

Including monorail systems, guided-way systems

Class here elevated rail transit systems, local surface rail systems using conventional (heavy) rail technology; rolling stock; underground systems; comprehensive works on local rail transit systems with multiple transit modes

Class rail terminals and stations in 388.472

For light rail transit systems, see 388.46

See also 388.34 for trolleybuses; also 388.413223 for trolleybus services

Selected indexing:

Guided-way systems
Local rail transit systems
Local rail transit systems -- transportation services
Local railroads
Monorail railroads
Monorail railroads -- transportation services
Rail transit systems
Surface rail transit systems
Surface rail transit systems -- transportation services
Underground railroads
Underground transportation

Figure 1: Partial class description of 388.42 Local rail transit systems

2. Modeling topic relationships in the DDC

Topical/semantic relationships abound within the DDC (Mitchell, 2001; Green & Panzer, 2010). To model them requires taking the application of the FRSAD model to classification systems that has been worked out thus far (Mitchell, Zeng, & Žumer, 2011, 2012) and extending it further. One extension is to recognize topics, which are discrete subjects (e.g., the topic LOCAL RAILROADS), as themas. Another extension is to recognize all RI terms as nomens (while retaining the restriction that only RI terms with a functional-equivalence relationship to a class can be considered alternative nomens for the class). A third extension is to recognize a series of relationships involving topics and/or RI terms: topics can be related (1) to classes, (2) to other topics (both thema-to-thema relationships), or (3) to RI terms (a thema-to-nomen relationship); RI terms can be related to each other (a nomen-to-nomen relationship). Indeed, our extensions incorporate modeling the Relative Index as a separate controlled vocabulary under the FRSAD model, which facilitates the exposing of topic-to-topic relationships. Additional extensions will be presented below.

Figure 2 depicts the full data model discussed here; thema entity classes are located on the left, nomen entity classes on the right. Portions of the data model shown with solid lines reflect the thema-nomen relationships currently recorded in the Dewey Editorial Support System (the interface used by the Dewey editors to maintain DDC data). Portions of the data model shown with dashed lines reflect the thema-nomen entities and relationships needed to support an enhanced, topic-centered view of the data.

Figure 2: FRSAD-oriented data model of DDC (themas on left, nomens on right; only portions in solid lines now systematically captured in DDC database)
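As a rough illustration of the entities and relationship types named in this section, the following Python sketch represents classes and topics as themas, Relative Index terms as nomens, and the four relationship types as sets of tuples. All class, attribute, and method names here are hypothetical and chosen only for illustration; they are not taken from the Dewey Editorial Support System or from the FRSAD report. The sample data come from the description of 388.42.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DdcClass:
    """A DDC class (a thema), identified by its notation (a nomen)."""
    notation: str
    caption: str


@dataclass(frozen=True)
class Topic:
    """A discrete subject (a thema), e.g. MONORAIL SYSTEMS."""
    label: str


@dataclass(frozen=True)
class RiTerm:
    """A Relative Index term (a nomen)."""
    heading: str


@dataclass
class TopicModel:
    """Hypothetical container for the relationship types in the model."""
    topic_in_class: set = field(default_factory=set)  # (Topic, DdcClass): thema-to-thema
    topic_to_topic: set = field(default_factory=set)  # (Topic, Topic, kind): thema-to-thema
    topic_to_ri: set = field(default_factory=set)     # (Topic, RiTerm): thema-to-nomen
    ri_to_ri: set = field(default_factory=set)        # (RiTerm, RiTerm, kind): nomen-to-nomen

    def classes_for(self, topic):
        """Return the classes in which a topic is located."""
        return {c for t, c in self.topic_in_class if t == topic}

    def ri_terms_for(self, topic):
        """Return the Relative Index terms that express a topic."""
        return {r for t, r in self.topic_to_ri if t == topic}


# Sample data drawn from the class description of 388.42.
local_rail = DdcClass("388.42", "Local rail transit systems")
monorail = Topic("MONORAIL SYSTEMS")

model = TopicModel()
model.topic_in_class.add((monorail, local_rail))
model.topic_to_ri.add((monorail, RiTerm("Monorail railroads")))
model.ri_to_ri.add(
    (RiTerm("Local rail transit systems"), RiTerm("Local railroads"), "equivalence")
)
```

Keeping each relationship type in its own collection mirrors the separation between thema-to-thema, thema-to-nomen, and nomen-to-nomen relationships; a production system would more likely store these links in the MARC records themselves.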

The topic-to-class relationship in a classification scheme has a quality unique among thema-to-thema relationships because of the nature of classes. A classification system presents a class through a description, enriched by the mention of specific topics associated with the class, as conveyed primarily through captions (e.g., Local rail transit systems) and notes (e.g., “Including monorail systems, guided-way systems”), either those associated with the class or, in some cases, those associated with classes in the upward hierarchy (e.g., the class-here note at 388.4 Local transportation, “Class here urban and suburban transportation, rapid transit, mass transit, commuter services”). A class can be seen as a conceptual space in which topics are located; for most classes, at least some of their space does not correspond to any named topic. Therefore, the conceptual extent of a class cannot be defined by summing the extent of explicitly mentioned topics. This claim is perhaps most clearly seen in the context of different language versions of a classification scheme: while the meaning of a class does not vary across languages, for cultural reasons the topics mentioned may vary.

The topic-to-class relationship calls for extending the basic FRSAD conceptual model by adding an attribute to the topic-to-class relationship; this attribute records whether or not the topic is functionally equivalent to the class. For example, the topic UNDERGROUND SYSTEMS (mentioned in the class-here note) is functionally equivalent to the class at 388.42, but the topic MONORAIL SYSTEMS (mentioned in the including note) is not.

A primary topical thema-to-nomen relationship in a classification scheme is the relationship between a topic (e.g., MONORAIL SYSTEMS) and its expression as an index term (e.g., Monorail railroads). In the DDC, the Relative Index operates as a separate controlled vocabulary, but with some nuances. On the one hand, RI terms can participate in nomen-to-nomen equivalence relationships (e.g., the relationship between Local rail transit systems and Local railroads) and whole-part relationships (e.g., the relationship between Surface rail transit systems -- transportation services and Surface rail transit systems), as set forth in the FRSAD model (2011, pp. 31-32). On the other hand, RI terms are not applied to an external set of works, but are designed to apply to the system’s topics. At present, the relationship between RI terms and topics uses the notation of the class in which the topic is located as an intermediary. Green (2008) reports on work that supports making the RI-heading-to-topic relationship visible. Ultimately, this visibility hinges on identifying the text in a class description that reflects a specific topic (e.g., “local surface rail systems using conventional (heavy) rail technology”), the overall set of text constituting a parallel set of nomens in the DDC (but not a fully controlled vocabulary). Consequently, nomen-to-nomen equivalence relationships are of two kinds: equivalence relationships between two RI terms and equivalence relationships between an RI heading and a textual expression. As with other controlled vocabularies, the various semantic relationships posited between RI terms reflect the fact that their associated themas/topics can participate in generic, whole-part, instance, or perspective hierarchical relationships, as well as associative relationships.

Further complexity in modeling thema-to-thema topic relationships in classification schemes arises from notes that send the classifier from one class to another class. The DDC has class-elsewhere notes (e.g., “Class rail terminals and stations in 388.472”), see references (e.g., “For light rail transit systems, see 388.46”), and see also references (e.g., “See also 388.34 for trolleybuses”) that perform this function. Studies of these notes (Green, 2011; Green & Panzer, 2011) reveal that they depend on standard semantic relationships between topics in the referring and referred-to classes, although often it is only the topic in the referred-to class that is explicit.

3. Data format revision for topic relationships

Data in the Dewey Editorial Support System use the MARC classification format (Library of Congress, Library and Archives Canada, & British Library, 2000–) to record data about classes (e.g., notation, parent notation, caption, notes) and the MARC authority format (Library of Congress, Library and Archives Canada, & British Library, 1999–) to record data about Relative Index terms, including relationships between RI terms. In addition, data in Relative Index authority records reflect which classes are indexed by which RIs, by associating RIs with class notation. (The authority format is also used to capture mappings between headings in other vocabularies and DDC classes, but those relationships are outside our present scope.)

Some of the data elements and relationships needed to inform a topic-centered view of the DDC are not currently accommodated in the MARC classification and authority formats. For these, local extensions to the formats will be required.

Our foremost concern is with the expression of topic relationships in the MARC classification and authority formats. As a thema, a topic is conceptual and requires a corresponding nomen to make it known/accessible. On the one hand, topics are reflected in their corresponding Relative Index terms. On the other hand, topics are reflected in the text of a class description, e.g., in a caption, in a note. Indeed, many fields in the MARC classification format provide a specific subfield (t) for topics. For example, the 680 field for the including note of 388.42 can be coded as:

$i Including $t monorail systems $i, $t guided-way systems

The crucial question is how to show the equivalence relationship between a topic expressed in a class description (captured in the MARC classification format) and the (same) topic expressed by a Relative Index term (captured in the MARC authority format), for example, between “monorail systems” in the 388.42 including note and Monorail railroads, one of its Relative Index terms. Because we assume that all topics are expressed by one or more Relative Index terms, the most natural place for expressing this relationship is in the authority record, which is where relationships between Relative Index terms and DDC classes are maintained.


Links between DDC class numbers and authority headings are given in the 083 field of the authority format, as shown in the following excerpt:

001 och00081442

040 ##  $a OCoLC-D $b eng $c OCoLC-D $d OCoLC-D $f ddcri

083 04 $a 388.42 $0 (OCoLC-D)ocd00142795 $2 23

150 ## $a Monorail railroads

This 083 field shows that the heading in the 1XX field, i.e., Monorail railroads, is associated in DDC 23 with the class identified by notation 388.42, located by following the link in the $0 subfield. We propose to add local subfields to the 083 field to identify the text in the 388.42 record that reflects this topic ($9 rt=, where rt is a mnemonic for reflecting text) and to indicate in which field that text is found ($9 mt=, where mt is a mnemonic for MARC tag); we show further that the topic is in standing room ($9 fe=SR) in 388.42:

001 och00081442

040 ##  $a OCoLC-D $b eng $c OCoLC-D $d OCoLC-D $f ddcri

083 04 $a 388.42 $0 (OCoLC-D)ocd00142795 $2 23 $9 fe=SR $9 rt=monorail systems $9 mt=680

150 ## $a Monorail railroads

We note that the reflecting text for a topic may incorporate certain complexities, such as distributed modifiers. For example, the caption at 005.86, Data backup and recovery, names two topics, DATA BACKUP and DATA RECOVERY. The latter topic would be handled with an ellipsis:

001 och00035967

040 ##  $a OCoLC-D $b eng $c OCoLC-D $d OCoLC-D $f ddcri

083 04  $a 005.86 $0 (OCoLC-D)ocd00117092 $2 23 $9 fe=SR $9 rt=data...recovery $9 mt=153

150 ## $a Data recovery

Links between authority records and class records also display in the corresponding classification records, in dedicated (Index Term, 70X–75X) fields; these links, however, are maintained solely through authority records. We assume a corresponding enhancement of this data display to include the new reflecting text and MARC tag subfields. For example:

001 ocd00142795

084 0# $a ddc $c 23 $e eng

153 ## $a 388.42 $e 388.41 $f 388.46 $g * $j Local rail transit systems

750 #7 $a Monorail railroads $0 (OCoLC-D)och00081442 $2 ddcri $9 fe=SR $9 rt=monorail systems $9 mt=680
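To make the proposed local subfields concrete, the following Python sketch reads the $9 key=value pairs (fe, rt, mt) out of a field written in the display form used in the examples above, and then locates the reflecting text in the text of a class description. It is a minimal sketch under two assumptions: the field is available as a plain string with $-prefixed subfield codes, and an ellipsis in the reflecting text stands for intervening words that are skipped. The function names are hypothetical, and the regular-expression matching is one possible reading of the ellipsis convention, not a documented rule.

```python
import re


def parse_subfields(field_body):
    """Split a field body such as '$a 388.42 $2 23' into (code, value) pairs."""
    return [(code, value.strip())
            for code, value in re.findall(r"\$(\w)\s*([^$]*)", field_body)]


def local_extensions(field_body):
    """Collect the proposed $9 key=value pairs (fe, rt, mt) into a dict."""
    pairs = {}
    for code, value in parse_subfields(field_body):
        if code == "9" and "=" in value:
            key, _, val = value.partition("=")
            pairs[key.strip()] = val.strip()
    return pairs


def reflecting_text_pattern(rt):
    """Build a regex for reflecting text; '...' is assumed to skip words."""
    parts = [re.escape(part.strip()) for part in rt.split("...")]
    return re.compile(r"\b" + r"\b.*?\b".join(parts) + r"\b", re.IGNORECASE)


# The 083 field from the Monorail railroads authority record above.
field_083 = ("$a 388.42 $0 (OCoLC-D)ocd00142795 $2 23 "
             "$9 fe=SR $9 rt=monorail systems $9 mt=680")
ext = local_extensions(field_083)

# Locate the reflecting text in the including note (field 680) of 388.42.
note_680 = "Including monorail systems, guided-way systems"
match = reflecting_text_pattern(ext["rt"]).search(note_680)

# The distributed-modifier case: 'data...recovery' against the 005.86 caption.
caption_match = reflecting_text_pattern("data...recovery").search(
    "Data backup and recovery")
```

A real implementation would read the records with a MARC library rather than from display strings; the string form is used here only because it matches the examples in the text.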

We can use a similar strategy to flesh out Topic-in-Class-<related to>-Topic-in-Class relationships for see and see-also references. For example, see references lead from a comprehensive number for a topic to the number for a component part. The see reference is found in the record for the comprehensive number and explicitly states the number for and the topic of the component part; the comprehensive number topic, which is only implicit, can be captured in reflecting text and MARC tag subfields. Additionally, all class numbers are treated as clickable links to the classification records for those numbers.

001 ocd00142795

084 0# $a ddc $c 23 $e eng

153 ## $a 388.42 $e 388.41 $f 388.46 $g * $j Local rail transit systems

253 0# $i For $t light rail transit systems $i , see $a 388.46 $9 rt=Local rail transit systems $9 mt=153

4. User scenarios

As noted in the FRSAD report (2011), subject authority data support different user groups. In the DDC context, user groups include editors, translators, classifiers, information professional intermediaries, and end users. Use scenarios based on our topic-centered view of the DDC incorporate system assistance for, e.g., an editor’s revision of how a specific topic is treated throughout the system, a classifier’s assignment of a DDC number to a resource, and an end user’s discovery of resources that are topically related to a known resource, but not necessarily assigned the same class number. (For further insight into classification support of use scenarios, see, for example, Markey, 2006, on online classification as a tool for catalogers and end users, and Nicholson et al., 2006, on HILT, a project using the DDC as a spine to promote end user access across disparate knowledge organization systems.)

We will explore two of these scenarios in greater detail: (1) the editor’s revision of a topic’s treatment throughout the DDC and (2) an end user’s discovery of resources topically related to a known resource, but not necessarily assigned the same class number. For purposes of discussion, we will assume that the prerequisite steps of topic identification and data capture have already been implemented.

We take as our first example the revision of CONCERTOS in the DDC, where the immediate concern is to identify all classes having to do with CONCERTOS. Our topic-centered approach is able to make all relevant classes discoverable, based on the following data and data relationships:

The Relative Index term Concertos indexes 784.23 Orchestra with one or more solo instruments.

A class-here note for “comprehensive works on concertos” is found at 784.23 (the use of a class-here note communicates that concertos approximate the whole of the class).

The following see references occur at 784.23 (the function of see references is to lead from a comprehensive number for a topic to the subordinate parts of the topic):

  “For concerto form, see 784.18”
  “For orchestra with more than one solo instrument, see 784.24”
  “For orchestra with one solo instrument, see 784.25”

A class-here note for “comprehensive works on solo concertos” is found at 784.25.

A see reference leads from 784.25 to 784.26–784.28 for “specific solo instruments.”

Numerous Relative Index terms of the form [Specific instrument] concertos (e.g., Piano concertos, Violin concertos) index numbers built by following the add instruction at 784.26–784.28. The add instruction from which a specific built number is derived is ascertainable from the 765 / Synthesized-number-components field in the class record for the number.

The Relative Index term Concertos -- musical form indexes 784.186 Concerto form.

The centered entry immediately above 784.186 in its upward hierarchy, 784.183–784.189 Instrumental forms, includes a note that begins, “Except for concerto form, comprehensive works on an instrumental form . . .” The upward notational hierarchy for a class can be traversed by following parent links in class records.
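The data relationships listed above can be treated as a small graph and walked mechanically. The Python sketch below starts from the Relative Index terms that contain a keyword, then follows see references and parent links breadth-first. The three dictionaries hold only the links stated in the list above (built numbers for specific instruments are omitted), and the structure and function names are hypothetical; the actual data would come from the authority and classification records.

```python
from collections import deque

# Relative Index term -> classes it indexes (from the list above).
RI_INDEX = {
    "Concertos": ["784.23"],
    "Concertos -- musical form": ["784.186"],
}

# Class -> classes named in its see references (as quoted above).
SEE_REFERENCES = {
    "784.23": ["784.18", "784.24", "784.25"],
    "784.25": ["784.26–784.28"],
}

# Class -> parent in the upward notational hierarchy (only the link cited above).
PARENT = {
    "784.186": "784.183–784.189",
}


def classes_for_topic(keyword):
    """Return classes reachable from RI terms that contain the keyword."""
    start = [notation
             for heading, notations in RI_INDEX.items()
             if keyword.lower() in heading.lower()
             for notation in notations]
    seen = []
    queue = deque(start)
    while queue:
        notation = queue.popleft()
        if notation in seen:
            continue
        seen.append(notation)
        # Follow see references to subordinate parts of the topic.
        queue.extend(SEE_REFERENCES.get(notation, []))
        # Follow the parent link, where one is recorded.
        if notation in PARENT:
            queue.append(PARENT[notation])
    return seen


concerto_classes = classes_for_topic("concertos")
```

The walk reaches 784.24 through a see reference even though the word "concerto" appears in neither its caption nor any index term listed for it here, which is the kind of class a plain text search can miss.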

Incidentally, a string search for “concerto*” (where * serves as the truncation character) across all class descriptions would have failed to identify two of those classes, 784.24 Orchestra with more than one solo instrument and 784.26–784.28 Specific solo instruments with orchestra.

Our second example focuses on identifying bibliographic resources topically related to a resource classed at 616.462 Diabetes mellitus. For our current purposes, we will limit our task to identifying topically-related classes of the source class. Our topic-centered approach uses the following data and data relationships:

The class 616.462 is indexed by two Relative Index terms, Diabetes -- medicine and Diabetes mellitus -- medicine.

Relative Index terms including the word “diabetes” or the phrase “diabetes mellitus” index the following class clusters (clusters have been formed on the basis of the DDC’s hierarchically-expressive notation; 59 Relative Index terms for diabetes lead to 17 different numbers, grouped into 5 clusters):

o Under Social problems and services / People with physical illnesses / Services to patients with specific conditions: 362.196462 (Diabetes), 362.1964622 (Diabetes mellitus (Type 1)), 362.1964624 (Diabetes mellitus (Type 2))

o Under Medicine and health / Diseases: 616.462 (Diabetes -- medicine), 616.46200835 (Diabetes -- adolescent medicine), 616.46206 (Diabetes -- therapy), 616.4620654 (Diabetes -- diet therapy), 616.4622 (Diabetes mellitus (Type 1) -- medicine), 616.462200835 (Diabetes mellitus (Type 1) -- adolescent medicine), 616.4624 (Diabetes mellitus (Type 2) -- medicine), 616.462400835 (Diabetes mellitus (Type 2) -- adolescent medicine), 616.47 (Diabetes insipidus -- medicine)

o Under Medicine and health / Diseases and complications of pregnancy: 618.3646 (Diabetes -- pregnancy complications -- obstetrics)

o Under Medicine and health / Pediatrics: 618.92462 (Diabetes mellitus -- pediatrics), 618.924622 (Diabetes mellitus (Type 1) -- pediatrics), 618.924624 (Diabetes mellitus (Type 2) -- pediatrics)

o Under Home and family management / Cooking: 641.56314 (Diabetes mellitus -- cooking for)

The user should first choose which aspects of DIABETES are of concern: SOCIAL SERVICES, MEDICAL DISEASES, PREGNANCY, PEDIATRICS, and/or COOKING. A search in the online catalog under class numbers deemed to be of interest can then be used to identify topically-related resources.
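The grouping of class numbers into clusters can be approximated by comparing leading digits of the notation. The Python sketch below groups the 17 numbers listed above by their first four digits (three before the decimal point and one after). That prefix length is an assumption made for illustration: the text says only that clusters are formed on the basis of the hierarchically-expressive notation, and four digits happens to reproduce the five clusters listed. The function name is hypothetical.

```python
from collections import defaultdict

# The 17 class numbers listed above for DIABETES.
DIABETES_NUMBERS = [
    "362.196462", "362.1964622", "362.1964624",
    "616.462", "616.46200835", "616.46206", "616.4620654",
    "616.4622", "616.462200835", "616.4624", "616.462400835", "616.47",
    "618.3646",
    "618.92462", "618.924622", "618.924624",
    "641.56314",
]


def cluster_by_prefix(notations, digits=4):
    """Group notations that share their first `digits` digits.

    The default of four digits is an illustrative assumption, not a rule
    stated in the text.
    """
    clusters = defaultdict(list)
    for notation in notations:
        key = notation.replace(".", "")[:digits]
        clusters[key].append(notation)
    return dict(clusters)


clusters = cluster_by_prefix(DIABETES_NUMBERS)
cluster_sizes = {key: len(members) for key, members in clusters.items()}
```

With a shorter prefix the two clusters under 618 would merge, and with a longer one 616.47 would separate from the 616.46 numbers, so the choice of prefix length determines how coarse the display is.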

5. Visualization

Because of the complexity of the relationships involved, these user scenarios call for the use of visualization techniques² for effective and efficient execution. Specifically, visualization techniques should help users scan the cognitive terrain, gaining a perspective on relationships among topics, relationships among classes, and relationships between classes and topics. The visualizations used should also help users navigate among classes and topics, based on these relationships. Because we intend for all topics to be represented by one or more Relative Index terms, for visualization purposes we will assume the interchangeability of topics and RIs. We also assume machine access to DDC data.

² We assume the foundational definition of visualization put forth by Card, Mackinlay, & Shneiderman (1999, p. 6): “the use of computer-supported, interactive, visual representations of data to amplify cognition.”

Specifically, the following topic-oriented relationships need to be addressed:

1. Given a topic, what other topics is it (directly) related to?
2. Given a topic, what classes is it (directly) associated with?
3. Given a-topic-in-a-class, what classes is it topically related to?
4. Given a class, what classes are in its upward hierarchy?

For the first relationship, we assume that interaction with the system commences with a topic / Relative Index term search on authority records. If the number of authority records retrieved does not exceed some threshold, the system presents a simple list of relevant RIs, but if the number of records returned exceeds the threshold, the system responds with an ordered list of subheadings. For example, a search for the topic DIABETES (implemented as a keyword search on “Diabetes”) retrieves 59 Relative Index terms distributed across 18 classes. The RIs are presented in Figure 3 in the context of their associated subheadings. Subheadings are ordered by class number to facilitate the choice of the aspect or discipline of interest. Clicking on a subheading produces a list of RI headings containing the subheading. Selecting a specific heading from such a list leads to its authority record and a display of related RIs/topics, presented as clickable links. The standard equivalence, hierarchical and associative topic relationships are supported by the MARC authority format.

Figure 3: Tree-style display of Relative Index terms ordered by class number and subheading

For the second relationship, we assume that interaction with the system commences with a topic / Relative Index term search on classification records (implemented as a keyword search on RI terms associated with classes). Searches may return multiple classes, a single class, or no classes, using a display that visually maps classes across DDC’s expressive notation, as seen in Figure 4. In this display option, dots represent clusters of classes (where a cluster may consist of multiple classes or of only a single class). For example, we start again with a search on the topic DIABETES. Figure 4 shows that this topic is related to five clusters of classes: one cluster in the 360s, three clusters in the 610s, and one cluster in the 640s.³ Mousing over a dot displays the specific class numbers in the cluster. Clicking on a dot supports drilling down, with each lower level display providing access to two notational levels more specific than what all the classes in the source cluster have in common (see Figure 5, where dots represent classes at 616.462, 616.46200835, 616.46206, 616.4620654, 616.4622, 616.462200835, 616.4624, and 616.462400835). At the finest level of granularity, a dot represents a single class; clicking on such a dot returns a class description / a classification record. Each text segment corresponding to a topic is a clickable link leading to the corresponding authority record. The text segment reflecting the searched-for topic will be highlighted. (Figure 6 represents an alternative view on the data shown in Figure 4.)

³ Some user groups want to find only those classes where the topic is part of the class. Other user groups want to identify all classes where the topic is mentioned in the class description, including in a class-elsewhere note, see reference, or see-also reference. Color coding could be used to distinguish between topic inclusion and topic mention.

Figure 4: Mousing over dendogram display (borrowing from Denton 2012)

Figure 5: Drilling down in dendogram at 616.462

For the third relationship, we start from the display of a relevant classification record, where, as previously noted, topics are reflected by text segments. All Topic-in-Class-<related to>-Topic-in-Class relationships are accompanied by clickable links for the topically-related referred-to classes. The desired navigation is, therefore, supported directly by data in the classification record.

The fourth, and final, relationship is also supported directly by data in the classification record. The upward hierarchy can be notational and/or structural. (See references, for example, set forth structural hierarchy relationships that are not supported notationally.) Classes in the upward notational hierarchy can be accessed by clicking on the parent class link in the 153 $e subfield. Classes in the upward structural hierarchy rely on data in the 553 / Valid number tracing field, which captures the information from a see reference in the record for the referred-to class.

Figure 6: Pyramid display (borrowing from Ibarra, 2008)
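Traversal of the upward notational hierarchy, the fourth relationship, amounts to following parent links until the top of the scheme is reached. The Python sketch below does this for 388.42, using the hierarchy shown in the partial class description of that class. The dictionary is a hand-built stand-in for classification records: only the parent of 388.42 (the span 388.41–388.46, recorded in 153 $e and $f) is given explicitly in the record examples, and the remaining parent links are inferred here from the displayed hierarchy. Function names are hypothetical.

```python
# notation -> (caption, parent notation); the top-level class has no parent.
CLASSES = {
    "388.42": ("Local rail transit systems", "388.41–388.46"),
    "388.41–388.46": ("Specific kinds of local transportation", "388.4"),
    "388.4": ("Local transportation", "388"),
    "388": ("Transportation", "383–388"),
    "383–388": ("Communications and transportation", "380"),
    "380": ("Commerce, communications & transportation", "300"),
    "300": ("Social sciences", None),
}


def upward_hierarchy(notation, classes=CLASSES):
    """Return the ancestors of a class, nearest parent first."""
    chain = []
    parent = classes[notation][1]
    while parent is not None:
        chain.append(parent)
        parent = classes[parent][1]
    return chain


def contextualized_caption(notation, classes=CLASSES):
    """Join captions from the top of the hierarchy down to the class."""
    path = list(reversed(upward_hierarchy(notation, classes))) + [notation]
    return " / ".join(classes[n][0] for n in path)


ancestors = upward_hierarchy("388.42")
caption = contextualized_caption("388.42")
```

Joining the captions along the path reproduces the hierarchically-contextualized caption quoted for 388.42 in the introduction, which is a simple check that the parent links are consistent with the displayed hierarchy.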

Page 14: EPC Exhibit 136-25 April 22, 2013 THE LIBRARY OF CONGRESS · Keywords: FRSAD, Dewey Decimal Classification, DDC, topic modeling, data formats, classification visualization 1. Introduction

13

6. Conclusion

Much of the conceptual power of the Dewey Decimal Classification depends on relationships that involve classes, topics, and Relative Index terms. However, these topics have not been systematically identified, and their relationships to other system components have not been captured. Enhancing DDC data with explicit topics and topic relationships will permit users of the system to accomplish a variety of tasks. Thus, the topic modeling, revision of data formats, and analysis of use scenarios addressed here are precursor tasks to the long-term effort of identifying topics and topic relationships throughout the classification system, capturing those data in the revised data formats, and generating visualizations to support use scenarios.

Acknowledgments

We acknowledge our debt to Joan S. Mitchell, former Editor in Chief of the Dewey Decimal Classification, who spearheaded the work on applying FRSAD to classification schemes and without whom this paper would never have been written. We thank JD Shipengrover for assistance with the design and preparation of figures.

References

Card, S. K.; Mackinlay, J. D.; Shneiderman, B. (1999). Readings in information visualization: using vision to think. San Francisco, Calif: Morgan Kaufmann Publishers.

Denton, W. (2012). On dentographs, a new method of visualizing library collections. Code4Lib Journal (16). Available at: http://journal.code4lib.org/articles/6300.

Functional requirements for subject authority data, a conceptual model (FRSAD). (2011). IFLA Working Group on Functional Requirements for Subject Authority Records (FRSAR). Edited by M. L. Zeng, M. Žumer, A. Salaba. Berlin/München: De Gruyter Saur. Available at: http://www.ifla.org/files/assets/classification-and-indexing/functional-requirements-for-subject-authority-data/frsad-final-report.pdf.

Green, R. (2008). Making visible hidden relationships in the Dewey Decimal Classification: how Relative Index terms relate to DDC classes. In: Culture and identity in knowledge organization: proceedings of the Tenth International ISKO Conference, 5-8 August 2008, Montréal, Canada. Edited by C. Arsenault, J. T. Tennis. Würzburg: Ergon, pp. 8-14.

Green, R. (2011). See-also relationships in the Dewey Decimal Classification. Knowledge Organization, 38, pp. 335-341. Available at: http://journals.lib.washington.edu/index.php/nasko/article/view/12789.

Green, R.; Panzer, M. (2010). The ontological character of classes in the Dewey Decimal Classification. In: Paradigms and conceptual systems in knowledge organization: proceedings of the Eleventh International ISKO Conference, 23-26 February 2010, Rome, Italy. Edited by C. Gnoli, F. Mazzocchi. Würzburg: Ergon, pp. 171-179.

Green, R.; Panzer, M. (2011). Relationships in the notational hierarchy of the Dewey Decimal Classification. In: Proceedings of the International UDC Seminar: Classification and ontology: formal approaches and access to knowledge, The Hague, Netherlands, 19-20 September 2011. Edited by A. Slavic, E. Civallero. Würzburg: Ergon, pp. 161-176.

Ibarra, N. (2008). Dewey Decimal Classification System. Available at: http://www.designisthenorm.com/dewey.html.

Library of Congress, Library and Archives Canada, and British Library. (1999–). MARC 21 format for authority data. Washington, DC: Library of Congress, Network Development and MARC Standards Office. Available at: http://www.loc.gov/marc/authority/ecadhome.html.

Library of Congress, Library and Archives Canada, and British Library. (2000–). MARC 21 format for classification data. Washington, DC: Library of Congress, Network Development and MARC Standards Office. Available at: http://www.loc.gov/marc/classification/eccdhome.html.

Markey, K. (2006). Forty years of classification online: final chapter or future unlimited? In: Moving beyond the presentation layer: content and context in the Dewey Decimal Classification (DDC) system. (Co-published as Cataloging & Classification Quarterly 42,3/4.) Edited by J.S. Mitchell, D. Vizine-Goetz. New York: Haworth Press, pp. 1-63.

Mitchell, J. S. (2001). Relationships in the Dewey Decimal Classification system. In: Relationships in the organization of knowledge. Edited by C. A. Bean, R. Green. Dordrecht: Kluwer Academic, pp. 211-226.

Mitchell, J. S.; Zeng, M. L.; Žumer, M. (2011). Extending models for controlled vocabularies to classification systems: modelling DDC with FRSAD. In: Classification & ontology: formal approaches and access to knowledge: proceedings of the International UDC Seminar, 19-20 September 2011, The Hague, The Netherlands. Edited by A. Slavic, E. Civallero. Würzburg: Ergon Verlag, pp. 241-250.

Mitchell, J. S.; Zeng, M. L.; Žumer, M. (2012). Modeling classification systems in multicultural and multilingual contexts. Paper presented at the IFLA Satellite Post-Conference: Beyond Libraries – Subject Metadata in the Digital Environment and Semantic Web, 17-18 August 2012, Tallinn, Estonia. (Preprint available at: www.nlib.ee/html/yritus/ifla_jarel/papers/3-3_Mitchell.docx)

Nicholson, D.; Dawson, A.; Shiri, A. (2006). HILT: A pilot terminology mapping service with a DDC spine. In: Moving beyond the presentation layer: content and context in the Dewey Decimal Classification (DDC) system. (Co-published as Cataloging & Classification Quarterly 42,3/4.) Edited by J.S. Mitchell, D. Vizine-Goetz. New York: Haworth Press, pp. 187-200.


About authors

Rebecca Green is an assistant editor of the DDC, with specific responsibilities related to DDC training modules and investigation of relationships in the DDC (with a long-term goal of developing a version of the system to support automated applications). Rebecca came to OCLC from her position as associate professor in the College of Information Studies at the University of Maryland. While there she co-edited two volumes on relationships, Relationships in the Organization of Knowledge and The Semantics of Relationships: An Interdisciplinary Perspective. Rebecca is a member of the Scientific Advisory Council of the International Society for Knowledge Organization (ISKO) and of the Editorial Board of Knowledge Organization.

Diane Vizine-Goetz is a senior research scientist at OCLC. Diane's research interests include knowledge organization, indexing and retrieval, and database quality control. Her current research activities involve applying principles of the Functional Requirements for Bibliographic Records (FRBR) model to large bibliographic datasets. Diane has also developed automated tools for catalogers and classifiers.

Marcia Lei Zeng is a professor at Kent State University. She has been involved in the development and research of knowledge organization systems for over 20 years and has been contributing to related standards including NISO Z39.19 and ISO 25964 for controlled vocabularies. She was also the chair of the IFLA Working Group that developed the model of Functional Requirements for Subject Authority Data (FRSAD), and an Invited Expert on the W3C Library Linked Data Incubator Group. She is Director-at-large of the American Society for Information Science and Technology (ASIS&T).

Maja Žumer is Professor of Information Science at the University of Ljubljana (Slovenia). Her research interests include design and evaluation of information retrieval systems, end-user interfaces, and conceptual modeling. She has been involved in several IFLA working groups, NISO committees, and several EU projects. She has received several international and national research grants. She is a member of the IFLA FRBR Review Group and was the co-chair of the IFLA Working Group on the Functional Requirements for Subject Authority Records (FRSAR).