Taxonomy Development and Digital Projects

© Copyright 2009 Dow Jones and Company Taxonomy Development and Digital Projects Laura Dorricott Project Delivery Manager, Taxonomy Services Dow Jones Client Solutions January 25, 2009 Networked Resources and Metadata Interest Group ALA Midwinter 2009

Upload: daniela-barbosa

Post on 11-May-2015

DESCRIPTION

Presentation from the ALA Midwinter 2009 (American Library Association) meeting, given as part of the Networked Resources and Metadata Interest Group (NRMIG). A discussion on taxonomy development led by Laura Dorricott, a Taxonomy Project Delivery Manager with Dow Jones Taxonomy Services, on Sunday, January 25, 2009. The corresponding blog post with Laura's notes from the session is available here: http://synapticacentral.com/content/notes-session-taxonomy-development-and-digital-projects

TRANSCRIPT

Page 1: Taxonomy Development and Digital Projects

© Copyright 2009 Dow Jones and Company

Taxonomy Development and Digital Projects

Laura Dorricott

Project Delivery Manager, Taxonomy Services

Dow Jones Client Solutions

January 25, 2009

Networked Resources and Metadata Interest Group

ALA Midwinter 2009

Page 2: Taxonomy Development and Digital Projects

Introduction

Laura Dorricott, Project Delivery Manager, Taxonomy Services, Dow Jones Client Solutions

IHS, Inc. – Indexer and Lexicographer

Synapse – 1995-2005 – Taxonomist and Operations Director

Dow Jones – 2005 to present – Project Delivery Manager

Page 3: Taxonomy Development and Digital Projects

Information management needs – What do we do with this???

American

Theo LeSieg

Theodor Seuss Geisel

Children’s writer

March 2, 1904

Springfield, MA

Articles about “Dr. Seuss”

Dr. Seuss

Page 4: Taxonomy Development and Digital Projects

Taxonomy’s Evolutionary Path

Dictionaries & Flat Lists

Hierarchical Taxonomies

Controlled Vocabulary Thesauri

Ontologies

Structured Authority Files

Taxonomies are the building blocks for ontologies and ontologies are semantic representations of the real world in all its rich diversity.

Taxonomy is evolving organically…

Page 5: Taxonomy Development and Digital Projects

Definitions of Controlled Vocabularies

List:

“Sometimes called a pick list, a limited set of terms arranged as a simple alphabetical list or in some other logically evident way.”

Synonym ring:

“A group of terms that are considered equivalent for the purposes of retrieval.”

Taxonomy:

“A collection of controlled vocabulary terms organized into a hierarchical structure. Each term in a taxonomy is in one or more parent/child (broader/narrower) relationships to other terms in the taxonomy.”

Thesaurus:

“A controlled vocabulary arranged in a known order and structured so that the various relationships among terms are displayed clearly and identified by standardized relationship indicators. Relationship indicators should be employed reciprocally.”
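The four definitions above differ mainly in how much structure they carry, and a small data sketch can make that concrete. The following is an illustrative Python sketch only, not a schema from the presentation: the pick-list values, the `ThesaurusTerm` class, and the `children_of` helper are hypothetical names chosen for this example, while the salinity/saltiness and cactus examples are borrowed from other slides in this deck.

```python
from dataclasses import dataclass, field

# 1. List ("pick list"): a limited set of terms in a simple, evident order.
#    These four values are hypothetical sample terms.
pick_list = sorted(["article", "book", "map", "photograph"])

# 2. Synonym ring: terms treated as equivalent for retrieval.
#    No term is preferred over the others, so a plain set is enough.
synonym_ring = frozenset({"salinity", "saltiness"})

# 3. Taxonomy: terms in a hierarchy; each term records its parent(s).
taxonomy_parents = {
    "plants": [],                      # hypothetical top term
    "succulent plants": ["plants"],
    "cacti": ["succulent plants"],
}

# 4. Thesaurus: each term carries standardized relationship indicators.
@dataclass
class ThesaurusTerm:
    label: str
    broader: set = field(default_factory=set)    # BT
    narrower: set = field(default_factory=set)   # NT
    related: set = field(default_factory=set)    # RT
    used_for: set = field(default_factory=set)   # UF (non-preferred entry terms)


def children_of(term, parents):
    """Return the terms whose parent list includes `term`, sorted by label."""
    return sorted(child for child, ps in parents.items() if term in ps)
```

Each step down the list adds structure: the pick list only orders terms, the synonym ring groups them, the taxonomy adds parent/child links, and the thesaurus adds named relationship types on top of that.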

Page 6: Taxonomy Development and Digital Projects

Next Generation

Ontology:

“A controlled vocabulary developed to bridge the gap between the real world and the information world, by striving to exactly model and control all the fundamentals of information concepts with the goal of building a new class of intelligent technologies and knowledge systems.”

Page 7: Taxonomy Development and Digital Projects

Purposes of Controlled Vocabularies

Translation

Consistency

Provide a framework of concepts that accurately represents the real world.*

Indication of semantic relationships

Hierarchical arrangement to assist browsing

Search and retrieval

• Improve precision and recall

• Reduce search time

* Real world includes physical objects, databases, digital content, and abstract domains of knowledge
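Precision and recall, named above as search and retrieval goals, have standard definitions: precision is the share of retrieved items that are relevant, and recall is the share of relevant items that were retrieved. A minimal Python sketch follows; the function name and the document identifiers in the usage note are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    precision = relevant items retrieved / all items retrieved
    recall    = relevant items retrieved / all relevant items
    Either value is reported as 0.0 when its denominator is empty.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall
```

For example, if a search returns documents d1 to d4 and the relevant set is d1, d2, and d5, then two of the four results are relevant (precision 0.5) and two of the three relevant documents were found (recall about 0.67).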

Page 8: Taxonomy Development and Digital Projects

SEARCH

Page 9: Taxonomy Development and Digital Projects

Keyword Search

Keyword searching is insufficient

People do not always know what they want

People all have different “keywords”

People don’t perform complex keyword searches

One word can have many meanings

Two or more words can share the same meaning

Page 10: Taxonomy Development and Digital Projects

one thing can have many different names

Dr. Peter Roget

one word can mean very different things

Page 11: Taxonomy Development and Digital Projects

Page 12: Taxonomy Development and Digital Projects

Taxonomy helps people filter out the noise and discover the relevant things regardless of what they are called.

Page 13: Taxonomy Development and Digital Projects

NAVIGATE

Page 14: Taxonomy Development and Digital Projects

Search and Navigation are not alternative solutions; they are complementary solutions

Users expect both

Page 15: Taxonomy Development and Digital Projects

Points of view…

Page 16: Taxonomy Development and Digital Projects

one point of view…

Page 17: Taxonomy Development and Digital Projects

another point of view…

Page 18: Taxonomy Development and Digital Projects

Different audiences will have different views and good navigation will serve all of them.

Page 19: Taxonomy Development and Digital Projects

Building a Taxonomy or Controlled Vocabulary

Now that we know what taxonomies and controlled vocabularies are and can see some of the reasons we need them – what do we do next???

Page 20: Taxonomy Development and Digital Projects

Building a Taxonomy or Controlled Vocabulary

Basic issues and principles

One word can have multiple meanings (ambiguity)

Two words can share the same meaning (synonymy)

Semantic relationships

Facets

Warrant

Structures

Metadata

Page 21: Taxonomy Development and Digital Projects

Ambiguity

Polysemes (homonyms, homographs)

cranes (birds)

cranes (equipment)

Mercury (planet)

Mercury (god)

Mercury (car)

Mercury (metal)
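The parenthetical qualifiers in the examples above are what keep otherwise identical labels apart. As a rough sketch, and assuming terms are stored as strings in the "label (qualifier)" form shown on the slide, the Python below splits a term into its parts and groups the senses that share a label. The function names and the regular expression are illustrative choices, not part of any standard.

```python
import re
from collections import defaultdict

# Matches "label (qualifier)"; the qualifier is the final parenthesised part.
QUALIFIED = re.compile(r"^(?P<label>.+?)\s*\((?P<qualifier>[^()]+)\)$")


def split_qualifier(term):
    """Split 'Mercury (planet)' into ('Mercury', 'planet').

    Terms without a qualifier come back as (term, None).
    """
    match = QUALIFIED.match(term.strip())
    if match:
        return match.group("label"), match.group("qualifier")
    return term.strip(), None


def senses_by_label(terms):
    """Group qualifiers under their case-folded label, in input order."""
    senses = defaultdict(list)
    for term in terms:
        label, qualifier = split_qualifier(term)
        senses[label.lower()].append(qualifier)
    return dict(senses)
```

A label that maps to more than one qualifier is ambiguous, so a search interface could use this grouping to ask the user which sense of "Mercury" is meant.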

Page 22: Taxonomy Development and Digital Projects

Synonymy

Two words with the same or similar meaning

• Popular vs. scientific names

• Generic vs. trade names

• Slang vs. traditional terms

• Dialectical variants

Near-synonyms

Lexical variants

Generic postings
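Synonymy is usually handled by choosing one preferred term and pointing every variant at it. The sketch below shows one way that lookup could work in Python. It reuses names that appear elsewhere in this deck (Dr. Seuss and his pen name, salinity/saltiness); which label is treated as preferred is an assumption made for illustration, and `build_use_index` and `resolve` are hypothetical helper names.

```python
def build_use_index(equivalences):
    """Map every label (preferred or entry term), lowercased, to its preferred term."""
    index = {}
    for preferred, entry_terms in equivalences.items():
        index[preferred.lower()] = preferred
        for entry in entry_terms:
            index[entry.lower()] = preferred
    return index


def resolve(query, index):
    """Return the preferred term for a query, or None if it is not in the vocabulary."""
    return index.get(query.strip().lower())


# Preferred term -> entry terms (the UF side of each USE/UF pair).
# The choice of preferred term here is an assumption for this example.
EQUIVALENCES = {
    "Dr. Seuss": ["Theo LeSieg", "Theodor Seuss Geisel"],
    "salinity": ["saltiness"],
}
USE_INDEX = build_use_index(EQUIVALENCES)
```

With this index, `resolve("theo lesieg", USE_INDEX)` and `resolve("Dr. Seuss", USE_INDEX)` lead to the same preferred term, so content indexed once is found under any of its names.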

Page 23: Taxonomy Development and Digital Projects

Semantic Relationships

Basic Types:

• Equivalence (USE/UF)

• Hierarchical (BT/NT)

• Associative (RT/RT)

Represented by standard codes/symbols

Reciprocity
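Reciprocity means that every relationship is recorded from both ends: if A has BT B, then B has NT A. One way to sketch this in Python is to store the reciprocal automatically whenever a relationship is added, and to offer a check for data loaded from elsewhere. The `Thesaurus` class and its method names are hypothetical; only the USE/UF, BT/NT, and RT codes come from the slide.

```python
from collections import defaultdict

# Each relationship indicator and the indicator required on the other side.
RECIPROCAL = {"BT": "NT", "NT": "BT", "RT": "RT", "USE": "UF", "UF": "USE"}


class Thesaurus:
    def __init__(self):
        # relations[term][code] is the set of target terms for that indicator.
        self.relations = defaultdict(lambda: defaultdict(set))

    def add(self, source, code, target):
        """Record `source code target` together with its reciprocal."""
        self.relations[source][code].add(target)
        self.relations[target][RECIPROCAL[code]].add(source)

    def missing_reciprocals(self):
        """Return (term, code, target) triples that should exist but do not."""
        missing = []
        # Use .get() so that checking never adds new keys while iterating.
        for source, by_code in list(self.relations.items()):
            for code, targets in by_code.items():
                for target in targets:
                    back = self.relations.get(target, {}).get(RECIPROCAL[code], set())
                    if source not in back:
                        missing.append((target, RECIPROCAL[code], source))
        return missing
```

Calling `add("succulent plants", "NT", "cacti")` also stores `cacti BT succulent plants`, while `missing_reciprocals()` flags one-sided entries that were written directly into the data.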

Page 24: Taxonomy Development and Digital Projects

Hierarchical Relationships

Allow for browsable structures

Information discovery

Search expansion

Three types:

• Generic

• Instance

• Whole-part

Page 25: Taxonomy Development and Digital Projects

Hierarchical Relationships

Between a class and its members

“IsA” relationship

A cactus IsA succulent plant, therefore:

succulent plants NT cacti

Generic Hierarchical Relationships

Page 26: Taxonomy Development and Digital Projects

Hierarchical Relationships

Between a general category of things or events and an individual instance of that category

Instance is often a proper noun

Also an “IsA” relationship type

Example: mountains NT Rocky Mountains

Instance Hierarchical Relationship

Page 27: Taxonomy Development and Digital Projects

Hierarchical Relationships

One concept inherently included in another

Examples:

• Systems and organs of the body

• Geographic locations

• Corporate, social, or political structures

Whole-Part Hierarchical Relationships
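Search expansion, one of the uses listed for hierarchical relationships, can be sketched as a walk down the NT links: a search on a broad term also matches everything beneath it. The Python below is a minimal illustration that assumes the hierarchy is held as a dictionary of narrower terms. The sample data reuses the cactus and Rocky Mountains examples from this deck, with "plants" added as a hypothetical top term.

```python
from collections import deque

# term -> narrower terms (NT). "plants" is a hypothetical top term.
NARROWER = {
    "plants": ["succulent plants"],
    "succulent plants": ["cacti"],
    "mountains": ["Rocky Mountains"],
}


def expand_query(term, narrower=NARROWER):
    """Return `term` followed by every narrower term beneath it, breadth-first."""
    expanded = [term]
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for child in narrower.get(current, []):
            if child not in expanded:   # guards against repeats in a polyhierarchy
                expanded.append(child)
                queue.append(child)
    return expanded
```

A search for "plants" expanded this way also retrieves content indexed under "succulent plants" and "cacti", while a search for "cacti" stays narrow.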

Page 28: Taxonomy Development and Digital Projects

Polyhierarchy

Concept logically fits into two different hierarchical structures

Advantage of electronic structures, allows for different viewpoints

Example: Biochemistry

BT biology

BT chemistry
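In a polyhierarchy a term simply has more than one BT, so it can be reached by more than one browse path. The sketch below lists every path from a top term down to a given term. It is an illustration under stated assumptions: the "sciences" parent is hypothetical, the `paths_to_top` name is invented for this example, and the data is assumed to contain no cycles.

```python
# term -> broader terms (BT). "sciences" is a hypothetical shared parent.
BROADER = {
    "biochemistry": ["biology", "chemistry"],
    "biology": ["sciences"],
    "chemistry": ["sciences"],
    "sciences": [],
}


def paths_to_top(term, broader=BROADER):
    """Return every path from a top term down to `term`.

    A term with two broader terms yields at least two paths.
    Assumes the BT links contain no cycles.
    """
    parents = broader.get(term, [])
    if not parents:
        return [[term]]
    paths = []
    for parent in parents:
        for path in paths_to_top(parent, broader):
            paths.append(path + [term])
    return paths
```

For "biochemistry" this returns one path through biology and one through chemistry, which is the "different viewpoints" advantage the slide describes.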

Page 29: Taxonomy Development and Digital Projects

Associative Relationships

May suggest additional terms for indexing or searching

Between terms in the same hierarchy

• Overlapping sibling terms

• Derivational relationships

Between terms in different hierarchies

• Many types

• Examples: Process/agent; Action/property; Cause/effect

Page 30: Taxonomy Development and Digital Projects

Form of Terms

Single word or compound terms

Grammatical forms:

• Nouns and noun phrases

• Singular / plural

Capitalization

• Predominantly lowercase characters, except for proper names, acronyms, trade names, etc.

Punctuation

Page 31: Taxonomy Development and Digital Projects

Standards

• “Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies,” ANSI/NISO Z39.19-2005

• “Z39.50: A Primer on the Protocol,” ANSI/NISO Z39.50

• “Structured Vocabularies for Information Retrieval. Guide. Definitions, Symbols and Abbreviations,” BS 8723-1:2005

• “Structured Vocabularies for Information Retrieval. Guide. Thesauri,” BS 8723-2:2005

• “Guidelines for the Establishment and Development of Multilingual Thesauri,” ISO 5964-1985

• “Guidelines for the Establishment and Development of Monolingual Thesauri,” ISO 2788-1986

• Web Ontology Language (OWL) Overview

Page 32: Taxonomy Development and Digital Projects

Value Proposition

“40% of corporate users…cannot find the information they need to do their jobs on their intranets.”

Susan Feldman, “The High Cost of Not Finding Information,” KMWorld, March 2004

Value Proposition, or “So what?”

Page 33: Taxonomy Development and Digital Projects

Information retrieval issues within companies

Low productivity

High frustration

Little leverage of information assets

Too many search results

Too many irrelevant hits

The more precise I get, the more I miss

End-user search illiteracy

Multilingual content

Ambiguous results

Page 34: Taxonomy Development and Digital Projects

The controlled vocabulary value proposition

Unlock the value of internal and external content to:

Improve productivity

“Stop searching, start finding”

Reduce cost

Make existing content actionable, not dormant

Avoid reinventing wheels

Gain competitive advantage

Be better informed, act quicker

Page 35: Taxonomy Development and Digital Projects

Controlled vocabulary’s role in portal success

Drive usage

• Improve user experience, leverage portal investment

Drive cultural change

• Help develop a common language

• Support information exchange/reuse

Leverage information management skills

• Turn information officers into information architects

Page 36: Taxonomy Development and Digital Projects

Value Proposition

Taxonomies make it easier to find information so people are more likely to use intranets and extranets. This results in better return on the time and effort already invested in these intranets and extranets.

Taxonomies improve “hit” rates - people find what they need

Everyone has experienced irrelevant results from internet search engines because:

• Two or more words or terms can be used to represent a single concept: salinity/saltiness

• Two or more words that have the same spelling can represent different concepts: Mercury (planet), Mercury (metal), Mercury (automobile)

Taxonomies eliminate much of this problem

People spend less time searching and more time finding

With a common taxonomy across the organization, knowledge can be more readily shared, reused and repurposed

Page 37: Taxonomy Development and Digital Projects

Controlled vocabulary can help reduce costs and increase revenue

Taxonomies can help organizations save money

Reduces the number of hours spent seeking information. Hierarchical relationships allow users to easily narrow or broaden searches as well as look for related information.

Improves productivity by reusing and repurposing content

A taxonomy can help increase revenue

Increase customer satisfaction by improving:

• Search efficiency

• Findability

• Relevance

Provide timely information with up-to-date terminology

Provide more precise information retrieval

Page 38: Taxonomy Development and Digital Projects

THANK YOU!

Laura Dorricott

[email protected]