taxonomies and metadata for content management

Taxonomies and Metadata for Content Management Michael Huff Information Resource Officer U.S. Department of State

Post on 02-Jan-2016

TRANSCRIPT

Page 1: Taxonomies and Metadata for Content Management

Taxonomies and Metadata for Content Management

Michael Huff
Information Resource Officer
U.S. Department of State

Page 2: Taxonomies and Metadata for Content Management

E-Government Act of 2002

• The use of computers and the Internet is rapidly transforming societal interactions and the relationships among citizens, private businesses, and the Government.

• The Federal Government has had uneven success in applying advances in information technology to enhance governmental functions and services, achieve more efficient performance, increase access to Government information, and increase citizen participation in Government.

• Most Internet-based services of the Federal Government are developed and presented separately, according to the jurisdictional boundaries of an individual department or agency, rather than being integrated cooperatively according to function or topic.

Page 3: Taxonomies and Metadata for Content Management

Which U.S. Government organizations are experienced in using metadata & taxonomy tools?

– Defense Intelligence Agency
– USDA Economic Research Service (ERS)
– Federal Aviation Administration
– FirstGov
– NASA
– Small Business Administration
– Social Security Administration
– Department of State

Page 4: Taxonomies and Metadata for Content Management

Terms and Definitions

Metadata: Data about data - a label that describes a content object so unstructured content can be managed like structured content.

Taxonomy: The specification and classification of the names of people, places, things, and everything else that is needed to allow search engines and other content applications to work better.

Facet Classification: Discrete set of elements (or fields) for labeling content and content components.

Controlled Vocabulary: A managed set of terms for which there is an agreed upon value or definition.

Page 5: Taxonomies and Metadata for Content Management

Field | Data Type / Source
Title | string
Creator | string
Identifier | URL
Date | date
Subject | (~10,000 categories)

Diagram labels: Metadata, Taxonomy
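A minimal sketch of how the five fields above might be carried as a record, with Subject checked against a taxonomy. The class name, the sample subject terms, and the example URL are all assumptions made for illustration; they do not come from the presentation.

```python
import datetime
from dataclasses import dataclass, field


@dataclass
class ContentRecord:
    """Metadata label for one content object (field names follow the slide)."""
    title: str
    creator: str
    identifier: str                 # a URL, per the slide
    date: datetime.date
    subjects: list = field(default_factory=list)  # terms from the subject taxonomy


# Hypothetical slice of the ~10,000-category subject taxonomy.
SUBJECT_TAXONOMY = {"Visas", "Passports", "Travel Advisories"}


def invalid_subjects(record):
    """Return subject terms that are not in the controlled taxonomy."""
    return [s for s in record.subjects if s not in SUBJECT_TAXONOMY]


record = ContentRecord(
    title="Applying for a Passport",
    creator="Example Author",
    identifier="http://www.example.gov/passports/apply",
    date=datetime.date(2004, 5, 1),
    subjects=["Passports", "Consular Fees"],
)
print(invalid_subjects(record))
```

The free-text fields stay unconstrained, while Subject is the one field whose values must come from the managed list.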

Page 6: Taxonomies and Metadata for Content Management

Why use metadata?

• Adding metadata to unstructured content allows it to be managed like structured content.

• Enriching content with structured metadata is critical for supporting search and personalized content delivery.

• Content that has been adequately tagged with metadata can be leveraged in usage tracking, personalization and improved searching.

Page 7: Taxonomies and Metadata for Content Management

Where does metadata fit in the information system architecture?

User experience: How content is presented and how users experience and interact with it dictates its perceived and actual value.

Content architecture: Scalable metadata framework to enable content reuse, and handle changes in organizational goals, user needs, and retrieval concerns.

Tools and technology: The information supply-chain platform that enables workflows, and supports organizational and operational concerns.

Page 8: Taxonomies and Metadata for Content Management

Content architecture defined

Scalable metadata framework to enable content reuse, and handle changes in organizational goals, user needs, and retrieval concerns.

Business objectives

Metadata specification

Vocabulary specification

Training

Example taxonomy facets: Content Types, Organization, Audience, Location, Function / Process, Market, Product / Services, Topics

Content Types
- Web Content
- Images
- Code
- Rich Media
- Document Type: General Management, Reports & Documentation, Tracking, Control / Policies & Procedures, Legal & Compliance, Personnel, Learning/Training, Templates & Forms, Public Relations, Models, Meeting, Credit

Organization
- Internal: IT, HR/CRE, PASB, International, ESG, USCO, US Consumer Card, Credit Risk Management, Thrift, Auto Finance, Finance, Investor Relations, Legal, Strategy, Brand, Enterprise Risk Management
- Committees: Executive, Cross-Functional
- External: Contractors, Vendors, Affinity Relationships, Partnerships, Board of Directors

Location
- International: Africa, Asia, Europe, Canada
- US: Idaho, Massachusetts, Texas, Virginia, California, Washington, Florida

Function / Process
- Business Processes: Develop Business Strategy, Develop Products & Services Strategy, Market Products, Process Orders, Service Customers, Manage Customer Relationships, Manage Collections and Recoveries, Staff Services
- Operating / Supporting Processes: Analytical Functions, Communications Functions, Financial Functions, Information Handling Functions, Maintenance, Organizational Functions, Sponsoring, Project Management (SDM, PMM)

Market
- Lines of Business: Partnership Implementation, Under-served, Lifestyle, Cross-Sell, Hispanic, Canada, Young Adults, CRS, E-Commerce, Smile, Small Business
- Asset Type: Sub-Prime, Prime, Super Prime
- Life Phases: Marriage, New U.S. Residents, Young Adults, New Parents, Moving, Divorce, Death

Product / Services
- Card: Credit Card (Classic, Premium, Secured, Small Business, Equity, Others), Debit
- Loans: Auto (PeopleFirst), Home Equity (Full Spectrum, Countrywide, LoanCenter), Medical (AmeriFee), Installment, Home Equity
- Insurance: Auto, Life
- Savings Products: CDs, Money Markets, IRAs
- Product Attributes: Annual Fee, Credit Line Levels, APR, Balance Transfer Rate, Other Benefits

Topics
- Contracts, Credits, Credit Line Management, Fee & Charges, Finance, Financial Institutions, Financial Instruments, Management, Market Strategy, Marketing, Mass Media, Public Relations, Purchasing, Rates, Rates and Rankings, Ratios, Research, Risk, Settlement and Damages, Statistics

Audience
- Internal
  • By Tier
  • Level: Leadership, Associate, Employee, Administrative
  • Associate Type: Phone, Non-Phone
  • Function: Manager, People Manager, Non-Manager
  • Type: Exempt, Non-Exempt
  • Time with Firm: New Employee, Old Employee
- External: Customers, Regulators, Media, Non-Profit, Contractors, Vendors, Affinity Relationships, Partnerships, Board of Directors

Content inventory

Content model

Rules & procedures

Page 9: Taxonomies and Metadata for Content Management

What is Dublin Core?

• Dublin Core is the metadata standard for describing Internet resources so they are easy to find.

Timeline:
1995: Original workshop held in Dublin, Ohio.
2003: Dublin Core approved as ISO 15836.
2004: Shanghai meeting.

For more information: http://www.dublincore.org

Page 10: Taxonomies and Metadata for Content Management

Why is metadata important?

Asset metadata – Who, Where & When: Title, Creator, Publisher, Contributor, Date, Type, Format, Identifier, Source, Language

Subject metadata – What & Why: Subject, Description, Coverage

Relational metadata – Links between and to: Relation

Use metadata – How can it be used: Rights & Permissions

Diagram labels: Complexity; Enabled Functionality; More efficient editorial process; Better navigation & discovery

http://dublincore.org/documents/dcmi-terms/

Page 11: Taxonomies and Metadata for Content Management

What is a taxonomy?

The specification of the names of people, places, things … and everything else that is needed to allow search engines and other content applications to work better.

Linnaeus …

Kingdom: Animalia
Phylum: Chordata
Class: Mammalia
Order: Carnivora
Family: Canidae
Genus: Canis
Species: C. familiaris

UNSPSC …

Segment: 44-Office Equipment and Accessories and Supplies
Family: .12-Office Supplies
Class: .17-Writing Instruments
Commodity: .05-Mechanical pencils
Commodity: .06-Wooden pencils
Commodity: .07-Colored pencils
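As a rough illustration of how a hierarchical code like the UNSPSC excerpt can be walked from broadest to most specific level, here is a small sketch. The dotted-prefix representation and the function name are assumptions chosen for readability; only the level names and labels are taken from the slide.

```python
# Level names as listed on the slide, from broadest to most specific.
LEVELS = ("Segment", "Family", "Class", "Commodity")

# Labels copied from the slide's UNSPSC excerpt, keyed by dotted code prefix.
LABELS = {
    "44": "Office Equipment and Accessories and Supplies",
    "44.12": "Office Supplies",
    "44.12.17": "Writing Instruments",
    "44.12.17.05": "Mechanical pencils",
    "44.12.17.06": "Wooden pencils",
    "44.12.17.07": "Colored pencils",
}


def path_for(code):
    """Expand a dotted code into (level, label) pairs, broadest first."""
    parts = code.split(".")
    path = []
    for depth in range(1, len(parts) + 1):
        prefix = ".".join(parts[:depth])
        path.append((LEVELS[depth - 1], LABELS[prefix]))
    return path


for level, label in path_for("44.12.17.06"):
    print(f"{level}: {label}")
```

Because every code embeds its ancestors, a search application can broaden a query from a commodity to its class or family without a separate lookup table.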

Page 12: Taxonomies and Metadata for Content Management

Sample Recipe Taxonomy

Facet categories, each with its controlled vocabulary:

Main Ingredients: Chocolate, Dairy, Fruits, Grains, Meat & Seafood, Nuts, Olives, Pasta, Spices & Seasonings, Vegetables

Cooking Methods: Advanced, Bake, Broil, Fry, Grill, Marinade, Microwave, No Cooking, Poach, Quick, Roast, Sauté, Slow Cooking, Steam, Stir-fry

Courses: Appetizers, Beverages, Breads, Cheese, Cocktails, Desserts, Fish & Shellfish, Fruit, Hors d'Oeuvres, Meat, Pasta, Salad, Sandwiches, Soup, Vegetables

Meal Type: Breakfast, Brunch, Lunch, Supper, Dinner, Snack

Cuisines: African, American, Asian, Caribbean, Continental, Eclectic/Fusion/International, Jewish, Latin American, Mediterranean, Middle Eastern, Vegetarian

Page 13: Taxonomies and Metadata for Content Management

The power of taxonomy facets

• 4 independent categories of 10 nodes each have the same discriminatory power as one hierarchy of 10,000 nodes (10^4)
• Easier to maintain
• Can be easier to navigate

Main Ingredients: Chocolate, Dairy, Fruits, Grains, Meat & Seafood, Nuts, Olives, Pasta, Spices & Seasonings, Vegetables

Cooking Methods: Advanced, Bake, Broil, Fry, Grill, Marinade, Microwave, No Cooking, Poach, Quick, Roast, Sauté, Slow Cooking, Steam, Stir-fry

Meal Type: Breakfast, Brunch, Lunch, Supper, Dinner, Snack

Cuisines: African, American, Asian, Caribbean, Continental, Eclectic/Fusion/International, Jewish, Latin American, Mediterranean, Middle Eastern, Vegetarian
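The arithmetic behind the facet claim, and the kind of filtering it enables, can be sketched as follows. The placeholder term names, the two sample recipes, and the `matches` helper are assumptions for illustration only.

```python
from itertools import product

# Four hypothetical facets with 10 placeholder terms each.
facets = {
    name: [f"{name}-{i}" for i in range(10)]
    for name in ("ingredient", "method", "meal", "cuisine")
}

# 40 terms to maintain, yet they distinguish 10 * 10 * 10 * 10 combinations.
terms_to_maintain = sum(len(terms) for terms in facets.values())
combinations = sum(1 for _ in product(*facets.values()))
print(terms_to_maintain, combinations)


def matches(item_tags, selection):
    """True when the item carries every facet value the user selected."""
    return all(item_tags.get(f) == v for f, v in selection.items())


recipes = [
    {"name": "Grilled salmon", "ingredient": "Meat & Seafood",
     "method": "Grill", "meal": "Dinner"},
    {"name": "Fruit salad", "ingredient": "Fruits",
     "method": "No Cooking", "meal": "Snack"},
]
hits = [r["name"] for r in recipes
        if matches(r, {"method": "Grill", "meal": "Dinner"})]
print(hits)
```

Each facet is maintained on its own, and a user narrows results by combining selections rather than drilling down one fixed hierarchy.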

Page 14: Taxonomies and Metadata for Content Management

7 Common taxonomy facets

Facet | Definition | Example Source
Products and Services | Names of products and services. | ERP system, Your products and services, etc.
Organization | Organizational structure. | FIPS 95-2, Your organizational structure, etc.
Content Type | Structured list of the various types of content being managed or used. | AGLS Document Type, AAT Information Forms, Records management policy, etc.
Industry | Broad market categories such as lines of business, life events, or industry codes. | FIPS 66, SIC, NAICS, etc.
Location | Place of operations or constituencies. | FIPS 5-2, FIPS 55-3, ISO 3166, US Postal Service, etc.
Function | Functions and processes performed to accomplish mission and goals. | FEA Business Reference Model, Enterprise Ontology, AAT Functions, etc.
Audience | Subset of constituents to whom a piece of content is directed or intended to be used. | GEM, ERIC Thesaurus, IEEE LOM, etc.
Topic | Business topics relevant to your mission and goals. | Federal Register Thesaurus, ERIC Thesaurus, ProQuest, etc.

Personalized content delivery requires defining taxonomy facets

… and re-use of existing vocabulary sources

Page 15: Taxonomies and Metadata for Content Management

High-Level Taxonomy

Facets: Content types, Organization, Audiences, Topics, Functionality and process, Locations, Markets, Product and services

Node labels shown in the diagram: Document, Rich-Media, Web content, Internal, External, Internal, External, Contracts, Credit, Fees and charges, Finance, Competition, Financial institutions, Financial instruments, Management, Market strategy, Marketing, Product design, Customer acquisition, Credit policies, Risk management, Collection practices, Retention process, Cross-selling, Project management, Governance, Testing, Contractors, Vendors, Customer, Vendors, Regulators, Contractors, Media, US, City, Country, Provinces, States, LOB, Life events, Demographics, Credit cards, Insurance, Loans, Financial Services, Source code, Suppliers, Partners, International, Military

Applying the facets to the Dublin Core metadata elements

Dublin Core Element | Definition | Vocabulary Source
Title | Resource name. | Not applicable
Creator | Content maker. | LDAP
Subject | Content topic. | Keyword Topic facet
Description | Description of content, summary. | Not applicable
Publisher | Publisher of this manifestation. | Agency facet
Contributor | Content contributor. | LDAP
Date | Content lifecycle event for this manifestation. | Not applicable
Type | Genre. | Form Type facet
Format | Format of this manifestation. | RFC 2045
Identifier | Reference for this manifestation, e.g., URL. | Not applicable
Source | Source from which this manifestation has been derived. | Not applicable
Language | Language of this manifestation. | ISO 639
Relation | Reference to related resource. | None
Coverage | Space, period, date, jurisdiction, etc. | Jurisdiction facet
Rights | Who has rights to use this manifestation. | Privacy level
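One way to read the table above is that each element either takes free text or draws its values from a named vocabulary. The sketch below checks a record on that basis. The sample vocabulary contents, the lower-case element keys, and the `validate` function are illustrative assumptions, not part of the presentation.

```python
# Controlled-vocabulary checks per element. Elements marked "Not applicable"
# on the slide are treated as free text and are not validated here.
VOCABULARIES = {
    "subject": {"Contracts", "Finance", "Marketing"},  # sample Topic facet terms
    "language": {"en", "es", "fr"},                    # sample ISO 639 codes
    "format": {"text/html", "application/pdf"},        # sample MIME types (RFC 2045)
}


def validate(record):
    """Return, per element, the values that are not in its vocabulary."""
    errors = {}
    for element, values in record.items():
        allowed = VOCABULARIES.get(element)
        if allowed is None:
            continue  # free-text element, nothing to check
        bad = [v for v in values if v not in allowed]
        if bad:
            errors[element] = bad
    return errors


record = {
    "title": ["Credit policy overview"],
    "subject": ["Finance", "Pricing"],
    "language": ["en"],
    "format": ["application/pdf"],
}
print(validate(record))
```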

Applied taxonomy metadata facilitates a multi-faceted view of content


Page 16: Taxonomies and Metadata for Content Management

Facets at work on FirstGov site

Facets shown: Organization, Content Type, Frequency, Audience

http://www.firstgov.gov

Page 17: Taxonomies and Metadata for Content Management

http://www.tesco.com/winestore

Guided Navigation
2-3 clicks to product
No dead ends


Page 20: Taxonomies and Metadata for Content Management

Seven practical rules for taxonomies

1. Incremental, extensible process that identifies and enables owners, and engages stakeholders.

2. Quick implementation that provides measurable results as quickly as possible.

3. Not monolithic—has separately maintainable facets.

4. Re-uses existing IP as much as possible.

5. A means to an end, and not the end in itself.

6. Not perfect, but it does the job it is supposed to do—such as improving search and navigation.

7. Improved over time, and maintained.

Page 21: Taxonomies and Metadata for Content Management

What is the general purpose of the content you are managing?

What types of content are you handling?

Who is the audience for this content?

What are the core organizational objectives that the content is related to?

Page 22: Taxonomies and Metadata for Content Management

• Creating a taxonomy is only part of the job
• How will it be put to use?
• In a new application, or by modifying an existing application?
• What’s the effort around that?
• Additional Issues
• Tagging – Who will add the metadata and how?

Example uses:

Link to Bios from Personal Names

Link to info on Countries

Link to company data (quotes, news, ...) from Company names

Alerts on People, Companies, and Topics

Browse by Topic

Page 23: Taxonomies and Metadata for Content Management

1 Identify Objectives: Conduct interviews

2 Inventory Content: ID sources, spider assets & extract metadata

3 Specify Metadata: Define fields & purpose

4 Model Content: Define content chunks & XML DTDs

5 Specify Vocabularies: Compile controlled vocabularies

6 Specify Procedures: Develop workflow, rules & procedures

7 Train Staff: Develop materials & train staff

Page 24: Taxonomies and Metadata for Content Management

Task 1 – Identify objectives

What do you do? What kinds of digital assets are being produced? For what audiences?

What is the business process for submitting, selecting, editing, maintaining digital assets?

How many digital assets are there? How fast is this growing?

Are there particular industry or other standards that are important?

What types of assets are hard to search for (that should be easier to find)?

What tools would be helpful in locating assets? Acronyms? Abbreviations? Nicknames? Glossary? Thesaurus? Taxonomy?

Who else should we be talking to?

Page 25: Taxonomies and Metadata for Content Management

Task 2 – Inventory content

1. Path/URL: Identify target asset file path/URL.

2. Spider-generated: Automatically generate inventory metadata by crawling file stores.

3. Audit process: Audit assets using inventory.

4. New facets: Enhance metadata with new facets.
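The crawl in step 2 and the facet slot in step 4 might look like the sketch below for a plain file store. The column names and the empty `audience` facet are assumptions; a real inventory would follow whatever fields the metadata specification calls for.

```python
import mimetypes
import os
import tempfile
from datetime import datetime, timezone


def inventory(root):
    """Walk a file store and return one basic metadata row per file."""
    rows = []
    for folder, _dirs, files in os.walk(root):
        for name in sorted(files):
            path = os.path.join(folder, name)
            stat = os.stat(path)
            rows.append({
                "path": path,
                "format": mimetypes.guess_type(name)[0] or "unknown",
                "size_bytes": stat.st_size,
                "modified": datetime.fromtimestamp(
                    stat.st_mtime, tz=timezone.utc).isoformat(),
                "audience": None,  # new facet, to be filled in during the audit
            })
    return rows


# Demo against a throwaway directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "report.pdf"), "wb") as fh:
        fh.write(b"%PDF-")
    rows = inventory(root)
    print(rows[0]["format"], rows[0]["size_bytes"])
```

Only the machine-derivable columns are filled by the crawl; the facet columns stay empty until a person or a tagging tool audits the asset.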

Page 26: Taxonomies and Metadata for Content Management

Task 3 – Specify metadata

Element | Data Type | Length | Req. / Repeat | Source | Purpose
Identifier | String | 48 chars | 1 | System supplied | Basic accountability
Author | String | Variable | * | LDAP validated | Credits
Title | String | Variable | ? | User | Text search, results display
Embargo Date | Date | Fixed | ? | System | Obey rights
Description | String | Variable | ? | User | Text search, results display
Asset Type | List | Fixed | 1 | Asset Types vocabulary | Browse or group search results
Subject | | | | |
Audience | List | Fixed | * | Audience vocabulary | Custom interface for group of users
Location | List | Fixed | * | ISO 3166 | Filter or rank search results
Organization | List | Fixed | * | Organization vocabulary | Key index to retrieve & aggregate assets
...

Legend: ? – 1 or more; * – 0 or more
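A specification like the one above can be enforced mechanically. The sketch below checks only the Req. / Repeat column. It follows the slide's legend for "?" and "*" as written, and it assumes that "1" means exactly one value, which the legend does not state. The lower-case element keys and the `check` function are hypothetical.

```python
# Cardinality codes as (minimum, maximum) value counts; None means no upper limit.
# "?" and "*" follow the slide's legend; "1" = exactly one is an assumption.
CARDINALITY = {
    "1": (1, 1),
    "?": (1, None),
    "*": (0, None),
}

# A few rows from the specification table.
SPEC = {
    "identifier": "1",
    "author": "*",
    "title": "?",
    "asset_type": "1",
    "audience": "*",
}


def check(record):
    """Return a message for every element whose value count breaks the spec."""
    problems = []
    for element, code in SPEC.items():
        low, high = CARDINALITY[code]
        count = len(record.get(element, []))
        if count < low or (high is not None and count > high):
            problems.append(f"{element}: expected {code}, found {count} value(s)")
    return problems


record = {"identifier": ["A-001"], "title": [], "asset_type": ["Report", "Form"]}
print(check(record))
```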

Page 27: Taxonomies and Metadata for Content Management

Task 4 – Model content

Factor asset types from inventory into canonical types.

Select examples from inventory (possibly with spider).

Identify useful chunks for each asset type.

Factor chunks into element superset.

Identify relationships between chunks.

Iterate until stakeholders agree on asset types, elements, and relationships.

Example page chunks (diagram labels): Header area, Left navigation area, Main content area, Footer area

Page 28: Taxonomies and Metadata for Content Management

Task 5 – Specify vocabularies

Develop broad taxonomy outline (1-3 levels deep)

Review, revise, and approve taxonomy outline with stakeholders and subject matter experts.

Fill in taxonomy outline

Tag random samples from content inventory

Review, revise, and approve draft taxonomy with stakeholders and subject matter experts.

Page 29: Taxonomies and Metadata for Content Management

Task 6 – Specify procedures

Develop taxonomy style rules, and ensure that the taxonomy follows them.

Develop tagging rules and procedures, along with software to assist in the task.

Specify taxonomy maintenance process and the update procedures to follow.

Page 30: Taxonomies and Metadata for Content Management

Task 6 – Governance & Maintenance

Recommendations by Editor
1 Small taxonomy changes (labels, synonyms)
2 Large taxonomy changes (retagging, application changes)
3 New ‘best bets’ content

Committee considerations
1 Business Goals
2 Change in user experience
3 Retagging cost

The taxonomy must be changed over time.

Suggestions for changes can come from users, through query log analysis, and from staff, through a feedback form.

Governance structure needed to make sure changes are justified.

Diagram labels: End User, Steering Committee, Firewall, Taxonomy, Content, Application Logic, Tagging Logic, Application UI, Tagging UI, Tagging Staff, Taxonomy Editor, Staff notes on ‘missing’ concepts, Query log analysis
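Query log analysis, in its simplest form, means counting what users search for and flagging frequent queries that match nothing in the taxonomy. The sketch below assumes the log is a plain list of query strings and that exact matching on lower-cased labels is good enough; both are simplifications, and the term list is hypothetical.

```python
from collections import Counter

# Hypothetical taxonomy labels and synonyms, lower-cased for matching.
KNOWN_TERMS = {"passport", "visa", "travel advisory"}


def missing_concepts(queries, min_count=2):
    """Return frequent queries that match no known term, most frequent first."""
    counts = Counter(q.strip().lower() for q in queries)
    return [(q, n) for q, n in counts.most_common()
            if q not in KNOWN_TERMS and n >= min_count]


log = ["visa", "Green card", "green card", "passport",
       "green card ", "embassy hours"]
print(missing_concepts(log))
```

The resulting list is a set of candidates for the taxonomy editor to bring to the steering committee, not an automatic change to the taxonomy.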

Page 31: Taxonomies and Metadata for Content Management

Task 6 – Steering Committee Roles

Business Lead

Keeps committee on track with larger business objectives

Balances cost/benefit issues to decide appropriate levels of effort

Specialists help in estimating costs

Obtains needed resources if those in committee can’t accomplish a particular task

Technical Specialist

Estimates costs of proposed changes in terms of amount of data to be retagged, additional storage and processing burden, software changes, etc.

Helps obtain data from various systems

Content Specialist

Committee’s liaison to content creators

Estimates costs of proposed changes in terms of editorial process changes, additional or reduced workload, etc.

Taxonomy Specialist

Suggests potential taxonomy changes based on analysis of query logs, indexer feedback

Makes edits to taxonomy, installs into system with aid of IT specialist

Content Owner

Reality check on process change suggestions

Page 32: Taxonomies and Metadata for Content Management

Task 7 – Train staff

Staff will require training on:

The UI they use to tag the content

The rules to follow when deciding what codes to apply

The end-effect of the codes they apply

The structure of the taxonomy

Tagging examples come from the content inventory

Hardcopies of the taxonomy, and yellow highlighters, are helpful during training

Indexing rules

Rule | Description
Specificity rule | Apply the most specific terms when tagging assets. Specific terms can always be generalized, but generic terms cannot be specialized.
Repeatable rule | All attributes should be repeatable. Use as many terms as necessary to describe What the asset is about and Why it is important. Storage is cheap. Re-creating content is expensive.
Appropriateness rule | Not all attributes apply to all assets. Only supply values for attributes that make sense.
Usability rule | Anticipate how the asset will be searched for in the future, and how to make it easy to find it. Remember that search engines can only operate on explicit information.
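The specificity rule works because a specific tag can be generalized mechanically by walking up the taxonomy, while the reverse is not possible. A minimal sketch, assuming a simple child-to-parent mapping (the labels are borrowed from the UNSPSC excerpt earlier in the deck; the function name is hypothetical):

```python
# Child -> parent links for a small slice of a taxonomy.
PARENT = {
    "Mechanical pencils": "Writing Instruments",
    "Writing Instruments": "Office Supplies",
    "Office Supplies": None,
}


def with_broader_terms(term):
    """Return the term followed by each of its ancestors, nearest first."""
    chain = []
    current = term
    while current is not None:
        chain.append(current)
        current = PARENT[current]
    return chain


print(with_broader_terms("Mechanical pencils"))
```

An asset tagged only "Office Supplies" carries no information that would let software recover "Mechanical pencils", which is why taggers are asked to start from the most specific term.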

Indexing UI

Page 33: Taxonomies and Metadata for Content Management

What about Automatic Categorization?

• Automatic vs. Manual Categorization is a cost/benefit tradeoff
  – Semi-automated recommended over pure manual in production situations.
  – Automatic performance not bad, but not equal to trained manual tagging.
    • Software is not sane, so errors look crazy.
  – Large backlogs of content can’t justify investment of high-quality manual tagging
    • Old articles rarely accessed.
    • Recommend automated bulk tagging with error reporting and correction process.
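Semi-automated tagging usually means that software proposes a category and a person confirms the doubtful cases. The sketch below uses naive keyword counting purely to show the routing; the rules, the threshold, and the function name are assumptions, and production categorizers are considerably more sophisticated.

```python
from typing import Optional, Tuple

# Hypothetical keyword rules: category -> trigger words.
RULES = {
    "Passports": {"passport", "renewal"},
    "Visas": {"visa", "consulate"},
}


def suggest(text: str, threshold: int = 2) -> Tuple[Optional[str], bool]:
    """Return (best category, needs_review).

    Weak or missing matches are flagged so a person can correct them.
    """
    words = set(text.lower().split())
    scores = {cat: len(words & triggers) for cat, triggers in RULES.items()}
    best = max(scores, key=scores.get)
    if scores[best] == 0:
        return None, True
    return best, scores[best] < threshold


for doc in ("passport renewal form", "visa fees", "holiday schedule"):
    print(doc, "->", suggest(doc))
```

Confident suggestions are applied in bulk, while everything flagged for review feeds the error reporting and correction process.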

Page 34: Taxonomies and Metadata for Content Management

What about automatically-created taxonomies?

Typically a single hierarchy with no overall plan

Results hard for people to navigate

What about automatic categorization?

Accuracy close to human levels, but errors are very different

Cost/benefit tradeoff

Semi-automation is best practice

Page 35: Taxonomies and Metadata for Content Management

Enterprise taxonomy maintenance workflow

Workflow diagram. Roles: Analyst, Editor, Copywriter, Sys Admin. Steps: Suggest new name/category; Review new name; Copy edit new name; Add to enterprise Taxonomy. Decision points: "Problem?" (Yes / No), shown twice. Systems: Taxonomy Tool, Taxonomy.

Page 36: Taxonomies and Metadata for Content Management

Categorize with a purpose

What is the problem you are trying to solve?

Improve search

Browse for content on an enterprise-wide portal

Enable users to syndicate content

Otherwise provide the basis for content re-use

How will you control the cost of creating and maintaining the metadata needed to solve these problems?

CMS with metadata tagging products

Semi-automated classification

Taxonomy editing tools

Guided navigation tools

Page 37: Taxonomies and Metadata for Content Management

How do you sell it?

Don’t sell the taxonomy, sell the vision of what you want to be able to do

Clearly understand what the problem is and what the opportunities are

Costs and benefits

Design the taxonomy in relation to the value at hand

Page 38: Taxonomies and Metadata for Content Management

Internet Resources

Page 39: Taxonomies and Metadata for Content Management

U.S. Government Resources

Page 40: Taxonomies and Metadata for Content Management

http://www.nasa.gov/home/index.html

Page 41: Taxonomies and Metadata for Content Management

http://pub-lib.jpl.nasa.gov/pub-lib/dscgi/ds.py/View/Collection-10

Page 42: Taxonomies and Metadata for Content Management

http://www.loc.gov/flicc/wg/taxonomy.html

Page 43: Taxonomies and Metadata for Content Management

http://www.loc.gov/lexico/servlet/lexico/

Page 44: Taxonomies and Metadata for Content Management

http://www.archives.gov/federal_register/code_of_federal_regulations/thesaurus.html

Page 45: Taxonomies and Metadata for Content Management

http://feapmo.gov/

Page 46: Taxonomies and Metadata for Content Management

http://www.km.gov/

Page 47: Taxonomies and Metadata for Content Management

Other Resources

Page 48: Taxonomies and Metadata for Content Management

http://www.educause.edu/asp/taxonomy/show_taxonomy_links.asp?TREE=1&EXPAND=1

Page 49: Taxonomies and Metadata for Content Management

http://databases.unesco.org/thesaurus/

Page 50: Taxonomies and Metadata for Content Management

http://www.naa.gov.au/recordkeeping/control/functions_thesaur/contents.html

Page 51: Taxonomies and Metadata for Content Management

http://www.taxonomystrategies.com/html/bibliography.htm

Page 52: Taxonomies and Metadata for Content Management

Summary

Why taxonomies?

Why metadata?

Page 53: Taxonomies and Metadata for Content Management

Shiyali Ramamrita Ranganathan

Page 54: Taxonomies and Metadata for Content Management

Ranganathan’s Five Laws of Library Science

1. Books are for use (They don't belong on the shelf)

2. Books are for all; every reader his book (Every reader is unique)

3. Every book its reader (Every book is unique)

4. Save the time of the reader (Make libraries easy to use)

5. A library is a growing organism (Libraries are constantly changing to meet changing patron needs)

Page 55: Taxonomies and Metadata for Content Management

Thank you

Michael Huff
Information Resource Officer
U.S. Department of State