Taxonomy Strategies LLC

May 16, 2005 Copyright 2005 Taxonomy Strategies LLC. All rights reserved.

FAQs About Taxonomies & Metadata

Joseph A. Busch & Ron Daniel, Jr.

Taxonomy Strategies LLC: The business of organized information

Agenda

9:00 Who are we?

9:10 What are taxonomies & metadata?

9:30 What kinds of taxonomies are there, and what do I need?

9:40 How do I get a good taxonomy?

10:05 How do I associate the taxonomy with content?

10:30 Break

10:45 What do taxonomies and metadata have to do with search?

11:15 How can I sell my management on a taxonomy project?

11:45 Any more questions?

12:00 Adjourn

Who is Joseph Busch?

Over 25 years in the business of organized information
  Founder, Taxonomy Strategies
  Director, Solutions Architecture, Interwoven
  VP, Infoware, Metacode Technologies
  Program Manager, Getty Foundation
  Manager, Pricewaterhouse

Metadata and taxonomies community leadership
  President, American Society for Information Science & Technology
  Director, Dublin Core Metadata Initiative
  Adviser, National Research Council Computer Science and Telecommunications Board
  Reviewer, National Science Foundation Division of Information and Intelligent Systems
  Founder, Networked Knowledge Organization Systems/Services

Who is Ron Daniel, Jr.?

Over 15 years in the business of metadata & automatic classification
  Principal, Taxonomy Strategies
  Standards Architect, Interwoven
  Senior Information Scientist, Metacode Technologies
  Technical Staff Member, Los Alamos National Laboratory

Metadata and taxonomies community leadership
  Chair, PRISM (Publishers Requirements for Industry Standard Metadata) working group
  Acting Chair, XML Linking working group
  Member, RDF working groups
  Co-editor, PRISM, XPointer, 3 IETF RFCs, and Dublin Core 1 & 2 reports.

Who has Taxonomy Strategies worked with?

Government
  Commodity Futures Trading Commission; Defense Intelligence Agency; ERIC; Federal Aviation Administration; Federal Reserve Bank of Atlanta; Forest Service; GSA Office of Citizen Services (www.firstgov.gov); Head Start; Infocomm Development Authority of Singapore; NASA (nasataxonomy.jpl.nasa.gov); Small Business Administration; Social Security Administration; USDA Economic Research Service; USDA e-Government Program (www.usda.gov)

International orgs & Non-profits
  CEN; IDEAlliance; IMF; OCLC

Commercial
  Allstate Insurance; Blue Shield of California; Debevoise & Plimpton; Halliburton; Hewlett Packard; Motorola; PeopleSoft; PricewaterhouseCoopers; Siderean Software; Sprint; Time Inc.

Commercial subcontracts
  Agency.com – Top financial services
  Critical Mass – Fortune 50 retailers
  Deloitte Consulting – Big credit card
  Gistics/OTB – Direct selling giant


What we do

Organize Stuff


Who are you? What do you want out of today?

Government / NGO / SME / Global 2000?

IT / Library & IM / Public Affairs / Product Management / Engineering / HR & Finance / Other?

Webmaster / Technical / Researcher / Editorial / Supervisory / Executive?

Competing session – Search & Content Management: Putting the Puzzle Pieces Together. What brought you HERE instead of THERE?


What is metadata? Different definitions

Library & Information Science
  Author/Title/Subject
  Controlled Vocabularies for Subject Codes (e.g. Dewey)
  Authority Files for Author Names

Database
  Tables/Columns/Datatypes/Relationships
  References for some values

What is metadata? Another view of Dublin Core

Asset metadata – Who, Where & When: Title, Creator, Publisher, Contributor, Date, Type, Format, Identifier, Source, Language

Subject metadata – What & Why: Subject, Description, Coverage

Relational metadata – Links between and to: Relation

Use metadata – How can it be used: Rights & Permissions

[Figure axis labels: Functionality; Difficult to Generate]

Better resource description = Better navigation & discovery

Are there extensions to the Dublin Core?

Elements: 1. Identifier 2. Title 3. Creator 4. Contributor 5. Publisher 6. Subject 7. Description 8. Coverage 9. Format 10. Type 11. Date 12. Relation 13. Source 14. Rights 15. Language

Refinements: Abstract, Access rights, Alternative, Audience, Available, Bibliographic citation, Conforms to, Created, Date accepted, Date copyrighted, Date submitted, Education level, Extent, Has format, Has part, Has version, Is format of, Is part of, Is referenced by, Is replaced by, Is required by, Issued, Is version of, License, Mediator, Medium, Modified, Provenance, References, Replaces, Requires, Rights holder, Spatial, Table of contents, Temporal, Valid

Encodings: Box, DCMIType, DDC, IMT, ISO3166, ISO639-2, LCC, LCSH, MESH, Period, Point, RFC1766, RFC3066, TGN, UDC, URI, W3CDTF

Types: Collection, Dataset, Event, Image, Interactive Resource, Moving Image, Physical Object, Service, Software, Sound, Still Image, Text

What is metadata: A scheme for recipes

Element | Data Type | Length | Source | Purpose

Asset Metadata
Unique ID | Integer | Fixed | System supplied | Basic accountability
Recipe Title | String | Variable | Licensed Content | Text search & results display
Recipe summary | String | Variable | Licensed Content | Content
Main Ingredients | List | Variable | Main Ingredients vocabulary | Key index to retrieve & aggregate recipes, & generate shopping list

Subject Metadata
Meal Types | List | Variable | Meal Types vocab | Browse or group recipes & filter search results
Cuisines | List | Variable | Cuisines |
Courses | List | Variable | Courses vocab |
Cooking Method | Flag | Fixed | Cooking vocab |

Link Metadata
Recipe Image | Pointer | Variable | Product Group | Merchandize products

Use Metadata
Rating | String | Variable | Licensed Content | Filter, rank, & evaluate recipes
Release Date | Date | Fixed | Product Group | Publish & feature new recipes

What is a taxonomy?

Systematics view (Linnaeus …): Biological taxonomy places an organism in one and only one place.
  Kingdom: Animalia
  Phylum: Chordata
  Class: Mammalia
  Order: Carnivora
  Family: Canidae
  Genus: Canis
  Species: C. familiaris

Pragmatic view: But most of the time things belong to more than one category.
  Categories shown: Pets, Dogs, Farm Animals, Mammals


Are there other organizational schemes?

Type | Remarks
Synonym Ring | Connects a series of terms together. Treats them as equivalent for search purposes.
Authority File | Used to control variant names with a preferred term. Typically used for names of countries, individuals, organizations.
Classification Scheme | An arrangement of knowledge. Does not follow taxonomy rules. Usually enumerated; e.g., LC or Dewey.
Thesaurus | Expresses semantic relationships of: Hierarchy (broader & narrower terms), Equivalence (synonyms), Associative (related terms).
Ontology | Resembles faceted taxonomy but uses richer semantic relationships among terms and attributes and strict specification rules.
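
The two simplest schemes above can be sketched in a few lines of Python. This is only an illustration: the vocabularies, the names `SYNONYM_RING` and `AUTHORITY_FILE`, and both helper functions are hypothetical and do not come from the slides.

```python
# Hypothetical vocabularies for illustration only.
SYNONYM_RING = [{"car", "automobile", "auto"}]
AUTHORITY_FILE = {
    "usa": "United States",
    "u.s.": "United States",
    "united states": "United States",
}

def expand_query(term, rings=SYNONYM_RING):
    """Synonym ring: return every term treated as equivalent to `term`."""
    term = term.lower()
    for ring in rings:
        if term in ring:
            return set(ring)
    return {term}

def preferred_name(variant, authority=AUTHORITY_FILE):
    """Authority file: map a variant name to its preferred term."""
    return authority.get(variant.lower(), variant)
```

A search front end might call `expand_query` before querying the index, and a tagging form might call `preferred_name` before saving a value.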

Another point of view ….

[Figure: organizational schemes placed on a scale from Simple to Complex. Labels shown: Synonym Rings, Authority Files, Thesauri, Classification Schemes, Taxonomies, Ontologies. Axis groupings: (Vocabularies) and (Relationships): Equivalence, Hierarchical, Associative.]

Source: Amy Warner. Metadata and Taxonomies for a More Flexible Information Architecture (http://www.lexonomy.com/presentations/metadataAndTaxonomies.ppt)

Taxonomic metadata – e-Forms example

Metadata Elements: Agency, Form Type, Topic, Audience, Industry Impact, Jurisdiction, BRM Impact, Keyword

Taxonomies

Agency: 0001 Legislative; 1000 Judicial; 1100 Executive Office of Pres; 0003 Exec Depts; 1200 Agriculture; 1300 Commerce; 9700 Defense; 9100 Education; 8900 Energy; 7500 HHS; 7000 DHS; 8600 HUD; 1400 Interior; 1500 Justice; 1600 Labor; 1900 State; 6900 Transport; 2000 Treasury; 3600 Veterans; Ind Agencies; Intl Orgs

Form Type: Application; Approval; Claim; Information request; Information submission; Instructions; Legal filing; Payment; Procurement; Renewal; Reservation; Service request; Test; Other input; Other transaction

Topic: Agriculture & food; Commerce; Communications; Education; Energy; Env pro; Foreign rels; Govt; Health & safety; Housing & comm dev; Labor; Law; Named grps; National def; Nat resources; Recreation; Sci & tech; Social pgms; Transport

Audience: All; General; Citizen; Business; Govt; Employee; Native American; Non-resident; Tourist; Special group

Industry Impact: 00 Generic; 11 Agriculture; 21 Mining; 22 Utilities; 23 Construct; 31-33 Manuf; 42 Wholesale; 44-45 Retail; 48-49 Trans; 51 Info; 52 Finance; 54 Profession; 55 Mgmt; 56 Support; 61 Education; 62 Health Care; 71 Arts; 72 Hospitality; 81 Other Services; 92 Public Admin

Jurisdiction: Federal; State +; Local +; Other +

BRM Impact: Citizen Srvcs; Social Srvs; Defense; Disasters; Econ Dev; Education; Energy; Env Mgmt; Law Enf; Judicial; Correctional; Health; Security; Income Sec; Intelligence; Intl Affairs; Nat Resour; Transport; Workforce; Science; Delivery; Support; Management

Why use faceted taxonomies?

4 independent categories of 10 nodes each have the same discriminatory power as one hierarchy of 10,000 nodes (10^4)
Easier to maintain
Can be easier to navigate
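
The arithmetic behind this claim can be checked with a short sketch. The facet names and placeholder values below are hypothetical; only the shape (4 facets of 10 values each) comes from the slide.

```python
from itertools import product

# Hypothetical facets: 4 independent facets with 10 placeholder values each.
facet_names = ("Topic", "Audience", "Form Type", "Agency")
facets = {name: [f"{name}-{i}" for i in range(10)] for name in facet_names}

# Terms someone has to maintain: 10 + 10 + 10 + 10.
nodes_to_maintain = sum(len(values) for values in facets.values())

# Distinct combinations a document can be tagged with: 10 * 10 * 10 * 10.
combinations = sum(1 for _ in product(*facets.values()))
```

Here `nodes_to_maintain` is 40 while `combinations` is 10,000, which is the maintenance argument for facets in miniature.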

Agenda

9:00 Who are we?
9:10 What are taxonomies & metadata?
9:30 What kinds of taxonomies are there, and what do I need?
9:40 How do I get a good taxonomy?
  Can I get a taxonomy off-the-shelf or create one with software?
  How do you know it is good?
  How do you build or modify to make it good?
10:05 How do I associate the taxonomy with content?
10:30 Break
10:45 What do taxonomies and metadata have to do with search?
11:15 How can I sell my management on a taxonomy project?
11:45 Any more questions?
12:00 Adjourn

How do I get a good Taxonomy? – Seven practical rules

1) Incremental, extensible process that identifies and enables users, and engages stakeholders.

2) Quick implementation that provides measurable results as quickly as possible.

3) Not monolithic—has separately maintainable facets.

4) Re-uses existing IP as much as possible.

5) A means to an end, and not the end in itself.

6) Not perfect, but it does the job it is supposed to do—such as improving search and navigation.

7) Improved over time, and maintained.

Can I get a taxonomy off the shelf?

Sure: www.taxonomywarehouse.com
  There are usually license fees, but they will be less than the effort to develop an equivalent taxonomy.
  The voice of experience says these will usually not be what you want.

We recommend:
  Adopt a faceted approach.
  Reuse existing (esp. internal) vocabularies for as many of the facets as reasonable.
  Plan on doing full-custom “Content Type” and “Subject” taxonomies.

Sources for 8 common taxonomies

Taxonomy | Definition | Potential Sources
Organization | Organizational structure. | FIPS 95-2, U.S. Government Manual, Your organizational structure, etc.
Content Type | Structured list of the various types of content being managed or used. | DC Types, AGLS Document Type, AAT Information Forms, Your records management policy, etc.
Industry | Broad market categories such as lines of business, life events, or industry codes. | FIPS 66, SIC, NAICS, Your market segments, etc.
Location | Place of operations or constituencies. | FIPS 5-2, FIPS 55-3, ISO 3166, UN Statistics Div, US Postal Service, Your sales regions, etc.
Function | Functions and processes performed to accomplish mission and goals. | FEA Business Reference Model, Enterprise Ontology, AAT Functions, Your business functions, etc.
Topic | Business topics relevant to your mission & goals. | Federal Register Thesaurus, NAL Agricultural Thesaurus, LCSH, Your research areas, etc.
Audience | Subset of constituents to whom a piece of content is directed or intended to be used. | GEM, ERIC Thesaurus, IEEE LOM, Your psycho-graphics or personas, etc.
Products & Services | Names of products/programs & services. | ERP system, Your products and services, etc.

What about automatically created taxonomies?

Documents can be ‘clustered’ based on similarities and differences.

Problems:
  Typically only a single hierarchy
  No overall plan
  Results hard for people to navigate

What does “North” mean on this map?

What should I expect from automatic taxonomy construction software?

Software can scan large quantities of content and extract statistically significant words and phrases.
  Example: Archive of 10 publications was analyzed for topics significant to ‘copyright’.

Software does a poor job of
  de-duplication
  turning those significant words and phrases into a larger structure
  discriminating between gold and garbage

Software is good for
  getting an understanding of the key phrases in a large amount of content
  providing test cases for evaluating a taxonomy

Source: Sample data courtesy of Randy Marcinko and nStein.

How can I test a Taxonomy? – Qualitative methods

Method | Process | Validation
Walk-throughs | Show and explain | Approach; Consistency to rules; Appropriateness to task
Usability Testing | Contextual analysis (card sorting, scenario testing, etc.) | Tasks are completed successfully; Time to complete task is reduced
User Satisfaction | Survey | Reaction to new interface; Reaction to search results
Tagging samples | Tag sample content with taxonomy | Content ‘fit’; Fills out content inventory; Training materials for people & algorithms; Basis for quantitative methods

Quantitative Method – How evenly does it divide the content?

Background:
  Documents do not distribute uniformly across categories
  Zipf (1/x) distribution is expected behavior
  80/20 rule in action (actually 70/20 rule)

Methodology:
  Part of alpha test of ‘content type’ for corporate intranet
  115 URLs selected at random from search index were manually categorized.
  Inaccessible files and ‘junk’ were removed

Results:
  Results were slightly more uniform than the Zipf distribution, which is better than expected

[Figure: Measured and Expected Distribution of Content Types in an Intranet. X axis: Content Type (People, Groups & Places; News & Events; Manuals & Learning Materials; Operations & Internal Communications; Marketing & Sales; Regulations, Policies, Procedures & Papers & Presentations; Other & Unclassified; Programs, Proposals, Plans & Schedules). Y axis: # Documents (0 to 25). Series: Measured, Expected.]

[Figure: Measured and Expected Distribution of Top 10 Content Types in Library of Congress Database. X axis: Top 10 Content Types (Congresses; Biography; Periodicals; Maps; Fiction; Exhibitions; Juvenile literature; Bibliography; Statistics). Y axis: Number of Records (0 to 350,000). Series: Series2, Series1.]
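
The "expected" series in charts like these can be approximated with a short sketch, assuming a plain 1/x weighting by rank. The totals used below (100 documents, 9 categories) are hypothetical and are not the figures behind the slide's charts.

```python
def zipf_expected(total_docs, n_categories):
    """Expected document count per rank under a 1/x (Zipf) distribution."""
    weights = [1 / rank for rank in range(1, n_categories + 1)]
    scale = total_docs / sum(weights)
    return [w * scale for w in weights]

def top_share(counts, k):
    """Fraction of all documents that fall in the k largest categories."""
    ranked = sorted(counts, reverse=True)
    return sum(ranked[:k]) / sum(ranked)

# Hypothetical sample: 100 categorized documents across 9 content types.
expected = zipf_expected(100, 9)
```

Comparing `top_share(measured, k)` with `top_share(expected, k)` for a hand-tagged sample gives a rough sense of whether the taxonomy divides content more or less evenly than Zipf would predict.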

Quantitative Method – How intuitive (repeatable) are the categorizations?

Methodology: Closed Card Sort
  For alpha test of a grocery site
  15 testers put each of 100 best-selling products into one of 10 pre-defined categories
  Products that fewer than 14 of 15 testers put into the same category were flagged

Results:

% of Testers | Cumulative % of Products
15/15 | 54%
14/15 | 70%
13/15 | 77%
12/15 | 83%
11/15 | 85%
<11/15 | 100%

In the trade, “Corn Tortillas” are a Dairy item!

“Cocoa Drinks – Powder” is best categorized in both “Beverages” and “Grocery”.
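
A closed card sort like this one can be scored with a few lines of Python. The sketch below assumes each product maps to the list of categories its testers chose; the sample placements, the "Bakery" category, and the function names are hypothetical.

```python
from collections import Counter

def agreement(placements):
    """Fraction of testers who chose the most popular category for a product."""
    top_count = Counter(placements).most_common(1)[0][1]
    return top_count / len(placements)

def flag_products(results, threshold=14 / 15):
    """Products whose agreement falls below the threshold, sorted by name."""
    return sorted(p for p, placed in results.items() if agreement(placed) < threshold)

# Hypothetical placements by 15 testers.
results = {
    "Milk": ["Dairy"] * 15,
    "Corn Tortillas": ["Bakery"] * 9 + ["Dairy"] * 6,
    "Cocoa Drinks - Powder": ["Beverages"] * 8 + ["Grocery"] * 7,
}
flagged = flag_products(results)
```

Flagged products are candidates for either a clearer category label or, as with the cocoa example, placement in more than one category.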

Quantitative Method – How does taxonomy “shape” match that of content?

Term Group | % Terms | % Docs
Administrators | 7.8 | 15.8
Community Groups | 2.8 | 1.8
Counselors | 3.4 | 1.4
Federal Funds Recipients and Applicants | 9.5 | 34.4
Librarians | 2.8 | 1.1
News Media | 0.6 | 3.1
Other | 7.3 | 2.0
Parents and Families | 2.8 | 6.0
Policymakers | 4.5 | 11.5
Researchers | 2.2 | 3.6
School Support Staff | 2.2 | 0.2
Student Financial Aid Providers | 1.7 | 0.7
Students | 27.4 | 7.0
Teachers | 25.1 | 11.4

Source: Courtesy Keith Stubbs, US. Dept. of Ed.

Background:
  Hierarchical taxonomies allow comparison of “fit” between content and taxonomy areas

Methodology:
  25,380 resources tagged with taxonomy of 179 terms. (Avg. of 2 terms per resource)
  Counts of terms and documents summed within taxonomy hierarchy

Results:
  Roughly Zipf distributed (top 20 terms: 79%; top 30 terms: 87%)
  Mismatches between term% and document% flagged
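
One way to flag such mismatches is to compare the two percentages as a ratio. The rows below are a subset of the slide's table; the 3x threshold and the function name are assumptions made for illustration, since the slide does not say how mismatches were flagged.

```python
# (term group, % of terms, % of docs): a subset of the rows on this slide.
ROWS = [
    ("Administrators", 7.8, 15.8),
    ("Federal Funds Recipients and Applicants", 9.5, 34.4),
    ("Librarians", 2.8, 1.1),
    ("Researchers", 2.2, 3.6),
    ("School Support Staff", 2.2, 0.2),
    ("Students", 27.4, 7.0),
    ("Teachers", 25.1, 11.4),
]

def flag_mismatches(rows, ratio=3.0):
    """Groups whose term share and document share differ by more than `ratio` times."""
    flagged = []
    for name, pct_terms, pct_docs in rows:
        high, low = max(pct_terms, pct_docs), min(pct_terms, pct_docs)
        if low == 0 or high / low > ratio:
            flagged.append(name)
    return flagged

mismatches = flag_mismatches(ROWS)
```

With this threshold, "Students" is flagged because the taxonomy devotes far more terms to it than the content uses, and "Federal Funds Recipients and Applicants" is flagged for the opposite reason.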

How do large corporations typically extend the Dublin Core?

[Chart, scale 0% to 120%]
Doc Types: 100%
Products & Services: 86%
Roles: 57%

Base: 20 corporate information managers

Source: CEN/ISSS Workshop on Dublin Core. Guidance information for the deployment of Dublin Core metadata in Corporate Environments (http://www.cenorm.be/cenorm/businessdomains/businessdomains/isss/cwa/cwa15247.asp)

Agenda

9:00 Who are we?
9:10 What are taxonomies & metadata?
9:30 What kinds of taxonomies are there, and what do I need?
9:40 How do I get a good taxonomy?
10:05 How do I associate the taxonomy with content?
  How are we going to populate metadata elements with complete and consistent values?
  What can we expect to get from automatic classifiers?
  What kinds of tools do people use?
  How do different automatic classification tools compare?
  What else should I keep in mind?
10:30 Break
10:45 What do taxonomies and metadata have to do with search?
11:15 How can I sell my management on a taxonomy project?
11:45 Any more questions?
12:00 Adjourn


General remarks on tagging

Province of authors (SMEs) or editors?

Taxonomy often highly granular to meet task and re-use needs.

Vocabulary dependent on originating department.

The more tags there are (and the more values for each tag), the more hooks to the content.

If there are too many, authors will resist and use “general” tags (if available)

Automatic classification tools exist, and are valuable, but results are not as good as humans can do. “Semi-automated” is best. Degree of human involvement is a cost/benefit tradeoff.

What methods do large companies use to create & maintain metadata?

[Chart, scale 0% to 80%]
Forms: 71%
Distributed Production: 57%
Centralized production: 43%
Not Automated: 43%

Base: 20 corporate information managers

Source: CEN/ISSS Workshop on Dublin Core. Guidance information for the deployment of Dublin Core metadata in Corporate Environments (http://www.cenorm.be/cenorm/businessdomains/businessdomains/isss/cwa/cwa15247.asp)

How do tools compare? Analyst viewpoint

[Figure: tools plotted by Accuracy Level (low to high) and Content Volumes (low to high).]

What accuracy should we expect from an automatic classifier?

Classification performance is measured by “Inter-cataloger agreement”
  Trained librarians agree less than 80% of the time
  Errors are subtle differences in judgment, or big goofs

Automatic classification struggles to match human performance
  Exception: Entity recognition can exceed human performance

Classifier performance limited by algorithms available, which is limited by development effort

Very wide variance in one vendor’s performance depending on who does the implementation, and how much time they have to do it

1) 80/20 tradeoff where 20% of effort gives 80% of performance.

2) Smart implementation of inexpensive tools will outperform naive implementations of world-class tools.

[Figure: Accuracy plotted against Development Effort / Licensing Expense, with points labeled Regexps and Trained Librarians and an annotation marking the potential performance gain.]
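
Inter-cataloger agreement in its simplest form is the share of documents on which two taggers chose the same category. The sketch below assumes one category per document; the tag lists and names are hypothetical sample data, not results from the talk.

```python
def agreement_rate(tags_a, tags_b):
    """Share of documents on which two taggers assigned the same category."""
    if len(tags_a) != len(tags_b) or not tags_a:
        raise ValueError("need two equal-length, non-empty tag lists")
    matches = sum(a == b for a, b in zip(tags_a, tags_b))
    return matches / len(tags_a)

# Hypothetical tags for the same five documents.
librarian_1 = ["News", "Policy", "Manual", "News", "Policy"]
librarian_2 = ["News", "Policy", "Manual", "Policy", "Policy"]
classifier = ["News", "Manual", "Manual", "Policy", "Policy"]

human_baseline = agreement_rate(librarian_1, librarian_2)
machine_score = agreement_rate(librarian_1, classifier)
```

Measuring the human baseline first gives a realistic target: a classifier that agrees with one librarian about as often as another librarian does is performing at roughly human level by this measure.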

How do tools compare? Pragmatic viewpoint

[Figure: tools plotted by Accuracy Level (low to high) and Content Volumes (low to high).]

What kind of metadata creation and maintenance process is needed?

Even ‘purely’ automatic meta-tagging systems need a manual error correction procedure.
  Should add a QA sampling mechanism

Tagging models:
  Author-generated
  Central librarians
  Hybrid – central auto-tagging service, distributed manual review and correction

[Figure: Sample of ‘author-generated’ metadata workflow. Steps and roles shown: Compose in Template; Submit to CMS; Analyst; Editor; Review content; Problem? (Y/N); Copywriter; Copy Edit content; Problem? (Y/N); Hard Copy; Web site; Approve/Edit metadata; Automatically fill-in metadata; Tagging Tool; Sys Admin.]

Tagging tool example: Interwoven MetaTagger

Manual form fill-in w/ check boxes, pull-down lists, etc.
Auto keyword & summarization


Tagging tool example: Interwoven MetaTagger

Auto-categorization

Parse & lookup (recognize names)

Rules & pattern matching
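
As a generic illustration of the "rules & pattern matching" approach, the sketch below suggests taxonomy terms from regular expressions. It is not MetaTagger's API or rule syntax; the rule set, term names, and function are hypothetical.

```python
import re

# Hypothetical rules: taxonomy term -> pattern that suggests it.
RULES = {
    "Copyright": re.compile(r"\bcopyright(ed|s)?\b|\bfair use\b", re.IGNORECASE),
    "Privacy": re.compile(r"\bprivacy\b|\bpersonal data\b", re.IGNORECASE),
}

def suggest_tags(text, rules=RULES):
    """Return the sorted taxonomy terms whose pattern matches the text."""
    return sorted(term for term, pattern in rules.items() if pattern.search(text))
```

In a semi-automated workflow the output of something like `suggest_tags` would pre-fill the metadata form, leaving a person to approve or correct the suggestions.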

Where do I put the metadata?

Where can I store metadata?
  In the content – HTML Headers, File properties, etc.
  In a centralized repository – Search index, Metadata database, etc.

Where should I store metadata? It depends.
  If you are moving files through a process, putting it in the file keeps it from getting dropped at system borders.
  If you are doing search across multiple documents, it has to be at least copied out of the files.
  If you make copies of files and modify them, consistent in-file metadata will be impossible.

Real question is not where to STORE the metadata, it is how to MAINTAIN the metadata.
  Web CMS as an example
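
Copying in-file metadata out to a central index can be sketched with Python's standard `html.parser` module. The sample page and the `DC.title` / `DC.subject` names are illustrative assumptions; a real site may use different meta names.

```python
from html.parser import HTMLParser

class MetaExtractor(HTMLParser):
    """Collect <meta name=... content=...> pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        name, content = attrs.get("name"), attrs.get("content")
        if name and content is not None:
            # A name may repeat, so keep every value in a list.
            self.metadata.setdefault(name, []).append(content)

def extract_metadata(html):
    parser = MetaExtractor()
    parser.feed(html)
    return parser.metadata

# Hypothetical page with metadata stored in the HTML header.
page = (
    '<html><head><meta name="DC.title" content="Pet policy">'
    '<meta name="DC.subject" content="pets"></head><body></body></html>'
)
index_record = extract_metadata(page)
```

The extracted record could then be written to a search index or metadata database, which is the "copied out of the files" case described above.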


Agenda

9:00 Who are we?
9:10 What are taxonomies & metadata?
9:30 What kinds of taxonomies are there, and what do I need?
9:40 How do I get a good taxonomy?
10:05 How do I associate the taxonomy with content?
10:30 Break
10:45 What do taxonomies and metadata have to do with search?
  Does adding a taxonomy mean replacing my search engine?
  How are they used behind the scenes in a search implementation?
  How are they used in the Search UI to aid searching?
  How can we make our current search engine better?
11:15 How can I sell my management on a taxonomy project?
11:45 Any more questions?
12:00 Adjourn

How to fix search? … Add metadata to search on!

“Adding metadata to unstructured content allows it to be managed like structured content. Applications that use structured content work better.”

“Enriching content with structured metadata is critical for supporting search and personalized content delivery.”

“Content that has been adequately tagged with metadata can be leveraged in usage tracking, personalization and improved searching.”

“Better structure equals better access: Taxonomy serves as a framework for organizing the ever-growing and changing information within a company. The many dimensions of taxonomy can greatly facilitate Web site design, content management, and search engineering. If well done, taxonomy will allow for structured Web content, leading to improved information access.”

How does Google do so well without metadata?

They don’t, they just use particular types of metadata:
  Number of incoming links
  PageRank for each incoming link
  Text of incoming links

Dublin Core framework for corporate use

Not just 15 elements
A framework to enable cross-resource exploration and use

Dublin Core is framework for “integration metadata” at BellSouth

Source: Courtesy of Todd Stephens, BellSouth

What about Search? Integration Metadata

Element | Data Type | Length | Req. / Repeat | Source | Purpose

Asset Metadata
Unique ID | Integer | Fixed | 1 | System supplied | Basic accountability
Recipe Title | String | Variable | 1 | Licensed Content | Text search & results display
Recipe summary | String | Variable | 1 | Licensed Content | Content
Main Ingredients | List | Variable | ? | Main Ingredients vocabulary | Key index to retrieve & aggregate recipes, & generate shopping list

Subject Metadata
Meal Types | List | Variable | * | Meal Types vocab | Browse or group recipes & filter search results
Cuisines | List | Variable | * | Cuisines |
Courses | List | Variable | * | Courses vocab |
Cooking Method | Flag | Fixed | * | Cooking vocab |

Link Metadata
Recipe Image | Pointer | Variable | ? | Product Group | Merchandize products

Use Metadata
Rating | String | Variable | 1 | Licensed Content | Filter, rank, & evaluate recipes
Release Date | Date | Fixed | 1 | Product Group | Publish & feature new recipes

Legend: ? – 1 or more; * – 0 or more

Dublin Core labels shown alongside the table: dc:identifier, dc:title, dc:description, X, XXXX, dcterms:hasPart, dc:date

dc:type=“recipe”, dc:format=“text/html”, dc:language=“en”
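
A mapping from a recipe record to a Dublin Core "integration metadata" record might look like the sketch below. The pairing of recipe fields with `dc:` elements is one reading of the slide, and the input field names (`unique_id`, `title`, `summary`, `release_date`) are hypothetical.

```python
def to_dublin_core(recipe):
    """Build a Dublin Core record from a recipe record (field names assumed)."""
    return {
        "dc:identifier": str(recipe["unique_id"]),
        "dc:title": recipe["title"],
        "dc:description": recipe["summary"],
        "dc:date": recipe["release_date"],
        # Constant values listed on the slide for this content set.
        "dc:type": "recipe",
        "dc:format": "text/html",
        "dc:language": "en",
    }

# Hypothetical recipe record.
sample = {
    "unique_id": 1042,
    "title": "Weeknight Chili",
    "summary": "A quick one-pot chili.",
    "release_date": "2005-05-16",
}
dc_record = to_dublin_core(sample)
```

Fields that are specific to recipes (ingredients, cuisines, courses) stay in the recipe scheme; only the shared elements are exposed for cross-resource search.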



How do I sell Management on a Taxonomy Project?

Don’t sell “metadata” or “taxonomy”, sell the vision of what you want to be able to do.

Clearly understand what the problem is and what the opportunities are.

Do the calculus (costs and benefits)

Design the taxonomy (in terms of LOE) in relation to the value at hand.

Fundamentals of metadata ROI

Tagging content using metadata and a taxonomy is a cost, not a benefit.

There is no benefit without exposing the tagged content to users in some way that cuts costs or improves revenues.

Putting metadata and a taxonomy into operation requires UI changes and/or backend system changes, as well as data changes.

You need to determine those changes, and their costs, as part of the ROI.

What are the typical metadata ROI scenarios?

Catalog site
  Increased sales.
  Increased productivity.

Customer support
  Cutting costs.
  Increased sales.

Compliance
  Avoiding penalties.

Knowledge worker productivity
  Less time searching, more time working.

Metadata ROI: Catalog site

Guided Navigation
  2-3 clicks to product
  No dead ends

http://www.tesco.com/winestore

Metadata ROI: Catalog site

Increased sales
  Product findability.
  Product cross-sells and up-sells.
  Customer loyalty.

1-5% increase in sales
  $57.6B sales (’04): $600M to $2B/year
  $2.1B net income (’04): $21M to $105M/year

1-5% increase in productivity
  $50K average cost per employee
  310,400 employees (’04)
  $155M to $776M/year

Enterprise portal cost $6M

Source: Proforma based on Hoover’s data.
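
The pro forma ranges on this slide are simple percentage calculations, which the sketch below reproduces from the slide's inputs. The helper name `pct_range` is hypothetical. A straight 1-5% of $57.6B comes to $576M to $2.88B, so the slide's "$600M to $2B/year" sales figure is not an exact match for this arithmetic.

```python
def pct_range(base, low=0.01, high=0.05):
    """Return (low, high) dollar amounts for a percentage range of `base`."""
    return (base * low, base * high)

# Inputs taken from the slide.
sales_gain = pct_range(57.6e9)                    # 1-5% of $57.6B sales
income_gain = pct_range(2.1e9)                    # 1-5% of $2.1B net income
productivity_gain = pct_range(50_000 * 310_400)   # 1-5% of total employee cost
```

Setting any one of these ranges against the $6M portal cost is the cost/benefit comparison the slide is making.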


Metadata ROI: Customer support model

Policy categories for browsing

Type and go to search for specific policies

Good search results for policy topics, e.g., “pets”

Refine search offered with results

Help on search page, not a click away.

Metadata ROI: Customer support model

Self service
  Fewer customer calls.
  Faster, more accurate CSR responses through better information access.

25-50% service efficiency increase
  300K customer service calls per month
  $6 cost per call
  $5.4M to $10.8M/yr

Manual processing
  100,000 documents
  2 pages per document
  $4 per page
  $800K

1-5% increased sales
  $18.6B sales (’04): $186M to $930M/year
  ($761M) net income (’04): ($575M) to $169M/year

Source: Proforma based on Hoover’s data.
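
The customer support figures can be reproduced the same way. This sketch uses only the inputs shown on the slide; the function names are hypothetical.

```python
def annual_call_savings(calls_per_month, cost_per_call, low_pct, high_pct):
    """Yearly savings range if a given % of call cost is avoided (whole % inputs)."""
    annual_cost = calls_per_month * cost_per_call * 12
    return (annual_cost * low_pct // 100, annual_cost * high_pct // 100)

def manual_processing_cost(documents, pages_per_document, cost_per_page):
    """One-time cost of processing documents by hand."""
    return documents * pages_per_document * cost_per_page

call_savings = annual_call_savings(300_000, 6, 25, 50)
tagging_cost = manual_processing_cost(100_000, 2, 4)
```

Under these inputs, the $5.4M to $10.8M yearly saving is set against a one-time $800K manual processing cost.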

Metadata ROI: Compliance

Avoiding penalties for breaching regulations
  SOX: up to 5 years in jail
  SOX: up to $5M

Following required procedures
  Loss of company: $100B revenue (’00)
  Loss of partner companies: Arthur Andersen

$100B

Source: Proforma based on Hoover’s data.

[Figure: knowledge worker time divided among Searching, Creating, and Communicating.]

Knowledge workers spend up to 2.5 hours each day looking for information …

… But find what they are looking for only 40% of the time.

– Kit Sims Taylor

High cost of not finding information

“The amount of time wasted in futile searching for vital information is enormous, leading to staggering costs …”

– Sue Feldman

High cost of poor classification

Poor classification costs a 10,000 user organization $10M each year—about $1,000 per employee.

— Jakob Nielsen, useit.com

But “better search” itself is a weak ROI

[Figure: knowledge worker time divided among Creating new content, Recreating existing content, Searching, and Communicating; values shown: 26%, 9%.]

Knowledge workers spend more time re-creating existing content than creating new content

– Kit Sims Taylor

Metadata ROI: Productivity

Decreased cost to market
  Decreased development cost
  Increased R&D productivity
  Reduced time for sales & marketing

1-5% decrease in drug development cost
  $800M/drug
  $8M to $16M/drug

5-10% increase in R&D productivity
  13% of revenue
  $39B in sales (’04)
  $254M to $507M/year

10-20% decrease in time for sales & marketing
  13% of revenue
  $254M to $507M/year

Enterprise document management system cost $10M

Source: Proforma based on Hoover’s data.

Metadata ROI: Executive Mandate

There is no ROI out of the box
  Just someone with a vision
  …and the budget to make it happen.

What’s really needed?
  Demos and proofs of value.
  So that a stronger cost benefit argument can be made for continuing the work


Productivity, loyalty, and revenue have provided the ROI


Intranet has provided the best ROI

Intranet

Web/online customer sales

Web dev infrastructure

Middleware to link Web to ERP

e-billing/payment systems

Web/online business sales

Wireless Web access

Extranet/supply chain

e-marketplace/ portal

None


Contact Info

Ron Daniel: 925-368-8371

[email protected]

Joseph Busch: 415-377-7912

[email protected]