Taxonomy Strategies LLC
Copyright 2009 Taxonomy Strategies LLC. All rights reserved.
Assorted Slides on Taxonomy & Metadata Governance
Ron Daniel, Jr.
Taxonomy Strategies LLC: The business of organized information
Creating a Governance Structure for the Ongoing Maintenance of the Taxonomy
Taxonomies must change if they are to remain relevant. But what will it cost to make those changes to the taxonomy and to the data which is categorized by it? Organizations must have appropriate maintenance processes so that the taxonomy changes are based on rational cost/benefit decisions, without becoming mired in endless paperwork. This interactive workshop will highlight the framework for creating taxonomy governance teams and what their specific responsibilities should be. Special attention will be given to defining maintainable taxonomies and metadata for achieving business needs.
Agenda
10:15 Introduction
10:30 Background
10:35 Maintainable Taxonomies
10:45 Maintainable Metadata
10:50 ROI Estimation
11:00 Governance Environment
11:10 Controlled Items
11:30 Team Structures
11:45 Change Process
12:00 Exercises
12:15 Adjourn
Three Problems
Taxonomy development and maintenance is the LEAST of three problems:
The Taxonomy Problem: How are we going to build and maintain the lists of pre-defined values that can go into some of the metadata elements?
The Tagging Problem: How are we going to populate metadata elements with complete and consistent values? What can we expect to get from automatic classifiers? What kind of error detection and error correction procedures do we need? What fields do we need?
The ROI (Return On Investment) Problem: How are we going to use content, metadata, and vocabularies in applications to obtain business benefits? More sales? Lower support costs? Greater productivity? Risk avoidance? How much content? How big an operating budget? How to expose to users? Tolerance for poor data quality?
Business Goals and Cultural Factors are major influences on tagging and taxonomy. These must be acknowledged at the start to avoid rework.
There’s more to maintaining the Taxonomy than just maintaining the Taxonomy
What must change when the Taxonomy changes?
The master copy of the taxonomy.
The data tagged with the taxonomy?
The user interface which uses the taxonomy?
Backend system software which uses the taxonomy?
The training set for automatic classifiers?
The educational material for users, catalogers, programmers, etc.?
The information sent to downstream users of the taxonomy?
  The versions of the taxonomy distributed to others.
  The list of changes.
Announcements for stakeholders?
This is the set of items that the taxonomy team might maintain, and that may need updating when the taxonomy changes. Few groups will have all of these under maintenance by the taxonomy team.
Metadata and Taxonomy
Field | Data Type | Example
Title | String | “The Perl Directory”
Creator | String | The Perl Foundation
Identifier | URL | http://www.perl.org/
Date | DateTime | Jan. 12, 2006
Subject | List | Computers : Programming : Languages : Perl
[Figure labels: Metadata; Taxonomy]
A big, simple hierarchy has lots of nodes and is a lot of work to maintain.
DMOZ: A worst case example of a unified ‘subject’
Business Biotechnology & Pharmaceuticals
Education & Training
Regional Europe Ireland Business & Economy
Employment Health & Medical
Reference Education Colleges & Universities
North America United States Maryland Columbia Union College
Athletics
Reference Education K-12 Home Schooling Unschooling Chats and Forums
Science Math Academic Departments
South America Colombia
Society People Women Science & Technology
Mathematics
Science Social Sciences Linguistics Translation Associations
Business Small Business Finance Accounting
Business Accounting Firms Directories
Business Employment By Industry
Business Healthcare Employment Regional
Competency (discipline) 11
Geography 9
Audience 9
Topic 7
Organization 5
Doc Type 4
Industry 4
Process 4
DMOZ has over 600k categories
Most are a combination of common facets – Geography, Organization, Person, Document Type, …
(e.g.) Top: Regional: Europe: Spain: Travel and Tourism: Travel Guides
(BTW – DMOZ Governance model is out of whack)
If you want to get technical here, you can explain that lots of big hierarchies are pre-coordinated combinations of items that could come from separate facets. This introduces some arbitrary choices (do we list content type first and location second, or …). It also leads to a lot of repeated substructure which means there have to be edits in many places to make what is in concept a pretty small change.
The power of taxonomy facets
Categorize in multiple, independent, categories.
Allow combinations of categories to narrow the choice of items.
4 independent categories of 10 nodes each have the same discriminatory power as one hierarchy of 10,000 nodes (10^4)
  Easier to maintain
  Can be easier to navigate
Main Ingredients: Chocolate, Dairy, Fruits, Grains, Meat & Seafood, Nuts, Olives, Pasta, Spices & Seasonings, Vegetables
Meal Type: Breakfast, Brunch, Lunch, Supper, Dinner, Snack
Cuisines: African, American, Asian, Caribbean, Continental, Eclectic/Fusion/International, Jewish, Latin American, Mediterranean, Middle Eastern, Vegetarian
Cooking Methods: Advanced, Bake, Broil, Fry, Grill, Marinade, Microwave, No Cooking, Poach, Quick, Roast, Sauté, Slow Cooking, Steam, Stir-fry
42 values to maintain (10+6+11+15)
9900 combinations (10x6x11x15)
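The arithmetic on this slide can be checked with a short sketch. The facet values are the ones listed above; the `narrow` helper and the two sample recipes are hypothetical, added only to illustrate how combining independent facets narrows a result set.

```python
from math import prod

# Facet values as listed on the slide.
FACETS = {
    "Main Ingredients": ["Chocolate", "Dairy", "Fruits", "Grains", "Meat & Seafood",
                         "Nuts", "Olives", "Pasta", "Spices & Seasonings", "Vegetables"],
    "Meal Type": ["Breakfast", "Brunch", "Lunch", "Supper", "Dinner", "Snack"],
    "Cuisines": ["African", "American", "Asian", "Caribbean", "Continental",
                 "Eclectic/Fusion/International", "Jewish", "Latin American",
                 "Mediterranean", "Middle Eastern", "Vegetarian"],
    "Cooking Methods": ["Advanced", "Bake", "Broil", "Fry", "Grill", "Marinade",
                        "Microwave", "No Cooking", "Poach", "Quick", "Roast",
                        "Sauté", "Slow Cooking", "Steam", "Stir-fry"],
}

# Values the taxonomy team has to maintain: the sum of the facet sizes.
values_to_maintain = sum(len(values) for values in FACETS.values())

# Distinct combinations a user can select: the product of the facet sizes.
combinations = prod(len(values) for values in FACETS.values())


def narrow(items, criteria):
    """Keep only the items whose facet values match every criterion (hypothetical helper)."""
    return [item for item in items
            if all(item.get(facet) == value for facet, value in criteria.items())]


# Hypothetical tagged content, one value per facet.
recipes = [
    {"name": "Pancakes", "Main Ingredients": "Grains", "Meal Type": "Breakfast",
     "Cuisines": "American", "Cooking Methods": "Fry"},
    {"name": "Paella", "Main Ingredients": "Meat & Seafood", "Meal Type": "Dinner",
     "Cuisines": "Mediterranean", "Cooking Methods": "Slow Cooking"},
]
dinner_mediterranean = narrow(recipes, {"Meal Type": "Dinner", "Cuisines": "Mediterranean"})
```

Adding one value to a facet costs one edit but multiplies the number of reachable combinations, which is the maintenance argument the slide is making.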
How do I get a good Taxonomy? – Seven practical rules
1) Incremental, extensible process that identifies and enables users, and engages stakeholders.
2) Quick implementation that provides measurable results as quickly as possible.
3) Not monolithic—has separately maintainable facets.
4) Re-uses existing IP as much as possible.
5) A means to an end, and not the end in itself.
6) Not perfect, but it does the job it is supposed to do—such as improving search and navigation.
7) Improved over time, and maintained.
Some vocabulary construction rules
Don’t just have names, also have identifiers
  This will reduce retagging later when names change
  When tagging content, use the most specific code. Let software handle the hierarchy.
  Bonus: Use URIs for node IDs & publish on the web (See LINKED DATA in the futures chapter)
Develop scope notes
  Not just a definition, also say what kind of content the node applies to
Metadata specification must state the vocabulary for an element.
Gather data from multiple sources
  Talk with users and experts
  Analyze query logs and content
Choose and arrange terms
Test and finalize first version
Shift into maintenance mode
What do I do with all these facets?
Either expose them directly in the user interface (post-coordinating)
or
Combine them in a minimal hierarchy (pre-coordination)
Post-coordination takes software support, which may be fancy or basic.
How many facets? (See elsewhere)
Maintainable Metadata
Design metadata specification for future changes
  Lessons from the Dublin Core
Provide metadata tagging and storage that will deal with changes
Dublin Core: A little more complicated over time
Elements: 1. Identifier, 2. Title, 3. Creator, 4. Contributor, 5. Publisher, 6. Subject, 7. Description, 8. Coverage, 9. Format, 10. Type, 11. Date, 12. Relation, 13. Source, 14. Rights, 15. Language
Refinements: Abstract, Access rights, Alternative, Audience, Available, Bibliographic citation, Conforms to, Created, Date accepted, Date copyrighted, Date submitted, Education level, Extent, Has format, Has part, Has version, Is format of, Is part of, Is referenced by, Is replaced by, Is required by, Issued, Is version of, License, Mediator, Medium, Modified, Provenance, References, Replaces, Requires, Rights holder, Spatial, Table of contents, Temporal, Valid
Encodings: Box, DCMIType, DDC, IMT, ISO3166, ISO639-2, LCC, LCSH, MESH, Period, Point, RFC1766, RFC3066, TGN, UDC, URI, W3CDTF
Types: Collection, Dataset, Event, Image, Interactive Resource, Moving Image, Physical Object, Service, Software, Sound, Still Image, Text
Design Metadata Specification for future changes
Degree of future changes will depend on organization size, sophistication of use, number of repositories and amount of content.
  Don’t over-engineer
For all organizations: start with the Dublin Core with a few additions and deletions for specific needs
At large/sophisticated organizations: “Refinements” will be unavoidable in the future.
  Start with “DatePublished” so that later additions of “DateModified”, “DateApproved”, “DateVerified”, etc. fit in easily.
Identify broad “integration metadata” vs. division-specific fields. Coordinate with others to set up a working understanding of a corporate multi-level metadata standard.
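One way to picture why refinements "fit in easily" is a lookup from each refined field to the base element it refines, so that a system that only understands the base element can still read the record. This is a minimal sketch, not a Dublin Core implementation: the field names come from the slide, while the `REFINES` table, the base name "Date", and the `dumb_down` function are assumptions made for illustration.

```python
# Hypothetical mapping from refined fields to the base element they refine.
REFINES = {
    "DatePublished": "Date",
    "DateModified": "Date",
    "DateApproved": "Date",
    "DateVerified": "Date",
}


def dumb_down(record):
    """Collapse refined fields onto their base element, keeping every value."""
    simple = {}
    for field, value in record.items():
        base = REFINES.get(field, field)  # unrefined fields map to themselves
        simple.setdefault(base, []).append(value)
    return simple


# Hypothetical record using two refinements of Date.
record = {
    "Title": "Price List",
    "DatePublished": "2009-03-01",
    "DateModified": "2009-06-15",
}
simple_record = dumb_down(record)
```

Adding "DateApproved" later only needs a new row in the mapping; consumers that read the base element are unaffected.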
Provide metatagging and storage that will deal with changes
Tag with identifiers, not names.
  This will reduce retagging later when names change
  Not good if people need to view raw tagging, but usually software will be involved to show labels.
When tagging content, use the most specific concept. Let software handle the hierarchy.
Metadata is easier to manage if it is stored in a central repository, instead of spread out in the individual files.
  Exception – when sending files out to other systems (e.g. photo metadata)
  Warning – ‘metadata repositories’ are usually a different class of software than what we are discussing.
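The two tagging rules above can be sketched together. The hierarchy reuses the "Computers : Programming : Languages : Perl" example from the earlier slide; the node identifiers (`t1` to `t4`), the `Node` class, and the helper functions are hypothetical, chosen only to show the idea.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    label: str
    parent: Optional[str]  # identifier of the parent node, None at the top


# Hypothetical identifiers; the labels come from the Subject example earlier in the deck.
taxonomy = {
    "t1": Node("Computers", None),
    "t2": Node("Programming", "t1"),
    "t3": Node("Languages", "t2"),
    "t4": Node("Perl", "t3"),
}


def self_and_ancestors(node_id):
    """Walk up the hierarchy from the most specific node to the top."""
    chain = []
    current = node_id
    while current is not None:
        chain.append(current)
        current = taxonomy[current].parent
    return chain


def is_about(item_tags, query_id):
    """An item matches a query node if any of its tags is that node or sits below it."""
    return any(query_id in self_and_ancestors(tag) for tag in item_tags)


def display_labels(item_tags):
    """Software resolves identifiers to labels at display time."""
    return [taxonomy[tag].label for tag in item_tags]


# The item is tagged once, with the most specific identifier only.
item_tags = ["t4"]
```

Renaming a node (for example setting `taxonomy["t4"].label` to a new name) changes what `display_labels` returns without touching `item_tags`, which is the retagging saving the slide describes.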
Fundamentals of taxonomy ROI
Tagging content using a taxonomy is a cost, not a benefit.
There is no benefit without exposing the tagged content to users in some way that cuts costs, improves revenues, reduces risk, or achieves some other clear business goal.
Putting taxonomy into operation requires UI changes and/or backend system changes, as well as data changes.
You need to determine those changes, and their costs, as part of the ROI.
Key Factors in ROI
Breadth “How many people will metadata affect?”
Repeatability “How many times a day will they use it?”
Cost/Benefit “Is this a costly effort with little or no benefits?”
How to estimate costs — Tagging
Taxonomy Facet | Hier? | Typical CV Size | Time/Value (min) | Avg # Values/Item | $/Min | Cost/Element
Audience | N | 10 | 0.25 | 2 | $0.42 | $0.21
Content Type | N | 20 | 0.25 | 1 | $0.42 | $0.11
Organizational Unit | Y | 50 | 0.5 | 2 | $0.42 | $0.42
Products & Services | Y | 500 | 1.5 | 4 | $0.42 | $2.52
Geographic Region | Y | 100 | 0.5 | 2 | $0.42 | $0.42
Broad Topics | Y | 400 | 2 | 4 | $0.42 | $3.36
TOTALS | | 1080 | 5 | 15 | | $7.04
Inspired by: Ray Luoma, BAU Solutions
Consider complexity of facet and ambiguity of content to estimate time per value.
Estimated cost of tagging one item. This can be reduced with automation, but cannot be eliminated.
Is this field worth the cost?
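The Cost/Element column appears to be time per value times average values per item times the per-minute rate. A minimal sketch that reproduces the table under that assumption is shown here; the numbers are the slide's, and the function and variable names are illustrative only.

```python
from decimal import Decimal, ROUND_HALF_UP

RATE_PER_MINUTE = Decimal("0.42")  # "$ / Min" column

# (facet, minutes per value, average values per item), taken from the table.
FACETS = [
    ("Audience", Decimal("0.25"), 2),
    ("Content Type", Decimal("0.25"), 1),
    ("Organizational Unit", Decimal("0.5"), 2),
    ("Products & Services", Decimal("1.5"), 4),
    ("Geographic Region", Decimal("0.5"), 2),
    ("Broad Topics", Decimal("2"), 4),
]


def to_cents(amount):
    """Round a dollar amount to whole cents, halves rounding up."""
    return amount.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def cost_per_element(minutes_per_value, values_per_item):
    """Unrounded cost of filling in one metadata element on one item."""
    return minutes_per_value * values_per_item * RATE_PER_MINUTE


element_costs = {name: cost_per_element(minutes, count) for name, minutes, count in FACETS}
cost_per_item = to_cents(sum(element_costs.values(), Decimal("0")))
```

Dropping a facet or shortening its time per value (for example with automatic classification) changes a single row, which makes it easy to ask whether each field is worth its cost.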
How to estimate costs — Assumptions
ASSUMPTIONS
Enterprise SW License $ 100,000
Maintenance/Support 15%
SW Implementation 200%
Legacy Content Items 100,000
Content Growth Rate 15%
Tagging/Item $ 7.04
Enterprise Taxonomy $ 100,000
Your numbers will vary.
How to estimate costs — Total cost of ownership (TCO)
Description | Year 1 | Year 2 | Year 3 | Year 4 | Year 5
SW
  Licenses | $100,000 | | | |
  Maintenance | | $15,000 | $15,000 | $15,000 | $15,000
  Implementation | $200,000 | | | |
  App Tech Support | | $30,000 | $30,000 | $30,000 | $30,000
Tagging
  Legacy Content | $704,000 | | | |
  Ongoing | | $105,600 | $121,440 | $139,656 | $160,604
Taxonomy
  Creation | $100,000 | | | |
  Maintenance | | $15,000 | $15,000 | $15,000 | $15,000
TOTAL | $1,103,500 | $165,600 | $181,440 | $199,656 | $220,604
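The recurring (Year 2 to Year 5) totals can be reproduced from the assumptions slide. This is a sketch under two assumptions that the slides do not state outright: the 15% maintenance/support rate is applied to the license, the implementation, and the taxonomy creation cost alike, and each year's newly tagged items are 15% of the content that exists at the start of that year.

```python
from decimal import Decimal, ROUND_HALF_UP

# Figures from the assumptions slide.
LICENSE = Decimal("100000")
IMPLEMENTATION = LICENSE * 2           # "SW Implementation 200%"
TAXONOMY_CREATION = Decimal("100000")
SUPPORT_RATE = Decimal("0.15")         # "Maintenance/Support 15%"
LEGACY_ITEMS = Decimal("100000")
GROWTH_RATE = Decimal("0.15")
TAGGING_PER_ITEM = Decimal("7.04")


def whole_dollars(amount):
    """Round to the nearest dollar, halves rounding up."""
    return int(amount.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


def recurring_costs(last_year=5):
    """Total cost for each year after the first, keyed by year number."""
    # Assumed: the same 15% rate covers SW maintenance, app support, and taxonomy upkeep.
    fixed = (LICENSE + IMPLEMENTATION + TAXONOMY_CREATION) * SUPPORT_RATE
    items = LEGACY_ITEMS
    totals = {}
    for year in range(2, last_year + 1):
        new_items = items * GROWTH_RATE           # content added during this year
        tagging = new_items * TAGGING_PER_ITEM    # cost of tagging the new content
        totals[year] = whole_dollars(fixed + tagging)
        items += new_items
    return totals


yearly_totals = recurring_costs()
```

Because the ongoing tagging cost compounds with content growth while the other recurring costs stay flat, tagging dominates the later years.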
Sample ROI Calculations
Description | Year 1 | Year 2 | Year 3 | Year 4 | Year 5
Costs
  Software Licenses/Maintenance | $100,000 | $15,000 | $15,000 | $15,000 | $15,000
  Implementation/Support | $200,000 | $30,000 | $30,000 | $30,000 | $30,000
  Taxonomy Creation/Maintenance | $100,000 | $15,000 | $15,000 | $15,000 | $15,000
  Legacy/Ongoing Tagging | $703,500 | $105,600 | $121,440 | $139,656 | $160,604
Benefits
  Productivity increases | $ - | $125,000 | $1,250,000 | $1,250,000 | $1,250,000
  Service efficiency gains | $ - | $129,600 | $1,296,000 | $1,296,000 | $1,296,000
Yearly Net Benefits | $(1,103,500) | $89,000 | $2,364,560 | $2,346,344 | $2,325,396
Payback period: 1.4 years until Benefits = Costs
Inspired by: Todd Stephens, Dublin Core Global Corporate Circle
Ongoing cost of tagging due to 15% content growth.
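The net-benefit row and the payback point can be recomputed from the yearly cost and benefit figures in the table. The sketch below assumes benefits accrue evenly within a year (linear interpolation), which the slide does not state. Under that assumption the cumulative balance turns positive about 2.4 years after the start of Year 1, that is about 1.4 years after the Year 1 investment, which seems to be how the slide's payback figure is counted.

```python
# Yearly totals taken from the table (Year 1 .. Year 5).
costs = [1_103_500, 165_600, 181_440, 199_656, 220_604]
benefits = [0, 125_000 + 129_600, 1_250_000 + 1_296_000,
            1_250_000 + 1_296_000, 1_250_000 + 1_296_000]

net_benefits = [benefit - cost for benefit, cost in zip(benefits, costs)]


def payback_years(yearly_net):
    """Years from the start of Year 1 until cumulative net benefit reaches zero.

    Assumes each year's net benefit accrues evenly through that year.
    Returns None if the balance never turns positive.
    """
    cumulative = 0
    for completed_years, net in enumerate(yearly_net):
        if cumulative + net >= 0 and net > 0:
            return completed_years + (-cumulative / net)
        cumulative += net
    return None


from_start = payback_years(net_benefits)
after_year_one = from_start - 1
```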
Where do the benefits come from? Common taxonomy ROI scenarios
Catalog site - ROI based on increased sales through improved:
  Product findability
  Product cross-sells and up-sells
  Customer loyalty
Call center - ROI based on cutting costs through:
  Fewer customer calls due to improved website self-service
  Faster, more accurate CSR responses through better information access
Compliance – ROI based on:
  Avoiding penalties for breaching regulations
  Following required procedures (e.g. Medical claims)
Knowledge worker productivity - ROI based on cutting costs through:
  Less time searching for things
  Less time recreating existing materials, with knock-on benefits of less confusion and reduced storage and backup costs
Executive mandate
  No ROI at the start, just someone with a vision and the budget to make it happen
Generic, yet Important, Advice
It’s not about the tools. It’s not about the taxonomy. It’s about the business goals and the processes people use to meet those goals.
Metrics are grossly underused in metadata and search.
Taxonomy governance overview
Taxonomy governance can be viewed as a standards process
Closely linked to organizational metadata standard
Taxonomy must evolve, but in a predictable way
Take tips from other standards efforts
Team structure, with an appeals process
Taxonomy stewardship is a part-time role at most organizations
Team needs to make decisions based on costs and benefits
Documentation and educational material on Taxonomy and Metadata
Announcements
Comment-handling responsibilities (part of error-correction process)
Issue Logs
Release Schedule
These practices are in rough order of implementation.
Taxonomy governance environment
[Diagram labels: Published Facets; Consuming Applications (Intranet Search, Intranet Nav., Web CMS, DAM, Archives, ERMS, …); Custodians; Notifications; Change Requests & Responses; ISO3166-1; Other External; ERP; Other Internal; Vocabulary Management System; CVs; Other Controlled Items]
1: External vocabularies change on their own schedule, with some advance notice.
2: Team decides when to update facets within Taxonomy
3: Team adds value via mappings, translations, synonyms, training materials, etc.
4: Updated versions of facets published to consuming applications
CV (Controlled Vocabulary) – The list of values for one facet in the Taxonomy.
Controlled Items
Taxonomy Team will have several items to manage:
  Controlled Vocabularies
  Metadata Standard
  Editorial Rules
  Tagger Training Materials (manual and automatic)
  Charter, Goals, Performance Measures
  Team Processes
  Outreach & ROI
    Website
    Communication plan
    Presentations
    Announcements
  “Roadmap”
    Advanced practice, requires long planning horizon for organization's IT projects
Even small taxonomy teams should develop many of these items, although not to the same level of formality.
Controlled Vocabularies are not just tabbed lists
Source: NASA Taxonomy Competencies Facet, http://nasataxonomy.jpl.nasa.gov/nascomp/index_tt.htm
Controlled Item: Metadata Specification
Element Name | XML Map | Repeatable | Source | Purpose
General Purpose Metadata
Unique ID | dc:identifier | 1 | System supplied | System identifier to retrieve item.
Owner | dc:creator | ? | System supplied | POC for content maintenance
Title | dc:title | 1 | User supplied | Text search & results display
Date | dc:date | 1 | System supplied | Publish, feature, & review content.
Subject Metadata (shared purpose: Search for, browse, group & filter search results.)
Organization | x:corp | * | Corp Classif CV |
Asset | x:asset | * | Asset CV |
Region/Country | dc:coverage | * | Country CV |
Basin/Platform/Well | x:well | * | B/P/Well CV |
Content Type | dc:type | ? | Content Types CV |
Company/Client/Operator/Partner | x:company | * | Company CV |
Project | x:project | * | Project CV |
Use Metadata
Discipline | dcTerms:audience | * | Discipline CV | Target, personalize content.
Retention | x:retention | 1 | System supplied | Remove expired content
Legend: ? – 1 or more; * – 0 or more
Controlled Item: Editorial Rules
Akin to “Chicago Manual of Style”
Issues commonly addressed in the rules:
  Abbreviations
  Ampersands
  Capitalization
  Continuations (More… or Other…)
  Duplicate Terms
  Fidelity to External Source
  Hierarchy and Polyhierarchy
  Languages and Character Sets
  Length Limits
  “Other” – Allowed or Forbidden?
  Plural vs. Singular Forms
  Relation Types and Limits
  Scope Notes
  Serial Comma
  Sources of Terms
  Spaces
  Synonyms and Acronyms
  Translations
  Term Order (Alphabetic or …)
  Term Label Order (Direct vs. Inverted)
What to do when rules conflict – how do people decide which rule is more important?
Rule Name | Editorial Rule
Use Existing Vocabularies | Other things being equal, reusing an existing vocabulary is preferred to creating a new one.
Ampersands | The character '&' is preferred to the word ‘and’ in Term Labels. Example: Use Type: “Manuals & Forms”, not “Manuals and Forms”.
Special Characters | Retain accented characters in Term Labels. Example: Use “España”, not “Espana”.
Serial comma | If a category name includes more than two items, separate the items by commas. The last item is separated by the character ‘&’ which IS NOT preceded by a comma. Example: “Education, Learning & Employment”, not “Education, Learning, & Employment”.
Capitalization | Use title case (where all words except articles are capitalized). Example: “Education, Learning & Employment”, NOT “Education, learning & employment”, NOT “EDUCATION, LEARNING & EMPLOYMENT”, NOT “education, learning & employment”.
… | …
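Some of the sample rules above are mechanical enough to check automatically. The following is a minimal sketch, not a complete style checker: it covers only the Ampersands, Serial comma, and Capitalization rules, the function name `check_label` is hypothetical, and the all-capitals test would also flag a label that is legitimately a single acronym.

```python
import re

ARTICLES = {"a", "an", "the"}
# 'and' is skipped in the capitalization check because the ampersand rule already reports it.
CAPITALIZATION_EXEMPT = ARTICLES | {"and"}


def check_label(label: str) -> list[str]:
    """Return the names of the sample editorial rules that a term label breaks."""
    problems = []

    # Ampersands: '&' is preferred to the word 'and'.
    if re.search(r"\band\b", label, flags=re.IGNORECASE):
        problems.append("Ampersands")

    # Serial comma: the final '&' must not be preceded by a comma.
    if re.search(r",\s*&", label):
        problems.append("Serial comma")

    # Capitalization: title case, so every word except articles starts uppercase,
    # and the label as a whole is not written in capitals.
    words = re.findall(r"[^\W\d_]+", label)
    lowercase_start = any(
        word.lower() not in CAPITALIZATION_EXEMPT and not word[0].isupper()
        for word in words
    )
    if lowercase_start or label.isupper():
        problems.append("Capitalization")

    return problems
```

For example, `check_label("Education, Learning, & Employment")` reports the serial comma rule, while `check_label("Education, Learning & Employment")` reports nothing. Rules that need judgment, such as scope notes or fidelity to an external source, still need a human editor.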
Controlled Item: Training Materials
Staff will require training on
  The UI they use to tag the content
  The rules to follow when deciding what codes to apply
  The end-effect of the codes they apply
  The structure of the taxonomy
Tagging examples come from earlier stages in taxonomy development process
Hardcopies of the taxonomy, and yellow highlighters, are helpful during training
Indexing rules
Rule | Description
Specificity rule | Apply the most specific terms when tagging assets. Specific terms can always be generalized, but generic terms cannot be specialized.
Repeatable rule | All attributes should be repeatable. Use as many terms as necessary to describe What the asset is about and Why it is important. Storage is cheap. Re-creating content is expensive.
Appropriateness rule | Not all attributes apply to all assets. Only supply values for attributes that make sense.
Usability rule | Anticipate how the asset will be searched for in the future, and how to make it easy to find it. Remember that search engines can only operate on explicit information.
[Figure: Indexing UI]
Controlled item: Communications Plan
Stakeholders: Who are they and what do they need to know?
Channels: Methods available to send messages to stakeholders.
  Need a mix of narrow vs. broad, formal vs. informal, interactive vs. archival, …
Messages: Communications to be sent at various stages of project.
  Bulk of the plan is here
Channel | Description
Demo | Live, or screen capture for download
Presentation | Tailored message for specific audience
Website | Overview info for all, link to files
Memo | Formal notification
… | …
Stakeholders | Info. Needed
Project Sponsors | Progress, Issues, Policies
Dept. Reps | Progress, Priorities,
… | …
Users | Progress, How-Tos
Vendors | RFPs & SOWs
Trigger | Msg. Descrip | From | To | Chan.
Initiation | Project overview | Dept. head | All | Memo
… | … | … | … | …
Controlled Item: Team Charter
Taxonomy Team is responsible for maintaining:
  The Taxonomy, a multi-faceted classification scheme
  Associated materials, including a website providing:
    Corporate Metadata Standard
    Editorial Style Guide
    Taxonomy Training Materials
    Team rules and procedures (subject to CIO review)
Team evaluates costs and benefits of suggested changes.
Taxonomy Team will:
  Manage relationship between providers of source vocabularies and consumers of the Taxonomy
  Identify new opportunities for use of the Taxonomy across the Enterprise to improve information management practices
  Promote awareness and use of the Taxonomy
Remaining Controlled Items
Performance Measures to go along with Charter?
Team Processes (see later in this presentation)
Automatic Classifier Training Materials
Website
Presentations and Announcements
Change Request List (see later in this presentation)
“Taxonomy Roadmap”
  Advanced practice, requires long planning horizon for organization's IT projects
Exercise 2: Editorial Rules
Look at sample taxonomy
Think of ways to clean it up and make it ‘better’
  Smaller
  More professional looking
  Easy to use
Write editorial rules for the cleanups.
Provide an example with each rule:
Rule Name Editorial Rule
Plumem Lorne ipso ernum de jura fino el
Symosyit Esr Dirgin a periso de forestima
Himerisf Faleoin fi ribska firn eowkds
Capitalization All terms in lowercase. “programming”, NOT “Programming”
Exercise 2: Sample Taxonomy
Source: http://del.icio.us/tag/
Exercise 2: Editorial Rules Worksheet
Rule Name Editorial Rule
Plurals Use plural form of names, not singular.
Capitalization All terms, except proper nouns, are lowercase. E.g. “programming”, NOT “Programming”. E.g. “Schwab”, not “schwab”.
Provide a name for each rule, the rule itself, and an example of the rule of the form “X, not Y”.
Organization 1: Taxonomy Governance Team
Organization 1 – Internal portal for Fortune 50 Diversified Multinational.
[Diagram labels: Taxonomy Strategist; Taxonomist; Information Architect 2; Communications Specialist*]
Executive Sponsor
  Advocate for the taxonomy team
Business Lead
  Keeps team on track with larger business objectives
  Balances cost/benefit issues to decide appropriate levels of effort
    Specialists help in estimating costs
  Obtains needed resources if those in team can’t accomplish a particular task
Technical Specialist
  Estimates costs of proposed changes in terms of amount of data to be retagged, additional storage and processing burden, software changes, etc.
  Helps obtain data from various systems
Content Specialist
  Team’s liaison to content creators
  Estimates costs of proposed changes in terms of editorial process changes, additional or reduced workload, etc.
Small-scale Metadata QA Responsibility
Taxonomy Specialist
  Suggests potential taxonomy changes based on analysis of query logs, indexer feedback
  Makes edits to taxonomy, installs into system with aid of IT specialist
Content Owner
  Reality check on process change suggestions
Changes
Organization 2: Vocabulary Policy Committee
Organization 2 – A non-profit international organization. Goal is to improve information management practices to reduce overlap between many similar vocabularies across many systems.
Constraint: Even when number of vocabularies reduced, some must still have very close links.
Business Lead
  Chairs group. Assures CVs fit with organization’s larger information management effort.
  Small group management experience. Information management background.
Vocabulary Custodians (3)
  Responsible for content in a specific CV, typically based on organizational lines.
  Team lead experience, detail-oriented. Familiar with databases and organization processes
IT Representative
  Backups, admin of CV Tool
  IT administration experience
IT Steering Group
  Oversees Vocabulary Policy Committee
Stakeholders
  Managers of systems using the vocabularies, thus affected by changes.
  They have a lot of visibility into the process. Control over CV changes is limited, but they schedule their system’s adoption of changes.
Additional Roles – available during startup of team, and on an as-needed basis later
  Training Representative
    Develops communications plan, training materials
  Work Practices Representative
    Develops processes, monitors adherence
  Other Relevant Staff
Organization 3: Taxonomy Team
Organization 3 – Public catalog site for Fortune 50 Retailer. Data for products provided by manufacturers.
Business Lead
  Chairs committee, resolves disputes
Marketing Representatives
  Provide product marketing expertise
  Advocate for product manufacturers
  Represent data entry concerns
Website Representative
  Provides input on search and navigation impacts
  Advocate for customers and other website users
  Provides search log and click trail analysis
Taxonomy Specialist
  Maintains taxonomy and product catalog
  Provides data feeds to drive site
Larger team than many retailers, where a single person is responsible. A single person still makes the changes here, but there is some oversight.
Fast-Track Process – A fast-track process exists, likely to be used very often. A Marketing Representative asks the Taxonomy Specialist for a change, and the Taxonomy Specialist gets approval from the Website Representative.
Likely Changes
What if I have to do it solo?
Realize:
  It’s not totally solo – IT help, Graphics & UI help, Business Goals help, Funding help, Review & QA help…
  You are the general contractor
  It needs to be part of your objectives
  Limit the objectives to what can be achieved by you, and by your organization
Concentrate:
  Resource allocation (i.e. Manage your time)
  Fundamental processes
    Query log examination
    Error correction procedure
  Communications!!!
Cherry-pick from Roles
  Business Lead – align with organization goals, get needed resources, make cost/benefit decisions, report upstairs
  IT Liaison – Work with IT specialists to get software installed, logs gathered, content harvested, etc. Consider impact of changes on tools and data
  Taxonomy / Search Specialist – analyze behavior and suggest changes. Implement changes which pass cost/benefit muster
  Website/User Representative – consider impact of changes on users and job performance
Exercise 3: Team & Stakeholder Identification
Role Applicable/Modify Name(s)
Taxonomy Team Members
Team Lead
Taxonomy Editor(s)
Vocabulary Custodian(s)
Liaisons with external vocabularies
Liaisons with applications using vocabularies
User advocate(s)
Training / Communications
IT / Data & System Maintenance
External Stakeholders
Team Supervisory Group
Representatives of external vocabularies
Representatives of consuming applications
Representatives of users
Other representatives of organization
Taxonomy editing tools
[Quadrant chart: vertical axis “Ability to Execute” (low to high); horizontal axis “Completeness of Vision”; quadrant labels shown: Visionaries, Niche Players]
Widely used, cheap, good reporting, bad IDs
All upper-end tools are high functionality and high cost.
Most popular taxonomy editor? MS Excel
Immature industry – no vendors in upper-right quadrant!
This slide is out of date. Don’t know if we want to include this.
Taxonomy editor functionality requirements
Basic
  Hierarchy Browser
  Term Editing
  Standard and Custom Fields
  Standard and Custom Relations
  Data Typing and Restrictions
  Consistency Enforcement
  Flexible Reporting
  Flexible Importing?
Midrange
  UNICODE
  Multiple Vocabulary Support
  Inter-Vocabulary Relations
  Unique IDs
    ISO Codes not sufficient
Advanced
  Workflow
  Voting
  Change Request Mgmt.
  Stylistic rules enforcement
  Programmability
Taxonomy governance: Where changes come from
[Diagram labels: End User; experience; Firewall; Application UI; Application Logic; Taxonomy; Content; Tagging Logic; Tagging UI; Tagging Staff; Taxonomy Editor; Taxonomy Team]
Sources of change requests
  Staff notes ‘missing’ concepts
  Query log analysis
  Requests from other parts of the organization
Team considerations
  1. Business goals
  2. Changes in user experience
  3. Retagging cost
Recommendations by Editor
  1. Small taxonomy changes (labels, synonyms)
  2. Large taxonomy changes (retagging, application changes)
  3. New “best bets” content
I think three sources of change requests is a big concept to communicate to readers.
53Taxonomy Strategies LLC The business of organized information
Processes
Different organizations will need to consider their own change processes.
Organization 1: A custodian is responsible for the content, but checks facts with department heads before making changes
Organization 2: Analysts suggest changes, editors approve, copyeditors verify consistency
Organization 3: Marketing reps ask for a change, taxonomy editor makes demo, web representative approves it.
Change process MUST also consider cost of implementing the change
Retagging data
Reconfiguring auto-classifier
Retraining staff
Changes in user expectations
Taxonomy Change Cases
Case 1. Renaming a term
Case 2. Adding a new leaf term
Case 3. Inserting a new term
Case 4. Splitting a term
Case 5. Deleting a leaf term or subtree
Case 6. Deleting a term
Case 7. Moving a subtree
Case 8. Merging terms
Case 9. Adding a CV
Case 10. Deleting a CV
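The cost components named on this slide (retagging data, reconfiguring the auto-classifier, retraining staff) can be rolled into a rough estimate for any change case. The following is only a minimal sketch: the `ChangeCost` class, the per-item minutes, and the two example figures are hypothetical, and the claim that a rename needs no retagging assumes content is tagged by term ID rather than by label.

```python
from dataclasses import dataclass


@dataclass
class ChangeCost:
    """Rough effort estimate for one taxonomy change (all figures are assumptions)."""
    items_to_retag: int            # items currently tagged with the affected term(s)
    minutes_per_item: float        # assumed manual review time per item
    classifier_hours: float = 0.0  # effort to reconfigure the auto-classifier
    training_hours: float = 0.0    # effort to retrain tagging staff

    def total_hours(self) -> float:
        retag_hours = self.items_to_retag * self.minutes_per_item / 60.0
        return retag_hours + self.classifier_hours + self.training_hours


# Case 1 (renaming a term): assumed to need no retagging when tags store term IDs.
rename = ChangeCost(items_to_retag=0, minutes_per_item=2.0, training_hours=1.0)

# Case 4 (splitting a term): every item under the old term must be revisited.
split = ChangeCost(items_to_retag=1200, minutes_per_item=2.0,
                   classifier_hours=8.0, training_hours=2.0)

print(rename.total_hours())
print(split.total_hours())
```

Even a crude estimate like this makes it easier to compare a small change against a large one before the team decides.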
54Taxonomy Strategies LLC The business of organized information
Taxonomy governance: Taxonomy maintenance workflow
[Figure: workflow diagram. Roles: Analyst, Editor, Copywriter, Sys Admin. Steps: Suggest new name/category; Review new name; Copy edit new name; Add to enterprise Taxonomy. Two decision points labeled “Problem?” (Yes/No). Systems: Taxonomy Tool, Taxonomy.]
Can contrast this process with others that are less formal and/or less like a newsroom. Couple more are described on next slide.
55Taxonomy Strategies LLC The business of organized information
Other change processes
Processes may be diagramed or written
Provide an ‘emergency’ change process because it will be needed.
How can emergency changes be requested?
Who makes the change and who approves it?
Who are backups for the people when they are out?
Who are escalation points?
Change Request Process should call out decision criteria, e.g.
Cost of retagging
Benefit of change
Conflict with editorial rules
Organization X:
Change Request Process
Anyone can ask a team member for a change. Team members responsible for figuring out details and bringing to team for decision.
Pending changes list for low priority/high cost items.
Change Process
Includes preview of change on site and data mockup
Fast-Track Change Process
Anyone can ask editor, he gets team leader or deputy approval
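The decision criteria above (cost of retagging, benefit of change, conflict with editorial rules) and the three tracks (regular, fast-track, pending list) could be written down as an explicit triage rule. This is a hedged sketch only: the `ChangeRequest` fields, the outcome names, and the rule that benefit must exceed cost are illustrative assumptions, not a process described on the slide.

```python
from dataclasses import dataclass


@dataclass
class ChangeRequest:
    description: str
    retagging_cost: float                 # estimated cost, in any consistent unit
    benefit: float                        # estimated benefit, in the same unit
    conflicts_with_editorial_rules: bool = False
    emergency: bool = False


def triage(req: ChangeRequest) -> str:
    """Return a hypothetical routing decision for a change request."""
    if req.conflicts_with_editorial_rules:
        return "return to requester"      # assumed handling of rule conflicts
    if req.emergency:
        return "fast-track"               # editor plus team leader or deputy approval
    if req.benefit > req.retagging_cost:
        return "team decision"
    return "pending changes list"         # low priority / high cost items


requests = [
    ChangeRequest("Fix misspelled label", retagging_cost=0, benefit=1),
    ChangeRequest("Split 'Services' into two terms", retagging_cost=40, benefit=10),
    ChangeRequest("Remove recalled product term", retagging_cost=5, benefit=2, emergency=True),
]
for r in requests:
    print(r.description, "->", triage(r))
```

Writing the rule down, even this crudely, forces the team to agree on who decides and on what basis.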
56Taxonomy Strategies LLC The business of organized information
Fundamental Processes & Outlooks
Two fundamental processes every organization should implement to maintain its metadata and taxonomies:
Query log / Click trail examination
Error Correction
What are the key outlooks a taxonomist should try to instill in their organization?
Integrated approach to Taxonomy, Metadata, Search, and UI
Measure & Improve Mindset
Another biggie
57Taxonomy Strategies LLC The business of organized information
Fundamental process #1 – Query log examination
How can we characterize users and what they are looking for?
Query Log & Click Trail Examination
Only 30-40% of organizations interested in Taxonomy Governance examine query logs*
Basic reports provide plenty of real value
Greatest value comes from:
Identifying a person as responsible for search quality
Starting a “Measure & Improve” mindset
Greatest challenge:
Getting a person assigned (≥ 10%)
Getting logs turned back on
UltraSeek Reporting
• Top queries
• Queries with no results
• Queries with no click-through
• Most requested documents
• Query trend analysis
• Complete server usage summary
Click Trail Packages
iWebTrack
NetTracker
OptimalIQ
SiteCatalyst
Visitorville
WebTrends
Source: Metadata Maturity Model Presentation, Ron Daniel, ESS’05
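Several of the basic reports listed above need nothing more than a few counters over the query log. The sketch below assumes a hypothetical log format of (query, result count, clicked document or None); real search engines export different fields, so treat the row layout and sample data as placeholders.

```python
from collections import Counter

# Hypothetical log rows: (query text, number of results, clicked document or None)
log = [
    ("benefits", 12, "/hr/benefits.html"),
    ("benefits", 12, None),
    ("401k", 0, None),
    ("travel policy", 5, "/finance/travel.html"),
    ("Benefits", 12, "/hr/benefits.html"),
    ("401k", 0, None),
]


def normalize(query: str) -> str:
    """Fold case and surrounding whitespace so variants count as one query."""
    return query.strip().lower()


# Top queries
top_queries = Counter(normalize(q) for q, _, _ in log)
# Queries with no results
no_results = Counter(normalize(q) for q, n, _ in log if n == 0)
# Queries with results but no click-through
no_click = Counter(normalize(q) for q, n, doc in log if n > 0 and doc is None)
# Most requested documents
top_docs = Counter(doc for _, _, doc in log if doc is not None)

print(top_queries.most_common(3))
print(no_results.most_common(3))
print(no_click.most_common(3))
print(top_docs.most_common(3))
```

Queries with no results are often the quickest source of candidate synonyms or missing terms for the taxonomy editor.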
58Taxonomy Strategies LLC The business of organized information
Fundamental process #2 – Error correction
Errors will happen, and some will be found. What are you going to do about them?
Tagging errors, content errors, taxonomy errors, …
Define an error correction process. Process will accommodate questions like:
Who looks at it?
Is it an error?
What are the costs to correct vs. not correct?
Does the correction need to be scheduled?
etc.
Once a tagging error is corrected, NEVER lose that fact.
Manually reviewed pages are vital for training automatic classifiers
Has implications for metadata specification and review procedures
Over time, multiple error detection methods will be defined
e.g. Statistical sampling of newly added pages
Gradually, additional error correction processes may be defined to deal with particular types of errors
You have an error correction process. Would you hate to see it on paper?
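Two points from this slide lend themselves to a small illustration: statistical sampling of newly added pages, and never losing the fact that a page was manually corrected. This is a minimal sketch under stated assumptions: the `TaggedPage` record, the `manually_reviewed` flag, and the 10% sampling rate are all hypothetical, not part of any particular metadata specification.

```python
import random
from dataclasses import dataclass, field


@dataclass
class TaggedPage:
    url: str
    tags: set
    manually_reviewed: bool = False              # once True, it is never reset
    history: list = field(default_factory=list)  # prior tag sets and who changed them


def sample_for_review(pages, rate, seed=0):
    """Draw a reproducible random sample of newly added pages for manual checking."""
    if not pages:
        return []
    rng = random.Random(seed)
    k = max(1, round(len(pages) * rate))
    return rng.sample(pages, k)


def correct_tags(page, new_tags, reviewer):
    """Apply a correction while keeping the old tags and the reviewed flag."""
    page.history.append((set(page.tags), reviewer))
    page.tags = set(new_tags)
    page.manually_reviewed = True


new_pages = [TaggedPage(f"/page{i}", {"Jupiter"}) for i in range(20)]
to_check = sample_for_review(new_pages, rate=0.10)
correct_tags(to_check[0], {"Jupiter", "Planetary and Lunar Science"}, reviewer="editor")

# Manually reviewed pages can later serve as training data for an auto-classifier.
training_set = [p for p in new_pages if p.manually_reviewed]
print(len(to_check), len(training_set))
```

Storing the flag and history on the record itself is one way the metadata specification can reflect the review procedure.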
59Taxonomy Strategies LLC The business of organized information
Fundamental Outlooks
Measure & Improve Mindset
Query logs and click trails are prime example
Next place to instrument: Error correction and error detection processes
Integrated handling of Taxonomy, Metadata, UI, & Search
To be most effective, these must work together
Governance structure must help that happen
Cross-functional team structure is a start
60Taxonomy Strategies LLC The business of organized information
Actions to define taxonomy governance
Initial vocabularies should be selected for stability as well as utility.
Custodians of shared vocabularies must be identified, educated re. impacts of changes.
Group of custodians and stakeholders must be established.
(Simple) System for sharing the CVs and tracking the update process must be established.
62Taxonomy Strategies LLC The business of organized information
Exercise 4: Self-Diagnosis
1. Does your organization know what it is, or wants to be, doing around search & taxonomy yet?
2. Is the cost basis for the taxonomy ROI clear to you?
3. Is the benefits basis for the taxonomy ROI clear to you?
4. Is the cost basis for the taxonomy ROI clear to your CFO?
5. Is the benefits basis for the taxonomy ROI clear to your CFO?
6. Do you know how content will be tagged?
7. Do you know how tagged content will be displayed to users?
8. Do you know how users will fetch the content?
9. Do users know how they should report errors in the tagging?
10. Do you know what information will be logged for later analysis?
11. Do you know what information has to be reported to management to justify the taxonomy team?
12. Does management expect the taxonomy team to justify its existence?
13. Is your organization planning a tightly focused taxonomy effort?
14. Is your organization planning a credible ‘Enterprise Taxonomy Strategy’?
15. Does your organization expect its taxonomies to change frequently?
16. Has your organization identified some facets as stable and some facets as volatile?
17. Does your organization have a plan for retagging data when the taxonomy is changed?
18. Do you have an identified taxonomy “team” with at least one person?
19. Is there at least one person working on taxonomy/metadata/search more than ½ time?
20. Does the team contain members who represent search, UI, and metadata tagging?
21. Does the organization have any hiring and training criteria for taxonomy, metadata, and search positions?
22. Does the team maintain Editorial Rules?
23. Does the team maintain a corporate metadata specification?
24. Does the team maintain educational materials?
25. Does the team have a communications plan?
26. Does the team examine query logs?
27. Does the team examine click trails?
28. Does the team have a documented error correction process?
29. Does the organization have a procedure to locate ROT (Redundant, Obsolete, or Trivial content)?
30. Does the organization have any qualitative or quantitative measures of data quality?
31. Do you use a tool other than MS Excel for editing and maintaining the Taxonomy?
32. Were taxonomy, metadata, search, or content management tools purchased with money other than “use it or lose it” funds?
I think a self-diagnosis quiz like this could be
nice to have in the book. Also see the “Metadata Maturity
Model” stuff in the next set of slides.
Data Governance Maturity:
When the business depends on clear description of fuzzy objects
Presented to San Francisco DAMA
Sept. 10, 2008
Ron Daniel, Jr.
64Taxonomy Strategies LLC The business of organized information
Goals for this talk
Provide you with background on maturity models.
Provide the results of our surveys of Search, Metadata, & Taxonomy practices and discuss interesting findings.
Review the practices in use at stock photo houses, and compare them to methods that may be used in typical information management projects.
Give you the tools to do a simple self-assessment of your organization’s metadata maturity
65Taxonomy Strategies LLC The business of organized information
Agenda
9:15 Metadata Definitions
9:30 Maturity Models
9:45 Metadata Maturity Model (ca. 2006)
10:15 Break
10:30 Stock Photo Business
10:40 Data Governance Practices in Stock Photo Agencies
11:40 Summary
11:45 Questions
12:00 Adjourn
66Taxonomy Strategies LLC The business of organized information
Taxonomy and metadata definitions
Metadata
“Data about data”.
Different communities have very different assumptions about the types of data being described.
I’m from the Information Science community, not the database, statistics, or massive storage communities.
Taxonomy
1. The classification of organisms in an ordered system that indicates natural relationships.
2. The science, laws, or principles of classification; systematics.
3. Division into ordered groups, categories, or hierarchies.
67Taxonomy Strategies LLC The business of organized information
Examples of taxonomy used to populate metadata fields
Metadata
Title
Author
Department
Audience
Topic
Topics
Employee Services
Compensation
Retirement
Insurance
Further Education
Finance and Budget
Products and Services
Support Services
Infrastructure
Supplies
Metadata Values (Facets within the overall Taxonomy)
Audience
Internal
Executives
Managers
External
Suppliers
Customers
Partners
68Taxonomy Strategies LLC The business of organized information
Example faceted taxonomy
ABC Computers.com
Audience: All; Business; ABC Employee; Education; Gaming Enthusiast; Home; Investor; Job Seeker; Media; Partner; Shopper; First Time; Experienced; Advanced; Supplier
Line of Business: All; Home & Home Office; Gaming; Government, Education & Healthcare; Medium & Large Business; Small Business
Region-Country: All; Asia-Pacific; Canada; ABC EMEA; Japan; Latin America & Caribbean; United States
Product Family: Desktops; MP3 Players; Monitors; Networking; Notebooks; Printers; Projectors; Servers; Services; Storage; Televisions; Non-ABC Brands
Content Type: Award; Case Study; Contract & Warranty; Demo; Magazine; News & Event; Product Information; Services; Solution; Specification; Technical Note; Tool; Training; White Paper; Other Content Type
Competency: Business & Finance; Interpersonal Development; IT Professionals Technical Training; IT Professionals Training & Certification; PC Productivity; Personal Computing Proficiency
Industry: Banking & Finance; Communications; E-Business; Education; Government; Healthcare; Hospitality; Manufacturing; Petrochemicals; Retail / Wholesale; Technology; Transportation; Other Industries
Service: Assessment, Design & Implementation; Deployment; Enterprise Support; Client Support; Managed Lifecycle; Asset Recovery & Recycling; Training
69Taxonomy Strategies LLC The business of organized information
Manually tagged metadata sample
Attribute: Values
Title: Jupiter’s Ring System
URL: http://ringmaster.arc.nasa.gov/jupiter/
Description: Overview of the Jupiter ring system. Many images, animations and references are included for both the scientist and the public.
Content Types: Web Sites; Animations; Images; Reference Sources
Audiences: Educators; Students
Organizations: Ames Research Center
Missions & Projects: Voyager; Galileo; Cassini; Hubble Space Telescope
Locations: Jupiter
Business Functions: Scientific and Technical Information
Disciplines: Planetary and Lunar Science
Time Period: 1979-1999
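A record like the one above can be checked mechanically against the controlled vocabularies behind each field. The sketch below is illustrative only: the vocabularies shown are tiny hypothetical subsets (the value "Scientists" is invented for the example), and it assumes multi-valued fields are separated by semicolons as in the sample.

```python
# A few fields from the sample record, stored as raw strings.
record = {
    "Title": "Jupiter's Ring System",
    "Content Types": "Web Sites; Animations; Images; Reference Sources",
    "Audiences": "Educators; Students",
    "Locations": "Jupiter",
}

# Hypothetical controlled vocabularies (small subsets, for illustration only).
vocab = {
    "Content Types": {"Web Sites", "Animations", "Images", "Reference Sources"},
    "Audiences": {"Educators", "Students", "Scientists"},
}


def split_values(raw: str) -> list:
    """Split a semicolon-separated field into trimmed values."""
    return [v.strip() for v in raw.split(";") if v.strip()]


def invalid_values(rec: dict, vocabularies: dict) -> dict:
    """Return, per controlled field, the values not found in its vocabulary."""
    problems = {}
    for field_name, allowed in vocabularies.items():
        bad = [v for v in split_values(rec.get(field_name, "")) if v not in allowed]
        if bad:
            problems[field_name] = bad
    return problems


print(invalid_values(record, vocab))
print(invalid_values(dict(record, Audiences="Educators; Kids"), vocab))
```

Free-text fields such as Title and Description are simply left out of the vocabulary map, so they are not checked.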
70Taxonomy Strategies LLC The business of organized information
Other things sometimes called Taxonomy
Type: Remarks
Synonym Ring
Connects a series of terms together
Treats them as equivalent for search purposes
e.g. (Dog, Canine, Pooch, Mutt), (Cat, Feline, Kitty), …
Authority File
Used to control variant names with a preferred term
Typically used for names of countries, individuals, organizations
e.g. (IBM, Big Blue, International Business Machines Inc.)
Classification Scheme
A hierarchical arrangement of terms
May or may not follow strict “is-a” hierarchy rules
Usually enumerated; e.g., LC or Dewey
Thesaurus
Expresses semantic relationships of:
• Hierarchy (broader & narrower terms)
• Equivalence (synonyms)
• Associative (related terms)
May include definitions
Ontology
Resembles faceted taxonomy but uses richer semantic relationships among terms and attributes and strict specification rules
A model of reality, allowing inferences to be made.
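The difference between a synonym ring and an authority file shows up clearly as data structures: a ring treats all members as equal, while an authority file maps every variant to one preferred term. This is a rough sketch using the slide's own examples; the choice of "IBM" as the preferred term and the function names are assumptions made for illustration.

```python
# Synonym rings: every member is equivalent for search purposes.
synonym_rings = [
    {"dog", "canine", "pooch", "mutt"},
    {"cat", "feline", "kitty"},
]

# Authority file: variant names map to one preferred term (assumed here to be "IBM").
authority_file = {
    "ibm": "IBM",
    "big blue": "IBM",
    "international business machines inc.": "IBM",
}


def expand_query(term: str) -> list:
    """Expand a search term to all members of its synonym ring, if it has one."""
    key = term.strip().lower()
    for ring in synonym_rings:
        if key in ring:
            return sorted(ring)
    return [key]


def preferred_term(name: str) -> str:
    """Return the preferred form of a name, or the name unchanged if unknown."""
    return authority_file.get(name.strip().lower(), name)


print(expand_query("Pooch"))
print(preferred_term("Big Blue"))
```

A thesaurus would add broader, narrower, and related links between terms on top of these equivalence structures.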
72Taxonomy Strategies LLC The business of organized information
Organizational benchmarking
A common goal of organizations is to ‘benchmark’ themselves against other organizations.
Different organizations have:
Different levels of sophistication in their planning, execution, and follow-up for CMS, Search, Portal, Metadata, and Taxonomy projects.
Different reasons for pursuing Search, Metadata, and Taxonomy efforts
Different cultures
Benchmarks should be to similar organizations.
73Taxonomy Strategies LLC The business of organized information
Is unnecessary capability harmful?
Tool Vendors continue to provide ever-more capable tools with ever-more sophisticated features.
But we live in a world where a significant fraction of public, commercial, web pages don’t have a <title> tag.
Organizations that can’t manage <title> tags stand a very poor chance of putting an entity extractor to use, which requires some ongoing management of the lists of entities to be extracted.
Organizations that can’t create and maintain clean metadata can’t put a faceted search UI to good use.
Unused capability is poor value-for-money.
Organizations over-spend on tools and under-spend on staff & processes.
74Taxonomy Strategies LLC The business of organized information
Towards better benchmarking…
Wanted a method to:
Generally identify good and bad practices.
Help clients identify the things they can do, and the things that stand an excellent chance of failing.
Predict likely sources of problems in engagements.
We have started to develop a Metadata Maturity Model, inspired by Maturity Models from the software industry.
To keep the model tied to reality, we are conducting surveys to determine the actual state of practice around search, metadata, taxonomy, and supporting business functions such as staffing and project management.
75TAXONOMY STRATEGIES The business of organized information
A Tale of Two Software Maturity Models
CMMI (Capability Maturity Model Integration)
vs.
The Joel Test
76Taxonomy Strategies LLC The business of organized information
CMMI structure
Source: http://chrguibert.free.fr/cmmi
Maturity Models are collections of Practices.
Main differences in Maturity Models concern:
• Descriptivist or Prescriptivist Purpose
• Degree of Categorization of Practices
• Number of Practices (~400 in CMMI)
77Taxonomy Strategies LLC The business of organized information
22 Process Areas, keyed to 5 Maturity Levels…
Process Areas contain Specific and Generic Practices, organized by Goals and Features, and arranged into Levels
Process Areas cover a broad range of practices beyond simple software development
CMMI Axioms:
Individual processes at higher levels are AT RISK from supporting processes at lower levels.
A Maturity Level is not achieved until ALL the Practices in that level are in operation.
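The second axiom implies a simple staged rule: an organization's level is the highest one for which every practice at that level, and at all levels below it, is in operation. The sketch below illustrates that rule with placeholder practice names; it is not CMMI's actual appraisal method, and it assumes Level 1 is the baseline that requires no practices.

```python
from typing import Dict, Set


def maturity_level(practices_by_level: Dict[int, Set[str]], in_operation: Set[str]) -> int:
    """Return the highest level whose practices, and all lower levels' practices, are in operation."""
    achieved = 1  # assumed baseline level with no required practices
    for level in sorted(practices_by_level):
        if not practices_by_level[level].issubset(in_operation):
            break  # a gap at this level blocks credit for anything above it
        achieved = level
    return achieved


# Placeholder practice names, for illustration only.
practices = {
    2: {"project planning", "requirements management"},
    3: {"organizational training"},
    4: {"quantitative project management"},
}

# A Level 4 practice is in place, but a Level 3 practice is missing.
print(maturity_level(practices, {"project planning", "requirements management",
                                 "quantitative project management"}))
```

The example shows the first axiom as well: the Level 4 practice earns no credit because the Level 3 practice that supports it is missing.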
78Taxonomy Strategies LLC The business of organized information
CMMI Positives
Independent audits of an organization’s level of maturity are a common service
Level 3 certification frequently required in bids
“…compared with an average Level 2 program, Level 3 programs have 3.6 times fewer latent defects, Level 4 programs have 14.5 times fewer latent defects, and Level 5 programs have 16.8 times fewer latent defects”.
Michael Diaz and Jeff King – “How CMM Impacts Quality, Productivity, Rework, and the Bottom Line”
‘If you find yourself involved in product liability litigation you're going to hear terms like "prevailing standard of care" and "what a reasonable member of your profession would have done". Considering the fact that well over a thousand companies world-wide have achieved level 3 or above, and the body of knowledge about the CMM is readily available, you might have some explaining to do if you claim ignorance’.
Linda Zarate in a review of A Guide to the CMM: Understanding the Capability Maturity Model for Software by Kenneth M. Dymond
79Taxonomy Strategies LLC The business of organized information
CMMI Negatives
Complexity and Expense
Reading and understanding the materials
Putting it into action – identifying processes, mapping processes to model, gathering required data, …
Audits are expensive
CMMI does not scale down well to small shops
Has been accused of restraint of trade
80Taxonomy Strategies LLC The business of organized information
At the other extreme, The Joel Test
Developed by Joel Spolsky as a reaction to CMMI complexity
Positives - Quick, easy, and inexpensive to use.
Negatives - Doesn’t scale up well:
Not a good way to assure the quality of nuclear reactor software.
Not suitable for scaring away liability lawyers.
Not a longer-term improvement plan.
The Joel Test
1. Do you use source control?
2. Can you make a build in one step?
3. Do you make daily builds?
4. Do you have a bug database?
5. Do you fix bugs before writing new code?
6. Do you have an up-to-date schedule?
7. Do you have a spec?
8. Do programmers have quiet working conditions?
9. Do you use the best tools money can buy?
10. Do you have testers?
11. Do new candidates write code during their interview?
12. Do you do hallway usability testing?
Scoring: 1 point for each ‘yes’. Scores of 10 or lower indicate serious trouble.
81Taxonomy Strategies LLC The business of organized information
What does software development “Maturity” really mean?
A low score on a maturity audit DOES NOT mean that an organization can’t develop good software
It DOES mean that whether the organization will do a good job depends on the specific mix of people assigned to the project
In other words, it sets a floor for how badly an organization is likely to do, not a ceiling on how well it can do
Probability of failure is a good thing to know before spending a lot of time and money
82TAXONOMY STRATEGIES The business of organized information
Towards a Metadata Maturity Model
Caveats:
Maturity is not a goal, it is a characterization of an organization’s methods for achieving its core goals.
Mature processes impose expenses which must be justified by consequent cost savings, revenue gains, or service improvements.
Nevertheless, Maturity Models are useful as collections of best practices and stages in which to try to adopt them.
83Taxonomy Strategies LLC The business of organized information
Basis for initial maturity model
CEN study on commercial adoption of Dublin Core
Small-scale phone survey
Organizations which have world-class search and metadata externally
Not necessarily the most mature overall processes or the best internal search and metadata
Literature review
Client experiences
Structure from software maturity models
84Taxonomy Strategies LLC The business of organized information
Initial Metadata Maturity Model (ca. May, 2005)
Practice Area: Practices (original columns by Maturity Level: Basic, Intermediate, Advanced, Bleeding-Edge, Limiting)
Search Capabilities: Uniform Search Box; Query Log Exam.; Index Multiple Repos.; Best Bets; Simple Grouping; Intranet Facet Navigation; Improved Ranking
Metadata and taxonomy standards: System MD Stds.; Organization MD Std.; Reuse ERP; Multiple Repos Comply; Taxonomy Roadmap; Highly Abstract Subject Taxos.
Tools and tool selection: Requirements, then Tools; Bakeoff Datasets; Budget for Bakeoffs; Unneeded Capabils.; Tools, then Reqs.
Staff training and hiring: Search Analyst Role; Librarian Expertise; Pre-hire Testing; SME Catalogers
Data creation and QA: CM Introduced; ROT-Elimination; Hybrid Creation Model; Adaptive Qualification; Quality Measures
Project management: Project Plan; Std. Proj. Methodol.; X-Functional Teams; Communication Plan; Multi-Year Plan; Early Termination
Executive support and ROI: External Search ROI; Intranet ROI Model; CEO knows Search ROI; Use it or Lose It Budgets
37 Practices, Categorized by Area, Level, and Importance
85Taxonomy Strategies LLC The business of organized information
Shortcomings of the initial model
No idea of how it corresponds to actual practice across multiple organizations
Some indications that it over-emphasized the sophisticated practices and under-emphasized beginning practices.
The initial metadata maturity model can be regarded as a hypothesis about how an organization progresses through various practices as it matures
How to test it? Let’s ask!
Two surveys to date
Surveys are being run in stages because of large number of practices.
Ask about future, current, and former practices to gather information on progression
87TAXONOMY STRATEGIES The business of organized information
Survey 1: Search, Metadata, & Taxonomy Practices
The data in this section comes from a survey conducted in the autumn of 2005.
88Taxonomy Strategies LLC The business of organized information
Participants by Organization Size
89Taxonomy Strategies LLC The business of organized information
Participants by Job Role
90Taxonomy Strategies LLC The business of organized information
Participants by Industry
91Taxonomy Strategies LLC The business of organized information
Search Practices
Columns: Not current practice | Being developed | In practice | Former practice | NA or Unknown
Search Box in standard place on all web pages. 20% (12) 11% (7) 62% (38) 2% (1) 5% (3)
Search engine indexes multiple repositories in addition to web sites. 25% (15) 21% (13) 44% (27) 2% (1) 8% (5)
Spell Checking. 31% (19) 18% (11) 38% (23) 0% (0) 13% (8)
Synonym Searching. 41% (25) 23% (14) 30% (18) 0% (0) 7% (4)
Search results grouped by date, location, or other factors in addition to simple relevance score. 37% (22) 20% (12) 37% (22) 0% (0) 7% (4)
Queries are logged and the logs are regularly examined 31% (19) 25% (15) 31% (19) 5% (3) 8% (5)
Common queries identified, 'best' pages for those queries are found, and search engine configured to return them at the top. 46% (28) 25% (15) 21% (13) 0% (0) 8% (5)
Advanced computation of relevance based on data in addition to the text of the document. 43% (26) 16% (10) 25% (15) 0% (0) 16% (10)
A faceted search tool, such as Endeca, has been implemented for the organization's external site or product catalog search. 68% (41) 7% (4) 10% (6) 0% (0) 15% (9)
A faceted search tool, such as Endeca, has been implemented for the organization's internal website(s) or portal. 57% (34) 15% (9) 17% (10) 0% (0) 12% (7)
92Taxonomy Strategies LLC The business of organized information
Metadata Practices
Columns: Not current practice | Being developed | In practice | Former practice | NA or Unknown
Metadata standards are developed for the needs of each system with no overall attempt to unify them. 22% (13) 12% (7) 37% (22) 20% (12) 10% (6)
An Organization-wide metadata standard exists and new systems consider it during development. 37% (22) 37% (22) 20% (12) 0% (0) 7% (4)
The Organization-wide metadata standard is based on the Dublin Core. 52% (30) 16% (9) 21% (12) 0% (0) 12% (7)
Multiple repositories comply with metadata standard. 52% (31) 20% (12) 17% (10) 0% (0) 12% (7)
A Cataloging Policy document exists to teach people how to tag data in compliance with organizational metadata standard. 48% (29) 20% (12) 20% (12) 0% (0) 12% (7)
The Cataloging Policy document is revised periodically. 48% (29) 15% (9) 17% (10) 0% (0) 20% (12)
A centralized metadata repository exists to aggregate and unify metadata from disparate sources. 57% (34) 17% (10) 17% (10) 0% (0) 10% (6)
Metadata is manually entered into web forms. 15% (9) 12% (7) 61% (36) 3% (2) 8% (5)
Metadata is generated automatically by software. 38% (23) 18% (11) 27% (16) 2% (1) 15% (9)
Metadata is generated automatically, then reviewed manually for correction. 48% (29) 18% (11) 17% (10) 2% (1) 15% (9)
These two questions were the only ones with much correlation to
organization size
93Taxonomy Strategies LLC The business of organized information
Taxonomy Practices
Columns: Not current practice | Being developed | In practice | Former practice | NA or Unknown
'Org Chart' Taxonomy - One based primarily on the structure of the organization. 36% (21) 10% (6) 34% (20) 5% (3) 15% (9)
'Products' Taxonomy - One based primarily on the products and/or services offered by the organization. 37% (22) 10% (6) 32% (19) 5% (3) 15% (9)
'Content Types' Taxonomy - One based primarily on the different types of documents. 28% (16) 21% (12) 40% (23) 5% (3) 7% (4)
'Topical' Taxonomy - One based primarily on topics of interest to the site users. 20% (12) 36% (21) 34% (20) 3% (2) 7% (4)
'Faceted' Taxonomy - One which uses several of the approaches above. 32% (19) 29% (17) 34% (20) 0% (0) 5% (3)
The Taxonomy, or a portion of it, was licensed from an outside taxonomy vendor. 75% (44) 3% (2) 14% (8) 0% (0) 8% (5)
The Taxonomy follows a written 'style guide' to ensure its consistency over time. 47% (28) 22% (13) 20% (12) 0% (0) 10% (6)
The Taxonomy is maintained using a taxonomy editing tool other than MS Excel. 35% (21) 17% (10) 40% (24) 2% (1) 7% (4)
The Taxonomy was validated on a representative sample of content during its development. 28% (17) 22% (13) 33% (20) 3% (2) 13% (8)
A Roadmap for the future evolution of the Taxonomy has been developed. 38% (23) 40% (24) 13% (8) 0% (0) 8% (5)
94TAXONOMY STRATEGIES The business of organized information
Survey 2: Business Drivers, Processes, and Staffing
The data in this section comes from a survey conducted in the spring of 2006.
95Taxonomy Strategies LLC The business of organized information
Participants by Job Role
96Taxonomy Strategies LLC The business of organized information
Participants by Tenure
97Taxonomy Strategies LLC The business of organized information
Participants by Industry
98Taxonomy Strategies LLC The business of organized information
Participants by Organization Size
99Taxonomy Strategies LLC The business of organized information
Business Drivers: Search, Metadata, and Taxonomy (SMT) Applications
100Taxonomy Strategies LLC The business of organized information
Business Drivers: Desired Benefits
Other desired benefits:
1. Innovation
2. Core to our business product
3. Clients do all the above [From a consultant]
4. Better navigation to diverse State web sites
5. Increased knowledge sharing across the corporation
6. Interoperability
7. Dynamic web applications
8. Improved user search experience
9. Improve R&D
10. Higher value to members [From a non-profit membership org.]
11. For organization to have better understanding of their content
101Taxonomy Strategies LLC The business of organized information
ROI: Cost Estimation
102Taxonomy Strategies LLC The business of organized information
Processes
Use of search logs is improving
Surprisingly sophisticated
Basic data quality and communications need improvement
Many solo operators
103Taxonomy Strategies LLC The business of organized information
Team Structures & Staffing
104Taxonomy Strategies LLC The business of organized information
Salary Survey
Experience 0.6 Nice to see it really counts.
Geography 0.5 California and the Northeast have highest salaries.
Co. Size 0.5 Not very reliable, big changes from one datapoint
Education 0.4 Many taxonomists have MLS or above.
Industry 0.4 Surprisingly, retail has high salaries for taxonomists.
Role 0.04 Taxonomists paid about like Information Architects
Time at current job -0.07
105Taxonomy Strategies LLC The business of organized information
Notes from Participants
There is the constant struggle with individual [magazine] titles to hire trained librarians or data specialists instead of trying to save money by hiring an editor who can build articles AND create and assign metadata. This is a governance issue we have been struggling with since we have no monetary stake in the individual publications. We make recommendations, but have no higher level authority to require titles to hire trained staff for metadata.
Reporting metrics have become a new area of confusion as we move to portalized pages consisting of objects in portlets, each with their own metadata.
Key organizational issue is that the "problems" that stem from lack of systematic metadata/taxonomy creation are not "owned" by anyone, and consequently have no budget for their solution.
106TAXONOMY STRATEGIES The business of organized information
Interim Conclusions
107Taxonomy Strategies LLC The business of organized information
Observations (1)
Practices which a single person or a small group can carry out are more commonly used
Not surprising
Very different than ERP/BPR, indicates that information management is not being sold to the “C-level” staff.
People need to question how inclusive their “Organizational Metadata Standards” and “Taxonomy Roadmaps” actually are.
We have found Taxonomy Roadmaps to be an advanced practice, due to a dependence on knowing upcoming IT development schedule
108Taxonomy Strategies LLC The business of organized information
Observations (2)
Many of the basics are being skipped
More organizations doing “Spell Checking” than “Query Log Analysis”.
69% have a taxonomy change plan, but only 41% have a plan for revisiting data if the taxonomy changes.
64% have a communications plan, but only 56% have a website.
This seems to be linked to the previous observation – things that are easy for an individual get done before things that need an organizational effort, despite their level of ‘sophistication’.
109Taxonomy Strategies LLC The business of organized information
Interim Metadata Maturity Model (ca. May, 2006)
Practice Area: Practices (original columns: Basic, Intermediate, Advanced, Limiting)
Search Capabilities: Uniform Search Box; Query Log Exam.; Index Multiple Repos.; Best Bets; Facet Navigation UI
Metadata and taxonomy standards: System MD Stds.; Organization MD Std.; Multiple Repos Comply w/ MD Std.; Reuse ERP Taxos; Taxo Maint. Doc; Taxonomy Roadmap; Highly Abstract Subject Taxos (e.g. “Moods”); Metadata Maint. Doc
Tools and tool selection: Requirements, then Tools; Bakeoff Datasets; Budget for Bakeoffs; Tools, then Reqs.
Staff training and hiring: Librarian or IA Expertise; Search Analyst Role; Cross-Functional Taxonomy Creation; Cross-functional taxonomy maint.; SME Catalogers; Pre-hire Testing
Data creation and QA: CM Introduced; ROT-Elimination; Semi-auto tagging; Quality Measures
Project management: Project Plan; X-Functional Teams; Std. Proj. Methodol.; Multi-Year Plan; Communication Plan; SMT Business Manager, instead of IT Manager; Early Termination
Executive support and ROI: External Search ROI; SMT in separate silos; Intranet ROI Model; CEO knows Search ROI; Use it or Lose It Budgets
110Taxonomy Strategies LLC The business of organized information
Search and Metadata Maturity Quick Quiz
Basic
1) Is there a process in place to examine query logs?
2) Is there a process for adding directories and content to the repository, or do people just do what they want?
3) Is there an organization-wide metadata standard, such as an extension of the Dublin Core, for use by search tools, multiple repositories, etc.?
Intermediate
4) Does the search engine index more than 4 repositories around the organization?
5) Does the search engine integrate with the taxonomy to improve searches and organize results?
6) Are there hiring and training practices especially for metadata and taxonomy positions?
7) Is there an ongoing data cleansing procedure to look for ROT (Redundant, Obsolete, Trivial content)?
8) Are tools only acquired after requirements have been analyzed, or are major purchases sometimes made to use up year-end money?
Advanced
9) Are there established qualitative and quantitative measures of metadata quality?
10) Can the CEO explain the ROI for search and metadata?
Agenda
9:15 Metadata Definitions
9:30 Maturity Models
9:45 Metadata Maturity Model (ca. 2006)
10:15 Break
10:30 Stock Photo Business
10:40 Data Governance Practices in Stock Photo Agencies
11:40 Summary
11:45 Questions
12:00 Adjourn
Stock Photo Business
Advertising, Editorial Content, Corporate Communications, and many other types of content rely on images to convey information and moods.
When time and/or budget does not allow a commissioned shoot, stock photo houses can supply images.
Fundamental problem for users: How to search for an image that conveys what you want?
Fundamental problem for houses: How to describe images so that users can find them?
How would you search for this image?
Tagging by emotions
“silence”
Conceptual refinement
Objective criteria
Conceptual refinement
Image Rights Criteria
Clarification: Finger on Lips
Scrolling through results…
This is more of the mood I’m looking for…
More like this
Facets at gettyimages.com
Key Questions
Getty Images (and Corbis) have put a lot of effort into their websites for image purchase*.
Internal staff at such organizations tell me that their intranets are nowhere near as easy to use.
ROI is the reason why. Recall that retail had high salaries for taxonomists, because the ROI for a better shopping site is so clear.
The front-ends are dependent on data.
• How is that data governed?
• How does that differ from how their intranets are governed?
*Licensing, not purchasing, to be pedantic.
Who are the users & what are they looking for?
Only 30-40% of organizations regularly examine their logs.
Sophisticated software is available, but don’t wait: 80% of the value comes from basic reports.
Query log & click trail examination: Click trail packages
iWebTrack, NetTracker, OptimalIQ, SiteCatalyst, Visitorville, WebTrends
Overkill
Query log & click trail examination: Query log
UltraSeek Reporting
• Top queries
• Queries with no results
• Queries with no click-through
• Most requested documents
• Query trend analysis
• Complete server usage summary
Basic queries provide most of the value if the organization has a process to review what is going on.
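Three of the basic reports listed above (top queries, queries with no results, queries with no click-through) can be produced with very little code. The following is a minimal sketch, not UltraSeek's reporting; it assumes a hypothetical log already parsed into (query, result_count, clicked) tuples, and the function and field names are illustrative only.

```python
from collections import Counter

def basic_query_report(log, top_n=3):
    """Summarise a query log given as (query, result_count, clicked) tuples.

    The tuple layout is an assumption for this sketch; real search engines
    export logs in their own formats and would need a parsing step first.
    """
    # Normalise case and whitespace so "Vacation Policy" and "vacation policy" count together.
    rows = [(q.strip().lower(), n, clicked) for q, n, clicked in log]

    top_queries = Counter(q for q, _, _ in rows).most_common(top_n)
    no_results = sorted({q for q, n, _ in rows if n == 0})

    # Queries that returned results but were never followed by a click.
    clicked_queries = {q for q, _, clicked in rows if clicked}
    with_results = {q for q, n, _ in rows if n > 0}
    no_click_through = sorted(with_results - clicked_queries)

    return {
        "top_queries": top_queries,
        "no_results": no_results,
        "no_click_through": no_click_through,
    }

# Made-up sample data, for illustration only.
sample_log = [
    ("vacation policy", 12, True),
    ("Vacation Policy", 12, False),
    ("expense form", 0, False),
    ("org chart", 5, False),
    ("vacation policy", 12, True),
]
report = basic_query_report(sample_log)
print(report)
```

Queries with no results point at missing content or missing synonyms; queries with no click-through point at poor titles, descriptions, or ranking. Either list is only useful if someone is assigned to review it regularly.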
Key Governance Aspects
Roles and Responsibilities
• Managers
• Reviewers
Policies
• For naming
• Required fields
Procedures
• For reviewing and approving metadata placement
• For acting on poor metadata application
Recommended Measure and Improve Mindset
Measure – Determine the current situation and what is wrong.
• Too many documents in a category? Too many categories? People complaining about not finding material that is on the site? People asking for materials not on the site? Common searches without results?
Decide – Decide how to change things to fix the problem.
• Change navigation list? Add new categories? Add synonyms to search? Create new content?
Confirm – Before rolling out changes, test them to make sure they will actually address the problem.
• Usability tests, Card sorts, Internal functionality tests, …
Implement – Roll out the changes.
Repeat – Monitor people’s behavior on the site as well as responding to reported problems.
• Query log examination, Clicktrail examination, Google search result position, Stakeholder feedback, User surveys, Site analytics, etc.
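Two of the Measure questions ("Too many documents in a category?" and "Too many categories?") can be checked mechanically. The sketch below is one possible approach under stated assumptions: the taxonomy is a flat list of category names, each document carries a list of category tags, and the threshold `max_docs` is an arbitrary illustrative value, not a recommendation.

```python
from collections import Counter

def measure_categories(taxonomy, doc_tags, max_docs=3):
    """Flag categories that look overloaded or unused.

    taxonomy: list of category names (assumed flat for this sketch).
    doc_tags: dict mapping a document id to the categories it is tagged with.
    max_docs: hypothetical threshold above which a category is "too full".
    """
    counts = Counter(cat for tags in doc_tags.values() for cat in tags)
    # Counter returns 0 for categories that never appear in any document.
    overloaded = sorted(c for c in taxonomy if counts[c] > max_docs)
    unused = sorted(c for c in taxonomy if counts[c] == 0)
    return {"overloaded": overloaded, "unused": unused}

# Made-up sample data, for illustration only.
taxonomy = ["Benefits", "Travel", "Payroll", "Facilities"]
doc_tags = {
    "doc1": ["Benefits"],
    "doc2": ["Benefits", "Payroll"],
    "doc3": ["Benefits"],
    "doc4": ["Benefits", "Travel"],
    "doc5": ["Payroll"],
}
findings = measure_categories(taxonomy, doc_tags, max_docs=3)
print(findings)
```

An overloaded category is a candidate for splitting, and an unused one is a candidate for removal or merging; both findings feed the Decide step rather than triggering changes on their own.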
Taxonomy team: Generic roles
Business Lead
Technical Specialist
Content Specialist
Taxonomy Specialist
Content Owners
Keeps team on track with larger business objectives.
Reality check on process change suggestions.
Balances cost/benefit issues to decide appropriate levels of effort.
Obtains needed resources if those on committee can’t accomplish a particular task.
Estimates costs of proposed changes in terms of amount of data to be retagged, additional storage and processing burden, software changes, etc.
Helps obtain data from various systems.
Committee’s liaison to content creators.
Estimates costs of proposed changes in terms of editorial process changes, additional or reduced workload, etc.
Suggests potential taxonomy changes based on analysis of query logs, indexer feedback.
Makes edits to taxonomy, installs into system with aid of IT specialist.
Stakeholder Committee
Recommended Reading
CMMI: http://chrguibert.free.fr/cmmi (Official site is http://www.sei.cmu.edu/cmmi/, but that is not the most comprehensible.)
Joel Test: http://www.joelonsoftware.com/articles/fog0000000043.html
EIA Roadmap: http://www.louisrosenfeld.com/presentations/031013-KMintranets.ppt
Enterprise Search Report: http://www.cmswatch.com/EntSearch/
Fun Questions
The animals are divided into: (a) belonging to the emperor, (b) embalmed, (c) tame, (d) sucking pigs, (e) sirens, (f) fabulous, (g) stray dogs, (h) included in the present classification, (i) frenzied, (j) innumerable, (k) drawn with a very fine camelhair brush, (l) et cetera, (m) having just broken the water pitcher, (n) that from a long way off look like flies.
Jorge Luis Borges, "The Analytical Language of John Wilkins". Works in 3 volumes (in Russian). St. Petersburg, "Polaris", 1994. V. 2: 87.
This was created to be as bad a classification as possible.
What makes it so bad?
Contact Info
Ron Daniel, Jr.
925-368-8371