  Document Repositories & Metadata
  Richard Beatch, Earley & Associates


    • Focus: Information Architecture (IA) Services
    • Founded: 1994
    • Personnel: Twenty core team consultants, plus a network of other top industry experts
      • ECM and KM experts
      • taxonomy specialists
      • search experts
      • information architects
      • usability professionals
      • technology consultants
      • business process experts
    • Headquarters: Boston, MA

About Earley & Associates, Inc.

    • Consulting Philosophy:
      • Organizing Principles based on business context and goals
      • Four Pillars - People, Content, Process, and Technology

Core Capabilities

  • Enterprise Search, Portal Design, Collaboration
  • Web Content Management
  • Workflow Management
  • Security & Privacy Management
  • Rights Management
  • Records Management
  • Website Navigation, Search & SEO
  • Digital Asset Management
  • Taxonomy, Metadata, & Usability

  • Document/Content Management:
    • Strategy and requirements planning
    • Taxonomy, Metadata, Object modeling
    • Audit and analysis
    • Migration
    • Tagging and indexing
    • Lifecycle and workflow planning
    • Technology selection, RFP development
    • Governance
  • Taxonomy & Metadata:
    • Taxonomy strategy
    • Taxonomy development (for e-commerce, faceted search, ECM, DAM, enterprise taxonomy, thesauri)
    • Taxonomy evaluation and testing
    • Taxonomy implementation
    • Taxonomy governance and training
    • Taxonomy tool selection
    • Metadata standards development
    • Metadata schema design
    • Metadata governance
  • Digital Asset Management:
    • DAM strategy
    • DAM taxonomy
    • DAM technology evaluation
    • Asset lifecycle management
    • Marketing resource management (MRM)
  • Information Architecture/Usability:
    • Usability studies (site, navigation, taxonomy)
    • Wireframes and IA design
  • Search:
    • Search audit and user testing
    • Search strategy and ROI analysis
    • Taxonomy for faceted search and search optimization
    • Search deployment
    • Search and business intelligence
    • Search tuning and SEO
    • Search technology evaluation/tool selection

About Me

  • Richard Beatch
    • Senior Consultant at Earley & Associates, Inc.
    • Ph.D. in Ontology
    • Specialized in Taxonomy, Search, Metadata, and content architecture.
    • Extensive industry experience leading the implementation and design of taxonomies and search solutions for a range of companies including Apple, McAfee, Allstate, Dell, and AT&T.
    • Blog:

The Challenge

  • Suppose you have roughly 1 million scanned documents entering your document management system each week
  • Suppose you want users to be able to find them in the future so as to conduct your business
  • Suppose it is 2001

The result

  • H:\DocStore\California\Claims\Auto\PoliceRep\Photos\DR65876KL
  • H:\DocStore\California\Claims\Auto\PoliceRep\Photos\DR64876KL
  • H:\DocStore\California\Claims\Auto\PoliceRep\Photos\DR64879DL
  • H:\DocStore\California\Claims\Auto\PoliceRep\Photos\DW72876KL
  • Multiplied by (roughly) 250K each week

Why should I care about access anyway?

  • Reuse of content
  • Access in order to do business, e.g., process an insurance claim
  • Access for regulatory needs
  • In short, to either generate revenue or save money

How do we access this information now?

  • Ad hoc mechanisms
    • File shares
    • Snail mail/sending CDs
    • Email

Why Ad Hoc approaches still fail:

  • Intricacies of:
    • file formats
    • digital rights
    • time to transfer content


    • "Wow, what a cool photo! Can I reuse it?"
    • "Yeah, sure. Let me get a copy and send it over."

Document Management to the Rescue

  • Ad Hoc Sharing frustration bubbles up to the surface
  • Business recognizes the need and the potential cost savings over time

We need a document management system

  But we all know the answer:

But how do you expect me to find content?

Taxonomy & Metadata For Findability

  • Type: Magazine Advertisement
  • Channel: Print
  • Target Demographic: Parents
  • Country: US
  • Language: Spanish
  • Concept: Rebellion
  • Brand: Settletra


  • Do your kids:
  • Have discipline problems?
  • Trouble paying attention in school?
  • Trouble getting along with others?
  • Maybe it's time to find out how Settletra can help
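
The tags listed for the advertisement can be treated as a simple record that a repository filters on. The following is a minimal sketch, not a description of any particular document management product: the `Asset` class, the `find` helper, the asset names, and the second (contrast) asset are all hypothetical, and only the tag values of the first asset come from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    """A stored item plus the descriptive tags applied to it."""
    name: str
    tags: dict[str, str] = field(default_factory=dict)


def find(assets, **criteria):
    """Return the assets whose tags match every requested field/value pair."""
    return [
        asset for asset in assets
        if all(asset.tags.get(key) == value for key, value in criteria.items())
    ]


# Tag values below are from the slide; field names are rewritten as
# Python identifiers, and the asset name is an assumption.
settletra_ad = Asset("settletra_parents_ad", {
    "type": "Magazine Advertisement",
    "channel": "Print",
    "target_demographic": "Parents",
    "country": "US",
    "language": "Spanish",
    "concept": "Rebellion",
    "brand": "Settletra",
})

# Invented second asset, included only so the filter has something to exclude.
other_ad = Asset("hypothetical_web_banner", {
    "type": "Banner",
    "channel": "Web",
    "country": "US",
    "language": "English",
    "brand": "Settletra",
})

repository = [settletra_ad, other_ad]
spanish_print = find(repository, channel="Print", language="Spanish")
print([asset.name for asset in spanish_print])
```

The point of the sketch is that retrieval depends on the tags, not on where the file happens to be stored.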


  • Structured data that describes the attributes of an information package (Taylor, 1994)
  • Helps manage & share information
  • Helps find information

Metadata: a refresher

Types of metadata

  • Descriptive
    • Questions: What is it? What is it about? What is it called?
    • Example fields: Subject, Title, Document type, Description
  • Administrative
    • Questions: When was it created? Who owns it? What's its status?
    • Example fields: Date created, File type, Review date, Publication Status
  • Structural
    • Questions: What parts does it have?
    • Example fields: Is_Part_Of, Requires, Parent_Object
  • Taxonomy can apply in any category

  • Taxonomy is applied to content as metadata
    • Describes
      • Is-ness
      • About-ness

Taxonomy as metadata

  • Taxonomy
    • Item Types
      • Press
        • Press Releases
        • Logos
        • Press Kits
    • Brands
      • ELAVIL
      • IRESSA
  • Metadata
    • Document name: IRESSA Recommended...
    • Document type: Document
    • Date created: May-15-2009
    • Item Type (is a): Press Release
    • Brand (is about): IRESSA

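
One way to picture taxonomy applied as metadata is a document whose "is a" and "is about" tags point at taxonomy terms, so that a search on a broader term also finds it. This is a rough sketch under assumptions: the term names come from the slide, but the exact hierarchy, the `PARENT` mapping, and the `ancestors` and `matches` helpers are illustrative only.

```python
# Parent links for a tiny taxonomy. Term names come from the slide; the
# hierarchy and this dict layout are assumptions made for illustration.
PARENT = {
    "Press": "Item Types",
    "Press Releases": "Press",
    "Logos": "Press",
    "Press Kits": "Press",
    "ELAVIL": "Brands",
    "IRESSA": "Brands",
}


def ancestors(term):
    """Yield the term itself, then each broader term up to the root."""
    while term is not None:
        yield term
        term = PARENT.get(term)


def matches(doc, relation, term):
    """True if the document's tag for `relation` is `term` or falls under it."""
    return term in ancestors(doc[relation])


document = {
    "name": "IRESSA Recommended...",  # title is truncated in the source
    "date_created": "May-15-2009",
    "is_a": "Press Releases",         # is-ness
    "is_about": "IRESSA",             # about-ness
}

print(matches(document, "is_a", "Press"))        # True: broader item type
print(matches(document, "is_about", "Brands"))   # True: broader brand term
print(matches(document, "is_about", "ELAVIL"))   # False: a sibling brand
```
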
Uses for Metadata

  • Identification
  • Discovery
  • Structural
  • Rights
  • Product

Identification

  • Globally unique identifiers
  • Single or federated registries (directories)
  • Choice of what to identify
    • Abstract piece of IP
    • Manifestation of work (US version, German version, etc.)
    • Individual copy
  • General or content type-specific
  • Examples:
    • Book publishing: ISBN, ISTC
    • Journal publishing: ISSN
    • Video content: ISAN
    • Music: ISWC, ISRC, ISMN, GRid
    • Broadcast industry: UMID
    • All content types: DOI, Handle
    • Internet resources: URL, URN, URI
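
Many of these identifiers carry a check digit so that a mistyped value can be caught before it reaches a registry. As one example, the sketch below applies the standard ISBN-13 check-digit rule (alternating weights of 1 and 3 over the first twelve digits). The function names are my own, and the sample number is a commonly cited valid ISBN, not one taken from this deck.

```python
def isbn13_check_digit(first12: str) -> int:
    """Compute the ISBN-13 check digit from the first twelve digits."""
    total = sum(
        int(digit) * (1 if index % 2 == 0 else 3)
        for index, digit in enumerate(first12)
    )
    return (10 - total % 10) % 10


def is_valid_isbn13(isbn: str) -> bool:
    """Check length, digit content, and the trailing check digit."""
    digits = isbn.replace("-", "").replace(" ", "")
    if len(digits) != 13 or not (digits.isascii() and digits.isdigit()):
        return False
    return isbn13_check_digit(digits[:12]) == int(digits[12])


print(is_valid_isbn13("978-0-306-40615-7"))  # True
print(is_valid_isbn13("978-0-306-40615-8"))  # False: wrong check digit
```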

Discovery

  • Enable searching, querying, categorization
  • Basic identifying information
  • Descriptive metadata
  • Examples
    • Identifying information from Dublin Core schema: Title, Creator, Publisher, Format
    • Descriptive information from Dublin Core schema: Subject, Description
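
The Dublin Core elements named above can be serialized as a small XML record. The sketch below uses Python's standard `xml.etree.ElementTree` with the Dublin Core element namespace; the wrapping `record` element, the `dublin_core_record` helper, and every field value are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields: dict[str, str]) -> ET.Element:
    """Build a <record> element with one dc:* child per field."""
    record = ET.Element("record")  # wrapper element name is an assumption
    for name, value in fields.items():
        child = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        child.text = value
    return record


# All values below are invented sample data.
record = dublin_core_record({
    "title": "Sample police report photo",
    "creator": "Hypothetical Claims Office",
    "publisher": "Hypothetical Insurer",
    "format": "image/tiff",
    "subject": "Auto claims",
    "description": "Scanned photo attached to an auto claim.",
})

xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Title, Creator, Publisher, and Format identify the item; Subject and Description are what a search engine would typically index for discovery.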

Discovery Standards

  • Basic bibliographic: Dublin Core
  • Books: ONIX
  • Magazine articles (print & online): PRISM
  • Journal articles (online): CrossRef
  • News stories: NewsML
  • Educational content: LOM
  • Images: TIFF, DIG35
  • Music: MUZE, AMG

Structural

  • Describe logical structure of content
    • Ideally without defining output appearance
  • Allow content to be fed to predefined templates for production & distribution
  • Replacements for old markup languages (TROFF, SCRIPT, etc.)
  • Examples
    • From NITF tagset: [sic]

Structural Standards

  • Web pages: XHTML (HTML that can be validated through an XML parser)
  • News stories: NITF
  • E-books: IDPF OPS/OPF
  • Technical documentation (book form): DocBook
  • Technical documentation (modular): DITA
  • Multimedia: SMIL/MMS
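
To illustrate the XHTML point, the sketch below feeds two snippets to Python's standard XML parser. Note the assumption: this checks only that the markup is well-formed XML, which is a weaker test than validating against the XHTML DTD or schema. The helper name and both sample snippets are invented.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as XML, False otherwise."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# XHTML: every element is closed, so an XML parser accepts it.
xhtml = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>Claim</title></head>"
    "<body><p>Line one<br/>Line two</p></body>"
    "</html>"
)

# Typical older HTML: <br> and <p> are left open, so XML parsing fails.
tag_soup = "<html><body><p>Line one<br>Line two</body></html>"

print(is_well_formed(xhtml))     # True
print(is_well_formed(tag_soup))  # False
```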

Rights

  • Establish rights that can be conveyed to user
  • Define rights that you own or can grant
  • Examples
    • From ODRL 1.1 Permission Elements: display, print, play, execute, sell, lend, give, lease, modify, excerpt
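
Rights metadata answers the earlier "can I reuse it?" question mechanically. The sketch below is not ODRL syntax and does not use any ODRL library; it only borrows the ten permission names listed above as a controlled vocabulary. The `License` class, the asset identifier, and the granted permissions are hypothetical.

```python
from dataclasses import dataclass

# The ten permission names listed on the slide, used here as a controlled
# vocabulary. This is not a complete model of ODRL.
LISTED_PERMISSIONS = frozenset({
    "display", "print", "play", "execute",
    "sell", "lend", "give", "lease",
    "modify", "excerpt",
})


@dataclass(frozen=True)
class License:
    """Rights metadata attached to one asset (illustrative only)."""
    asset_id: str
    permissions: frozenset

    def __post_init__(self):
        # Reject anything outside the vocabulary so typos are caught early.
        unknown = self.permissions - LISTED_PERMISSIONS
        if unknown:
            raise ValueError(f"unknown permissions: {sorted(unknown)}")

    def allows(self, action: str) -> bool:
        return action in self.permissions


# Hypothetical asset and grant.
photo_license = License("photo-001", frozenset({"display", "print"}))

print(photo_license.allows("display"))  # True
print(photo_license.allows("modify"))   # False
```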

Rights Standards

  • DRM-based distribution: ODRL, MPEG REL/XrML
  • Website indexing/search: ACAP
  • Image licensing: PLUS
  • Downstream reuse rights: Creative Commons

Product

  • Describe characteristics of product
    • Physical or appearance
    • Marketing
  • Allow separation of content from product
  • Examples
    • From ONIX:
  • Product metadata standard: ONIX (books)

The Holy Grail

  • Taxonomy & Metadata
  • Governance & Content Strategy
  • Submission
  • Retrieval

Why stop there?

Perhaps we can do better

  • This is ALL just metadata
  • Different users can focus on what is valuable to them:
    • Price
    • Optical zoom
    • Megapixels
  • The good news: this used to cost a fortune. Not anymore.
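
Faceted navigation over such metadata can be sketched in a few lines: count the values available for each facet, let the user pick one, then recount within the narrowed set. The product data, the price bands, and the `facet_counts` and `refine` helpers below are all invented for illustration; a real deployment would rely on a search engine's own faceting support.

```python
from collections import Counter

# Invented sample products; the attribute names echo the facets above.
cameras = [
    {"model": "A", "price_band": "under $200", "optical_zoom": 3, "megapixels": 10},
    {"model": "B", "price_band": "under $200", "optical_zoom": 5, "megapixels": 12},
    {"model": "C", "price_band": "$200-$400", "optical_zoom": 5, "megapixels": 12},
    {"model": "D", "price_band": "$200-$400", "optical_zoom": 10, "megapixels": 14},
]


def facet_counts(items, facet):
    """Count how many items carry each value of the given facet."""
    return Counter(item[facet] for item in items)


def refine(items, **selected):
    """Keep only the items matching every selected facet value."""
    return [
        item for item in items
        if all(item[key] == value for key, value in selected.items())
    ]


print(facet_counts(cameras, "optical_zoom"))
narrowed = refine(cameras, optical_zoom=5)
print([item["model"] for item in narrowed])
print(facet_counts(narrowed, "price_band"))
```

Each user narrows by whichever attribute matters to them, and the counts show what remains before they commit to a choice.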

Conclusion

  • Managing large and changing document repositories is challenging.
  • File stores and databases alone cannot provide for genuine findability.
  • Semantically rich metadata can provide for findability through search.
  • Shifts in the costs of faceted navigation make eCommerce-style searching a real option within the enterprise.

