3 25 11 term store best practices

Upload: puckmiller3

Post on 14-May-2015

Category:

Technology


DESCRIPTION

Overview of how to improve records management and findability using SharePoint 2010, EMM, Term Store and Content Types and ConceptClassifier for SharePoint.

TRANSCRIPT

Page 1: 3 25 11 Term Store Best Practices

Enterprise Class Taxonomy Management and Auto-classification - Leveraging the Term Store for Organizational Metadata to Close Information and Records Management Capability Gaps in SharePoint

The Term Store Management Company

Don Miller is a senior executive at Concept Searching with over 20 years' experience in knowledge management. He is a frequent speaker on Records Management and Information Architecture problems and solutions. Don has been a guest speaker at Taxonomy Boot Camp, Managing Electronic Records and numerous SharePoint events on information organization and records management.

[email protected]

Page 2: 3 25 11 Term Store Best Practices

Agenda

Concept Searching • Don Miller • (408) 828-3400 • [email protected]

• Introductions
  – Company Overview, Unique Differentiator, Use Cases
  – The cost and ROI of metadata for Records Management and Findability
• SharePoint 2010
  – Enterprise Metadata Management Service
  – Term Store Basics
• Enterprise Taxonomy and Auto Classification Product Screen Shots
• Demo of conceptClassifier for SharePoint 2010
  – Show native integration into SharePoint 2010 for Records Management and automatic content type updating
  – Dynamic guided navigation within the search platform
  – Show enterprise Taxonomy Management and auto-classification capabilities
• Building out new Taxonomies/Term Sets
  – Term Store Management
  – Enterprise Taxonomy Management

Page 3: 3 25 11 Term Store Best Practices

• Company founded in 2002
• Product launched in 2003
• Focus on management of structured and unstructured information
• Technology: automatic concept identification, content tagging, auto-classification, taxonomy management
• Only statistical vendor that can extract conceptual metadata

2009, 2010, 2011 ‘100 Companies that Matter in KM’ (KM World Magazine)

KMWorld ‘Trend Setting Product’ of 2009, 2010

Locations: US, UK, & South Africa

Client base: Fortune 500/1000 organizations

Managed Partner under Microsoft global ISV Program - “go to partner” for SharePoint 2010 Term Store Management

Microsoft Enterprise Search ISV, FAST Partner

Enterprise Product Suite: conceptSearch, conceptTaxonomyManager, conceptClassifier

Concept Searching, Inc.

Page 4: 3 25 11 Term Store Best Practices

ConceptSearching’s unique statistical concept identification underpins all technologies

Multi word suggestion is explicitly more valuable than single term suggestion algorithms

Automated Multi Word Term Suggestions for Term Store

conceptClassifier generates conceptual metadata by extracting multi-word terms, so that ‘triple heart bypass’ is identified as a single concept rather than as three unrelated keywords

• Metadata can be used by any search engine index or any application/process that uses metadata

Concept Searching provides Automatic Concept Term Extraction

[Diagram: each word taken alone is ambiguous]
• Triple: Baseball, Three
• Heart: Organ, Center
• Bypass: Highway, Avoid
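The slides do not describe how Concept Searching's statistical concept identification works, so the following is only a naive illustration of why a multi-word term is more specific than its individual words. It counts word trigrams across a few made-up sentences and keeps the ones that recur. The function name `candidate_phrases`, its parameters and the sample sentences are all assumptions introduced for this sketch; it is not the vendor's algorithm.

```python
import re
from collections import Counter

def candidate_phrases(texts, n=3, min_count=2):
    """Count word n-grams across texts; keep those seen at least min_count times.

    A naive frequency sketch, NOT Concept Searching's algorithm.
    """
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z]+", text.lower())
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    return {phrase: c for phrase, c in counts.items() if c >= min_count}

# Hypothetical sample content: "bypass" alone appears in medical and road contexts.
docs = [
    "The patient recovered well after a triple heart bypass.",
    "Triple heart bypass surgery carries known risks.",
    "Take the highway bypass to avoid the city center.",
]
phrases = candidate_phrases(docs)
print(phrases)  # {'triple heart bypass': 2}
```

Only the three-word phrase recurs, which separates the medical documents from the one about roads even though all three contain the word "bypass".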

Page 5: 3 25 11 Term Store Best Practices

Enterprise Class Product Suite - Deployment Case Studies

• USAF Medical Service: Global Deployments, 70,000 Users
• LexisNexis: FAST, Multi User Distributed Taxonomy Management Architecture
• Xerox: E Discovery, 150 Million Documents
• Market Research: FAST, WWW
• Logica: FAST, 40,000 Users
• CAL ISO & MIDWEST ISO: FAST, WWW
• Booz and Company: Taxonomy Management
• Emerson Climate Technologies: Enterprise Deployment
• BP: Enterprise Deployment
• Parsons Brinckerhoff: FAST, Global Deployment, 40,000
• CPSC: Enterprise wide FAST, Enterprise Deployment
• National Transportation Safety Board: FAST, Enterprise Deployment
• Health and Human Services: FAST, Enterprise Deployment
• Southern Union Group: FAST

Page 6: 3 25 11 Term Store Best Practices

What is poor metadata (lack of structure) costing you?

Data Privacy Protection

Problem
• Average cost per exposed record is $197 and ranges from $90-$305 per record
• 70% of breaches are due to a mistake or malicious intent by an organization’s own staff
• Average cost runs from $225K to $35M

Solution
• Identify any type of organizationally defined privacy data
• Combines pattern matching with associated vocabulary
• Automatic Content Type updating enabling workflows and rights management

Search: “It’s not about better search”

Problem
• Less than 50% of content is correctly indexed, meta tagged or efficiently searchable
• 85% of relevant documents are never retrieved in search

Solution
• Eliminate manual tagging & replace with automatic identification of multi-word concepts
• Provide guided navigation via the taxonomy structure (i.e. concepts)
• Go beyond dynamic clustering with conceptual clustering based on the taxonomies

Benefit
• Taxonomy navigation is 36% - 48% faster
• Savings of 2.5 hours per user per day

Records Management

Problem
• 67% of data loss in Records Management is due to end user error
• It costs an organization $180 per document to recreate it when it is not tagged correctly and cannot be found

Solution
• Eliminate inconsistent end user tagging
• Automatically declare documents of record based on vocabulary and retention codes
• Automatically change the Content Type and route to the Records Management repository

Benefit
• Savings of $4.00 - $7.04 per record by eliminating manual tagging
• Ensures compliance and reduces potential litigation exposures

Pre Migration/Collaboration

Problem
• 60% of stored documents are obsolete
• 50% of documents are duplicates
• Requires resources to identify what should/not be migrated

Solution
• Eliminate duplicate documents
• Identify privacy data exposures
• Identify and declare records that were not previously identified
• Notify users of high value content
• Migrating required content to a structure

Benefit
• Reduces migration costs
• Ensures compliance and protection of content assets
• Easy end user updates
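The data privacy solution above "combines pattern matching with associated vocabulary". A minimal sketch of that idea follows, assuming a US Social Security number pattern and a small vocabulary list; the names `SSN_PATTERN`, `PRIVACY_VOCABULARY` and `flag_privacy_data`, and the 60-character window, are assumptions made for illustration and are not taken from the product.

```python
import re

# Hypothetical pattern and vocabulary; a real deployment would define its own.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
PRIVACY_VOCABULARY = {"ssn", "social security", "date of birth"}

def flag_privacy_data(text, window=60):
    """Return pattern matches that have a vocabulary term within `window` characters."""
    lowered = text.lower()
    hits = []
    for match in SSN_PATTERN.finditer(text):
        start = max(0, match.start() - window)
        end = min(len(text), match.end() + window)
        context = lowered[start:end]
        # The pattern alone is not enough: require supporting vocabulary nearby.
        if any(term in context for term in PRIVACY_VOCABULARY):
            hits.append(match.group())
    return hits

print(flag_privacy_data("Employee SSN: 123-45-6789 on file."))  # ['123-45-6789']
print(flag_privacy_data("Part number 123-45-6789 shipped."))    # []
```

Requiring nearby vocabulary is what keeps a part number with the same digit layout from being flagged as privacy data.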

Page 7: 3 25 11 Term Store Best Practices

A manual metadata approach will fail 95%+ of the time

Issue: Organizational Impact
• Inconsistent: Less than 50% of content is correctly indexed, meta-tagged or efficiently searchable, rendering it unusable to the organization (IDC)
• Subjective: Highly trained Information Specialists will agree on meta tags between 33% - 50% of the time. (C. Cleverdon)
• Cumbersome, Expensive: Average cost of manually tagging one item runs from $4 - $7 per document and does not factor in the accuracy of the meta tags nor the repercussions from mis-tagged content (Hoovers)
• Malicious Compliance: End users select first value in list (Perspectives on Metadata, Sarah Courier)
• No perceived value for end user: What’s in it for me? End user creates document, does not see value for organization nor risks associated with litigation and non-conformance to policies.
• What have you seen: Metadata will continue to be a problem due to inconsistent human behavior

The answer to consistent metadata is an automated approach that can extract the meaning from content, eliminating manual metadata generation yet still providing the ability to manage knowledge assets in alignment with the unique corporate knowledge infrastructure.

Page 8: 3 25 11 Term Store Best Practices

• Create enterprise automated metadata framework/model
  – Average return on investment is a minimum of 38% and runs as high as 600% (IDC)
• Apply consistent meaningful metadata to enterprise content
  – Incorrect meta tags cost an organization $2,500 per user per year, in addition to potential costs for non-compliance (IDC)
• Guide users to relevant content with taxonomy navigation
  – Savings of $8,965 per year per user based on an $80K salary (Chen & Dumais)
  – 100% “Recall” of content, 35% faster access to content “Precision”
• Use automatic conceptual metadata generation to improve Records Management
  – Eliminate inconsistent end user tagging at $4-$7 per record (Hoovers)
  – Improve compliance processes, eliminate potential privacy exposures

conceptClassifier for SharePoint 2010 and the Enterprise provides an automated approach to applying metadata and content types for immediate ROI and business value

1. Align, Model and Validate

2. Automate Tagging

3. Findability

4. Business Processes – Alerts & WF

5. Records Management and PII

6. Life Cycle Management
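To put the $4-$7 per record manual tagging figure quoted on this slide (attributed to Hoovers) into concrete terms, here is a short worked calculation. The volume of 100,000 documents is a hypothetical input chosen for illustration, and `manual_tagging_cost` is a helper name introduced here.

```python
def manual_tagging_cost(documents, low_per_doc=4.0, high_per_doc=7.0):
    """Return the (low, high) cost range of manually tagging `documents` items.

    The per-document defaults are the $4-$7 range quoted on the slide.
    """
    return documents * low_per_doc, documents * high_per_doc

# Hypothetical volume: 100,000 documents.
low, high = manual_tagging_cost(100_000)
print(f"${low:,.0f} to ${high:,.0f}")  # $400,000 to $700,000
```

As the earlier slide notes, this range covers tagging labor only and does not account for the accuracy of the tags or the cost of mis-tagged content.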

Page 9: 3 25 11 Term Store Best Practices

Microsoft’s approach to solving the metadata problem for Records Management, Governance Policies, Sensitive Information Removal and Findability:

Content Types, the Term Store and Enterprise Managed Metadata Services

04/12/2023

Page 10: 3 25 11 Term Store Best Practices

• A Content Type is a means of applying structure to unstructured or structured content within SharePoint. Content Types inherit from their parent Content Types.
• A Content Type is usually a combination of a term or terms from one or more term sets.
• Terms are metadata, and metadata is information about information.
• Terms can also include governance and retention code policies, or can exist for the sole purpose of improved findability.
• However, it is best to align Content Types with business goals and business use cases.

What is a content type
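The inheritance idea can be sketched with a small data model. This is not the SharePoint object model; the `ContentType` class, its fields, and the example column and term set names are all hypothetical, chosen only to show a child content type picking up its parent's columns and adding its own.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ContentType:
    """Hypothetical model: a content type maps column names to term set names."""
    name: str
    parent: Optional["ContentType"] = None
    columns: dict = field(default_factory=dict)

    def all_columns(self):
        # Columns from the parent chain come first; the child's own columns override.
        inherited = self.parent.all_columns() if self.parent else {}
        return {**inherited, **self.columns}

document = ContentType("Document", columns={"Department": "Organization"})
contract = ContentType(
    "Contract",
    parent=document,
    columns={"Retention Code": "Records Retention"},
)
print(contract.all_columns())
# {'Department': 'Organization', 'Retention Code': 'Records Retention'}
```

Here the "Contract" type carries one column for findability (inherited) and one for governance (its own), which mirrors the two purposes described above.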

Page 11: 3 25 11 Term Store Best Practices

Introducing EMM, The Term Store and Term Store Management Definitions

[Diagram: SharePoint 2010 Enterprise Managed Metadata Service and Concept Classifier for SharePoint 2010. Labels: Subscription Service, Content Type Hub, Term Store, Term Store Management, Auto Classification, Content Type Updating, SharePoint 2010 Farm, Site Collection, Records Library]

Page 12: 3 25 11 Term Store Best Practices

Managed Metadata Service
• Manages Enterprise Content Types via the Content Type Hub
• Manages the Term Store
• Term Sets (taxonomies) and terms can be shared across multiple SharePoint site collections
• Multiple managed metadata services can be created
• Enables search filtering
• Two types of terms:
  – Managed terms – pre-defined by an enterprise administrator and may be hierarchical. Surfaced in the "managed metadata" column type
  – Managed keywords – non-hierarchical words or phrases that have been added to SharePoint 2010 items by users (folksonomy)

The Managed Metadata Service

30,000 Terms per Term Set (1 Taxonomy)

1,000 Term Sets

Tested to 1,000,000 Preferred Terms

Enterprise Managed Metadata Service
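When planning a taxonomy it can help to check proposed term sets against the figures on this slide before building anything. The sketch below treats the three numbers as planning ceilings; the function `check_term_store` and the example term set names are assumptions for illustration, and nothing here calls SharePoint.

```python
# Figures from the slide, treated here as planning ceilings.
MAX_TERMS_PER_TERM_SET = 30_000
MAX_TERM_SETS = 1_000
MAX_TOTAL_TERMS = 1_000_000  # "Tested to 1,000,000 Preferred Terms"

def check_term_store(term_sets):
    """term_sets maps a term set name to its planned number of terms.

    Returns a list of human-readable problems; an empty list means the plan fits.
    """
    problems = []
    if len(term_sets) > MAX_TERM_SETS:
        problems.append(f"{len(term_sets)} term sets exceeds {MAX_TERM_SETS}")
    for name, count in term_sets.items():
        if count > MAX_TERMS_PER_TERM_SET:
            problems.append(f"'{name}' has {count} terms, over {MAX_TERMS_PER_TERM_SET}")
    total = sum(term_sets.values())
    if total > MAX_TOTAL_TERMS:
        problems.append(f"{total} total terms exceeds {MAX_TOTAL_TERMS}")
    return problems

# Hypothetical plan with one oversized term set.
plan = {"Products": 12_000, "Locations": 30_001}
print(check_term_store(plan))
```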

Page 13: 3 25 11 Term Store Best Practices

SharePoint 2010 Managed Metadata Service Considerations

SharePoint 2010 Element: Comments
• Site Collection/Site Structure: Can be organized by a hierarchical taxonomy structure
• Document Library Structure: Can be organized by a hierarchical taxonomy structure
• Columns: Where terms are applied to content in Document Libraries and Lists
• Term: A metadata value. Metadata is information about information.
• Term Set: Hierarchical metadata with values
• Managed Metadata: SP 2010’s ability to manage terms and term sets (hierarchical)
• Keywords: Allow end users to add their own metadata; not recommended for enterprise use (flat list only!)
• Content Types: Ability to use managed metadata and associate different columns and different term sets with a specific content type for a specific business requirement, governance (i.e. PII, Retention Code, SOX) or Findability (facets for navigation)

Page 14: 3 25 11 Term Store Best Practices

• File share or directory structures
• Database fields/tables
• Excel spreadsheet
• File Plan, especially if using for records management
• Search analytics
• Topic maps
• Card sorting (open & closed)
• Subject matter experts
• Free industry standard taxonomies
• Wikipedia: “Industry classification” or “Global Industry Classification Standard”
• WWW directory structure
• Tag clouds: Flickr, Del.icio.us, Technorati
• ConceptSearching: free taxonomies
• Hard core: ANSI/NISO Z39.19-2005

Where do I find good examples to use to build out term sets and terms?
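File share or directory structures head the list of sources. As a rough sketch of how an existing folder layout could seed a draft term hierarchy, the snippet below folds a list of paths into a nested dictionary. The function name `paths_to_term_tree` and the example paths are hypothetical, and the result is only a starting point for review, not something to load into the Term Store unedited.

```python
def paths_to_term_tree(paths):
    """Fold slash-separated folder paths into a nested dict of candidate terms."""
    tree = {}
    for path in paths:
        node = tree
        for part in path.strip("/").split("/"):
            # Reuse the existing branch if this folder name was already seen here.
            node = node.setdefault(part, {})
    return tree

# Hypothetical folder paths from a file share.
paths = ["HR/Policies/Leave", "HR/Policies/Travel", "Finance/Invoices"]
tree = paths_to_term_tree(paths)
print(tree)
# {'HR': {'Policies': {'Leave': {}, 'Travel': {}}}, 'Finance': {'Invoices': {}}}
```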

Page 15: 3 25 11 Term Store Best Practices

conceptClassifier for SharePoint is the only native Term Store Management tool for 2010

[Screenshot labels: Term Set > Parent Term > Child Term > Grand Child Term]

A content type can contain one or many taxonomies based on specific business user requirements. The values can be shown as columns or can be hidden from users for administrative or governance purposes only.

Build term sets/taxonomies here in SharePoint 2010 EMM. Plan for 30,000 values

Page 16: 3 25 11 Term Store Best Practices

Traditional manual approach is subjective, cumbersome and overwhelming

End user must select values from multiple term sets. Up to 30,000 values per term set and 1,000 term sets per term store. Manual approach is impractical.

Page 17: 3 25 11 Term Store Best Practices

ConceptClassifier for SharePoint 2010

An automated solution for applying metadata and providing term store management to enhance SharePoint 2010 capabilities for Records Management, Governance Policies, Rights Management, Sensitive Information Removal and Findability.

Page 18: 3 25 11 Term Store Best Practices

Native integration into Term Store

• No Service Pack updates, no custom code: ConceptClassifier is a native integration.
• No custom property types: Every item is synchronized with the term store and is part of the managed metadata service. All search features work natively as they should. There are no custom search property values that require custom code updates and additional custom search controls.
• Why do we work with the term store natively? Because it is the natural place to store metadata if you are driving economies of scale by leveraging the Microsoft stack. That is Microsoft’s road map for metadata management.
• Easy upgrade: If you want to go back to a purely manual application, there is no code rewrite. You just unplug and you are back to native.

conceptClassifier provides a native integration into Term Store

Page 19: 3 25 11 Term Store Best Practices

Multi User Distributed Branch and Term Support for Enterprise

Native Term Store Integration for SharePoint 2010

Accelerate building out taxonomies by 75% with automatic Term/Clue Suggestion

Enables information architects to build, model and validate

Automatic Term Boosting for FAST/Search Platforms

Pragmatic Ontology Features for subject matter experts (You don’t need to be a librarian)

  – Broad to Narrow
  – Preferred Term
  – Non preferred terms
  – Poly hierarchies (not supported in Term Store)
  – Relations (not supported in Term Store)

Enterprise Taxonomy Management and Auto-classification

Page 20: 3 25 11 Term Store Best Practices

conceptClassifier for SharePoint 2010

Automatically applies Metadata

Automatically Applies Content Types

Auto Applies Retention Code Policies

Automatically applies Windows Rights Management Policies

Automatic Term Boosting for FAST

Pulls hierarchy directly from Term Store, therefore updates are immediate and accurate for guided taxonomy navigation in FAST

conceptClassifier for SharePoint 2010 drives immediate value for end users for Search, Records Management and Sensitive Information Removal

Page 21: 3 25 11 Term Store Best Practices

conceptClassifier for FAST Search

Improves search outcomes by placing conceptual metadata in the FAST Search index to increase relevancy of search results

Enables import of FAST Entities into the conceptClassifier taxonomy manager to fine-tune them with metadata generated from your own content and nomenclature

Runs natively as a FAST Pipeline Stage eliminating integration and customization issues

Eliminates vocabulary normalization issues across global boundaries through controlled vocabularies

Improves faceted search results as facets are based on concepts aligned with the taxonomy

Provides taxonomy browse capabilities based on the nodes within the corporate taxonomy(s)

Provides accurate metadata filters such as numeric range searching and wildcard alphanumeric matching

Removes documents from search results that are confidential/sensitive through automatic Content Type updating and routing to secure server

Automatically tags content with both vocabulary and retention codes and respects SharePoint security that could prevent access to the document once it has been declared a record

Page 22: 3 25 11 Term Store Best Practices

Product Screen Shots

Page 23: 3 25 11 Term Store Best Practices

Traditional manual approach is subjective, cumbersome and ineffective

End user must select values from multiple term sets. Up to 30,000 values per term set and 1,000 term sets per term store. Manual approach is impractical.

Page 24: 3 25 11 Term Store Best Practices

An automated approach ensures accurate Records Management, Sensitive Information Removal and improved Search/Findability

Metadata is automatically applied to content by ConceptClassifier via TaxonomyManager. Content Type Updater can take it a step further and modify the content type to redirect a document or object to a different content type, or migrate it to another site collection or document library. In this example the documents are being changed from the Document content type to the PII or Records Center content type.

Page 25: 3 25 11 Term Store Best Practices

Term Store Management is provided by Taxonomy Manager and ConceptClassifier

TaxonomyManager is an intuitive and elegant tool for managing how and when term sets are applied within SharePoint 2010 and what new terms to add to the term store

Deep capabilities for building out rule-based classification approaches, including standard term, phonetic, metadata, class ID, language, case-sensitive, regular expression and boosting rules
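The slide names the kinds of rules available but not how they are expressed in the product. The sketch below shows, under that caveat, how three of them (standard term, case sensitivity, regular expression) plus a boost weight could combine into a score for one taxonomy node. The `Rule` class, the `score` function, the "RM-" code format and the sample text are all hypothetical.

```python
import re
from dataclasses import dataclass

@dataclass
class Rule:
    """Hypothetical rule: 'term' matches a whole phrase, 'regex' a raw pattern."""
    kind: str
    pattern: str
    case_sensitive: bool = False
    boost: float = 1.0

    def matches(self, text):
        flags = 0 if self.case_sensitive else re.IGNORECASE
        if self.kind == "regex":
            expr = self.pattern
        else:
            # Standard term: escape it and require word boundaries on both sides.
            expr = r"\b" + re.escape(self.pattern) + r"\b"
        return len(re.findall(expr, text, flags))

def score(text, rules):
    """Sum of match counts weighted by each rule's boost."""
    return sum(rule.matches(text) * rule.boost for rule in rules)

rules = [
    Rule("term", "retention schedule", boost=2.0),
    Rule("regex", r"\bRM-\d{4}\b", case_sensitive=True),
]
text = "See retention schedule RM-2011 and rm-0001."
print(score(text, rules))  # 3.0
```

The boosted term contributes 2.0 and the case-sensitive expression matches only the upper-case code, contributing 1.0.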

Page 26: 3 25 11 Term Store Best Practices

An automated approach ensures accurate Records Management, Sensitive Information Removal and improved Search/Findability

The documents with 10 in front of them have had their content types updated. In this example the documents are being changed from the Document content type to the PII or Records Center content type. They could also have been moved to a different folder if that was the desired outcome.

Page 27: 3 25 11 Term Store Best Practices

conceptClassifier for 2010 Product Suite provides intuitive guided navigation for FAST

Multi-value select within a term set is the single fastest approach you can provide for end users to get to the correct content. It is just like picking values on Best Buy or Amazon, but with your own corporate term set vocabulary.

conceptClassifier for FAST and SharePoint 2010 Search

Page 28: 3 25 11 Term Store Best Practices

• Set proper expectations
  – Select a business unit to begin term set building and classification approaches (Manual vs. Automated) within SharePoint
  – Manual – No more than 3 tags
  – Manage scope, don’t try to boil the ocean
• Focus on value
  – Focus on the key constituents to whom you can show immediate value
  – Search or Findability
  – Records Management
• Focus on Use Cases
  – Understand how and why they will use term sets and how they will apply metadata
• Define Governance (See partner presentation from PPC on governance)
  – Roles, responsibilities, policies, and procedures
• Reconfirm expectations, it is a Marathon not a Sprint
  – Taxonomy development is an iterative and on-going effort
  – It changes and evolves just like your content and terminology
  – Add new business units or users after successful feedback from initial term set sponsors

How To Guide for Taxonomies in SharePoint

Best practices for Term Store Development and applying metadata in SharePoint 2010 for Records Management and Findability

Page 29: 3 25 11 Term Store Best Practices

Demo

Page 30: 3 25 11 Term Store Best Practices

Differentiator Value

• Enterprise Product Suite for Metadata Management that includes Taxonomy Management (TM), Auto Classification (AC), and Search: Only product to use TM, AC and Search to test, validate and build out meaningful business taxonomies and term sets for records management, sensitive information removal and improved search
• Native Integration into SharePoint 2010 for Term Store Management: No custom code, no additional user controls, easy installation and upgrades with Microsoft SharePoint 2010
• Positioned for success:
  – Privately funded
  – Strategic Microsoft Partner for Term Store Management
  – Leverage partners for deployment and domain expertise across the world
  – Growing Fortune 2000 Customer Base

In Summary we are an Enterprise Metadata Management Product Suite

Page 31: 3 25 11 Term Store Best Practices

Thank you

Don Miller 408-828-3400

[email protected]

Page 32: 3 25 11 Term Store Best Practices

Planning

Page 33: 3 25 11 Term Store Best Practices

• Determine Key Term Sets
  – Think about audience, business needs, content types
  – Focus on immediate needs, build out term set
  – Ask for immediate feedback
• Governance for Tagging
  – Vision and Executive Sponsorship
  – Roles and responsibilities – Committee of one
  – Policies and procedures – Committee of one
• Adoptability
  – Communication – Mandated process?
  – Education and Training – How much time to ensure adoption
  – Maximum of 3-5 manual tags
• Internal Promotion
  – Tag off - Total number of tags per business unit or group
  – Show total number of retention code policies as a before and after
• Showing ROI
  – The Stop Watch Test
  – Governance Applications
  – Executive Feedback – Tuning exercise

Initial Planning:

Page 34: 3 25 11 Term Store Best Practices

Method: Definition (Examples)
• Records Management: Retention Code Policies (employment, staffing, training)
• Subject-oriented: Information categorized by subject or topic. Instantive - each child category is an instance of the parent category. Partitive - each child category is a part of the parent category (water pollution, soil pollution, air pollution)
• Functional: Information categorized by the process to which it relates (employment, staffing, training)
• Organizational: Information categorized by corporate departments or business entities (Human Resources, Marketing, Accounting, Research)
• Document Type: Information categorized by the type of document (presentations, expense reports, press releases)
• Location: Information categorized by the location where it originated or was conceived (US State, Office locations)
• Product or Customer: Information categorized by the product or customer it was developed for (Electronics > TVs, DVD Players, Computers)

Categorization Schemas

[Scale shown alongside the table: Hardest to Easiest]

Page 35: 3 25 11 Term Store Best Practices

Records Management Use Cases

Page 36: 3 25 11 Term Store Best Practices

www.conceptsearching.com

• Lack of Information Transparency
  – Government and Private Sector directives to tag content for retrieval
  – Untagged Data Assets = Untapped Resources
  – Time Gap between Information Requests and Discovery is Directly Proportional to Volume of Data Assets
• Non-Compliance with Records Management Policies
  – Sarbanes-Oxley and Government RM Retention Schedules
  – Data Stored in Wrong Location
  – Information not Preserved in Accordance with Regulatory Guidelines
• Increasing Volume of Unplanned Data Exposure Events
  – Privacy Act Program (PII), Protected Health Information (PHI), HIPAA, Payment Card Industry (PCI), etc.
  – Organizational Confidential and Sensitive Information

Problems

Information and Records Management Capability Gaps

Page 37: 3 25 11 Term Store Best Practices

Why is this Difficult?

Physical or Cognitive Properties of an Individual or Human Social Behavior which Influence Functioning of Technological Systems

[Diagram: Server Content with Appropriate Metadata, Retention Codes, and Rights Management. Labels: Metadata, Tagging, Records Retention Code, Access Rights, Templates, Document Library 1, Document Library 2, Document Library 3, Document Library 4]

Human Factors

Page 38: 3 25 11 Term Store Best Practices

Physical or Cognitive Properties of an Individual or Human Social Behavior which Influence Functioning of Technological Systems

Limiting Factor = Human Behavior

[Diagram: Server Content with Appropriate Metadata, Retention Codes, and Rights Management. Labels: Metadata, Tagging, Records Retention Code, Access Rights, Templates, Document Library 1, Document Library 2, Document Library 3, Document Library 4]

Why is this Difficult?

Human Factors

Page 39: 3 25 11 Term Store Best Practices

How Do Organizations Typically Address These Capability Gaps?

• Customize system interface to force manual application of metadata
  – Pros: data assets now have metadata
  – Cons: high customization costs, increase in end-user labor costs, less end-user productivity, non-standardized application of metadata across enterprise
• Hire temporary staff to add metadata to data assets
  – Pros: data assets now have metadata
  – Cons: temporary staff = $$$$$ and results in non-standardized tagging
• Acknowledge that it is a problem and do nothing

Alternatives

Page 40: 3 25 11 Term Store Best Practices

[Diagram labels: Records Retention Code Tagging, Automatic Content Type Updating, Records Management, Confidential Secure Data, Collaboration Portal, Concept Classifier, Security, Appropriate Storage & Preservation, Increase Information Retrieval Precision for Search, Semantic Metadata Tagging]

Metadata, Auto-classification, Taxonomies Drive Business Value

Tagged for Search

Page 41: 3 25 11 Term Store Best Practices

How does Concept Searching Close IM and RM Capability Gaps

• Uses Taxonomy Manager to create and manage organizational taxonomies, ontologies, and metadata environment;
• Employs conceptClassifier for SharePoint as an Automated Metadata Population Service;
• Applies content types based on metadata;
• Uses content types derived from metadata to drive individual and group access to data assets using inherent SharePoint Security;
• Uses content types derived from metadata to drive migration of data assets to proper document libraries where RMS templates are automatically applied to restrict data asset usage.

Leveraging Metadata as an Enabling Asset
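The flow on this slide (metadata determines the content type, and the content type determines where the item belongs) can be sketched in a few lines. Nothing below calls SharePoint or conceptClassifier; the mapping tables, the library names and the `route` function are hypothetical stand-ins for whatever an organization configures.

```python
# Hypothetical mappings; real values would come from the organization's taxonomy.
CONTENT_TYPE_BY_TERM = {
    "Social Security Number": "PII",
    "Retention Code": "Record",
}
LIBRARY_BY_CONTENT_TYPE = {
    "PII": "Secure Library",
    "Record": "Records Center",
}

def route(document):
    """Return a copy of `document` with content type and library updated
    according to the first applied term that has a mapping."""
    updated = dict(document)
    for term in document["terms"]:
        new_type = CONTENT_TYPE_BY_TERM.get(term)
        if new_type:
            updated["content_type"] = new_type
            updated["library"] = LIBRARY_BY_CONTENT_TYPE[new_type]
            break
    return updated

doc = {"content_type": "Document", "library": "Team Site",
       "terms": ["Social Security Number"]}
print(route(doc))
```

A document with no mapped terms is returned unchanged, so ordinary content stays where it is.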

Page 42: 3 25 11 Term Store Best Practices

Concept Searching in MOSS and Windows Server

Page 43: 3 25 11 Term Store Best Practices

SharePoint Server Security and AD-RMS in MOSS