Taxonomies in Electronic Records Management Systems
May 21, 2002
2
Terms
Controlled Vocabulary
» A collection of terms that indicates which terms are preferred and which are variants of the preferred terms.
Thesaurus
» A type of controlled vocabulary that shows the hierarchical (parent-child), associative (related terms) and equivalent (synonymous) relationships among terms.
Taxonomy
» Hierarchical classification of elements within a domain. One type of taxonomy is a File Plan.
Ontology
» A hierarchical classification that is more complex and subtle than a taxonomy. It explains relationships between objects by mapping relationships, such as “part of” or “located in”. Also called knowledge mapping.
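The definitions above can be made concrete with a small data structure. What follows is only a minimal sketch in Python: the class names (`Term`, `Thesaurus`) and the sample entries are hypothetical and are not taken from any product discussed in this presentation. It shows preferred terms with variants (controlled vocabulary), plus parent-child and related-term links (thesaurus).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Term:
    """One preferred term and its thesaurus relationships."""
    preferred: str
    variants: Set[str] = field(default_factory=set)  # equivalent (synonymous) terms
    broader: Optional[str] = None                    # hierarchical parent, if any
    related: Set[str] = field(default_factory=set)   # associative (related) terms


class Thesaurus:
    def __init__(self) -> None:
        self._terms: Dict[str, Term] = {}
        self._lookup: Dict[str, str] = {}  # lowercased label -> preferred term

    def add(self, term: Term) -> None:
        self._terms[term.preferred] = term
        for label in {term.preferred, *term.variants}:
            self._lookup[label.lower()] = term.preferred

    def preferred(self, label: str) -> Optional[str]:
        """Resolve a preferred term or any of its variants; None if unknown."""
        return self._lookup.get(label.lower())

    def narrower(self, label: str) -> List[str]:
        """Return the child terms of the given term, sorted alphabetically."""
        parent = self.preferred(label)
        if parent is None:
            return []
        return sorted(t.preferred for t in self._terms.values() if t.broader == parent)


# Hypothetical sample entries, for illustration only.
thesaurus = Thesaurus()
thesaurus.add(Term("Budget", variants={"Budgets", "Fiscal Plan"}, related={"Audit"}))
thesaurus.add(Term("Budget Call", broader="Budget"))
thesaurus.add(Term("Budget Policy", broader="Budget"))

print(thesaurus.preferred("fiscal plan"))
print(thesaurus.narrower("Budget"))
```

Looking up a variant such as "fiscal plan" resolves to the preferred term "Budget", and asking for the narrower terms of "Budget" walks one level down the hierarchy, which is the same parent-child structure a File Plan uses.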
3
Why Use a Taxonomy
Management of Records
» Structure for Classification
» Navigational Tool
» Reduced Burden on Users
More Consistent Than Humans
Sheer Volume of Information
» Document Level Vs Folder Level
» High Speed Processing
More Than 80% of All Information Is Unstructured
4
Example: FirstGov.gov
5
Example: File Plan
6
Example: Visual Map
7
How Do Taxonomy Tools Work?
General
» Understand Relevancy to Categories
» Create Knowledge Clusters
» Enable Types to Be Combined
Training Based
» Require Representative Samples
» Identify Patterns
» Create Statistical Models
Rule Based
» Process Rules Devised and Hand-coded by Humans
» Contain Keywords and Logical Relationships
Linguistics Based
» Use Algorithms
» Understand Linguistic and Semantic Elements
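To illustrate the rule-based approach (hand-coded rules that combine keywords with logical relationships), here is a minimal Python sketch. The rule format (`all_of`, `any_of`, `none_of`), the category names, and the keywords are assumptions made for this example; commercial tools use their own rule languages.

```python
import re

# Hypothetical hand-coded rules. A document matches a category when it contains
# every "all_of" keyword, at least one "any_of" keyword (if any are listed),
# and no "none_of" keyword.
RULES = {
    "Budget": {
        "all_of": {"budget"},
        "any_of": {"call", "fiscal", "funding"},
        "none_of": {"lunch"},
    },
    "Quality Control": {
        "all_of": {"audit"},
        "any_of": set(),
        "none_of": set(),
    },
}


def tokenize(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def categorize(text, rules=RULES):
    """Return every category whose rule the text satisfies."""
    words = tokenize(text)
    matches = []
    for category, rule in rules.items():
        has_all = rule["all_of"] <= words                        # logical AND
        has_any = not rule["any_of"] or bool(rule["any_of"] & words)  # logical OR
        has_none = not (rule["none_of"] & words)                 # logical NOT
        if has_all and has_any and has_none:
            matches.append(category)
    return matches


print(categorize("FY03 budget call for all offices"))
print(categorize("Lunch invite to discuss the budget call"))
```

The second call returns no category because the `none_of` keyword excludes it, which shows both the precision of hand-coded rules and the human effort needed to anticipate such cases.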
8
Taxonomy Uses in Electronic Recordkeeping Systems
Auto Categorization
Searching and Browsing
File Plan Creation and Maintenance
9
Auto Categorization
10
Auto Categorization Case Studies
National Archives and Records Administration
» 12,000 Documents
» Granular File Plan
» Single Repository
University of Nevada for Department of Energy
» 150,000 Documents
» 99.5% Accuracy in Identifying Non-Records
» Less Than 1 in 20 Documents Required Human Intervention
Department of Education
» 90,000 Documents
» Accuracy Enhanced by Narrowing Categories
» 100% Accuracy Categorizing to Retention Periods
11
Auto Categorization Anecdotes
Factiva
» 1500 Topics
» Target of 45% Accuracy
» Achieving 60-80% Accuracy
Gartner Group Findings
» Typical Accuracy Is 80-95% When Broad Non-overlapping Categories Are Used
One Vendor’s Literature
» 75-80% Accuracy Is Typical
12
Common Themes
Mutually Exclusive Categories Increase Accuracy
Big Bucket Theory
Easy Retrieval Vs Easy Filing
Stove Piping Vs Open System
Human Effort Necessary
» Select Training Set
» Quality Control
» Fine Tune
13
Comments on Accuracy
No Case Study Achieved 100% in Categorization
Accuracy Rises With Fewer Categories
Short Documents Can Have Too Little Content
Long Documents Can Cover Too Many Topics
Fly in the Ointment
» Accuracy Diminishes at Each Level Down in the File Plan
» In a System Where Auto Categorization Is 80% Accurate at Each Level, the Expected Accuracy for the Proper Assignment of a Document at the Third Level Down Would Be About 51%
Critical Element - Records Management
» Control of File Plan
» Understanding of Technology
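The 51% figure above is consistent with multiplying the per-level accuracy once for each level of the file plan. The short Python sketch below reproduces that arithmetic under two assumptions that the slide does not state explicitly: each level is categorized with the same accuracy, and errors at different levels are independent. The function name `expected_accuracy` is hypothetical.

```python
def expected_accuracy(per_level_accuracy, levels):
    """Chance that all `levels` independent categorization decisions are correct."""
    return per_level_accuracy ** levels


# Accuracy at each depth of the file plan for a tool that is 80% accurate per level.
for depth in range(1, 5):
    print(f"Level {depth}: {expected_accuracy(0.80, depth):.1%}")
```

At the third level this gives 0.8 × 0.8 × 0.8 = 0.512, or about 51%, which is one reason shallower file plans with fewer, broader categories tend to score better.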
14
Searching and Browsing
15
Searching and Browsing
The only thing harder than finding something is finding it again.
Searching
» Looking for Something You Know About
» Generally Easy in Electronic Documents
» The Document Comes to You
Browsing
» Looking Through a Collection to See What Is There
» Generally Difficult in Electronic Documents
» You Go to the Document(s)
Contextual Browsing
» Accessing Other Relevant Content Related to the Content Being Viewed
» Other Objects May Not Have Been Grouped Together
» Prospective Navigation
16
The Beauty of a Taxonomy Tool
Delivers Information You Did Not Know You Had
Identifies Unknown Associations Between Documents
Summarizes or Abstracts Content
Uses Visual Maps
Does Not Require User to Know Location of the Information
17
Visual Map
18
Visual Map Drilled to Document Level
19
File Plan Creation
20
File Plan Creation Using a Taxonomy Tool
Information Architecture Based on Content
Electronically Generated File Plan
“It is possible to produce affinities through automatic categorization without a pre-existing taxonomy. These categories can then be edited and renamed. Once categories have been created by humans, documents and other information objects can be automatically assigned to those categories.”
Gartner Group
21
Feasibility of Using Taxonomy Software for File Plan Creation
Feasible to Develop a True Records Management File Plan Using Software
Feasible to Populate an RMA With Electronically Generated File Plan
Feasible to Compile a Quantity of Quality Documents to Mine for Creating the Taxonomy
22
Then Why Hasn’t It Been Done?
Existing Retention Schedules Not Built This Way
» Map Required File Plan Elements to Appropriate Retention Classification
OR
» Re-Engineer Retention Schedules
Usability for File Plan Development Untested
» Statistically Correct
BUT
» May Not Appear Natural to Users
23
Scenario
Humans Create Top Level of File Plan
Software Mines Data - Free Categorization
Software Forms Category Patterns
Humans Use Results to Create One Subsidiary Level in File Plan
Humans Associate Retention Schedules at Secondary Level of File Plan
Software Auto Categorizes Documents Into File Plan
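The "free categorization" step in this scenario, where software groups documents without a pre-existing taxonomy, can be sketched with a very simple clustering routine. This Python example is an illustration only: it uses greedy single-pass clustering on word overlap (Jaccard similarity), which is a stand-in technique chosen for brevity and not the method of any particular taxonomy product. The sample documents and the `min_similarity` value are hypothetical.

```python
import re


def tokens(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    """Share of words the two sets have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(docs, min_similarity=0.3):
    """Greedy single-pass clustering.

    Each document joins the first cluster whose seed document is similar
    enough; otherwise it starts a new cluster of its own.
    """
    clusters = []  # list of (seed_tokens, member_documents)
    for doc in docs:
        words = tokens(doc)
        for seed, members in clusters:
            if jaccard(words, seed) >= min_similarity:
                members.append(doc)
                break
        else:
            clusters.append((words, [doc]))
    return [members for _, members in clusters]


# Hypothetical sample documents.
docs = [
    "budget call fiscal year",
    "fiscal year budget request",
    "reliability report test facility",
    "test facility reliability summary",
    "lunch invite",
]
for group in cluster(docs):
    print(group)
```

The resulting groups are unnamed. As the scenario describes, people would still need to review them, name them, and decide which ones belong in the next level of the file plan.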
24
Hybrid Solution
[Diagram: four-stage workflow]
» Top Level of File Plan (Records Management Staff): Budget, Quality Control, Test Facility, Reliability
» Formation of Clusters (Software): Audit, Test Facility, Reliability Report, Budget, Budget Call, Meeting Room, Change, Lunch Invite, Resume, No Pattern
» Secondary Level of Taxonomy with Retention Schedules (Records Management Staff): Budget Correspondence Files, Quality Control Reports, Test Facility Log Books, Formal Reliability Reports, Budget Policy Files, Administrative Meeting Files; Retention Schedules of 1 Year, 2 Years, 10 Years, and Permanent
» Auto Categorization (Software): Records Text, Records, Reliability Reports, Quality Control Reports, Budget Correspondence, Administrative Meeting Reminders, Personal, Review Folder (Confidence Below Threshold)
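The diagram's "Confidence Below Threshold" path into a Review Folder can be sketched as follows. This is a minimal Python illustration with loudly stated assumptions: the keyword profiles, the scoring method (fraction of a category's keywords found in the text), and the 0.5 threshold are all hypothetical, and a real tool would compute confidence with its own statistical or linguistic model.

```python
import re

# Hypothetical keyword profiles for three categories named in the diagram.
PROFILES = {
    "Budget Correspondence": {"budget", "funding", "allocation", "fiscal"},
    "Quality Control Reports": {"audit", "inspection", "defect", "quality"},
    "Reliability Reports": {"reliability", "failure", "test", "facility"},
}
THRESHOLD = 0.5  # assumed cut-off; would be tuned by records management staff


def score(text, keywords):
    """Confidence stand-in: fraction of the category's keywords found in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return len(words & keywords) / len(keywords)


def route(text, profiles=PROFILES, threshold=THRESHOLD):
    """Return (destination, confidence) for one document."""
    best_category, best_score = max(
        ((name, score(text, keywords)) for name, keywords in profiles.items()),
        key=lambda pair: pair[1],
    )
    if best_score < threshold:
        # Not confident enough: leave the decision to a person.
        return "Review Folder", best_score
    return best_category, best_score


print(route("Fiscal year budget allocation and funding request"))
print(route("Lunch invite for Friday"))
```

Raising the threshold sends more documents to human review and lowers the risk of misfiling, while lowering it does the opposite. That trade-off is where the quality control and fine-tuning effort goes.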
25
Conclusion
Use for Support – Not Full Automation
Ongoing Human Commitment to Plan, Create, and Maintain
Consider Portfolio Approach – Mixing Products
Very Effective for Searching and Browsing
Capture and Search Legacy Documents That Otherwise Would Be Too Costly to Process
Integrate With Document Imaging System
Potential Is Huge
26
Resources
27
Web Sites With Energy Glossaries/Thesauri
www.eia.doe.gov
http://www.nerc.com/glossary/
http://www.eren.doe.gov/consumerinfo/glossary/
http://www.naruc.org/resources/glossary.shtml
www.powermarketers.com/glossary.htm
http://hilt.cdlr.strath.ac.uk/Sources/thesauri.html
28
Cool Stuff
Thesaurus Management Tools
» www.multites.com
» www.synaptica.com
» www.pmei.com/lexico.html
Books
» Content Management Bible, Bob Boiko
» Information Architecture for the World Wide Web, Louis Rosenfeld & Peter Morville
Free Search Engine for Your Web Site
» http://www.freefind.com/
29
More Cool Stuff
DOE Related Use of Taxonomy Tool for Searching and Browsing
» www.lsnnet.gov
Controlled Vocabularies, Thesauri and Classification Systems Available on the Web
» www.lub.lu.se/metadata/subject-help.html
» http://sky.fit.qut.edu.au/~middletm//cont_voc.html
Information Architecture White Papers and Publications
» http://argus-acia.com/index.html
Virtual Library
» www.vlib.org/overview.html
30
THANK YOU!
Angela Tayfun, CRM
AT&T Government Solutions, Inc.
1900 Gallows Road
Vienna, VA 22182
Ph: 703.506.5562
E-mail: [email protected]