Post on 17-Jan-2016

Text Analytics Workshop

Tom Reamy
Chief Knowledge Architect

KAPS Group

Knowledge Architecture Professional Services

http://www.kapsgroup.com

Agenda

Introduction – Elements & Infrastructure

Platform
– Semantics not technology
– Infrastructure not project
– Value of Text Analytics

Evaluating Software
– Two Phase Process
– Designing the Team and Content Structures

Development – Taxonomy, Categorization, Faceted Metadata

Text Analytics Applications
– Integration with Search and ECM
– Platform for Information Applications

KAPS Group: General

Knowledge Architecture Professional Services
Virtual Company: Network of consultants – 8-10
Partners – SAS, SAP, Microsoft-FAST, Concept Searching, etc.
Consulting, Strategy, Knowledge architecture audit

Services:
– Taxonomy/Text Analytics development, consulting, customization
– Technology Consulting – Search, CMS, Portals, etc.
– Evaluation of Enterprise Search, Text Analytics
– Metadata standards and implementation
– Knowledge Management: Collaboration, Expertise, e-learning
– Applied Theory – Faceted taxonomies, complexity theory, natural categories

Introduction to Text Analytics
Semantic Infrastructure - Elements

Taxonomy – Thesauri, Controlled Vocabulary

Metadata – Standard (Dublin Core) and Facets

Basic Text Analytics
– Categorization – Document Topics – Aboutness
– Entity Extraction – noun phrases, feed facets
– Summarization – beyond snippets

Advanced Text Analytics
– Fact extraction – ontologies
– Sentiment Analysis – good, bad, and ugly

What is in a Name – text analytics or ?

Introduction to Text Analytics
Taxonomy

Thesauri, Controlled Vocabulary
– Resources to build on
– Indexing not categorization

Taxonomy – Foundation for Categorization
– Browse – classification scheme
– Formal – Is-Child-Of, Is-Part-Of
– Large taxonomies - MeSH – indexing all topics
– Small is better – for categorization and faceted navigation
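
As an illustration of the formal Is-Child-Of relationship above, a small taxonomy can be held as a child-to-parent mapping. This is only a sketch: the node names and the helper functions `browse_path` and `children` are hypothetical and do not come from the workshop.

```python
# Hypothetical mini-taxonomy: each node maps to its parent (Is-Child-Of).
# Node names are illustrative only, not taken from the workshop.
IS_CHILD_OF = {
    "Diseases": None,
    "Infections": "Diseases",
    "Bacterial Infections": "Infections",
    "Viral Infections": "Infections",
}


def browse_path(node, parents=IS_CHILD_OF):
    """Return the path from the root down to `node` by following Is-Child-Of links."""
    path = []
    while node is not None:
        path.append(node)
        node = parents[node]
    return list(reversed(path))


def children(node, parents=IS_CHILD_OF):
    """Return the direct children of `node`, sorted for stable display."""
    return sorted(child for child, parent in parents.items() if parent == node)


if __name__ == "__main__":
    print(" > ".join(browse_path("Viral Infections")))
    print(children("Infections"))
```

A small structure like this supports both browsing (walking down through `children`) and categorization (assigning a document to a node and inheriting its `browse_path`).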

Introduction to Text Analytics
Metadata

Metadata standards – Dublin Core - Mostly syntactic not semantic
– Description – static or dynamic (summarization)
– Semantic – keywords – very poor performance

Best Bets – high level categorization-search
– Human judgments

Audience – mixed results
– Role, function, expertise, information behaviors

Facets – classes of metadata
– Standard - People, Organization, Document type-purpose
– Specialized – methods, materials, products

Introduction to Text Analytics
Text Analytics

Categorization
– Multiple techniques – examples, terms, Boolean
– Built on a taxonomy

Entity Extraction
– Catalogs with variants, rule based dynamic

Summarization
– Rules – find sentences in a document

Fact Extraction
– Relationships of entities – people-organizations-activities

Sentiment Analysis
– Rules – adjectives & adverbs not nouns
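
Two of the techniques listed above, catalog-based entity extraction with variants and Boolean categorization rules, can be sketched in a few lines. This is a minimal illustration, not any vendor's implementation: the catalog entries, the rule format (`all` / `any` / `none`), and the function names are assumptions made for the example.

```python
import re

# Hypothetical entity catalog: canonical name -> known variants.
ENTITY_CATALOG = {
    "International Business Machines": ["IBM", "I.B.M.", "International Business Machines"],
    "SAS Institute": ["SAS", "SAS Institute"],
}

# Hypothetical Boolean rule per category:
# every term in "all" AND at least one term in "any" AND no term in "none".
RULES = {
    "Text Analytics": {
        "all": {"text"},
        "any": {"analytics", "mining", "categorization"},
        "none": {"sms"},
    },
}


def extract_entities(text, catalog=ENTITY_CATALOG):
    """Return the sorted canonical names whose variants appear in `text`."""
    found = set()
    for canonical, variants in catalog.items():
        for variant in variants:
            # Lookarounds instead of \b so variants ending in "." still match.
            pattern = r"(?<!\w)" + re.escape(variant) + r"(?!\w)"
            if re.search(pattern, text):
                found.add(canonical)
                break
    return sorted(found)


def tokens(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def categorize(text, rules=RULES):
    """Return the categories whose Boolean rule is satisfied by `text`."""
    words = tokens(text)
    hits = []
    for category, rule in rules.items():
        if rule["all"] <= words and rule["any"] & words and not rule["none"] & words:
            hits.append(category)
    return hits


if __name__ == "__main__":
    sample = "I.B.M. and SAS announced a text mining partnership."
    print(extract_entities(sample))
    print(categorize(sample))
```

Mapping every variant to one canonical name is what lets extracted entities feed facets such as Organization; real rule languages add proximity, weighting, and sections of the document, which this sketch leaves out.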

Introduction to Text Analytics
Text Analytics

Why Text Analytics?
– Enterprise search has failed to live up to its potential
– Enterprise Content management has failed to live up to its potential
– Taxonomy has failed to live up to its potential
– Adding metadata, especially keywords, has not worked

What is missing?
– Intelligence – human level categorization, conceptualization
– Infrastructure – Integrated solutions not technology, software

Text Analytics can be the foundation that (finally) drives success – search, content management, and much more

Text Analytics Platform
4 Basic Contexts

Ideas – Content Structure
– Language and Mind of your organization
– Applications - exchange meaning, not data

People – Company Structure
– Communities, Users
– Central team - establish standards, facilitate

Activities – Business processes and procedures

Technology
– CMS, Search, portals, taxonomy tools
– Applications – BI, CI, Text Mining

Text Analytics Platform: The start and foundation
Knowledge Architecture Audit

Knowledge Map - Understand what you have, what you are, what you want
– The foundation of the foundation

Contextual interviews, content analysis, surveys, focus groups, ethnographic studies

Category modeling – “Intertwingledness” - learning new categories is influenced by other, related categories

Natural level categories mapped to communities, activities
• Novices prefer higher levels
• Balance of informativeness and distinctiveness

Living, breathing, evolving foundation is the goal

Text Analytics Platform – Benefits
IDC White Paper

Time Wasted
– Reformat information - $5.7 million per 1,000 per year
– Not finding information - $5.3 million per 1,000
– Recreating content - $4.5 million per 1,000

Small Percent Gain = large savings
– 1% - $10 million
– 5% - $50 million
– 10% - $100 million
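
The arithmetic behind these figures can be worked through as a rough sketch. Two assumptions are made here that the slide does not state: "per 1,000" is read as per 1,000 employees, and the percentage gains are applied to a total annual cost. The slide does not give the base behind its 1% / 5% / 10% figures; the last function only shows what base those figures would imply.

```python
# Figures quoted on the slide, in US dollars per 1,000 per year.
# Reading "per 1,000" as "per 1,000 employees" is an assumption.
COST_PER_1000 = {
    "reformatting information": 5_700_000,
    "not finding information": 5_300_000,
    "recreating content": 4_500_000,
}


def total_cost(headcount):
    """Total yearly cost of the three categories for a given headcount."""
    per_1000 = sum(COST_PER_1000.values())
    return per_1000 * headcount / 1000


def savings(headcount, percent_gain):
    """Yearly savings if the wasted time is reduced by `percent_gain` percent."""
    return total_cost(headcount) * percent_gain / 100


def implied_base(saving, percent_gain):
    """Total cost that a quoted saving at a given percentage would imply."""
    return saving * 100 / percent_gain


if __name__ == "__main__":
    print(f"Total per 1,000: ${total_cost(1000):,.0f}")
    print(f"1% gain for 1,000: ${savings(1000, 1):,.0f}")
    print(f"Base implied by 1% = $10 million: ${implied_base(10_000_000, 1):,.0f}")
```

Under these assumptions the three categories add up to $15.5 million per 1,000 per year, and the slide's savings figures (1% = $10 million, 5% = $50 million, 10% = $100 million) are all consistent with a base of $1 billion.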

Text Analytics Platform – Benefits

Findability within and outside the enterprise
– Savings per year - $millions

Rescue enterprise search and ECM projects
– Add semantics to search

Clean up enterprise content
– Duplication and accurate categorization

Improve the quality of information access
– Finding the right information can save millions

Build smarter applications
– Social networking, locate expertise within the enterprise

Text Analytics Platform – Benefits

Understand your customers
– What they are talking about and how they feel about it

Empower your employees
– Not only more time, but they work smarter

Understand your competitors
– What they are working on, talking about
– Combine unstructured content and rich data sources – more intelligent analysis

Text Analytics Platform – Dangers

Text Analytics as a software project

Not enough resources – to develop, to maintain-refine

Wrong resources – SMEs, IT, Library
– Need all of the above and taxonomists+

Bad Design:
– Start with bad taxonomy
– Wrong taxonomy – too big or too flat

Bad Categorization / Entity Extraction
– Right kind of experience

Resources

Books
– Women, Fire, and Dangerous Things
  • George Lakoff
– Knowledge, Concepts, and Categories
  • Koen Lamberts and David Shanks
– The Stuff of Thought – Steven Pinker

Web Sites
– Text Analytics News - http://social.textanalyticsnews.com/index.php
– Text Analytics Wiki - http://textanalytics.wikidot.com/

Resources

Blogs
– SAS - Manya Mayes – Chief Strategist - http://blogs.sas.com/text-mining/

Web Sites
– Taxonomy Community of Practice: http://finance.groups.yahoo.com/group/TaxoCoP/
– Whitepaper – CM and Text Analytics - http://www.textanalyticsnews.com/usa/contentmanagementmeetstextanalytics.pdf

Questions?

Tom Reamy
tomr@kapsgroup.com

KAPS Group

Knowledge Architecture Professional Services

http://www.kapsgroup.com
