Text Analytics Workshop Tom Reamy Chief Knowledge Architect KAPS Group Knowledge Architecture Professional Services http://www.kapsgroup.com

Post on 17-Jan-2016

Page 1

Text Analytics Workshop

Tom Reamy, Chief Knowledge Architect

KAPS Group

Knowledge Architecture Professional Services

http://www.kapsgroup.com

Page 2

Agenda

Introduction – Elements & Infrastructure Platform
– Semantics not technology
– Infrastructure not project
– Value of Text Analytics

Evaluating Software
– Two Phase Process
– Designing the Team and Content Structures

Development – Taxonomy, Categorization, Faceted Metadata

Text Analytics Applications
– Integration with Search and ECM
– Platform for Information Applications

Page 3

KAPS Group: General

Knowledge Architecture Professional Services

Virtual Company: Network of consultants – 8-10

Partners – SAS, SAP, Microsoft-FAST, Concept Searching, etc.

Consulting, Strategy, Knowledge architecture audit

Services:
– Taxonomy/Text Analytics development, consulting, customization
– Technology Consulting – Search, CMS, Portals, etc.
– Evaluation of Enterprise Search, Text Analytics
– Metadata standards and implementation
– Knowledge Management: Collaboration, Expertise, e-learning
– Applied Theory – Faceted taxonomies, complexity theory, natural categories

Page 4

Introduction to Text Analytics
Semantic Infrastructure - Elements

Taxonomy – Thesauri, Controlled Vocabulary

Metadata – Standard (Dublin Core) and Facets

Basic Text Analytics
– Categorization – Document Topics – Aboutness
– Entity Extraction – noun phrases, feed facets
– Summarization – beyond snippets

Advanced Text Analytics
– Fact extraction – ontologies
– Sentiment Analysis – good, bad, and ugly

What is in a Name – text analytics or ?

Page 5

Introduction to Text Analytics
Taxonomy

Thesauri, Controlled Vocabulary
– Resources to build on
– Indexing not categorization

Taxonomy – Foundation for Categorization
– Browse – classification scheme
– Formal – Is-Child-Of, Is-Part-Of
– Large taxonomies - MeSH – indexing all topics
– Small is better – for categorization and faceted navigation
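The formal Is-Child-Of relation listed above can be illustrated with a minimal sketch: a small taxonomy held as a child-to-parent mapping. The node names and helper functions are hypothetical, chosen only for illustration; they are not taken from the workshop or from any particular taxonomy tool.

```python
# Hypothetical mini-taxonomy: each node maps to its parent (Is-Child-Of).
# Node names are illustrative only; they do not come from the workshop.
PARENT = {
    "Products": None,
    "Software": "Products",
    "Search Engines": "Software",
    "Text Analytics Tools": "Software",
    "Services": None,
    "Consulting": "Services",
}


def path_to_root(node):
    """Return the chain of nodes from `node` up to its top-level ancestor."""
    chain = []
    while node is not None:
        chain.append(node)
        node = PARENT[node]
    return chain


def is_descendant_of(node, ancestor):
    """True if `ancestor` appears strictly above `node` in the taxonomy."""
    return ancestor in path_to_root(node)[1:]


def depth(node):
    """Number of levels above `node`; top-level nodes have depth 0."""
    return len(path_to_root(node)) - 1


if __name__ == "__main__":
    # Browse path from the top level down to one node.
    print(" > ".join(reversed(path_to_root("Search Engines"))))
    print(is_descendant_of("Consulting", "Products"))
```

A shallow structure like this keeps every node within a few steps of the top level, which is one way to read the "small is better" point for categorization and faceted navigation.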

Page 6

Introduction to Text Analytics
Metadata

Metadata standards – Dublin Core - Mostly syntactic not semantic
– Description – static or dynamic (summarization)
– Semantic – keywords – very poor performance

Best Bets – high level categorization-search
– Human judgments

Audience – mixed results
– Role, function, expertise, information behaviors

Facets – classes of metadata
– Standard - People, Organization, Document type-purpose
– Specialized – methods, materials, products
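As a rough sketch of how facets behave as classes of metadata, the example below stores a few Dublin Core style fields (title, creator, type) next to a dictionary of facet values, then filters and counts by facet. The `Document` class, the facet names, and the sample records are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """A document with a few Dublin Core style fields plus facet values."""
    title: str      # Dublin Core element: title
    creator: str    # Dublin Core element: creator
    doc_type: str   # Dublin Core element: type
    facets: dict = field(default_factory=dict)  # facet name -> set of values


def filter_by_facets(docs, **wanted):
    """Keep documents whose facets contain every requested value."""
    return [
        d for d in docs
        if all(value in d.facets.get(name, set()) for name, value in wanted.items())
    ]


def facet_counts(docs, name):
    """Count how many documents carry each value of one facet."""
    counts = {}
    for d in docs:
        for value in d.facets.get(name, set()):
            counts[value] = counts.get(value, 0) + 1
    return counts


# Hypothetical sample records, invented for illustration.
DOCS = [
    Document("Q3 supplier review", "A. Analyst", "report",
             {"organization": {"Acme"}, "materials": {"steel"}}),
    Document("Welding procedure", "B. Engineer", "procedure",
             {"organization": {"Acme"}, "methods": {"welding"},
              "materials": {"steel"}}),
    Document("Vendor memo", "C. Manager", "memo",
             {"organization": {"Globex"}}),
]

if __name__ == "__main__":
    for doc in filter_by_facets(DOCS, organization="Acme", materials="steel"):
        print(doc.title)
    print(facet_counts(DOCS, "organization"))
```

Standard facets (organization) and specialized ones (methods, materials) share the same mechanism here; only the vocabulary differs.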

Page 7

Introduction to Text Analytics
Text Analytics

Categorization
– Multiple techniques – examples, terms, Boolean
– Built on a taxonomy

Entity Extraction
– Catalogs with variants, rule based dynamic

Summarization
– Rules – find sentences in a document

Fact Extraction
– Relationships of entities – people-organizations-activities

Sentiment Analysis
– Rules – adjectives & adverbs not nouns
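Two of the techniques named above, Boolean categorization and catalog-based entity extraction with variants, can be sketched in a few lines. This is a toy illustration under stated assumptions, not how any commercial text analytics product works: the rule format (`all_of`, `any_of`, `none_of`), the category names, and the catalog entries are all hypothetical.

```python
import re

# Hypothetical Boolean rules: a category matches when every "all_of" term
# and at least one "any_of" term appear, and no "none_of" term appears.
RULES = {
    "Mergers": {"all_of": {"acquisition"},
                "any_of": {"merger", "takeover", "bid"},
                "none_of": {"talent"}},
    "Earnings": {"all_of": set(),
                 "any_of": {"revenue", "profit", "earnings"},
                 "none_of": set()},
}

# Hypothetical entity catalog: canonical name -> known surface variants.
CATALOG = {
    "International Business Machines": ["IBM", "I.B.M.",
                                        "International Business Machines"],
    "SAS Institute": ["SAS", "SAS Institute"],
}


def tokens(text):
    """Lowercased alphabetic word tokens of the text, as a set."""
    return set(re.findall(r"[a-z]+", text.lower()))


def categorize(text):
    """Return the names of all categories whose Boolean rule matches."""
    words = tokens(text)
    matched = []
    for name, rule in RULES.items():
        if (rule["all_of"] <= words
                and (not rule["any_of"] or rule["any_of"] & words)
                and not (rule["none_of"] & words)):
            matched.append(name)
    return matched


def extract_entities(text):
    """Return canonical names for every catalog variant found in the text."""
    found = set()
    for canonical, variants in CATALOG.items():
        for variant in variants:
            # Match the variant as a whole token, not inside a longer word.
            if re.search(r"(?<!\w)" + re.escape(variant) + r"(?!\w)", text):
                found.add(canonical)
                break
    return sorted(found)


if __name__ == "__main__":
    sample = ("IBM confirmed the acquisition after a takeover bid; "
              "SAS reported record revenue.")
    print(categorize(sample))
    print(extract_entities(sample))
```

The `none_of` clause shows why Boolean rules need refinement over time: without it, a text about talent acquisition would land in the Mergers category.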

Page 8

Introduction to Text Analytics
Text Analytics

Why Text Analytics?
– Enterprise search has failed to live up to its potential
– Enterprise content management has failed to live up to its potential
– Taxonomy has failed to live up to its potential
– Adding metadata, especially keywords, has not worked

What is missing?
– Intelligence – human level categorization, conceptualization
– Infrastructure – Integrated solutions not technology, software

Text Analytics can be the foundation that (finally) drives success – search, content management, and much more

Page 9

Text Analytics Platform
4 Basic Contexts

Ideas – Content Structure
– Language and Mind of your organization
– Applications - exchange meaning, not data

People – Company Structure
– Communities, Users
– Central team - establish standards, facilitate

Activities – Business processes and procedures

Technology
– CMS, Search, portals, taxonomy tools
– Applications – BI, CI, Text Mining

Page 10

Text Analytics Platform: The start and foundation
Knowledge Architecture Audit

Knowledge Map - Understand what you have, what you are, what you want
– The foundation of the foundation

Contextual interviews, content analysis, surveys, focus groups, ethnographic studies

Category modeling – “Intertwingledness” - learning new categories influenced by other, related categories

Natural level categories mapped to communities, activities
• Novices prefer higher levels
• Balance of informativeness and distinctiveness

Living, breathing, evolving foundation is the goal

Page 11

Text Analytics Platform – Benefits
IDC White Paper

Time Wasted
– Reformatting information - $5.7 million per 1,000 per year
– Not finding information - $5.3 million per 1,000
– Recreating content - $4.5 million per 1,000

Small Percent Gain = large savings
– 1% - $10 million
– 5% - $50 million
– 10% - $100 million
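The arithmetic behind these figures can be sketched as follows. The slide does not say what the "1,000" counts or what base cost the percentage gains apply to, so this sketch assumes the per-1,000 figures scale linearly with headcount; the function names are hypothetical.

```python
# Per-1,000 annual costs quoted on the slide (IDC white paper), in US dollars.
COST_PER_1000 = {
    "reformatting information": 5_700_000,
    "not finding information": 5_300_000,
    "recreating content": 4_500_000,
}


def annual_waste(headcount):
    """Scale the combined per-1,000 cost linearly to a headcount (assumption)."""
    return sum(COST_PER_1000.values()) * headcount / 1000


def savings(base_cost, percent_gain):
    """Savings from recovering `percent_gain` percent of `base_cost`."""
    return base_cost * percent_gain / 100


def implied_base(saving, percent_gain):
    """Base cost implied by a quoted saving at a given percent gain."""
    return saving * 100 / percent_gain


if __name__ == "__main__":
    print(annual_waste(1000))                # combined cost per 1,000
    print(savings(annual_waste(1000), 1))    # a 1% gain on that amount
    print(implied_base(10_000_000, 1))       # base implied by "1% - $10 million"
```

The three categories add up to $15.5 million per 1,000 per year. Working backwards, each of the three savings tiers on the slide implies the same $1 billion base, a figure the slide itself does not state.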

Page 12

Text Analytics Platform – Benefits

Findability within and outside the enterprise
– Savings per year - $millions

Rescue enterprise search and ECM projects
– Add semantics to search

Clean up enterprise content
– Duplication and accurate categorization

Improve the quality of information access
– Finding the right information can save millions

Build smarter applications
– Social networking, locate expertise within the enterprise

Page 13

Text Analytics Platform – Benefits

Understand your customers
– What they are talking about and how they feel about it

Empower your employees
– Not only more time, but they work smarter

Understand your competitors
– What they are working on, talking about
– Combine unstructured content and rich data sources – more intelligent analysis

Page 14

Text Analytics Platform – Dangers

Text Analytics as a software project

Not enough resources – to develop, to maintain-refine

Wrong resources – SMEs, IT, Library
– Need all of the above and taxonomists+

Bad Design:
– Start with bad taxonomy
– Wrong taxonomy – too big or too flat

Bad Categorization / Entity Extraction
– Right kind of experience

Page 15

Resources

Books
– Women, Fire, and Dangerous Things
• George Lakoff
– Knowledge, Concepts, and Categories
• Koen Lamberts and David Shanks
– The Stuff of Thought – Steven Pinker

Web Sites
– Text Analytics News - http://social.textanalyticsnews.com/index.php
– Text Analytics Wiki - http://textanalytics.wikidot.com/

Page 16

Resources

Blogs
– SAS - Manya Mayes – Chief Strategist - http://blogs.sas.com/text-mining/

Web Sites
– Taxonomy Community of Practice: http://finance.groups.yahoo.com/group/TaxoCoP/
– Whitepaper – CM and Text Analytics - http://www.textanalyticsnews.com/usa/contentmanagementmeetstextanalytics.pdf

Page 17

Questions?

Tom Reamy [email protected]

KAPS Group

Knowledge Architecture Professional Services

http://www.kapsgroup.com