Text Analytics Workshop Tom Reamy Chief Knowledge Architect KAPS Group Program Chair – Text Analytics World Knowledge Architecture Professional Services http://www.kapsgroup.com

Upload: alexandra-washington

Post on 19-Jan-2016


Page 1

Text Analytics Workshop

Tom Reamy
Chief Knowledge Architect
KAPS Group
Program Chair – Text Analytics World

Knowledge Architecture Professional Services
http://www.kapsgroup.com

Page 2

Agenda

Introduction
– State of Text Analytics
– Text Analytics Features
– Information / Knowledge Environment – Taxonomy, Metadata, Information Technology
– Value of Text Analytics
– Quick Start for Text Analytics

Development – Taxonomy, Categorization, Faceted Metadata

Text Analytics Applications
– Integration with Search and ECM
– Platform for Information Applications

Questions / Discussions

Page 3

Introduction: KAPS Group

Knowledge Architecture Professional Services – Network of Consultants
Applied Theory – Faceted taxonomies, complexity theory, natural categories, emotion taxonomies
Services:
– Strategy – IM & KM - Text Analytics, Social Media, Integration
– Taxonomy/Text Analytics development, consulting, customization
– Text Analytics Quick Start – Audit, Evaluation, Pilot
– Social Media: Text based applications – design & development
Partners – Smart Logic, Expert Systems, SAS, SAP, IBM, FAST, Concept Searching, Attensity, Clarabridge, Lexalytics
Clients:
– Genentech, Novartis, Northwestern Mutual Life, Financial Times, Hyatt, Home Depot, Harvard Business Library, British Parliament, Battelle, Amdocs, FDA, GAO, World Bank, etc.
Presentations, Articles, White Papers – www.kapsgroup.com

Page 4

Text Analytics Workshop
Introduction: Text Analytics

History – academic research, focus on NLP
Inxight – out of Xerox PARC
– Moved TA from academic and NLP to auto-categorization, entity extraction, and Search-Meta Data
Explosion of companies – many based on Inxight extraction with some analytical-visualization front ends
– Half from 2008 are gone - Lucky ones got bought
Focus on enterprise text analytics – shift to sentiment analysis - easier to do, obvious pay off (customers, not employees)
– Backlash – Real business value?
Enterprise search down, taxonomy up – need for metadata – not great results from either – 10 years of effort for what?
Text Analytics is slowly growing – time for a jump?

Page 5

Text Analytics Workshop
Current State of Text Analytics

Big Data – Big Text is bigger, text into data, data for text
– Watson – ensemble methods, pun module
Social Media / Sentiment – look for real business value
– New techniques, emotion taxonomies
Enterprise Text Analytics (ETA)
– ETA is the platform for unstructured text applications
– Wide Range of InfoApps – BI, CI, Fraud, social media
Has Text Analytics Arrived?
– Survey – 28% just getting started, 11% not yet, 17.5% ETA
What is holding it back?
– Lack of clarity about business value, what it is – 55%
– Lack of strategic vision, real examples
Gartner – new report on text analytics

Page 6

Introduction: Future Directions
What is Text Analytics Good For?

Page 7

Text Analytics Workshop
What is Text Analytics?

Text Mining – NLP, statistical, predictive, machine learning
Semantic Technology – ontology, fact extraction
Extraction – entities – known and unknown, concepts, events
– Catalogs with variants, rule based
Sentiment Analysis
– Objects and phrases – statistics & rules – Positive and Negative
Auto-categorization
– Training sets, Terms, Semantic Networks
– Rules: Boolean - AND, OR, NOT
– Advanced – DIST(#), ORDDIST#, PARAGRAPH, SENTENCE
– Disambiguation - Identification of objects, events, context
– Build rules based, not simply Bag of Individual Words
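The difference between proximity rules and a bag of individual words can be illustrated with a minimal Python sketch. The slides do not define the operators' exact semantics, so this assumes DIST(n) means two terms occur within n tokens of each other in either order, and ORDDIST(n) additionally requires the first term to come before the second. The function names, the tokenizer, and the sample sentences are hypothetical and are not any vendor's API.

```python
import re

def tokenize(text):
    """Lowercase word tokens; a simple stand-in for a real tokenizer."""
    return re.findall(r"[a-z0-9']+", text.lower())

def positions(tokens, term):
    """Indexes at which `term` occurs."""
    return [i for i, tok in enumerate(tokens) if tok == term]

def dist(tokens, a, b, n):
    """Assumed DIST(n): a and b occur within n tokens of each other, either order."""
    return any(abs(i - j) <= n
               for i in positions(tokens, a)
               for j in positions(tokens, b))

def orddist(tokens, a, b, n):
    """Assumed ORDDIST(n): a occurs before b, at most n tokens apart."""
    return any(0 < j - i <= n
               for i in positions(tokens, a)
               for j in positions(tokens, b))

def bag_of_words(tokens, a, b):
    """Boolean AND over individual words, ignoring position."""
    return a in tokens and b in tokens

finance = tokenize("The bank raised its interest rate.")
river = tokenize("We walked along the river bank while discussing the exchange rate.")

print(bag_of_words(finance, "bank", "rate"), bag_of_words(river, "bank", "rate"))
print(dist(finance, "bank", "rate", 4), dist(river, "bank", "rate", 4))
print(orddist(finance, "bank", "rate", 4), orddist(finance, "rate", "bank", 4))
```

Both sentences pass the bag-of-words test, but only the first passes the proximity test, which is the kind of disambiguation the rule operators are meant to support.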

Page 8

Case Study – Categorization & Sentiment

Page 9

Case Study – Categorization & Sentiment

Page 12

Case Study – Taxonomy Development

Page 13

Text Analytics Workshop
TA & Taxonomy: Complementary Information Platform

Taxonomy provides a consistent and common vocabulary
– Enterprise resource – integrated not centralized
Text Analytics provides a consistent tagging
– Human indexing is subject to inter and intra individual variation
Taxonomy provides the basic structure for categorization
– And candidate terms
Text Analytics provides the power to apply the taxonomy
– And metadata of all kinds
Text Analytics and Taxonomy Together – Platform
– Consistent in every dimension
– Powerful and economic

Page 14

Text Analytics Workshop
Taxonomy and Text Analytics

Standard Taxonomies = starter categorization rules
– Example – MeSH – bottom 5 layers are terms
Categorization taxonomy structure
– Tradeoff of depth and complexity of rules
– Easier to maintain taxonomy, but need to refine rules
Analysis of taxonomy – suitable for categorization
– Structure – not too flat, not too large, orthogonal categories
Smaller modular taxonomies
– More flexible relationships – not just Is-A-Kind/Child-Of
Different kinds of taxonomies – emotion, expertise
No standards for text analytics – custom jobs
– Importance of starting resources

Page 15

Text Analytics Workshop
Metadata - Tagging

How do you bridge the gap – taxonomy to documents?
Tagging documents with taxonomy nodes is tough
– And expensive – central or distributed
Library staff – experts in categorization not subject matter
– Too limited, narrow bottleneck
– Often don’t understand business processes and business uses
Authors – Experts in the subject matter, terrible at categorization
– Intra and Inter inconsistency, “intertwingleness”
– Choosing tags from taxonomy – complex task
– Folksonomy – almost as complex, wildly inconsistent
– Resistance – not their job, cognitively difficult = non-compliance
Text Analytics is the answer(s)!

Page 16

Text Analytics Workshop
Mind the Gap – Manual-Automatic-Hybrid

All require human effort – issue of where and how effective
Manual - human effort is tagging (difficult, inconsistent)
– Small, high value document collections, trained taggers
Automatic - human effort is prior to tagging – auto-categorization rules and/or NLP algorithm effort
Hybrid Model – before (like automatic) and after
– Build on expertise – librarians on categorization, SMEs on subject terms
Facets – Requires a lot of Metadata - Entity Extraction feeds facets – more automatic, feedback by design
Manual - Hybrid – Automatic is a spectrum – depends on context

Page 17

Text Analytics Workshop
Benefits of Text Analytics

Why Text Analytics?
– Enterprise search has failed to live up to its potential
– Enterprise Content management has failed to live up to its potential
– Taxonomy has failed to live up to its potential
– Adding metadata, especially keywords, has not worked
What is missing?
– Intelligence – human level categorization, conceptualization
– Infrastructure – Integrated solutions not technology, software
Text Analytics can be the foundation that (finally) drives success – search, content management, and much more

Page 18

Strategic Vision for Text Analytics
Costs and Benefits

IDC study – quantify cost of bad search
Three areas:
– Time spent searching
– Recreation of documents
– Bad decisions / poor quality work
Costs
– 50% search time is bad search = $2,500 year per person
– Recreation of documents = $5,000 year per person
– Bad quality (harder) = $15,000 year per person
Per 1,000 people = $22.5 million a year
– 30% improvement = $6.75 million a year
– Add own stories – especially cost of bad information
– Human measure - # of FTEs, savings passed on to customers, etc.
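The cost arithmetic on this slide can be reproduced with a short Python sketch. The three per-person figures, the 1,000-person headcount, and the 30% improvement are the slide's own numbers; the function and variable names are hypothetical, and the simple linear model (costs add up and scale with headcount) is an assumption read off the slide's totals.

```python
# Annual cost per person, in dollars, as quoted on the slide.
COST_PER_PERSON = {
    "bad search time": 2_500,
    "recreating documents": 5_000,
    "poor-quality work": 15_000,
}

def annual_cost(headcount):
    """Total yearly cost for an organization of `headcount` people."""
    return sum(COST_PER_PERSON.values()) * headcount

def annual_savings(headcount, improvement_percent):
    """Savings if the cost falls by `improvement_percent` (integer percent)."""
    return annual_cost(headcount) * improvement_percent // 100

print(f"${annual_cost(1_000):,}")         # $22,500,000
print(f"${annual_savings(1_000, 30):,}")  # $6,750,000
```

Swapping in an organization's own headcount and cost estimates gives the "add own stories" version of the same calculation.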

Page 19

Getting Started with Text Analytics
Need for a Quick Start

Text Analytics is weird, a bit academic, and not very practical
• It involves language and thinking and really messy stuff
On the other hand, it is really difficult to do right (Rocket Science)
Organizations don’t know what text analytics is and what it is for
TAW Survey shows - need two things:
• Strategic vision of text analytics in the enterprise
  • Business value, problems solved, information overload
  • Text Analytics as platform for information access
• Real life functioning program showing value and demonstrating an understanding of what it is and does
Quick Start – Strategic Vision – Software Evaluation – POC / Pilot

Page 20

Getting Started with Text Analytics
Text Analytics Vision & Strategy

Strategic Questions – why, what value from the text analytics, how are you going to use it
– Platform or Applications?
What are the basic capabilities of Text Analytics?
What can Text Analytics do for Search?
– After 10 years of failure – get search to work?
What can you do with smart search based applications?
– RM, PII, Social
ROI for effective search – difficulty of believing
– Problems with metadata, taxonomy

Page 21

Quick Start Step One - Knowledge Audit

Ideas – Content and Content Structure
– Map of Content – Tribal language silos
– Structure – articulate and integrate
– Taxonomic resources
People – Producers & Consumers
– Communities, Users, Central Team
Activities – Business processes and procedures
– Semantics, information needs and behaviors
– Information Governance Policy
Technology
– CMS, Search, portals, text analytics
– Applications – BI, CI, Semantic Web, Text Mining

Page 22

Quick Start Step One - Knowledge Audit

Info Problems – what, how severe
Formal Process – Knowledge Audit
– Contextual interviews, content analysis, surveys, focus groups, ethnographic studies, Text Mining
Informal for smaller organizations, specific application
Category modeling – Cognitive Science – how people think
– Panda, Monkey, Banana
Natural level categories mapped to communities, activities
• Novices prefer higher levels
• Balance of informative and distinctiveness
Strategic Vision – Text Analytics and Information/Knowledge Environment

Page 23

Quick Start Step Two - Software Evaluation
Varieties of Taxonomy / Text Analytics Software

Software is more important to text analytics
– No spreadsheets for semantics
Taxonomy Management - extraction
Full Platform
– SAS, SAP, Smart Logic, Concept Searching, Expert System, IBM, Linguamatics, GATE
Embedded – Search or Content Management
– FAST, Autonomy, Endeca, Vivisimo, NLP, etc.
– Interwoven, Documentum, etc.
Specialty / Ontology (other semantic)
– Sentiment Analysis – Attensity, Lexalytics, Clarabridge, Lots
– Ontology – extraction, plus ontology

Page 24

Quick Start Step Two - Software Evaluation
Different Kind of Software Evaluation

Traditional Software Evaluation - Start
– Filter One - Ask Experts - reputation, research – Gartner, etc.
  • Market strength of vendor, platforms, etc.
  • Feature scorecard – minimum, must have, filter to top 6
– Filter Two – Technology Filter – match to your overall scope and capabilities – Filter not a focus
– Filter Three – In-Depth Demo – 3-6 vendors
Reduce to 1-3 vendors
Vendors have different strengths in multiple environments
– Millions of short, badly typed documents, Build application
– Library 200 page PDF, enterprise & public search

Page 25

Quick Start Step Two - Software Evaluation
Design of the Text Analytics Selection Team

IT - Experience with software purchases, needs assess, budget
– Search/Categorization is unlike other software, deeper look
Business - understand business, focus on business value
They can get executive sponsorship, support, and budget
– But don’t understand information behavior, semantic focus
Library, KM - Understand information structure
Experts in search experience and categorization
– But don’t understand business or technology
Interdisciplinary Team, headed by Information Professionals
Much more likely to make a good decision
Create the foundation for implementation

Page 26

Quick Start Step Three – Proof of Concept / Pilot Project

POC use cases – basic features needed for initial projects
Design - Real life scenarios, categorization with your content
Preparation:
– Preliminary analysis of content and users’ information needs
  • Training & test sets of content, search terms & scenarios
– Train taxonomist(s) on software(s)
– Develop taxonomy if none available
Four week POC – 2 rounds of develop, test, refine / Not OOB
Need SMEs as test evaluators – also to do an initial categorization of content
Majority of time is on auto-categorization

Page 27

POC Design: Evaluation Criteria & Issues

Basic Test Design – categorize test set
– Score – by file name, human testers
Categorization & Sentiment – Accuracy 80-90%
– Effort Level per accuracy level
Combination of scores and report
Operators (DIST, etc.), relevancy scores, markup
Development Environment – Usability, Integration
Issues:
– Quality of content & initial human categorization
– Normalize among different test evaluators
– Quality of taxonomy – structure, overlapping categories

Page 28

Quick Start for Text Analytics
Proof of Concept – Value of POC

Selection of best product(s)
Identification and development of infrastructure elements – taxonomies, metadata – standards and publishing process
Training by doing – SMEs learning categorization, Library/taxonomist learning business language
Understand effort level for categorization, application
Test suitability of existing taxonomies for range of applications
Explore application issues – example – how accurate does categorization need to be for that application – 80-90%
Develop resources – categorization taxonomies, entity extraction catalogs/rules

Page 29

POC and Early Development: Risks and Issues

CTO Problem – This is not a regular software process
Semantics is messy not just complex
– 30% accuracy isn’t 30% done – could be 90%
Variability of human categorization
Categorization is iterative, not “the program works”
– Need realistic budget and flexible project plan
Anyone can do categorization
– Librarians often overdo, SMEs often get lost (keywords)
Meta-language issues – understanding the results
– Need to educate IT and business in their language

Page 30

Development

Page 31

Text Analytics Development: Categorization Process
Start with Taxonomy and Content

Starter Taxonomy
– If no taxonomy, develop (steal) initial high level
  • Textbooks, glossaries, Intranet structure
  • Organization Structure – facets, not taxonomy
Analysis of taxonomy – suitable for categorization
– Structure – not too flat, not too large
– Orthogonal categories
Content Selection
– Map of all anticipated content
– Selection of training sets – if possible
– Automated selection of training sets – taxonomy nodes as first categorization rules – apply and get content

Page 32

Text Analytics Workshop
Text Analytics Development: Categorization Process

First Round of Categorization Rules
Term building – from content – basic set of terms that appear often / important to content
Add terms to rule, apply to broader set of content
Repeat for more terms – get recall-precision “scores”
Repeat, refine, repeat, refine, repeat
Get SME feedback – formal process – scoring
Get SME feedback – human judgments
Test against more, new content
Repeat until “done” – 90%?
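The "add terms, apply, score, repeat" loop above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: a rule is just a set of terms that fires when any term appears in a document, and the SME judgments are a set of document ids marked relevant. The documents, ids, and function names are hypothetical.

```python
def apply_rule(terms, docs):
    """Return ids of documents containing at least one rule term."""
    return {doc_id for doc_id, text in docs.items()
            if terms & set(text.lower().split())}

def precision_recall(predicted, relevant):
    """Precision: share of hits that are right. Recall: share of relevant docs found."""
    hits = len(predicted & relevant)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical test collection and SME judgments for a "Healthcare" category.
DOCS = {
    "d1": "hospital merger approved by regulators",
    "d2": "new clinic opens for patients",
    "d3": "patients wait longer at the hospital",
    "d4": "investors sell clinic stocks after weak earnings",
    "d5": "travel clinic offers vaccines",
}
SME_HEALTHCARE = {"d1", "d2", "d3", "d5"}

round_one = precision_recall(apply_rule({"hospital"}, DOCS), SME_HEALTHCARE)
round_two = precision_recall(apply_rule({"hospital", "clinic"}, DOCS), SME_HEALTHCARE)
print(round_one, round_two)  # (1.0, 0.5) (0.8, 1.0)
```

Adding the term "clinic" raises recall but admits a false positive, so precision drops. That trade-off is what each refinement round tries to manage.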

Page 33

Text Analytics Workshop
Text Analytics Development: Entity Extraction Process

Facet Design – from Knowledge Audit, K Map
Find and Convert catalogs:
– Organization – internal resources
– People – corporate yellow pages, HR
– Include variants
– Scripts to convert catalogs – programming resource
Build initial rules – follow categorization process
– Differences – scale, threshold – application dependent
– Recall – Precision – balance set by application
– Issue – disambiguation – Ford company, person, car
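A catalog with variants and a simple context-based disambiguation step might look like the following Python sketch. Everything here is an assumption made for illustration: the catalog entries, the cue words, the five-token window, and the function names are hypothetical, and a production system would use far richer rules.

```python
import re

# Hypothetical catalog: canonical name -> known variants (lowercase).
CATALOG = {
    "Ford Motor Company": ["ford motor company", "ford motor co."],
    "Henry Ford": ["henry ford", "henry x. ford"],
}

# Hypothetical context cues used to classify a bare "Ford" mention.
CONTEXT_CUES = {
    "Organization": {"shares", "earnings", "announced", "company"},
    "Person": {"mr", "said", "founder", "born"},
    "Product": {"drove", "parked", "engine", "pickup"},
}

def match_catalog(text):
    """Canonical names whose variants appear in the text."""
    lowered = text.lower()
    return sorted(name for name, variants in CATALOG.items()
                  if any(v in lowered for v in variants))

def disambiguate(text, term="ford", window=5):
    """Label each mention of `term` by the cue words found within `window` tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    labels = []
    for i, tok in enumerate(tokens):
        if tok != term:
            continue
        context = set(tokens[max(0, i - window): i + window + 1])
        scores = {etype: len(cues & context) for etype, cues in CONTEXT_CUES.items()}
        best = max(scores, key=scores.get)
        labels.append(best if scores[best] > 0 else "Unknown")
    return labels

print(match_catalog("Henry X. Ford spoke at Ford Motor Co. headquarters"))
print(disambiguate("Ford announced quarterly earnings."))
print(disambiguate("She parked the Ford and checked the engine."))
print(disambiguate("Mr. Ford said the factory would reopen."))
```

The window size and the minimum cue count play the role of the "threshold" on the slide: loosening them favors recall, tightening them favors precision.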

Page 34

Text Analytics Development: Entity Extraction Process

Demo – SAS Enterprise Content Categorization
Amdocs
– Motivation – BillGreaterThanLast – build rule
– BillIncludesProrate – auto rule
GAO
– Project Three – Agriculture and New Agriculture

Page 35

Text Analytics Workshop
Case Study - Background

Inxight Smart Discovery
Multiple Taxonomies
– Healthcare – first target
– Travel, Media, Education, Business, Consumer Goods,
Content – 800+ Internet news sources
– 5,000 stories a day
Application – Newsletters
– Editors using categorized results
– Easier than full automation

Page 36

Text Analytics Workshop
Case Study - Approach

Initial High Level Taxonomy
– Auto generation – very strange – not usable
– Editors High Level – sections of newsletters
– Editors & Taxonomy Pros - Broad categories & refine
Develop Categorization Rules
– Multiple Test collections
– Good stories, bad stories – close misses - terms
Recall and Precision Cycles
– Refine and test – taxonomists – many rounds
– Review – editors – 2-3 rounds
Repeat – about 4 weeks

Page 37

Text Analytics Workshop
Case Study – Issues & Lessons

Taxonomy Structure: Aggregate vs. independent nodes
– Children Nodes – subset – rare
Trade-off of depth of taxonomy and complexity of rules
No best answer – taxonomy structure, format of rules
– Need custom development
– Recall more important than precision – editors role
Combination of SME and Taxonomy pros
– Combination of Features – Entity extraction, terms, Boolean, filters, facts
Training sets and find similar are weakest
Plan for ongoing refinement

Page 38

Text Analytics Workshop
Enterprise Environment – Case Studies

A Tale of Two Taxonomies
– It was the best of times, it was the worst of times
Basic Approach
– Initial meetings – project planning
– High level K map – content, people, technology
– Contextual and Information Interviews
– Content Analysis
– Draft Taxonomy – validation interviews, refine
– Integration and Governance Plans

Page 39

Text Analytics Workshop
Enterprise Environment – Case One – Taxonomy, 7 facets

Taxonomy of Subjects / Disciplines:
– Science > Marine Science > Marine microbiology > Marine toxins
Facets:
– Organization > Division > Group
– Clients > Federal > EPA
– Facilities > Division > Location > Building X
– Content Type – Knowledge Asset > Proposals
– Instruments > Environmental Testing > Ocean Analysis > Vehicle
– Methods > Social > Population Study
– Materials > Compounds > Chemicals

Page 40

Text Analytics Workshop
Enterprise Environment – Case One – Taxonomy, 7 facets

Project Owner – KM department – included RM, business process
Involvement of library - critical
Realistic budget, flexible project plan
Successful interviews – build on context
– Overall information strategy – where taxonomy fits
Good Draft taxonomy and extended refinement
– Software, process, team – train library staff
– Good selection and number of facets
Developed broad categorization and one deep - Chemistry
Final plans and hand off to client

Page 41

Text Analytics Workshop
Enterprise Environment – Case Two – Taxonomy, 4 facets

Taxonomy of Subjects / Disciplines:
– Geology > Petrology
Facets:
– Organization > Division > Group
– Process > Drill a Well > File Test Plan
– Assets > Platforms > Platform A
– Content Type > Communication > Presentations

Page 42

Enterprise Environment – Case Two – Taxonomy, 4 facets
Environment & Project Issues

Value of taxonomy understood, but not the complexity and scope
– Under budget, under staffed
Location – not KM – tied to RM and software
– Solution looking for the right problem
Importance of an internal library staff
– Difficulty of merging internal expertise and taxonomy
Project mind set – not infrastructure
– Rushing to meet deadlines doesn’t work with semantics
Importance of integration – with team, company
– Project plan more important than results

Page 43

Enterprise Environment – Case Two – Taxonomy, 4 facets
Research and Design Issues

Research Issues
– Not enough research – and wrong people
– Misunderstanding of research – wanted tinker toy connections
  • Interview 1 leads to taxonomy node 2
Design Issues
– Not enough facets
– Wrong set of facets – business not information
– Ill-defined facets – too complex internal structure

Page 44

Enterprise Environment – Case Two – Taxonomy, 4 facets
Conclusion: Risk Factors

Political-Cultural-Semantic Environment
– Not simple resistance - more subtle
  • Re-interpretation of specific conclusions and sequence of conclusions / Relative importance of specific recommendations
Access to content and people
– Enthusiastic access
Importance of a unified project team
– Working communication as well as weekly meetings

Page 45

Applications

Page 46

Quick Start for Text Analytics
Building on the Foundation

Text Analytics: Create the Platform – CM & Search
– New Electronic Publishing Process
  • Use text analytics to tag, new hybrid workflow
– New Enterprise Search
  • Build faceted navigation on metadata, extraction
Enhance Information Access in the Enterprise - InfoApps
– Governance, Records Management, Doc duplication, Compliance
– Applications – Business Intelligence, CI, Behavior Prediction
– eDiscovery, litigation support, Fraud detection
– Productivity / Portals – spider and categorize, extract

Page 47

Quick Start for Text Analytics
Information Platform: Content Management

Hybrid Model – Internal Content Management
– Publish Document -> Text Analytics analysis -> suggestions for categorization, entities, metadata -> present to author
– Cognitive task is simple -> react to a suggestion instead of select from head or a complex taxonomy
– Feedback – if author overrides -> suggestion for new category
External Information - human effort is prior to tagging
– More automated, human input as specialized process – periodic evaluations
– Precision usually more important
– Target usually more general

Page 48

Text Analytics and Search
Multi-dimensional and Smart

Faceted Navigation has become the basic / norm
– Facets require huge amounts of metadata
– Entity / noun phrase extraction is fundamental
– Automated with disambiguation (through categorization)
Taxonomy – two roles – subject/topics and facet structure
– Complex facets and faceted taxonomies
Clusters and Tag Clouds – discovery & exploration
Auto-categorization – aboutness, subject facets
– This is still fundamental to search experience
– InfoApps only as good as fundamentals of search
People – tagging, evaluating tags, fine tune rules and taxonomy

Page 51

Integrated Facet Application
Design Issues - General

What is the right combination of elements?
– Dominant dimension or equal facets
– Browse topics and filter by facet, search box
– How many facets do you need?
Scale requires more automated solutions
– More sophisticated rules
Issue of disambiguation:
– Same person, different name – Henry Ford, Mr. Ford, Henry X. Ford
– Same word, different entity – Ford and Ford
Number of entities and thresholds per results set / document
– Usability, audience needs
Relevance Ranking – number of entities, rank of facets
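The idea of entity counts and thresholds per result set can be illustrated with a small Python sketch. It assumes each search result already carries extracted entity metadata as lists of values per facet; the field names, sample results, and the `min_count` and `top_n` parameters are hypothetical.

```python
from collections import Counter

def facet_counts(results, facet, min_count=1, top_n=5):
    """Count facet values across a result set, keeping at most `top_n`
    values that occur in at least `min_count` results' metadata."""
    counts = Counter(value for doc in results for value in doc.get(facet, []))
    return [(value, n) for value, n in counts.most_common() if n >= min_count][:top_n]

# Hypothetical result set with entity metadata already extracted.
RESULTS = [
    {"title": "Plant tour", "people": ["Henry Ford"], "organizations": ["Ford Motor Company"]},
    {"title": "Emissions ruling", "people": ["Henry Ford"], "organizations": ["EPA"]},
    {"title": "Quarterly report", "people": [], "organizations": ["Ford Motor Company"]},
]

print(facet_counts(RESULTS, "organizations"))
print(facet_counts(RESULTS, "organizations", min_count=2))
print(facet_counts(RESULTS, "people"))
```

Raising `min_count` or lowering `top_n` trims the facet list shown to users, which is one way to tune the display for usability and audience needs.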

Page 52

Quick Start for Text Analytics
Text and Data: Two Way Street

New types of applications
– New ways to make sense of data, enrich data
Harvard – Analyzing Text as Data
– Detecting deception, Frame Analysis
Narrative Science – take data (baseball statistics, financial data) and turn into a story
Political campaigns using Big Data, social media, and text analytics
Watson for healthcare – help doctors keep up with massive information overload

Page 53

Quick Start for Text Analytics
Social Media: Beyond Simple Sentiment

Beyond Good and Evil (positive and negative)
– Social Media is approaching next stage (growing up)
– Where is the value? How to get better results?
Importance of Context – around positive and negative words
– Rhetorical reversals – “I was expecting to love it”
– Issues of sarcasm (“Really Great Product”), slanguage
Granularity of Application
– Early Categorization – Politics or Sports
Limited value of Positive and Negative
– Degrees of intensity, complexity of emotions and documents
Addition of focus on behaviors – why someone calls a support center – and likely outcomes

Quick Start for Text Analytics
Social Media: Beyond Simple Sentiment

Two basic approaches [Limited accuracy, depth]
– Statistical Signature of Bag of Words
– Dictionary of positive & negative words

Essential – need full categorization and concept extraction

New Taxonomies – Appraisal Groups – Adjective and modifiers – “not very good”
– Supports more subtle distinctions than positive or negative

Emotion taxonomies - Joy, Sadness, Fear, Anger, Surprise, Disgust
– New Complex – pride, shame, confusion, skepticism
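
The appraisal-group idea above (an adjective plus its modifiers, as in “not very good”) can be sketched in Python. This is only an illustration under stated assumptions: the word lists and weights are invented for the example, negation is modeled as a simple sign flip, and `appraisal_groups` is a hypothetical name, not part of any product mentioned in the workshop.

```python
# Illustrative lexicons only; the weights are assumptions, not measured values.
ADJECTIVES = {"good": 1.0, "great": 2.0, "bad": -1.0, "terrible": -2.0}
INTENSIFIERS = {"very": 1.5, "really": 1.5, "slightly": 0.5}
NEGATORS = {"not", "never", "hardly"}

def appraisal_groups(text):
    """Return (phrase, score) pairs for each adjective with its preceding modifiers."""
    tokens = text.lower().replace(",", " ").replace(".", " ").split()
    groups = []
    modifiers = []
    for tok in tokens:
        if tok in NEGATORS or tok in INTENSIFIERS:
            modifiers.append(tok)
        elif tok in ADJECTIVES:
            score = ADJECTIVES[tok]
            # Apply modifiers from the one nearest the adjective outward.
            for mod in reversed(modifiers):
                if mod in INTENSIFIERS:
                    score *= INTENSIFIERS[mod]
                else:
                    score = -score  # negation as a sign flip (a simplification)
            groups.append((" ".join(modifiers + [tok]), score))
            modifiers = []
        else:
            # Any other word breaks the modifier chain.
            modifiers = []
    return groups
```

A plain dictionary of positive and negative words would count "good" in “not very good” as positive. Scoring the whole group keeps the negation and the intensity together, which is the more subtle distinction the slide points to.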

Quick Start for Text Analytics
Social Media: Beyond Simple Sentiment

Expertise Analysis
– Experts think & write differently – process, chunks
– Categorization rules for documents, authors, communities

Applications:
– Business & Customer intelligence, Voice of the Customer
– Deeper understanding of communities, customers – better models
– Security, threat detection – behavior prediction, Are they experts?
– Expertise location - Generate automatic expertise characterization

Crowd Sourcing – technical support to Wikis

Political – conservative and liberal minds/texts
– Disgust, shame, cooperation, openness

Quick Start for Text Analytics
Behavior Prediction – Telecom Customer Service

Problem – distinguish customers likely to cancel from mere threats

Basic Rule
– (START_20, (AND,
    (DIST_7, "[cancel]", "[cancel-what-cust]"),
    (NOT, (DIST_10, "[cancel]", (OR, "[one-line]", "[restore]", "[if]")))))

Examples:
– customer called to say he will cancell his account if the does not stop receiving a call from the ad agency.
– cci and is upset that he has the asl charge and wants it off or her is going to cancel his act

More sophisticated analysis of text and context in text

Combine text analytics with Predictive Analytics and traditional behavior monitoring for new applications
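
The basic rule above can be read as nested proximity operators. The following Python sketch shows one way such a rule might be evaluated. It rests on assumptions, since the slides do not define the rule language: `START_n` is taken to mean "look only at the first n tokens", `DIST_n` to mean "some match of each operand within n tokens of each other", and each bracketed name to refer to a term list. The term lists in `CONCEPTS` are hypothetical, as are the function names.

```python
import re

# Hypothetical term lists; the real concept definitions are not given in the slides.
CONCEPTS = {
    "cancel": {"cancel", "cancell", "disconnect"},
    "cancel-what-cust": {"account", "acct", "act", "service"},
    "one-line": {"line"},
    "restore": {"restore", "reconnect"},
    "if": {"if", "unless"},
}

# The slide's rule, rewritten as nested tuples: (operator, arguments...).
RULE = ("START", 20, ("AND",
        ("DIST", 7, "cancel", "cancel-what-cust"),
        ("NOT", ("DIST", 10, "cancel", ("OR", "one-line", "restore", "if")))))

def tokenize(text):
    """Lowercase word tokens; punctuation is discarded."""
    return re.findall(r"[a-z']+", text.lower())

def positions(expr, tokens):
    """Token indices where a concept (or an OR of concepts) occurs."""
    if isinstance(expr, str):
        return [i for i, tok in enumerate(tokens) if tok in CONCEPTS[expr]]
    op, *args = expr
    if op != "OR":
        raise ValueError("only concepts and OR are supported inside DIST")
    return sorted(i for arg in args for i in positions(arg, tokens))

def matches(expr, tokens):
    """Evaluate a rule expression against a token list."""
    if isinstance(expr, str):
        return bool(positions(expr, tokens))
    op, *args = expr
    if op == "START":
        limit, inner = args
        return matches(inner, tokens[:limit])  # assumed meaning of START_n
    if op == "DIST":
        max_gap, left, right = args
        return any(abs(i - j) <= max_gap
                   for i in positions(left, tokens)
                   for j in positions(right, tokens))
    if op == "AND":
        return all(matches(a, tokens) for a in args)
    if op == "OR":
        return any(matches(a, tokens) for a in args)
    if op == "NOT":
        return not matches(args[0], tokens)
    raise ValueError(f"unknown operator: {op}")
```

Under these assumed semantics and term lists, `matches(RULE, tokenize(note))` fires for a note such as "customer wants to cancel his account today" but not for the first example above, where "if" appears close to "cancell" and marks the statement as conditional.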

Text Analytics Workshop
Conclusions

Text Analytics and Taxonomy are partners – enrich each other

Text Analytics can mind the gap – between taxonomies and documents

Text Analytics needs strategic vision and quick start
– Need to approach as platform – deep context – understand information environment

Text Analytics is a platform for huge range of applications:
– Search and Content Management and Basic productivity apps
– New kinds of applications - social, data, InfoApps of all kinds

Want to learn more – come to Text Analytics World in SF in April!
– Call for Speakers - Nov 2 – www.textanalyticsworld.com

Questions?

Tom Reamy

KAPS Group

Knowledge Architecture Professional Services

http://www.kapsgroup.com

Resources

Books
– Women, Fire, and Dangerous Things
  • George Lakoff
– Knowledge, Concepts, and Categories
  • Koen Lamberts and David Shanks
– Formal Approaches in Categorization
  • Ed. Emmanuel Pothos and Andy Wills
– The Mind
  • Ed. John Brockman
  • Good introduction to a variety of cognitive science theories, issues, and new ideas
– Any cognitive science book written after 2009

Resources

Conferences – Web Sites
– Text Analytics World - All aspects of text analytics
  • Call for Speakers – April 17-18, San Francisco
– http://www.textanalyticsworld.com
– Text Analytics Summit
– http://www.textanalyticsnews.com
– Semtech
– http://www.semanticweb.com

Resources

Blogs
– SAS - http://blogs.sas.com/text-mining/

Web Sites
– Taxonomy Community of Practice: http://finance.groups.yahoo.com/group/TaxoCoP/
– LinkedIn – Text Analytics Summit Group
– http://www.LinkedIn.com
– Whitepaper – CM and Text Analytics - http://www.textanalyticsnews.com/usa/contentmanagementmeetstextanalytics.pdf
– Whitepaper – Enterprise Content Categorization strategy and development – http://www.kapsgroup.com

Resources

Articles
– Malt, B. C. 1995. Category coherence in cross-cultural perspective. Cognitive Psychology 29, 85-148
– Rifkin, A. 1985. Evidence for a basic level in event taxonomies. Memory & Cognition 13, 538-56
– Shaver, P., J. Schwartz, D. Kirson, C. O’Connor 1987. Emotion knowledge: further exploration of a prototype approach. Journal of Personality and Social Psychology 52, 1061-1086
– Tanaka, J. W. & M. E. Taylor 1991. Object categories and expertise: is the basic level in the eye of the beholder? Cognitive Psychology 23, 457-82

Resources

LinkedIn Groups:
– Text Analytics World
– Text Analytics Group
– Data and Text Professionals
– Sentiment Analysis
– Metadata Management
– Semantic Technologies

Journals
– Academic – Cognitive Science, Linguistics, NLP
– Applied – Scientific American Mind, New Scientist