Improving Search for Discovery Tom Reamy Chief Knowledge Architect KAPS Group Program Chair – Text Analytics World Knowledge Architecture Professional Services http://www.kapsgroup.com


Post on 19-Dec-2015


Page 1

Improving Search for Discovery

Tom Reamy, Chief Knowledge Architect

KAPS Group

Program Chair – Text Analytics World

Knowledge Architecture Professional Services

http://www.kapsgroup.com

Page 2

Improving Search for Discovery

and Everything Else

Tom Reamy, Chief Knowledge Architect

KAPS Group

Program Chair – Text Analytics World

Knowledge Architecture Professional Services

http://www.kapsgroup.com

Page 3

Agenda

Introduction

What is Wrong With Search?

What Works?
– Metadata & taxonomies
– Infrastructure / Information Life Cycle

Yes, But
– Missing Link – Text Analytics
– Search and Beyond

Conclusion

Page 4

Introduction: KAPS Group

Knowledge Architecture Professional Services – Network of Consultants

Applied Theory – Faceted taxonomies, complexity theory, natural categories, emotion taxonomies

Services:
– Strategy – IM & KM – Text Analytics, Social Media, Integration
– Taxonomy/Text Analytics development, consulting, customization
– Text Analytics Quick Start – Audit, Evaluation, Pilot
– Social Media: Text-based applications – design & development

Partners: Smartlogic, Expert System, SAS, SAP, IBM, FAST, Concept Searching, Attensity, Clarabridge, Lexalytics

Clients: Genentech, Novartis, Northwestern Mutual Life, Financial Times, Hyatt, Home Depot, Harvard Business Library, British Parliament, Battelle, Amdocs, FDA, GAO, World Bank, etc.

Presentations, Articles, White Papers – www.kapsgroup.com

Program Chair – Text Analytics World

Page 5

Improving Search for Discovery

They Won’t Work!

Page 6

Improving Search for Discovery: Why Won’t It Work?

Search Engines are Stupid! (and people have better things to do)

Documents deal in language, BUT it’s all chicken scratches to Search

Relevance requires meaning
– Imagine trying to understand what a document is about in a language you don’t know:

Mzndin agenpfre napae ponaoen afpenafpenae timtnoe.
– Dictionary of chicken scratches (variants, related)
– Count the number of chicken scratches = relevance – Not!

Google = popularity of web sites and Best Bets
– For documents in an enterprise: Counting and Weighting
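To make the “counting chicken scratches” point concrete, here is a minimal Python sketch of relevance as pure counting. The variant dictionary (`VARIANTS`) and the function names are hypothetical illustrations, not any search engine’s actual ranking formula; the point is that nothing in the score depends on what the words mean.

```python
import re
from collections import Counter
from typing import Optional

# Hypothetical "dictionary of chicken scratches": surface variant -> canonical form.
VARIANTS = {"trials": "trial", "drugs": "drug"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def normalize(token: str) -> str:
    """Map a token to its canonical form if the variant dictionary knows it."""
    return VARIANTS.get(token, token)


def keyword_score(query: str, doc: str, weights: Optional[dict[str, float]] = None) -> float:
    """Relevance as pure counting: sum of weight x occurrences for each query term."""
    weights = weights or {}
    doc_counts = Counter(normalize(t) for t in tokenize(doc))
    query_terms = {normalize(t) for t in tokenize(query)}
    return sum(weights.get(term, 1.0) * doc_counts[term] for term in query_terms)


if __name__ == "__main__":
    abstract = "Drug trials in humans: results of two trials."
    print(keyword_score("drug trial", abstract))                  # 3.0 (1 drug + 2 trial)
    print(keyword_score("drug trial", "trial trial trial drug"))  # 4.0
```

Under this scheme the word salad outranks the real abstract, which is the slide’s complaint in miniature.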

Page 7

Improving Search for Discovery: Why Won’t It Work?

Option: Add metadata – good for archiving & indexing

Keywords – don’t scale
– Pilots or small doc sets and many authors
– Folksonomies don’t really work

Tagging – Governance – Thou Shalt Tag! No, they won’t, or they will do it really badly

Add taxonomies – beautiful to behold, but there is a gap between the taxonomy and the documents, and they are too complex for authors

Power Search – statistical signature of a document – apply all kinds of math = Find Similar!

Not trashing search, but just want to say:
– Survey Says: Users Unhappy with Search
– Text Analytics is (part of) the answer
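One common way to build a document’s “statistical signature” and a Find Similar feature is term-frequency vectors compared by cosine similarity. The sketch below assumes that approach (real products may use TF-IDF weighting, dimensionality reduction, or other math), and all names in it are illustrative.

```python
import math
import re
from collections import Counter


def signature(text: str) -> Counter:
    """A crude statistical signature: raw term frequencies of lowercase word tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse term-frequency vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items())  # missing terms count as 0
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_similar(seed: str, corpus: dict[str, str], top_k: int = 3) -> list[tuple[str, float]]:
    """Rank corpus documents by cosine similarity to the seed document, best first."""
    seed_sig = signature(seed)
    scored = [(doc_id, cosine(seed_sig, signature(text))) for doc_id, text in corpus.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


if __name__ == "__main__":
    corpus = {
        "d1": "clinical trial of a new drug in humans",
        "d2": "banana and monkey at the zoo",
        "d3": "drug trial results in humans",
    }
    for doc_id, score in find_similar("new drug trial in humans", corpus):
        print(doc_id, round(score, 3))
```

The signature treats “trial” in a clinical study and in a courtroom identically, which is why the deck argues that statistics alone are not enough.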

Page 8

Semantic Infrastructure: Text Analytics Features

Text Mining – NLP, machine learning, complex statistics

Noun Phrase Extraction – Feed facets
– People, Organizations, Dates, Geographic, Methods, etc.
– Catalogs with variants, rule-based dynamic

Sentiment Analysis – Positive and Negative Phrases
– Dictionaries & rules – “I hate your product”

Summarization – replace snippets

Ontologies – fact extraction + reasoning about relationships

Auto-categorization – built on a taxonomy
– Training sets, Terms, Semantic Networks
– Rules: AND, OR, NOT, DIST, PARAGRAPH, SENTENCE
– Foundation – subjects, disambiguation, add intelligence to all
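A minimal sketch of catalog-driven noun phrase extraction feeding facets, assuming a hypothetical `CATALOG` that maps variants to a facet and a canonical name, plus one “rule-based dynamic” extractor (a regex for ISO dates) for values no catalog can enumerate. Commercial tools add disambiguation and linguistic rules; overlapping catalog matches are not resolved here.

```python
import re
from collections import defaultdict

# Hypothetical catalog: lowercase variant -> (facet, canonical entity).
CATALOG = {
    "food and drug administration": ("Organization", "FDA"),
    "fda": ("Organization", "FDA"),
    "jane doe": ("Person", "Jane Doe"),
    "j. doe": ("Person", "Jane Doe"),
    "geneva": ("Geographic", "Geneva"),
}

# A rule-based extractor for values no catalog can list in advance: ISO dates.
DATE_RULE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def extract_facets(text: str) -> dict[str, set[str]]:
    """Return facet -> set of canonical values found in the text."""
    facets: dict[str, set[str]] = defaultdict(set)
    lowered = text.lower()
    for variant, (facet, canonical) in CATALOG.items():
        # Word boundaries stop "fda" from matching inside a longer token such as "fdax".
        if re.search(r"\b" + re.escape(variant) + r"\b", lowered):
            facets[facet].add(canonical)
    for match in DATE_RULE.finditer(text):
        facets["Date"].add(match.group(0))
    return dict(facets)


if __name__ == "__main__":
    memo = ("Memo dated 2014-06-12: notes on the Food and Drug Administration (FDA) "
            "review, Geneva office, per J. Doe.")
    print(extract_facets(memo))
```

Both the full name and the abbreviation collapse to one canonical “FDA” value, which is what keeps a facet list clean.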

Page 9

Case Study – Categorization & Sentiment


Page 10

Improving Search: Adding Meaning and Structure

Text Analytics and Taxonomy Together
– Text Analytics provides the power to apply the taxonomy
– And metadata of all kinds
– Consistent in every dimension, powerful and economical

Hybrid Model
– Publish Document -> Text Analytics analysis -> suggestions for categorization, entities, metadata -> present to author
– Cognitive task is simple -> react to a suggestion instead of selecting from your head or a complex taxonomy
– Feedback – if author overrides -> suggestion for new category
– Facets – Requires a lot of Metadata – Entity Extraction feeds facets

Hybrid – Automatic is really a spectrum – depends on context
– Automatic – adding structure at search results
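The hybrid model can be sketched as a publish step in which the machine suggests categories and the author only reacts. Everything here is hypothetical: the `TAXONOMY` trigger terms stand in for real categorization rules, and `FeedbackLog` simply records overrides so a taxonomist can review them as candidate new categories.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical taxonomy nodes, each with a few trigger terms standing in for real rules.
TAXONOMY = {
    "Clinical Trials": {"trial", "placebo", "cohort"},
    "Regulatory": {"fda", "approval", "submission"},
    "Manufacturing": {"batch", "plant", "yield"},
}


def suggest(text: str) -> list[str]:
    """Suggest categories whose trigger terms occur in the text, most hits first."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    hits = {cat: len(terms & tokens) for cat, terms in TAXONOMY.items()}
    return sorted((cat for cat, n in hits.items() if n > 0), key=lambda cat: -hits[cat])


@dataclass
class FeedbackLog:
    """Author overrides, kept for review as candidate new categories or rule fixes."""
    overrides: list[tuple[str, str]] = field(default_factory=list)


def publish(text: str, author_choice: Optional[list[str]], log: FeedbackLog) -> list[str]:
    """Hybrid tagging: the machine suggests, the author accepts (None) or overrides."""
    suggestions = suggest(text)
    if author_choice is None:  # author accepts the suggestions as-is
        return suggestions
    for cat in author_choice:
        if cat not in suggestions:  # an override is a feedback signal
            log.overrides.append((cat, text[:60]))
    return author_choice
```

Accepting costs the author nothing, while an override such as adding an unsuggested “Oncology” category becomes a concrete, logged suggestion for extending the taxonomy.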


Page 11

Improving Search: Adding Meaning and Structure

Documents are not unstructured – they have a variety of structures

Categorization by page, sections (text markers), or even sentence or phrase

Use generic components – like the level of generality of terms or concepts (general and context specific)

Additional metadata – document types/purpose, authors

Relevance – complex rules based on structure (intelligent use of titles, headlines, sections) + complex categorization
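As an illustration of structure-aware relevance, the sketch below scores a term by where it occurs rather than only how often. The zone names and the weights in `ZONE_WEIGHTS` are assumptions chosen for the example, not values from the deck.

```python
import re

# Hypothetical zone weights: a hit in a title counts more than one buried in body text.
ZONE_WEIGHTS = {"title": 5.0, "headline": 3.0, "abstract": 2.0, "body": 1.0}


def count_term(term: str, text: str) -> int:
    """Whole-word, case-insensitive occurrences of term in text."""
    return len(re.findall(r"\b" + re.escape(term.lower()) + r"\b", text.lower()))


def structured_score(term: str, zones: dict[str, str]) -> float:
    """Weight each zone's hit count by where it occurred; unknown zones weigh 1.0."""
    return sum(ZONE_WEIGHTS.get(zone, 1.0) * count_term(term, text) for zone, text in zones.items())


if __name__ == "__main__":
    doc_a = {"title": "Search relevance", "body": "Notes from the meeting."}
    doc_b = {"title": "Quarterly notes", "body": "relevance relevance relevance"}
    # Flat counting prefers doc_b (3 hits vs 1); zone weighting prefers doc_a.
    print(structured_score("relevance", doc_a), structured_score("relevance", doc_b))
```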

Page 12

Improving Search: Document Type Rules

(START_2000, (AND, (OR, _/article:"[Abstract]", _/article:"[Methods]"),
  (OR, _/article:"clinical trial*", _/article:"humans",
    (NOT, (DIST_5, (OR, _/article:"approved", _/article:"safe", _/article:"use", _/article:"animals"),

If the article has sections like Abstract or Methods AND has phrases around “clinical trials / humans”, and does NOT have words like “animals” within 5 words of the “clinical trial” words, count it and add up a relevancy score

Primary issue – major mentions, not every mention
– Combination of noun phrase extraction and categorization
– Results – virtually 100%
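The rule above is written in a vendor categorization language and is cut off on the slide, so it can’t be run as-is. The Python sketch below re-implements only the behavior described in the prose, under stated assumptions: section markers are matched as plain words, the second argument of `DIST_5` is assumed to be the clinical-trial terms, distance is measured from the start of each target phrase, and `START_2000` is not modeled. It does not capture the “major mentions” logic the slide credits for its results.

```python
import re

SECTION_MARKERS = {"abstract", "methods"}          # stand-ins for [Abstract] / [Methods]
EXCLUDED = {"approved", "safe", "use", "animals"}  # the terms listed under DIST_5
WINDOW = 5                                         # DIST_5: within 5 words


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z]+", text.lower())


def target_positions(toks: list[str]) -> list[int]:
    """Token indexes where a target starts: 'clinical trial*' (wildcard) or 'humans'."""
    hits = []
    for i, tok in enumerate(toks):
        if tok == "humans":
            hits.append(i)
        elif tok == "clinical" and i + 1 < len(toks) and toks[i + 1].startswith("trial"):
            hits.append(i)
    return hits


def relevancy_score(text: str) -> int:
    """0 without a section marker; else +1 per target with no excluded word within WINDOW."""
    toks = tokenize(text)
    if not SECTION_MARKERS.intersection(toks):
        return 0
    score = 0
    for i in target_positions(toks):
        window = toks[max(0, i - WINDOW): i + WINDOW + 1]
        if not EXCLUDED.intersection(window):
            score += 1
    return score
```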

Page 13

Need One More Piece: Smart Semantic Infrastructure

Integrate entire information life cycle & environment

Semantic Layer = Content, Taxonomies, Metadata, Vocabularies + Text Analytics
– Integrated / Federated Search – all content

Technology Layer
– Search, Content Management, SharePoint, Intranets

People – communities (formal and dynamic), business processes (embedded information needs and behaviors)

Publishing process
– Hybrid human/automatic structure (tagging)

Feedback is essential – direct user comments to deep analytics

Page 14

Search Can Work!

Simple Subject Taxonomy structure – Easy to develop and maintain

Combined with categorization capabilities
– Added power and intelligence

Combined with Faceted Metadata
– Dynamic selection of simple categories
– Allow multiple user perspectives
  • Can’t predict all the ways people think
  • Monkey, Banana, Panda

Combined with ontologies and semantic data
– Multiple applications – Text mining to Search

Combined with feedback before and after

Search ROI is enormous – $7M per 1,000 employees a year
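Faceted metadata can be sketched with two operations: counting the values of a facet and filtering by any combination of selected values. The document fields below are hypothetical; the point is that users starting from different facets (different perspectives) arrive at the same documents.

```python
from collections import Counter


def facet_counts(docs: list[dict[str, str]], facet: str) -> Counter:
    """How many documents carry each value of one facet (what a facet sidebar displays)."""
    return Counter(doc[facet] for doc in docs if facet in doc)


def drill_down(docs: list[dict[str, str]], selections: dict[str, str]) -> list[dict[str, str]]:
    """Keep documents that match every selected facet value, in any order of selection."""
    return [doc for doc in docs if all(doc.get(f) == v for f, v in selections.items())]


if __name__ == "__main__":
    docs = [
        {"id": "1", "Topic": "Search", "Region": "EU", "Type": "Report"},
        {"id": "2", "Topic": "Search", "Region": "US", "Type": "Memo"},
        {"id": "3", "Topic": "Taxonomy", "Region": "EU", "Type": "Report"},
    ]
    print(facet_counts(docs, "Topic"))
    # Two users, two perspectives: one starts from Topic, the other from Region.
    by_topic_first = drill_down(drill_down(docs, {"Topic": "Search"}), {"Region": "EU"})
    by_region_first = drill_down(drill_down(docs, {"Region": "EU"}), {"Topic": "Search"})
    print([d["id"] for d in by_topic_first], [d["id"] for d in by_region_first])
```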


Page 15

Enterprise Text Analytics – Building on the Foundation: Applications

Focus on business value, cost cutting

Enhancing information access is a means, not an end
– Governance, Records Management, Doc duplication, Compliance
– Business Intelligence, CI, Behavior Prediction
– eDiscovery, litigation support, Risk Management
– Productivity / Portals – KM communities & knowledge bases

Sentiment Analysis, Social Media Analysis
– Adding Search-based intelligence – context
– New taxonomies – emotion, Appraisal
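Dictionary-and-rules sentiment analysis, as described in this deck, can be illustrated with a toy phrase lexicon and a single negation rule. `POSITIVE`, `NEGATIVE`, and `NEGATORS` are hypothetical lists; production lexicons are far larger and tuned per domain, and real rule sets handle much more than an immediately preceding negator.

```python
import re

# Hypothetical phrase dictionaries (multi-word phrases allowed).
POSITIVE = {"love", "great", "works well"}
NEGATIVE = {"hate", "broken", "waste of money"}
NEGATORS = {"not", "never", "don't", "doesn't"}


def sentiment(text: str) -> int:
    """Net score: +1 per positive phrase, -1 per negative phrase; a phrase's polarity
    is flipped when the word immediately before it is a negator."""
    toks = re.findall(r"[a-z']+", text.lower())
    score = 0
    for lexicon, polarity in ((POSITIVE, 1), (NEGATIVE, -1)):
        for phrase in lexicon:
            words = phrase.split()
            n = len(words)
            for i in range(len(toks) - n + 1):
                if toks[i:i + n] == words:
                    negated = i > 0 and toks[i - 1] in NEGATORS
                    score += -polarity if negated else polarity
    return score


if __name__ == "__main__":
    print(sentiment("I hate your product"))                     # -1
    print(sentiment("I don't love it, and the app is broken"))  # -2
```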

Page 16

Beyond Search: Info Apps – Search-based Applications Plus

Legal Review
– Significant trend – computer-assisted review
– TA – categorize and filter to smaller, more relevant set
– Payoff is big – One firm with 1.6M docs saved $2M

Expertise Location – Data (HR) plus text – authored documents – subject & level

Financial Services
– Combine structured data (what) and unstructured text (why)
– Anti-Money Laundering
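Expertise location combines structured HR data with what people have written. A minimal sketch, assuming text analytics has already tagged each authored document with a subject and a numeric `level` (an invented 1 to 3 scale for depth): authors are ranked by the summed level of their documents on the subject, and the HR record supplies their department.

```python
from collections import defaultdict


def rank_experts(hr: dict[str, str], docs: list[dict], subject: str) -> list[tuple[str, str, int]]:
    """Rank authors on a subject by the summed 'level' of documents they wrote on it.

    hr   : employee id -> department (the structured HR data)
    docs : dicts with 'author', 'subject', 'level' (assumed text analytics output)
    """
    scores: dict[str, int] = defaultdict(int)
    for doc in docs:
        if doc["subject"] == subject:
            scores[doc["author"]] += doc["level"]
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [(author, hr.get(author, "unknown"), score) for author, score in ranked]


if __name__ == "__main__":
    hr = {"e1": "R&D", "e2": "Legal"}
    docs = [
        {"author": "e1", "subject": "Taxonomy", "level": 3},
        {"author": "e2", "subject": "Taxonomy", "level": 1},
        {"author": "e2", "subject": "Taxonomy", "level": 1},
        {"author": "e1", "subject": "Search", "level": 2},
    ]
    print(rank_experts(hr, docs, "Taxonomy"))
```

Here e2 wrote more Taxonomy documents, but e1 ranks first because that one document is more advanced.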

Page 17

Beyond Search: Info Apps – Behavior Prediction – Telecom Customer Service

Problem – distinguish customers likely to cancel from mere threats

Basic Rule
(START_20, (AND, (DIST_7, "[cancel]", "[cancel-what-cust]"),
  (NOT, (DIST_10, "[cancel]", (OR, "[one-line]", "[restore]", "[if]")))))

Examples:
– customer called to say he will cancell his account if the does not stop receiving a call from the ad agency.
– cci and is upset that he has the asl charge and wants it off or her is going to cancel his act

More sophisticated analysis of text and context in text

Combine text analytics with Predictive Analytics and traditional behavior monitoring for new applications
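The Basic Rule can be approximated in Python to show its mechanics. The concept lists are hypothetical stand-ins, since the deck does not show what `[cancel]`, `[cancel-what-cust]`, or the exception concepts contain (`[one-line]` is left out because its meaning isn’t given), and `START_20` is assumed to mean the cancel term must appear within the first 20 tokens.

```python
import re

# Hypothetical concept lists; the real ones behind the slide's rule are not shown.
CANCEL = {"cancel", "cancell", "cancels", "cancelled", "canceling"}
CANCEL_WHAT = {"account", "act", "service", "line"}
EXCEPTIONS = {"if", "restore"}


def positions(toks: list[str], concept: set[str]) -> list[int]:
    return [i for i, t in enumerate(toks) if t in concept]


def near(a: list[int], b: list[int], n: int) -> bool:
    """DIST_n: some occurrence in a is within n tokens of some occurrence in b."""
    return any(abs(i - j) <= n for i in a for j in b)


def likely_to_cancel(note: str, start: int = 20) -> bool:
    toks = re.findall(r"[a-z']+", note.lower())
    cancel = [i for i in positions(toks, CANCEL) if i < start]  # assumed START_20 semantics
    return near(cancel, positions(toks, CANCEL_WHAT), 7) and not near(
        cancel, positions(toks, EXCEPTIONS), 10)
```

Applied to the two example notes, the “if” in the first one triggers the exception and suppresses it, while the second is flagged even though “or her is going to cancel” reads as a conditional threat, which is the kind of case the slide says needs more sophisticated analysis.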

Page 18

Beyond Search: Info Apps – Pronoun Analysis: Fraud Detection – Enron Emails

Patterns of “function” words reveal a wide range of insights

Function words = pronouns, articles, prepositions, conjunctions, etc.
– Used at a high rate, short and hard to detect, very social, processed in the brain differently than content words

Areas: sex, age, power-status, personality – individuals and groups

Lying / Fraud detection: Documents with lies have
– Fewer and shorter words, fewer conjunctions, more positive emotion words
– More use of “if, any, those, he, she, they, you”, less “I”
– More social and causal words, more discrepancy words

Current research – 76% accuracy in some contexts

Text Analytics can improve accuracy and utilize new sources
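The function-word features on this slide are easy to compute; interpreting them is the hard part. This sketch uses small hypothetical word lists and reports rates per 100 words plus mean word length. It is a feature extractor only, not a lie detector, and it makes no claim to the accuracy figure cited above.

```python
import re

# Small hypothetical word lists; research tools use much larger, validated dictionaries.
FIRST_PERSON = {"i", "me", "my", "mine"}
OTHER_REFERENCE = {"if", "any", "those", "he", "she", "they", "you"}  # words listed on the slide
CONJUNCTIONS = {"and", "but", "or", "because", "although"}


def function_word_profile(text: str) -> dict[str, float]:
    """Rates per 100 words for a few function-word groups, plus mean word length in characters."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return {}

    def per_100(group: set[str]) -> float:
        return 100.0 * sum(w in group for w in words) / len(words)

    return {
        "first_person": per_100(FIRST_PERSON),
        "other_reference": per_100(OTHER_REFERENCE),
        "conjunctions": per_100(CONJUNCTIONS),
        "mean_word_length": sum(len(w) for w in words) / len(words),
    }
```

Comparing such profiles across a person’s or a group’s messages over time, rather than reading any single message, is where patterns like “less I, more they” would show up.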

Page 19

Conclusions

Traditional Search improvements – nice, but

Relevance needs meaning; keyword and human tagging don’t work

Search + Text Analytics + Semantic Infrastructure work

Text Analytics: THE essential component of a multi-modal solution

Semantic Infrastructure
– Content, People, Technology, Processes
– Integration of text analytics, search, content management
– Hybrid Model of tagging – best of human & machine

Smart Search as foundation for new universe of Apps = Success beyond your wildest dreams!

Page 20

Conclusions

Now You Believe! So, what next – how can you get started?

Quick Start – software evaluation, Knowledge Map, POC or Pilot = Good choice; learn by doing

Fall – Attend ESS, TBC, KMWorld – latest ideas

Or develop a time machine, go back to yesterday, and take my workshop

Fall 2014 – early 2015: New Book:
– Text Analytics: Everything You Need to Know to Conquer Information Overload, Mine Social Media for Real Value, and Turn Big Text Into Big Data
– Title might be shorter, but it will cover all you need to know

Page 21

Questions?

Tom Reamy

KAPS Group

Knowledge Architecture Professional Services

http://www.kapsgroup.com