Text Analytics Workshop Applications Tom Reamy Chief Knowledge Architect KAPS Group Knowledge Architecture Professional Services http://www.kapsgroup.com

Post on 28-Dec-2015


Page 1: Text Analytics Workshop Applications Tom Reamy Chief Knowledge Architect KAPS Group Knowledge Architecture Professional Services

Text Analytics Workshop Applications

Tom Reamy

Chief Knowledge Architect

KAPS Group

Knowledge Architecture Professional Services

http://www.kapsgroup.com

Page 2:

Agenda

Text Analytics Applications
– Integration with Search – Faceted Navigation
– Integration with ECM
  • Metadata
  • Auto-categorization
– Platform for Information Applications
  • Enterprise – internal and external
  • Commercial
  • Structure for Social

Page 3:

Text Analytics and Search - Elements

Facet – orthogonal dimension of metadata
Entity / Noun Phrase – metadata value of a facet
Entity extraction – feeds facets, signature, ontologies
Taxonomy and categorization rules
Auto-categorization – aboutness, subject facets
People – tagging, evaluating tags, fine-tuning rules and taxonomy

Page 4:

Essentials of Facets

Facets are not categories
– Categories are what a document is about – limited number
– Entities are contained within a document – any number

Facets are orthogonal – mutually exclusive – dimensions
– An event is not a person is not a document is not a place.

Facets – variety – of units, of structure
– Numerical range (price), Location – big to small
– Alphabetical, Hierarchical – taxonomic

Facets are designed to be used in combination
• Wine where color = red, price = excessive, location = California, and sentiment = snotty
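Using facets in combination, as in the wine example, can be sketched as a filter over records that carry one value per facet. This is a minimal Python sketch under stated assumptions: the sample records, the field names, and the `filter_by_facets` helper are hypothetical and serve only as illustration.

```python
# Hypothetical catalog: each record carries one value per facet.
WINES = [
    {"name": "Wine A", "color": "red", "price": "excessive", "location": "California"},
    {"name": "Wine B", "color": "white", "price": "moderate", "location": "California"},
    {"name": "Wine C", "color": "red", "price": "moderate", "location": "Oregon"},
]


def filter_by_facets(records, **selected):
    """Return the records whose facet values match every selected facet."""
    return [
        record
        for record in records
        if all(record.get(facet) == value for facet, value in selected.items())
    ]


matches = filter_by_facets(WINES, color="red", price="excessive", location="California")
print([wine["name"] for wine in matches])
```

Because the facets are independent dimensions, each selection simply narrows the result set further, and the selections can be applied in any order.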

Page 5:

Advantages of Faceted Navigation

More intuitive – easy to guess what is behind each door
• Simplicity of internal organization
• 20 questions – a model we know and use

Dynamic selection of categories
• Allow multiple perspectives
• Ability to Handle Compound Subjects

Systematic Advantages – fewer elements
– 4 facets of 10 nodes each (40 nodes) cover the same ground as a 10,000 node taxonomy (10 × 10 × 10 × 10 combinations)
– Ability to Handle Compound Subjects

Flexible – can be combined with other navigation elements
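The "4 facets of 10 nodes" figure is simple arithmetic: 40 nodes to maintain, but 10 × 10 × 10 × 10 possible combinations. A short Python sketch of that count follows; the facet names and placeholder values are hypothetical.

```python
from itertools import product

# Four hypothetical facets with ten placeholder values each.
facets = {
    name: [f"{name}-{i}" for i in range(10)]
    for name in ("topic", "location", "organization", "content_type")
}

# Nodes someone has to define and maintain.
nodes_to_maintain = sum(len(values) for values in facets.values())

# Distinct combinations a user can reach by picking one value per facet.
combinations = sum(1 for _ in product(*facets.values()))

print(nodes_to_maintain)  # 40
print(combinations)       # 10000
```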

Page 6:

Developing Facets: Tools and Techniques

Software Tools – Entity Extraction

Dictionaries – variety of entities, coverage, specialty
– Cost of update – service or in-house
– 50+ predefined entity types
– 800,000 people, 700,000 locations, 400,000 organizations

Rules
– Capitalization, text – Mr., Inc.
– Advanced – proximity and frequency of actions, associations
– Need people to continually refine the rules

Entities and Categorization
– Total number and pattern of entities = a type of aboutness of the document – Bar Code, Fingerprint
– SAS – integration of entities (concepts) and categorization
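The simple rule type above (capitalization plus text cues such as "Mr." and "Inc.") can be sketched with regular expressions. This is a minimal Python illustration, not the behavior of any particular product; the patterns, the cue lists, and the `extract_entities` helper are assumptions.

```python
import re

# Person cue: an honorific followed by one or more capitalized words.
PERSON = re.compile(r"\b(?:Mr|Mrs|Ms|Dr)\. ([A-Z][a-z]+(?: [A-Z][a-z]+)*)")

# Organization cue: capitalized words followed by a corporate suffix.
ORGANIZATION = re.compile(
    r"\b([A-Z][A-Za-z]+(?: [A-Z][A-Za-z]+)*),? (?:Inc|Corp|Ltd)\."
)


def extract_entities(text):
    """Return candidate people and organizations found by the two cue rules."""
    return {
        "person": PERSON.findall(text),
        "organization": ORGANIZATION.findall(text),
    }


print(extract_entities("Mr. Ford founded Ford Motor Company, Inc. in Michigan."))
```

Rules this simple miss many cases and produce false positives, which is why the slide pairs them with dictionaries and with people who keep refining the rules.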

Page 7:

Three Environments

E-Commerce
– Catalogs, small uniform collections of entities
– Uniform behavior – buy this

Enterprise
– More content, more types of content
– Enterprise Tools – Search, ECM
– Publishing Process – tagging, metadata standards

Internet
– Wildly different amount and type of content, no taggers
– General Purpose – Flickr, Yahoo
– Vertical Portal – selected content, no taggers

Page 8:

Three Environments: E-Commerce

Page 9:

Three Environments: E-Commerce

Page 10:

Enterprise Environment – When and how to add metadata

Enterprise Content – different world than eCommerce
– More Content, more kinds, more unstructured
– Not a catalog to start – less metadata and structured content
– Complexity – not just content but variety of users and activities

Combination of human and automatic metadata – ECM
– Software aided – suggestions, entities, ontologies

Enterprise – Question of Balance / strategy
– More facets = more findability (up to a point)
– Fewer facets = lower cost to tag documents

Issues
– Not enough facets
– Wrong set of facets – business not information
– Ill-defined facets – too complex internal structure

Page 11:

Facets and Taxonomies
Enterprise Environment – Taxonomy, 7 facets

Taxonomy of Subjects / Disciplines:
– Science > Marine Science > Marine microbiology > Marine toxins

Facets:
– Organization > Division > Group
– Clients > Federal > EPA
– Instruments > Environmental Testing > Ocean Analysis > Vehicle
– Facilities > Division > Location > Building X
– Methods > Social > Population Study
– Materials > Compounds > Chemicals
– Content Type – Knowledge Asset > Proposals
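One way to make hierarchical paths like these usable for navigation is to index a document under every ancestor of its tagged path, so that selecting "Clients > Federal" also finds documents tagged at the EPA level. The sketch below is an assumption about how such tagging could be stored, not a description of the system on the slide; the `ancestors` helper and the sample document are hypothetical.

```python
def ancestors(path, sep=" > "):
    """Expand 'A > B > C' into ['A', 'A > B', 'A > B > C']."""
    parts = path.split(sep)
    return [sep.join(parts[: i + 1]) for i in range(len(parts))]


# Hypothetical document tagged with one subject path and two facet paths.
document = {
    "subject": "Science > Marine Science > Marine microbiology > Marine toxins",
    "clients": "Clients > Federal > EPA",
    "content_type": "Knowledge Asset > Proposals",
}

# Index every level of each path so a match at any depth finds the document.
index = {field: ancestors(path) for field, path in document.items()}
print(index["clients"])
```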

Page 12:

External Environment – Text Mining, Vertical Portals

Internet Content
– Scale – impacts design and technology – speed of indexing
– Limited control – ranges from an association of publishers, to selection of content, to none
– Major subtypes – different rules – metadata and results

Complex queries and alerts
– Terrorism taxonomy + geography + people + organizations

Text Mining
– General or specific content and facets and categories
– Dedicated tools or component of Portal – internal or external

Vertical Portal
– Relatively homogeneous content and users
– General range of questions
– More specific targets – the document, not a web site

Page 13:

Internet Design

Subject Matter taxonomy – Business Topics
– Finance > Currency > Exchange Rates

Facets
– Location > Western World > United States
– People – Alphabetical and/or Topical – Organization
– Organization > Corporation > Car Manufacturing > Ford
– Date – Absolute or range (1-1-01 to 1-1-08, last 30 days)
– Publisher – Alphabetical and/or Topical – Organization
– Content Type – list – newspapers, financial reports, etc.

Page 14:

Page 15:

Page 16:

Page 17:

Integrated Facet Application
Design Issues - General

What is the right combination of elements?
– Faceted navigation, metadata, browse, search, categorized search results, file plan

What is the right balance of elements?
– Dominant dimension or equal facets
– Browse topics and filter by facet

When to combine search, topics, and facets?
– Search first and then filter by topics / facet
– Browse/facet front end with a search box

Page 18:

Integrated Facet Application
Design Issues - General

Homogeneity of Audience and Content

Model of the Domain – broad
– How many facets do you need?
– More facets and let users decide
– Allow for customization – can’t define a single set

User Analysis – tasks, labeling, communities
• Issue – labels that people use to describe their business and labels that they use to find information

Match the structure to domain and task
– Users can understand different structures

Page 19:

Automatic Facets – Special Issues

Scale requires more automated solutions
– More sophisticated rules

Rules to find and populate existing metadata
– Variety of types of existing metadata – Publisher, title, date
– Multiple implementation Standards – Last Name, First / First Name, Last

Issue of disambiguation:
– Same person, different name – Henry Ford, Mr. Ford, Henry X. Ford
– Same word, different entity – Ford and Ford

Number of entities and thresholds per results set / document
– Usability, audience needs

Relevance Ranking – number of entities, rank of facets
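The two metadata issues above (inconsistent name order, and one person appearing under several name forms) can be illustrated with a small normalization step. This is a deliberately crude Python sketch: the `normalize_name` and `surname_key` helpers and the honorific list are hypothetical, and a surname key alone cannot resolve the "same word, different entity" case (Ford the person versus Ford the company), which needs context from the surrounding text.

```python
HONORIFICS = {"Mr", "Mrs", "Ms", "Dr"}


def normalize_name(raw):
    """Convert 'Last, First' to 'First Last' and collapse extra whitespace."""
    raw = " ".join(raw.split())
    if "," in raw:
        last, first = [part.strip() for part in raw.split(",", 1)]
        return f"{first} {last}"
    return raw


def surname_key(name):
    """Crude grouping key: the last token, lowercased, ignoring honorifics."""
    tokens = [
        token
        for token in normalize_name(name).split()
        if token.rstrip(".") not in HONORIFICS
    ]
    return tokens[-1].lower() if tokens else ""


variants = ["Henry Ford", "Mr. Ford", "Henry X. Ford", "Ford, Henry"]
print({variant: surname_key(variant) for variant in variants})
```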

Page 20:

Putting it all together – Infrastructure Solution

Facets, Taxonomies, Software, People
Combine formal power with ability to support multiple user perspectives
Facet System – interdependent, map of domain
Entity extraction – feeds facets, signatures, ontologies
Taxonomy & Auto-categorization – aboutness, subject
People – tagging, evaluating tags, fine-tuning rules and taxonomy
The future is the combination of simple facets with rich taxonomies with complex semantics / ontologies

Page 21:

Putting it all together – Infrastructure Solution

Integration with ECM
– Central Team
  • Metadata – Create dictionaries of entities
  • Develop text analytics catalogs
– Publishing Process
  • Software suggests entities, categorization
  • Author's task is simple – answer yes or no, not think of keywords

Enterprise Search
– Integrate at metadata level – build advanced presentation and refine results
– Integrate into relevance

Page 22:

Text Analytics Platform – Multiple Applications

Platform for Information Applications
– Content Aggregation
– Duplicate Documents – save millions!
– Text Mining – BI, CI – sentiment analysis
– Social – Hybrid folksonomy / taxonomy / auto-metadata
– Social – expertise, categorize tweets and blogs, reputation
– Ontology – travel assistant – SIRI

Integrate with Applications
Text into data – predictive analytics
Use your Imagination!

Page 23:

New Applications in Social Media
Behavior Prediction – Telecom Customer Service

Problem – distinguish customers likely to cancel from mere threats
Analyze customer support notes
General issues – creative spelling, second hand reports
Develop categorization rules
– First – distinguish cancellation calls – not simple
– Second – distinguish cancel what – one line or all
– Third – distinguish real threats

Page 24:

New Applications in Social Media
Behavior Prediction – Telecom Customer Service

Basic Rule

(START_20, (AND,
    (DIST_7, "[cancel]", "[cancel-what-cust]"),
    (NOT, (DIST_10, "[cancel]", (OR, "[one-line]", "[restore]", "[if]")))))

Examples:
– customer called to say he will cancell his account if the does not stop receiving a call from the ad agency.
– cci and is upset that he has the asl charge and wants it off or her is going to cancel his act
– ask about the contract expiration date as she wanted to cxl teh acct

Combine sophisticated rules with sentiment, statistical training, Predictive Analytics, and behavior monitoring
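The Basic Rule above can be approximated in a few lines of Python to show how its pieces interact. The slide does not define the operators, so the reading used here is an assumption: START_20 limits matching to the first 20 words, DIST_n means two terms occur within n words of each other, and each bracketed name is a vocabulary list. The word lists in `CONCEPTS` and all function names are hypothetical.

```python
import re

# Hypothetical vocabularies standing in for the bracketed concept lists.
CONCEPTS = {
    "cancel": {"cancel", "cancell", "cxl"},
    "cancel-what-cust": {"account", "acct", "act", "service"},
    "one-line": {"line"},
    "restore": {"restore"},
    "if": {"if"},
}


def positions(tokens, concept):
    """Indexes of tokens that belong to the named concept list."""
    return [i for i, token in enumerate(tokens) if token in CONCEPTS[concept]]


def within(tokens, left, right_concepts, max_dist):
    """True if any `left` term is at most max_dist tokens from any right term."""
    right = [i for concept in right_concepts for i in positions(tokens, concept)]
    return any(abs(i - j) <= max_dist for i in positions(tokens, left) for j in right)


def basic_rule(text):
    # START_20 is read as "only look at the first 20 words" (an assumption).
    tokens = re.findall(r"[a-z']+", text.lower())[:20]
    wants_to_cancel = within(tokens, "cancel", ["cancel-what-cust"], 7)
    hedged = within(tokens, "cancel", ["one-line", "restore", "if"], 10)
    return wants_to_cancel and not hedged


print(basic_rule("ask about the contract expiration date as she wanted to cxl teh acct"))
print(basic_rule("customer called to say he will cancell his account if the does not stop"))
```

Under these assumed vocabularies the first call matches, while the second is rejected by the NOT clause because "if" appears close to the cancel term. Misspellings such as "cancell" and "cxl" only match because they are listed explicitly, which is the "creative spelling" problem the slides mention.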

Page 25:

New Applications: Wisdom of Crowds
Crowd Sourcing Technical Support

Example – Android User Forum
Develop a taxonomy of products, features, problem areas
Develop Categorization Rules:
– “I use the SDK method and it isn't to bad a all. I'll get some pics up later, I am still trying to get the time to update from fresh 1.0 to 1.1.”
– Find product & feature – forum structure
– Find problem areas in response, nearby text for solution

Automatic – simply expose lists of “solutions”
– Search Based application

Human mediated – experts scan and clean up solutions

Page 26:

New Directions in Social Media
Text Analytics, Text Mining, and Predictive Analytics

Two Systems of the Brain
– Fast, System 1, Immediate patterns (TM)
– Slow, System 2, Conceptual, reasoning (TA)

Text Analytics – pre-processing for TM
– Discover additional structure in unstructured text
– Behavior Prediction – adding depth in individual documents
– New variables for Predictive Analytics, Social Media Analytics
– New dimensions – 90% of information

Text Mining for TA
– Semi-automated taxonomy development
– Bottom Up – terms in documents – frequency, date, clustering
– Improve speed and quality – semi-automatic
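The bottom-up step (terms in documents, ranked by frequency) can be sketched as a document-frequency count that produces raw candidates for a person to review. This is a minimal Python illustration under stated assumptions: the stopword list, the sample documents, and the `candidate_terms` helper are hypothetical, and real taxonomy development would add phrase detection and clustering.

```python
import re
from collections import Counter

# Small hypothetical stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on"}


def candidate_terms(documents, min_docs=2):
    """Rank terms by how many documents contain them (document frequency)."""
    doc_freq = Counter()
    for text in documents:
        terms = set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS
        doc_freq.update(terms)
    kept = ((term, count) for term, count in doc_freq.items() if count >= min_docs)
    # Sort by descending frequency, then alphabetically for a stable order.
    return sorted(kept, key=lambda pair: (-pair[1], pair[0]))


docs = [
    "Marine toxins in coastal water",
    "Testing coastal water for marine toxins",
    "Quarterly finance report",
]
print(candidate_terms(docs))
```

The output is only a starting list; the "semi-automatic" part is a taxonomist deciding which candidates become nodes.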

Page 27:

Questions?

Tom Reamy

[email protected]

KAPS Group

Knowledge Architecture Professional Services

http://www.kapsgroup.com