© 2010 IBM Corporation

What You Can Accomplish with IBM Content Analytics*

What You Can Accomplish With (IBM) Content Analytics

Bruce S. Tannenbaum, Managing Consultant, IBM Text Analytics Group

[email protected]

*Currently marketed as Cognos Content Analytics

SHARE 2010 – What You Can Accomplish with IBM Content Analytics

ICA V2.1 System Architecture

[Architecture diagram. Components shown: Crawler Framework with source-specific crawlers and a crawler plug-in; Raw Data Store; Document Processors (Parser, UIMA Annotators, Document Generator); Indexer and Indexer Service; CCA Index; Exporter with export plug-in (RDB, XML); Text Analytics & Search Runtime; Text Miner UI, Admin UI, Search UI, SIAPI Application, Real-time NLP Application; Common Infrastructure (Scheduler, Logging, Control, Configuration, Monitor, Security)]


Agenda

The Growing Need for Content Analytics

What is Content Analytics

IBM Content Analytics Overview

IBM Content Analytics Architecture

Currently marketed as Cognos Content Analytics


A Smarter Planet enables business optimization by leveraging all of our enterprise content

80% of information being stored today is unstructured

What if you could find crime patterns and apprehend criminals in real-time?

What if you could detect fraudulent claims before they’re paid?

What if you could understand what your customers want before they ask?

What if you could make cities smarter by integrating all information about a citizen?


Business optimization enabled by content analytics

Smarter Telecommunications (Telecommunications Customer): Analytics over Voice of Customer data provides insight to drive customer-oriented decision making, boosting loyalty and creating new opportunity

Smarter CPG (Customer in Australia): Analytics over online customer postings helps Kraft target and deliver new branding campaigns, increasing sales and customer loyalty

Smarter Healthcare Plans (Healthcare Provider): Analytics over an integrated single view of plans, patients and providers enables better negotiations and improves provider satisfaction to over 90%

Smarter Insurance (Large Claims Third-Party Administrator): Analytics over insurance claim files helps detect fraud faster, reducing costs for their clients by millions of dollars and optimizing the claims-handling process


Agenda

The Growing Need for Content Analytics

What is Content Analytics?

IBM Content Analytics Overview

IBM Content Analytics Architecture

Currently marketed as Cognos Content Analytics


Definitions

Text Analytics: The automated process of analyzing unstructured text, extracting relevant information, and transforming that information into structured information that can then be leveraged in different ways

Content Analytics: A layer above the actual extraction process that analyzes this information to understand trends and patterns in this content. Content analytics can be used with content from content management systems or used in conjunction with other unstructured data from any other corporate system or outside sources


Content Analytics – How it works

Source Information: Corporate (Contact Center, Test Data, Dealer notes, ECM, etc.) and External (NHTSA, Edmunds, Consumer Reports, MotorTrend, etc.)

Analyzed Content (and Data): “Owner” “reports” “check engine lite” “flashes” “after refueling” ...

Noun, Verb, Noun Phrase, Prep Phrase

Person, Issue, Warning, Driver action

Extracted Concept: Component Issue: “Engine Light”; Situation: “While Refueling”

Automatic Visualization for Interactive Exploration and Assessment
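The step from raw phrases to extracted concepts can be illustrated with a minimal Python sketch. It is not the product's implementation: the dictionaries, the spelling fix, and the function names are hypothetical, and matching is plain substring lookup rather than linguistic analysis.

```python
import re

# Hypothetical mini-dictionaries; a real deployment would use far richer resources.
COMPONENT_ISSUES = {"check engine light": "Engine Light"}
SITUATIONS = {"after refueling": "While Refueling"}
SPELLING_FIXES = {"lite": "light"}


def normalize(text):
    """Lowercase, keep alphabetic tokens, and repair known misspellings."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return " ".join(SPELLING_FIXES.get(t, t) for t in tokens)


def extract_concepts(text):
    """Return facet -> value pairs for dictionary phrases found in the text."""
    clean = normalize(text)
    concepts = {}
    for phrase, value in COMPONENT_ISSUES.items():
        if phrase in clean:  # simple substring match, no word-boundary check
            concepts["Component Issue"] = value
    for phrase, value in SITUATIONS.items():
        if phrase in clean:
            concepts["Situation"] = value
    return concepts


note = "Owner reports check engine lite flashes after refueling"
print(extract_concepts(note))
```

The output pairs each facet name with a normalized value, which is the structured form that the visualization layer can then count and chart.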


Agenda

The Growing Need for Content Analytics

What is Content Analytics

IBM Content Analytics Overview

IBM Content Analytics Architecture

Currently marketed as Cognos Content Analytics


IBM Content Analytics*

Stages: Sources, Analysis, Exploration

External and Internal Information Sources

Solution and Modeling Tools: IBM LanguageWare**, IBM Classification Module

Interactive Assessment and Discovery of Business Insight

Delivery of Insight to Users, Systems and Processes: Industry Solutions, Business Intelligence, Predictive Systems, ECM, Advanced Case Mgt

* Currently marketed as Cognos Content Analytics

** IBM LanguageWare tooling is now part of CI


A robust content analytics platform that features…

Immediate benefit from out of the box capabilities

Support for analysis of over 30 content sources and over 150 content formats

Packed with valuable annotators to automatically extract meaningful concepts and entities without customization.

Six user-friendly, graphical views to intuitively uncover new insight.

Dynamic highlighting of interesting anomalies and correlations in the content

Open, standard UIMA-based text analysis pipeline for flexibility and growth

Highly scalable and extensible

Easy-to-use, flexible tooling to tailor annotators, rules and dictionaries.

Enhance content management with insight in your ECM FileNet P8 system.

Analyze in-process cases for improved Advanced Case Management

Extend content insight into IBM Cognos 8 BI and its reports and dashboards

Integrate into any application environment – from desktop to mainframe – via web services or native Java APIs.

IBM Classification Module is a proven advanced classification tool to categorize and cluster documents using the context within the content. It’s context sensitive and highly accurate (optional).

Industry Solutions ECM

Business Analytics


Based on UIMA

Unstructured Information Management Architecture

It is an open, industrial-strength, scalable and extensible platform for creating, integrating and deploying unstructured information management solutions from combinations of semantic analysis and search components.

Although UIMA originated at IBM, it is now an OASIS industry standard and an Open Source project which is currently incubating at the Apache Software Foundation.

http://domino.research.ibm.com/comm/research_projects.nsf/pages/uima.index.html

Automated Concept Extraction and Logical Organization

John sprained his ankle on the step...

Noun, Verb, Noun Phrase, Prep Phrase

Person, Injury, Body Part, Location

Extracted Concept: Claimant: Soft Tissue Injury

[Pipeline diagram. Components shown: Source Information, Internal (ECM, Files, DBMS, etc.) and External (Social, News, etc.); Crawlers; UIMA Annotators (Identify Language, Tokenization, Word Analytics, Multi-word Analytics, Named Entity Extraction, Automatic Classifier, Plug-in Custom Analytics); Enhanced Metadata; Analytics Index; Visualization UI]
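As a rough illustration of how chained annotators build on each other's output, here is a minimal Python sketch. It does not use the Apache UIMA API: `Cas`, `Annotation`, and both annotators are simplified, hypothetical stand-ins, and the one-entry injury dictionary exists only for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    type: str
    begin: int
    end: int
    value: str


@dataclass
class Cas:
    """Simplified stand-in for a UIMA CAS: the text plus all annotations on it."""
    text: str
    annotations: list = field(default_factory=list)


def tokenizer(cas):
    """Add one Token annotation per whitespace-separated word."""
    pos = 0
    for word in cas.text.split():
        begin = cas.text.index(word, pos)
        end = begin + len(word)
        cas.annotations.append(Annotation("Token", begin, end, word))
        pos = end


INJURIES = {"sprained": "Soft Tissue Injury"}  # hypothetical dictionary


def injury_annotator(cas):
    """Read Token annotations and add Injury annotations on dictionary hits."""
    tokens = [a for a in cas.annotations if a.type == "Token"]
    for tok in tokens:
        label = INJURIES.get(tok.value.lower())
        if label:
            cas.annotations.append(Annotation("Injury", tok.begin, tok.end, label))


def run_pipeline(text, annotators):
    """Pass a single shared Cas through each annotator in order."""
    cas = Cas(text)
    for annotate in annotators:
        annotate(cas)
    return cas


cas = run_pipeline("John sprained his ankle on the step", [tokenizer, injury_annotator])
injuries = [(a.value, cas.text[a.begin:a.end]) for a in cas.annotations if a.type == "Injury"]
print(injuries)
```

The design point is that every annotator reads from and writes to the same shared structure, so a custom annotator can be dropped into the list without changing the others.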


What is Text Mining?

• Text mining refers to extracting usable knowledge from unstructured text data, through identification of core concepts, opinions and trends, to drive better business decisions across the enterprise.

[Diagram: content from SharePoint, instant messages, desktop, email, and file systems, reduced to a bag of words]
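The "bag of words" label in the diagram refers to treating a document as unordered word counts. A minimal Python sketch follows; the two sample sentences are invented for illustration.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Reduce a text to word counts, discarding order and punctuation."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


docs = [
    "Battery failed during infusion",
    "Pump battery alarm; battery replaced",
]

# Counter addition merges the per-document bags into one corpus-level bag.
total = sum((bag_of_words(d) for d in docs), Counter())
print(total.most_common(1))
```

Because word order is discarded, this representation can count terms but cannot tell who did what to whom, which is the gap that the linguistic annotators are meant to fill.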


The Interactive Discovery User Interface Explained

Automatically Extracted and Analyzed Concepts, Entities, Relationships, Meta Data and Classifications

Visualization with Drill Down for Exploration and Assessment

Views, Filters and Thresholds

Search Query Exploration


The FDA's MedWatch Program

FDA MedWatch incident reports are one source of data for medical device manufacturers to understand problems being reported by consumers about their products. They contain both structured and unstructured information. A manufacturer could also analyze internal content, such as warranty claims or support incidents.

Infusion Pump Analysis

This view shows frequency trends over time for all values of the selected facet, in this case Generic Device Name.

Infusion Pump Analysis

Here we see an unexpectedly high occurrence of incidents around Infusion Pumps beginning in April, 2008, so we drill in.
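One simple way to flag an "unexpectedly high" month is to compare each month's count against the mean and standard deviation of the months before it. The Python sketch below does that. The slides do not say how the product highlights anomalies, so the threshold rule is an assumption, and the counts are invented rather than taken from MedWatch.

```python
from collections import Counter
from statistics import mean, stdev


def monthly_counts(reports, device):
    """Count reports per month for one facet value (here, a device name)."""
    return Counter(r["month"] for r in reports if r["device"] == device)


def spikes(counts, months, threshold=2.0, min_history=3):
    """Flag months whose count exceeds mean + threshold * stdev of all earlier months."""
    flagged = []
    for i in range(min_history, len(months)):
        history = [counts.get(m, 0) for m in months[:i]]
        limit = mean(history) + threshold * stdev(history)
        if counts.get(months[i], 0) > limit:
            flagged.append(months[i])
    return flagged


# Invented counts for illustration only; not MedWatch data.
months = ["2008-01", "2008-02", "2008-03", "2008-04"]
volumes = [10, 12, 11, 40]
reports = [
    {"month": m, "device": "infusion pump"}
    for m, n in zip(months, volumes)
    for _ in range(n)
]

counts = monthly_counts(reports, "infusion pump")
print(spikes(counts, months))
```

With these sample numbers only the last month is flagged, which corresponds to the point in the walkthrough where the analyst decides to drill in.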

Infusion Pump Analysis

Switching to the Facets view of verb-noun phrases, we see frequent mentions of battery issues in Infusion Pump incidents reported in April, 2008. We drill down into these battery issues.

Infusion Pump Analysis

In the documents view, we can see the original source documents about these 154 battery-related infusion pump incidents. Relevant matching text from the original documents is highlighted.

Infusion Pump Analysis

Switching to the Brand Name facet view, we can immediately see a summary, by frequency and correlation, of the devices that are mentioned in these battery-related incidents.
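A summary "by frequency and correlation" can be sketched in Python as follows. The slides do not define the correlation measure, so this example substitutes a lift-style ratio: a value's share within the drilled-down subset divided by its share in the whole collection. The brand names and documents are hypothetical.

```python
def facet_stats(docs, facet, in_subset):
    """For each facet value in the subset, return (frequency, lift vs. whole collection)."""
    subset = [d for d in docs if in_subset(d)]
    stats = {}
    for value in sorted({d[facet] for d in subset}):
        hits = sum(1 for d in subset if d[facet] == value)
        overall = sum(1 for d in docs if d[facet] == value)
        # Lift above 1 means the value is over-represented in the subset.
        lift = (hits / len(subset)) / (overall / len(docs))
        stats[value] = (hits, round(lift, 2))
    return stats


# Hypothetical incident documents: two brands, two issue types.
docs = (
    [{"brand": "Brand A", "issue": "battery"}] * 4
    + [{"brand": "Brand A", "issue": "alarm"}]
    + [{"brand": "Brand B", "issue": "battery"}]
    + [{"brand": "Brand B", "issue": "alarm"}] * 4
)

stats = facet_stats(docs, "brand", lambda d: d["issue"] == "battery")
print(stats)
```

Showing both numbers matters because a brand with many incidents overall can top the frequency list without being unusually associated with the issue under investigation.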

Infusion Pump Analysis

Ten months later the FDA issued a class 1 recall for the Colleague Infusion pump. Reason for recall... damaged batteries


And 24 months later ...

Tuesday May 4th, 2010

Baxter International Inc. said Monday it would recall the approximately 200,000 Colleague brand drug-infusion pumps that are on the market, after years of malfunctions with the device, along with patient injuries and deaths.

The Colleague pumps have been widely used in hospitals, especially in the U.S., to deliver medication and other fluids to patients.

Approximately 200,000 units recalled

Estimated cost of recall between $400-600 million


IBM Content Analytics: Analysis Export Capability

1. Crawled Document Export: Export documents with their metadata and content as they are crawled

2. Analyzed Document Export: Export documents with the results of text analytics, such as Natural Language Processing, Named Entity Extraction, classification, or user-implemented logic, before indexing

3. Searched Document Export: Export documents limited by search or analysis, with original content from the index

[Diagram. Components shown inside IBM Content Analytics: Crawler, Data Store, Parser / Tokenizer, UIMA Annotators, Indexer, Search Index, and an Exporter with a plug-in at each of the three export points. Exports go to an RDB, which is imported by Content Intelligence Consumers: IBM Data Warehouse, IBM Master Data Mgmt, and many others]

Through the Content Analytics OLAP/Star Schema export capability, Cognos BI reports and dashboards can be created to monitor and track these issues over time.
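To make the star schema idea concrete, here is a minimal Python sketch using the standard-library `sqlite3` module. The table and column names are hypothetical and do not reflect the schema the product actually exports; the point is only the shape of one fact table joined to facet dimension tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical star schema: one fact row per incident document,
# pointing at dimension rows for the month and the extracted issue.
conn.executescript("""
CREATE TABLE dim_month (month_id INTEGER PRIMARY KEY, month TEXT);
CREATE TABLE dim_issue (issue_id INTEGER PRIMARY KEY, issue TEXT);
CREATE TABLE fact_incident (doc_id INTEGER, month_id INTEGER, issue_id INTEGER);
""")
conn.executemany("INSERT INTO dim_month VALUES (?, ?)",
                 [(1, "2008-03"), (2, "2008-04")])
conn.executemany("INSERT INTO dim_issue VALUES (?, ?)",
                 [(1, "battery"), (2, "alarm")])
conn.executemany("INSERT INTO fact_incident VALUES (?, ?, ?)",
                 [(101, 1, 1), (102, 2, 1), (103, 2, 1), (104, 2, 2)])

# A report-style query: incident counts by month and issue.
rows = conn.execute("""
SELECT m.month, i.issue, COUNT(*)
FROM fact_incident AS f
JOIN dim_month AS m ON m.month_id = f.month_id
JOIN dim_issue AS i ON i.issue_id = f.issue_id
GROUP BY m.month, i.issue
ORDER BY m.month, i.issue
""").fetchall()
print(rows)
```

Once the extracted facets sit in dimension tables like these, a BI tool can slice incident counts by any combination of facets without touching the original text.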


Agenda

The Growing Need for Content Analytics

What is Content Analytics

IBM Content Analytics Overview

IBM Content Analytics Architecture

Currently marketed as Cognos Content Analytics

ICA V2.1 System Architecture

Crawler Framework: Custom Crawler, QuickPlace Crawler, Domino Doc Mgt Crawler, Notes Crawler, SharePoint Crawler, Exchange Server Crawler, NNTP Crawler, DB2 Crawler, JDBC Database Crawler, Content Integrator Crawler, DB2 Content Mgr Crawler, FileNet P8 Crawler, Web Crawler, Seed List Crawler, Web Content Mgr Crawler, WebSphere Portal Crawler, Windows File System Crawler, Unix File System Crawler, Agent for File System Crawler; Crawler Plug-in

Raw Data Store

Document Processors: Parser, UIMA Annotators, Document Generator

Indexer, Indexer Service, CCA Index

Exporter: Export Plug-in (RDB, XML)

Text Analytics & Search Runtime

Applications: Text Miner UI, Admin UI, Search UI, SIAPI Application, Real-time NLP Application

Common Infrastructure: Scheduler, Logging, Control, Configuration, Monitor, Security

Also labeled in the diagram: Collection, Inspector, CustomPoint


Collections can contain documents from many sources

[Diagram: Notes docs and Web docs, each with their own fields, mapped into the fields (Field 1 through Field 8) of a single Text Analytic Collection]

Fields 1-8 still comprise a single document in the collection

$language, $doctype, $source

Can limit docs found for query

Can share facets
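The idea of one collection holding documents from several sources, with built-in fields such as `$source` available to narrow a query, can be sketched in Python like this. The documents, the `query` and `facet_counts` helpers, and the `product` facet are hypothetical; only the `$source` and `$language` field names come from the slide.

```python
from collections import Counter

# One collection, documents from two sources, sharing the same field layout.
collection = [
    {"$source": "notes", "$language": "en", "product": "pump", "body": "battery alarm"},
    {"$source": "web", "$language": "en", "product": "pump", "body": "battery died"},
    {"$source": "web", "$language": "de", "product": "monitor", "body": "Anzeige defekt"},
]


def query(docs, term=None, filters=None):
    """Return docs containing the term, optionally limited by exact field values."""
    filters = filters or {}
    return [
        d for d in docs
        if (term is None or term in d["body"].lower())
        and all(d.get(k) == v for k, v in filters.items())
    ]


def facet_counts(docs, facet):
    """Count facet values across whatever documents are passed in."""
    return Counter(d[facet] for d in docs)


all_hits = query(collection, term="battery")
web_hits = query(collection, term="battery", filters={"$source": "web"})
print(len(all_hits), len(web_hits), facet_counts(collection, "product"))
```

The same `product` facet is counted across Notes and web documents alike, which is what sharing facets across sources amounts to in this simplified model.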


Easy UIMA Annotator Configuration

• Users can enable/disable analytics annotators on Admin GUI

• Dictionary and Pattern Matcher Annotators are enabled by default

• Named Entity and ICM Annotator are disabled by default

• User custom annotator is optional

• Supported by Text Analytics Collection only


Steps to tailor your text analysis with flexible, easy-to-use tooling

1. Develop your Custom Text Analysis with Tooling: Build language and domain resources into a LanguageWare dictionary. Develop rules to spot facts, entities and relationships. Create and test UIMA annotators with a collection of documents.

2. Export your Custom Text Analysis: Easily generate the annotators to be Content Analytics ready.

3. Deploy your Custom Text Analysis within ICA: Import newly created annotators via the Content Analytics administration console and associate them with a collection.

Screenshot callouts: View of Project Resources; Easy to test and verify your tailored text analysis; Easy to export your custom text analysis

Currently marketed as Cognos Content Analytics


Structure of the Document Processing Engine

[Diagram. Components shown: Raw Data Store; Indexer Task; Document Processing Engine containing Document Processors, each with a Document Parser, a UIMA pipeline, and a Document Generator; Indexer; Search Index; Document Cache; Taxonomy Index. An RDS document enters each Document Processor, is carried through the UIMA pipeline as a CAS, and leaves as an indexing document]

UIMA pipeline annotators shown: Linguistic Analysis, Language Identification, Classification, Custom Dictionary, Pattern Matcher, Named Entity


Document and Analytics Export Capability

Crawler Framework

Supported Crawlers

• Web (HTTP)

• Windows File System

• UNIX File System

• FileNet P8

• DB2 Content Manager

• Content Integrator

• DB2

• JDBC

• NNTP

• Lotus Notes

• QuickPlace

• SharePoint

• Microsoft Exchange

• WebSphere Portal

• Web Content Mgmt

• Domino Doc Mgmt

Custom Crawler, Crawler Plug-in

[Diagram. Components shown: Document Cache; Document Processor (Parser, UIMA, Document Generator); Indexer (IBM Extended Lucene Indexer); Text Analytic Collection; Search and Text Analytics Runtime; Text Miner application used by an Analyst; Export Plug-in Adapter producing XML and RDB output, imported by InfoSphere Warehouse and Other Text Analytic Consumers]

Export points: 1 Crawled Document Export, 2 Analyzed Document Export, 3 Searched Document Export


Real-time Text Analytics

SIAPI: Real-time application of text analytics on a single document

1. User submits text through SIAPI

2. CCA returns document with annotations

[Diagram. Same Crawler Framework, supported crawlers, Custom Crawler with Crawler Plug-in, Document Cache, Document Processor (Parser, UIMA, Document Generator), Indexer (IBM Extended Lucene Indexer), and Text Analytic Collection as on the preceding slide. The application sends Text (1) to the Text Analytics Runtime and receives Text + Annotations (2)]


Deep Inspection

Deep Inspection is a facility for the execution of text analytics jobs which involve large numbers of keywords and facets to be analyzed

1. Submit inspection requests through text miner application

2. Dispatches heavy load inspection jobs to multiple search servers

3. Output report contains important keywords relevant to analytics query

[Diagram. Same Crawler Framework, supported crawlers, Custom Crawler with Crawler Plug-in, Document Cache, Document Processor (Parser, UIMA, Document Generator), Indexer (IBM Extended Lucene Indexer), and Text Analytic Collection as on the preceding slides. The Text Miner submits a request (1) to the Text Analytics Runtime on the Analytics Server, Deep Inspection runs (2), and an Output Report is produced (3)]
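The dispatch-and-merge pattern behind a job like this can be sketched in Python. This is only an analogy: a thread pool stands in for the multiple search servers, in-memory lists stand in for index partitions, and all function names and sample texts are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def inspect_partition(docs, keywords):
    """Count the documents in one partition that mention each keyword."""
    counts = Counter()
    for doc in docs:
        text = doc.lower()
        for kw in keywords:
            if kw in text:
                counts[kw] += 1
    return counts


def deep_inspect(partitions, keywords, workers=2):
    """Run one inspection job per partition in parallel, then merge into a report."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(lambda p: inspect_partition(p, keywords), partitions)
        total = sum(partials, Counter())
    return total.most_common()


# Hypothetical partitions, one per worker.
partitions = [
    ["Battery failed", "Alarm sounded"],
    ["battery drained", "battery alarm"],
]
report = deep_inspect(partitions, ["battery", "alarm"])
print(report)
```

Each worker only ever sees its own partition, so adding workers spreads the load of a job with many keywords, and the final merge produces the single ranked report.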


System Configuration

Support Scalable and Flexible Configuration

• Multiple Document Processing Servers

• Multiple Text Analytics and Search Runtime Servers

• Can add new servers without restarting system

[Diagram. All-in-one (Master) Server: Crawler Session, Index Service Session, Doc Processing Session, Text Analytics & Search Session with Index. Document Processing Servers: one Doc Processing Session each. Text Analytics & Search Runtime Servers: one Text Analytics & Search Session with Index each. Share or Replicate Index]


Questions and Answers (Maybe)


Bruce S. Tannenbaum

Managing Consultant

IBM Text Analytics Group

[email protected]