Googalize your Search with DirectInfo Documents
DirectInfo Documents - New Features
Author: Kiril Rusev
Software Architect, Semantec Bulgaria OOD
Semantec GmbH, Benzstr. 32, D-71083 Herrenberg, Germany
www.semantec.de
Motivation - The Answer
Oracle Text Index
DirectInfo
Document Files
Database Data
Web Contents
Structured Search Results
What is DirectInfo?
- A framework based on Oracle Text
- Can index and search various data sources
- Can be extended
- Can be adjusted to the customer's needs
DirectInfo and Oracle Text
Oracle Text
Context indexes with USER_DATASTORE
Full control over the indexing
Flexible and extensible filtering
Custom defined document grouping
Regular index management
Effective caching mechanism
Fast and flexible searching
A lot of context information
Summarizing capabilities
Oracle
DirectInfo
DirectInfo Architecture
Search Results
- Text fragments
- Document summary
- File information
- Direct link to every document
- ...
DirectInfo
Index Groups
Documents: local files, web content, email, third party systems, etc.
Documents Meta Info
Text Indexes
Document Cache
Data Retrieval
Crawling
Gathering Meta Info
Indexing
Users
Sending Keywords
Searching
Getting Search Results List
Preparing The Results
Getting Results
Direct link to every document
Security
Checking user rights
Crawlers
What is DirectInfo Documents?
- Based on the DirectInfo platform
- A powerful document searching tool
- A web based "google-like" application
- Easily managed and deployed
What's new?
- Speed improvement
- Robustness
- Manageability
- Functional improvements
- Look and feel (LF) and search results presentation improved
Speed improvement – Document Cache
User Datastore PL/SQL Procedure
Null Filter
HTML
Filtering
Document Cache
Store/Retrieve HTML
• Filtering is done only once
• The HTML version of the document is cached
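The caching idea on this slide can be sketched in a few lines. This is only an illustration of "filter once, then serve the cached HTML", not DirectInfo's actual implementation (which, per the slide, lives in a user datastore PL/SQL procedure). The class name `DocumentCache`, the `to_html` callback and the in-memory dictionary are all assumptions made for the example.

```python
from typing import Callable, Dict


class DocumentCache:
    """Hypothetical in-memory cache of the HTML version of each document."""

    def __init__(self, to_html: Callable[[bytes], str]) -> None:
        self._to_html = to_html          # the (expensive) filtering step
        self._store: Dict[str, str] = {}
        self.filter_calls = 0            # counts how often filtering really ran

    def get_html(self, doc_id: str, raw: bytes) -> str:
        # Filter only on the first request; afterwards return the cached HTML.
        if doc_id not in self._store:
            self.filter_calls += 1
            self._store[doc_id] = self._to_html(raw)
        return self._store[doc_id]


def toy_filter(raw: bytes) -> str:
    # Stand-in for a real document-to-HTML filter.
    return "<p>" + raw.decode("utf-8") + "</p>"


cache = DocumentCache(toy_filter)
first = cache.get_html("doc-1", b"hello")
second = cache.get_html("doc-1", b"hello")   # served from the cache
```

A real cache would also need to notice when a document changes (for example by comparing a modification time); the slide does not say how DirectInfo handles that, so it is left out here.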
Speed improvement – Faster Crawling
DirectInfo
Internet
Local Files
Crawler Interface
File Crawler
Web Crawler
Other…
Crawlers are adjusted according to the target document sources
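A common crawler interface with source-specific implementations might look roughly like the sketch below. All names (`Crawler`, `FileCrawler`, `PrefetchedWebCrawler`, `FoundDocument`, `gather`) are hypothetical and chosen for the example; the web crawler is a stand-in that receives already-fetched pages so the sketch runs without network access.

```python
import os
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List


@dataclass
class FoundDocument:
    source: str      # which kind of crawler found the document
    location: str    # file path or URL
    size: int        # size in bytes


class Crawler(ABC):
    """Common interface: every crawler yields the documents it finds."""

    @abstractmethod
    def crawl(self) -> Iterator[FoundDocument]:
        ...


class FileCrawler(Crawler):
    """Walks a local directory tree."""

    def __init__(self, root: str) -> None:
        self.root = root

    def crawl(self) -> Iterator[FoundDocument]:
        for dirpath, _dirnames, filenames in os.walk(self.root):
            for name in sorted(filenames):
                path = os.path.join(dirpath, name)
                yield FoundDocument("file", path, os.path.getsize(path))


class PrefetchedWebCrawler(Crawler):
    """Stand-in for a web crawler: pages are passed in, nothing is downloaded."""

    def __init__(self, pages: Dict[str, bytes]) -> None:
        self.pages = pages

    def crawl(self) -> Iterator[FoundDocument]:
        for url, body in self.pages.items():
            yield FoundDocument("web", url, len(body))


def gather(crawlers: Iterable[Crawler]) -> List[FoundDocument]:
    # The caller does not need to know which kind of source each crawler handles.
    found: List[FoundDocument] = []
    for crawler in crawlers:
        found.extend(crawler.crawl())
    return found
```

Adding a new document source then means writing one more `Crawler` subclass, which is the kind of adjustment to the target source that the slide describes.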
Robustness – Better Filtering
Before: Datastore INSO Filter
PDF PDF HTML
XFilter
After: Datastore PDF HTML NULL
Filter
HTML
Filter 1 Filter 2 Filter N…
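The slide does not say how Filter 1 … Filter N are chosen. One plausible reading, shown here purely as an assumption, is a fallback chain: each filter is tried in turn and the first one that succeeds provides the HTML. The function name `filter_to_html` and the toy filters are hypothetical.

```python
from typing import Callable, Optional, Sequence

# A filter turns raw document bytes into HTML, or raises if it cannot.
Filter = Callable[[bytes], str]


def filter_to_html(raw: bytes, filters: Sequence[Filter]) -> Optional[str]:
    """Try each filter in order; return the first HTML produced, else None."""
    for convert in filters:
        try:
            return convert(raw)
        except Exception:
            # This filter could not handle the document; try the next one.
            continue
    return None


def strict_pdf_filter(raw: bytes) -> str:
    # Toy filter: accepts only data that starts with the PDF signature.
    if not raw.startswith(b"%PDF"):
        raise ValueError("not a PDF")
    return "<p>pdf content</p>"


def plain_text_filter(raw: bytes) -> str:
    # Toy filter: treats the data as UTF-8 text.
    return "<pre>" + raw.decode("utf-8") + "</pre>"


html = filter_to_html(b"just text", [strict_pdf_filter, plain_text_filter])
```

With this arrangement a single failing filter no longer blocks a document from being indexed, which would fit the "Robustness" heading.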
Manageability - Indexing in Chunks
Before: Dtx_Ddl.Sync_Index Index
Unstoppable !!!
After: Index
Dtx_Ddl.Sync_Index
Dtx_Ddl.Sync_Index
Dtx_Ddl.Sync_Index………
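The before/after contrast can be illustrated with a small driver loop: instead of one long synchronization call that cannot be interrupted, the pending documents are synchronized chunk by chunk, and a stop request is honored between chunks. The sketch is language-neutral in spirit; `sync_in_chunks`, `sync_chunk` and `should_stop` are hypothetical names, and `sync_chunk` merely stands in for one call such as `Dtx_Ddl.Sync_Index` on a limited batch.

```python
from typing import Callable, List


def sync_in_chunks(
    pending: List[str],
    sync_chunk: Callable[[List[str]], None],
    chunk_size: int,
    should_stop: Callable[[], bool],
) -> int:
    """Index pending documents chunk by chunk; return how many were processed."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    done = 0
    # The stop flag is checked before every chunk, so the process can be
    # interrupted after at most one chunk's worth of work.
    while done < len(pending) and not should_stop():
        chunk = pending[done:done + chunk_size]
        sync_chunk(chunk)
        done += len(chunk)
    return done


# Usage: ten pending documents, chunks of three, stop requested after two chunks.
synced_chunks: List[List[str]] = []
documents = ["doc-%d" % i for i in range(10)]
processed = sync_in_chunks(
    documents,
    synced_chunks.append,
    chunk_size=3,
    should_stop=lambda: len(synced_chunks) >= 2,
)
```

Documents that were not processed stay pending and can be picked up by the next run.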
Functional improvements - Duplicate File Detection
Before:
Found Files Indexed Files
After:
Found Files
Indexed Files
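The slide does not state how duplicates are recognized. A common approach, used here as an assumption, is to compare a hash of each file's content and index only the first file seen with a given hash. The function name `unique_by_content` is hypothetical.

```python
import hashlib
from typing import Iterable, List, Set, Tuple


def unique_by_content(found: Iterable[Tuple[str, bytes]]) -> List[str]:
    """Return the paths to index, skipping files whose content was already seen."""
    seen: Set[str] = set()
    to_index: List[str] = []
    for path, content in found:
        digest = hashlib.sha256(content).hexdigest()
        if digest not in seen:
            seen.add(digest)
            to_index.append(path)
    return to_index


found_files = [
    ("reports/q1.txt", b"quarterly report"),
    ("backup/q1.txt", b"quarterly report"),   # same content, different location
    ("reports/q2.txt", b"another report"),
]
indexed_files = unique_by_content(found_files)
```

Here three found files lead to two indexed files, matching the "Found Files" versus "Indexed Files" distinction on the slide.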
Look and feel (LF) and search results presentation improved
- Deferred loading of fragments
- Skins support, XP look and feel
- Visual and functional redesign: HTML frames
- Simpler searching
Future development
- Defining and searching metadata
- Search results clustering
- Improved flexibility
- Improved administration
- Improved caching
- Better summarizing