

A Technical Tour of IBM WebSphere Information Integrator Content Edition 1

A Technical Tour of

IBM® WebSphere® Information Integrator Content Edition

By Sean Johnson Lead Architect, WebSphere Information Integrator Content Edition

February, 2005


Table of contents

Introduction
Integration services
    Access services
    Connectors
    Session pools
    HTTP access
Federation services
    Federated search
    Data maps
    View services
    Virtual repositories
    Subscription event services
        The event path
        Customizing subscription event services using plug-ins
        Statistics
Developer and end user services
    Web client
    Web components
    Application programming interfaces
        Integration API
        Virtual repository API
        Subscription event services API
        Web services API
        URL addressability
        Connector SDK
Security
    Authentication system
    Authorization system
Summary
Resources
About the author

©Copyright IBM Corporation, 2005. IBM, Domino, Lotus, Lotus Notes, OmniFind, and WebSphere are trademarks or registered trademarks of International Business Machines Corporation in the US, other countries, or both. Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both. Microsoft is a trademark of Microsoft Corporation in the United States, other countries, or both. Other company, product, or service names may be trademarks or service marks of others.


Introduction

WebSphere Information Integrator software provides the means to address the information challenges inherent in the complex, heterogeneous environments typical of many businesses. As shown in Figure 1, WebSphere Information Integrator offers a wide range of capabilities (enterprise search, federation, transformation, data placement including replication and caching, and event publishing) designed to meet varied integration requirements and to integrate easily with industry-leading analytical tools, portal environments and packaged applications, application development environments, messaging-oriented middleware, service-oriented architectures (SOAs), and business process software.

The newest member of the WebSphere Information Integrator portfolio is WebSphere Information Integrator (II) Content Edition, which provides the capability to integrate enterprise applications with relevant content, such as documents, images, audio, video, and other unstructured and semi-structured information stored in multiple, disparate repositories throughout the enterprise. This article assumes you have some familiarity with enterprise content management concepts.

Figure 1. WebSphere Information Integrator platform

WebSphere II Content Edition provides a single, Java™-based, bi-directional interface to access many different content repositories and workflow systems, making it easy for application developers to integrate those sources into new or existing enterprise applications. The product includes pre-built Web components that make it even easier to add II Content Edition capabilities, including the ability to read and update content, to Web applications. Other capabilities include:

• Cross-repository federated searching
• Virtual repositories to work with content from multiple repositories
• Cross-repository event services
• A data dictionary for mapping metadata fields across repositories
• XML import and export in a repository-neutral format
• Automatic content conversion to browser-ready formats

This article provides a “technical tour” of the robust, J2EE-based architecture and technology behind WebSphere II Content Edition. As shown in Figure 2, the product’s service-oriented architecture can be described in terms of core integration services, a rich set of multi-repository federation services built on top of them, and developer and end user services that provide access to the system, all while maintaining strict security for the content being integrated. These areas are the main stops on our technical tour.

Figure 2. An architectural overview of WebSphere II Content Edition

Integration services

Our first stop on the tour is integration services. These services provide a single, consistent interface to the content, functionality, and workflow capabilities of the underlying repositories. Integration services expose a superset of content management and workflow functionality, and they keep track of both the available repositories and the functional capabilities of each one. This means that your client applications are not limited to a least common denominator of repository capabilities; instead, they can discover the capabilities available for any particular repository item. Because II Content Edition defines a complete, uniform model through which this functionality is accessed, applications built on it can readily expose the full capabilities of existing repositories, regardless of the underlying repository or vendor. Furthermore, applications built on II Content Edition are “future-proofed” against changes to the enterprise infrastructure, such as upgrades to back-end systems, migration from one system to another, or acquisition of new systems. The following operations are available:

• Search for content – Perform parametric and full-text searches against one or multiple content repositories

• Capture content – Add content and metadata to repositories

• Control content – Perform library functions such as check-in/check-out, and copy or transfer folders and documents within a repository or across repositories while maintaining properties, versioning information, and other content attributes

• Retrieve content – Retrieve content and associated meta-data values from repositories in the content’s native format or in an XML document

• Update content – Make changes to content and update meta-data values, annotations and security settings while maintaining version control

• Manage content hierarchies – Create and delete folders, file and un-file content in folders, retrieve folder contents, and update folder properties

• Search for work items – Perform parametric searches against one workflow engine or federated searches against multiple workflow engines.

• Create new work items – Initiate new instances of workflow processes and apply meta-data values and content attachments

• Retrieve work items – Retrieve work items and any attached content from an in-box or specific queues or steps in the workflow process

• Update work items – Make changes to work items including meta-data and attachments. Perform actions on the work item such as locks, suspend/resume, dispatching, etc.

• Audit – All actions initiated through WebSphere II Content Edition can be audited at various levels, recording pertinent information such as the time, the user, the specific action taken, and the item being accessed.

• Maintain security – Ensure users access only authorized content and work items by taking advantage of the security features inherent in the underlying system

• Manage sessions – Log on to and log off from content repositories and workflow systems, with passwords encrypted over the wire. Session pooling is handled for you.
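The point about discovering capabilities per item, rather than settling for a least common denominator, can be illustrated with a small Java sketch. It assumes a hypothetical `Operation` enum and `RepositoryItem` interface invented for this example; they are not the product’s actual API, only a way to show a client asking an item what it supports before offering an action.

```java
import java.util.Set;

// Hypothetical operation names, invented for this sketch; not the product's API.
enum Operation { SEARCH, CAPTURE, CHECK_OUT, CHECK_IN, RETRIEVE, UPDATE, ANNOTATE }

// Hypothetical view of an item as returned by an integration layer.
interface RepositoryItem {
    String id();
    Set<Operation> supportedOperations();
}

class SimpleItem implements RepositoryItem {
    private final String id;
    private final Set<Operation> operations;

    SimpleItem(String id, Set<Operation> operations) {
        this.id = id;
        this.operations = Set.copyOf(operations); // immutable snapshot
    }

    @Override public String id() { return id; }
    @Override public Set<Operation> supportedOperations() { return operations; }
}

class CapabilityAwareClient {
    // Offer an action only when the backing repository supports it for this
    // item, rather than hiding it everywhere because some repository lacks it.
    static boolean canAnnotate(RepositoryItem item) {
        return item.supportedOperations().contains(Operation.ANNOTATE);
    }
}
```

A user interface built this way can show an “Annotate” button for items from a repository that supports annotations and hide it for items from one that does not, without special-casing any vendor.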

It is important to understand that WebSphere II Content Edition provides access to these capabilities but does not implement them; the implementation is provided by the backend repository. The components of integration services include:

• Access services, the access hub of WebSphere II Content Edition
• Connectors, which are used to communicate with repositories
• Session pools, for performance and scalability
• HTTP access to content

Access services

At the heart of the WebSphere Information Integrator Content Edition integration services is an architectural hub called access services, shown in Figure 3. Access services is implemented as a stateful session EJB with one instance per session. The J2EE application server provides EJB clustering to support load balancing and high availability, and distributed network communications to support various network topologies and geographic scenarios. An access services instance defines a single WebSphere II Content Edition session and brokers access to disparate enterprise repositories by relaying application requests to the appropriate repository via connectors. Access services aggregates the results of multi-repository application requests and returns this information to the client application, along with any requested metadata and content in the desired format.

Figure 3. Access services instances in the WebSphere II Content Edition architecture

Access services also serves as a configuration hub, communicating with a configuration server to determine the active configuration of the system. This allows the configuration data to remain in a centralized, fail-safe service while being propagated out to the other services as needed.
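The brokering role described above (one instance per session, relaying each request to the right connector and aggregating multi-repository results) can be sketched in plain Java. `SimpleConnector` and `AccessSession` are hypothetical names for this illustration only; the real component is a stateful session EJB, and this sketch leaves out clustering, configuration, and security entirely.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical minimal connector contract used only for this sketch.
interface SimpleConnector {
    List<String> search(String criteria);
}

// One instance per client session, mirroring the "one instance per session" design.
class AccessSession {
    private final Map<String, SimpleConnector> connectors = new LinkedHashMap<>();

    void register(String repositoryId, SimpleConnector connector) {
        connectors.put(repositoryId, connector);
    }

    // Relay a request to one named repository.
    List<String> search(String repositoryId, String criteria) {
        SimpleConnector connector = connectors.get(repositoryId);
        if (connector == null) {
            throw new IllegalArgumentException("Unknown repository: " + repositoryId);
        }
        return connector.search(criteria);
    }

    // Relay the same request to several repositories and aggregate the results,
    // tagging each hit with its source so later calls can be routed back to it.
    List<String> searchAll(List<String> repositoryIds, String criteria) {
        List<String> aggregated = new ArrayList<>();
        for (String id : repositoryIds) {
            for (String hit : search(id, criteria)) {
                aggregated.add(id + ":" + hit);
            }
        }
        return aggregated;
    }
}
```

Tagging each result with its repository is one simple way to let follow-up operations (retrieve, check out) find their way back to the right connector; the sequential loop here is kept deliberately simple.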

Connectors

WebSphere II Content Edition must translate the requests made to access services (such as searching or capturing content) into the vendor-specific APIs of content repositories and workflow engines. This translation is done by connectors, which also normalize the results of those operations and return the data to access services. WebSphere II Content Edition includes connectors for a wide variety of popular content repositories and workflow engines, and the connectors are extensible to support unique or non-standard implementations. If you want to develop a new connector, you can use the connector SDK, which is described later in the Connector SDK section. The product includes connectors for the following repositories and workflow engines:

• IBM DB2 Content Manager and Content Manager On Demand
• IBM WebSphere MQ Workflow
• IBM Lotus Notes® Domino® and Domino Document Manager
• FileNet (Content Services, Image Services, Image Services Resource Adapter, P8 Content Manager, and P8 Business Process Manager)
• EMC Documentum
• Microsoft® Index Server/NTFS
• Open Text LiveLink
• Stellent Content Server
• Interwoven TeamSite
• Hummingbird Enterprise DM
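To make the connector’s two jobs concrete (translating a repository-neutral request into a vendor query, and normalizing vendor results into a common shape), here is a minimal Java sketch. `SearchRequest`, `RepositoryConnector`, and both connector classes are hypothetical, and the query syntaxes and field names are invented for illustration; they do not describe any listed repository’s real API or the product’s connector interface.

```java
import java.util.Map;

// Hypothetical repository-neutral request: "field equals value".
final class SearchRequest {
    final String field;
    final String value;
    SearchRequest(String field, String value) { this.field = field; this.value = value; }
}

// Hypothetical connector contract: translate to the vendor's query syntax
// and normalize vendor rows into a common property map.
interface RepositoryConnector {
    String translate(SearchRequest request);
    Map<String, String> normalize(Map<String, String> vendorRow);
}

// Illustrative connector for a repository with an SQL-like query language.
// The field name is assumed to come from a trusted mapping, not user input.
class SqlStyleConnector implements RepositoryConnector {
    public String translate(SearchRequest r) {
        return "SELECT * FROM documents WHERE " + r.field + " = '"
                + r.value.replace("'", "''") + "'"; // escape quotes in the value
    }
    public Map<String, String> normalize(Map<String, String> row) {
        return Map.of("id", row.get("DOC_ID"), "title", row.get("DOC_TITLE"));
    }
}

// Illustrative connector for a repository with an attribute-path query syntax.
class PathStyleConnector implements RepositoryConnector {
    public String translate(SearchRequest r) {
        return "//item[@" + r.field + "=\"" + r.value + "\"]";
    }
    public Map<String, String> normalize(Map<String, String> row) {
        return Map.of("id", row.get("objectId"), "title", row.get("name"));
    }
}
```

Callers see the same `translate`/`normalize` contract regardless of which repository sits behind it, which is what lets access services treat every back end uniformly.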

Connectors are normally hosted within a stateful session EJB. As with access services, the J2EE application server provides the connector with clustering to support load balancing and high availability, and with distributed network communications to support various network topologies and geographic scenarios. You can also move connectors outside the EJB container to support different API access paradigms, network topologies, and geographic scenarios. As shown in Figure 4, there are two ways to do this:

• Use the Remote Method Invocation (RMI) proxy connector to proxy the connector requests to an RMI server that is hosting the “real” connector in another VM on the same machine or on another server.

• Use the Simple Object Access Protocol (SOAP) proxy connector to send the request over HTTP to a remote web server, which allows the request to be easily proxied through firewalls and over the Internet.

The same standard connector is used in both cases, unchanged and unaware of the proxy connectors. Configuring a connector to use either proxy is as simple as turning the option on and providing the connection URLs in the connector’s configuration options, using the configuration tool provided with WebSphere II Content Edition.


Figure 4. Proxying connectors
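The key idea in Figure 4 is that the proxy presents the same interface as the real connector and simply forwards calls, so the real connector runs unchanged. The Java sketch below shows that pattern under stated assumptions: `DocConnector`, `Transport`, and the other classes are hypothetical, and the in-memory transport stands in for RMI or SOAP over HTTP so the example runs in a single JVM.

```java
import java.util.List;

// Hypothetical connector contract for this sketch.
interface DocConnector {
    List<String> search(String criteria);
}

// The "real" connector; it has no knowledge of any proxy in front of it.
class FileSystemConnector implements DocConnector {
    public List<String> search(String criteria) {
        return List.of("report-" + criteria + ".pdf");
    }
}

// Hypothetical transport standing in for RMI or SOAP over HTTP:
// it carries a request to a remote host and returns the reply.
interface Transport {
    List<String> send(String operation, String payload);
}

// Proxy connector: same interface as the real connector, but every call
// is forwarded through a transport instead of being executed locally.
class ProxyConnector implements DocConnector {
    private final Transport transport;
    ProxyConnector(Transport transport) { this.transport = transport; }

    public List<String> search(String criteria) {
        return transport.send("search", criteria);
    }
}

// "Server" side of the hypothetical transport: dispatches to the unchanged connector.
class InMemoryTransport implements Transport {
    private final DocConnector target;
    InMemoryTransport(DocConnector target) { this.target = target; }

    public List<String> send(String operation, String payload) {
        if (!operation.equals("search")) {
            throw new UnsupportedOperationException(operation);
        }
        return target.search(payload);
    }
}
```

Because callers depend only on `DocConnector`, switching between a local connector and a proxied one becomes a configuration decision rather than a code change, which matches the article’s description of turning the proxy option on in the configuration tool.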

Session pools

WebSphere Information Integrator Content Edition takes a stateful approach to establishing sessions with underlying repositories and workflow systems. Considering that most repositories in the marketplace offer only session-based APIs and that establishing a repository session is one of the most expensive operations for most repositories, it makes sense to leverage a stateful model.

Session pooling is recommended for connections with high startup costs in high-demand applications that service a broad community of users, such as public websites and large-scale intranets. WebSphere II Content Edition’s session pools are easily configured within the administration tool and can be used to:

• Reuse repository connections per named user within an application or among different applications

• Limit the repository connections consumed by a given WebSphere Information Integrator Content Edition application during peak activity

• Minimize the number of repository logins/logouts required to perform a given number of repository actions

• Prevent "leaking" of repository login connections by client applications that terminate unexpectedly

• Minimize creation and destruction of session EJBs in the application server
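Several of the goals listed above (reusing connections per named user, capping the number of repository connections, and minimizing logins) can be illustrated with a small per-user pool. This is a hypothetical sketch, not the product’s session pool: the login function stands in for an expensive repository logon, and reclaiming sessions from clients that terminate unexpectedly (for example, through idle timeouts) is deliberately left out.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Minimal per-user session pool sketch (hypothetical; not the product's pool).
class SessionPool<S> {
    private final Map<String, Deque<S>> idle = new HashMap<>();
    private final Function<String, S> login;   // expensive repository logon
    private final int maxSessions;             // cap on repository connections
    private int open = 0;                      // sessions created so far
    private int logins = 0;

    SessionPool(Function<String, S> login, int maxSessions) {
        this.login = login;
        this.maxSessions = maxSessions;
    }

    // Reuse an idle session for this user if one exists; otherwise log on,
    // unless the pool is already at its connection limit.
    synchronized S acquire(String user) {
        Deque<S> userIdle = idle.get(user);
        if (userIdle != null && !userIdle.isEmpty()) {
            return userIdle.pop();
        }
        if (open >= maxSessions) {
            throw new IllegalStateException("session limit reached");
        }
        S session = login.apply(user);
        open++;
        logins++;
        return session;
    }

    // Return a session to the pool instead of logging off.
    synchronized void release(String user, S session) {
        idle.computeIfAbsent(user, u -> new ArrayDeque<>()).push(session);
    }

    synchronized int loginCount() { return logins; }
}
```

Keying idle sessions by user reflects the “per named user” reuse described above: a released session is only handed back to the same user, so the repository’s own security context is preserved.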

HTTP access

As enterprise-level applications replace legacy departmental-level applications in many organizations, integrating the old and new applications can be challenging from a network perspective. It requires a flexible enterprise content integration platform that can deal effectively with decentralized and geographically distributed users and content sources while making the most effective use of the available network topology and bandwidth. For example, when you create a centralized enterprise application to serve users in multiple geographic locations, you want to ensure that content is always transferred to end users over the most efficient and direct network route possible and is not hampered by the centralized nature of the application.

By default, all connectors in WebSphere Information Integrator Content Edition support direct HTTP access to the native (binary) content on the connector server. This means that the route optimization capabilities of TCP/IP can be used to transfer that content to the consuming application. If the underlying repository already supports HTTP access for binary content, this capability is exposed in a uniform manner through WebSphere II Content Edition. If the repository does not, the WebSphere II Content Edition platform provides this service to the repository connector seamlessly and securely. In this way, a centralized enterprise application such as the one in Figure 5, integrated with geographically distributed users and content sources, can still deliver relevant content in the most efficient manner.

Figure 5. HTTP access in a geographically distributed environment
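The choice described above (use the repository’s own HTTP URL when it has one, otherwise let the platform serve the content from the connector server) can be sketched as a small URL resolver. Everything here is an assumption for illustration: `ContentUrlResolver`, the `repo`/`item` query parameters, and the endpoint layout are hypothetical, and the sketch omits the security measures the platform applies.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

// Sketch of choosing a direct HTTP URL for binary content (hypothetical names and paths).
class ContentUrlResolver {
    private final String connectorServerBase; // e.g. a content endpoint on a branch-office server

    ContentUrlResolver(String connectorServerBase) {
        this.connectorServerBase = connectorServerBase;
    }

    // Prefer the repository's own HTTP URL; otherwise point at a content
    // endpoint on the connector server, so the browser fetches the bytes
    // directly from that server rather than through the central application.
    String resolve(String repositoryId, String itemId, Optional<String> nativeUrl) {
        return nativeUrl.orElseGet(() -> connectorServerBase
                + "?repo=" + URLEncoder.encode(repositoryId, StandardCharsets.UTF_8)
                + "&item=" + URLEncoder.encode(itemId, StandardCharsets.UTF_8));
    }
}
```

The central application only hands out the URL; the content itself travels over whatever TCP/IP route connects the user to the connector server, which is the benefit Figure 5 illustrates.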

Federation services

Our next stop on the tour is the set of value-added federation services that WebSphere Information Integrator Content Edition provides. Built on the integration services described above, these services make it easier to work with multiple sources of content and workflow automation at the same time. Federation services include:

• Federated search, for performing a single search for all relevant content across many repositories (see Federated search below)

• Data maps, which translate between the disparate indexing schemas of each repository (see Data maps below)

• View services, for on-the-fly rendering of content (see View services below)

• Virtual repositories, for virtually reorganizing content to support new business initiatives (see Virtual repositories below)

• Subscription event services, for providing event notification of changes in the repositories (see Subscription event services below)

Federated search

When your organization has many disparate sources of content and different workflow systems, it can be challenging to provide searching that spans content and processes. One approach to searching across all the content and processes in an enterprise is federated searching, or query brokering, which leverages the search capabilities inherent in the underlying content repositories and workflow systems. A single query is specified and passed to the federated search system, which translates the search criteria into a format appropriate for each underlying content source and dispatches, or brokers, the search request to each source. The federated search system then aggregates the results. Federated searching may also involve post-search processing, such as sorting, performed on the combined result set. The final result of a federated search is a single aggregated result set that appears as if it came from one system that housed all the enterprise’s content.

WebSphere II Content Edition provides a mechanism for performing federated searches and working with the results that does not require creating or maintaining a centralized full-text or metadata index. Instead, it leverages the search capabilities of the underlying repositories to provide real-time access to all enterprise content. Applications can then work with the content in the search results by using the WebSphere II Content Edition APIs.

In a federated search, access services receives the search request, determines which repositories will participate in the search, and dispatches query requests to each repository so that the queries run in parallel. For each search performed in the system, a server result set is created: a server-side result set cache, maintained in the EJB tier, that acts as the aggregation point for results coming back from the multiple repositories. It persists for the lifetime of the corresponding client result set. The server result set provides post-processing of the results, such as aggregated sorting, and a configurable cursor-like mechanism that delivers result rows to the consuming application only as they are needed, which reduces network utilization and improves performance.

A complementary approach to federated search of enterprise-wide information assets is search technology that builds a centralized index of the enterprise’s content and metadata, which can then be searched. This approach is implemented in search engines such as WebSphere Information Integrator OmniFind™ Edition. Different usage scenarios call for either federated search or indexed search. The two technologies can also be used together: the content integration platform can be used to crawl repositories and build up indexes, and then to let users act in the repository on the content returned by an indexed search.
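The federated search flow above (parallel dispatch, aggregation, post-search sorting, and a cursor that hands out rows on demand) can be sketched with the standard `java.util.concurrent` classes. This is a simplified, hypothetical illustration: `Hit`, `FederatedSearch`, and `ServerResultSet` are invented names, each source function stands in for “translate the query and run it against one repository,” and the real server result set lives in the EJB tier rather than in a plain object.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;
import java.util.stream.Collectors;

// One result row from one repository (hypothetical shape).
final class Hit {
    final String repository;
    final String title;
    Hit(String repository, String title) { this.repository = repository; this.title = title; }
    @Override public String toString() { return repository + ":" + title; }
}

// Query brokering sketch: dispatch one query to every source in parallel,
// aggregate the hits, sort the combined set, then serve rows in batches.
class FederatedSearch {
    static ServerResultSet run(Map<String, Function<String, List<Hit>>> sources, String query) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, sources.size()));
        try {
            List<CompletableFuture<List<Hit>>> pending = sources.values().stream()
                    .map(src -> CompletableFuture.supplyAsync(() -> src.apply(query), pool))
                    .collect(Collectors.toList());
            List<Hit> all = new ArrayList<>();
            for (CompletableFuture<List<Hit>> f : pending) {
                all.addAll(f.join()); // waits; a failing source surfaces as CompletionException
            }
            all.sort(Comparator.comparing((Hit h) -> h.title)); // aggregated post-processing
            return new ServerResultSet(all);
        } finally {
            pool.shutdown();
        }
    }
}

// Cursor-like holder: the client pulls only as many rows as it needs.
class ServerResultSet {
    private final List<Hit> rows;
    private int position = 0;
    ServerResultSet(List<Hit> rows) { this.rows = rows; }

    synchronized List<Hit> next(int batchSize) {
        int end = Math.min(rows.size(), position + batchSize);
        List<Hit> batch = new ArrayList<>(rows.subList(position, end));
        position = end;
        return batch;
    }
}
```

Sorting has to happen after aggregation because each repository can only order its own results; the batch size passed to `next` plays the role of the configurable cursor described above.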

Data maps

Data maps provide a data dictionary for mapping metadata across different repositories. Using the data map designer, shown in Figure 6, you can map the differing metadata schemas of each repository in the organization to a common data model. With data maps, a client can perform a single search against all repositories, update content metadata, or transfer content from one repository to another, all without having to be aware of the individual repositories’ schemas.

Figure 6. Data map designer
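A data map is essentially a two-way dictionary between common field names and each repository’s native field names. The Java sketch below assumes a hypothetical `DataMap` class and invented field names such as `CUST_NO`; it is not the format produced by the data map designer, only an illustration of translating a common field for a query and renaming native properties back into the common model.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical data map: one common field name mapped to each repository's own field.
class DataMap {
    // common field -> (repository id -> native field name)
    private final Map<String, Map<String, String>> mappings = new HashMap<>();

    void map(String commonField, String repositoryId, String nativeField) {
        mappings.computeIfAbsent(commonField, k -> new HashMap<>()).put(repositoryId, nativeField);
    }

    // Translate a common field to the name a given repository understands,
    // e.g. when building that repository's part of a federated query.
    String toNative(String commonField, String repositoryId) {
        Map<String, String> perRepo = mappings.get(commonField);
        if (perRepo == null || !perRepo.containsKey(repositoryId)) {
            throw new IllegalArgumentException(commonField + " is not mapped for " + repositoryId);
        }
        return perRepo.get(repositoryId);
    }

    // Rename a repository's native properties to the common model,
    // dropping properties that have no mapping.
    Map<String, String> toCommon(String repositoryId, Map<String, String> nativeProps) {
        Map<String, String> result = new HashMap<>();
        for (Map.Entry<String, Map<String, String>> e : mappings.entrySet()) {
            String nativeField = e.getValue().get(repositoryId);
            if (nativeField != null && nativeProps.containsKey(nativeField)) {
                result.put(e.getKey(), nativeProps.get(nativeField));
            }
        }
        return result;
    }
}
```

With one mapping per repository, a client can search on `CustomerNumber` everywhere and receive results keyed by `CustomerNumber`, regardless of what each back end calls the field.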

View services

With view services, you can deploy content-based solutions in true thin-client environments without requiring vendor-specific client software, plug-ins, or viewers. View services offers two Web-friendly options for viewing content:

• Content is processed on the server, and HTML and browser-ready images are provided to the Web client.

• A small Java viewer applet is downloaded to the client on demand, so that the client’s processing capabilities can be leveraged for manipulating images.

The server-side conversion and processing of content and images is implemented as a service provided by a stateless session EJB. Documents and image files are converted on the fly into browser-readable formats such as HTML, GIF, and PNG. Over 150 different office automation document formats, such as Microsoft Word, Excel, and PowerPoint, can be converted into HTML. View services also converts image formats that web browsers do not handle natively, such as TIFF and MO:DCA, into JPEG and PNG images. Finally, both the server-side conversion process and the client-side viewer applet offer image manipulation functions such as lighten, darken, invert, enhance, rotate, zoom, and page navigation.

View services is delivered with default image processing, image conversion, and document conversion engines. In addition, a service provider interface (SPI) defines the set of methods that image and document processing engines typically implement. If a file type is not supported by the default engines delivered with view services, or if you have a preferred engine, you can use the SPI to substitute or augment the default conversion engines. Using the view services designer, shown in Figure 7, you associate conversion and processing engines with content formats through each format’s standard MIME type. This enables view services to automatically select the appropriate converter or processor for a given piece of content.

Figure 7. View services designer
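Selecting a converter by MIME type, with the option to substitute or add engines, can be sketched as a simple registry. `Converter` and `ConverterRegistry` below are hypothetical stand-ins, not the view services SPI; the sketch only shows the lookup: normalize the MIME type, then return whichever engine was registered for it, if any.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical converter contract standing in for an engine plugged in behind an SPI.
interface Converter {
    String targetMimeType();
    byte[] convert(byte[] source);
}

// Registry that picks a converter from the content's MIME type.
class ConverterRegistry {
    private final Map<String, Converter> byMimeType = new HashMap<>();

    // Reduce "Image/TIFF; name=scan.tif" to "image/tiff" before lookup.
    private static String baseType(String mimeType) {
        return mimeType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
    }

    // Registering an existing type replaces its engine (substitute);
    // registering a new type extends coverage (augment).
    void register(String sourceMimeType, Converter converter) {
        byMimeType.put(baseType(sourceMimeType), converter);
    }

    Optional<Converter> select(String sourceMimeType) {
        return Optional.ofNullable(byMimeType.get(baseType(sourceMimeType)));
    }
}
```

Returning an `Optional` leaves the caller free to decide what to do with unsupported content, for example falling back to downloading the native file.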

In addition to server-side content conversion, view services provides a Java viewer component, shown in Figure 8, as an alternative deployment option for viewing content in a web environment. In this scenario, the image file is sent directly to the Java viewer component without server-side conversion. Where the underlying repository supports annotations, the viewer also lets users view, create, and modify them. The viewer component is provided as a JavaBean for embedding in custom Java applications or as a signed applet for immediate use in web applications.

Figure 8. Viewer applet enables manipulation of content on the client

Virtual repositories

WebSphere Information Integrator Content Edition integration services provide uniform access to content, regardless of where or how it is stored. The virtual repository concept further builds on this integration capability. Using virtual repositories, applications can create managed sets of content and work items relevant to a specific business context, and associated with specific users, customers or activities—even if that content is spread across disparate back-end systems. This relieves enterprise application developers of the burden of managing relationships to related and supporting content. Content can be treated as if it exists in a “virtual repository” created solely for a particular business process, business object or topic such as a user in a portal, an opportunity in a CRM system, or an invoice in an ERP system.

With virtual repositories, content and metadata are not replicated; they are simply accessed more quickly from virtual repository views that contain references to content from multiple sources. A virtual repository may contain direct or query-based references to content from any repository or workflow engine that is accessible through WebSphere II Content Edition. In addition to content, a virtual repository can contain organizational structures, both structures that exist only within the virtual repository and references to existing folders and taxonomies in one or more repositories. If needed, you can also supplement the metadata schema and security policy of the repository content with metadata and security appropriate to the business context of the virtual repository. As shown in Figure 9, you can create virtual repositories from the following:

• Repository folder – The contents of the repository folder are retrieved dynamically from the linked folder in its repository, so that additions to and deletions from the folder are always reflected.

• Work queue – The contents of the work queue are retrieved dynamically from a step in a workflow engine or a personal in-box.

• Virtual folder – A virtual folder is a container for links to content, repository folders, work items, URLs, and other items within the virtual repository. Virtual folders provide the user with the ability to organize content so that it can be efficiently browsed and accessed in a hierarchy.

• Smart folder – A smart folder contains the contents of a saved federated search on a defined set of repositories. Its contents are updated dynamically. Each time the folder is opened, the search is executed, and the resulting items are displayed.

• Content item – A content item is a link to a piece of content in any WebSphere II Content Edition-accessible repository. A content item may be dynamically linked to the latest version of the content in the repository or may be statically linked to a specific version of the content.

• Work item – A work item is a link to a work item in a workflow engine that is accessible from WebSphere II Content Edition.

• Hyperlink – Hyperlink items can be used to provide pointers to relevant information that is available via a web browser, both internal and external to the enterprise.


Figure 9. A virtual repository created from a variety of content and workflow sources
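The item types above can be pictured as a small object model. The following Java sketch is purely illustrative: `VNode`, `VirtualFolder`, `SmartFolder` and `ContentLink` are hypothetical names, not the product's virtual repository API. It shows two behaviors described above: a smart folder re-runs its saved search each time it is opened, and a content item either follows the latest version or stays pinned to a specific one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical model of virtual repository nodes; names are illustrative only.
interface VNode {
    String name();
}

// A link to content in a back-end repository. A null pinned version means the
// link is dynamic and always resolves to the latest version.
final class ContentLink implements VNode {
    private final String name;
    private final String urn;
    private final Integer pinnedVersion;

    ContentLink(String name, String urn, Integer pinnedVersion) {
        this.name = name;
        this.urn = urn;
        this.pinnedVersion = pinnedVersion;
    }

    public String name() { return name; }

    String urn() { return urn; }

    // Resolves against the repository's current latest version number.
    int resolveVersion(int latestInRepository) {
        return pinnedVersion == null ? latestInRepository : pinnedVersion;
    }
}

// A container that exists only inside the virtual repository.
final class VirtualFolder implements VNode {
    private final String name;
    private final List<VNode> children = new ArrayList<>();

    VirtualFolder(String name) { this.name = name; }

    public String name() { return name; }

    VirtualFolder add(VNode child) {
        children.add(child);
        return this;
    }

    List<VNode> children() { return children; }
}

// A saved federated search whose contents are recomputed on every open.
final class SmartFolder implements VNode {
    private final String name;
    private final String query;
    private final Function<String, List<VNode>> federatedSearch;

    SmartFolder(String name, String query, Function<String, List<VNode>> federatedSearch) {
        this.name = name;
        this.query = query;
        this.federatedSearch = federatedSearch;
    }

    public String name() { return name; }

    // Each call re-executes the search, so results reflect the current state.
    List<VNode> open() { return federatedSearch.apply(query); }
}
```

Passing the search in as a function keeps the folder independent of any particular repository, which mirrors the idea that a smart folder simply stores a query over a defined set of sources.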

Subscription event services

WebSphere Information Integrator Content Edition’s subscription event services provides a subscription-based event notification service for content, folders, work items, work queues, queries, federated queries and item finders. This service enables an application to be notified when the state of a piece of content or a work item changes, or when new content is created in the enterprise. With subscription event services, once an item is subscribed to, one or more event handlers are notified any time a change is made to that item, whether the change is made through WebSphere II Content Edition or through the native tools or APIs of the repository or workflow system.

Like the rest of WebSphere II Content Edition, subscription event services uses a J2EE service-oriented architecture, consisting primarily of a set of message-driven Enterprise JavaBeans that send, receive and process JMS (Java Message Service) messages. JMX (Java Management Extensions) is used for timing and statistics collection. Runtime management and administration are done through a JMX-aware JSP/Servlet web application.

Central to subscription event services are subscriptions and subscription groups. A subscription is a request to be notified of a change to a particular item in a repository. A subscription can be very fine grained, such as one on a specific piece of content or work item, or on changes in the results of a specific query. A subscription can also be broader, such as a request to be notified if the content in a specific folder has changed or if different results are returned from a specific query. No matter their scope, subscriptions are members of a subscription group, which defines how they will be processed in a chain of activities called an event path.

The event path

An event path defines how subscriptions in the group will be processed to determine if an event has occurred, and then how they will be handled once an event occurs. This includes how often to process the subscriptions, what content monitor to use to determine if any change has occurred, what filters to apply to the change events and, finally, what event handlers should be notified of the event. Each step in the event path is handled by one or more message-driven Enterprise JavaBeans responding to JMS messages.

• Timer factory – Java Management Extensions (JMX) service that manages the creation and deletion of subscription group timers in accordance with the current configuration of the system.

• Subscription group timer – JMX service that initiates the processing of a subscription group based on its processing interval, and checks for configuration updates to the subscription group based on its heartbeat interval.

• Subscription group processor – Message-driven Enterprise JavaBean that iterates over the active subscriptions in a group, creating a JMS subscription event for each.

• Content monitor – Message-driven Enterprise JavaBean that employs a content monitor plug-in to determine whether a change has occurred in the subscribed-to items.

• Event filters – Message-driven Enterprise JavaBean that employs one or more event filter plug-ins to determine whether the change that has occurred is interesting and should be propagated to the event handlers.

• Event handlers – Message-driven Enterprise JavaBean that employs one or more event handler plug-ins to take appropriate action on a given subscription event.

Figure 5. The event path through subscription event services
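To make the flow concrete, here is a minimal single-process sketch of one pass along the event path. It assumes hypothetical plug-in interfaces (`ContentMonitor`, `EventFilter`, `EventHandler`) and calls the stages directly; in the product each stage is a message-driven bean connected by JMS messages, and the real plug-in contracts may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical event type: the changed item and a short description.
final class SubscriptionEvent {
    final String itemUrn;
    final String description;

    SubscriptionEvent(String itemUrn, String description) {
        this.itemUrn = itemUrn;
        this.description = description;
    }
}

// Hypothetical plug-in contracts, one per event path stage.
interface ContentMonitor {
    // Returns an event if the subscribed-to item changed since the last check.
    Optional<SubscriptionEvent> detectChange(String itemUrn);
}

interface EventFilter {
    boolean isInteresting(SubscriptionEvent event);
}

interface EventHandler {
    void handle(SubscriptionEvent event);
}

// Plays the role of the subscription group processor for one group.
final class SubscriptionGroupProcessor {
    private final ContentMonitor monitor;
    private final List<EventFilter> filters;
    private final List<EventHandler> handlers;

    SubscriptionGroupProcessor(ContentMonitor monitor, List<EventFilter> filters,
                               List<EventHandler> handlers) {
        this.monitor = monitor;
        this.filters = new ArrayList<>(filters);
        this.handlers = new ArrayList<>(handlers);
    }

    // One processing pass, as the group timer would trigger at each interval.
    // Returns the number of events that reached the handlers.
    int processOnce(List<String> activeSubscriptions) {
        int delivered = 0;
        for (String urn : activeSubscriptions) {
            Optional<SubscriptionEvent> change = monitor.detectChange(urn);
            if (!change.isPresent()) {
                continue; // the monitor saw no change
            }
            SubscriptionEvent event = change.get();
            // Every filter must accept the event for it to propagate.
            boolean interesting = filters.stream().allMatch(f -> f.isInteresting(event));
            if (!interesting) {
                continue;
            }
            for (EventHandler handler : handlers) {
                handler.handle(event);
            }
            delivered++;
        }
        return delivered;
    }
}
```

Separating detection, filtering and handling behind small interfaces is what lets each stage be swapped independently, which is the role the plug-ins play in the real system.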


Customizing subscription event services using plug-ins

Subscription event services is a framework for creating event-driven or event-enabled applications, and as such it employs a plug-in architecture that allows the system to be easily customized and extended. While many application scenarios can be handled with the provided plug-ins, some applications will require custom plug-ins or customizations to the provided plug-ins, all of which come with source code. The system uses three types of plug-ins: content monitor plug-ins, event filter plug-ins, and event handler plug-ins.

A content monitor plug-in must know how to detect change in the subscribed-to item. Sample content monitors are provided for detecting changes to items such as content, folders, work items, work queues, queries, federated queries and item finders. Any kind of change can be detected, such as additions, deletions, metadata property changes, new versions, security modifications and many more. You can modify the sample monitors to suit your own needs or create your own custom content monitors from scratch.

The role of an event filter plug-in is to ensure that only pertinent events are propagated to the event handlers. Many changes may occur in repository or workflow items that are not pertinent to the business context. Numerous types of filters can be created as plug-ins, and sample event filters are provided to:

• Filter on the specific metadata that has changed

• Filter for a specific metadata value

• Filter based on the last time an event was generated for the subscription
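The three sample filter behaviors listed above could look roughly like this. The `ChangeFilter` interface, the `ChangeEvent` fields and the class names are assumptions made for illustration; the provided sample filters ship with their own source and contracts.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical change event: the changed properties with their new values,
// and when the change was observed (epoch milliseconds).
final class ChangeEvent {
    final String subscriptionId;
    final Map<String, String> changedProperties;
    final long observedAtMillis;

    ChangeEvent(String subscriptionId, Map<String, String> changedProperties, long observedAtMillis) {
        this.subscriptionId = subscriptionId;
        this.changedProperties = changedProperties;
        this.observedAtMillis = observedAtMillis;
    }
}

// Hypothetical filter plug-in contract: true means "propagate to handlers".
interface ChangeFilter {
    boolean accept(ChangeEvent event);
}

// Passes only if at least one of the watched metadata properties changed.
final class ChangedPropertyFilter implements ChangeFilter {
    private final Set<String> watched;

    ChangedPropertyFilter(Set<String> watched) { this.watched = watched; }

    public boolean accept(ChangeEvent event) {
        return event.changedProperties.keySet().stream().anyMatch(watched::contains);
    }
}

// Passes only if a property changed to one specific value.
final class PropertyValueFilter implements ChangeFilter {
    private final String property;
    private final String value;

    PropertyValueFilter(String property, String value) {
        this.property = property;
        this.value = value;
    }

    public boolean accept(ChangeEvent event) {
        return value.equals(event.changedProperties.get(property));
    }
}

// Suppresses events that arrive within quietPeriodMillis of the last event
// that was passed for the same subscription.
final class QuietPeriodFilter implements ChangeFilter {
    private final long quietPeriodMillis;
    private final Map<String, Long> lastPassed = new HashMap<>();

    QuietPeriodFilter(long quietPeriodMillis) { this.quietPeriodMillis = quietPeriodMillis; }

    public synchronized boolean accept(ChangeEvent event) {
        Long last = lastPassed.get(event.subscriptionId);
        if (last != null && event.observedAtMillis - last < quietPeriodMillis) {
            return false;
        }
        lastPassed.put(event.subscriptionId, event.observedAtMillis);
        return true;
    }
}
```

Only the third filter is stateful; it remembers, per subscription, when it last let an event through, which is one way to read "based on the last time an event was generated".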

Event handler plug-ins execute appropriate actions when subscription events occur. The types of actions that can be taken are truly open-ended and can include use of the full facilities of the hosting J2EE application server, such as:

• Database access through JDBC

• Business logic access through Enterprise JavaBeans

• Enterprise information system access through the J2EE Connector Architecture

• Sending asynchronous messages via Java Message Service

• Invoking web services with JAX-RPC

• Other key services such as JavaMail, RMI and CORBA

Sample event handlers are provided for sending email notifications, logging event notifications, and persisting event notifications to the subscription so that they can be retrieved later by the application (such as a portal) that was interested in the subscription. Of course, custom event handlers can be created to take the appropriate action for your specific application.
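As an illustration of the handler side, the sketch below shows two handlers in the spirit of the samples mentioned above: one that logs notifications and one that keeps them per subscription for later retrieval. `NotificationHandler`, `LoggingHandler` and `PersistingHandler` are hypothetical names, and the in-memory map stands in for whatever persistent store a real handler would use.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.logging.Logger;

// Hypothetical handler plug-in contract; the event is reduced to a
// subscription id and a message for brevity.
interface NotificationHandler {
    void onEvent(String subscriptionId, String message);
}

// Logs each notification through java.util.logging.
final class LoggingHandler implements NotificationHandler {
    private static final Logger LOG = Logger.getLogger(LoggingHandler.class.getName());

    public void onEvent(String subscriptionId, String message) {
        LOG.info(() -> "subscription " + subscriptionId + ": " + message);
    }
}

// Keeps notifications per subscription so an application (for example a
// portal) can fetch them later; drain() returns and clears what is pending.
final class PersistingHandler implements NotificationHandler {
    private final Map<String, List<String>> inbox = new HashMap<>();

    public synchronized void onEvent(String subscriptionId, String message) {
        inbox.computeIfAbsent(subscriptionId, id -> new ArrayList<>()).add(message);
    }

    synchronized List<String> drain(String subscriptionId) {
        List<String> pending = inbox.remove(subscriptionId);
        return pending == null ? Collections.emptyList() : pending;
    }
}
```

Several handlers can be attached to the same event, so a single change can be both logged and stored for the portal to pick up.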

Statistics

Detailed statistics are captured to JMX beans at each step of the event path. These statistics describe what is occurring in the system, such as:

• How many subscriptions are being processed

• How many changes are occurring

• How many filter and handling events are occurring

• System performance statistics, such as the least, average and most time elapsed per plug-in in monitoring for changes, filtering events and handling events

You can see these statistics in the subscription event services administrator web application or in any JMX-aware management console.
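The least, average and most elapsed times above are simple running aggregates. A minimal sketch, assuming hypothetical `PluginTimings` and `StatsRegistry` classes; a real implementation would expose these values as JMX MBean attributes through `javax.management`, which is omitted here to keep the example short.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Running timing aggregates for one plug-in.
final class PluginTimings {
    private long count;
    private long totalNanos;
    private long minNanos = Long.MAX_VALUE;
    private long maxNanos = Long.MIN_VALUE;

    synchronized void record(long elapsedNanos) {
        count++;
        totalNanos += elapsedNanos;
        minNanos = Math.min(minNanos, elapsedNanos);
        maxNanos = Math.max(maxNanos, elapsedNanos);
    }

    synchronized long count() { return count; }
    synchronized long leastNanos() { return count == 0 ? 0 : minNanos; }
    synchronized long mostNanos() { return count == 0 ? 0 : maxNanos; }
    synchronized double averageNanos() { return count == 0 ? 0.0 : (double) totalNanos / count; }
}

// Keeps one PluginTimings per plug-in name and times invocations.
final class StatsRegistry {
    private final Map<String, PluginTimings> byPlugin = new ConcurrentHashMap<>();

    PluginTimings timingsFor(String pluginName) {
        return byPlugin.computeIfAbsent(pluginName, name -> new PluginTimings());
    }

    // Runs one plug-in call and records its elapsed time, even if it throws.
    <T> T timed(String pluginName, Supplier<T> call) {
        long start = System.nanoTime();
        try {
            return call.get();
        } finally {
            timingsFor(pluginName).record(System.nanoTime() - start);
        }
    }
}
```

Storing a running total and count, rather than every sample, keeps memory constant no matter how many events flow through the path.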

Developer and end user services

All of the integration and federation services described thus far have to be available to the developers and end users who wish to take advantage of them, which brings us to our next stop on the tour. The developer and end user services deliver the capabilities of WebSphere Information Integrator Content Edition to the applications that need them. These services include:

• An out-of-the-box Web client for knowledge workers

• Web components for quickly building custom web applications

• Application programming interfaces

Web client

WebSphere Information Integrator Content Edition includes an out-of-the-box web client that is created using the same web components provided to developers of custom applications. This client, shown in Figure 6, gives users a rich and intuitive console for working with content from multiple back-end repositories, and for arranging that content according to their needs.

Key features provided by the web client allow users to:

• Execute form-based, free-form, and context-sensitive keyword searches

• Save searches

• Explore content by browsing repository folders

• Preview content in browser-ready formats

• Maintain credentials for multiple back-end repositories in the WebSphere II Content Edition single sign-on system or the enterprise Active Directory or LDAP directory

• Watch items with one click ('watch this item'), either for property changes or, in the case of folders, for item additions and deletions

• View alerts created from watched items, with details on the alert and a link to the watched item

• Create private or shared sets of content that link items from multiple repositories. These can be used to create arbitrary, nested groupings of content, such as by project, customer, or work team


Figure 6. WebSphere II Content Edition web client

Whether deployed as is or customized through the web components and the numerous extension mechanisms, the WebSphere Information Integrator Content Edition Web client is a powerful tool for the knowledge worker to bridge the information gap when faced with multiple silos of information in the enterprise.

Web components

Web components are a suite of Web-accessible components that you can use to integrate content-centric features into Web applications. You can integrate them into existing portal and e-business applications, or use them to create sophisticated J2EE applications from scratch. Web components provide access to features from the core WebSphere Information Integrator Content Edition integration platform, such as creating, browsing, viewing, and searching content, as well as to features such as single sign-on, permissions, and virtual repositories. Some things you can do with Web components include:

• Rapid application development using the provided component framework. The Web components are implemented on top of an extensible web component model. It is possible to derive new components that incorporate domain-specific functionality from the existing suite of Web components, or to create entirely new components. Components communicate with one another using a flexible event mechanism and therefore support rapidly creating a custom solution using a mixture of communicating out-of-the-box, customized and custom Web components.

• Incorporate them into existing J2EE applications. The Web components can run side-by-side with other Servlets, JavaServer Pages and other J2EE server components in any existing J2EE application.

• Incorporate them as Java Portlets into a Java Portal. The Web components can be accessed individually or in coordinated groups from a Java Portal (a JSR 168-compliant portal) as Java Portlets.

• Incorporate them into remote or non-J2EE web applications. You can integrate Web components into ASP, PHP, Perl, CGI or other web applications. When used in this fashion, you deploy individual web components at known URLs in a J2EE application server or Servlet engine. The non-Java application can then invoke a component at its URL, furnishing the necessary parameters in the HTTP request. The application can display the response directly, or parse the response and present it in some other form.

All Web components have a completely customizable look and feel. A Web component is a logical concept rather than a single physical entity. Each Web component is implemented in the Model-View-Controller (MVC) paradigm, in which presentation logic is separated from business logic; this dramatically improves maintainability. A JavaBean serves as the model, an Apache Struts-based event model acts as the controller, and one or more JavaServer Pages (JSP) and eXtensible Stylesheet Language Transformations (XSLT) templates provide the view.

Each component provides a default user interface that may be used out-of-the-box or easily modified for more complex requirements. The user interface for each Web component is controlled entirely by one or more source-licensed templates, most of which require only HTML and cascading style sheet experience to modify. Even more dramatic changes can be made with some knowledge of JavaServer Pages, XSLT and ECMAScript (JavaScript). The out-of-the-box templates produce HTML and ECMAScript as the result of the view transformation, but alternative views such as WML or XML can be rendered.

Many Web components provide access to repository content, content management and workflow functionality. The remaining components provide web-based user authentication and supplemental access controls. The Web components use the same APIs that are available for custom application development; in doing so, they provide a higher layer of abstraction for quick and easy development of web applications. The following is a sample of the functionality provided:

• Repository navigation (browse) – Represents a hierarchical "tree" view of content folders, work queues and in-boxes providing methods for selecting, expanding, and contracting the hierarchical organizational structures in repositories.


• Search – Various search components are provided to support many kinds of searches, such as simple parametric search, web-style search, full-text search, template-driven search forms and advanced search builders. Searches may also be created and saved in virtual repositories for later use with these components.

• List – Displays the contents of a repository or virtual repository folder, a work queue or in-box or the results from a search.

• Content creator – Allows content to be created or copied into a repository.

• Work item creator – Allows new workflow processes to be initiated.

• Update – Allows users to update content metadata and content versions.

• View – Allows users to view browser-ready conversions of document or image content.

• Content security – Allows repository-level content security to be updated and administered.
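The template-driven view layer described earlier, where the view is produced by transforming model data through a swappable template, can be illustrated with the JDK's standard `javax.xml.transform` API. This is a generic sketch, not one of the product's source-licensed templates: the `TemplateView` class, the `<items>` model format and the stylesheet are assumptions.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Renders model data (serialized as XML) through an XSLT template.
final class TemplateView {
    // Example template producing an HTML list of item titles.
    static final String HTML_LIST_XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/items'><ul><xsl:for-each select='item'>"
      + "<li><xsl:value-of select='@title'/></li>"
      + "</xsl:for-each></ul></xsl:template>"
      + "</xsl:stylesheet>";

    static String render(String modelXml, String xslt) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(modelXml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("view rendering failed", e);
        }
    }
}
```

The model and the rendering logic never change when the look and feel does; rendering WML or XML instead of HTML would only require passing a different stylesheet to `render`.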

Application programming interfaces

Whether you are customizing an existing packaged application or developing solutions from scratch, you need a rich set of APIs tailored to your own development technologies to help you get the job done. WebSphere Information Integrator Content Edition provides a selection of APIs for building applications that need access to integrated, virtualized content, and to content management and workflow capabilities.

Integration API

The integration API is an easily mastered object-oriented Java API that provides content management and workflow capabilities to your applications. The API and the implementation classes of the API do not change based on the repository providing the services being delivered by the API. This means you can create content-enabled or content management applications that access multiple repositories simultaneously for federation, and that are portable across repositories used for different purposes or from different vendors.
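The portability described above rests on application code depending only on a repository-neutral interface. A minimal sketch of that idea, assuming a hypothetical `ContentRepository` interface and a toy in-memory implementation standing in for a vendor connector; these are not the actual integration API types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical repository-neutral interface; each connector would supply
// its own implementation.
interface ContentRepository {
    String id();
    List<String> search(String keyword); // returns matching item URNs
}

// Toy in-memory "connector" standing in for a vendor-specific one.
final class InMemoryRepository implements ContentRepository {
    private final String id;
    private final Map<String, String> titlesByUrn;

    InMemoryRepository(String id, Map<String, String> titlesByUrn) {
        this.id = id;
        this.titlesByUrn = titlesByUrn;
    }

    public String id() { return id; }

    // Case-insensitive keyword match on titles.
    public List<String> search(String keyword) {
        String k = keyword.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> entry : titlesByUrn.entrySet()) {
            if (entry.getValue().toLowerCase(Locale.ROOT).contains(k)) {
                hits.add(entry.getKey());
            }
        }
        return hits;
    }
}

// Application code written once against the interface works unchanged,
// whichever repositories are plugged in, and can span several at once.
final class FederatedClient {
    static List<String> searchAll(List<ContentRepository> repositories, String keyword) {
        List<String> all = new ArrayList<>();
        for (ContentRepository repository : repositories) {
            all.addAll(repository.search(keyword));
        }
        return all;
    }
}
```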

Virtual repository API

The virtual repository API is a Java API that is built on top of the various virtual repository services and the integration API. The virtual repository API defines a "virtual repository" metaphor for working with managed content, where each virtual repository contains links to content and workflow objects in disparate back-end systems, displayed and organized in an application-centric way. Through this API, applications can:

• Create virtual repositories

• Add links within virtual repositories to content on the Internet or to content accessible by WebSphere Information Integrator Content Edition

• Save virtual repositories persistently

Subscription event services API

The subscription event services API is an easy-to-use Java API for creating and managing subscriptions. Once a reference to the WebSphere II Content Edition item of interest is retrieved with the integration API, the reference is simply passed to the subscription event services API to create a subscription for that item. Subscriptions are created in a specific subscription group, which determines how and how often the subscription will be processed. Subscriptions can have extensive metadata associated with them that is used in subscription processing. This typically includes information that the event handlers use when sending notifications of subscription events, for example, email addresses, phone or fax numbers, instant messaging identifiers and URLs.
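The relationship between items, subscriptions, groups and handler metadata can be sketched as follows. `SubscriptionGroup`, `Subscription` and the `subscribe` method are hypothetical; the item is identified here by a plain string URN rather than by the reference object the integration API would return.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical group: decides how often its member subscriptions are processed.
final class SubscriptionGroup {
    final String name;
    final long processingIntervalSeconds;
    private final List<Subscription> members = new ArrayList<>();

    SubscriptionGroup(String name, long processingIntervalSeconds) {
        this.name = name;
        this.processingIntervalSeconds = processingIntervalSeconds;
    }

    // Creates a subscription for an item inside this group.
    synchronized Subscription subscribe(String itemUrn, Map<String, String> metadata) {
        Subscription subscription = new Subscription(itemUrn, this, metadata);
        members.add(subscription);
        return subscription;
    }

    synchronized List<Subscription> members() {
        return Collections.unmodifiableList(new ArrayList<>(members));
    }
}

// Hypothetical subscription: the item, its group, and handler-facing metadata
// such as an email address. The metadata is copied so later changes by the
// caller do not leak in.
final class Subscription {
    final String itemUrn;
    final SubscriptionGroup group;
    final Map<String, String> metadata;

    Subscription(String itemUrn, SubscriptionGroup group, Map<String, String> metadata) {
        this.itemUrn = itemUrn;
        this.group = group;
        this.metadata = Collections.unmodifiableMap(new HashMap<>(metadata));
    }
}
```

For example, `hourly.subscribe("urn:example:repoA:42", Map.of("email", "notify@example.com"))` would place the subscription in a group processed every hour, and an email handler would read the `email` entry when a change is detected. The URN and address are made up for the example.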

Web services API

The Web Services API (WSAPI) allows enterprise business applications to be integrated with content and content management services that are accessed over the Internet or a WAN and through firewalls. WSAPI exposes a set of services very similar to that of the Java-based integration API, but uses the SOAP and WSDL standards and is accessible over HTTP, HTTP/TLS and SSL. WSAPI is interoperable with Microsoft .NET and Java SOAP development tools; its interfaces are described using the Web Services Description Language (WSDL).

URL addressability

Within WebSphere II Content Edition, all items, such as content, folders, work items, and queues, have a Uniform Resource Name (URN) as a unique identifier. That identifier can be used at any time to retrieve the item through a REST (Representational State Transfer)-style web service called URL Addressability. With the URN provided by WebSphere II Content Edition, an application can construct a simple URL to retrieve any item through a standard HTTP (Web) request to the WebSphere II Content Edition server, as long as the user or application has sufficient security rights on the underlying content. Both the native (binary) content and XML representations of the item and its metadata can be retrieved. In the XML representation, other repository items are referenced by their URL using XLink.

Very simple, loosely coupled integrations can be created between WebSphere Information Integrator Content Edition and other applications using this REST architecture. This facilitates integrations that are not aware of the WebSphere II Content Edition integration API, including using the URL in a hyperlink tag in HTML, sending the URL in email, or storing and using the URL in any other application that can handle URLs or HTML.
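Because retrieval is a plain HTTP GET, any HTTP client can use URL Addressability. The sketch below uses the JDK's `java.net.http` client (Java 11 and later). The base URL, the `urn` and `format` query parameters and the sample URN are placeholders invented for illustration; consult the product documentation for the actual URL layout. Authentication, which the server enforces against the underlying repository, is not shown.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

final class ItemUrls {
    enum Representation { NATIVE, XML }

    // Builds a retrieval URL from an item's URN. The endpoint and parameter
    // names are hypothetical placeholders, not the product's real ones.
    static String urlFor(String baseUrl, String urn, Representation representation) {
        String url = baseUrl + "?urn=" + URLEncoder.encode(urn, StandardCharsets.UTF_8);
        return representation == Representation.XML ? url + "&format=xml" : url;
    }

    // Plain HTTP GET; a browser link or any other HTTP client would do the same.
    static byte[] fetch(String url) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<byte[]> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofByteArray());
        if (response.statusCode() != 200) {
            throw new IOException("HTTP " + response.statusCode() + " for " + url);
        }
        return response.body();
    }
}
```

Percent-encoding the URN keeps characters such as `:` from being misread as URL syntax, so the resulting string can be pasted into an HTML link or an email unchanged.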

Connector SDK

To create connectors to repositories or workflow engines for which IBM does not provide a connector, you can use the connector software development kit (SDK) provided with WebSphere II Content Edition. You can use the connector SDK to extend existing connectors, or to rapidly develop and implement new connectors to other systems, such as electronic document management, imaging, enterprise report management, web content management, PLM, structured or semi-structured data sources and custom repositories. The connector SDK provides access to the connector service provider interface (SPI). The SPI defines the responsibilities of a connector in mediating between WebSphere II Content Edition and a content management system or workflow engine, and allows the connector to communicate the capabilities of the underlying system to WebSphere II Content Edition.
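One way to picture a connector communicating the capabilities of its back end is a capability set that the integration layer checks before dispatching a request. The `ConnectorSpi` interface, the `Capability` values and `CapabilityGuard` are hypothetical stand-ins, not the connector SPI shipped with the SDK.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical capability vocabulary a connector could report.
enum Capability { SEARCH, FULL_TEXT_SEARCH, VERSIONING, CHECK_OUT, WORKFLOW }

// Hypothetical SPI a connector implements.
interface ConnectorSpi {
    String systemName();
    Set<Capability> capabilities();
}

// Example connector for an imaging system that supports search and
// versioning but no workflow.
final class ImagingConnector implements ConnectorSpi {
    public String systemName() { return "imaging"; }

    public Set<Capability> capabilities() {
        return EnumSet.of(Capability.SEARCH, Capability.VERSIONING);
    }
}

// The integration layer consults the reported capabilities before
// dispatching, so callers get a clear error instead of a vendor failure.
final class CapabilityGuard {
    static void require(ConnectorSpi connector, Capability needed) {
        if (!connector.capabilities().contains(needed)) {
            throw new UnsupportedOperationException(
                    connector.systemName() + " does not support " + needed);
        }
    }
}
```

A federated operation could use the same check to skip repositories that lack a capability (for example, full-text search) rather than failing the whole request.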

Security

The tour would not be complete without addressing the important question of security, which brings us to our final topic. With WebSphere II Content Edition, user authentication and subsequent authorization are controlled by the underlying data sources, and their security model is in force and respected at all times. The connector creates a session for each user with each data source, using the data source’s own authentication mechanism. Authorization of access to the information in the data source is then controlled for that session by the data source.

In general, this security philosophy is the best approach to ensuring that the investment made in securing the underlying data sources is preserved; however, it has limitations when dealing with multiple repositories in a federated environment. One issue is that users will be asked to authenticate themselves to each individual data source to accomplish a federated operation, such as a federated query or copying content. A second issue surfaces when dealing with virtual repositories created to support a new business initiative. Often the security requirements of the new business initiative won’t match the security model and permissions in place on the data in the real repositories. In fact, there may be a completely different user community than the one the real repository is aware of, such as a group of users from a partner extranet.

To solve these two issues, WebSphere II Content Edition provides two supplemental security systems: the authentication system and the authorization system. Use of these systems is optional; they supplement, rather than supplant, the existing security infrastructure in place with each of the existing data sources.

Authentication system

The WebSphere Information Integrator Content Edition authentication system provides single sign-on authentication to users in a multi-user, multi-repository application environment. Single sign-on means that the user authenticates once to the single sign-on system, which then handles subsequent authentication to each data source being accessed, rather than the user authenticating individually to each one. The authentication system includes a service provider interface (SPI) for creating pluggable implementations that interface with existing authentication mechanisms such as Active Directory, LDAP or Kerberos. Implementations are available out-of-the-box for Active Directory and LDAP, as well as a reference implementation that requires no external naming directory. Each of these provided implementations uses a secure “password vault” approach to single sign-on, ensuring that single sign-on is available across all data sources. Other approaches are also viable and are supported by the pluggable nature of the system.
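The password vault idea can be sketched as an encrypted map from (user, repository) pairs to credentials: once the user has signed on, the system looks up stored credentials instead of prompting for each data source. The `PasswordVault` class is hypothetical and only illustrates the concept with the JDK's AES-GCM support; it is not the product's implementation, and key management, persistence and directory integration are left out.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.HashMap;
import java.util.Map;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical vault: per-(user, repository) credentials encrypted with
// AES-GCM under a single vault key.
final class PasswordVault {
    private static final String TRANSFORMATION = "AES/GCM/NoPadding";
    private static final int IV_BYTES = 12;   // recommended GCM nonce size
    private static final int TAG_BITS = 128;  // authentication tag length

    private final SecretKey key;
    private final SecureRandom random = new SecureRandom();
    private final Map<String, byte[]> sealed = new HashMap<>();

    PasswordVault(SecretKey key) { this.key = key; }

    static SecretKey newKey() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(256);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("AES unavailable", e);
        }
    }

    private static String slot(String user, String repository) {
        return user + "\0" + repository;
    }

    // Encrypts the secret with a fresh random nonce and stores nonce || ciphertext.
    synchronized void store(String user, String repository, String secret) {
        try {
            byte[] iv = new byte[IV_BYTES];
            random.nextBytes(iv);
            Cipher cipher = Cipher.getInstance(TRANSFORMATION);
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] ciphertext = cipher.doFinal(secret.getBytes(StandardCharsets.UTF_8));
            byte[] blob = ByteBuffer.allocate(IV_BYTES + ciphertext.length)
                    .put(iv).put(ciphertext).array();
            sealed.put(slot(user, repository), blob);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("encryption failed", e);
        }
    }

    // Returns the decrypted secret, or null if nothing is stored for the pair.
    synchronized String lookup(String user, String repository) {
        byte[] blob = sealed.get(slot(user, repository));
        if (blob == null) {
            return null;
        }
        try {
            Cipher cipher = Cipher.getInstance(TRANSFORMATION);
            cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, blob, 0, IV_BYTES));
            byte[] plaintext = cipher.doFinal(blob, IV_BYTES, blob.length - IV_BYTES);
            return new String(plaintext, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("decryption failed or data was tampered with", e);
        }
    }
}
```

GCM authenticates as well as encrypts, so a tampered entry fails on lookup instead of silently yielding a wrong password.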


Authorization system

In cases where the access controls provided by data sources are not sufficient or are not specific to the task at hand, the authorization system provides fine-grained access control over virtual repository nodes and other application objects. This access control supplements the repository access controls that are already in place, and also covers application objects that exist outside a secured repository. The authorization system includes a service provider interface (SPI) for implementing arbitrary permission models, and a reference implementation of the SPI that provides a hierarchical, rules-based permission model able to permission any defined hierarchy of actions over any hierarchy of objects, with many different rule types supported. Because the authorization system is very extensible, it is also rather complex; the recommendation is to use it to create application-specific security models that expose just the aspects of the authorization system needed for the access controls the application requires.
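To illustrate what a hierarchical, rules-based permission model can mean in practice, the sketch below attaches rules to nodes of an object path and lets a higher action imply lower ones (ADMIN implies WRITE, which implies READ). The `PermissionModel` class, its nearest-rule-wins evaluation and its default-deny policy are assumptions chosen for illustration; the reference implementation supports many more rule types.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Action hierarchy: a later constant implies every earlier one.
enum Action {
    READ, WRITE, ADMIN;

    boolean implies(Action other) { return this.ordinal() >= other.ordinal(); }
}

final class PermissionModel {
    // path -> (principal -> highest action allowed at that node)
    private final Map<String, Map<String, Action>> allowRules = new HashMap<>();
    // path -> principals explicitly denied at that node
    private final Map<String, Set<String>> denyRules = new HashMap<>();

    void allow(String path, String principal, Action action) {
        allowRules.computeIfAbsent(path, p -> new HashMap<>()).put(principal, action);
    }

    void deny(String path, String principal) {
        denyRules.computeIfAbsent(path, p -> new HashSet<>()).add(principal);
    }

    // Walks from the object up to the root; the first node with a rule for
    // the principal decides, with deny checked before allow at each node.
    boolean isPermitted(String principal, String path, Action requested) {
        for (String p = path; p != null; p = parent(p)) {
            if (denyRules.getOrDefault(p, Set.of()).contains(principal)) {
                return false;
            }
            Action granted = allowRules.getOrDefault(p, Map.of()).get(principal);
            if (granted != null) {
                return granted.implies(requested);
            }
        }
        return false; // no rule anywhere on the path: default deny
    }

    // "/a/b" -> "/a", "/a" -> "/", "/" -> null
    private static String parent(String path) {
        if (path.equals("/")) {
            return null;
        }
        int i = path.lastIndexOf('/');
        return i <= 0 ? "/" : path.substring(0, i);
    }
}
```

For example, granting a partner group READ on `/projects` while denying it on `/projects/acme/legal` gives the extranet users access to project material without exposing legal documents, regardless of what permissions the underlying repositories hold.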

Summary

Real-time access to all forms of information, regardless of where it resides, is essential to many organizations. That means being able to leverage existing enterprise information to support new initiatives, such as portals, collaborative commerce, customer relationship management, records management and other key applications. Yet organizations have inadvertently created silos of data and automation that inhibit employees and business processes from accessing information when needed, because multiple, different systems are used throughout the enterprise to store and manage information.

Industry analysts say that 80% of data captured and stored within organizations is in the form of unstructured data (documents, images, reports, product diagrams, Web pages, audio, video, and other content), so an application will often need a content-oriented access paradigm for working with the information stored across the enterprise. While data integration tools such as WebSphere Information Integrator have succeeded in enabling better access to enterprise information using a relational (SQL) data paradigm, there is still the issue of providing access to that same information using the semantics more common to unstructured content, such as object-oriented content management APIs and content-oriented user interfaces. Working stand-alone or in conjunction with WebSphere Information Integrator, WebSphere Information Integrator Content Edition creates an environment where employees, customers, and business partners have access to all the information they need in real time.

Resources

Additional information on WebSphere Information Integrator Content Edition can be found at http://www.ibm.com/software/data/integration/db2ii/editions_content.html.


About the author

Sean Johnson is a lead architect on the WebSphere Information Integrator Content Edition team. Sean has been a lead architect on the WebSphere II Content Edition product since its inception and has 10 years of experience in the information integration, content management and workflow markets.