Directions in Web Content Management

An Overview and Primer for Business and Information Technology Managers

Bart Miller

Dennis David

Burntsand Inc.

Copyright © 2002 Burntsand Inc. All Rights Reserved.


Table of Contents

1. INTRODUCTION
   PURPOSE
   SCOPE
2. SOLUTION DESIGN
   THE SOLUTION ARCHITECTURE
3. CONTENT CREATION AND CONTENT MANAGEMENT
   XML VERSUS HTML
      When to use XML
      When to use HTML
   XML FEATURES
      XML Support
      Re-use
      Chunking
      Specialized XMLs
      XML Transformation
   CONTENT CREATION
      Authoring Tools
   WORKFLOW
      Flexibility
      Instant Publishing
      Handling Exceptions
   SECURITY IN CREATION AND MANAGEMENT
   LOCALIZATION
      What are Globalization, Regionalization, and Localization?
      Multi-language Content Relationships
4. DELIVERY AND PRESENTATION
   DELIVERY
   PRESENTATION
      Personalization
      Publishing to other media
      Search
      Ensuring performance
5. SYNDICATION
   FORMAT "STANDARDIZATION"
   FEATURES
6. FUTURE DIRECTIONS
   PORTALS AND PORTLETS
   ALL JAVA; ALL MICROSOFT
   DIGITAL ASSET MANAGEMENT AND DATA DRIVEN IMAGES
7. SUMMARY


1. Introduction

Business owners have become swamped with requests to get information across different countries and cultures to reach employees, customers, partners, and suppliers. Business managers and IT directors have responded by placing huge amounts of information on the Web. The pressure to make existing information accessible has resulted in companies launching a hurried hodgepodge of Web initiatives, often department by department. One-off departmental solutions threaten the company’s ability to ensure a consistent corporate brand, increase administration time and cost, and often force the departmental Web teams to be experts in writing, presentation, and code.

Business managers need to get product information to customers, the sales force, partners, and suppliers. Marketing executives wish to ensure communication of a consistent corporate message. Geographically dispersed branch offices demand a localized message catering to the culture, language, pricing, and specialized solution requirements of their locale. The legal and compliance team shares concerns regarding security and accountability. The IT group is asked to coordinate content feeds from disparate departments and systems from both inside and outside the organization, administer the systems, manage content, code, and presentation updates, and plan for the future.

Many of the solutions embarked upon in the last couple of years have been departmental point solutions solving a particular business manager’s immediate need for better communication. This has been at the expense of long term planning, efficient business process, and a unified message. Without a solution that can centralize management, while enabling decentralized and flexible content creation, these companies are crippled. The evolving marketplace places value in the ability to deliver a rich, dynamic, and personalized experience. With planning, there are solutions available that allow a large company to be nimble and a smaller company to have a publishing platform that will be strong today and into the future.

Purpose

This paper is intended for hands-on business managers and business-savvy IT directors seeking information on the critical technical and process issues, and the current solutions, for managing their Web assets into the future. This paper avoids "bleeding edge" technology and instead draws on leading edge experience in creating, managing, and publishing words, graphics, presentation, and code to the Web. Best practices are recommended for conquering the major enterprise Web content management issues of today and into the future.

Scope

The top Web content management issues are examined at a level that business managers and IT directors will find useful to begin planning an enterprise Web content management solution.


2. Solution Design

A dynamic, localized Web experience has become de rigueur.

Customers, suppliers, partners, and employees expect to see current information that is relevant to them in a way that respects their culture and language and your relationship with them. Suppliers, partners and customers require up-to-date product literature. Current prices and products available in one’s region are expected. Content flows in from partners and suppliers and content creators both internal and external to the company. Syndicated industry news feeds, contributions from authors outside of the company, and parts specifications all need to be managed with a flexible process and appropriate security. Today the organization must provide content that is relevant to interested parties based on the context of the engagement. Press releases must be presented on the site according to a schedule or, at times, instantly. Performance to one’s Web browser must be fast.

Business processes associated with content creation and management are established to support the business owners’ requirements and to facilitate the administration of the Websites. Separation of code, content, and presentation allows authors to write, coders to code, and the creative team to concentrate on look and feel. Content is likely to be pulled from multiple repositories and databases, but should be "searchable" through one interface. The functionality of legacy applications should also be surfaced through a common Web portal interface.

Today these are the base requirements for a Web experience. Any solution should be designed from the start to meet these requirements or provide a platform that will enable modular addition of this and other future functionality. For example, an organization might not be ready to add multi-language content, but the solution designed should have the facility to do so when the organization is ready to act. Personalization, central management of multiple repositories, and syndication are all examples of functionality that a solution should be able to handle today whether or not they are utilized in the first phases of the system.


The Solution Architecture

A couple of standard architectures are the best choices for designing a Web content management and publishing solution for today’s requirements. In the past, departmental solutions pulled all content from the department’s file-based system (see Figure 1-1). The enterprise today demands the ability to pull content from multiple departmental content management systems as well as from databases and perhaps from syndicated sources. Today’s enterprise solutions are n-tiered solutions utilizing a strong Web content management system (WCM) such as Documentum, a database, and either a Java application server or the Microsoft .NET platform, with access to multiple content stores, databases, legacy applications, and syndicated content.

A Java application server such as BEA WebLogic or IBM WebSphere, or a Microsoft .NET server, is used to dynamically assemble the content and present it to the users (see Figure 1-2). Presentation of dynamic, personalized, or localized content does not mean that every request needs to be passed all the way back to the content management system. Caching of the content may be done at the application server level as well as at a level above this with a content distribution product such as Akamai or Marimba. This content is stored and managed in a WCM, such as Documentum, and, perhaps, other repositories such as a database, file system, or even a "home grown" legacy content management system. JavaServer Pages (JSPs), Active Server Pages (ASPs, on Windows), and other code used to assemble and present the content on an application server can be stored in a content management system. However, it is usually better to maintain a separate code repository using a version control system such as the Concurrent Versions System (CVS) or Visual SourceSafe, because content and code behave and are utilized in different ways.
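The caching point can be illustrated with a minimal sketch. It assumes a hypothetical `ContentCache` class sitting in the application tier in front of whatever function fetches content from the repository; the class name, the time-to-live policy, and the `fetch` callback are illustrative assumptions, not features of any product named in this paper.

```python
import time


class ContentCache:
    """Hypothetical application-tier cache in front of a content repository."""

    def __init__(self, fetch, ttl_seconds, clock=time.monotonic):
        self._fetch = fetch          # function that goes back to the repository
        self._ttl = ttl_seconds      # how long a cached copy stays fresh
        self._clock = clock          # injectable clock, useful for testing
        self._entries = {}           # key -> (time stored, content)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]          # fresh copy: no trip to the repository
        content = self._fetch(key)   # missing or stale: fetch and remember
        self._entries[key] = (now, content)
        return content
```

With a 60-second time-to-live, repeated requests for the same page within a minute are served from memory, and only the first request (or the first after expiry) reaches the content management system.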

A technology solution that is thoughtfully designed works hand-in-glove with the business process it is required to support. Effective Web Content Management should encompass the processes of content creation, content management, content publishing, and content presentation.


Figure 1-2 Enterprise Web Content Management


3. Content Creation and Content Management

Content creation is generally considered part of content management, but the two are separated here to highlight the criticality of content creation issues. Content creation, as treated here, includes not only the tools used to write and capture content, but also the format in which the content is saved. Content management refers to all the business processes required to shepherd content from its origin to its retirement and includes important constructs such as workflow, security, and the lifecycle.

One of the most important choices in Web content creation is the format in which writers create content. All writers should be able to create content in a template-based, code-free environment. XML and HTML (Extensible Markup Language and Hypertext Markup Language, respectively) are the most common creation formats today. The choice between XML and HTML is examined in the section below, but it bears noting that XML provides the flexibility that facilitates personalization, syndication, commerce, and sharing of content with partners. The format choice for content also touches on one of the most important content management issues of today: the separation of text, images, presentation (layout), and code. Separation allows people to do what they do best. Writers should be masters of writing on their topic. The creative team should concentrate only on the look and feel of the Website. This leaves technical folks to concentrate on coding applications, making personalization and commerce work, and administering the applications. Marketing experts and business owners should be able to set up scenarios and manage the strategy for the overall site. Too many organizations have writers writing around code and have business managers playing with HTML. The tools exist today to facilitate a separation of code and content, and organizations that use these tools achieve a return on their investment quite quickly.

XML versus HTML

How should the text that writers author be stored?

This simple question points to issues in many areas, from how the content will be used (Only on the Web? Sent to wireless devices? Will it be syndicated out? Will parts be re-used over and over? Will it be translated?) to questions about the authors (Will all the writing be done within the Intranet? Will there be third party writers? Do the writers specialize in one type of content or are they generalists?). Answers to these questions and some others help us determine whether XML or HTML is the best choice for saving Web content (see Table 3-1).

XML and HTML appear similar to the casual eye having both evolved from SGML (Standard Generalized Markup Language – first adopted widely for technical documentation associated with complex products such as jet aircraft). The big difference is that HTML has evolved into a markup language that describes the look and feel of a Web page. XML on the other hand does not describe how the page looks but rather it defines the words that make up the content. This separation of structure and display has some important advantages in making XML more portable and usable in many different types of applications.


Another distinction regarding HTML is that it is a specific markup language that contains a fixed set of elements and attributes. HTML has a limited repertoire of structural tags like headings, lists, and links, some tags for encoding formatting information like text attributes and layout, and very few tags for encoding types of information content. Content creators have resorted to using elements and tags far beyond their original intention making it difficult for search engines and automated processes to exploit Web information because of the lack of reliable semantic encoding.

XML can address these limitations and give the Web a much stronger capability for electronic commerce. Because XML is not limited to a fixed set of elements and attributes, any element and tag combination can be defined in an XML schema or DTD (Document Type Definition – essentially, a set of rules for document structure) and can be as simple or complex as required. Making these new structures or elements usable requires a common schema between the creators and consumers of the content. Certain industry groups, such as the Air Transport Association, and regulatory bodies, for example the Federal Aviation Administration, were some of the first to define standard schemas and tags enabling companies to easily and reliably exchange information.

When to use XML

XML does not describe how a page looks, how it acts or what it does. XML describes the content it contains. This distinction results in a document that is portable, making XML documents easier to re-use and to share with other organizations and entities. In addition, personalization and localization are facilitated as the content is strictly described with tags as well as metadata. XML is preferred for commerce and syndication as well.

The rules defining an XML document are contained in an XML schema or DTD. The schema may define the common business rules of the organization or some organizing body. At any time during the editing process the XML document can be validated against this schema. The rules cover not only the valid elements but also their relationship, their order and whether they are required. There are no <Company_Description> or <Sales_Price> tags in HTML, but these tags can be created with XML, providing rich content description. Specialized XMLs may provide a lingua franca for specific industries or types of documents. Because of the capability to produce and enforce strict document structures, these elements can be chunked into separate objects that can easily be shared. Different individuals can also edit a document consisting of separate objects concurrently.
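The kind of checking a schema performs (valid elements, their order, and whether they are required) can be sketched in a few lines. This is a hand-rolled illustration, not an XML Schema or DTD validator: the `PRODUCT_RULES` list, the `<Product>` root element, and the `validate` function are assumptions made for the example, and a real project would validate against an actual schema with a schema-aware parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule set: child elements in the order they must appear,
# each paired with a flag saying whether the element is required.
PRODUCT_RULES = [
    ("Company_Description", True),
    ("Sales_Price", True),
    ("Legal_Disclaimer", False),
]


def validate(xml_text, root_tag, rules):
    """Return a list of rule violations (an empty list means valid)."""
    errors = []
    root = ET.fromstring(xml_text)
    if root.tag != root_tag:
        errors.append(f"root element must be <{root_tag}>")
    allowed = [name for name, _ in rules]
    seen = [child.tag for child in root]
    for tag in seen:
        if tag not in allowed:
            errors.append(f"unexpected element <{tag}>")
    for name, required in rules:
        if required and name not in seen:
            errors.append(f"missing required element <{name}>")
    known = [tag for tag in seen if tag in allowed]
    if known != sorted(known, key=allowed.index):
        errors.append("elements out of order")
    return errors
```

An editor could run a check of this kind each time the author saves, reporting problems while the document is still being written rather than at publishing time.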

A Website that is dynamic, relies on syndicating content, re-uses chunks of content in different documents, or is heavily weighted to commerce will benefit greatly from the use of XML.

When to use HTML

Because most currently used browsers only support HTML, XML intended for the Web needs to be translated to HTML, using an XSL style sheet, prior to its placement on the Website. Of course, creating the content in HTML eliminates this step. Because HTML has a fixed set of elements it does not require a separate rules file or schema. The syntax is more or less consistent regardless of the browser being used.

As a result of its simplicity, HTML is easy to create and is faster to process. A static Website is a site whose content will rarely be updated, changed according to a user’s behavior or profile, shared, or utilized for commerce. Today, HTML is often a better choice for organizations with a small amount of content that changes infrequently.


Table 3-1 Choosing XML Or HTML For Web Content

XML Features

XML Support

An important development in the XML standard is the replacement of the DTD (a legacy of SGML) with XML Schemas for defining datatypes. Schemas offer a range of new features over the DTD including:

• Richer pre-defined datatypes. Booleans, numbers, dates and times, URIs, integers, decimal numbers, real numbers, intervals of time, etc. Accepted standards for pre-defined datatypes mean that my definition of something called <Date> is the same as your definition.

• In addition to these simple, predefined types, there are facilities for creating other types and aggregate types. XML does not restrict users to pre-defined types. Thus the Walt Disney Company might create a tag called <Mouse_Ears> that is useful and understood throughout the firm.

• User-defined types. Custom named data types can be created that are inherited from what are called archetypes.

• Attribute grouping. This allows common attributes that apply to all elements in a schema to be explicitly assigned as a group.

• Namespace support. This allows the co-existence of multiple schemas without name conflicts between those schemas.
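Namespace support is perhaps easiest to see in a small example. The sketch below assumes two hypothetical vocabularies, identified by the made-up URIs `urn:example:finance` and `urn:example:hr`, that each define an element called <Date>; the namespace prefix keeps the two from colliding inside one document.

```python
import xml.etree.ElementTree as ET

# One document mixing two hypothetical vocabularies that both define <Date>.
DOC = (
    '<Record xmlns:fin="urn:example:finance" xmlns:hr="urn:example:hr">'
    "<fin:Date>2002-03-01</fin:Date>"
    "<hr:Date>2002-01-15</hr:Date>"
    "</Record>"
)

# Prefix-to-URI map used when searching; the URIs are what actually matter.
NS = {"fin": "urn:example:finance", "hr": "urn:example:hr"}

root = ET.fromstring(DOC)
invoice_date = root.find("fin:Date", NS).text   # the finance meaning of Date
hire_date = root.find("hr:Date", NS).text       # the HR meaning of Date
```

Each <Date> is resolved against its own vocabulary, so an application reading the finance elements never mistakes the HR date for an invoice date.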

Re-use

One of the features facilitated by XML is re-use. A blurb identified as <Legal_Disclaimer> can be managed separately from all the documents to which it is attached. Thus, when the legal team finds that there is a reason to update all the disclaimers with a new phrase, any document using the <Legal_Disclaimer> tag will inherit the latest legal blurb. The re-use of XML can be troublesome if the content is not well managed. When objects are nested within other objects and each is shared across the organization, business rules governing the modification of shared objects need to be well planned. Scenarios where one part of the organization accepts a change while another part does not will result in new object creation and a severing of the sharing. Central management of company-specific XML tags, such as <Mouse_Ears>, is recommended.
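As a rough sketch of how such re-use might work, the example below keeps the disclaimer in one shared store and resolves references to it when a document is assembled. The <Include ref="..."/> placeholder element, the `shared` dictionary, and the `assemble` function are assumptions made for illustration; a content management system would supply its own linking mechanism.

```python
import xml.etree.ElementTree as ET


def assemble(doc_xml, shared):
    """Replace each <Include ref="name"/> placeholder with the shared object.

    `shared` maps an object name to its XML text. Nested includes inside
    shared objects are not resolved in this simplified sketch.
    """
    root = ET.fromstring(doc_xml)
    for parent in list(root.iter()):
        for index, child in enumerate(list(parent)):
            if child.tag == "Include" and child.get("ref") in shared:
                replacement = ET.fromstring(shared[child.get("ref")])
                replacement.tail = child.tail
                parent[index] = replacement
    return root


# Two documents that both reference the same centrally managed blurb.
BROCHURE = '<Brochure><Body>Our product.</Body><Include ref="Legal_Disclaimer"/></Brochure>'
PRICE_LIST = '<Price_List><Sales_Price>9.99</Sales_Price><Include ref="Legal_Disclaimer"/></Price_List>'
```

Because both documents hold only a reference, changing the one stored disclaimer changes what every document shows the next time it is assembled.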

Language: XML

Advantages:
• Can be chunked into separate objects that can be shared and edited individually.
• Validation enforced through a schema or DTD.
• Separation of structure from display lends to portability of content.

Disadvantages:
• Requires a common schema or DTD between the creator and consumer for data interchange.
• Requires translation to HTML for use by most browsers.
• Language still evolving, forcing developers to chase a moving target.

Language: HTML

Advantages:
• Largely consistent syntax.
• Little infrastructure required, with no agreement necessary between the creator and consumer regarding rules for structure or display.
• Supported by all browsers.

Disadvantages:
• Because of limited elements, many elements are used in non-standard ways.
• Structure not enforced during editing process.
• Structure does not lend itself to being divided into separate objects beyond the use of include statements.


Chunking

XML eases the breaking apart of documents into parts that make sense by themselves. This is called chunking. The chunking of a document or object based on element tags is the backbone of any XML strategy. Any piece or grouping of the overall content can be chunked into a "unit" if it is tagged correctly and in a fashion that conforms to the document structure. If the content were in the form of a book, then tags such as <section>, <chapter>, and <paragraph> would be logical chunking points. The document in the repository is stored as multiple objects and combined into a single instance when edited. The intelligence for linking back to the multiple objects is maintained in a virtual structure, allowing the user to edit parts of a document without locking the entire document.
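A minimal sketch of chunking follows, assuming each chunkable element carries a hypothetical `id` attribute and that the "virtual structure" is nothing more than an ordered list of those ids. The `chunk` and `reassemble` functions are illustrative names, not the mechanism of any particular repository.

```python
import xml.etree.ElementTree as ET

BOOK = (
    "<book>"
    '<chapter id="c1"><section><paragraph>One.</paragraph></section></chapter>'
    '<chapter id="c2"><section><paragraph>Two.</paragraph></section></chapter>'
    "</book>"
)


def chunk(xml_text, chunk_tag):
    """Split a document into separately stored objects at each chunk_tag."""
    root = ET.fromstring(xml_text)
    order = []      # the "virtual structure": which objects, in what order
    objects = {}    # object id -> XML text, stored independently
    for element in root.findall(chunk_tag):
        object_id = element.get("id")
        order.append(object_id)
        objects[object_id] = ET.tostring(element, encoding="unicode")
    return root.tag, order, objects


def reassemble(root_tag, order, objects):
    """Combine the stored objects back into a single document instance."""
    root = ET.Element(root_tag)
    for object_id in order:
        root.append(ET.fromstring(objects[object_id]))
    return ET.tostring(root, encoding="unicode")
```

Because each chapter is its own object, one author can revise the second chapter while another works on the first, and the full book is rebuilt from the ordered list when needed.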

Specialized XMLs

An XML document is only usable if the creator and consumer of content are using similar or derived schemas, since it is the schema that tells the application how to "read" the document by defining its elements. Special "XMLs", or XML schemas, have been created for the exchange of content and data across multiple organizations or even within one organization. This common usage allows for syndication, where content is bought or shared from an outside source and displayed within another Website. The following specialized XML-type formats have garnered significant support:

• The Reuters-developed NewsML allows news providers to combine and re-combine pictures, video, text, graphics and audio files, creating custom information for different audiences and different output devices.

• ebXML (Electronic Business using eXtensible Markup Language), sponsored by UN/CEFACT and OASIS, is a modular suite of specifications that enables enterprises of any size and in any geographical location to conduct business over the Internet.

• Universal Description, Discovery and Integration (UDDI) is a specification for distributed Web-based information registries of Web services. This directory is analogous to the Yellow Pages, allowing an organization to find applications accessible over the Web.

• SOAP is a framework that allows one program to invoke service interfaces across the Internet, without the need to share a common programming language or distributed object infrastructure.

• CML (Chemical Markup Language) is an XML-based language for managing molecular information on computer networks. CML will use a Java-based viewer to view and manipulate molecules in two and three dimensions.

• MathML is a W3C-sponsored XML application used to define mathematical formulas, expressions and notation on the Web.

Whether using XML or HTML, the content creation application is an important consideration in the management of content.

XML Transformation

If most of the content on your Website is created in XML format, it will need to be transformed into HTML, WML (Wireless Markup Language – for cell phones and other mobile devices) and PDF (Portable Document Format from Adobe Systems) for presentation. Several approaches to this transformation are outlined below.

• Single pipeline

• Multiple pipeline

• Combination pipeline / Early Transformation


Single Pipeline

In the single pipeline approach (see Figure 3-1), the Web application's JSP pages generate client-specific markup by applying transformations to incoming XML data. Each type of client requires a different style sheet, and the bulk of the development costs are associated with creating and maintaining these style sheets.

This approach defers generation of both the static and dynamic portions of a response to runtime. The runtime costs are associated with:

• Parsing the XML data

• Parsing the style sheet

• Applying the transformation
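The shape of the single pipeline, and its three runtime costs, can be sketched as follows. Python's standard library has no XSLT processor, so this sketch substitutes a toy "style sheet" (a small JSON document mapping element names to output templates) for a real XSL style sheet; the `STYLE_SHEETS` table, the `render` function, and the <release> vocabulary are all assumptions made for the example.

```python
import json
import xml.etree.ElementTree as ET
from html import escape

# Toy stand-ins for XSL style sheets: one per client type, kept as text so
# that each request pays the cost of parsing it, as in the list above.
STYLE_SHEETS = {
    "html": '{"title": "<h1>{text}</h1>", "body": "<p>{text}</p>"}',
    "wml": '{"title": "<p><b>{text}</b></p>", "body": "<p>{text}</p>"}',
}


def render(xml_data, client):
    """One pipeline for every client; only the style sheet differs."""
    document = ET.fromstring(xml_data)              # 1. parse the XML data
    rules = json.loads(STYLE_SHEETS[client])        # 2. parse the style sheet
    parts = []
    for element in document:                        # 3. apply the transformation
        template = rules.get(element.tag)
        if template is not None:
            parts.append(template.format(text=escape(element.text or "")))
    return "".join(parts)


PRESS_RELEASE = "<release><title>New Product</title><body>Available now.</body></release>"
html_page = render(PRESS_RELEASE, "html")
wml_card = render(PRESS_RELEASE, "wml")
```

Supporting a new client type means writing one more style sheet, while all three steps are repeated for every request, which is where the runtime cost comes from.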

Figure 3-1 Single Pipeline XML Transformation


Multiple Pipeline

The multiple pipeline approach (see Figure 3-2) uses a set of client-specific JSP pages to generate output. As compared with using Extensible Stylesheet Language Transformations (XSLT), this approach keeps the work of creating the static content in the development phase, with the dynamic content generation occurring at runtime.

Aside from creating the client-specific JSP pages, additional development costs are incurred in creating and maintaining server-side objects that represent the application's data abstractions. This step is not required in the single pipeline approach. Nevertheless the multiple pipeline approach can be more cost effective than the single pipeline for the following reasons:

• Data abstractions can be reused by different kinds of JSP pages.

• Data abstractions typically change at a much lower rate than presentation.

• Executing a JSP page to generate markup is much more efficient than performing an XSLT transformation to generate the same markup.
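By way of contrast with the single pipeline, the sketch below gives each client type its own rendering function, standing in for a client-specific JSP page, over a shared data abstraction. The `PressRelease` class, the two render functions, and the `PIPELINES` table are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class PressRelease:
    """Server-side data abstraction shared by every pipeline."""
    title: str
    body: str


def render_html(item):
    # Static markup is fixed at development time; only the data is dynamic.
    return f"<h1>{escape(item.title)}</h1><p>{escape(item.body)}</p>"


def render_wml(item):
    return f"<p><b>{escape(item.title)}</b></p><p>{escape(item.body)}</p>"


# One pipeline per client type, all reusing the same data abstraction.
PIPELINES = {"html": render_html, "wml": render_wml}


def render(item, client):
    return PIPELINES[client](item)
```

Nothing is parsed or transformed per request here: the markup lives in the render functions, and the `PressRelease` object can be reused unchanged when the presentation is redesigned.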

Figure 3-2 Multiple Pipeline XML Transformation


Combination Pipeline

You can combine the single and multiple pipeline approaches (see Figure 3-3). If your international clients require localized presentation (language, units of measure, etc.), you probably should use one pipeline for each language. To generate dialects of a language, you can apply XSLT transformations to that language's pipeline.

Figure 3-3 Early Creation XML Transformation


Content Creation

The separation of code from content begins with the authoring environment. Experience has shown that this should include a forms-based editor or templates. A forms-based editor allows a writer to choose from pre-defined templates, such as "Press Release" or "Product Description". Non-technical editors may create XML/HTML content using a form. Users fill in the form with the right text, which is stored as XML/HTML content. These content templates may be associated with rules files and presentation files. Rules files define the form that should be presented to the users to create the content and presentation files determine the look and feel of the content to aid in previewing. If the content is part of a highly dynamic or personalized site a single piece of content by itself might not make sense. A context may be provided to writers or editors in which to preview their content. For example a writer might be producing a piece for a Web page that has a header, footer and a two-part body section. This writer might be assigned to create only the first body section, but needs to preview the second body section. The rules and presentation files facilitate this previewing. Templates enable a writer to fill in a form without the distraction of working around code. In addition, it is easier to re-use text if it can be separated from the presentation code.
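A forms-based editor of this kind might be sketched as below. The `TEMPLATES` table plays the part of a rules file by listing the fields a form presents, and `fill_form` turns the writer's entries into XML; the template name, the field names, and the policy of treating every field as required are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules: each content template lists the form fields, in order.
TEMPLATES = {
    "Press_Release": ["Headline", "Date", "Body"],
    "Product_Description": ["Name", "Summary"],
}


def fill_form(template_name, values):
    """Turn the text a writer typed into a form into stored XML content."""
    fields = TEMPLATES[template_name]
    missing = [field for field in fields if not values.get(field)]
    if missing:
        raise ValueError("form is incomplete: " + ", ".join(missing))
    root = ET.Element(template_name)
    for field in fields:
        ET.SubElement(root, field).text = values[field]
    return ET.tostring(root, encoding="unicode")


stored = fill_form(
    "Press_Release",
    {"Headline": "Launch", "Date": "2002-03-01", "Body": "Now shipping."},
)
```

The writer only ever sees labeled fields; the tags are supplied by the template, so the stored content is well-formed and consistently structured without the writer touching any markup.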

Most strong Web content management systems have editors that allow creation of templates for specific types of content. In addition, there are a number of authoring tools that may be integrated with the Web content management system. A standalone tool may be useful to an organization due to a richer authoring environment or simply because a majority of influential content creators within the organization already use it.

Authoring ToolsMany content creation applications now have at least some XML support including the latest versions of Microsoft FrontPage and Microsoft Word. Choosing the right XML tool or tools involves measuring the level of XML support in the application and the strategy of the toolmaker going forward. Some applications were built around XML while others have evolved from prior support of SGML or propriety schemas.

Arbortext’s Epic evolved from an SGML authoring tool while Softquad’s XMetaL application was created for the purpose of editing XML content. XMetaL’s native editing format is XML. Some applications edit in proprietary formats and output to XML. Others maintain a virtual object model such as Document Object Model (DOM) while editing. Some applications are now including XML schemas.

There are a number of mature HTML tools on the market, including Macromedia’s Dreamweaver and Microsoft’s FrontPage. Dreamweaver supports the WebDAV standard for interfacing with Web content management systems such as Documentum, Vignette, Interwoven, and others.

Content creation, whether done in-house or by third-party authors, must be integrated into the Web content management business process and into the workflow.

Workflow

Workflow refers to the tasks associated with the evolution of content from its creation to its retirement, that is, its lifecycle. It is the business process that starts with identifying what needs to be created, such as a product drawing or written product specifications, moves through editing and approvals, continues to publishing, and ends with the content being removed and archived at retirement. Web content management systems facilitate the mapping and automation of this workflow to the extent needed and no more. A business manager should be able to create, edit, manage, and override a workflow through a graphical user interface (GUI).

Flexibility

A business process may be mapped into a workflow but, like a business process, the workflow must be flexible. Tasks may be redirected or may spawn other tasks. The act of walking through and mapping an existing business workflow, such as a bank’s release of literature for a new lending product, should result in re-examining and often streamlining the procedure. A Web content management system’s workflow should further streamline and automate the procedure’s hand-offs.

Single process workflows may be chained together with other workflows or suspended and later restarted. Tasks often run in parallel and are sent to multiple parties. These tasks may require different acceptance rules where one or all parties are required for approval. Rejections must also be handled in different ways. Tasks may be rejected back to the previous owner or back to the beginning of the workflow. They may also require a different set of tasks to be completed prior to continuing along the main path.
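The acceptance and rejection rules described above can be sketched in a few lines. The rule names ("any", "all") and the rejection policies ("previous", "restart") are labels invented for this illustration; a real system would expose its own configuration.

```python
def task_outcome(rule, responses):
    """Evaluate a task that was sent to several parties in parallel.

    rule: "any" (one approval suffices) or "all" (every party must approve).
    responses: reviewer -> True (approve), False (reject), or None (no answer yet).
    """
    answered = [r for r in responses.values() if r is not None]
    everyone_answered = len(answered) == len(responses)
    if rule == "any":
        if any(answered):
            return "approved"
        return "rejected" if everyone_answered else "pending"
    if rule == "all":
        if not all(answered):
            return "rejected"
        return "approved" if everyone_answered else "pending"
    raise ValueError(f"unknown rule: {rule}")


def step_after_rejection(steps, current_index, policy):
    """Pick where a rejected task goes: the previous owner or the start."""
    if policy == "previous":
        return steps[max(current_index - 1, 0)]
    if policy == "restart":
        return steps[0]
    raise ValueError(f"unknown policy: {policy}")


steps = ["author", "editor", "legal", "publisher"]
print(task_outcome("all", {"editor": True, "legal": None}))
print(step_after_rejection(steps, 2, "previous"))
```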

Tasks may be routed outside of the Intranet or beyond a domain or firewall. Functions need to be attached to the workflow to route objects to these parties. Their response must be sent back to the workflow. Connections to other products or interfaces need to be available or easily created. Workgroup applications such as Lotus Notes may be part of the workflow.

As organizations continually change employees, positions within an institution or firm may change as well. Thus tasks should be sent to roles rather than individuals. A level of abstraction provided by roles and aliases provides flexibility to meet these changes. The ability to route tasks to groups such as "Graphics Team" or "News Editors" allows for shared responsibilities and some load balancing.
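A minimal sketch of role-based routing follows. The role names come from the examples above; the user names and the simple rotation used for load balancing are assumptions for illustration only.

```python
# Hypothetical role membership; only this table changes when staff change.
ROLE_MEMBERS = {
    "News Editors": ["ana", "raj"],
    "Graphics Team": ["li"],
}

# Remembers whose turn it is within each role.
_next_index = {}


def assign(task, role):
    """Route a task to a role and pick a member in simple rotation."""
    members = ROLE_MEMBERS[role]
    index = _next_index.get(role, 0) % len(members)
    _next_index[role] = index + 1
    return task, members[index]


first = assign("edit story 1", "News Editors")
second = assign("edit story 2", "News Editors")
third = assign("edit story 3", "News Editors")
print(first, second, third)
```

Because workflows name the role rather than a person, replacing an employee means editing the membership table and leaving every workflow definition untouched.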

All of these features should be studied when comparing a Web content management application’s workflow functionality.

Instant Publishing

There are times when content must be instantly published, bypassing the standard workflow. Web content management systems use a couple of mechanisms to facilitate the "We’ve Got To Have It On The Site Now!" moments. One uses a workflow in conjunction with a document state mechanism such as a lifecycle. Publication is based on a piece of content’s state. Changing the content’s state in the lifecycle allows a user to promote the content from a preliminary state to a publish state immediately, without ever entering the workflow. Alternatively, an emergency workflow that leaves out certain tasks may be created and selected by the user when a document needs to be instantly published. A workflow may also be set to publish a piece of content at a specific time, handling needs such as publishing a press release the minute the stock market opens or closes.
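The state-based mechanism can be pictured with the following sketch, in which publication depends only on the content's lifecycle state. The state names and the `emergency` flag are assumptions chosen for this example.

```python
# Hypothetical lifecycle, ordered from creation to retirement.
LIFECYCLE = ["draft", "review", "approved", "published", "archived"]


class Content:
    def __init__(self, title):
        self.title = title
        self.state = "draft"


def promote(content, to_state, emergency=False):
    """Move content forward; skipping states is allowed only in an emergency."""
    current = LIFECYCLE.index(content.state)
    target = LIFECYCLE.index(to_state)
    if target <= current:
        raise ValueError("content can only be promoted forward")
    if target - current > 1 and not emergency:
        raise PermissionError("skipping states requires an emergency promotion")
    content.state = to_state


def is_live(content):
    """The delivery side checks only the state, not the workflow history."""
    return content.state == "published"


release = Content("Earnings announcement")
promote(release, "published", emergency=True)
print(is_live(release))
```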

Handling Exceptions

Just as a workflow strategy must allow for emergencies, it must also expect exceptions. One common exception is a task’s rejection. The workflow must not come to a stop simply because a participant in the workflow refuses to undertake an assigned task. If the participant explicitly refuses the task, there needs to be logic to continue the workflow. If the refusal is implicit, a time trigger needs to restart or escalate the workflow. Escalation is based on some set of metrics that are reached within the workflow process and that did not exist previously. The duration of a document within a workflow may allow it to skip certain upcoming tasks. The completion of a workflow by another related document might trigger some type of escalation. The ability to implement a rejection strategy or escalations with a minimum amount of custom code, or none at all, is highly desirable. An exception state is one useful mechanism that allows sophisticated business logic to be attached. Having multiple "hooks to hang code on" allows the developer to circumvent any interface limitations.
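One way to picture the explicit and implicit refusal cases is the sketch below, which a scheduler might run periodically against each open task. The task fields (`fallback_role`, `escalation_role`, `time_limit`) are hypothetical names introduced for this example.

```python
from datetime import datetime, timedelta


def next_action(task, now):
    """Decide what the workflow engine should do with a task right now."""
    if task["status"] == "rejected":
        # Explicit refusal: continue the workflow with another party.
        return ("reassign", task["fallback_role"])
    overdue = now - task["assigned_at"] > task["time_limit"]
    if task["status"] == "pending" and overdue:
        # Implicit refusal: the time trigger escalates the task.
        return ("escalate", task["escalation_role"])
    return ("wait", task["role"])


task = {
    "role": "Legal Review",
    "fallback_role": "Senior Counsel",
    "escalation_role": "Managing Editor",
    "status": "pending",
    "assigned_at": datetime(2002, 3, 1, 9, 0),
    "time_limit": timedelta(hours=24),
}
print(next_action(task, datetime(2002, 3, 1, 17, 0)))
print(next_action(task, datetime(2002, 3, 3, 9, 0)))
```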

Security in creation and management

In a strong Web content management system all content has some type of security. This security is usually handled by a set of permissions. The security of a piece of content controls its availability. Security can be at the directory or folder level or at the object level.

Lifecycles relate to security in that a piece of content’s availability often changes over its lifetime. For example, the writer who authored a news article needs access to that article during the creation process. However, once that article is published, it often does not make sense to allow the author to alter or remove that content, especially if the author is a contractor to the company. Tying this availability or permission level to a piece of content’s state change simplifies the management of the security model. This is closely tied to the group to which a user belongs.

The permission level or access to content at any point in time is often different for individuals and groups. For example, during the draft stage the owner could have delete privileges while the world at large has no privileges at all. When a document is up for approval, the reviewer may have only read permission while the owner has write permission, and only an administrator can delete the document. This functionality is accomplished through an Access Control List (ACL), which lists each group and its particular permission level. This list is then attached to a piece of content. Every piece of content can have a different list, and a document can have its ACL replaced by a new one as it goes through its lifecycle.
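The following sketch shows one ACL per lifecycle state, mirroring the draft and approval examples above. The group names, the state names, and the ordering of permission levels are assumptions made for illustration.

```python
# Permission levels, ordered so that a higher level includes the lower ones.
LEVELS = {"none": 0, "read": 1, "write": 2, "delete": 3}

# Hypothetical ACLs; the one attached to a document is swapped as its state changes.
ACLS = {
    "draft": {"owner": "delete", "reviewers": "none", "admins": "delete", "world": "none"},
    "review": {"owner": "write", "reviewers": "read", "admins": "delete", "world": "none"},
    "published": {"owner": "read", "reviewers": "read", "admins": "delete", "world": "read"},
}


def allowed(state, groups, action):
    """Return True if any of the user's groups grants the action in this state."""
    acl = ACLS[state]
    best = max((LEVELS[acl.get(group, "none")] for group in groups), default=0)
    return best >= LEVELS[action]


print(allowed("draft", ["owner"], "delete"))
print(allowed("review", ["owner"], "delete"))
print(allowed("published", ["world"], "read"))
```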

Localization

As organizations do business across cultural, regional, and geographic boundaries, their need to provide information that is specialized for the locale increases. Today, businesses are expected to tailor products specifically to a market, and their Web information should respect the culture and language of these locales. A number of terms have been used in attempts to define these requirements; however, the terms may be confusing.

What are Globalization, Regionalization, and Localization?

When referring to Websites, these terms describe different perspectives of the same problem, namely the need to make your content not only visible but also relevant and attractive on an international scale to many different audiences:

• Globalization refers to the spread of business beyond local areas. In Web content management it is the framework necessary to support regionalization and localization as defined below. Support for different character encodings, such as double-byte characters, and for time and measurement conversions is an important part of globalization. A Web content management system’s handling of text and graphics in separate layers, so that the text can be changed without altering the graphic, is an area that is often overlooked but vital to many globalization initiatives. For example, Adobe’s Altercast server technology provides the capacity to develop data-driven images for the enterprise.

• Regionalization refers to the support for a layer of abstraction or logic separate but riding on top of localization. The grouping of locales into a region with a common attribute such as main religion or language is a common usage.

• Localization covers anything that involves altering specific aspects of a piece of content for a local market. This can mean the correct currency, decimal characters, date format, custom, or language (French Canadian versus that used in metropolitan France). Few Web content management systems have strong localization capability today. Documentum is a leader in this regard.
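To make the localization items concrete, the sketch below formats the same amount and date for three locales. The locale table is hand-written for illustration; its conventions are simplified assumptions, and a production system would rely on a maintained locale database instead.

```python
from datetime import date

# Illustrative conventions only; not an authoritative locale database.
LOCALES = {
    "en_US": {"decimal": ".", "thousands": ",", "currency": "${}", "date": "%m/%d/%Y"},
    "fr_FR": {"decimal": ",", "thousands": " ", "currency": "{} €", "date": "%d/%m/%Y"},
    "fr_CA": {"decimal": ",", "thousands": " ", "currency": "{} $", "date": "%Y-%m-%d"},
}


def format_amount(amount, locale):
    """Render a monetary amount with the locale's separators and currency sign."""
    rules = LOCALES[locale]
    whole, fraction = f"{amount:,.2f}".split(".")
    number = whole.replace(",", rules["thousands"]) + rules["decimal"] + fraction
    return rules["currency"].format(number)


def format_date(day, locale):
    """Render a date in the locale's usual numeric order."""
    return day.strftime(LOCALES[locale]["date"])


for name in LOCALES:
    print(name, format_amount(1234.5, name), format_date(date(2002, 2, 3), name))
```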

Multi-language Content Relationships

The relationship of the same content in multiple languages needs to be modeled in a construct that has aspects of rendition and relate relationships, but that also provides a facility to attach business logic. The Web content management system must have the ability to model different language relationships such as Mandarin and Cantonese, or English and French in France and Canada. Cultural usage should be respected as well.

For example, in China the local dialect may be Cantonese but the official language is Mandarin. Depending on what is being displayed, the user experience may be quite different according to the dialect in which the information is presented. News may be thought of as less serious if it is presented in Cantonese, while lighter content may seem standoffish if presented in Mandarin. In France, English may be used as a fallback language, but in Canada French and English must both be equally available. In essence, support of locales enables the multi-language requirement to be brought down to the sub-page level.
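The fallback behavior in the France and Canada example can be sketched as a per-locale fallback chain. The locale codes and the chain table are assumptions for this illustration; the business logic attached to a real language relationship could be considerably richer.

```python
# Hypothetical fallback chains: France may fall back to English,
# while Canadian locales have no fallback because both languages are required.
FALLBACKS = {
    "fr_FR": ["fr_FR", "en"],
    "fr_CA": ["fr_CA"],
    "en_CA": ["en_CA"],
}


def pick_rendition(renditions, locale):
    """Return (language, text) for the first available rendition, or None."""
    for candidate in FALLBACKS.get(locale, [locale]):
        if candidate in renditions:
            return candidate, renditions[candidate]
    # None signals a missing translation that must be produced before publishing.
    return None


renditions = {"en": "Welcome", "en_CA": "Welcome"}
print(pick_rendition(renditions, "fr_FR"))
print(pick_rendition(renditions, "fr_CA"))
```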

4. Delivery and Presentation

Delivery and presentation refer respectively to the process of deploying content to an application server or Webserver and sending the appropriate content to the end user’s browser. Administration, security, performance and personalization are heavily impacted by a choice of products, solution architecture, and application design.

Delivery

Websites that receive a higher volume of traffic often require multiple application servers and Webservers to speed content to users and to provide some level of failover if a piece of hardware fails. A Website deployment application provides streamlined replication and synchronization of a Website from a single source to multiple Web servers across the globe. Consistent content must be delivered across multiple servers.

With such an application, an IT decision maker is able to control how and when content and applications are distributed across heterogeneous server platforms and through firewalls as appropriate. Functionality that should be expected includes:

• Centralized administration and near real-time monitoring of content distribution

• Support for multi-stage deployment and multi-version rollback. That is, it provides verification of the receipt of content by all servers before the content is "turned on." An operator can automatically roll back an installation or update when necessary.

• Scalability, in the sense of handling both large files and large numbers of files

• Cross-platform support across Microsoft and Unix platforms, preserving file attributes such as symbolic links in Unix. Permission attributes are similarly preserved.

• Ability to automatically repair damaged applications and data files

• Byte-level differencing, so that during updates, only the content that has changed is downloaded

• Bandwidth management, including compression, incremental updating, and bandwidth throttling

• Authentication, channel signing, encryption, and integration with leading public key infrastructure solutions
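The multi-stage deployment and rollback item in the list above can be sketched as a two-phase procedure: every server first receives and acknowledges the content, and only then is it turned on. The classes below are an in-memory model written for illustration; the names and the checksum-based receipt are assumptions, not the behavior of a specific deployment product.

```python
import hashlib


def checksum(files):
    """Fingerprint a set of files given as a path -> bytes mapping."""
    digest = hashlib.sha256()
    for path in sorted(files):
        digest.update(path.encode("utf-8"))
        digest.update(files[path])
    return digest.hexdigest()


class Server:
    def __init__(self, name):
        self.name = name
        self.live = {}      # content currently served
        self.staged = None  # content received but not yet turned on
        self.history = []   # earlier live versions, kept for rollback

    def receive(self, files):
        self.staged = dict(files)
        return checksum(self.staged)  # receipt sent back to the source

    def activate(self):
        self.history.append(self.live)
        self.live, self.staged = self.staged, None

    def rollback(self):
        self.live = self.history.pop()


def deploy(servers, files):
    """Stage on all servers, verify every receipt, then activate everywhere."""
    expected = checksum(files)
    receipts = [server.receive(files) for server in servers]
    if any(receipt != expected for receipt in receipts):
        for server in servers:
            server.staged = None  # abandon the update; live content is untouched
        return False
    for server in servers:
        server.activate()
    return True


servers = [Server("toronto"), Server("london")]
v1 = {"index.html": b"<h1>v1</h1>"}
v2 = {"index.html": b"<h1>v2</h1>"}
deploy(servers, v1)
deploy(servers, v2)
servers[0].rollback()
print(servers[0].live == v1, servers[1].live == v2)
```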

Presentation

Content presentation refers to the content and its look and feel that is available to a user on the Website. As discussed here, it encompasses the transformation, personalization, speed of response, and availability of information on the site. A Web content management system’s administration of each of these is critical to achieving an optimal cost-of-ownership while maximizing the value that a Website can deliver.

Personalization

The display of content in a targeted manner for an individual or type of individual is a function of personalization. The intent is for the user to feel an increased level of interaction with the content provided. Its effect is often described as "enriching the user experience". The reality is that personalization can provide the right information to the right Website user, saving time and money or increasing revenue.

Personalization can be broken into four types: 1) Explicit, 2) Implicit, 3) "My", and 4) Scenarios. There is overlap among these. Explicit personalization enables the identified user to offer a profile that the Website may use to send associated content. Implicit personalization attempts to "learn" a user’s needs and direct them to certain sections of the Website based on current or past activity. The "My" type refers to the explicit personalization sub-category commonly associated with the "My Yahoo" portal implementation. The identified user is usually provided with a list of topics or components from which to choose those they wish to include in their view of the Website’s content. Scenarios are a sub-category that can include implicit and explicit facets. Scenarios analyze a time series of user activities to direct targeted content to the user based on where they fall within the developing situation. Predetermined routes are created for the consumer to follow, usually without their knowledge. A simple example of a scenario is one in which a consumer is sent an email offering special pricing or terms for a financial services product after declining to purchase it at list price from an investment specialty Website.

A Website is able to target content for personalization with the aid of metadata. Metadata categorizes or "attributes" the content in ways that may determine its suitability to a particular user. This categorization often cannot be determined directly from the content. A profile of the document is created. The metadata is entered in multiple attributes and programming logic is then executed on this metadata.

Adding personalization to a Website involves attributing a piece of content and building personalization rules or logic. Today, a personalization engine should provide the business user with an interface for creating rules and scenarios.
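A minimal sketch of metadata-driven personalization follows, covering an explicit profile match and the email scenario described above. The attribute names, the catalog entries, and the event format are all hypothetical and serve only to show rules being executed against metadata.

```python
# Hypothetical content metadata ("attributes") used by the rules below.
CATALOG = [
    {"id": "mortgage-guide", "topics": {"mortgages"}, "audience": "consumer"},
    {"id": "fund-report", "topics": {"mutual funds"}, "audience": "investor"},
    {"id": "rate-news", "topics": {"mortgages", "rates"}, "audience": "investor"},
]


def explicit_matches(profile, catalog):
    """Explicit personalization: match content metadata to a declared profile."""
    return [
        item["id"]
        for item in catalog
        if item["audience"] == profile["segment"]
        and item["topics"] & profile["interests"]
    ]


def scenario_actions(events):
    """Scenario rule: offer special terms on products declined and never bought."""
    purchased = {product for action, product in events if action == "purchased"}
    return [
        ("email_special_offer", product)
        for action, product in events
        if action == "declined" and product not in purchased
    ]


profile = {"segment": "investor", "interests": {"rates"}}
events = [("viewed", "retirement-plan"), ("declined", "retirement-plan")]
print(explicit_matches(profile, CATALOG))
print(scenario_actions(events))
```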

Publishing to other media

If publishing to other media is a requirement, then the original content should be in XML so that it is, in fact, media-independent. This content would then go through a transformation into the appropriate format and resolution needed to present it via a specified medium. It may be transformed into HTML, WML, PDF, or some proprietary language or format for CD-ROM publishing. Enabling this functionality is a foundational requirement for content management systems.
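The sketch below renders one XML source into two output formats to illustrate the idea. Such transformations are typically written as stylesheets; plain Python functions stand in for them here, and the element names in the sample document are assumptions.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical presentation-free source content.
SOURCE = (
    "<article><title>Rates &amp; Terms</title>"
    "<para>First paragraph.</para><para>Second paragraph.</para></article>"
)


def to_html(xml_text):
    """Render the article for a Web browser."""
    root = ET.fromstring(xml_text)
    parts = [f"<h1>{escape(root.findtext('title', ''))}</h1>"]
    parts += [f"<p>{escape(para.text or '')}</p>" for para in root.findall("para")]
    return "\n".join(parts)


def to_text(xml_text):
    """Render the same article as plain text, for example for email."""
    root = ET.fromstring(xml_text)
    title = root.findtext("title", "")
    lines = [title, "=" * len(title)]
    lines += [para.text or "" for para in root.findall("para")]
    return "\n".join(lines)


print(to_html(SOURCE))
print(to_text(SOURCE))
```

Adding another medium then means adding another transformation while the source content stays unchanged.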

Search

Searching content from a system using one content repository, one Web server, and static pages is as simple as spidering the content and indexing the metadata. A single departmental solution (see Figure 4-1) often has these simple requirements.

Issues:

• Relevance of results

• Access control

• Speed

• Content lifecycle

Figure 4-1 Departmental Search

Figure 4-2 Enterprise Search

Ensuring performance

As a Website grows in size, complexity, and popularity, performance concerns increase. The integration of authoring, content management, presentation, delivery, and networking becomes critical. Slapping a solution together with inadequate knowledge, poor sizing, or products that do not work well together can lead to high services costs and poor end user performance.

Caching content for a static site is a simple endeavor, but caching for a personalized site utilizing chunked XML content can be a bit more interesting. Content may be cached in multiple places, including through a content distribution product such as Akamai or Marimba or within the Java or Microsoft .NET application components.

A content distribution application delivers content to servers that are remotely distributed from the enterprise in order to be physically closer to end users (thereby increasing performance). These solutions are offered as products or services depending on the detailed requirements of the organization.

If the application server uses JSPs on the Website to assemble and present the content, caching tags in the JSPs are an option. BEA WebLogic provides caching. Caching tags allow content generated by the JSP code between the cache tags to be cached, so that the cached content can be used instead of processing the JSP code again. Each cache is identified by a cache key. Caches can be defined at the application or session level. Timeouts can be defined for caches, so that caches are re-computed after a certain time. Caches can also be flushed individually or application-wide. This allows intelligent caching of processed content.
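The ideas behind such caching tags (a cache key, a timeout, and individual or global flushing) can be illustrated outside of JSP. The class below is a generic sketch in Python, not the WebLogic tag library; its names and its injectable clock are assumptions for this example.

```python
import time


class FragmentCache:
    """Caches rendered page fragments by key, with a timeout per lookup."""

    def __init__(self, clock=time.monotonic):
        self._entries = {}   # key -> (time stored, rendered value)
        self._clock = clock  # injectable so that expiry can be demonstrated

    def get_or_render(self, key, render, timeout):
        """Return the cached fragment, re-rendering it if missing or expired."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < timeout:
            return entry[1]
        value = render()
        self._entries[key] = (now, value)
        return value

    def flush(self, key=None):
        """Flush one fragment, or every fragment when no key is given."""
        if key is None:
            self._entries.clear()
        else:
            self._entries.pop(key, None)


cache = FragmentCache()
# A session-level cache could include the session id in the key.
headline_html = cache.get_or_render(
    "headlines:session-42", lambda: "<ul><li>Top story</li></ul>", timeout=60
)
print(headline_html)
```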

5. Syndication

In general, syndication is the supply of material for reuse and integration with other material, often through a paid service subscription. The most common example of syndication is in newspapers, where such content as wire-service news, comics, columns, horoscopes, and crossword puzzles are usually syndicated content. This model has been transferred to the Web where the need for content rivals anything that came before.

Format "standardization"

Syndication requires some standardization in content formatting. In particular there is the Rich Site Summary format (RSS), which is a portal content language. RSS is a lightweight syndication format that has gained widespread acceptance in its brief life. RSS feeds carry an array of content types: news headlines, discussion forums, software announcements, and data. A myriad of applications based on or extending RSS are becoming available. "My.UserLand", an RSS-based portal, creates archived snapshots of content on an hourly basis, turning itself into an RSS "aggregator".

NewsML, developed by Reuters, provides another standard for news items. Simple Object Access Protocol (SOAP, an XML-based messaging standard) encoding defines a set of rules for mapping programmatic types to XML. This includes rules for mapping compound data structures, array types, and reference types. With respect to compound data structures, all data is serialized as elements. This is useful if the data is very structured or contains complex relationships.

Organizations planning to receive or deliver syndicated content may wish to investigate using RSS or some other syndication standard for the creation of part or all of their content.
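For readers who have not seen an RSS document, the sketch below parses a small feed and extracts its headlines. The sample feed and its URLs are invented for this example; only the basic channel and item structure is taken from the RSS format.

```python
import xml.etree.ElementTree as ET

# Invented sample feed showing the basic channel/item structure.
SAMPLE_FEED = """<rss version="0.91">
  <channel>
    <title>Example News</title>
    <link>http://example.com/</link>
    <description>Illustrative feed</description>
    <item>
      <title>First headline</title>
      <link>http://example.com/1</link>
    </item>
    <item>
      <title>Second headline</title>
      <link>http://example.com/2</link>
    </item>
  </channel>
</rss>"""


def headlines(feed_xml):
    """Return (title, link) pairs for every item in the feed's channel."""
    channel = ET.fromstring(feed_xml).find("channel")
    return [
        (item.findtext("title"), item.findtext("link"))
        for item in channel.findall("item")
    ]


for title, link in headlines(SAMPLE_FEED):
    print(title, link)
```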

Features

Different delivery schemes are available, with some of the latest WCM versions offering sets of services for package delivery, including scheduled syndication "push," on-demand syndicator "push," and subscriber "pull" services.

• If the subscriber specifies pull delivery, the subscriber always makes the requests and the syndicator always responds -- content is delivered only when the subscriber requests it.

• Push delivery, on the other hand, requires the subscriber to run a Web server to handle pushed deliveries, which come in the form of an HTTP request from the syndicator.

• "On-demand" syndication enables subscribers to get information updates when they want but requires the syndicator to deliver it.
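The difference between the pull and push schemes listed above comes down to which side initiates delivery. The in-memory model below makes that explicit; a plain callable stands in for the subscriber's Web server and its HTTP endpoint, and all names are assumptions for illustration.

```python
class Syndicator:
    """In-memory model of a syndicator that supports pull and push delivery."""

    def __init__(self):
        self.packages = []        # published content packages, oldest first
        self.push_endpoints = {}  # subscriber name -> delivery callable

    def publish(self, package):
        self.packages.append(package)

    def register_push(self, name, endpoint):
        # In practice the endpoint would be a URL on the subscriber's Web server.
        self.push_endpoints[name] = endpoint

    def handle_pull(self, since):
        """Pull: the subscriber asks, and the syndicator only responds."""
        return self.packages[since:]

    def push(self, name, since=0):
        """Push: the syndicator initiates delivery to the subscriber."""
        self.push_endpoints[name](self.packages[since:])


syndicator = Syndicator()
inbox = []
syndicator.register_push("partner-site", inbox.extend)
syndicator.publish("morning headlines")
syndicator.publish("market close summary")

pulled = syndicator.handle_pull(since=1)  # subscriber-initiated
syndicator.push("partner-site")           # syndicator-initiated
print(pulled, inbox)
```

A scheduled push would call `push` from a timer, while an on-demand push would call it in response to a subscriber's request.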

An aggregator is similar to a one-stop shop. Aggregators resell content from a range of publishers and providers; this content is then packaged to match the audience profiles of a particular Website. Prominent aggregators include Comtex, Wavo, ScreamingMedia, NewsEdge, and iSyndicate. Moreover.com takes it one step further by aggregating links to other sites. It is not uncommon for businesses to become aggregators of multiple syndication streams simply for their own company or partners.

Bottom line: Syndication functionality and services are fast becoming a base feature of enterprise-scalable content management systems.

6. Future Directions

Portals and Portlets

A portal is loosely defined as a Website that provides access to features and functionality of various underlying Web applications often on one page. The magic of a portal is the ability to surface functionality of backend applications like a content management system, stock ticker, and personalized news through a familiar Web portal interface. Each application window on the Webpage is referred to as a portlet or gadget.

Virtually all content management systems provide a Web interface for content creation and administration today, but a portlet interface is something new. A portlet interface requires a content management system to present its functionality with the look and feel common to the portal. This familiarity ensures that users will require little education in using whatever functionality is surfaced through the Web portal interface. The portlet interface to Web content management systems does not generally provide all the functionality of the system, but is rather an interface for a user handling their most frequent tasks. Because the portlet interface is somewhat restrictive, it is best used for those requiring limited content management functionality over the Web.

ATG, BEA’s WebLogic, IBM WebSphere, Plumtree, Epicentric, and Sun’s iPlanet are the top portal providers today.

All Java; All Microsoft

A major undercurrent in today’s Web content management marketplace is the rise of the all-Java or all-Microsoft solution. These are new solutions that strip down content management functionality to the bare requirements for Web content management. They are usually cheaper solutions that are readily customizable but do not provide the functionality required for most enterprise Web content management solutions today, let alone the re-use and re-purposing of content across multiple formats and distribution media.

Digital Asset Management and Data Driven Images

Enterprise Web content management systems are expanding their functionality to enable the seamless management of rich media assets and their metadata. Functionality includes media-specific indexing, browsing and search capabilities, easy-to-use editing, automated file transformation features (e.g. Photoshop to JPEG), and the ability to fully integrate and manage these digital assets with other types of content. Documentum is integrating DAM (Digital Asset Management) directly into its platform. Other vendors are partnering with pure-play DAM vendors such as Virage, MediaBin, and Artesia.

Adobe also has a product called Altercast that enables database data to be added to images. Product pricing shown on Web catalog images can be changed for a market, for a sale, or for a different currency without touching the image through this automated server product.

7. Summary

As the value of the Web became evident over the last five years, business managers were pressured to make volumes of content available over the Web to customers, suppliers, partners, and employees at a breakneck pace. Business units and departments responded with individual solutions that met the immediate need to communicate, but compromised a broader company vision. Silos of departmental content could not be leveraged across business units, search did not go across all the organization’s content, writers were taking on HTML tasks, administration was becoming unwieldy, and the company was unable to deliver on the personalization and localization requirements of the customer.

One-off departmental solutions and departmental content management systems were ineffective against the demands of today’s Web consumer. A small number of enterprise content management vendors, such as Documentum, had the vision and have evolved to meet the demands. Content management technology enables decentralized control of sections of the Website. Today it is possible to have an enterprise Web presence with a unified vision and a centralized framework, while decreasing administration costs and leveraging content assets across multiple formats and media.