
ATG Content Administration Best Practices

ATG COMMERCE 9


P&T Engineering HOW TO: CA Concepts and Best Practices 9.0

CA Best Practices 9.0

Authors: Emmanuel Parasirakis, Pepper Mint (Senior Software Developers, ATG)

• CA Best Practices 9.0
• Purpose
• Chapter 1 - Understanding the Core CA Components
  ° Publishing Repository
  ° Version Manager
  ° Workflow Engine
  ° Deployment Server
• Chapter 2 - Projects and Workflows
  ° Content Administration Workflow
    - Production-Only CA Workflow
      - Author
      - Content Review
      - Deployment Approval
      - Deployment
      - Deployment Verification
      - Revert Deployment
    - Staging and Production CA Workflow
• Chapter 3 - VersionManager, VersionRepositories, and VersionFileSystems
  ° Branches
  ° Workspaces
  ° Snapshots
  ° VersionRepository
  ° VersionFileSystem
• Chapter 4 - Deployment
  ° Full Deployments
  ° One-Off Deployment Targets
• Part II - Implementation
• Chapter 5 - Planning a CA Implementation
  ° Where to Start
  ° Development Environment
  ° Requirements Gathering
  ° The Content
  ° Importing the Data
    - Using an Import Workflow for Automatic Imports
  ° Versioned Repositories and File Systems
    - Managing Automated Edits with User Edits
    - Catalog Structure
    - VersionManager API
  ° Deployment
    - Asset Locks
    - Asset Conflicts
    - Asset Conflict Resolution
    - Project Dependency Check
    - Online vs Switched Deployments
    - DAF Deployment
    - DAF Deployment Scalability and Failover
    - DAF Deployment Configuration
• Appendix
  ° Appendix A - Checklist for Starting a Content Administration Implementation


Purpose

The purpose of this document is to familiarize those planning an ATG Content Administration (CA) implementation with key concepts around CA and best practices for a CA implementation. It is meant to help guide the planning, design, development, and testing of a CA implementation. It aims to guide developers through the common tasks that all CA implementers face, and to help developers avoid potential errors in their implementations.

Chapter 1 - Understanding the Core CA Components

ATG Content Administration's (CA) main function is to move the latest version of data from the CA server to a production server that has been marked for deployment. On the CA side, there are four main components which work together to provide all the functionality. These components are:

• PublishingRepository: Stores information about projects and workspaces.
• VersionManager/VersionRepository/VersionFileSystem components: Manage all the versions of all the assets. These components work in conjunction with one another.
• Workflow Engine: Manages the flow of each project through a workflow.
• DeploymentServer: Manages the asset deployment process for each project.

Publishing Repository

The PublishingRepository contains all the project and workflow instance data. When a project is created, a new process and project object are instantiated and stored in the repository. The process item references the project item. In most workflows, the process/project distinction is invisible and we just say "the project" to describe both, but separate process and project objects are still used in the implementation. Several supporting repository items may also become associated with the project, including history items, deployment information, and workflow task information. This repository also contains all the data operated on by the workflow engine; the main workflow item descriptor type is individualWorkflow. In CA, the PublishingRepository is found at /atg/epub/PublishingRepository. However, throughout CA we use the secure version of this repository, which is found at /atg/epub/SecuredPublishingRepository.

Version Manager

The VersionManager component serves as the manager of all versioned repositories, workspaces, and snapshots, along with the main branch. The VersionManager works in conjunction with other components, such as VersionRepositories and VersionFileSystems, to manage assets. VersionRepository components are used by the VersionManager to manage versions of all non-file assets defined in their repository definition files, while VersionFileSystems are used to manage versioned file assets. Typically you will have one VersionManager and one or more VersionRepositories and/or VersionFileSystems. In CA, the VersionManager component is found at /atg/epub/version/VersionManagerService. Within its properties file you will find a listing of VersionRepositories and/or VersionFileSystems.
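As a rough illustration, the shape of that configuration might look like the following sketch of a VersionManagerService.properties file. Everything here other than the component paths quoted in this document is an assumption: the $class value and property names are illustrative only, so check the actual properties file in your installation for the real names.

```properties
# Illustrative sketch only -- property names are NOT verified against the product
$class=atg.versionmanager.VersionManagerService

# Versioned repositories coordinated by this VersionManager
versionedRepositories=\
    /atg/epub/file/PublishingFileRepository

# Versioned file systems coordinated by this VersionManager
versionedFileSystems=\
    /atg/epub/file/ConfigFileSystem,\
    /atg/epub/file/WWWFileSystem
```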

Workflow Engine

The workflow engine manages aspects of the project within the workflow, including the workflow's state and any associated activities. When a project is created, the workflow that project will use is specified, and a workflow instance is created. The workflow instance information records the current project's state within the workflow. All workflows have a subject, and in CA the subject is always a process repository item. (Remember that a process and a project are created together, and the process references the project.) The workflow engine component is found at /atg/epub/workflow/process/WorkflowProcessManager.


Deployment Server

The DeploymentServer is the core component which executes all workflow functions requiring deployment. In 2006.3 and later, this component farms out the actual deployment of the data to a non-CA-specific deployment component called the DeploymentManager, which is found in the DAF layer. The DeploymentServer manages targets and connections to agents, and executes all the functionality related to asset merging and branch management. The DeploymentServer is found at /atg/epub/DeploymentServer. The DeploymentManager is found at /atg/deployment/DeploymentManager.

All the components described above work in conjunction with one another to provide CA's functionality. In the upcoming chapters we will discuss the relationships between these components and how they work with one another. After that, we will describe how to design and build a CA system.

Chapter 2 - Projects and Workflows

Before proceeding with this document, it is assumed that the reader understands the basic workflow constructs such as tasks, outcomes, actions, forks, events, and conditions. If not, see the ATG Content Administration Programming Guide, Chapter 9, Adapting Workflows to Content Management Projects.

One of the key concepts in CA is the project. The purpose of a project is to associate groups of changes that have some relationship to one another. Sometimes the relationship is explicit; other times it is implicit, or the user simply wants the changes grouped together. The project is essentially the checkin unit of the CA versioning system. A project also defines a unit of deployment: a set of changes that will be deployed to the target servers at the same time. Targets are covered later in this document, but in short, a target is a system that receives data during a deployment.

When a project is created, two additional things happen: a workspace is created and a workflow is instantiated. A workspace is a versioning construct and will be covered in the next chapter. A workflow is the predefined set of steps the project must go through in order to modify and deploy assets to the production site. When installing and configuring CA, you may provide the option of creating a number of different project types. Each type of project has a workflow associated with it, so the workflow is effectively chosen by selecting the link on the BCC home page that creates the new project. When a workflow is instantiated, it starts in the initial task. When a user selects an outcome for that task, the workflow moves to the next task, executing any actions along the selected branch. One way to think about it is that the project is the subject of the workflow.

Internally, a project actually consists of two objects: a process object and a project object. The process has a property which is a reference to a project object. This separates the high-level workflow (process) from the checkin and deployment unit (project). In most ATG products (including the base Content Administration install) the process and project move in tandem and are largely indistinguishable, but the distinction is retained for flexibility. Strictly speaking, it is the process object that is the subject of the workflow, but we use the term "project" to refer to these two objects collectively.
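The four-object wiring described above can be sketched with a toy model. These are not the ATG classes; every class and field name here is illustrative only:

```java
import java.util.UUID;

// Toy model of the CA object graph described in the text. NOT the ATG
// API; all names are invented for illustration.
public class ProjectModel {
    static class Workspace {
        final String id = "ws-" + UUID.randomUUID();
    }
    static class Project {
        final Workspace workspace;           // project -> workspace reference
        Project(Workspace workspace) { this.workspace = workspace; }
    }
    static class WorkflowInstance {
        String currentTask = "author";       // a new workflow starts in its initial task
    }
    static class Process {
        final Project project;               // process -> project reference
        final WorkflowInstance workflow;     // the process is the workflow's subject
        Process(Project project, WorkflowInstance workflow) {
            this.project = project;
            this.workflow = workflow;
        }
    }

    // Creating a "project" in CA really creates all four objects together.
    static Process createProject() {
        Workspace workspace = new Workspace();
        return new Process(new Project(workspace), new WorkflowInstance());
    }
}
```

The point of the sketch is simply that what users call "the project" is a small graph of objects, with the process at the root.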

Content Administration Workflow

The workflow which ships with ATG CA is the following: /Content Administration/CA.wdl

In the BCC this workflow shows up as "Content Administration Project". It is used by both the process and the project objects. This workflow guides a particular process/project through the authoring stage, past review, to deployment to both the staging and production systems, and then to approval and checkin.

You should generally not modify any part of this workflow from the deployment stages onward, as this area is complex and the source of many problems that ATG customers encounter. It is fairly straightforward to add author or review stages. Deployment stages can also be added, but each should be treated as a complete package and modeled on the staging or production deployment sections of the workflow. Any modification should be done in consultation with ATG Services or Support. It is very easy to alter the workflow in a way that subtly breaks Content Administration, for example by causing data integrity problems or failing to release asset locks.


The CA workflow is known as a "late" workflow, because it checks in the assets after they have been deployed to all the targets. Prior to 9.0, it was possible to have "early" workflows, where the assets were checked in prior to deployment. Due to simplification and performance work in 9.0, only the late CA workflow is supported; early staged workflows will not work in 9.0 and later. Late workflows are powerful because they give the user full control over deployments to each target, and allow you to test changes by deploying to a staging system prior to checkin.

There are two versions of the CA workflow. Content Administration ships with a workflow that only deploys to a production system. This is useful in development environments, where "production" is actually a development test server. An additional version of the CA workflow deploys to a staging server prior to deploying to production. To run with this workflow, include "-layer staging" in the Content Administration startup. The rest of this section will primarily focus on the production-only workflow, followed by a description of the changes in the staging and production workflow.

Production-Only CA Workflow

The production-only CA workflow includes these tasks or stages: author, content review, deployment approval, deployment, deployment verification, and revert deployment. Following is a closer look at each of these.

Author

The author stage is where any actual editing takes place. Once a user is finished editing (or wants to deploy to a staging server to test their changes), they choose the review outcome. At that point, the assets and project are made non-editable (though the assets can still be edited in other projects) and a check for asset conflicts is performed. To allow the user to fully discard the changes in the project, the delete outcome is included in the workflow. Various stages later in the workflow can return the workflow to the author stage so that further changes can be made.

Content Review

During the review stage, a content reviewer (typically not the same person as the author) reviews the changes on the CA server or a preview server using the CA Preview capabilities. If they approve the changes, the workflow moves forward to deployment. If the changes are rejected, the workflow returns to the author stage. The project can also be deleted at this stage.

Deployment Approval


If the approve and deploy outcome is chosen, the project will immediately be deployed to the target in question (in this example, "MyProductionTarget"). In other words, the workflow will advance to the next stage, locks will be created for the assets in the project, and the deployment system will be notified to start the deployment. Note that the deployment attempted here is an incremental deployment. If the target is in an unknown state and a full deployment is required, the CA server will throw deployment errors if an incremental deployment is requested. If the target is in such a state, choose the approve outcome instead and run a full deployment from the Admin Console.

If the approve outcome is chosen, the deployment will not be kicked off at this time, and must be started or scheduled via the Admin Console in the BCC. However, the workflow will advance to the next stage and wait there for the deployment to finish.

If the deployment is rejected, any asset locks will be released, and the workflow will return to the author stage. Note that on the first run through the workflow, no assets will be locked when entering the author stage; assets are locked whenever a deployment occurs. However, if we are at this stage due to an earlier revert, there will be locked assets which must be unlocked before editing.

Deployment

This stage of the workflow is not a task where outcomes are chosen by a user. Instead, the workflow waits for the deployment to complete. Once the deployment complete event has been received, if the deployment was successful, the workflow moves forward to verification. If the deployment had errors, the workflow returns to the deployment approval stage for another attempt, or so it can be returned to the author stage for editing.

Deployment Verification

At this point in the workflow, it is expected that the user will check their deployment target to ensure that the data is showing up as expected and there are no errors. Once they have taken a look, the user can choose to either accept the successful deployment as permanent or revert the changes, possibly to reauthor them.


If the deployment is accepted, the workflow double-checks that the project has been deployed, and then checks in the data edited in the project. The data is saved to the main line and will be visible throughout CA and in any new projects started after this point. Checkin also releases any asset locks held for assets in this project.

If the deployment is rejected, a revert deployment is immediately kicked off and the workflow moves to the revert deployment stage. The revert deployment incrementally undoes the changes from the earlier forward deployment.

Revert Deployment

Much like the deployment stage above, this workflow stage starts by waiting for the revert deployment to complete. If the revert deployment completes successfully, the workflow returns to the author stage so that further changes can be made, hopefully fixing the problems that showed up after deployment.

If the revert deployment fails, the workflow is returned to the verify stage, where the project can be checked in or the revert can be attempted once more.

Staging and Production CA Workflow

If you run Content Administration with the "-layer staging" argument, you see a workflow that includes a staging deployment prior to the production deployment. The staging and production deployments are sequential: staging must be successfully deployed before we deploy to production. In the other direction, a production deployment must be reverted before a staging deployment can be reverted, and both must be reverted before editing (the author stage) can occur again. Checkin only occurs after both staging and production have been successfully deployed.

The additional steps in the staging-layer CA workflow are largely identical to the deployment approval, deployment, deployment verification, and revert deployment stages described above.


Note that the project is not checked in and the process is not closed once the staging deployment has been verified. These steps do not happen until a matching production deployment has been verified, and so remain at the end of the overall workflow.

Also, note that reverting or rejecting a staging deployment now returns to the author stage, much like reverting from production in the production-only workflow. However, reverting from production does not return the workflow to authoring, but rather returns to the verify staging stage, so that staging can be reverted or production can be re-deployed.

There is a pattern in this workflow: in general, you only return to an earlier stage by undoing the actions at the current stage. Starting in 9.0, Content Administration generally requires that a particular deployment action (say, deploying to production) be undone before the previous action (say, deploying to staging) can be re-done or undone. Deployments are run sequentially, and are unrolled in this manner to preserve data integrity. For example, if we are deploying to production, then we know that the staging server has the correct data. More importantly, if we are checking in, then we know that both staging and production have the deployed data. This is necessary to maintain data integrity on the target systems.

Finally, it is important to understand that CA locks all assets in the project prior to deployment to the staging server. The locks are held until the workflow is either moved back to the author stage or the assets are checked in after deployment to the production target. While the assets are locked, any other project that contains an asset in this project will be unable to lock or deploy those same assets.
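The locking behavior described in this chapter can be illustrated with a small sketch. This is not the ATG locking implementation; the class and method names are invented for illustration:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Toy sketch of CA-style asset locking (illustrative, not the ATG
// implementation): a deployment locks every asset in a project, and a
// second project cannot deploy while it shares any locked asset.
public class AssetLocks {
    private final Map<String, String> lockOwner = new HashMap<>(); // assetId -> projectId

    // Lock all of a project's assets for a deployment; fail without
    // partial locking if any asset is held by a different project.
    public boolean lockForDeployment(String projectId, Set<String> assets) {
        for (String asset : assets) {
            String owner = lockOwner.get(asset);
            if (owner != null && !owner.equals(projectId)) {
                return false; // blocked by an overlapping project
            }
        }
        for (String asset : assets) {
            lockOwner.put(asset, projectId);
        }
        return true;
    }

    // Checkin, or moving back to the author stage, releases the locks.
    public void release(String projectId) {
        lockOwner.values().removeIf(projectId::equals);
    }
}
```

The all-or-nothing check mirrors the text: an overlapping project cannot deploy until the first project's locks are released by checkin or by returning to the author stage.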

Chapter 3 - VersionManager, VersionRepositories, and VersionFileSystems

The VersionManager is the component responsible for managing all the containers that hold versioned assets, namely VersionRepositories and VersionFileSystems. The VersionManager also provides the API for accessing versioning information about assets. Finally, the VersionManager provides the constructs which make asset management and deployment possible: branches, workspaces, and snapshots.


Branches

The basic concept of a branch is that it tracks every asset. Branches are the equivalent of lines in other versioning systems. In VersionRepositories and VersionFileSystems there is always a branch referred to as "main". The main branch contains all the assets that have been checked in. Every asset on every branch has one and only one head. The head asset on the main branch is the last version of the asset to be checked in.

Prior to Content Administration 9.0, branches were also used to track data on each target. In 9.0 this usage was eliminated to improve performance and reliability, meaning that 9.0 and later versioned repositories have a single "main" branch.

Workspaces

A workspace is a construct which manages a set of assets being created or modified. Only the assets that are being modified are added to the workspace. When an asset is added to a workspace (checked out), a separate copy is created in the workspace, based on the current head copy on the main branch. When the workspace is checked in (another way of saying that all the assets in the project are being checked in), the assets are put onto the main branch, and these new versions become the heads.

There is a one-to-one mapping between workspaces and projects, and each project object has a reference to the workspace it uses. When a project is created, both a workflow instance and a new workspace are constructed and mapped to the project. It is the workspace that actually contains the modifications to the assets. When the project is deployed to a target, the deployment mechanism pulls data from the workspace to determine what to deploy. After deployment, once the deployment has been approved, the workspace is checked in.

Workspaces can be queried and accessed just like the main branch. Queries against the workspace pull items from the workspace if they are being modified there; otherwise, assets are pulled from the main branch. So, working in a workspace (working in a project) gives the user the current view of the main branch plus any changes that have been made in the workspace.
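The overlay behavior of workspace queries can be sketched as a toy model of the described lookup. This is not the ATG API; the names are illustrative:

```java
import java.util.Map;

// Toy model of workspace query resolution (not the ATG API): use the
// workspace's modified copy of an asset if one exists, otherwise fall
// back to the head version on the main branch.
public class WorkspaceView {
    private final Map<String, String> mainHeads;  // assetId -> head version on main
    private final Map<String, String> workspace;  // assetId -> checked-out copy

    public WorkspaceView(Map<String, String> mainHeads, Map<String, String> workspace) {
        this.mainHeads = mainHeads;
        this.workspace = workspace;
    }

    // The user's view while working in a project: main plus workspace changes.
    public String resolve(String assetId) {
        return workspace.getOrDefault(assetId, mainHeads.get(assetId));
    }

    // Checkin: the workspace copies become the new head versions on main.
    public void checkin() {
        mainHeads.putAll(workspace);
        workspace.clear();
    }
}
```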

Snapshots

A snapshot is, as the name implies, a snapshot view of the data state on the main branch at a particular point in time. Snapshots are created when each project/workspace is checked in, to mark the state when that project was finished. This state is used for full deployments and for deployments that revert later projects.

Snapshots are not normally seen during the use of CA, but they are described here to give developers and administrators a better understanding of internal CA operation.

VersionRepository

A VersionRepository is a GSARepository that can manage multiple versions of the same repository asset. When operations are executed by a VersionManager, such as checking in a workspace, each operation is ultimately translated into calls on the VersionRepositories that are configured in the VersionManager. The VersionRepository exposes the same APIs as a SQL repository, but it requires that a project context be specified in order to execute operations which modify assets. A project context is not needed to view versioned data.

The VersionRepository implementation stores a separate row in the database for each version of a particular repository item (asset). Asset versions are numbered starting at 1. In addition, metadata is stored with each particular version, such as the checkin date, the previous version, and so on. To present a view of the data that appears the same as a versionless SQL repository, the VersionRepository modifies requests to expose only head versions or workspace versions, as appropriate. A VersionRepository automatically creates a wrapped GSARepository which stores the underlying specific versions for each repository item asset. Usually this repository has a name ending in "_ver" and can be viewed in the Nucleus Component Browser. From there, you can run queries to see the underlying version objects that make up an item. Application code should never access this wrapped repository directly.
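The row-per-version storage and head selection described above can be modeled with a small sketch. Again, this is illustrative only; it is not the ATG schema or API:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of row-per-version storage (illustrative, not the ATG
// schema): each checked-in version of an asset is a separate row,
// numbered from 1, and the head is the highest version number.
public class VersionRows {
    private static class Row {
        final String assetId;
        final int version;
        final String data;
        Row(String assetId, int version, String data) {
            this.assetId = assetId;
            this.version = version;
            this.data = data;
        }
    }

    private final List<Row> rows = new ArrayList<>();

    // Checking in an asset appends a new row with the next version number.
    public int checkin(String assetId, String data) {
        int latest = 0;
        for (Row row : rows) {
            if (row.assetId.equals(assetId) && row.version > latest) {
                latest = row.version;
            }
        }
        rows.add(new Row(assetId, latest + 1, data));
        return latest + 1;
    }

    // A versionless view exposes only the head version of each asset.
    public String head(String assetId) {
        Row best = null;
        for (Row row : rows) {
            if (row.assetId.equals(assetId) && (best == null || row.version > best.version)) {
                best = row;
            }
        }
        return best == null ? null : best.data;
    }
}
```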


VersionFileSystem

In addition to repository items, CA can also manage file assets. File assets can be static files which may be deployed to a web server doc root. Other examples of file assets are scenarios, targeters, segments, and slots, commonly referred to as the personalization assets. Content Administration has out-of-the-box components that manage these sorts of files. In addition, it is possible to add components to manage other sorts of files, for example JSP files.

On the CA server, versioned files are managed by storing them in a VersionRepository, just like repository item assets. All file assets are stored in /atg/epub/file/PublishingFileRepository. File assets are exposed via VersionFileSystems. A VersionFileSystem defines a logical file system whose assets can be versioned. Personalization assets are managed through the /atg/epub/file/ConfigFileSystem versioned file system, and web server file assets are managed through /atg/epub/file/WWWFileSystem. These, and any customer-defined versioned file systems, all store their data in the PublishingFileRepository, but provide different views on this data, so that different sorts of files can be deployed to different destinations. For example, web assets are typically deployed to the web server, while personalization assets are deployed to the filesystem of the running ATG production servers.

To improve performance and save database space, the actual file data is not stored within the PublishingFileRepository, but rather on the local filesystem. A specialized property descriptor manages the storage of this data, making it appear as if the file data resides in the database despite being stored on the CA server's filesystem. These files are typically stored in Publishing/versionFileStore under the home directory. This directory should not be modified directly. Metadata about the file is stored in the database row associated with the file version.

The VersionManager is aware of all the versioned file systems in CA, in much the same way that it is aware of the versioned repositories.

Chapter 4 - Deployment

The deployment system in CA is responsible for moving assets from the CA server to each of the defined targets. In 2006.3 and later, the /atg/epub/DeploymentServer component is responsible for assembling the assets that are to be deployed. Once they are assembled, the assets are handed to a component known as the DeploymentManager, which was new in 2006.3. The DeploymentManager is a component which moves data efficiently between two servers. It replaces the old deployment system, which created manifest files to move data between servers. The DeploymentManager component is part of the DAF layer.

The DeploymentManager communicates with a software agent on the target system to transfer the assets to that system.

When a deployment is initiated, the DeploymentServer determines which assets need to be moved, using information from the particular project being deployed: the project data itself and the current snapshot. To determine what to deploy, a diff is run between the current state of the CA server and what has been previously deployed to the target. For revert deployments, the diff is simply run in the other direction, between the current state of the target and a previous CA snapshot. Once the DeploymentServer has determined which assets need to be added, updated, or deleted, the list is handed off to the DeploymentManager, which actually moves the data.
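The diff step described above can be sketched as follows. This toy version is not the ATG implementation; it simply classifies each asset as an add, update, or delete by comparing asset versions on the target with the desired state on the CA side:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy version of the deployment diff (not the ATG implementation):
// compare what the target currently has with what the CA server wants
// it to have, and classify each asset accordingly.
public class DeployDiff {
    public static Map<String, Set<String>> diff(Map<String, String> targetState,
                                                Map<String, String> desiredState) {
        Set<String> adds = new HashSet<>();
        Set<String> updates = new HashSet<>();
        Set<String> deletes = new HashSet<>();
        for (Map.Entry<String, String> entry : desiredState.entrySet()) {
            String onTarget = targetState.get(entry.getKey());
            if (onTarget == null) {
                adds.add(entry.getKey());        // not on the target yet
            } else if (!onTarget.equals(entry.getValue())) {
                updates.add(entry.getKey());     // target has an older version
            }
        }
        for (String assetId : targetState.keySet()) {
            if (!desiredState.containsKey(assetId)) {
                deletes.add(assetId);            // removed on the CA side
            }
        }
        Map<String, Set<String>> result = new HashMap<>();
        result.put("add", adds);
        result.put("update", updates);
        result.put("delete", deletes);
        return result;
    }
}
```

A revert deployment corresponds to running the same diff with the two maps swapped, so the earlier changes are undone incrementally.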

Full Deployments

A full deployment wipes all the data on the target and reinitializes it. A full deployment deploys the target's current snapshot on the main branch, plus data from any still-open projects that had been previously deployed to the target. Content Administration tracks which targets have received which snapshots and open-project deployments. After a successful full deployment, any irregularities or data inconsistencies on the target should be gone, and the target should be in a stable and known state.

A full deployment is typically used when the state of the target is unknown, due to errors, changes in configuration, or because the target is new. Some installations run regular (for example, nightly or weekly) full deployments as a backup stability measure, but this is generally not required. Full deployments are usually much slower than incremental deployments, because the entire current data set must be copied to the target and its agents.

A full deployment can be run instead of a regular incremental deployment during a project workflow. To do this, choose "approve" instead of "approve and deploy", and then run the full deployment from the Admin Console, choosing the project in question from the "To Do" tab and specifying full instead of incremental deployment. When run this way, the full deployment will incorporate data from the project in question and move its workflow forward.

A full deployment can also be run from the target overview page by clicking the "full deployment" button. This re-deploys the target data based on the Content Administration server's record of which snapshot and which projects should be on that target.

One-Off Deployment Targets

So far we have discussed deployments to staging or production targets that happen in the context of workflows. However, 9.0 introduced another kind of target, the "one-off target". These targets can receive a special kind of deployment, known as a one-off deployment, and cannot be deployed to in the context of workflows. When creating a target, the user may specify whether the target is a workflow target or a one-off target.

The primary difference between one-off targets and workflow targets is that CA does not ensure data integrity on one-off targets. This means that one-off deployments can be performed at any point in the workflow, and also that one-off deployments do not lock assets. One-off incremental deployments can be triggered directly from the header of the project overview page. One-off full deployments can be triggered from the target overview page.

The intended uses of one-off targets are twofold: as test systems and for long-term projects. With a one-off target, a user can quickly get information onto a test server without making their project non-editable or worrying about asset locking or other projects. For a long-term project, a one-off target may be used to stage edited assets without locking the assets, allowing other short-term projects to continue and be published to the production site.

At the same time, because one-off targets are not consistent in terms of data, it is not guaranteed that these targets will behave the same way as production once the project is deployed. For example, if one user deploys their project to a one-off target, and then a different user immediately deploys an earlier project with some overlapping assets, some (but perhaps not all) of the first user's changes will not be present when they review the one-off target. One-off targets are good for quick tests, but they are not the final word. For that, it is best to use a staging system or check a preview server.

For this reason, it is important to frequently restore data integrity on one-off targets via full deployments. It may also make sense to divide one-off targets so that only a couple of users are working with each one, to prevent conflicts. Alternatively, separating out user domains (so each user is only working on one distinct portion of the data) may prevent content editors from stepping on each other's toes.

Part II - Implementation

Chapter 5 - Planning a CA Implementation

This next part of the document is meant to serve as a guide for designing and developing a CA server. It attempts to describe the order in which things should happen and the issues to be aware of, and it tries to cover possible areas of configuration and customization.

The first step in planning a CA implementation is scheduling the appropriate amount of time based on the complexity of the system. As a point of reference, ATG typically plans for one to three development resources to work on CA full time for approximately three quarters of the overall production system development project. Work on a CA implementation should also begin at roughly the same time as the production system, if possible. If not, the CA rollout should begin as a separate project after the original site has gone live. A common mistake made during implementation is the assumption that CA is simply an add-on and will just work. CA is an internally facing application that has users and administrators (sometimes the same person has both roles). Therefore, it also has its own set of requirements, and should be planned with a full development cycle, beginning with requirements, then design, then implementation, and then tested similarly to an externally facing site.

The number of resources required also depends on the complexity of the system. Before we can address that, some initial work needs to be done to gather requirements from the users and assemble use cases. Other factors may already be known at this point, such as the source of the initial set of data, which is often another system within the organization.

Given the above, below we list the common types of modifications and customizations which CA users make. This can be used as a guideline for helping determine time and resources. If any of the items below do not apply to your implementation, it will reduce the overall implementation time and resources.

• Configuring custom version repositories and/or file systems
• Building custom workflows to map to your organization's business processes
• Configuring targets other than staging and production in your environment
• Importing initial data into the system
• Building a custom automated import process to import data from an external system for ongoing updates
• Customizing the user interface for the CA users

Note that customizing the user interface for CA is covered in the Content Administration Programming Guide and is not addressed in this document.

One of the most important things that must be done, besides gathering requirements, is testing all the use cases that were gathered at the beginning of the project. This must be done before the CA site goes live.

Where to Start

Initially, there are several things that can occur. These are requirements gathering from the users, workflow definition, and repository and file system configuration. Workflow definition is only required if you plan not to use any of the workflows shipped with the product.

The purpose of the requirements gathering is to implement a system that meets the users' needs. The requirements should consist of use cases that detail the steps exactly as the user would perform them.

In the case that the workflows shipped with the product do not meet your business process requirements, they can be customized, though ATG strongly recommends that any customizations be relatively minimal. If customization is to occur, it is recommended that the new workflow be created by copying an existing one and editing it, keeping the overall structure in place. This ensures that the starting point for the workflow includes ATG's recommendations. Also, ATG strongly recommends that you consult with our Services or Support departments to validate any workflow changes.

Development Environment

When setting up CA, it is recommended that there be two environments: a development environment and a production environment. Each environment should contain the CA server(s) and deployment target servers. The development environment is for the development of the CA application, and deploys to the target servers in the development environment. Usually, each developer will have their own development environment. The production CA environment is the system that, during development, is updated to include new code changes as development work proceeds. This system will ultimately be the one that the CA users do their work on and that deploys content to the production web site.

Requirements Gathering

The purposes of requirements gathering are, first, to assist in building a system which provides the functionality that the users desire. Second, it provides a basis for testing the CA application prior to its availability.

The actors in this process are the users of the CA application and the system integrator/developers.

The requirements should include use cases. The use cases include all the detail of what the user is trying to accomplish and what steps they will need to execute in order to complete the task. The requirements and use cases can then be used to determine which portions of CA need to be enhanced and with which portions integration is required.

Another portion of requirements gathering is to determine the process that the users go through to publish content. Most organizations use some process to get content onto their sites; however, in many cases that process may not be documented. Once it is documented, the people building out the CA implementation will have a better understanding of how they will need to adapt CA to the business processes of the organization.

The Content

Once the initial phase of requirements gathering is complete, the next area to focus on is the content. This area includes designing and developing the repositories in which the content will be stored.

Note: One of the most common places that errors begin is in the repository definition file. Many issues can be prevented by taking care that this file is configured correctly for all repositories. If anything is misconfigured in the definition file, it often results in an error somewhere down the line. This could be while a user is modifying an asset or even as late as deployment.

Below is a list of the most common items to check for when validating a repository definition file.

• Data-type correctness: be sure that the data-type and the column match up correctly between the definition file and the database DDL.

• Constraints: be sure that any constraints match up. A common example is a database column that is marked as NOT NULL, but whose associated property is marked as required="false".

• Be sure that for every property that references another item type, the references attribute is configured for that property:

<property name="myProperty">
  <attribute name="references" value="true" data-type="boolean"/>
</property>

• Be sure that the versionable attribute is only set to false on item descriptors which you do not intend to have deployed. Not ensuring this will result in data which you expect to be on the target being missing. Also, using this attribute requires that any data in a non-versioned item descriptor also exist on the target prior to any deployments. This is especially true when versioned items reference the non-versioned data.
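To make the first two checks concrete, here is a hedged sketch (table, column, and property names are hypothetical) showing a repository property definition and the DDL column it must agree with:

```xml
<!-- Matching DDL (hypothetical):
     CREATE TABLE my_item (
         my_item_id   VARCHAR(40)  NOT NULL,
         display_name VARCHAR(254) NOT NULL
     );
     required="true" below must agree with the NOT NULL constraint,
     and data-type="string" must agree with the VARCHAR column type. -->
<property name="displayName" data-type="string"
          column-name="display_name" required="true"/>
```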

The other type of content, aside from repositories, is files.

Files sometimes take the form of static content that is deployed to a web server's doc root. One of the file systems on the CA server, known as the WWWFileSystem, represents the files that are to be deployed to the web server doc root. As this file system is already configured, there is no additional work required to use the file asset functionality for deploying to a web server's doc root.

Similarly, configuration (for example, properties) files can be managed on the CA server using the ConfigFileSystem, and then deployed to each of the application server installations on the target. Again, this is pre-configured with the CA product.

The CA product is not limited to only these uses. In fact, the product can deploy any file content to any location or to multiple locations on the destination cluster. To do this, it is necessary to create new types in the PublishingFileRepository and new file system components on the CA server, following the instructions in the documentation.

The primary consideration for new file systems is where the data will end up once deployed. While all the file systems store their data in the PublishingFileRepository, the implementation requires a different file system component for each different directory location on the agent (or across multiple agents). For example, if you are using CA to manage JSP files in three different web applications, then you will probably want to create three different file system components on CA to track these files, with corresponding types in the PublishingFileRepository.


Importing the Data

Importing the data into CA can often require additional planning, mainly when faced with a requirement that data needs to be imported on some time interval on a recurring basis. First we begin with the simple case, which is importing the initial data. This is a one-time process that occurs when the site is first set up. The key thing to be aware of here is whether or not any targets have been configured yet.

Note: The preferred way of importing the initial set of data is to do it before any targets have been configured in CA. By importing data before any targets are created, the system allows for imports directly into workspaces without needing to create a project. This is not a requirement, but sometimes makes the initial importing step easier. If you choose to create the targets first, then you can still import data. The one thing that you will need to be aware of is that you will need to do it within the context of a project. The reason for this is that once a target is created, it can only recognize changes within a project for deployment.

In order to import within the context of a project, you must be sure that either a project is created as the import begins or that the assets are imported into an existing project. When using startSQLRepository, there is a parameter, process, that allows you to specify the name of a new project that will be created for the assets to be imported into. If you have an existing project that you want to import into, you will need to find the project's workspace name and use the -workspace parameter. By default, the workspace name for a project is the string "epub-" followed by the project id. For example, "epub-prj12345".
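For example, a command-line import into an existing project's workspace might look like the following sketch (the module name, repository path, and import file are placeholders for your own values):

```shell
bin/startSQLRepository -m MyModule \
    -repository /atg/commerce/catalog/ProductCatalog \
    -import initial-catalog.xml \
    -workspace epub-prj12345
```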

If you are using the repositoryLoader, the same concepts apply. In order to specify to the repositoryLoader that a project should be created at the start of an import, the createProcesses property on the /atg/epub/file/VersionedLoaderEventListener component should be true (the default is false). Other properties can also be set here which control various aspects of the import. They are:

processNamePrefix=Content Administration Import
appendTimeToProcessName=true
processDescription=Imported by the RepositoryLoader
timeFormat=MMMMM dd, yyyy hh:mm:ss aaa
workflowName=/Content Administration/editAssets.wdl
userAuthority=/atg/dynamo/security/UserAuthority
personaPrefix=Admin$user$
userName=admin
password=admin
checkinOnCompletion=true

The above property values are the defaults. For more information on these properties, see the "Importing Versioned Repository Data" section of the "Repository Loader" chapter of the ATG Repository Guide.

Using an Import Workflow for Automatic Imports

It is possible to use the CA workflow for imports. The way this works is that the import runs against a project that is in its author stage. Then, users move the workflow through the deployment stages as they would in a normal project.

However, customers often want an automated way to import and deploy data to the targets. To do this, they create an import-specific workflow that skips a number of user approval steps. The source of the data is often an external system which is responsible for managing said data. Very often, say on a nightly basis, there is a need for any updates on the external system to propagate to the externally facing website. In order to get the data to the external site, the requirement is to import the data into CA and push it to the target. It is fairly common that this data not go through an approval process.

The key to this workflow is building it to have only a single task and outcome when the process runs without error. Custom code is often written to do the work of importing the data and then firing an outcome which will cause the assets to be checked in and deployed. The workflow is quite simple when it runs through the positive (non-error) case and a bit more involved in the case when an error occurs.

Below are sections of a workflow which is specific to automatic imports.


The beginning of the import workflow contains a stage where the import will occur. While there are outcomes at this stage, they will only be accessed by users if the import workflow fails at some point. The initial run-through will use code to automate the workflow advance.

Because the import needs to be triggered from code, we cannot depend on the usual CA UI for starting the workflow. Instead, the import code will need to automatically create the process and start the workflow. Here is example code that does this. We describe each section following the code.

TransactionDemarcation td = new TransactionDemarcation();
boolean rollback = true;
try {
    td.begin(getTransactionManager());
    assumeUserIdentity("publishing");
    String projectName = getProjectName() + " " + getDateString();
    ProcessHome processHome =
        ProjectConstants.getPersistentHomes().getProcessHome();
    Process process = processHome.createProcessForImport(projectName,
        "/Content Administration/import.wdl", false);
    String wkspName = process.getProject().getWorkspace();
    Workspace wksp = getVersionManager().getWorkspaceByName(wkspName);
    WorkingContext.pushDevelopmentLine(wksp);
    // import-specific code goes here
    advanceWorkflow();
    rollback = false;
}
catch (VersionException e) {
    throw e;
}
catch (PublishingException e) {
    throw e;
}
catch (TransactionDemarcationException e) {
    throw e;
}
catch (CreateException e) {
    throw e;
}
catch (WorkflowException e) {
    throw e;
}
catch (ActionException e) {
    throw e;
}
finally {
    WorkingContext.popDevelopmentLine();
    releaseUserIdentity();
    try {
        td.end(rollback);
    }
    catch (TransactionDemarcationException tde) {
        throw tde;
    }
}

The first thing that happens is that a transaction is created. You will want to do this because if something fails during the import, rolling back the transaction will revert all the changes and not commit the project creation.

Next, calling assumeUserIdentity() will allow the thread to assume the security characteristics of a user defined in the profile repository. Doing this allows for accessing secure resources during the import, such as the PublishingRepository. This method looks like this:

private boolean assumeUserIdentity(String pUserName) {
    if (getUserAuthority() == null)
        return false;
    // create a temporary User object for the identity
    User newUser = new User();
    Persona persona = (Persona) getUserAuthority().getPersona(
        "Profile$login$" + pUserName);
    if (persona == null)
        return false;
    newUser.addPersona(persona);
    // replace the current User object
    ThreadSecurityManager.setThreadUser(newUser);
    return true;
}

Now the project is created by getting the object which exposes the API method and calling createProcessForImport(). It is passed a process name and the workflow that the project is to use. When the process is created, it creates the underlying workspace and starts the workflow. The workflow begins executing immediately and, because the first element in it is an action, it is executed. The create project action will create the process's project object. The workflow will then advance to the next element, which is a task. A task is a wait state, which means the workflow will stop and wait here. Once the workflow stops executing, control is returned back to the createProcessForImport() method and then to the code above.

One last thing must be done before we can start importing repository items and/or files into the CA system. That is to set the project context. The project context is set on the current thread and tells the system into what project to make the changes. In order to do this, we need to get the project's workspace and then call WorkingContext.pushDevelopmentLine(wksp). Note that the workspace property on a project is a workspace name, and it must first be resolved against the VersionManager.

At this point, any custom code to read data from other systems or files to create, update, or delete assets is done. Note that if the import can be very large, transactions should be batched.
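Batching here simply means committing in fixed-size chunks rather than in one huge transaction. The helper below is a generic sketch (not an ATG API); in a real import, the commit callback would end one TransactionDemarcation and begin the next.

```java
import java.util.List;
import java.util.function.Consumer;

public class BatchedImport {
    // Process items in fixed-size chunks, invoking commitBatch once per
    // chunk. Returns the number of chunks committed.
    public static <T> int importInBatches(List<T> items, int batchSize,
                                          Consumer<List<T>> commitBatch) {
        int batches = 0;
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // In a real import: begin a transaction, create/update the
            // assets in this chunk, then commit before the next chunk.
            commitBatch.accept(items.subList(start, end));
            batches++;
        }
        return batches;
    }
}
```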

After all the import work is done, we will want to advance the workflow. To do this, the advanceWorkflow() method is called, which looks like this:

public static final String TASK_ELEMENT_ID = "4.1.1";

private void advanceWorkflow() {
    RepositoryItem processWorkflow = project.getWorkflow();
    String processName =
        processWorkflow.getPropertyValue(PROCESS_NAME).toString();
    String subjectId = process.getId();
    try {
        // an alternative would be to use the global workflow view
        WorkflowView wv = getWorkflowManager().getWorkflowView(
            ThreadSecurityManager.currentUser());
        wv.fireTaskOutcome(processName,
                           WorkflowConstants.DEFAULT_WORKFLOW_SEGMENT,
                           subjectId,
                           TASK_ELEMENT_ID,
                           ActionConstants.ERROR_RESPONSE_DEFAULT);
    }
    catch (MissingWorkflowDescriptionException e) {
        throw e;
    }
    catch (ActorAccessException e) {
        throw e;
    }
    catch (ActionException e) {
        throw e;
    }
    catch (UnsupportedOperationException e) {
        throw e;
    }
}

In the above code we are programmatically advancing the workflow to the next stage. We do this by getting a component called the WorkflowView. This component exposes the workflow APIs. It is a session-scoped component. Once we have this object, the fireTaskOutcome() method should be called. The first three parameters are values that come from the process-level workflow, namely the process name, segment name, and subject id (the process id). The fourth parameter is the id of the outcome to fire. In the /Content Administration/import.wdl workflow that id is 4.1.1. You can see the outcome ids by looking at the workflow manager component at /atg/epub/workflow/process/WorkflowProcessManager and selecting the workflow you are interested in. The last parameter tells the workflow engine how to handle errors, and should always be set to ActionConstants.ERROR_RESPONSE_DEFAULT.

When the workflow is advanced, any elements after the outcome will be executed. In this workflow, these actions would set the editable property to false in the project, check that all the assets are up to date and no conflicts exist, and then approve and deploy the project to the staging and then production sites. The approve and deploy action causes the project to be queued for deployment. The deployment actually occurs in a separate thread in the background.

After starting the deployment to staging, the import workflow waits for the deployment to finish, and then either proceeds to production in the case of success or returns control to the user in the case of failure.

In the event of failure, the user is given various options to retry the deployment or return to the import task.

If the deployment is retried, then before advancing the workflow, the user must go to the admin console (BCC) and stop and cancel the failed deployment. If this is not done, then the next deployment will get queued but will not start until the current one is stopped. Once either of the retry outcomes is selected, the project is approved for deployment and the workflow goes back to the waitForStagingDeploymentToComplete task.

You will notice that there is a returnToImportTask outcome as part of each of these tasks. This is in the workflow as a way to exit the wait state if a user determines it necessary. For example, if a deployment fails but does not advance the workflow, a user can select it in the BCC, return the workflow to the Import task, and manually retry advancing the workflow. Note that selecting this outcome does not terminate any previously started deployment that may be running. This option is not designed for normal use, but only to recover from unexpected events, and should not affect the project beyond changing state. Note that returning to the import task does not release any asset locks. Once the import has deployed, it holds locks on the deployed assets until any issues are resolved, even if we return to an editable state for hand fixes.

The production deployment follows a similar pattern. Once the production deployment completes successfully, the workflow will continue, the assets will be checked in, and the project and process completed. Note that beyond the initial Import task, assuming everything executes successfully, the workflow does not require any other manual intervention. In the case of an error occurring, the workflow is designed to allow a user to interact with it in the BCC to resolve the issue.

Finally, once the import is complete and no errors have occurred, the project context should be unset by calling WorkingContext.popDevelopmentLine(), and the user identity for the thread released. Here is the code for releaseUserIdentity():

private void releaseUserIdentity() {
    ThreadSecurityManager.setThreadUser(null);
}

Versioned Repositories and File Systems

Repositories and file systems go hand in hand with the VersionManager. Together they make up the part of CA which manages all the assets. We often find that the source of issues with CA is a misconfiguration of one of these components. Very often that source is the versioned repository definition file. This is understandable, given the immense amount of configuration information in it. Because of this, it will benefit all those designing and testing this product to be very disciplined when modifying this file.

When possible, the structure of the item descriptors should determine the table layout in the database. By this I mean that if a custom repository is being built, start with defining the repository definition file, and the table structure will then derive itself from that. The reason for this is that using specialized schemas requires additional configuration in the repository definition file. The less specialized configuration that needs to be defined in the definition file, the less chance of an issue occurring because of misconfiguration.

In some cases, starting from a repository definition file is not an option. If that is the case, then the definition file needs to be created from an existing schema. Doing this forces the developer to define item descriptors which may have specific requirements.

Managing Automated Edits with User Edits - Catalog Structure

In many implementations, there exists the problem of managing automated changes to data alongside user changes to data. It is common that the automated changes happen on one set of properties on an item and the user changes happen on a different set of properties. Because asset conflicts are detected at the item level and not the property level, the scenario above often causes asset conflicts to occur. The solution is to resolve the conflict, which can be done easily by users, but not so easily by code.

In order to reduce the chances of a conflict occurring in the scenario above, the solution is to keep the properties that are edited by the automated system in a separate item descriptor type from those edited by users. By doing this, those edits are done in two separate items and therefore will not cause a conflict. For example, say a product catalog's product item descriptor is managed both by a user and an automated import. The automated import manages the display name, description, and price, and the users manage all the other properties manually. It would be beneficial to create another item descriptor that contains the display name, description, and price properties. We will call this autoProduct. The product item descriptor would then be modified to contain a reference to the autoProduct item, and the display name, description, and price properties in the product item descriptor would be converted to derived properties which pull their values from the autoProduct item.

In the above scenario, when the autoProduct is changed, it will never cause a conflict with the product item because the changes are made independently of one another. This design is very common and should be used if there is any concern that an automated import and deployment will not be able to deploy because of asset conflicts.
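As a sketch of the design above (item and property names come from the example; consult the ATG Repository Guide for the exact derived-property syntax), the product item descriptor might expose displayName as a property derived from the referenced autoProduct item:

```xml
<!-- Hypothetical sketch: displayName on product pulls its value from
     the autoProduct item referenced by the autoProduct property. -->
<property name="autoProduct" item-type="autoProduct"
          column-name="auto_product_id"/>
<property name="displayName" data-type="string">
  <derivation>
    <expression>autoProduct.displayName</expression>
  </derivation>
</property>
```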

VersionManager API

The VersionManager API exposes methods to access versioning-related information. This includes information on versioned assets and information about workspaces. The starting point for the API is the VersionManager, which can be found at /atg/epub/version/VersionManagerService. This component is of type atg.versionmanager.VersionManager.

The operation that is most often used in the API is getting assets from the VersionManager. Getting an asset in terms of the VersionManager returns objects such as AssetVersion and WorkingVersion. These objects contain within them references to the actual version of the repository item or file. From an AssetVersion, all the versions of a particular asset can be found.

In addition to asset information, the VersionManager API can be used to access workspaces and therefore be used for tasks such as automated import of data. The operations of interest in a Workspace are checkout, checkin, reverting changes, getting the set of assets in the workspace, and getting a specific asset in the workspace. In general, these types of operations are done through the atg.epub.project.Project object, but it is these lower-level objects which contain the actual logic.

Deployment

There are several aspects to CA deployment. There is deploying a project from its workflow, using the "Admin Console" to deploy and roll back, using the project page to run one-off deployments, and automatically scheduled deployments. Understanding each of these and how they interact with one another is key to managing your deployments.

Beginning with workflows, there are two types of workflow actions which deploy: "Approve project" and "Approve and deploy project".

The "Approve project" action will approve a project for deployment to a specific target. Approval will lock the assets in the project. More details on asset locks will be provided later. Approval for deployment will not actually deploy the project, but will identify it as being ready for deployment. The project will get deployed either the next time the RecurringDeploymentService runs, or when manually deployed via the BCC. The RecurringDeploymentService Nucleus component can be found at /atg/epub/deployment/RecurringDeploymentService. Its purpose is to check for any projects waiting to be deployed and deploy them. Deployments started by the RecurringDeploymentService are incremental deployments, and if the target is in an unknown state (requiring a full deployment) these deployments will throw an error.

The other action, "Approve and deploy project", will approve and also initiate the deployment immediately. Again, deployments kicked off from this workflow action are incremental deployments.

Once a deployment starts, it will send an event on success or failure, and the workflow will continue in either case once it catches the deployment event. One thing to be aware of when a deployment fails is that the state of the data on the target is unknown. Some of the data may have been deployed and committed. If the deployment fails, then the workflow is configured so that it will return to a previous task.

When this happens, the problems should be addressed in the BCC Admin Console. The Admin Console has various controls for resuming, stopping, rolling back, or cancelling deployments.

The first thing that happens on deployment failure is that the deployment engine considers automatically rolling back the deployment. Added in 9.0, the automatic rollback feature will trigger in certain cases when a deployment fails. Cases where this will not occur are described below. The CA server immediately attempts to roll the problem deployment back. After a successful automatic rollback, the deployment remains in a failure state and can be resumed or canceled as needed. Cancelling at this point will leave the target in a known state, and a full deployment will not be required.

Automatic rollback will not happen if the deployment was a full or rollback deployment, if the deployment is the second phase of a switched deployment, or if the deployment had already altered more than a certain number (50,000 by default) of assets. In the case of these large or complex deployments, it is better if an administrator intervenes before any automatic action. It is possible to turn off automatic rollback using the DeploymentServer's autoRollbackDeploymentOnFailure property, or modify the threshold with the autoRollbackCommittedThresholdCount property.

If a deployment is in a failure state (either after a successful automatic rollback, or if automatic rollback did not run or failed), the Admin Console provides buttons to resume or stop the deployment. Resuming is the desired first step, and it may fail again or succeed. In the case of success, you can continue advancing the workflow by selecting the tasks in the BCC to move it forward. If it fails again, then the deployment must be stopped. After it is stopped, it can be resumed, rolled back, or cancelled. Resuming here is the same as resuming above. A rollback will attempt to revert the state of the target to what it was before the deployment began. It does this by doing a deployment which will remove any assets that were added, add assets that were removed, and set the values of assets that were updated back to their previous values. It is possible that the rollback will also fail. If this happens, then the entire deployment must be cancelled and the state on the target must be restored by doing a full deployment to the target.

Resuming and rolling back a failed deployment are preferable to cancelling the deployment, as the cancel action will require a full deployment to get the system back into a known state.

If a deployment has failed and the target has ended up in an unknown state, you will need to do a full deployment to stabilize the target. Go to the "Admin Console" and select the target that failed. Once you have stopped and cancelled the deployment, click on the "Full Deployment" button on the target status page to start a full deployment.

Asset Locks

In order to maintain a consistent set of deployed assets on all the targets, CA uses the concept of locking assets. Assets are locked when a project is approved for deployment to a target. For example, if project 1 contains assets A, B, and C and project 2 contains assets C, D, and F (note the common asset C), when project 1 is deployed to its first target, say staging, it is locked so that the deployment of another project cannot overwrite any changes to any of the assets in project 1. In fact, assets A, B, and C are all locked. If a user tries to deploy project 2, they will get a message stating that they cannot deploy until the other project which contains the common assets is unlocked.
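The blocking condition in the example above amounts to a set intersection. The following is a generic illustration of that rule, not the ATG implementation:

```java
import java.util.HashSet;
import java.util.Set;

public class LockCheck {
    // A project is blocked from deployment if any of its assets is
    // currently locked by another approved-but-unfinished project.
    // Returns the overlapping (blocking) asset ids.
    public static Set<String> blockingAssets(Set<String> projectAssets,
                                             Set<String> lockedAssets) {
        Set<String> overlap = new HashSet<>(projectAssets);
        overlap.retainAll(lockedAssets);
        return overlap;
    }
}
```

In the example, project 2 ({C, D, F}) against project 1's locks ({A, B, C}) yields the single blocking asset C.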

After project 1 is deployed to staging and then to production, its assets will remain locked until one of two things happens. First, if the deployment is verified, the assets are checked in, the locks are released, and the user can then advance the workflow for project 2. The other way asset locks are released is by reverting the deployed assets off all the targets they were deployed to. Below is an example of a revert action.

Note that the workflow contains an action to release the asset locks when returning to the author stage.

Starting with 9.0, additional functionality was added to track asset locks. On the project page, there is now an asset lock tab that lists the other projects that are locking assets in the project being viewed. Clicking on any one of these projects opens a page which details the particular shared assets. There is no way to directly unlock the assets from these pages, but a user can navigate to the projects that are locking the assets and either revert or complete them.


The purpose behind asset locks is to ensure data integrity across multiple targets. Without asset locks, it is possible that different versions of assets are on different targets after all the projects have been deployed. For example, if project 1 were deployed to staging, then project 2 deployed to staging and production, and finally project 1 deployed to production, different versions of asset C would be on staging and production. That is because project 1 was deployed to production last and project 2 was deployed to staging last. If asset C had a property in it that was changed in both projects, say displayName, then the displayName would be different on staging and on production. Asset locks ensure that once a project is deployed to a target, it is deployed through to all targets before any other project containing overlapping assets is deployed.

Asset Conflicts

Similar in a way to asset locks, asset conflicts occur when two projects contain the same asset and one project's assets are checked in before the other project's assets are checked in. In this example, asset A is checked out into projects 1 and 2. Some changes are made in each project to asset A. Project 1 is then deployed to staging and production and checked in. Once the checkin happens, project 2 will not be able to be deployed unless the conflict is resolved. The asset listing in the BCC for project 2 will display a link to allow the asset to be resolved. Clicking this link will bring the user to a page displaying both versions of the asset from projects 1 and 2. They can then select either version or select which properties from each version they desire to keep. Once this is done, the conflict is resolved and project 2 can be deployed and checked in.

It is very important for users to understand asset conflicts: why they happen and how they can be resolved. It is also important for them to understand that avoiding edits to a large number of the same assets in more than one project spares them from resolving large numbers of conflicts prior to deployment.

Asset Conflict Resolution

It is important to understand the concept of a workspace (analogous to a project in the BCC) before understanding conflict resolution. A workspace is a small sandbox in which a user makes changes to versioned assets. Changes made in one workspace cannot be seen in another workspace. This means that two users can make changes to the same asset without knowing about each other's edits. The VersionManager detects this case and warns the user who checks in last that someone else has already checked in a change. This is called conflict detection. Internally, conflict detection is based on changes to the head version of the asset; the head version is the version currently checked in. Each version has a reference to its predecessor version. This is best explained with an example:

The above diagram illustrates how a conflict is detected. V1 was the head of the branch when V2 and V3 were checked out into separate workspaces. Workspaces are indicated here by the red color (upper half of the drawing) and the blue color (lower half). The asset checked out in the "blue" workspace was the first to be checked in; this moved the head of the branch from V1 to V3. When the "red" workspace attempted to check in, the VersionManager detected that the predecessor version of its asset, V1, was no longer the head of the branch (the head was now V3), and it threw an error asking the application to resolve the conflict before checking in the project. Conflict resolution is performed by the application (i.e., the BCC), which asks the user which changes to keep from the asset checked in via the "blue" workspace. Note that conflict resolution itself cannot be done reliably by any algorithm, since it involves decision making by a human.
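The predecessor-versus-head check can be sketched as follows. This is a minimal illustration of the detection rule described above, not the actual VersionManager API; all names are invented.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch (not the VersionManager API): a check-in succeeds only
// when the predecessor of the version being checked in is still the head of
// the branch; otherwise a conflict is reported and must be resolved first.
public class ConflictDetectionSketch {
    private int head = 1;        // V1 is the initial head of the branch
    private int nextVersion = 2;
    private final Map<Integer, Integer> predecessorOf = new HashMap<>();

    /** Check out: create a working version whose predecessor is the current head. */
    public int checkOut() {
        int version = nextVersion++;
        predecessorOf.put(version, head);
        return version;
    }

    /** Check in: succeeds only if this version's predecessor is still the head. */
    public boolean checkIn(int version) {
        if (predecessorOf.get(version) != head) {
            return false; // conflict detected; the application must resolve it
        }
        head = version; // this version becomes the new head of the branch
        return true;
    }
}
```

Replaying the diagram's scenario: both workspaces check out against head V1; the "blue" workspace checks in first and moves the head, so the "red" workspace's check-in is refused until the conflict is resolved.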


Project Dependency Check

In versions prior to 9.0, deployment was preceded by a dependency check to ensure that projects were not being deployed out of order in a manner that would break data integrity. Due to changes in 9.0 that removed versioned repository branches and removed support for early staged workflows, the dependency check is no longer needed and is not performed.

Online vs Switched Deployments

There are two ways to configure your system for deployments: online deployments and switched deployments (also referred to as switching data sources). An online deployment is one in which the deployment happens directly against the live database of the target site. With a switched deployment, the data is deployed to an offline copy of the database. When the deployment completes, the offline database becomes the live database and the live database becomes the offline database. At that point, the deployment is performed a second time against the (new) offline database to synchronize the data between the two databases.

In general, online deployments are good for development environments, where the target sites are not live or used for critical applications. The advantage of online deployment is that it is quick to set up, and overall deployment time is shorter because the deployment is executed only once. The disadvantage is that as data is committed to the target site, pieces of the data become available during the deployment, putting the target site in an unknown state while the deployment runs. In a development environment this is acceptable because the target site is most likely not being used for any critical applications. In a production environment, however, this is not the preferred approach: the site is being accessed by users, and if they access data that is only partially committed, the result is unknown. Therefore, online deployments should never be used for a production system.

Another problem with online deployments is that a failed deployment leaves the target site in an unknown state, usually for an extended period of time, while the problem is diagnosed and a solution is determined. Again, for a development target this is not usually an issue because it is not a critical system. For a production site, however, this is unacceptable, as loss of a production site can have significant business repercussions.

Switched deployments solve the above problems for critical systems by doing two things. First, they make all the deployed data available atomically: all the new data becomes available to the site at the same time, regardless of how much data was moved. Second, they provide the safety net of never touching the live database to deploy data, which guarantees that if a deployment were to fail, the production site would still be live and running in a known state.

Production systems should always use switched deployments, regardless of how large or small they are, because switched deployments guarantee site availability even if a problem occurs. Further, systems may periodically require a full deployment of all the data, and that deployment may take several minutes even for a small site. With a system configured for online deployments, the site would need to be taken down during such a deployment, because a full deployment first deletes all the data on the target site and then rewrites it, leaving the site in an unknown state for the duration. A system configured for switched deployments can keep the site up and available, because it deletes all the data on the offline database first, rewrites it there, and then switches, making all the data available at once.
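The switched-deployment sequence (deploy offline, swap atomically, then redeploy to synchronize) can be sketched as follows. This is a hypothetical illustration of the sequence only, not ATG code; all names are invented.

```java
// Illustrative sketch (not ATG code) of the switched-deployment sequence:
// the live database is never written to directly, and the swap makes all
// new data visible at once.
public class SwitchedDeploymentSketch {
    interface Database { void deploy(String data); }

    private Database live;
    private Database offline;

    public SwitchedDeploymentSketch(Database live, Database offline) {
        this.live = live;
        this.offline = offline;
    }

    public void deploy(String data) {
        offline.deploy(data);     // 1. deploy to the offline copy only
        Database previousLive = live;
        live = offline;           // 2. atomic switch: offline becomes live
        offline = previousLive;   //    and the old live copy goes offline
        offline.deploy(data);     // 3. redeploy to synchronize the new offline copy
    }
}
```

Note that the live database at the time the deployment starts only ever receives writes after it has been switched offline, which is what preserves a known good state if the first deployment fails.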

DAF Deployment

In 2006.3, a portion of the deployment system was rewritten to scale and perform better than previous versions. This upgrade replaced only a portion of CA deployment: mainly the portion that moves the data from one server to another, also known as "manifest deployment." The logic that determines which assets are to be deployed has not changed and remains part of the CA product.

Rather than packaging the data in a manifest file and sending it to the targets to deploy, DAF Deployment handles repository items by connecting to the source and destination repositories from the same server instance and copying the data. For file assets, the source server instance makes socket connections to the destination server instances and sends the files across.

The PublishingAgent, although no longer used to receive the manifest file, is still required to maintain the status of the target and to allow caches to be invalidated during a deployment.


DAF Deployment Scalability and Failover

Another feature to be aware of with DAF Deployment is its ability to scale and distribute the deployment load across multiple server instances. For example, when two CA server instances are configured in the same cluster and a deployment is initiated, each instance will request a segment of data to deploy and then deploy it. The instances continue to request segments of data until all of it has been deployed. By running the CA instances on separate hardware, the data is deployed faster; additional instances can be started to run deployments faster still.
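The pull-based distribution pattern described above can be sketched with a shared work pool. This is not DAF Deployment's internal implementation; it is only an illustration of the pattern, and all names are invented.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

// Illustrative sketch (not DAF Deployment internals): several server
// instances pull segments of deployment data from a shared pool until none
// remain, so adding instances shortens overall deployment time.
public class SegmentPullSketch {

    /** Simulate instanceCount server instances deploying the given segments.
     *  Returns how many segments each instance deployed. */
    public static Map<String, Integer> deploy(List<String> segments, int instanceCount) {
        BlockingQueue<String> pool = new LinkedBlockingQueue<>(segments);
        Map<String, Integer> deployedPerInstance = new ConcurrentHashMap<>();
        Thread[] instances = new Thread[instanceCount];
        for (int i = 0; i < instanceCount; i++) {
            final String name = "ca-instance-" + i;
            instances[i] = new Thread(() -> {
                // Each instance repeatedly requests the next segment until the pool is empty.
                for (String segment = pool.poll(); segment != null; segment = pool.poll()) {
                    deployedPerInstance.merge(name, 1, Integer::sum); // "deploy" the segment
                }
            });
            instances[i].start();
        }
        for (Thread t : instances) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return deployedPerInstance;
    }
}
```

Because instances pull work rather than being assigned it up front, a fast instance simply processes more segments, and the failover behavior described below follows naturally: unfinished segments can be returned to the pool and picked up by the surviving instances.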

Also, in the multi-server case, if a non-primary instance (any instance other than the one on which the deployment was initiated) becomes unavailable, the deployment will continue, and the work that was assigned to the unavailable instance will be reassigned and restarted.

DAF Deployment Configuration

For the repositories that ship with the product, the majority of the configuration is already done. Two steps remain: configure the staging and production data sources, and add the staging and production repositories to the /atg/registry/ContentRepositories.initialRepositories property.
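Assuming custom staging and production repository components, the registry addition might look like the following sketch. The += appending syntax is standard Nucleus properties-file configuration; the component paths below are examples only.

```properties
# <module>/config/atg/registry/ContentRepositories.properties
# Append the staging and production repository components so the deployment
# system can resolve them as targets. The paths below are hypothetical.
initialRepositories+=\
    /mycompany/CatalogRepository_staging,\
    /mycompany/CatalogRepository_production
```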

For any custom repositories, be aware that there is a repository component configured in the CA instance for the versioned repository, another for the repository that stores the data for the staging instance, and a third for the production instance. Each of these is configured to use a separate data source and a separate client lock manager. Looking at the configuration for a staging or production repository that ships with the product will give you an idea of the proper configuration.

When a second repository is created to communicate with a database, that component is logically part of the same cluster as all other repositories of the same type that communicate with that database. Because more than one repository may now be reading from and writing to the database, all item descriptors must use a cache mode other than simple. Locked cache mode will work, but you need to be sure to configure the repositories on the CA server to communicate with the ServerLockManager on the staging/production server. The correct caching mode for a given CA deployment will vary depending on the system. Details on cache modes and suggested settings are provided in the Repository Guide, Section 10, SQL Repository Caching. See the diagram below.


In the above diagram, you see that there are FakeXADataSource_production and JTDataSource_production components defined, which are configured to communicate with the production server's database. Note that these components are defined in the DAF.Deployment module. In addition, a ClientLockManager_production is defined to communicate with the ServerLockManager on the production server instance. This lock manager component and the data source components are configured on the CA server instance but are logically part of the production server cluster, and therefore must be configured against that server's components.
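A sketch of what the corresponding component configuration might look like follows. Hostnames, ports, driver, and credentials are placeholders, and the property names should be verified against the DAF.Deployment module's own configuration in your ATG installation.

```properties
# /atg/dynamo/service/jdbc/FakeXADataSource_production.properties
# Points at the production server's database (all values are placeholders).
$class=atg.service.jdbc.FakeXADataSource
driver=oracle.jdbc.OracleDriver
URL=jdbc:oracle:thin:@proddb.example.com:1521:PROD
user=prod_user
password=CHANGE_ME

# /atg/dynamo/service/jdbc/JTDataSource_production.properties
# Wraps the production XA data source for transactional use.
$class=atg.service.jdbc.MonitoredDataSource
dataSource=/atg/dynamo/service/jdbc/FakeXADataSource_production

# /atg/dynamo/service/ClientLockManager_production.properties
# Talks to the ServerLockManager running on the production server instance.
$class=atg.service.lockmanager.ClientLockManager
lockServerAddress=prodserver.example.com
lockServerPort=9010
useLockServer=true
```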

Appendix

Appendix A - Checklist for Starting a Content Administration Implementation

Content Administration project fundamentals

• Establish requirements that include both full and incremental deployments for the system, which will occur on an ongoing basis

• A CA project requires appropriate time to execute and must be designed into a site. The web application cannot be CA-enabled in the final week prior to going live. (Review time requirements with Professional Services to make the application CA-ready, functional, and tested.)

• CA preparation for an application should begin when the schema, virtual file system, and repositories are all fixed. These steps should begin very early in the CA project:
1. Build the front-end application
2. Create versioned modules and go through the process of making the application ready for CA


3. Script the initial CA imports; keep the scripts handy in case a broken CA environment must be reinitialized
4. Test deployments

Spec and review deployment topology

• Review the location of file and repository assets
• Keep it simple
• Conduct a review of the deployment topology and avoid:
° a deployment server used as a target
° multiple deployment servers deploying to the same target
° changing targets in the future

Spec and review any automated import process

• Plan and schedule time for appropriate specification development, development, and testing of tools
• Carefully consider how much is manual vs. programmatic vs. automated; consider how workflow execution happens
• Consider the recovery process for broken imports; what is the status of a project after a broken import?

General Workflow Considerations

• Use OOTB workflows whenever possible
• Spec and test any changes early in CA planning
• Build rollbacks into the workflow process
• Need robust, not overly complex, workflows
• Workflows are not easily modifiable in production

Spec and review BCC use cases and modifications to standard workflows

• Customize the BCC only when the business process absolutely requires a change
• Ensure that appropriate business processes and checks are in place for deploying assets
• Avoid automated deployment workflows unless they are a business necessity. Review any automated deployment workflows to ensure all error conditions are handled appropriately.

Consider recovery and disaster use cases

• Anticipate corruption of the Content Administration DB
• Anticipate problems with any automated import process
• Anticipate interesting BCC abuse by end users
• Expect concurrent updating by multiple BCC users, including reverts
• Spec and test revert/undo functions
• Try to ensure human-recoverable situations that don't require SQL/DB modifications to fix CA; ensure that the workflow allows a project to be deleted after a disaster or other failure

Size the versioned and un-versioned content and establish time estimates for deployment activity

• Determine update-rate estimates (e.g., 1/day) and the number of assets; know what the expectations are for the system

• What is the size of the file assets?
• What are the time estimates for full and incremental deployments?

Review schema and repositories to ensure they are deployable

• Add the references attribute to the repository definition file
• Conduct a delete-all-items test on every repository that has versioned content

Other checks

• Install all required hotfixes for the given version of the product
• Verify supported platforms
• Check the network: bandwidth, firewalls, general topology
• Ensure the customer understands long-term technical expertise requirements; CA expertise is essential