
Hanoi Telecom Company (HTC) ESM Proposal

CA Technology Services

© 2006 Computer Associates International (Thailand) Co., Ltd.

Project Name eHealth Proposal

Customer HTC

Last Saved Date June 1st, 2006

Revision 1.0

CONFIDENTIAL Issue Date: 5 June 2006 Issue Number: 2 Issued By: Apisak Hajima

Name | Organizational Position | Action | Date Reviewed

Apisak Hajima | Consultant Manager, CA-Thailand | Author |

CONFIDENTIAL Issue Date: 31 May, 2006 Issued By: Apisak Hajima

Change History:

Revision Date

Last Revision By Reason for Change

Related Project Documents:

HTC requirements document for Enterprise Systems Management (ESM)

CA product documentations

Glossary

Term Definition of term


TABLE OF CONTENTS

1. BUSINESS REQUIREMENTS AND GOALS

1.1. BUSINESS DRIVERS
1.2. SOLUTION REQUIREMENTS
1.3. SOLUTION ARCHITECTURE BOUNDARY

2. BUSINESS MODEL

2.1. ORGANIZATIONAL CONTEXT
2.2. SOLUTION OVERVIEW

3. SOLUTION ARCHITECTURE MODEL

3.1. SOLUTION COMPONENTS
3.2. SOLUTION COMPONENTS
3.3. HARDWARE REQUIREMENT

4. SAMPLE REPORT PRODUCE BY CA’S EHEALTH SUITES

4.1. MPLS BANDWIDTH UTILIZATION
4.2. SERVER UTILIZATION
4.3. SERVICE PERFORMANCE
4.4. APPLICATION RESPONSE

5. QUALITY ATTRIBUTES

5.1. RELIABILITY
5.2. AVAILABILITY
5.3. SERVICEABILITY
5.4. BEST PRACTICE COMMENTARY
5.5. ARCHITECTURE COMMENTARY

6. TRAINING

6.1. EHEALTH AND SPECTRUM END USER
6.2. EHEALTH AND SPECTRUM OPERATIONS & ADMINISTRATOR

7. PHASING AND ROADMAP

7.1. TECHNICAL ASSUMPTIONS
7.2. SOLUTION PHASES AND TECHNICAL MILESTONES
7.3. MAJOR TECHNICAL ACTIVITIES

8. Appendix


1. BUSINESS REQUIREMENTS AND GOALS

Hanoi Telecom Company (HTC) is currently deploying a nationwide CDMA2000 network in Vietnam. To support the network and business operations, HTC has deployed an IP Data Core Network with POPs in HNI, HCM and DNG. The core network mainly provides connectivity from HTC agencies around the country to the centralized Billing and Customer Care System in Hanoi, connectivity between HTC offices for back-office activities (IP Phone, Corporate ERP, DNS, Email, Intranet, etc.), and access from nationwide switches to the value-added service servers (SMSC, VMS, WAP GW, etc.) in the Hanoi datacenter.

HTC also has a main datacenter in HNI and a datacenter front-end site in HCM. The servers and applications in the datacenters belong to 3 main groups: Telecommunication Services (SMS Gateway, WAP Gateway, Video Streaming, etc.); Billing and Customer Care; and Corporate Back-Office.

This document is a proposal made by CA for the CDMA2000 networks of Hanoi Telecom Company. It provides a high-level description of the solution in terms of functions and software, based on CA’s eHealth Suite for mobile wireless networks. Detailed architecture design and implementation planning are not covered.

1.1. BUSINESS DRIVERS

Because of the current and expected rapid growth of the business and its demands on information technology, HTC must become proactive rather than reactive in monitoring and managing the environment. It is no longer sufficient to fix a problem quickly after it has occurred. Today’s business environment requires that situations that could lead to problems be detected and corrected before they affect the users of the services.

HTC has begun to look into end-to-end management. It has identified performance management as the key criterion, but by implementing Fault, Performance, Availability and Capacity Management it aims to have a central view of all IT services provided. If a problem occurs, it will be detected automatically and the affected services will be easily identified, giving HTC a means for proactive management.

1.2. SOLUTION REQUIREMENTS

HTC support staff have difficulty reacting to system outages that occur throughout their highly distributed environment. Technicians respond to availability incidents in a reactive mode. The causes of these failures could often be better addressed by proactively monitoring the related infrastructure and correcting the situations that lead to outages before they occur.

The objectives of the proposed solution are as follows:

Availability, fault and performance management of the HTC network and IT infrastructure.

Real-time fault and performance problem detection (device physical problems, threshold crossings, etc.).

Performance tuning: improve network service through proactive checks, reduce bottlenecks, tune and optimize systems, and improve Quality of Service.

Troubleshooting: increase the network’s reliability and availability, allowing users to work more effectively and maximize productivity.

Integration with 3rd party management software to obtain information about the managed devices and software systems from 3rd party databases. The required 3rd party management software includes Cisco CiscoWorks, Nokia Network Voyager and Oracle Enterprise Manager.

Planning: the network statistics gathered can support performance planning requirements.

Server-centric environment (3-tier architecture)

o Critically relies on the LAN/WAN to function.

o Very different from a host-based environment.

Reduced resources to support: few experienced network personnel are available to manage a large LAN or WAN. Simple-to-use, well-integrated tools are needed to automate network management and improve the productivity of existing personnel.

Distributed / aggregated data collection architecture. It shall be possible to deploy multiple collection servers in different segments, each sending its information back to the centralized database.

GUI console and Web interface for both fault and performance management features. The interface will provide several features to ease alarm management, including filtering, sorting, etc.

1.3. SOLUTION ARCHITECTURE BOUNDARY

The following items are considered “out of scope” for this engagement:

Implementation of a fail-over environment.

Implementation of a distributed environment.

Implementation of issue tracking and resolution processes.

Integration to trouble ticketing and alert application.


2. BUSINESS MODEL

2.1. ORGANIZATIONAL CONTEXT

HTC has a centrally managed distributed environment. The primary central site is located in HNI. All mission-critical systems are located at the central site. Management and monitoring are to be done at the central data center. Components to be monitored include networks, servers, applications and possibly the response time of clients.

HTC also has a main datacenter in HNI and a datacenter front-end site in HCM. The servers and applications in the datacenters belong to 3 main groups: Telecommunication Services (SMS Gateway, WAP Gateway, Video Streaming, etc.); Billing and Customer Care; and Corporate Back-Office.

[Figure: HTC network topology diagram. Labels recovered from the original: HNI site (HNI Office & Corporate Intranet LAN, Contact Center, Flagship Shop, core router HNI-CR01, HNI Main Datacenter); DNG site (DNG Office LAN, Flagship Shop, core router DNG-CR01); HCM site (HCM Office & Corporate Intranet LAN, HCM DC LAN, Flagship Shop, Dealer Access NW, core router HCM-CR01, HCM FE Datacenter). Link labels: GE, FE, LL 2Mbps, Leased-Line, Network (ATM Switch).]


2.2. SOLUTION OVERVIEW

The proposed solution will centrally monitor and manage the network, server, and systems environments. The solution will provide management with a GUI view of the current state of the environment, customized to their needs. The system can also be used to monitor the response time of in-house developed applications by deploying a response agent on the client PC.

Network- and system-specific performance and availability reports will be scheduled for execution and published to the eHealth Web Portal for consumption by IT Services.

Real-time reports, configuration/control of network performance data, and network device information are available to Network Services through an administrative client. Historical network reports, reports of collected data, and configuration/control of Information Portal and Service Level Management are available to IT Operations via web access.


3. SOLUTION ARCHITECTURE MODEL

The proposed solution is based on the established eHealth Suite. The solution will address fault/performance management and reporting for network and systems.

3.1 EHEALTH SUITE OVERVIEW

CA’s eHealth Suite is an integrated fault and performance management software solution that manages the components of the entire infrastructure: networks, systems and applications. It receives performance information by SNMP polling from network devices, SystemEdge agents and Service Availability, as RMON2 data from probes, through “push” technologies from Application Response agents, or by importing it from external element and network management systems. It also receives SNMP traps from network devices, systems, and applications.

3.2 PERFORMANCE REPORTING AND DATA ANALYSIS

Performance and fault information is stored in the eHealth database for processing by sophisticated analysis and reporting engines. These engines provide one of eHealth’s competitive advantages—the ability to “normalize” data to present performance and fault information in a consistent, common way to users by abstracting the underlying technical nuances of the managed devices, systems, and applications into a uniform display. In today’s multi-technology networks, operators need management software with the intelligence to not only recognize the key performance and fault management information contained within the devices, systems, and applications, but also to present this information in a uniform way, no matter which vendor supplies the equipment. eHealth Suite provides this function for several hundred devices providing out-of-the-box support for different network technologies, applications, and databases from hundreds of vendors.
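
The normalization idea can be illustrated with a minimal sketch. The vendor names, field names and units below are hypothetical and are not taken from eHealth; the sketch only shows how differently shaped vendor records could be mapped into one common record before reporting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NormalizedSample:
    """Common record shape used for reporting (hypothetical)."""
    element: str
    cpu_util_pct: float     # CPU utilization, 0-100
    in_bits_per_sec: float  # inbound traffic rate


def from_vendor_a(raw: dict) -> NormalizedSample:
    # Assumption: vendor A reports CPU as a fraction and traffic in bytes/s.
    return NormalizedSample(
        element=raw["name"],
        cpu_util_pct=raw["cpu_fraction"] * 100.0,
        in_bits_per_sec=raw["in_bytes_per_sec"] * 8.0,
    )


def from_vendor_b(raw: dict) -> NormalizedSample:
    # Assumption: vendor B reports CPU in percent and traffic in kbit/s.
    return NormalizedSample(
        element=raw["sysName"],
        cpu_util_pct=float(raw["cpuPercent"]),
        in_bits_per_sec=raw["inKbps"] * 1000.0,
    )


# One adapter per vendor; reports only ever see NormalizedSample.
ADAPTERS = {"vendor_a": from_vendor_a, "vendor_b": from_vendor_b}


def normalize(vendor: str, raw: dict) -> NormalizedSample:
    return ADAPTERS[vendor](raw)
```

With this layout, supporting an additional vendor means adding one adapter function, while the reporting code stays unchanged.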


eHealth allows operators to group monitored devices, systems, and applications into multiple logical groups to accurately represent the infrastructure being managed. Groups can represent the geographic distribution of the monitored infrastructure, the infrastructure of a particular customer, or aggregate devices of a particular type.

eHealth reports are run on demand or on a scheduled basis, either on individual devices or on groups of devices. Reports may contain threshold values for particular performance statistics, as well as a display of the historical average of the performance statistic over a given baseline period. The display of this information helps the operator understand the context of the most recent performance relative to past performance so trends can be identified. In addition, the reports provide actionable performance information targeted to different users and can be viewed on the Web, in PDF or ASCII format.

3.3 FAULT MANAGEMENT

eHealth Live Health is one of the Suite’s fault management products. It applies unique, proprietary, algorithms to the performance data stored in eHealth’s database that send SNMP traps when performance violates user-defined thresholds. By setting these thresholds based on past performance, rather than on arbitrary fixed values, operators can be notified when performance deviates from expected behaviour, and can take action before performance degrades to the point where service is impacted. Live Health thus leverages performance management data to provide proactive fault management like no other software package.

Proactive Service Assurance

The key to true proactive service assurance is to identify problems before faults occur. The ability to identify trends in service degradation, before customers are affected, depends on identifying abnormal performance. To do so requires an understanding of the normal expected performance. eHealth gathers and analyzes critical performance parameters from the entire range of devices comprising today’s data networks, allowing service providers the invaluable capability to identify and correct service degradation indicated by abnormal performance parameters. Proactive service assurance is critical in reducing OPEX.

Live Health provides out of the box exception profiles consisting of predefined intelligent rules and applies intelligent algorithms to the historical performance data gathered in the eHealth database to enable operations teams to identify trends in performance that indicate service degradation. Live Health’s Time Over/Under Threshold (TOT) and Deviation From Normal algorithms are powerful tools that allow broadband service providers to maintain high levels of service assurance.

TOT identifies when abnormal performance conditions have persisted beyond predetermined acceptable limits. Because Live Health detects faults and potential outages for any eHealth-trendable variable, TOT can be applied to any element, variable, and technology.


Deviation From Normal goes a step further by comparing current performance against a historical baseline. Measuring performance against a baseline unique to particular environments is an even more powerful level of analysis than gauging over a flat threshold.
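
The two ideas can be sketched as follows. This is not Live Health's proprietary algorithm; it is a simplified illustration that assumes evenly spaced samples and uses a mean plus or minus n standard deviations band as the "normal" range. All function and parameter names are hypothetical.

```python
from statistics import mean, pstdev


def time_over_threshold(samples, threshold, interval_s):
    """Total seconds during which a sample exceeded the threshold.

    Assumes one sample per polling interval of interval_s seconds.
    """
    return sum(interval_s for value in samples if value > threshold)


def tot_alarm(samples, threshold, interval_s, max_seconds):
    """True when the condition persisted longer than the acceptable limit."""
    return time_over_threshold(samples, threshold, interval_s) > max_seconds


def deviates_from_normal(current, baseline, n_sigma=3.0):
    """True when current lies outside mean +/- n_sigma * stddev of the baseline."""
    mu = mean(baseline)
    sigma = pstdev(baseline)
    return abs(current - mu) > n_sigma * sigma
```

For a link that normally runs at about 40% utilization, a fixed 80% threshold stays silent at 45%, whereas the baseline comparison can flag the change because it is judged against that link's own history.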

Various reports allow you to be proactive:

o Exception Summary of Health Report identifies the elements that have experienced problems and the leading exceptions

o Situation To Watch of Health Report provides details for addressing issues before they impact critical processes

o My Health Report enables you to customize multi-chart views of your own selected critical statistics for monitoring the service quality of your infrastructure


Easy Troubleshooting

Live Status Diagram of Live Health also provides a high-level view of the current status of monitored elements in the infrastructure. With Live Status, color-coded icons indicate element status and highlight the business topology, allowing you to quickly determine where the trouble spots are. When you spot a problem, you can quickly drill down for details to understand the nature of the problem.

Live Exceptions Browser can display traps/alarms/exceptions for the entire organization or for a specific subject. Live Exceptions keeps an alarm raised until the problem is resolved or until a Live Exceptions administrator manually clears the alarm. You can also drill down for details to understand the nature of the problem.


Various reports help with troubleshooting:

o Summary section and Element Details section of Health Report analyse problems by providing detailed information such as health index, utilization distributions, change leaders, etc.

o Top N Report identifies problem elements by ranking the elements that meet your defined criteria

o At-a-Glance Report drills down on problem elements by providing a single-page report that correlates key performance statistics for a specified time interval for a particular technology

o Trend Report correlates variable patterns by showing combinations of user-selected statistical variables


NETWORK FAULT MANAGEMENT - SPECTRUM

Overview

SPECTRUM performs the three aspects critical to service assurance: Proactive Monitoring, Intelligent Analysis and Service Restoration - features that allow you to SEE CLEARLY, THINK QUICKLY, and ACT EFFECTIVELY.

PROACTIVE MONITORING in the SPECTRUM Assurance Server begins with automatic discovery of your network devices, systems, applications, and security elements within your information technology infrastructure. This discovery process provides your chosen SPECTRUM solution with information regarding the physical and logical (VLAN) connections between the devices and applications within the infrastructure. Your SPECTRUM suite then communicates with devices and applications in the infrastructure using a variety of industry-standard protocols and vendor-specific proprietary protocols to monitor performance and availability. All of our SPECTRUM solutions contain management modules that collect data and diagnose problems with over 1000 hardware devices and software applications from a large number of vendors. SPECTRUM uses these management modules to provide access to detailed information on devices' configuration, performance, and operating activity.


INTELLIGENT ANALYSIS in the SPECTRUM Assurance Server utilizes the real-time infrastructure model and historical statistics created during data collection. SPECTRUM normalizes large volumes of data, correlates hundreds of thousands of problem notifications, and suppresses meaningless or repetitive alarms. The SPECTRUM solution you choose will use our patented Inductive Modeling Technology™ to consolidate, correlate, and associate problem notifications and alarms, with an understanding of the relationships between infrastructure elements, to deliver real-time analysis of the root cause of the infrastructure failure. SPECTRUM's ability to identify infrastructure problems in real time allows our customers to anticipate infrastructure performance problems before they affect end users.

SERVICE RESTORATION in the SPECTRUM Assurance Server uses the results from the previous monitoring and analysis to resolve identified infrastructure problems. Your SPECTRUM solution will first seek to automate corrective action through direct communication with the affected infrastructure component. If automated correction is not possible, SPECTRUM suggests manual remedies. The SPECTRUM solution's automated repair capabilities and integration with technical support systems will dramatically reduce training requirements as technical personnel will be able to manage infrastructure components from multiple vendors through one software package.

Key Features

Fault Isolation: Suppress those hundreds of unnecessary alarms

Root-cause Analysis: Pinpoint the problem down to the port level and accelerate repair

Service Level Management: Patented solutions ensure compliance with departmental and customer Service Level Agreements

Auto-Discovery: Know what's out there...and where it is

Performance Management: Ensure the quality of services you deliver to the business

Configuration Management: Manage adds, moves, and changes with greater efficiency

Standards-based Data Access: Integrate with all of your existing IT and service investments

Multi-User Role-based Interfaces: Provide "Anywhere-Anytime" access

Support for Voice, Video, & Data Convergence: Ready when you are to tackle the on-demand services

Scalable, Distributed, & Fault-tolerant for Global Reach: Don't get caught using limited point tools

Systems & Security Management Integration: Deliver a unified view of the IT infrastructure

Flexible Event Management System: Add your own intelligence to SPECTRUM

Alarm Impact Analysis: Prioritize critical alarms based on "what" and "who" is impacted

New Device Support: Manage anything from anybody, legacy to bleeding-edge

Key Benefits

Keeps your business on-line, available and performing

Ensures compliance with business-unit Service Level Agreements

Assures the reliability and performance of E-Commerce initiatives

Actively manages newly defined business processes in ERP deployments

Integrates complex, multi-vendor environments in the event of mergers and acquisitions

Supplies real-time feedback into CRM systems

Automates IT Asset information gathering for maintenance savings

Proven Experience in critical non-stop, downtime-free environments

Spectrum Service Manager

SPECTRUM Service Manager is a key ingredient when your organization is seeking better IT alignment and business process fusion. SPECTRUM’s Service Level Intelligence™ is leveraged to provide real-time and historical management of business processes, SLAs and customers. Rather than continuing to manage within a particular vertical technology silo (LAN vs. WAN, Network vs. Application), SPECTRUM Service Manager enables horizontal, cross-silo management in alignment with business process reliability. SPECTRUM Service Manager understands the physical and logical relationships between the availability and performance of IT infrastructure components (network, server, database, application, security) and the critical services and customers they are designed to support. Related IT components are correlated into logical business services such as E-mail, Internet Access, Order Entry, Finance and more.

SPECTRUM Service Manager bridges the gap between operations and customer care Help-Desks by depicting infrastructure reliability problems with service impact and affected customers. Customers and SLAs are tied to the business processes so faults can be prioritized based on the importance of the services that are affected. Real-time alarms are generated, warning of service outages and impending SLA violations — including the root cause — allowing them to be addressed quickly before the business is severely impacted. A service dashboard provides at-a-glance health status. Historical reports show past performance and details of degradations and outages, allowing the business to find new ways to improve the services over time.


Service Management and Root Cause Analysis

SPECTRUM’s auto discovery can automatically find new IP manageable elements and intelligently map the relationships between the elements. SPECTRUM’s Technology Relation Mapping processes understand the physical and logical relationships and perform multi-layered discovery, applicable to transport layer technologies such as Frame Relay and ATM, as well as logical transports such as MPLS-VPN.

The discovered relationships are stored in SPECTRUM’s knowledgebase and are used by SPECTRUM’s Inductive Modelling and Root Cause Analysis to understand the root cause of failures and to provide impact analysis.

SPECTRUM utilises a service oriented architecture similar to the illustration below:


Example of the Service Topology of an IT Service

With the service topology in SPECTRUM’s knowledgebase, SPECTRUM is able to perform service and customer impact analysis with ease and provide this information to customer support representatives using SPECTRUM’s OneClick Console.


SPECTRUM OneClick Console shows how an element failure affects the different services and service subscribers.

Intelligent Event Correlation and Root Cause Analysis

Event and Alarm Distinction

SPECTRUM clearly defines alarms as specialised events that require attention and follow-up action. This approach helps to minimise the information overload most often seen in network operations centres. SPECTRUM’s event processing platform, with its open event integration interface (Southbound Gateway), also enables consolidation of events at the NOC to provide a single point of correlation and a unified view of the infrastructure.

Out of the box, SPECTRUM provides correlation of events for over 1,000 different types of elements. This enables network operators to focus on critical events that demand their attention.

SPECTRUM’s event processing and correlation capabilities are also customisable, giving it the flexibility to meet the unique requirements of different customers.

Rules-based Correlation

Event Rules in SPECTRUM’s Event Management System provide a more complex decision support system to indicate how events should be processed. Event Rules can be applied to look for a series of events to occur on a model in a certain pattern or within a specific time frame or within certain data value ranges. Event Rules can be used to generate other events or even alarms. If events occur such that the preconditions of a rule are met, another event may be generated allowing cascading events; or the event can be logged for later reporting/troubleshooting purposes; or it can be promoted into an actionable alarm. SPECTRUM provides six customizable Event Rule types that form the basis of the Event Management System rule-based engine. Examples of Event Rule types are: Event Pair, Event Combo, Event Sequence, Event Duration, Event Frequency and Event Condition.
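
To make the rule idea concrete, here is a minimal sketch of two rule checks. The semantics assumed here (a follow-up event expected on the same model within a time window, and N occurrences within a time window) are illustrative readings of the names "Event Pair" and "Event Frequency"; they are not SPECTRUM's documented behavior, and all class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    model: str   # managed element the event occurred on
    kind: str    # event type, e.g. "LINK_DOWN"
    time: float  # seconds since an arbitrary start


def event_pair_unmatched(events, first, second, window_s):
    """Return 'first' events with no 'second' event on the same model
    within window_s seconds afterwards (candidates for an alarm)."""
    unmatched = []
    for e in events:
        if e.kind != first:
            continue
        followed = any(
            f.kind == second
            and f.model == e.model
            and e.time < f.time <= e.time + window_s
            for f in events
        )
        if not followed:
            unmatched.append(e)
    return unmatched


def event_frequency_exceeded(events, kind, model, count, window_s):
    """True if 'count' events of 'kind' occurred on 'model' within window_s."""
    times = sorted(e.time for e in events if e.kind == kind and e.model == model)
    # Slide over the sorted timestamps, checking each run of 'count' events.
    for i in range(len(times) - count + 1):
        if times[i + count - 1] - times[i] <= window_s:
            return True
    return False
```

In this sketch a link that goes down and recovers within the window produces no alarm, while an unrecovered or repeatedly flapping link does.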

Root Cause Analysis and Impact Analysis

Root cause analysis for fault alarms is performed automatically, without writing rules, as part of SPECTRUM’s model-based correlation method known as Inductive Modelling Technology (IMT).

To supplement SPECTRUM’s IMT, SPECTRUM has added rules-based correlation and condition correlation, using up to three methods of correlation to allow for a wider scope of automated correlation.

Inductive Modelling Technology

A simple example of IMT in action is a network router port transitioning from UP to DOWN. If a port model receives a LINK DOWN trap, it has the intelligence to react by performing a status query to determine whether the port is actually down. If it is in fact DOWN, the system of models is consulted to determine whether the port has lower layer sub-interfaces. If any of the lower layer sub-interfaces are also DOWN, only the condition of the lower layer port is raised as an alarm. An application of this example is several Frame Relay Data Link Connection Identifiers (DLCIs) transitioning to INACTIVE. If the Frame Relay port is DOWN, IMT suppresses the symptomatic DLCI INACTIVE conditions and raises an alarm on the Frame Relay port model. Additionally, when the port transitions to DOWN, IMT queries the status of the connected Network Elements (NEs); if those are also DOWN, those conditions are considered symptomatic of the port DOWN, are suppressed, and are identified as impacted by the port DOWN alarm. Root cause and impact are determined through IMT’s ability to both listen and talk to the infrastructure.
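
The suppression logic in this example can be sketched in a few lines. This is a simplified illustration, not the patented IMT implementation: it assumes each modelled element depends on at most one lower layer element, that the dependency map has no cycles, and that a status query is available as a callable. All names are hypothetical.

```python
def isolate_faults(depends_on, is_down):
    """Split failed elements into root causes and suppressed symptoms.

    depends_on: dict mapping element -> the lower layer element it relies on
                (None when there is no lower layer).
    is_down:    callable(element) -> bool, standing in for a status query.
    Returns (root_causes, suppressed), where suppressed maps each
    symptomatic element to the root cause it is attributed to.
    """
    # Confirm status by querying every modelled element.
    down = {e for e in depends_on if is_down(e)}
    root_causes, suppressed = set(), {}
    for element in down:
        # Walk down the dependency chain while the lower layer is also down.
        cause = element
        while depends_on.get(cause) in down:
            cause = depends_on[cause]
        if cause == element:
            root_causes.add(element)      # lowest failed layer: raise alarm
        else:
            suppressed[element] = cause   # symptom: suppress, mark as impacted
    return root_causes, suppressed
```

Applied to the Frame Relay case, a DOWN port with two INACTIVE DLCIs and an unreachable remote router yields one alarm on the port, with the other three conditions recorded as impacted by it.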

Condition Correlation Technology

SPECTRUM’s policy-based Condition Correlation Technology enables correlation for the following:

Disparate event streams or conditions

Across sets of managed elements

Within managed domains

Across sets of managed domains

Component conditions as they map to higher concepts such as business services or customer access

Condition based correlations are very powerful and provide a mechanism to develop Correlation Policies and apply them to Correlation Domains. When applied to Service Level Management, Correlation Policies can be likened to metrics of an SLA; and Correlation Domains can be likened to service, customer or geographical groupings. There are times when the only way to infer a causal relationship between two or more seemingly disparate conditions is when those conditions occur in a common Correlation Domain. These mechanisms are necessary when causal relationships cannot be discovered through interrogations or receipt of events to/from the infrastructure components.



[Figure: "Slow Web Server Response Time" is caused by "Loss of Bandwidth"]

Condition Correlation Example

In this scenario, users are accessing a service hosted at the corporate data centre’s web server farm and are experiencing a performance issue. The performance degradation is caused by a loss of bandwidth, because one of the load-sharing links to the users is down.

Such complex correlations can be applied to many different types of scenarios, across applications, systems and other IT infrastructure components within the service delivery infrastructure.

SPECTRUM provides the user the flexibility and power to define and refine these correlations to pinpoint the root cause of problems more intelligently.
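
The web server scenario can be expressed as a small sketch of a correlation policy applied to a correlation domain. The data model below is an assumption made for illustration (a policy is a symptom/cause pair, a domain is a named set of elements) and does not reflect SPECTRUM's actual policy format; all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Condition:
    element: str  # where the condition was observed
    name: str     # e.g. "SLOW_RESPONSE"


@dataclass(frozen=True)
class Policy:
    symptom: str  # condition name treated as the symptom
    cause: str    # condition name treated as its cause


def correlate(conditions, domains, policies):
    """Return (domain, symptom, cause) for every policy whose symptom and
    cause conditions are both active inside the same correlation domain."""
    results = []
    for domain, members in sorted(domains.items()):
        active = [c for c in conditions if c.element in members]
        for policy in policies:
            for symptom in active:
                if symptom.name != policy.symptom:
                    continue
                for cause in active:
                    if cause.name == policy.cause:
                        results.append((domain, symptom, cause))
    return results
```

Two slow servers may be reported at the same time, but only the one that shares a domain with the failed link is attributed to the bandwidth loss.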

3.5 INFORMED CAPACITY PLANNING

Various reports help with capacity planning:

o Supplemental section of Health Report provides details for identifying underutilized elements

o Capacity Projection of Health Report provides details for estimating the amount of capacity your site will need in the future

o Capacity Provisioning of Health Report provides details for identifying the elements that may need an upgrade during the specified provisioning lead-time window

o What-If Report provides what-if scenarios in which capacity and demand factors are adjusted to determine how changes will affect the data, so that you can prepare before problems occur
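
As a rough illustration of capacity projection, the sketch below fits a straight line to evenly spaced utilization samples and estimates how many further periods remain before a limit is reached. A linear trend is an assumption chosen for simplicity; it is not a description of how the eHealth reports compute their projections, and the function names are hypothetical.

```python
def linear_fit(ys):
    """Least-squares slope and intercept for ys sampled at x = 0, 1, 2, ..."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two samples")
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def periods_until(ys, limit):
    """Periods after the last sample until the fitted trend reaches limit.

    Returns None when the trend is flat or decreasing.
    """
    slope, intercept = linear_fit(ys)
    if slope <= 0:
        return None
    x_cross = (limit - intercept) / slope
    return max(0.0, x_cross - (len(ys) - 1))
```

If the samples are weekly averages, the result is a number of weeks, which can be compared with the provisioning lead time to decide whether an upgrade should be ordered now.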

3.6 SERVICE LEVEL MANAGEMENT

eHealth Service Level Reports provide a vital tool for managing Service Level Agreements (SLAs). Using historical data collected by the eHealth Console, the reports analyze and display service level information for an enterprise, region, department, or business process.

Service Level Reports provide you with visibility on all key metrics, including:

o Availability: Percentage of time active and running

o Latency: Transit delay, measured by a ping

o Bandwidth Utilization: Traffic processed as a percentage of element speed

o Line Utilization: The sum of the utilization of each interface divided by the number of interfaces

o CPU Utilization: The sum of the utilization of each CPU divided by the number of CPUs

o Health Index and Health Exceptions: A composite number assigned to elements based on key performance metrics. When the Health Index exceeds a set threshold, an exception is generated
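The metric definitions above reduce to simple ratios. The sketch below works through three of them. It is illustrative only: the function names and inputs are hypothetical and are not an eHealth API, and the Health Index is left out because its formula is not given here.

```python
from typing import Sequence


def availability_pct(up_seconds: float, total_seconds: float) -> float:
    """Percentage of the reporting period the element was active and running."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    return 100.0 * up_seconds / total_seconds


def bandwidth_utilization_pct(bits_transferred: float, seconds: float, speed_bps: float) -> float:
    """Traffic processed as a percentage of element speed over the interval."""
    if seconds <= 0 or speed_bps <= 0:
        raise ValueError("seconds and speed_bps must be positive")
    return 100.0 * bits_transferred / (seconds * speed_bps)


def mean_utilization_pct(per_item_pct: Sequence[float]) -> float:
    """Sum of per-interface (or per-CPU) utilization divided by the item count."""
    if not per_item_pct:
        raise ValueError("need at least one interface or CPU")
    return sum(per_item_pct) / len(per_item_pct)
```

For instance, an element that was up for 86,000 of the 86,400 seconds in a day has an availability of about 99.54 percent, and a 2 Mbps link that carried 450,000,000 bits in a 300-second interval ran at 75 percent bandwidth utilization.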

Service Level Reports provide easy-to-read presentations of quality information in precise formats and levels of detail tailored to the needs of each major stakeholder as shown below:


Executive Report

This concise report for senior managers presents a single-page summary of service-level performance across the infrastructure.


IT Manager Report

This report provides day-to-day IT managers with a more detailed picture of trends and service performance by business units and individual devices.


Service Customer Report

This report provides a site-specific summary of service-level and performance information for end users.


Business Unit Report

This multi-technology summary report is designed for managers of a business process or unit. It provides a comprehensive look at all of the technology components necessary to support that particular business process or unit.


3.1. SOLUTION COMPONENTS

The proposed bill of materials is as follows:

o Spectrum XSight 250 Nodes x 1

o OneClick Operator x 2

o E2E Console x 1

o Universal Elements x 900

o LiveHealth x 1

o SysEdge x 60

o Oracle AIM x 10

o Service Availability x 1

o Application Response x 10

o Service Desk knowledge tools

o Service Desk agent

3.2. SOLUTION COMPONENTS

eHealth E2E Console

The eHealth E2E Console includes:

Console graphical user interface for both end-users and administrators

Polled data collection from certified network devices, eHealth-ready agents, and other certified agents, up to the number of elements purchased. NOTE: polled data can come from direct polling, Data Integration Modules, and data imported through the DataSync API. Includes support for the following:

o Network technologies: LAN/WAN, Router/Switch, Frame Relay, ATM, Remote Access, Wireless LAN, Mobile Wireless, DSL, VoIP, Optical, SAN, Cable, IP QoS, and end-to-end Frame Relay PVCs.

o Systems and applications: workstations and servers with SystemEDGE, CA’s application insight modules, and other CA-certified SNMP system agents.

o Response agents: Application Response Agents, Service Availability agents, and Cisco Service Assurance Agents.

Data storage in an Oracle database

Sixty SystemEDGE agents and ten Oracle AIMs for self-management of the eHealth E2E Console.

Software support for CA agent deployment, configuration, event management, diagnostic reporting, and asset management.


Polled data analysis and report scheduling, which includes Trend, MyHealth, Top N, At-A-Glance, Health, and Service Level reports

900 eHealth Universal elements; additional elements must be purchased separately. Every element that the eHealth Console polls for information must have an eHealth System or Universal element. NOTE: eHealth-ready agents have the appropriate number of eHealth elements bundled.

A single SystemEDGE agent to manage the eHealth System itself.

One Spectrum server for network fault management, and two OneClick consoles for operators.

Universal Elements

Element license for all eHealth-supported technologies with the exception of servers and workstations. One license grants the right to collect performance data from one element. An element is any part of an infrastructure that eHealth can analyze. eHealth can analyze a physical element, such as a specific port on a specific card of a specific router. It can also analyze a logical element, which refers to the logical purpose of a device or component, such as a network link. See the certification section of the CA web site for supported devices.

LiveHealth

LiveHealth is the eHealth suite’s real-time fault management solution for identifying outages, potential outages, and sources of delay across applications, systems, and networks. LiveHealth includes:

o Real-time analysis of both polled data and SNMP traps for intelligent alarm generation

o LiveHealth browser for alarm display

o Live Status, a dynamic high-level diagram for quickly understanding what is working and what needs attention

o Live Trend, a dynamic reporting tool that presents performance data as it is collected

o 1,000 LiveHealth elements. Every element for which LiveHealth analyzes polled data requires a LiveHealth element. Additional LiveHealth elements can be purchased.

SystemEDGE

With SystemEDGE agents, you can distribute tedious management tasks away from overworked staff and down to the host systems, closest to where problems occur. SystemEDGE operates autonomously on host workstations and servers, continuously monitoring changing conditions and providing detailed information about the system configuration, status, performance, users, applications, filesystems, and other critical resources. The following are some of the major capabilities:

Self-Monitoring


SystemEDGE detects threshold violations in real time on critical measures such as CPU utilization, memory capacity, and file system space. It sends a standard SNMP trap to the LiveHealth Fault Manager and can automatically fix problems according to your pre-specified instructions.
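The self-monitoring idea can be sketched as a comparison of each sampled metric against a configured limit. The example below is a hypothetical illustration, not SystemEDGE configuration or code: the metric names, the Threshold structure, and the action labels are all assumptions, and the SNMP trap and corrective action are represented only by the returned list.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Threshold:
    metric: str   # hypothetical metric name, for example "cpu_pct"
    limit: float  # the metric is in violation when it rises above this value
    action: str   # label of the pre-specified corrective action (illustrative)


def find_violations(sample: Dict[str, float], thresholds: List[Threshold]) -> List[Threshold]:
    """Return the thresholds whose metric appears in the sample and exceeds its limit."""
    return [
        t for t in thresholds
        if t.metric in sample and sample[t.metric] > t.limit
    ]


# Illustrative thresholds and one polled sample.
THRESHOLDS = [
    Threshold("cpu_pct", 90.0, "notify-operator"),
    Threshold("memory_pct", 85.0, "restart-service"),
    Threshold("fs_root_pct", 95.0, "rotate-logs"),
]
SAMPLE = {"cpu_pct": 97.5, "memory_pct": 60.0, "fs_root_pct": 96.0}
VIOLATIONS = find_violations(SAMPLE, THRESHOLDS)
```

With the sample shown, the CPU and root file system thresholds are violated and the memory threshold is not. In a real agent each violation would raise a trap and trigger its configured action.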

Process and NT Service Monitoring

SystemEDGE immediately alerts you to failures and can automatically re-start processes and services. SystemEDGE lets you track application impact on system resources—identify CPU hogs, and detect memory-leaking processes and I/O bottlenecks before they bring down your system. SystemEDGE takes NT Events and converts them into SNMP traps that can be sent to any SNMP-compliant manager, giving you SNMP management of your NT servers.

Log File Monitoring

SystemEDGE can watch system logs and text log files for messages that you specify. When matching messages are logged, SystemEDGE notifies you and can even take action, such as sending an e-mail or paging technical support staff.
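As a rough sketch of the log-watching concept, the example below scans log lines for a set of regular expressions and reports each match. The function name, the patterns, and the sample log lines are hypothetical; SystemEDGE's own configuration syntax is not shown here, and notification is left to the caller.

```python
import re
from typing import Iterable, Iterator, Sequence, Tuple


def scan_log(lines: Iterable[str], expressions: Sequence[str]) -> Iterator[Tuple[int, str, str]]:
    """Yield (line_number, expression, line) for each line that matches a watched expression."""
    compiled = [re.compile(expression) for expression in expressions]
    for number, line in enumerate(lines, start=1):
        for pattern in compiled:
            if pattern.search(line):
                yield number, pattern.pattern, line.rstrip("\n")
                break  # report each line once, under the first matching expression


# Illustrative log content and watch list.
LOG_LINES = [
    "Jan  1 00:00:01 host boot: startup complete\n",
    "Jan  1 00:05:12 host kernel: disk /dev/sda1 FAILURE\n",
    "Jan  1 00:07:30 host su: authentication failure for root\n",
]
HITS = list(scan_log(LOG_LINES, [r"FAILURE", r"authentication failure"]))
```

Each hit carries the line number and the expression that matched, which is the information a notification such as an e-mail or page would need.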

System, Software, and Hardware Assets

SystemEDGE automatically discovers installed software and hardware devices so you always know what you have.

SystemEDGE Extensible, Plug-In Architecture

You can add your own SNMP MIB variables to leverage customized code you already own, and control SNMP management of your homegrown applications. NT Registry and Perfmon extension support let you report on additional registry parameters and performance data.

Service Availability

eHealth Service Availability is a plug-in to eHealth SystemEDGE that monitors the response and availability of critical services. Through the Service Availability plug-in, the SystemEDGE agent performs active test transactions to measure response time and track availability for the following Internet applications:

o Domain Name System (DNS)

o File Transfer Protocol (FTP)

o Hypertext Transfer Protocol (HTTP)

o Secure HTTP (HTTPS)

o Packet Inter-Network Groper (PING)

o Network News Transfer Protocol (NNTP)

o Post Office Protocol version 3 (POP3)

o Simple Mail Transfer Protocol (SMTP)

o Transmission Control Protocol connections (TCPConnect)

o Custom test


Service Availability also enables you to create custom scripts and programs to define service tests for additional services. After you create the custom script or program, you can use it to measure availability from any system within the network.

Service Availability enables you to proactively measure the services you provide. With active, repeatable service tests, you can establish a “service baseline” to ensure service quality and continuously monitor for potential degradations.

Service Availability also helps you identify usage patterns in your network for utilization, capacity, and performance. By understanding the response experience, you can monitor the reliability of your service offerings, proactively isolate areas in need of attention, and guarantee service levels with confidence.
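An active service test of the TCPConnect kind can be sketched as follows, using only the Python standard library. This is a minimal sketch and not the Service Availability plug-in: the names ProbeResult, tcp_connect_test, and summarize are hypothetical, and a real test schedule, timeout policy, and baseline calculation would depend on the deployment.

```python
import socket
import time
from typing import NamedTuple, Optional, Sequence, Tuple


class ProbeResult(NamedTuple):
    available: bool
    response_ms: Optional[float]  # None when the connection attempt failed


def tcp_connect_test(host: str, port: int, timeout_s: float = 3.0) -> ProbeResult:
    """Open a TCP connection to host:port and time how long the connect takes."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            elapsed_ms = (time.perf_counter() - start) * 1000.0
    except OSError:  # refused, unreachable, timed out, or name resolution failed
        return ProbeResult(False, None)
    return ProbeResult(True, elapsed_ms)


def summarize(results: Sequence[ProbeResult]) -> Tuple[float, Optional[float]]:
    """Return (availability percentage, mean response time in ms of successful probes)."""
    if not results:
        raise ValueError("no probe results")
    times = [r.response_ms for r in results if r.available and r.response_ms is not None]
    availability = 100.0 * len(times) / len(results)
    mean_ms = sum(times) / len(times) if times else None
    return availability, mean_ms
```

Running tcp_connect_test at a fixed interval and passing the collected results to summarize gives a simple availability figure and average response time that could serve as a starting point for a service baseline.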


3.3. HARDWARE REQUIREMENT

Two Solaris-based servers

o Processor: UltraSPARC

o Memory: 4 GB and above

o Hard Disk: 100 GB and above (SCSI and RAID 0 preferred)

o Operating System: Solaris 9

One server is for the Spectrum server and the other is for the eHealth server.


4. SAMPLE REPORTS PRODUCED BY CA’S EHEALTH SUITE

4.1. MPLS BANDWIDTH UTILIZATION


4.2. SERVER UTILIZATION

4.3. SERVICE PERFORMANCE


4.4. APPLICATION RESPONSE


5. QUALITY ATTRIBUTES

5.1. RELIABILITY

With configuration tuning, eHealth can scale to upwards of several thousand elements per management server. Furthermore, eHealth management servers can be arranged in a distributed environment, providing a virtually limitless ceiling on the number of nodes that can be managed in client environments.

5.2. AVAILABILITY

All management servers will be part of the company-wide backup solution for complete and incremental backups, which also backs up open files and databases.

This solution defines a high availability design which deploys a failover strategy, as a vital part of availability. CA Best Practice recommends that the primary and backup eHealth servers be installed in a Cluster Environment.

5.3. SERVICEABILITY

Ongoing maintenance will be handled using the Computer Associates SupportConnect web-based support system and current maintenance will be deployed after testing in the lab environment.

CA Solutions include a patch ‘wizard’ for a standardized and reliable format of applying and tracking routine maintenance of the products. Service packs are released on a periodic basis to conform to OS platform and version updates that may occur, and extensive migration plans, along with upgrade utilities, are provided for new version releases to the product.

5.4. BEST PRACTICE COMMENTARY

This solution is architected to meet ITIL Availability and Capacity Management disciplines as well as CA Best Practice.

5.5. ARCHITECTURE COMMENTARY

This solution establishes a solid monitoring and performance management foundation for both servers and network. A phased implementation approach is recommended: the first phase establishes the monitoring and fault management systems, and the second phase adds performance monitoring for both servers and network. High availability for the solution should be a future consideration.


6. TRAINING

6.1. EHEALTH AND SPECTRUM END USER

The eHealth End User class is designed for IT professionals responsible for operations such as fault management, troubleshooting, performance management, and capacity planning and network administration. In this class you will learn techniques to help you more effectively use information in eHealth reports and displays. In addition, you will learn recommended workflow steps to follow so that you can use this information to solve day-to-day infrastructure problems. By using realistic sample data, case studies and hands-on class exercises you will learn how to:

o Optimize IT performance

o Enable informed capacity planning decisions

o Spot trends

o Speed time to problem resolution

o Enable quick roll-out of revenue generating applications and services

o Identify potential problems before they affect your users and customers.

o Use best practices for Web based navigation and report drill down

Course Number   Title                                               Duration
UN600           eHealth® End User                                   2 day(s)
UN402           SPECTRUM®: Level 1 Core Administration              3 day(s)

Course descriptions:

UN600: http://gems.ca.com/Gemsmarketing/CourseDesc.asp?crsCode=UN600&clsType=L&ProdRel=5.7&ProdGroup=EHA&Solution=&LC=&Datefrom=&DateTo=&Owner=NA-EDUC

UN402: http://gems.ca.com/Gemsmarketing/CourseDesc.asp?crsCode=UN402&clsType=L&ProdRel=7.1%20SP2&ProdGroup=SPE&Solution=&LC=&Datefrom=&DateTo=&Owner=NA-EDUC

6.2. EHEALTH AND SPECTRUM OPERATIONS & ADMINISTRATOR

eHealth Operations is a lecture and lab based course designed to give administrators the basic skills and knowledge needed to manage their eHealth systems, primarily through the various user interfaces. Several hands-on labs and workshops reinforce the lectures by giving students practice with product use, discovery, scheduling, and reporting. Topics include:

o How to properly plan the expansion of your eHealth system

o How to add and delete eHealth licenses

o How to discover new managed resources through normal and scheduled discovery

o How to effectively manage the eHealth Poller configuration

o eHealth database management, including saving, restoring, and the roll-up process

o How to create Service Profiles and how to apply them to reports

o How to build reporting groups and why proper planning is vital

o How to create, customize (through the GUI), and generate eHealth reports

o Web Server management of users and reports

o Best practices of managing your eHealth system

o Best practices of managing your infrastructure with eHealth

Course Number   Title                                               Duration
UN601           eHealth® Operations                                 3 day(s)
UN403           SPECTRUM®: Level 2 Advanced Core and Applications   2 day(s)

Course descriptions:

UN601: http://gems.ca.com/Gemsmarketing/CourseDesc.asp?crsCode=UN601&clsType=L&ProdRel=5.7&ProdGroup=EHA&Solution=&LC=&Datefrom=&DateTo=&Owner=NA-EDUC

7. PHASING AND ROADMAP

7.1. TECHNICAL ASSUMPTIONS

CA’s personnel will be granted access to all hosts, systems, and network devices as necessary for the installation of required software.

All prerequisite hardware and software will be available on-site prior to project start.

All hardware will meet the system requirements as detailed in section 3.3.

The client’s IT staff will be available on an as-needed basis for assistance in troubleshooting and to receive knowledge transfer.

Client will provide access to staff who can provide input to instrumentation and notification requirements.

There are no firewalls involved in this implementation.

Each managed device will be no more than 3 hops away from its manager.

All SNMP read-only community strings are the same across all network devices.

This architecture may need to be scaled, horizontally and/or vertically, based on further findings.

7.2. SOLUTION PHASES AND TECHNICAL MILESTONES

This engagement will be implemented using the following phases and milestones:

1. Project Schedule Reviewed with Customer

The technical startup phase will launch the project by preparing the project plan and schedule, and obtaining customer acceptance on both documents.

2. Prepare and deliver a Solution Architecture Specification for this solution.

3. Implementation Plan

Phase I – Installation

o Install eHealth Suite Server

o Install Spectrum Server

o Install OneClick Clients

o Install LiveHealth

o Install SystemEdge

Phase II – Configuration

o Configure eHealth Server

o Configure Spectrum Server

o Configure OneClick Clients

o Configure LiveHealth

o Configure SystemEdge and AIMs

o Configure Service Availability

o Customize and Schedule specific reports

4. Completion of Test Plan

The completion of the execution of the test plan will be confirmed by obtaining customer acceptance of the completed test plan.

5. Implemented Components in Production

The components identified in the implementation plan are placed in production in turn as specified in the plan.

6. Complete Review of Implementation

The users of the new system have been trained and the complete solution has been reviewed with the customer.


7.3. MAJOR TECHNICAL ACTIVITIES

The following high-level activities will take place during this engagement:

Technical Startup

o Perform engagement management setup activities

o Prepare for engagement

o Review implementation information with customer

o Review project schedule with customer

Architecture and Design

o Conduct briefing with customer project sponsor

o Conduct kick-off session

o Gather project related information and requirements

o Document requirements

o Modify test plans as required

o Review implementation plan with customer

Implementation

o Install/configure eHealth Suite

o Install/configure Spectrum Server

o Install/configure OneClick Clients

o Install/configure LiveHealth

o Install/configure SystemEdge

o Install/configure Service Availability

Architecture Review and Planning

o Review implemented architecture, revise and update as required

o Expand architecture to incorporate remote locations

o Discuss obtaining system and network performance metrics

Technical Closure

o Conduct knowledge transfer for existing CA Network and Systems Management solution on new reporting capabilities

o ‘Train the Trainer’ sessions for HTC in preparation for educating their customer base

o Review implementation with HTC

o Conduct final review session with HTC


8. APPENDIX