Autonomic Computing
© 2003 IBM Corporation
Thomas Studwell Autonomic Computing - Problem [email protected]
Common Base Events
Autonomic Computing
ibm.com/autonomic © July/2003 IBM Corporation
Objective
To open a dialog toward agreement on a Common Base Event specification.
Agenda
  Problems Facing Today's Data Collection
  The 3 Tuple
  Canonical Situation
  Canonical Situation Data Format: Common Base Event
Problems Facing Today's Data Collection
  Complexity of eBusiness
    Collection of distributed and heterogeneous software and hardware components
  Variety of Data and Collectors/Adapters
    Consume and publish proprietary data formats
    Require ad hoc and product-specific code
  Data format and APIs
    Design and Standards considerations
  Standardization of management solution is incomplete
    Different skill sets to configure, maintain, and tune
    Difficult to correlate for end-to-end (e2e) problem diagnostics
  Instrumentation
    Many-to-Many
    Standards compliance
  Customer pain and cost of ownership
The 3 Tuple
  The complexity of event data increases when a problem occurs in a multi-component solution.
  Without standards, event data are of little value to autonomic management for problem determination and for acting in response.
  To alleviate this, event data are structured in 3 categories:
    1. The identification of the component that is affected by the situation. This is also known as the source of the situation.
    2. The identification of the component that is reporting the situation. This is also known as the reporter of the situation. It may be the same as the source component.
    3. The situation data itself: properties or attributes that describe the situation.
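The 3 tuple can be pictured as a small record type. The following is only a minimal sketch: the class names `ComponentId` and `Situation`, the two identification fields, and the host names are illustrative assumptions, not part of the Common Base Event specification.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ComponentId:
    """Identifies a component; the two fields here are illustrative only."""
    location: str
    component: str


@dataclass(frozen=True)
class Situation:
    """The 3 tuple: source, reporter, and the situation data itself."""
    source: ComponentId    # component affected by the situation
    reporter: ComponentId  # component reporting the situation
    data: Dict[str, str]   # properties or attributes describing the situation

    def is_self_reported(self) -> bool:
        # The reporter may be the same component as the source.
        return self.source == self.reporter


# Hypothetical components used only for this example.
database = ComponentId(location="dbhost01", component="database")
app_server = ComponentId(location="apphost01", component="application server")

remote = Situation(source=database, reporter=app_server,
                   data={"msg": "connection to database failed"})
local = Situation(source=database, reporter=database,
                  data={"msg": "buffer overflow"})

print(remote.is_self_reported())  # False
print(local.is_self_reported())   # True
```

Keeping source and reporter as separate fields lets a consumer distinguish a component that reports its own failure from one that observes a failure elsewhere.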
Canonical Situation
  A situation is defined as the data that a component reports for external consumption by general or product-specific management applications.
  Situations are commonly communicated through messages that are logged and/or forwarded to a consumer of the data, such as administrative or management tools.
  Examples of situations include memory allocation failure, buffer overflow, I/O failure, etc.
  Each product reports situations in its own format, using its own terminology.
    Makes correlating events between products difficult
    Requires further standardization of notification contents, categorization, and taxonomy of situations
  The goal of the common situations is not to drastically change what components currently do, but rather to put some structure and rigor behind how components report situations.
  Canonical situation, using the common data formats
    A canonical representation of the situation is used for analysis
    An adapter can be used to convert the data to a canonical situation
  Create a taxonomy for identifying and classifying situations
    Category, Type, Disposition, Scope, Task, etc.
  Apply the taxonomy to product logs
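An adapter of the kind described above might look roughly like the sketch below. It is not taken from any IBM tool: the function name `to_canonical`, the lookup table, the regular expression, and the output keys are all assumptions made for illustration. The three message IDs and their classifications come from the example slide later in this deck.

```python
import re
from typing import Dict, Optional

# Hypothetical taxonomy table: message ID -> (category, disposition, qualifier).
MESSAGE_TAXONOMY = {
    "WSVR0200I": ("START", "SUCCESSFUL", "START INITIATED"),
    "WSVR0221I": ("START", "SUCCESSFUL", "START COMPLETED"),
    "WSVR0024I": ("STOP", "SUCCESSFUL", "STOP COMPLETED"),
}

# Assumed log layout: "<4 letters><4 digits><1 letter>: <free text>".
LOG_LINE = re.compile(r"^(?P<msg_id>[A-Z]{4}\d{4}[A-Z]):\s*(?P<text>.*)$")


def to_canonical(line: str) -> Optional[Dict[str, str]]:
    """Convert one product-specific log line to a canonical situation.

    Returns None when the line does not match the assumed layout or the
    message ID has no entry in the taxonomy table.
    """
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    taxonomy = MESSAGE_TAXONOMY.get(match["msg_id"])
    if taxonomy is None:
        return None
    category, disposition, qualifier = taxonomy
    return {
        "msgId": match["msg_id"],
        "msg": match["text"],
        "category": category,
        "disposition": disposition,
        "qualifier": qualifier,
    }


print(to_canonical("WSVR0024I: Server server1 stopped"))
```

Because the product keeps writing its existing log format and only the adapter changes, this matches the stated goal of adding structure without drastically changing what components do.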
Canonical Situation Categories
  START: These are messages that deal with the startup process for a component. Messages that indicate that a component has begun the startup process, that it has finished the startup process, or that it has aborted the startup process all fall into this category.
  STOP: These are messages that deal with the shutdown process for a component. Messages that indicate that a component has begun to stop, that it has stopped, or that the stopping process has failed all fall into this category.
  FEATURE: These are messages that announce a feature of a component. Messages that indicate things like services being available and services or features being unavailable fall into this category.
  DEPENDENCY: These are messages that components produce to say that they cannot find some component or feature that they need. Messages that say a resource was not found, or that an application or subsystem was unavailable, fall into this category.
  REQUEST: These are messages that a component uses to identify the completion status of a request. Typically these requests are complex management tasks or transactions that a component undertakes on behalf of a requester, not the simple mainline requests or transactions.
  CONFIGURE: These are messages that components use to identify their configuration. Messages that describe the current configuration state and configuration changes fall into this category.
  CONNECT: These are messages that components use to identify aspects of a connection to another component. Messages that say a connection failed, that a connection was created, or that a connection was ended all fall into this category.
  CREATE: These are messages documenting when a component creates an entity. Messages reporting that a document was created, a file was created, or an EJB was created all fall into this category.
  REPORT: These are messages that are reported from the component, such as heartbeat or performance information. Data such as current CPU utilization, current memory heap size, etc. would fall into this category.
  AVAILABLE: These are messages that are reported from the component regarding its operational state and availability. This situation provides a context for operations that can be performed on the component by distinguishing whether a product is installed, operational and ready to process functional requests, or operational and ready/not ready to process management requests.
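Since the categories form a closed set, they lend themselves to an enumeration. The sketch below assumes nothing beyond the ten category names listed above; the type name `SituationCategory` and the helper `parse_category` are hypothetical.

```python
from enum import Enum


class SituationCategory(Enum):
    """The ten canonical situation categories listed above."""
    START = "START"
    STOP = "STOP"
    FEATURE = "FEATURE"
    DEPENDENCY = "DEPENDENCY"
    REQUEST = "REQUEST"
    CONFIGURE = "CONFIGURE"
    CONNECT = "CONNECT"
    CREATE = "CREATE"
    REPORT = "REPORT"
    AVAILABLE = "AVAILABLE"


def parse_category(token: str) -> SituationCategory:
    """Map a category token to the enum, ignoring case and padding.

    Raises ValueError for a token that is not one of the ten categories.
    """
    return SituationCategory(token.strip().upper())


print(parse_category(" connect "))
```

Rejecting unknown tokens early keeps product-specific terminology from leaking into data that is meant to be correlated across products.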
Example of Situations Category and Taxonomy
  START_SITUATION = (START_SITUATION_NAME, SUCCESS_DISPOSITION, START_SITUATION_QUALIFIER);
  START_SITUATION_NAME = "START";
  SUCCESS_DISPOSITION = ("SUCCESSFUL" | "UNSUCCESSFUL");
  START_SITUATION_QUALIFIER = ("START INITIATED" | "START COMPLETED" | "RESTART INITIATED" | "STARTING");

  WSVR0200I: Starting application: PlantsByWebSphere
    START, SUCCESSFUL, START INITIATED
  WSVR0221I: Application started: trade3
    START, SUCCESSFUL, START COMPLETED

  Other examples:
  WSVR0024I: Server server1 stopped
    STOP, SUCCESSFUL, STOP COMPLETED
  SRVE0026E: [Servlet Error]-[]: java.lang.IllegalStateException … (primary message)
  SQL0913N Unsuccessful execution caused by deadlock or timeout (secondary message)
    CONNECT, UNSUCCESSFUL, CLOSED
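The START grammar above is small enough to check mechanically. The following sketch validates a comma-separated classification against it; the function name `parse_start_situation` is an assumption, and only the START grammar is covered because it is the only one the slide spells out.

```python
from typing import Tuple

# Terminals copied from the START_SITUATION grammar above.
START_SITUATION_NAME = "START"
SUCCESS_DISPOSITION = {"SUCCESSFUL", "UNSUCCESSFUL"}
START_SITUATION_QUALIFIER = {
    "START INITIATED",
    "START COMPLETED",
    "RESTART INITIATED",
    "STARTING",
}


def parse_start_situation(text: str) -> Tuple[str, str, str]:
    """Parse "NAME, DISPOSITION, QUALIFIER" and validate each element.

    Raises ValueError if the text is not a valid START_SITUATION.
    """
    parts = [part.strip() for part in text.split(",")]
    if len(parts) != 3:
        raise ValueError(f"expected 3 elements, got {len(parts)}")
    name, disposition, qualifier = parts
    if name != START_SITUATION_NAME:
        raise ValueError(f"not a START situation: {name!r}")
    if disposition not in SUCCESS_DISPOSITION:
        raise ValueError(f"unknown disposition: {disposition!r}")
    if qualifier not in START_SITUATION_QUALIFIER:
        raise ValueError(f"unknown START qualifier: {qualifier!r}")
    return name, disposition, qualifier


print(parse_start_situation("START, SUCCESSFUL, START INITIATED"))
```

A classification such as `START, SUCCESSFUL, STOP COMPLETED` is rejected, since `STOP COMPLETED` is not a START qualifier.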
Distillation of existing work to CBE
  Several existing formats were analyzed
  Common elements define essential elements
  Reviewed formats and types include: PD Artifact, TEC Event, Tivoli Log XML, BEI Event, BEI Context, JMX Notification, SNMP, CIM_AlertIndication, Java 1.4, Apache commons logging, WAS, JRAS
  Mappings shown in spec
The Data
What component is affected by the situation
  * ComponentID: component affected by the problem
What component observed the situation
  * ComponentID: component reporting the problem
Situation data
  extensionName, localInstanceId, globalInstanceId, creationTime, severity, priority, situationType
  Msg
  repeatCount, elapsedTime, sequenceNumber
  msgDataElement
  associatedEvents, contextDataElements
  extendedDataElements
Diagram annotations: (policy), (cor/relation), (extensibility)
* ComponentID
  location, locationType, application, executionEnv
  component, subComponent, componentType
  instanceId, processId, threadId
For details please refer to the Canonical Situation Data Format: the Common Base Events (ACAB.BO0301)
Canonical Situation Data Format: Common Base Event (CBE)
Canonical Situation Data Format: Common Base Event
  Facilitates the effective exchange and correlation of data among disparate enterprise applications that support logging, management, problem determination, autonomic computing, and e-business functions in an enterprise
  Defines the structure of an event sent as the result of a situation, in a consistent and common format
  Provides flexibility to allow for adaptation to application-specific needs
CBE Extensibility
  Extended Data Element
    Allows for product-specific or required attributes that are not common across product groups and not accounted for in the CBE
    Provides capabilities to add "named" properties: name, type, values (or hexValue), and optional children to create a hierarchy of these elements
    Provides capabilities to add monitoring and resource usage data
  Product Specific Schema
    Allows product-specific schema to be included in the "any namespace" of the CBE Schema
    <xsd:any namespace="##other" minOccurs="0" maxOccurs="unbounded"/>
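The "named property with optional children" idea can be sketched as a recursive record. This is an illustration under assumptions rather than the CBE schema itself: the `find` helper, the `/` path separator, the type strings `"noValue"` and `"long"`, the example property names, and the rule that `values` and `hexValue` are mutually exclusive (one reading of "values (or hexValue)") are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ExtendedDataElement:
    """A named, typed property that may carry child elements."""
    name: str
    type: str
    values: List[str] = field(default_factory=list)
    hex_value: Optional[bytes] = None
    children: List["ExtendedDataElement"] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Assumption: an element carries either values or a hex value.
        if self.values and self.hex_value is not None:
            raise ValueError("use either values or hex_value, not both")

    def find(self, path: str) -> Optional["ExtendedDataElement"]:
        """Look up a descendant by a '/'-separated path of child names."""
        head, _, rest = path.partition("/")
        for child in self.children:
            if child.name == head:
                return child.find(rest) if rest else child
        return None


# Hypothetical resource-usage data attached to an event.
heap = ExtendedDataElement(name="heapSizeBytes", type="long", values=["268435456"])
jvm = ExtendedDataElement(name="jvm", type="noValue", children=[heap])
usage = ExtendedDataElement(name="resourceUsage", type="noValue", children=[jvm])

print(usage.find("jvm/heapSizeBytes").values)  # ['268435456']
```

Nesting keeps product-specific data grouped under one top-level name, so consumers that do not understand it can skip the whole subtree.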
Common Base Event/Situation Data - Model
ComponentIdentification
location : String
locationType : String
application : String
executionEnvironment : String
component : String
subComponent : String
componentIdType : String
instanceId : String
processId : String
threadId : String
AssociatedEvent
name : String
type : String
MsgDataElement
msgId : String
msgIdType : String
msgCatalogId : String
msgCatalogTokens : String[]
msgCatalog : String
msgLocale : String
ContextDataElement
contextId : String
type : String
name : String
contextValue : String
CommonBaseEvent
extensionName : String
localInstanceId : String
globalInstanceId : String
creationTime : String
severity : short
priority : short
situationType : String
msg : String
repeatCount : short
elapsedTime : String
sequenceNumber : long
version : String = commonbaseevent1_0
ExtendedDataElement
name : String
type : String
values : String[]
hexValue : byte[]
id : String
Associations (role name: multiplicity at each end, as drawn in the diagram)
reporterComponentId: 1 to 1
sourceComponentId: 1 to 1
associatedEvents: 1 to 0..n
resolvedEvents: 1 to 0..n
msgDataElement: 1 to 0..1
contextDataElements: 1 to 0..n
extendedDataElements: 1 to 0..n
dataRefs: 1 to 0..n
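The multiplicities in the model map naturally onto field types: a required field for 1, an optional field for 0..1, and a list for 0..n. The sketch below shows that mapping for a subset of the attributes. The snake_case names, the choice of which `ComponentIdentification` fields are required, the sample values, and the omission of the remaining attributes are assumptions made to keep the example short.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ComponentIdentification:
    location: str   # assumed required for this sketch
    component: str  # assumed required for this sketch
    location_type: Optional[str] = None
    application: Optional[str] = None
    process_id: Optional[str] = None
    thread_id: Optional[str] = None


@dataclass
class MsgDataElement:
    msg_id: str
    msg_id_type: Optional[str] = None


@dataclass
class ContextDataElement:
    context_id: str
    type: str
    name: str
    context_value: str


@dataclass
class CommonBaseEvent:
    # Multiplicity 1: both component identifications must be present.
    source_component_id: ComponentIdentification
    reporter_component_id: ComponentIdentification
    creation_time: str
    situation_type: str
    severity: int = 0
    msg: Optional[str] = None
    # Multiplicity 0..1: at most one message data element.
    msg_data_element: Optional[MsgDataElement] = None
    # Multiplicity 0..n: any number of context data elements.
    context_data_elements: List[ContextDataElement] = field(default_factory=list)
    version: str = "commonbaseevent1_0"


# Hypothetical event in which the server reports its own start.
server = ComponentIdentification(location="apphost01", component="server1")
event = CommonBaseEvent(
    source_component_id=server,
    reporter_component_id=server,
    creation_time="2003-07-01T12:00:00Z",
    situation_type="START",
    msg="Application started: trade3",
)

print(event.version)
```

Omitting either component identification raises a `TypeError` at construction time, which mirrors the mandatory 1-to-1 associations in the model.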
Common Base Event/Situation Data - Model
Common Base Event
extensionName
localInstanceId
globalInstanceId
creationTime
severity
priority
situationType
Msg
repeatCount
elapsedTime
sequenceNumber
msgDataElement
reporterComponentId
sourceComponentId
associatedEvents
contextDataElements
extendedDataElements
Message Data Element
msgId
msgIdType
msgCatalogId
msgCatalogTokens
msgCatalog
msgLocale
Component Identification
location
locationType
application
executionEnvironment
component
subComponent
componentIdType
instanceId
processId
threadId
AssociatedEvent
associationEngine
resolvedEvents
AssociationEngine
name
type
...
Context DataElement
contextId
type
name
contextValue
...
Extended DataElement
id, name
type
values
hexValue
dataRefs
…
Extended DataElement …
CommonBaseEvent …