© 2012 IBM Corporation

Tivoli Workload Automation: Planner functionality and recovery actions

TWS Distributed – 8.2.x

[Architecture diagram: Command Line, Job Scheduling Console, 8.2 TWS Master, 8.2 TWS Domain Managers, 8.2 TWS FTA, Jobs]

• TMF dependency for JSC connection
• No open APIs for external applications
• No LDAP support
• Scheduling DB schema cannot be exported or accessed by external applications
• Difficult DB recovery procedure

TWS Distributed – 8.6

[Architecture diagram: CLI, WebUI server (TDWC 8.6), HTTP Load Balancer, 8.6 TWS Master + Backup Master, DB2 in HA, 8.6 TWS Domain Managers + Backup DM, 8.6 FTAs, Jobs]

Production Plan Generation

A new script JnextPlan (that replaces the old Jnextday script) creates the production plan.

The following command produces a production plan that starts today at 00:00 (start of day) and ends at 23:59.

JnextPlan

The script can create a production plan covering multiple days or only a few hours. The following command produces a production plan that starts today at 00:00 and finishes tomorrow at 23:59.

JnextPlan -for 4800

JnextPlan can also be run with a zero-minute extension. This updates static information such as Workstations, Windows users and Calendars, and removes completed Job Streams (Carry Forward must be set to ALL), without adding new job stream instances.

JnextPlan -for 0000
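The -for values used above (4800 for 48 hours, 0000 for a zero-minute extension) follow an hours-then-minutes pattern. A minimal sketch of building such a value is shown here; the helper name format_for_option is hypothetical and is not part of TWS.

```python
def format_for_option(hours: int, minutes: int = 0) -> str:
    """Build the value passed to 'JnextPlan -for' as hours followed by minutes.

    Hypothetical helper for illustration only; it just concatenates
    zero-padded hours and minutes, as in the 4800 and 0000 examples.
    """
    if hours < 0 or not 0 <= minutes < 60:
        raise ValueError("hours must be >= 0 and minutes must be in 0..59")
    return f"{hours:02d}{minutes:02d}"


# Print the command lines for a 48-hour plan and a zero-minute extension.
print("JnextPlan -for", format_for_option(48))
print("JnextPlan -for", format_for_option(0))
```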


Production Plan Generation

JnextPlan syntax:

JnextPlan [-from mm/dd/[yy]yy [hhmm [tz | timezone tzname]]] [-to mm/dd/[yy]yy [hhmm [tz | timezone tzname]]] | [-for [h]hhmm] [-days n]

-from sets the start time of the new production plan. The format of the date is specified in the localopts file; hhmm identifies the hours and the minutes and tz is the time zone. The "start of day" option can be updated with a new command line called "optman", which replaces the globalopts file.

-to is the end time of the new plan; the date format is the same as for the -from parameter. If it is not specified, the default value is the date and time specified in the -from field plus 23 hours and 59 minutes.

-for is the plan extension in terms of time. The format is hhhmm, where hhh are the hours and mm are the minutes. If it is not specified, the default value is 24 hours.

-days is the plan extension in terms of days.

Default values maintain backward compatibility. Use "optman ls" to show the default values (including the "start of day" time) stored in the DB; in this release the globalopts file no longer exists.
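As an illustration of the -to default described above, the following sketch computes the plan end from a given -from time. The function names are hypothetical, and the mm/dd/yyyy hhmm output format is an assumption taken from the syntax line; the real date format depends on localopts.

```python
from datetime import datetime, timedelta


def default_plan_end(plan_start: datetime) -> datetime:
    """Default -to value: the -from date and time plus 23 hours 59 minutes."""
    return plan_start + timedelta(hours=23, minutes=59)


def as_jnextplan_date(moment: datetime) -> str:
    """Render a datetime as 'mm/dd/yyyy hhmm' (assumed format, see lead-in)."""
    return moment.strftime("%m/%d/%Y %H%M")


start = datetime(2012, 3, 1, 0, 0)
end = default_plan_end(start)
print("JnextPlan -from", as_jnextplan_date(start), "-to", as_jnextplan_date(end))
```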

Jnextday and JnextPlan

[Diagram: steps run by Jnextday in TWS 8.2.x compared with the steps run by JnextPlan in TWS 8.6.0]

TWS 8.2.x
• conman "continue & link @!@;noask"
• schedulr
• compiler
• reptr -pre Symnew
• conman "continue & stop @!@;wait;noask"
• wmaeutil.cmd ALL -stop
• stageman
• conman "continue & start"
• reptr -post …/schedlog/M$DATE
• rep8 -F …. -i …/schedlog/M$DATE
• logman

TWS 8.6.0
• conman "continue & link @!@;noask"
• MakePlan: planman (pre-production plan generation, Symnew creation), reptr -pre Symnew
• SwitchPlan: conman "continue & stop @!@;wait;noask", stageman, planman confirm, conman "continue & start"
• CreatePostReports: reptr -post …/schedlog/M$DATE, rep8 -F …. -i …/schedlog/M$DATE
• UpdateStats: logman
• StartAppServer

Final Schedule & JnextPlan

The FINAL job stream is now made up of five different jobs:

• StartAppServer: checks that WAS is running and starts it if not.
• MakePlan: creates Symnew and produces the pre-production reports.
• SwitchPlan: stops the TWS agents, runs stageman to merge the old Symphony and Symnew, confirms the switch of the plan to the planner, then starts the master and the Symphony distribution.
• CreatePostReports: creates the post-production reports.
• UpdateStats: runs logman to update the pre-production plan, the job history and the statistics. In FINAL it runs in parallel with CreatePostReports.
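The relationship between the five jobs can be sketched as a small dependency graph. The "follows" links below are an assumption based on the descriptions on this slide (the slide states only that UpdateStats runs in parallel with CreatePostReports); they are not copied from the actual FINAL definition.

```python
from graphlib import TopologicalSorter

# Assumed "follows" dependencies: job -> set of jobs it waits for.
FINAL_DEPENDENCIES = {
    "STARTAPPSERVER": set(),
    "MAKEPLAN": {"STARTAPPSERVER"},
    "SWITCHPLAN": {"MAKEPLAN"},
    "CREATEPOSTREPORTS": {"SWITCHPLAN"},
    "UPDATESTATS": {"SWITCHPLAN"},
}


def execution_waves(dependencies):
    """Group jobs into waves; jobs in the same wave can run in parallel."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


for number, wave in enumerate(execution_waves(FINAL_DEPENDENCIES), start=1):
    print(number, ", ".join(wave))
```

Under these assumed dependencies, the last wave contains both CREATEPOSTREPORTS and UPDATESTATS, which matches the parallel run described for FINAL.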

FINAL Job Stream

Production Plan status

planman showinfo

This command retrieves information about the production plan status.

Production Plan Extension

JnextPlan and production plan extension

When the production plan already exists and JnextPlan is run, the production plan is extended (by default by 24 hours). After the extension, the new production plan contains the new instances related to the extension period and all the job stream instances not yet completed, which are carried forward.

Note: in TWS 8.6 the enCarryForward keyword is used to specify the "carry forward" property. This keyword is stored in the database and its default value is ALL. When data is migrated from a previous version, the value is copied from the previous configuration.

enCarryForward keyword values:

• all: ignores whether the Carry Forward key is enabled in the job stream definitions and carries forward all uncompleted job streams.
• yes: carries forward only those uncompleted job streams that have the Carry Forward key enabled.
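The two keyword values described above can be expressed as a small decision function. This is only a sketch of the rule as stated on the slide; the function and parameter names are hypothetical, and any value other than all or yes is rejected because the slide does not describe it.

```python
def is_carried_forward(completed: bool, carry_forward_key: bool,
                       en_carry_forward: str = "all") -> bool:
    """Decide whether a job stream instance is carried into the extended plan.

    completed          -- True if the job stream instance has completed
    carry_forward_key  -- True if Carry Forward is enabled in its definition
    en_carry_forward   -- value of the keyword ("all" is the default)
    """
    if completed:
        return False  # only uncompleted job streams are carried forward
    setting = en_carry_forward.lower()
    if setting == "all":
        return True  # the Carry Forward key is ignored
    if setting == "yes":
        return carry_forward_key
    raise ValueError(f"value not covered by this sketch: {en_carry_forward!r}")


print(is_carried_forward(completed=False, carry_forward_key=False))
print(is_carried_forward(False, False, en_carry_forward="yes"))
```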


Planning considerations

Job streams are not copied into the Current Plan:
• if they do not have a Run Cycle
• if the Run Cycle does not result in a "run" day in the planning period

Ad-hoc submission is allowed:
• if the Job Stream is defined in the database
• if the Job Stream is not draft and is valid
• the ON request flag is no longer mandatory (it was already ignored in previous releases)

Planning considerations (continued)

DRAFT / ACTIVE definition
• DRAFT: defined in the database; not used for the Production Plan
• ACTIVE: defined in the database and used for the Production Plan
• Sample usage scenario: JobStreamA must not run tomorrow
  – Set JobStreamA to DRAFT
  – Extend the Current Plan
  – JobStreamA is not included in the Production Plan

VALID FROM definition
• A JobStream can have multiple versions
• The VALID FROM date specification differentiates JobStream versions
• Sample usage scenario: JobStreamA runs JobA -> JobB -> JobC; JobD must be added to the workflow and must be ready to run in 2 days, when the new application goes live in production
  – Modify JobStreamA, inserting a new definition such as JobA … JobC -> JobD together with a VALID FROM date specification
  – Extend the Plan
  – JobStreamA will have 2 versions in the plan, each used in accordance with its dates
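The VALID FROM scenario can be illustrated with a short sketch that picks the job stream version applicable on a given day. The dates and the selection rule (latest VALID FROM not later than the run date) are assumptions made for illustration; they are not taken from the planner implementation.

```python
from datetime import date

# Hypothetical versions of JobStreamA: (valid_from, jobs in workflow order).
JOBSTREAM_A_VERSIONS = [
    (date(2012, 1, 1), ["JobA", "JobB", "JobC"]),
    (date(2012, 3, 3), ["JobA", "JobB", "JobC", "JobD"]),
]


def version_for(run_date, versions):
    """Return the jobs of the version with the latest valid_from <= run_date."""
    applicable = [entry for entry in versions if entry[0] <= run_date]
    if not applicable:
        return None  # no version is valid yet on that date
    return max(applicable, key=lambda entry: entry[0])[1]


# A plan spanning both dates contains both versions, one per run date.
for day in (date(2012, 3, 2), date(2012, 3, 3)):
    print(day.isoformat(), " -> ".join(version_for(day, JOBSTREAM_A_VERSIONS)))
```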


Planning considerations (continued)

Production Plan
• By default starts at 00:00, but this can be modified
• By default covers 24 hours (normal workdays)
• A longer or shorter period can be specified
• Can span a few days (during weekends or holidays) or more
• Once created the first time, it is always extended
• The extension can vary from 1 minute to days

Symphony Size
• Same size and structure as the expanded Symphony of 8.2.x
• Longer plans produce larger Symphony files
• At least 512 bytes (1 Symphony record) for each Job Stream Instance and for each Job Instance
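The 512-byte figure gives a quick lower bound for the Symphony file size. The sketch below applies it to hypothetical instance counts; it counts only job stream and job records, so the real file will be larger.

```python
SYMPHONY_RECORD_BYTES = 512  # one Symphony record, as stated above


def min_symphony_bytes(job_stream_instances: int, job_instances: int) -> int:
    """Lower bound: one 512-byte record per job stream and per job instance."""
    return SYMPHONY_RECORD_BYTES * (job_stream_instances + job_instances)


# Hypothetical workload: 1,000 job stream instances and 9,000 job instances.
size = min_symphony_bytes(1_000, 9_000)
print(f"at least {size} bytes ({size / 1024 / 1024:.1f} MiB)")
```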

Symphony with more than 24hrs

[Diagram: Database, Pre Production (LTP), JnextPlan and Plan ("Symphony")]

• Database: the collection of all defined scheduling objects (Jobs, Job Streams, NT Users, Prompts, Workstations, Resources, Calendars).
• Pre Production (LTP): Job Stream instances and external dependencies for several days. The Pre Production Plan contains job stream instances calculated in advance for several days and external dependencies resolved on those instances according to matching criteria.
• Plan ("Symphony"): the scheduled workload for one or more production days. The Symphony file contains the objects needed for the production plan period: Workstations, Calendars, Job Streams, Jobs, Dependencies.
• JnextPlan runs on the Master Scheduler as part of the production plan: it extends the Production Plan and creates a new Symphony file.

Production Plan Extension

[Diagram: Old Symphony, Symnew and New Symphony. Labels: Database (Jobs, Calendars, Job Streams, Resources, Workstations, …); Pre Production Plan; "Remove completed job streams"; "Add detail for next plan period"; Current Plan Extension; timeline marked today, tomorrow, 10 days]

StartAppServer

Checks that WAS is running and starts it if not.

In case of failure:

Rerun the job


MakePlan

Replans or extends the Pre-Production plan if needed.

Produces the Symnew file.

Generates Pre-Production reports in the joblog.

In case of failure:

■ The global lock may be left set; use planman unlock to reset it.

■ Rerun the job to recover:
– The Pre-Production plan is automatically re-verified and updated.
– Symnew is recreated.

MakePlan

How to stop it:

■ Stopping the job may not stop the processing still running inside WAS or on the DB.

■ Force the DB statement closure if a DB statement runs too long and causes MakePlan to abend.

■ A WAS restart is required if processing is still running in WAS and MakePlan does not terminate.

Best Practice:

Check whether database statistics are enabled. If not, it is strongly suggested to schedule the runstatistics script stored in the dbtools TWS directory.

MakePlan – Error messages

If MakePlan stdlist shows the following messages:

AWSBEH023E Unable to establish communication with the server on host "127.0.0.1" using port "31116".

This error means that the application server (eWAS) is down and MakePlan cannot continue. Start eWAS and check the eWAS logs to identify why it stopped.

AWSBEH021E The user "twsuser" is not authorized to access the server on host "127.0.0.1" using port "31116".

This is an authorization error. Check the twsuser credentials in the useropts file.

AWSJPL018E The database is already locked.

This means that a previous MakePlan run was stopped and the global lock was not reset. To recover, run "planman unlock".

MakePlan – Error messages

If MakePlan stdlist shows the following messages:

AWSJPL006E An internal error has occurred. A database object "xxxx" cannot be loaded from the database.

In general "xxxx" is an object such as a workstation, job or job stream. This error means that the connection with the database is broken. Check SystemOut.log and the ffdc directory, where additional information about the database issue is logged.

AWSJPL017E The production plan cannot be created because a previous action on the production plan did not complete successfully. See the message help for more details.

This error means that a previous operation on the preproduction plan was performed but finished with an error. In general it appears when "ResetPlan -scratch" was run but did not finish successfully.

AWSJPL704E An internal error has occurred. The planner is unable to extend the preproduction plan

This error means that MakePlan cannot extend the preproduction plan. It can have several root causes, in general related to the database, such as no space left in the tablespace or full transaction logs. Check SystemOut.log or the ffdc directory for more information.
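When reviewing a MakePlan stdlist, the message IDs listed on these slides can be mapped to the suggested first action. The sketch below is a hypothetical triage helper, not a TWS tool; the hints paraphrase the slide text and the ID pattern is an assumption derived from the six IDs shown.

```python
import re

# Suggested first action per message ID, paraphrased from the slides.
MAKEPLAN_HINTS = {
    "AWSBEH023E": "eWAS is down: start it and check the eWAS logs",
    "AWSBEH021E": "authorization error: check twsuser credentials in useropts",
    "AWSJPL018E": "global lock left set: run 'planman unlock'",
    "AWSJPL006E": "database connection broken: check SystemOut.log and ffdc",
    "AWSJPL017E": "previous preproduction plan action failed",
    "AWSJPL704E": "cannot extend preproduction plan: check SystemOut.log and ffdc",
}

# Assumed shape of an error ID: 'AWS', three letters, three digits, 'E'.
MESSAGE_ID = re.compile(r"\bAWS[A-Z]{3}\d{3}E\b")


def suggest_actions(stdlist_text):
    """Return (message_id, hint) pairs for known IDs, in order, no repeats."""
    seen = []
    for message_id in MESSAGE_ID.findall(stdlist_text):
        if message_id in MAKEPLAN_HINTS and message_id not in seen:
            seen.append(message_id)
    return [(message_id, MAKEPLAN_HINTS[message_id]) for message_id in seen]


sample = "AWSJPL018E The database is already locked."
for message_id, hint in suggest_actions(sample):
    print(message_id, "->", hint)
```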


SwitchPlan

■ Stops all the CPUs.

■ Runs stageman:
– to merge the old Symphony file with Symnew
– to archive the old Symphony file in the schedlog directory

■ Runs planman confirm to update the plan status information in the DB (e.g. plan end date and current run number).

■ Restarts the master to distribute the Symphony file and restart scheduling.

In case of failure:

1) planman confirm has not been run yet (check the logs and "planman showinfo"): rerun SwitchPlan.

2) planman confirm has failed: manually run "planman confirm" and "conman start".

3) planman confirm has already been run (e.g. the plan end date has been updated): run "conman start".
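The three failure cases can be summarised as a lookup from the state of planman confirm to the recovery actions. This is only a sketch of the decision described above; the enum and function names are hypothetical, and the state itself still has to be determined from the logs and "planman showinfo".

```python
from enum import Enum
from typing import List


class ConfirmState(Enum):
    NOT_RUN = "planman confirm has not been run yet"
    FAILED = "planman confirm has failed"
    DONE = "planman confirm has already been run"


def switchplan_recovery(state: ConfirmState) -> List[str]:
    """Return the recovery actions listed on the slide for each case."""
    if state is ConfirmState.NOT_RUN:
        return ["rerun SwitchPlan"]
    if state is ConfirmState.FAILED:
        return ["planman confirm", "conman start"]
    return ["conman start"]


for state in ConfirmState:
    print(state.value, "->", "; ".join(switchplan_recovery(state)))
```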

How to stop it:

If conman stop hangs, kill the conman command. This may affect plan distribution, because the agents left running will have to be stopped before the new Symphony is distributed.

SwitchPlan – Error messages

If SwitchPlan stdlist shows the following messages:

■ STAGEMAN:AWSBHV082E The previous Symphony file and the Symnew file have the same run number. They cannot be merged to form the new Symphony file.

There are several possible causes for the Symphony and Symnew run numbers to be the same:

1. MAKEPLAN did not extend the run number in the Symnew file.

2. SWITCHPLAN was executed before MAKEPLAN.

3. The stageman process has been run twice on the same Symnew file without resetting the plan or deleting the Symphony file.

■ AWSJCL054E The command "CONFIRM" has failed.

■ AWSJPL016E An internal error has occurred. A global option "confirm run number" cannot be set

In general, these error messages appear when the last step of SwitchPlan, "planman confirm", fails. Analyze SystemOut.log for more information and rerun "planman confirm".

UpdateStats

Runs logman to update job statistics and history

Extends the Pre-Production plan if its length is shorter than minLen

In case of failure:

■ Rerun the job or manually run “logman <file>” on the latest schedlog file.

■ If not run, the statistics and history will be partial. Pre-Production plan is updated anyway at the beginning of Makeplan.

How to stop it:

■ Kill the job or the logman process. The statistics and history will be partial until the job or logman is rerun.

CreatePostReports

Generates Post-Production reports in the job output.

In case of failure:

■ Rerun the job if the reports are needed.

Recovery Plan Procedure: Symphony Corruption

Follow these steps on the master domain manager:

1) Set the job limit to 0, using conman or the Tivoli Dynamic Workload Console. This prevents all jobs from starting.

2) logman -prod: updates the Pre-Production Plan.

3) planman showinfo: retrieves the start time of the first non-completed job stream instance and the end time of the production plan.

4) ResetPlan: archives the current Symphony file.

5) JnextPlan -from -to: creates a new Symphony file for the period in which there are still outstanding jobs. Only incomplete job stream instances are included in the new Symphony file.

6) Set the job limit to the previous value. The Symphony file is distributed and the production cycle starts again.
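The recovery procedure can be written down as an ordered checklist, with the JnextPlan command filled in from the two times reported by planman showinfo. This is a sketch only: the function name is hypothetical, the mm/dd/yyyy hhmm format is assumed from the JnextPlan syntax (the real date format depends on localopts), and the job limit steps are left as descriptions because the slide does not give the exact command.

```python
from datetime import datetime
from typing import List


def recovery_checklist(first_open_start: datetime, plan_end: datetime) -> List[str]:
    """Ordered steps for recovering from a corrupted Symphony file.

    first_open_start -- start time of the first non-completed job stream
                        instance, as read from 'planman showinfo'
    plan_end         -- end time of the production plan, same source
    """
    date_format = "%m/%d/%Y %H%M"  # assumed format, see lead-in
    start_text = first_open_start.strftime(date_format)
    end_text = plan_end.strftime(date_format)
    return [
        "set the job limit to 0 (conman or TDWC)",
        "logman -prod",
        "planman showinfo",
        "ResetPlan",
        f"JnextPlan -from {start_text} -to {end_text}",
        "set the job limit back to the previous value",
    ]


steps = recovery_checklist(datetime(2012, 3, 1, 6, 0), datetime(2012, 3, 2, 23, 59))
for number, step in enumerate(steps, start=1):
    print(f"{number}) {step}")
```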