Softpanorama


TWS Troubleshooting


Before contacting IBM support, collect the following information:

  1. The Tivoli Workload Scheduler installation path, also called the TWS home. This is the home directory of twsuser. For example:

    # grep tws /etc/passwd
    twsuser:!:4202:4202:Tivoli Workload Admin:/opt/tivoli/TWS:/usr/bin/ksh

  2. OS version and patch level. The command depends on the OS. On most Unix systems, uname -a shows the OS version. The patch level is harder to obtain, and the method is OS-dependent.
  3. TWS version and patch level, for example 8.2 FP06. The relevant information is in the version subdirectory of the TWS home directory (here the TWS home is /opt/tivoli/TWS):
    cd /opt/tivoli/TWS/version
    ll
    -rw-r--r--    1 root     system           21 Jan 22 2006  patch.info
    -rwxr-xr-x    1 twsuser  tivoli        10745 Nov 17 2004  version*
    -rwxr-xr-x    1 twsuser  tivoli        97614 Nov 17 2004  version.info*
    Then run the version script to display the data:
    cd /opt/tivoli/TWS/version
    /opt/tivoli/TWS/version # ./version
    MAESTRO 8.2 UNIX aix4-r1
    PATCH 8.2.0-TIV-TWS-FP0006 November 2004
  4. Patch level of TEC and other relevant components. For example, the current TEC patch level should be FP8. Run wlookup to get the data.
  5. User under which TWS is running (usually twsuser)
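The first items on this list can be gathered with a small script. This is a minimal sketch, assuming the TWS user is named twsuser (override with TWS_USER), that it is listed in /etc/passwd as in the grep example, and that the version script prints a PATCH line in the format shown above. The function names are hypothetical helpers, not part of TWS:

```shell
#!/bin/sh
# Hypothetical helpers for gathering data before calling IBM support.

# Print the home directory (6th field) of user $1 from passwd lines on stdin.
tws_home_from_passwd() {
    awk -F: -v u="$1" '$1 == u { print $6; exit }'
}

# Print the fix pack token (for example FP0006) from version script output on stdin.
tws_fixpack() {
    sed -n 's/^PATCH .*-\(FP[0-9][0-9]*\).*/\1/p' | head -n 1
}

# Print TWS user, TWS home, OS version and TWS version in one report.
collect_support_info() {
    user=${TWS_USER:-twsuser}
    home=$(tws_home_from_passwd "$user" < /etc/passwd)
    echo "TWS user: $user"
    echo "TWS home: ${home:-not found in /etc/passwd}"
    echo "OS:       $(uname -a)"
    if [ -n "$home" ] && [ -x "$home/version/version" ]; then
        (cd "$home/version" && ./version)
    else
        echo "version script not found"
    fi
}
```

Running collect_support_info as root or as the TWS user prints the items in one block that can be pasted into the problem record. The OS patch level still has to be added by hand.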

Network communications

In a Tivoli Workload Scheduler network, agents communicate with their domain managers, and domain managers communicate with their parent domain managers. There are basically two types of communications that take place:

Before the start of each new day, the master domain manager creates a production control file called Symphony. Then, Tivoli Workload Scheduler is restarted in the network, and the master domain manager sends a copy of the new Symphony file to each of its automatically-linked agents and subordinate domain managers. The domain managers, in turn, send copies to their automatically-linked agents and subordinate domain managers. Agents and domain managers that are not set up to link automatically are initialized with a copy of Symphony as soon as a link operation is run in Tivoli Workload Scheduler.

Once the network is started, scheduling messages, like job starts and completions, are passed from the agents to their domain managers, through parent domain managers to the master domain manager. The master domain manager then broadcasts the messages throughout the hierarchical tree to update the Symphony files of all domain managers and all fault-tolerant agents running in full status mode.

Network operation

The batchman process on each domain manager and fault-tolerant agent workstation operates autonomously, scanning its Symphony files to resolve dependencies and launch jobs. Batchman launches jobs via the jobman process. On a standard agent, the jobman process responds to launch requests from the domain manager's batchman.

The master domain manager is continuously informed of job launches and completions and is responsible for broadcasting the information to domain managers and fault-tolerant agents so they can resolve any inter-workstation dependencies.

The degree of synchronization among the Symphony files depends on the setting of Full Status and Resolve Dependencies modes in a workstation's definition. Assuming that these modes are turned on, a fault-tolerant agent's Symphony file contains the same information as the master domain manager's (see the section that explains how to manage workstations in the database in the IBM(R) Tivoli(R) Workload Scheduler: Job Scheduling Console User's Guide).

Figure 2. Symphony file synchronization

9.8.1 FTAs not linking to the Master Domain Manager

  If netman is not running on the FTA:
  If netman has not been started, start it from the command line with the StartUp command. Note that this will start only netman, not any other IBM Tivoli Workload Scheduler processes.
  If netman was started as root rather than as the TWS user, bring IBM Tivoli Workload Scheduler down normally and then start it again as the Tivoli Workload Scheduler user, using the conman command line on the Master or FTA:
  On UNIX:

$ conman unlink

$ conman "shut;wait"

$ ./StartUp

  If netman could not create a standard list directory:
  If the file system is full, open some space in the file system.
  If a file with the same name as the directory already exists, delete that file. The directory name is in yyyy.mm.dd format.
  If the directory or the netman standard list is owned by root rather than by the IBM Tivoli Workload Scheduler user, change the ownership of the directory from the UNIX command line with the chown TWSuser yyyy.mm.dd command. Note that this must be done as the root user.
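The file-in-the-way and ownership checks above can be scripted. This is a minimal sketch, assuming the standard list directories live under TWShome/stdlist and are named after the current date in yyyy.mm.dd format. The function name is hypothetical, and file system space still has to be checked separately (for example with df):

```shell
#!/bin/sh
# Check today's stdlist directory: a plain file with the directory's name
# blocks its creation, and a directory owned by someone other than the TWS
# user (for example root) must be chowned back by root.
stdlist_check() {   # usage: stdlist_check <TWShome> <TWS user>
    dir="$1/stdlist/$(date +%Y.%m.%d)"
    if [ -e "$dir" ] && [ ! -d "$dir" ]; then
        echo "FILE_IN_THE_WAY $dir"
        return 1
    fi
    if [ -d "$dir" ]; then
        owner=$(ls -ld "$dir" | awk '{ print $3 }')
        if [ "$owner" != "$2" ]; then
            echo "WRONG_OWNER $dir $owner (as root: chown $2 $dir)"
            return 1
        fi
    fi
    echo "OK $dir"
}
```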
  Host file or DNS changes. Check whether:
  The host file on the FTA or Master has been changed.
  The DNS entry for the FTA or Master has been changed.
  The host name on the FTA or Master has been changed.
  If the communication processes are hung or the mailman process is down or hung on the FTA:
  IBM Tivoli Workload Scheduler was not brought down properly.

Always try to bring IBM Tivoli Workload Scheduler down properly via the conman command line on the Master or FTA as follows:

On UNIX:

$ conman unlink

$ conman "shut;wait"
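The shutdown sequence, followed by the StartUp step used when netman was started as root, can be wrapped so that it refuses to run under the wrong account. This is a minimal sketch, assuming it is run from the TWS home directory with conman on the PATH, as in the commands above. CONMAN and STARTUP are hypothetical override variables, handy for testing:

```shell
#!/bin/sh
# Stop TWS cleanly and start it again as the TWS user, never as root.
restart_tws() {   # usage: restart_tws <TWS user>
    conman=${CONMAN:-conman}
    startup=${STARTUP:-./StartUp}
    if [ "$(id -un)" != "$1" ]; then
        echo "run as $1, not as $(id -un)" >&2
        return 1
    fi
    "$conman" unlink || return 1        # unlink from the network first
    "$conman" "shut;wait" || return 1   # stop TWS processes and wait for them to exit
    "$startup"                          # StartUp starts netman only
}
```

For example, su - twsuser, cd to the TWS home, then call restart_tws twsuser.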

  If the mailman read corrupted data, try to bring IBM Tivoli Workload Scheduler down normally. If this is not successful, kill the mailman process as follows:

On UNIX:

Run ps -u TWSuser to find the process ID.

root@nti5001:/opt/tivoli/TWS/TWS/bin # ps -u twsuser
      UID    PID    TTY  TIME CMD
     4202 311530      -  0:02 netman
     4202 315512      -  0:04 batchman
     4202 348314      -  0:10 appservman
     4202 413802      -  0:01 mailman
     4202 438290      -  0:50 monman

Run kill <process id>, or if that fails, kill -9 <process id>, to kill the mailman process.

On Windows (commands in the TWShome\Unsupported directory):

Run listproc to find the process ID.

Run killproc <process id> to kill the mailman process.
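On UNIX, the lookup and the kill/kill -9 escalation can be scripted. This is a minimal sketch, assuming the process list comes from ps -u twsuser -o pid= -o comm= (the POSIX ps output format, rather than the AIX default layout shown above). The 5-second grace period before kill -9 is an arbitrary choice, and the function names are hypothetical:

```shell
#!/bin/sh
# Print the PID(s) of process $1 from "pid comm" lines on stdin,
# for example from: ps -u twsuser -o pid= -o comm=
pid_of() {
    awk -v name="$1" '{ n = $2; sub(/.*\//, "", n); if (n == name) print $1 }'
}

# Send SIGTERM; if the process is still there after a grace period
# ($2 seconds, default 5), send SIGKILL (the kill -9 step).
stop_pid() {
    kill "$1" 2>/dev/null || return 0   # nothing to do: process already gone
    sleep "${2:-5}"
    if kill -0 "$1" 2>/dev/null; then
        kill -9 "$1"
    fi
}
```

Usage: for p in $(ps -u twsuser -o pid= -o comm= | pid_of mailman); do stop_pid "$p"; done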

  If batchman is hung:

Try to bring IBM Tivoli Workload Scheduler down normally. If not successful, kill the mailman process as explained in the previous bullet.

  If the writer process for an FTA is down or hung on the Master, this can mean that:
  FTA was not properly unlinked from the Master.
  The writer read corrupted data.
  Multiple writers are running for the same FTA.

Use ps -ef | grep maestro to check that the writer processes are running. If there is more than one writer process for the same FTA, perform the following steps:

i.  Shut down IBM Tivoli Workload Scheduler normally.
ii.  Check the processes for multiple writers again.
iii.  If there are multiple writers, kill them.
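Counting writers per FTA can be scripted. This is a minimal sketch. Its key assumption should be checked against real ps -ef output on your Master, because the exact writer command line is not documented here: each writer process is assumed to show the FTA name as a separate word. The function name and FTA1 are hypothetical:

```shell
#!/bin/sh
# Count writer processes serving FTA $1 in ps -ef output read from stdin.
# A line counts when one word is the writer binary (bare or with a path)
# and another word equals the FTA name exactly.
count_writers() {   # usage: ps -ef | count_writers FTA1
    awk -v fta="$1" '
        {
            w = 0; f = 0
            for (i = 1; i <= NF; i++) {
                if ($i == "writer" || $i ~ /\/writer$/) w = 1
                if ($i == fta) f = 1
            }
            if (w && f) n++
        }
        END { print n + 0 }'
}
```

A result greater than 1 for an FTA corresponds to the "multiple writers" case in the steps above.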
  If the netman process is hung:
  If multiple netman processes are running, try shutting down netman properly first. If this is not successful, kill netman using the following commands:

On UNIX:

Use ps -ef | grep maestro to find the running processes.

Issue kill <process id>, or if that fails, kill -9 <process id>, to kill the netman process.

On Windows (commands in the TWShome\Unsupported directory):

Use listproc to find the process ID.

Run killproc <process id> to kill the netman process.

 Hung port/socket; FIN_WAIT2 on netman port.

Use netstat -a | grep <netman port> to check whether netman is listening. This works on both UNIX and Windows systems, except that on Windows you use findstr instead of grep.

Look for FIN_WAIT2 for the IBM Tivoli Workload Scheduler port.

If FIN_WAIT2 does not time out (approximately 10 minutes), reboot.
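Both netman checks can be scripted. This is a minimal sketch under the following assumptions: netman listens on port 31111, the usual default (check the nm port setting in the localopts file); the process list comes from ps -u twsuser -o pid= -o comm=; and netstat -an prints local and remote addresses as separate columns, as it does on common UNIX platforms. The function names are hypothetical:

```shell
#!/bin/sh
# Count netman processes in "pid comm" lines on stdin.
count_netman() {
    awk '{ n = $2; sub(/.*\//, "", n); if (n == "netman") c++ } END { print c + 0 }'
}

# Count FIN_WAIT2 sockets on port $1 in netstat -an output on stdin.
# Accepts both host.port and host:port address styles and both the
# FIN_WAIT2 and FIN_WAIT_2 state spellings.
count_fin_wait2() {
    awk -v port="$1" '
        /FIN_WAIT_?2/ {
            for (i = 1; i <= NF; i++)
                if ($i ~ ("[.:]" port "$")) { c++; break }
        }
        END { print c + 0 }'
}
```

Usage: ps -u twsuser -o pid= -o comm= | count_netman (more than 1 means multiple netman processes), and netstat -an | count_fin_wait2 31111 (repeat after about 10 minutes to see whether the count drops).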

Network problems to look for outside of IBM Tivoli Workload Scheduler include:

The router is down in a WAN environment.

The switch or network hub is down on an FTA segment.

There has been a power outage.

There are physical defects in the network card/wiring.

9.8.2 Batchman not up or will not stay up (batchman down)

If the message file has reached 10,000,000 bytes (approximately 9.5 MB):

Check the size of the message files (files whose names end with .msg) in the IBM Tivoli Workload Scheduler home directory and its pobox subdirectory. 48 bytes is the minimum size of these files.

Use the evtsize command to expand the file temporarily, and then try to start IBM Tivoli Workload Scheduler:

evtsize <filename> <new size in bytes>

For example:

evtsize Mailbox.msg 20000000

If necessary, remove the message file, but only after evtsize and a restart attempt have failed.

Important: Message files contain important messages being sent between IBM Tivoli Workload Scheduler processes and between IBM Tivoli Workload Scheduler agents. Remove a message file only as a last resort; all data in the message file will be lost. Also never remove message files while any IBM Tivoli Workload Scheduler processes are running.
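Finding message files that have hit the limit can be scripted. This is a minimal sketch, assuming the message files are the *.msg files directly in the TWS home directory and its pobox subdirectory, with the 10,000,000-byte limit quoted above as the default threshold. The function name is hypothetical:

```shell
#!/bin/sh
# List message files whose size has reached the limit ($2, default 10000000).
big_msg_files() {   # usage: big_msg_files <TWShome> [limit in bytes]
    limit=${2:-10000000}
    for f in "$1"/*.msg "$1"/pobox/*.msg; do
        [ -f "$f" ] || continue   # skip an unmatched glob pattern
        size=$(wc -c < "$f" | tr -d ' ')
        if [ "$size" -ge "$limit" ]; then
            echo "$f $size"
        fi
    done
}
```

Each reported file is a candidate for evtsize <filename> <new size in bytes>, as described above.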

Jobman not owned by root.

If jobman (in the bin subdirectory of the IBM Tivoli Workload Scheduler home directory) is not owned by root, correct this problem by logging in as root and running the command chown root jobman.

Tip: When checking whether root owns jobman, also check that the setuid bit is set and that the file system containing TWShome is not mounted with the nosuid option.
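The ownership and setuid checks can be applied to one ls -l line. This is a minimal sketch, assuming the usual ls -l layout (permissions in column 1, owner in column 3). TWSHOME and the function name are hypothetical, and the nosuid mount option still has to be checked separately, for example in the output of the mount command:

```shell
#!/bin/sh
# Check one "ls -l" line for jobman: owner root and an executable setuid bit
# (lowercase s in the owner-execute position, as in -rwsr-xr-x).
check_jobman_line() {   # usage: check_jobman_line "$(ls -l "$TWSHOME/bin/jobman")"
    perms=$(printf '%s\n' "$1" | awk '{ print $1 }')
    owner=$(printf '%s\n' "$1" | awk '{ print $3 }')
    status=0
    if [ "$owner" != root ]; then
        echo "jobman is owned by $owner, not root (fix as root: chown root jobman)"
        status=1
    fi
    case "$perms" in
        ???s*) ;;
        *) echo "setuid bit not set on jobman ($perms)"; status=1 ;;
    esac
    return "$status"
}
```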

Read bad record in Symphony file.

Initialization process interrupted or failed.

Cannot create Jobtable.

Corrupt data in Jobtable.

Message file corruption.

This can be for the following reasons:

Bad data

File system full

Power outage

CPU hardware crash

9.8.3 Jobs not running

Add the IBM Tivoli Workload Scheduler user for the FTA to the IBM Tivoli Workload Scheduler user database.

  Password for Windows user has been changed.

Change the password in the IBM Tivoli Workload Scheduler user database to the new Windows password.

Note that changes to the IBM Tivoli Workload Scheduler user database do not take effect until the next Jnextday process.

Jobs not running on Windows or UNIX:

Batchman down.

See Batchman not up or will not stay up (batchman down).

Limit set to 0.

To change the limit to 10 via the conman command line:

For a single FTA:

lc <FTA name>;10

For all FTAs:

lc @;10;noask

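FENCE set too high. Jobs whose priority is less than or equal to the fence value are not launched.

To change FENCE to 10 via the conman command line (jobs with priority 10 or lower then remain held):

For all FTAs:

f @;10;noask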
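The two launch gates above can be expressed as a small decision function. This is a minimal sketch under two assumptions: a job is held when the workstation's running-job count has reached its limit, and a job whose priority is less than or equal to the fence is not launched. The function is hypothetical, models the rules only, and does not query conman:

```shell
#!/bin/sh
# Report which gate keeps a job from launching: LIMIT, FENCE or NONE.
launch_blocker() {   # usage: launch_blocker <priority> <fence> <running jobs> <limit>
    if [ "$3" -ge "$4" ]; then
        echo LIMIT     # limit 0 blocks everything; raise it with lc
    elif [ "$1" -le "$2" ]; then
        echo FENCE     # priority not above the fence; lower it with f
    else
        echo NONE
    fi
}
```

For example, launch_blocker 10 10 0 10 prints FENCE: with the fence at 10, a job of priority 10 is still held.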

If dependencies are not met, it could be for the following reasons:

Start time not reached yet, or UNTIL time has passed.

OPENS file not present yet.

Job FOLLOW not complete.
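The dependency checks can be modeled in a few lines. This is a minimal sketch, with the states passed in by hand rather than read from conman. Times are HHMM integers on the same day, and SUCC is assumed to be the state of a successfully completed predecessor (verify it against your conman output). The function name is hypothetical:

```shell
#!/bin/sh
# Report the first unmet dependency from the list above.
# Pass "" for "no UNTIL time" or "no OPENS file".
unmet_dependency() {   # usage: unmet_dependency <now> <start> <until> <opens file> <predecessor state>
    if [ "$1" -lt "$2" ]; then
        echo "start time not reached"; return 1
    fi
    if [ -n "$3" ] && [ "$1" -gt "$3" ]; then
        echo "UNTIL time passed"; return 1
    fi
    if [ -n "$4" ] && [ ! -e "$4" ]; then
        echo "OPENS file not present"; return 1
    fi
    if [ "$5" != SUCC ]; then
        echo "FOLLOWS job not complete"; return 1
    fi
    echo "all met"
}
```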


9.8.4 Jnextday is hung or still in EXEC state

  Stageman cannot get exclusive access to Symphony.
  Batchman and/or mailman was not stopped before running Jnextday from the command line.
  Jnextday not able to stop all FTAs.
  Network segment down and cannot reach all FTAs.
  One or more of the FTAs has crashed.
  Netman not running on all FTAs.
  Jnextday not able to start or initialize all FTAs.
  The Master or FTA was manually started before Jnextday completed stageman.

Reissue a link from the Master to the FTA.

  The Master was not able to start batchman after stageman completed.

See Batchman not up or will not stay up (batchman down).

  The Master was not able to link to the FTA.

See FTAs not linking to the Master Domain Manager.
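Before running Jnextday from the command line, it helps to confirm that batchman and mailman are really gone. This is a minimal sketch, assuming the process list comes from ps -u twsuser -o pid= -o comm=. The function names are hypothetical:

```shell
#!/bin/sh
# Print "name pid" for every batchman or mailman in "pid comm" lines on stdin.
running_tws_daemons() {
    awk '{ n = $2; sub(/.*\//, "", n); if (n == "batchman" || n == "mailman") print n, $1 }'
}

# Succeed only when neither batchman nor mailman is running.
safe_to_run_jnextday() {   # usage: ps -u twsuser -o pid= -o comm= | safe_to_run_jnextday
    busy=$(running_tws_daemons)
    if [ -n "$busy" ]; then
        echo "still running:"
        echo "$busy"
        return 1
    fi
    echo "batchman and mailman are stopped"
}
```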

9.8.5 Jnextday in ABEND state

Jnextday not completing compiler processes.

This may be due to bad or missing data in the schedule or job. You can perform the following actions:

Check for missing calendars.

Check for missing resources.

Check for missing parameters.

Jnextday not completing stageman process.

This may be due to bad or missing data in the CARRYFORWARD schedule. You can perform the following actions:

Run show jobs or show schedules to find the bad schedule.

Add missing data and rerun Jnextday.

Cancel the schedule and rerun Jnextday.

Autotrace feature

This is a built-in flight-recorder-style trace mechanism that logs all activities performed by the IBM Tivoli Workload Scheduler processes. In case of product failure or unexpected behavior, this feature can be extremely effective in finding the cause of the problem and in providing a quick solution.

In case of problems, you are asked to create a trace snap file by issuing some simple commands. The trace snap file is then inspected by the Tivoli support team, which uses the logged information as an efficient problem determination base. The Autotrace feature, already available with Version 8.1, has now been extended to run on additional platforms.

This feature is now available with Tivoli Workload Scheduler on the following platforms:

      AIX
      HP-UX
      Solaris Operating Environment
      Microsoft Windows NT and 2000
      Linux

The tracing system is completely transparent and does not have any impact on file system performance, because it is fully handled in memory. It is automatically started by StartUp, so no further action is required.

Autotrace uses a dynamic link library named libatrc. Normally, this library is located in /usr/Tivoli/TWS/TKG/3.1.5/lib (UNIX) and not in the installation path, as would be expected.

Tip: Whenever Autotrace might help with problem determination, take snap files as soon as possible. If you need to gather a lot of information quickly, use Metronome, which we cover next.

Metronome script

Metronome is a Perl script that instantly takes a snapshot of Tivoli Workload Scheduler and generates an HTML report. It is a useful tool for Tivoli Workload Scheduler users who need to describe a problem to Customer Support. For best results, run it as soon as the problem is discovered. Metronome copies all the Tivoli Workload Scheduler configuration files into a directory named:

TWShome/snapshots/snap_date_time

The expected action flow is the following:

1. The user runs Metronome after discovering the problem.
2. The user opens a problem with Customer Support and includes the HTML report found in TWShome/snapshots/snap_date_time/report.html.
3. Customer Support reads the report and either resolves the problem or asks for the package of files produced by the script.
4. Customer Support receives the package of files and resolves the problem.

The Metronome files are copied to the TWShome/bin directory when Tivoli Workload Scheduler is installed.

Format:

perl path/maestro/bin/Metronome.pl [MAESTROHOME=TWS_dir] [-html] [-pack] [-prall]

Where:

  MAESTROHOME is the Tivoli Workload Scheduler home directory. Specify it when you run the script from a directory other than the Tivoli Workload Scheduler home directory.
  -html generates an HTML report.
  -pack creates a package of configuration files.
  -prall prints all variables.

Example 9-19 shows how to run the command from the Tivoli Workload Scheduler home directory.

 

Note: To run the command successfully, you need to follow the instructions for installing Perl in 3.10, Installing Perl5 on Windows, on both UNIX and Windows.

$ metronome.pl

Here is how to run the command from a directory that is not the Tivoli Workload Scheduler home directory.

$ metronome.pl MAESTROHOME=E:\TWS82_driver1\win32app\maestro
MAESTROHOME=E:/TWS82_driver1/win32app/maestro
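Locating the report to attach (step 2 of the flow above) can be scripted on UNIX. This is a minimal sketch, assuming each Metronome run creates one subdirectory under TWShome/snapshots and that the most recently modified one is the run you want. The function name is hypothetical:

```shell
#!/bin/sh
# Print the path of report.html in the newest snapshot directory.
latest_metronome_report() {   # usage: latest_metronome_report /opt/tivoli/TWS
    newest=$(ls -td "$1"/snapshots/*/ 2>/dev/null | head -n 1)
    [ -n "$newest" ] && [ -f "${newest}report.html" ] && echo "${newest}report.html"
}
```

For example, run perl /opt/tivoli/TWS/bin/Metronome.pl -html from the TWS home directory, then latest_metronome_report /opt/tivoli/TWS to get the file to attach.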

Old News ;-)

[Sep 12, 2009] Information needed for troubleshooting TWS to TEC connection problems (events are not flowing)

a) *.fmt file for TWS

# find . -name "*.fmt"
./tivoli/lcf/dat/1/PLUS/maestro.fmt

b) A 'wtdumprl' output from TEC (to see whether events are flowing and when they stopped flowing)

c) A 'wlsinst -Pp' output from TEC

d) Patch level of TWS and TEC

Recommended Links

The following are some useful online help sites for IBM Tivoli Workload Scheduler:

  Tivoli support
http://www-3.ibm.com/software/sysmgmt/products/support/
  Product documentation
http://publib.boulder.ibm.com/tividd/td/tdmktlist.html
  Tivoli user groups
http://www-3.ibm.com/software/sysmgmt/products/support/Tivoli_User_Groups.html

IBM Tivoli Workload Scheduler Fix Packs

ftp://ftp.software.ibm.com/software/tivoli_support/patches/
http://www3.software.ibm.com/ibmdl/pub/software/tivoli_support/patches/
Public IBM Tivoli Workload Scheduler mailing list
http://groups.yahoo.com/group/maestro-l





Copyright © 1996-2021 by Softpanorama Society. www.softpanorama.org was initially created as a service to the (now defunct) UN Sustainable Development Networking Programme (SDNP) without any remuneration. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License. Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.


Last modified: March 12, 2019